Different lists

We all know lists. Most, if not all, programming languages provide an implementation of a list-like data structure. Lists may be implemented in different ways, for instance as singly-linked lists or as arrays. Different concrete data structures may be used under the name list, as long as the implementation represents a finite number of ordered values and provides some common list operations.

As we’ll see, different implementations yield very different performance for some operations. The decision about which concrete data structure to use as a list is also coupled with the language design itself. I’ll compare lists in two languages: Elixir and Go.

Lists in Go (slices)

Let’s implement a simple benchmark function:

package list_fun

import (
	"fmt"
	"math/rand/v2"
	"testing"
)

func createList(length int) []int {
	list := make([]int, 0)
	for i := 0; i < length; i++ {
		list = append(list, i)
	}
	return list
}

func BenchmarkSlice(b *testing.B) {
	lengths := []int{10_000, 100_000, 1_000_000}
	for _, length := range lengths {
		list := createList(length)
		idx := rand.Perm(length)[:100]
		b.Run(fmt.Sprintf("random access %d", length), func(b *testing.B) {
			for _, i := range idx {
				_ = list[i]
			}
		})
		list = createList(length)
		b.Run(fmt.Sprintf("iterate %d", length), func(b *testing.B) {
			for i := 0; i < len(list); i++ {
				_ = list[i]
			}
		})
		list = createList(length)
		b.Run(fmt.Sprintf("iterate reverse %d", length), func(b *testing.B) {
			for i := len(list) - 1; i >= 0; i-- {
				_ = list[i]
			}
		})
		list = createList(length)
		b.Run(fmt.Sprintf("append %d", length), func(b *testing.B) {
			for i := 0; i < 100; i++ {
				list = append(list, i)
			}
		})
		list = createList(length)
		b.Run(fmt.Sprintf("prepend %d", length), func(b *testing.B) {
			for i := 0; i < 100; i++ {
				list = append([]int{i}, list...)
			}
		})
	}
}

The benchmark did not run in an isolated environment (e.g., a server with the fewest possible running processes), so the numbers should be taken with a grain of salt. These are the results:

                                 │ benchmark.log  │
                                 │     sec/op     │
Slice/random_access_10000-16       0.01805n ± 39%
Slice/iterate_10000-16              0.2885n ±  2%
Slice/iterate_reverse_10000-16      0.2545n ± 22%
Slice/append_10000-16              0.02655n ± 25%
Slice/prepend_10000-16               84.29n ± 26%
Slice/random_access_100000-16      0.01700n ± 71%
Slice/iterate_100000-16              3.048n ± 25%
Slice/iterate_reverse_100000-16      2.432n ± 29%
Slice/append_100000-16             0.02505n ± 24%
Slice/prepend_100000-16              740.1n ± 22%
Slice/random_access_1000000-16     0.02105n ± 24%
Slice/iterate_1000000-16             26.45n ± 23%
Slice/iterate_reverse_1000000-16     22.20n ±  6%
Slice/append_1000000-16            0.02905n ± 21%
Slice/prepend_1000000-16             9.463µ ± 11%

A refresher on time complexity

The time complexity of an algorithm or data structure operation refers to how the time it takes to execute that task scales as the input size grows. Typically, the worst-case time complexity is considered.

This relationship is commonly expressed using big O notation. All constant factors are dropped. For instance, if the time a task takes doesn’t depend on the input size at all, it has O(1), or constant, time complexity. If the time scales linearly with the input size, it has linear time complexity, or O(N), where N is the input size.

Even though we drop all constant factors, they should not be forgotten. An O(1) algorithm can be slower than an O(N) one due to these constant factors. However, as the input size grows, the O(N) algorithm will eventually become slower.

Time complexity is important for understanding how well an algorithm scales. Some time complexities can result in prohibitively slow performance (such as O(N!), that’s N factorial). Here, we’ll use big O notation to grasp the strengths and weaknesses of each list implementation.

Back to Go slices

Iterating over the list scales roughly linearly with the input size, while random access doesn’t scale with input size at all. This is because Go lists, called slices, are implemented as pointers to arrays. Since arrays are stored in a contiguous memory block, we can compute the exact memory location of every element. Moreover, arrays may store pointers to values, allowing the program to follow these pointers and read the values. As a result, random access doesn’t depend on the size of the array, making it O(1) and quite fast.

However, prepending an element to the list is different. Since arrays have fixed sizes, it’s not possible to prepend or append values without copying the entire array. This means that each prepend operation runs in O(N) time, where N is the size of the array.

But why is appending at least one order of magnitude faster than prepending? The answer lies in Go’s slice implementation using dynamic arrays. As the documentation explains, a slice consists of a pointer to an array, the length of the segment it references, and its capacity. The drawing below illustrates how Go slices work under the hood:

Illustration of Go slices

When there’s no capacity left, the append function creates a new array with spare capacity, that is, with some empty slots at the right end. This allows subsequent appends to simply set the value in memory and increase the slice length. Therefore, most append operations happen in O(1) time, occasionally requiring a new array allocation in O(N) time.

This choice of list implementation is common across languages like Python, C, Rust, C++, Java, and others. Dynamic arrays overcome the limitation of fixed-size arrays, but usually still use an array to store the values. This design is often influenced by historical factors and the need for efficient random access operations.

Go’s choice of implementing lists as dynamic arrays reflects its imperative programming nature. Constant-time random access allows programs to read from or set a value at specific positions efficiently, making Go conducive to writing code that relies heavily on these operations.

But it doesn’t have to be like this. Other programming languages take a different approach to lists, which we’ll explore next.

Lists in Elixir

Elixir is a functional language that runs on the BEAM virtual machine, which comes from the Erlang world. The List module provides a singly-linked list implementation, which is explained in the documentation. The drawing below represents Elixir lists:

Illustration of Elixir Lists

Each entry in the list has a value (the yellow square) and a pointer to the next entry. The list can be represented as head and tail pairs. In the drawing, the outermost rectangle represents the first pair of head and tail. The head consists of the first value, while the tail contains the rest of the list. After following the pointer to the next value, we can once again represent the remainder of the list as a head and tail pair (the gray rectangle). This can be done recursively until the end of the list.

Let’s write some simple functions to benchmark common list operations. We’ll use Benchee to benchmark our functions.

Benchee.run(
  %{
    "random access" => fn input ->
      Enum.to_list(1..100) |> Enum.shuffle() |> Enum.map(&Enum.at(input, &1))
    end,
    "iterate" => fn input ->
      for n <- input, do: n
    end,
    "iterate reverse" => fn input ->
      for n <- Enum.reverse(input), do: n
    end,
    "prepend" => fn input ->
      Enum.reduce(Enum.to_list(1..100), input, fn x, acc -> [x | acc] end)
    end,
    "append" => fn input ->
      Enum.reduce(Enum.to_list(1..100), input, fn x, acc -> acc ++ [x] end)
    end
  },
  inputs: %{
    "small" => Enum.to_list(1..100),
    "medium" => Enum.to_list(1..1_000),
    "large" => Enum.to_list(1..10_000)
  }
)

Results:

##### With input large #####
Name                      ips        average  deviation         median         99th %
prepend             1316.27 K        0.76 μs  ±4792.55%        0.65 μs        1.04 μs
random access         50.60 K       19.76 μs    ±61.95%       19.43 μs       25.04 μs
iterate                7.62 K      131.23 μs     ±5.51%      131.00 μs      149.42 μs
iterate reverse        6.53 K      153.17 μs     ±9.10%      153.32 μs      180.12 μs
append                 0.45 K     2229.11 μs     ±5.03%     2218.95 μs     2385.88 μs

Comparison:
prepend             1316.27 K
random access         50.60 K - 26.01x slower +19.00 μs
iterate                7.62 K - 172.73x slower +130.47 μs
iterate reverse        6.53 K - 201.61x slower +152.41 μs
append                 0.45 K - 2934.12x slower +2228.35 μs

##### With input medium #####
Name                      ips        average  deviation         median         99th %
prepend             1321.36 K        0.76 μs  ±4824.20%        0.65 μs        1.03 μs
iterate              109.81 K        9.11 μs   ±104.18%        8.84 μs       13.44 μs
iterate reverse       91.57 K       10.92 μs    ±57.64%        9.97 μs       19.54 μs
random access         50.40 K       19.84 μs    ±19.88%       19.52 μs       25.23 μs
append                 7.80 K      128.17 μs    ±17.34%      125.64 μs      157.04 μs

Comparison:
prepend             1321.36 K
iterate              109.81 K - 12.03x slower +8.35 μs
iterate reverse       91.57 K - 14.43x slower +10.16 μs
random access         50.40 K - 26.22x slower +19.08 μs
append                 7.80 K - 169.36x slower +127.41 μs

##### With input small #####
Name                      ips        average  deviation         median         99th %
prepend             1277.00 K        0.78 μs  ±4718.68%        0.65 μs        1.09 μs
iterate              979.84 K        1.02 μs  ±2432.19%        0.92 μs        1.36 μs
iterate reverse      867.27 K        1.15 μs  ±2445.71%        1.04 μs        1.55 μs
append                56.23 K       17.78 μs    ±11.47%       17.05 μs       26.01 μs
random access         50.28 K       19.89 μs    ±43.72%       19.54 μs       25.26 μs

Comparison:
prepend             1277.00 K
iterate              979.84 K - 1.30x slower +0.24 μs
iterate reverse      867.27 K - 1.47x slower +0.37 μs
append                56.23 K - 22.71x slower +17.00 μs
random access         50.28 K - 25.40x slower +19.11 μs

Prepending and iterating are the fastest operations. Iterating is quite simple: you start with the head value and follow the tail pointer to the next head, recursively.

Random access, however, is slow. Since the list is not stored in a contiguous memory block, accessing an element at a specific index requires iterating over the list up to that index. This results in linear time complexity (O(N)). Similarly, appending elements to the end of a list requires finding the last element by iterating over the entire list, so this operation also has a time complexity of O(N).

In fact, the Erlang documentation explicitly advises against appending elements to the end of a list recursively. Appending creates an entirely new copy of the list. It would be possible to append without copying, but this would break immutability, which we’ll explore below. This behavior is completely different from what we saw earlier in Go, and it is so by design.

Elixir and Erlang are functional languages and, as such, are very conducive to the use of recursion and higher-order functions, which makes linked lists more suitable than arrays. When calling a recursive function with a list as an argument, the function may use the head value and pass the tail as an argument to the next call. Arrays also provide efficient iteration over all elements, but they fall short in a very important aspect when it comes to functional-style programming.

All data structures in Elixir are immutable by default, and singly-linked lists are friendlier towards immutability than arrays. Prepending an element does not require moving any values in memory; it only requires creating the new head and pointing it to the tail (the entire list prior to the prepend operation). Since all values are immutable, you don’t need to worry about any of the other values changing. Therefore, prepends can always be executed in O(1) time.

Immutable arrays would have quite poor performance, since every append or prepend operation would require copying the entire array to a new memory location. If you construct an entire array one element at a time, the time complexity becomes O(N²), since N operations of O(N) each are performed. It doesn’t take a huge number of appends before immutable arrays become impractically slow.

Slow random access is also not a major concern, since Elixir code is usually much more declarative than similar Go code, for instance. Instead of writing loops and mutating list elements directly, you use higher-order functions such as map, filter, and reduce to achieve the same result. Also, due to immutability, there is no point in using random access to mutate values. Thus, by design, Elixir code requires random access much less often.

Conclusion

I guess the conclusion is that different things are different. Seriously though, the motivation for this post was the 2023 edition of Advent of Code. I solved some puzzles with Elixir and others with Go (or Python). The list is simply a very fundamental abstract data structure that we take for granted in almost all programming languages, yet it still has major implementation differences that reflect the design of each language. The influence of programming language design on code goes beyond hard limitations. You cannot mutate a value in Elixir, but you could still make modified copies and somehow implement an imperative-ish solution. Should you do it, though? Of course not: the code would be cumbersome and unhinged.

Perhaps unsurprisingly, solving the puzzles in Elixir requires a thinking process that is distinct from that of using Go. That’s why I used the word conducive a few times in this post: even without hard limitations, programming language design influences the code you’ll write, and we should be aware of the major design choice differences.
