Is it better to call ToList() or ToArray() in LINQ queries?
I often run into the case where I want to evaluate a query right where I declare it. This is usually because I need to iterate over it multiple times and it is expensive to compute. For example:
    string raw = "...";
    var lines = (from l in raw.Split('\n')
                 let ll = l.Trim()
                 where !string.IsNullOrEmpty(ll)
                 select ll).ToList();
This works fine. But if I am not going to modify the result, then I might as well call ToArray() instead of ToList().
I wonder however whether ToArray() is implemented by first calling ToList() and is therefore less memory efficient than just calling ToList().
Am I crazy? Should I just call ToArray(), safe and secure in the knowledge that the memory won't be allocated twice?
Unless you simply need an array to meet other constraints, you should use ToList. In the majority of scenarios ToArray will allocate more memory than ToList.
Both use arrays for storage, but ToList has a more flexible constraint. It needs the array to be at least as large as the number of elements in the collection. If the array is larger, that is not a problem. However, ToArray needs the array to be sized exactly to the number of elements.
To meet this constraint ToArray often does one more allocation than ToList. Once it has an array that is big enough, it allocates an array which is exactly the correct size and copies the elements into that array. The only time it can avoid this is when the grow algorithm for the array just happens to land exactly on the number of elements needing to be stored (definitely the minority case).
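The looser constraint on ToList can be observed by comparing a list's Capacity with its Count. This is only a rough sketch: Numbers is a hypothetical helper that hides the element count from LINQ, and the exact Capacity you see is an implementation detail that differs between runtime versions, so no particular value is assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CapacityDemo
{
    // Hypothetical helper: an iterator whose length is not known up front.
    public static IEnumerable<int> Numbers(int count)
    {
        for (int i = 0; i < count; i++)
            yield return i;
    }

    public static void Main()
    {
        List<int> list = Numbers(5).ToList();
        int[] array = Numbers(5).ToArray();

        // Capacity may exceed Count; the exact value is an implementation detail.
        Console.WriteLine($"List:  Count={list.Count}, Capacity={list.Capacity}");

        // An array has no slack: Length is always the element count.
        Console.WriteLine($"Array: Length={array.Length}");
    }
}
```

Whatever the runtime, Capacity is never smaller than Count, while the array's Length always equals the number of elements.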
EDIT
A couple of people have asked me about the consequence of having the extra unused memory in the List<T> value.
This is a valid concern. If the created collection is long lived, is never modified after being created and has a high chance of landing in the Gen2 heap then you may be better off taking the extra allocation of ToArray up front.
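If you need to keep a List<T> but still want to drop the slack on a long-lived instance, one option (not mentioned above, so treat it as an aside) is List<T>.TrimExcess(). A minimal sketch follows; note that TrimExcess may reallocate the backing array, so it costs roughly the same extra copy that ToArray would, and it does nothing when the list is already nearly full.

```csharp
using System;
using System.Collections.Generic;

public static class TrimDemo
{
    // Builds a list with spare capacity, then asks it to release the slack.
    // Returns the capacity before and after so the effect can be inspected.
    public static (int Before, int After, int Count) BuildAndTrim()
    {
        var list = new List<int>(capacity: 100);
        for (int i = 0; i < 10; i++)
            list.Add(i);

        int before = list.Capacity;
        list.TrimExcess(); // may shrink the backing array down to Count
        return (before, list.Capacity, list.Count);
    }

    public static void Main()
    {
        var (before, after, count) = BuildAndTrim();
        Console.WriteLine($"Count={count}, Capacity before={before}, after={after}");
    }
}
```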
In general though I find this to be the rarer case. It's much more common to see a lot of ToArray calls which are immediately passed to other short lived uses of memory, in which case ToList is demonstrably better.
The key here is to profile, profile and then profile some more.
The performance difference will be insignificant, since List<T> is implemented as a dynamically sized array. Calling either ToArray() (which uses an internal Buffer<T> class to grow the array) or ToList() (which calls the List<T>(IEnumerable<T>) constructor) will end up being a matter of putting them into an array and growing the array until it fits them all.
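To make the "grow until it fits" idea concrete, here is a simplified, hypothetical version of such a loop. It is not the actual framework source: ToArraySketch, the starting size of 4 and the doubling factor are all assumptions chosen for illustration. The allocation counter shows where an exact-size array pays for one extra copy.

```csharp
using System;
using System.Collections.Generic;

public static class GrowDemo
{
    // Hypothetical sketch of a grow-and-copy buffer; not the real BCL code.
    public static T[] ToArraySketch<T>(IEnumerable<T> source, out int allocations)
    {
        T[] buffer = new T[4]; // assumed initial size
        int count = 0;
        allocations = 1;

        foreach (T item in source)
        {
            if (count == buffer.Length)
            {
                // Buffer is full: allocate one twice as large and copy over.
                Array.Resize(ref buffer, buffer.Length * 2);
                allocations++;
            }
            buffer[count++] = item;
        }

        // A list could stop here and keep the slack.
        // An array must be exactly 'count' long, so trim if needed.
        if (count != buffer.Length)
        {
            Array.Resize(ref buffer, count);
            allocations++;
        }
        return buffer;
    }

    public static void Main()
    {
        int[] result = ToArraySketch(new[] { 1, 2, 3, 4, 5 }, out int allocations);
        Console.WriteLine($"{result.Length} elements, {allocations} allocations");
        // prints "5 elements, 3 allocations"
    }
}
```

With five elements the sketch allocates arrays of size 4, 8 and finally 5. With exactly four or eight elements the final trim is skipped, which is the coincidence case where the buffer size happens to match the element count.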
If you desire concrete confirmation of this fact, check out the implementation of the methods in question in Reflector -- you'll see they boil down to almost identical code.
(seven years later...)
A couple of other (good) answers have concentrated on microscopic performance differences that will occur.
This post is just a supplement to mention the semantic difference that exists between the IEnumerator<T> produced by an array (T[]) as compared to that returned by a List<T>.
This is best illustrated by example:
    IList<int> source = Enumerable.Range(1, 10).ToArray(); // try changing to .ToList()
    foreach (var x in source)
    {
        if (x == 5)
            source[8] *= 100;
        Console.WriteLine(x);
    }
The above code runs without an exception and produces the output:
1 2 3 4 5 6 7 8 900 10
This shows that the IEnumerator<int> returned by an int[] does not keep track of whether the array has been modified since the creation of the enumerator.
Note that I declared the local variable source as an IList<int>. In that way I make sure the C# compiler does not optimize the foreach statement into something which is equivalent to a for (var idx = 0; idx < source.Length; idx++) { /* ... */ } loop. This is something the C# compiler might do if I use var source = ...; instead. In my current version of the .NET framework the actual enumerator used here is a non-public reference-type System.SZArrayHelper+SZGenericArrayEnumerator`1[System.Int32] but of course this is an implementation detail.
Now, if I change .ToArray() into .ToList(), I get only:
1 2 3 4 5
followed by a System.InvalidOperationException blow-up saying:
Collection was modified; enumeration operation may not execute.
The underlying enumerator in this case is the public mutable value-type System.Collections.Generic.List`1+Enumerator[System.Int32] (boxed inside an IEnumerator<int> box in this case because I use IList<int>).
In conclusion, the enumerator produced by a List<T> keeps track of whether the list changes during enumeration, while the enumerator produced by T[] does not. So consider this difference when choosing between .ToList() and .ToArray().
People often add one extra .ToArray() or .ToList() to circumvent a collection that keeps track of whether it was modified during the lifetime of an enumerator.
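A small sketch of that workaround, assuming you really do need to mutate the list inside the loop (RemoveEvens is a made-up name for illustration). The loop enumerates a snapshot, so changes to the original list never reach the enumerator that drives the foreach.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotDemo
{
    public static List<int> RemoveEvens(List<int> numbers)
    {
        // Enumerate a copy so that removing from 'numbers' does not
        // invalidate the enumerator driving the loop.
        foreach (int n in numbers.ToArray())
        {
            if (n % 2 == 0)
                numbers.Remove(n);
        }
        return numbers;
    }

    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 4, 5, 6 };
        RemoveEvens(numbers);
        Console.WriteLine(string.Join(" ", numbers)); // prints "1 3 5"
    }
}
```

Without the .ToArray() call, the first Remove would cause the next loop iteration to throw the InvalidOperationException shown earlier. For this particular job List<T>.RemoveAll is simpler, but the snapshot pattern applies to any mutation done while looping.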
(If anybody wants to know how the List<> keeps track of whether the collection was modified, there is a private field _version in this class which is changed every time the List<> is updated.)
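The version-counter idea can be sketched with a tiny collection of our own. VersionedBag<T> below is hypothetical and much simpler than the real List<T>; it only shows the general technique of bumping a counter on every mutation and comparing it during enumeration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical, simplified collection; not the actual List<T> source.
public sealed class VersionedBag<T> : IEnumerable<T>
{
    private readonly List<T> _items = new List<T>();
    private int _version; // bumped on every mutation

    public void Add(T item)
    {
        _items.Add(item);
        _version++;
    }

    public IEnumerator<T> GetEnumerator()
    {
        // Captured on the first MoveNext call, i.e. when enumeration starts.
        int snapshot = _version;
        for (int i = 0; i < _items.Count; i++)
        {
            // Any Add since the snapshot means the bag changed under us.
            if (snapshot != _version)
                throw new InvalidOperationException(
                    "Collection was modified; enumeration operation may not execute.");
            yield return _items[i];
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Calling Add on the bag from inside a foreach over it makes the next iteration throw, mirroring the List<T> behaviour described above. An array has no such counter, which is why its enumerator cannot notice the change.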