C# – IEnumerable vs List – What to Use? How do they work?

I have some doubts about how enumerators and LINQ work. Consider these two simple selects:

List<Animal> sel = (from animal in Animals 
                    join race in Species
                    on animal.SpeciesKey equals race.SpeciesKey
                    select animal).Distinct().ToList();

or

IEnumerable<Animal> sel = (from animal in Animals 
                           join race in Species
                           on animal.SpeciesKey equals race.SpeciesKey
                           select animal).Distinct();

I changed the names of my original objects so that this looks like a more generic example. The query itself is not that important. What I want to ask is this:

foreach (Animal animal in sel) { /*do stuff*/ }

  1. I noticed that if I use IEnumerable, when I debug and inspect "sel" (which in that case is the IEnumerable), it has some interesting members: "inner", "outer", "innerKeySelector" and "outerKeySelector". The last two appear to be delegates. The "inner" member does not have "Animal" instances in it, but rather "Species" instances, which seemed very strange to me. The "outer" member does contain "Animal" instances. I presume that the two delegates determine what goes in and what comes out of it?

  2. I noticed that if I use "Distinct", the "inner" contains 6 items (this is incorrect, as only 2 are distinct), but the "outer" does contain the correct values. Again, the delegates probably determine this, but this is a bit more than I know about IEnumerable.

  3. Most importantly, which of the two options is the best performance-wise?

The evil List conversion via .ToList()?

Or maybe using the enumerator directly?

If you can, please also explain a bit or share some links that explain this use of IEnumerable.

Best Answer

IEnumerable describes behavior, while List is an implementation of that behavior. When you use IEnumerable, you give LINQ a chance to defer the work until the sequence is actually enumerated, possibly optimizing along the way. If you call ToList(), you force the query to execute and materialize its results right away.
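
To make the difference concrete, here is a minimal sketch using an in-memory list rather than a database. The class name DeferredDemo and the sample numbers are made up for illustration. It defines the same filter twice, once left as a lazy IEnumerable and once materialized with ToList(), then changes the source and compares what each one sees:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Returns how many even numbers the lazy query and the ToList() snapshot
    // each report after the source list has been modified.
    public static (int Deferred, int Snapshot) Run()
    {
        var numbers = new List<int> { 1, 2, 3, 4 };

        // Deferred: nothing is filtered yet, 'evens' only describes the work.
        IEnumerable<int> evens = numbers.Where(n => n % 2 == 0);

        // Immediate: ToList() enumerates the query now and stores the results.
        List<int> snapshot = numbers.Where(n => n % 2 == 0).ToList();

        numbers.Add(6); // change the source after both queries were defined

        // Count() enumerates 'evens' at this point, so it sees 2, 4 and 6.
        return (evens.Count(), snapshot.Count);
    }

    public static void Main()
    {
        var (deferred, snapshot) = Run();
        Console.WriteLine($"deferred: {deferred}, snapshot: {snapshot}");
        // prints "deferred: 3, snapshot: 2"
    }
}
```

The lazy query picks up the 6 because it only reads the list when Count() enumerates it, while the list produced by ToList() was fixed at the moment it was created.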

Whenever I'm "stacking" LINQ expressions, I use IEnumerable, because by only specifying the behavior I give LINQ a chance to defer evaluation and possibly optimize the program. Remember how LINQ doesn't generate the SQL to query the database until you enumerate it? Consider this:

// Base query: nothing runs until the result is enumerated.
public IEnumerable<Animals> AllSpotted()
{
    return from a in Zoo.Animals
           where a.coat.HasSpots
           select a;
}

public IEnumerable<Animals> Feline(IEnumerable<Animals> sample)
{
    return from a in sample
           where a.race.Family == "Felidae"
           select a;
}

public IEnumerable<Animals> Canine(IEnumerable<Animals> sample)
{
    return from a in sample
           where a.race.Family == "Canidae"
           select a;
}

Now you have a method that selects an initial sample ("AllSpotted"), plus some filters. So now you can do this:

var Leopards = Feline(AllSpotted());
var Hyenas = Canine(AllSpotted());

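So is it faster to use List over IEnumerable? Only if you want to prevent a query from being executed more than once. But is it better overall? Well, in the above, Leopards and Hyenas are deferred queries that run only when enumerated. If Zoo.Animals is a database-backed query and the methods accept and return IQueryable<Animals>, each one is converted into a single SQL query and the database only returns the relevant rows. (Typed as IEnumerable<Animals>, as written above, only the AllSpotted filter can reach the database; the Feline and Canine filters then run on the client via LINQ to Objects.) But if we had returned a List from AllSpotted(), the later filters could never reach the database: it would return every spotted animal, possibly far more data than is needed, and we would waste cycles doing the filtering in the client.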
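
Whether a stacked filter can be handed to the database depends on the static type the query flows through. The sketch below is only an approximation. It uses an in-memory list wrapped with AsQueryable() as a stand-in for a database table, and the Animal record, the CompositionDemo class and the sample data are all hypothetical. It shows that the same query expression binds to LINQ to Objects when the parameter is IEnumerable, and keeps building an expression tree when the parameter is IQueryable:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model, reduced to the one property the filter needs.
public record Animal(string Name, string Family);

public static class CompositionDemo
{
    // Typed as IEnumerable<T>: 'where' binds to Enumerable.Where (LINQ to Objects).
    public static IEnumerable<Animal> FelineInMemory(IEnumerable<Animal> sample) =>
        from a in sample
        where a.Family == "Felidae"
        select a;

    // Typed as IQueryable<T>: 'where' binds to Queryable.Where, which extends the
    // expression tree so a query provider can translate the whole query at once.
    public static IQueryable<Animal> FelineQueryable(IQueryable<Animal> sample) =>
        from a in sample
        where a.Family == "Felidae"
        select a;

    // Stand-in for a database table; a real provider would supply its own IQueryable.
    public static IQueryable<Animal> Source() =>
        new List<Animal>
        {
            new Animal("Leopard", "Felidae"),
            new Animal("Cheetah", "Felidae"),
            new Animal("Dalmatian", "Canidae"),
        }.AsQueryable();

    public static void Main()
    {
        IEnumerable<Animal> inMemory = FelineInMemory(Source());
        IQueryable<Animal> queryable = FelineQueryable(Source());

        // The IEnumerable version is no longer a composable query.
        Console.WriteLine(inMemory is IQueryable<Animal>); // False

        // The IQueryable version still carries the composed Where call.
        Console.WriteLine(queryable.Expression);

        Console.WriteLine(string.Join(", ", queryable.Select(a => a.Name))); // Leopard, Cheetah
    }
}
```

Both versions are still deferred and return the same animals. The difference is only in where the filtering can happen once a real database provider is behind the source.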

In a program, it may be better to defer converting your query to a list until the very end, so if I'm going to enumerate through Leopards and Hyenas more than once, I'd do this:

List<Animals> Leopards = Feline(AllSpotted()).ToList();
List<Animals> Hyenas = Canine(AllSpotted()).ToList();
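
The cost of enumerating a deferred query more than once can be seen by counting how often the filter runs. This is a small in-memory sketch; the class ReenumerationDemo, the method PredicateCalls and the sample names are invented for the example. With a database-backed query, each extra pass would mean another round trip instead of another loop over an array:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReenumerationDemo
{
    // Returns how many times the filter ran when the results are read twice.
    public static int PredicateCalls(bool materialize)
    {
        int calls = 0;
        var source = new[] { "leopard", "cheetah", "dalmatian" };

        IEnumerable<string> query = source.Where(name =>
        {
            calls++;                    // count every evaluation of the filter
            return name.Length == 7;
        });

        // With ToList() the filter runs once, here; otherwise 'results' is still the lazy query.
        IEnumerable<string> results = materialize ? query.ToList() : query;

        foreach (var name in results) { }   // first pass
        foreach (var name in results) { }   // second pass

        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(PredicateCalls(materialize: false)); // 6
        Console.WriteLine(PredicateCalls(materialize: true));  // 3
    }
}
```

The lazy version evaluates the filter for all three items on each pass, while the materialized version pays that cost once and then just walks the stored list.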