C# – Why can’t IEnumerator’s be cloned

.netc++ienumerable

In implementing a basic Scheme interpreter in C# I discovered, to my horror, the following problem:

IEnumerator doesn't have a clone method! (or more precisely, IEnumerable can't provide me with a "cloneable" enumerator).

What I'd like:

interface IEnumerator<T>
{
    bool MoveNext();
    T Current { get; }
    void Reset();
    // NEW!
    IEnumerator<T> Clone();
}

I cannot come up with an implementation of IEnumerable that would not be able to supply an efficiently cloneable IEnumerator (vectors, linked lists, etc. all would be able to provide a trivial implementation of IEnumerator's Clone() as specified above… it would be easier than providing a Reset() method anyway!).

The absence of the Clone method means that any functional/recursive idiom of enumerating over a sequence won't work.

It also means I can't "seamlessly" make IEnumerable's behave like Lisp "lists" (for which you use car/cdr to enumerate recursively). i.e. the only implemention of "(cdr some IEnumerable)" would be woefully inefficient.

Can anyone suggest a realistic, useful, example of an IEnumerable object that wouldn't be able to provide an efficient "Clone()" method? Is it that there'd be a problem with the "yield" construct?

Can anyone suggest a workaround?

Best Solution

The logic is inexorable! IEnumerable doesn't support Clone, and you need Clone, so you shouldn't be using IEnumerable.

Or more accurately, you shouldn't be using it as the fundamental basis for work on a Scheme interpreter. Why not make a trivial immutable linked list instead?

public class Link<TValue>
{
    private readonly TValue value;
    private readonly Link<TValue> next;

    public Link(TValue value, Link<TValue> next)
    {
        this.value = value;
        this.next = next;
    } 

    public TValue Value 
    { 
        get { return value; }
    }

    public Link<TValue> Next 
    {
        get { return next; }
    }

    public IEnumerable<TValue> ToEnumerable()
    {
        for (Link<TValue> v = this; v != null; v = v.next)
            yield return v.value;
    }
}

Note that the ToEnumerable method gives you convenient usage in the standard C# way.

To answer your question:

Can anyone suggest a realistic, useful, example of an IEnumerable object that wouldn't be able to provide an efficient "Clone()" method? Is it that there'd be a problem with the "yield" construct?

An IEnumerable can go anywhere in the world for its data. Here's an example that reads lines from the console:

IEnumerable<string> GetConsoleLines()
{
    for (; ;)
        yield return Console.ReadLine();
}

There are two problems with this: firstly, a Clone function would not be particularly straightforward to write (and Reset would be meaningless). Secondly, the sequence is infinite - which is perfectly allowable. Sequences are lazy.

Another example:

IEnumerable<int> GetIntegers()
{
    for (int n = 0; ; n++)
        yield return n;
}

For both these examples, the "workaround" you've accepted would not be much use, because it would just exhaust the available memory or hang up forever. But these are perfectly valid examples of sequences.

To understand C# and F# sequences, you need to look at lists in Haskell, not lists in Scheme.

In case you think the infinite stuff is a red herring, how about reading the bytes from a socket:

IEnumerable<byte> GetSocketBytes(Socket s)
{
    byte[] buffer = new bytes[100];
    for (;;)
    {
        int r = s.Receive(buffer);
        if (r == 0)
            yield break;

        for (int n = 0; n < r; n++)
            yield return buffer[n];       
    }
}

If there is some number of bytes being sent down the socket, this will not be an infinite sequence. And yet writing Clone for it would be very difficult. How would the compiler generate the IEnumerable implementation to do it automatically?

As soon as there was a Clone created, both instances would now have to work from a buffer system that they shared. It's possible, but in practice it isn't needed - this isn't how these kinds of sequences are designed to be used. You treat them purely "functionally", like values, applying filters to them recursively, rather than "imperatively" remembering a location within the sequence. It's a little cleaner than low-level car/cdr manipulation.

Further question:

I wonder, what's the lowest level "primitive(s)" I would need such that anything I might want to do with an IEnumerable in my Scheme interpreter could be implemented in scheme rather than as a builtin.

The short answer I think would be to look in Abelson and Sussman and in particular the part about streams. IEnumerable is a stream, not a list. And they describe how you need special versions of map, filter, accumulate, etc. to work with them. They also get onto the idea of unifying lists and streams in section 4.2.