How to convert tuple list to list tuple?

Question

For example, I have an IEnumerable<(int, char)> list. How can I convert list into (IEnumerable<int>, IEnumerable<char>)?

Is there a fast way to do this? It would be better to work with System.Linq.

As a side note, calling an IEnumerable<(int, char)> "list" might be confusing. Lists are materialized collections in .NET. An IEnumerable<(int, char)> might well be a deferred enumerable sequence whose elements are not stored in RAM but are fetched or generated one by one while the sequence is enumerated. A better name for the list variable would be sequence.
– Theodor Zoulias, Commented Aug 6, 2022 at 9:03

Enigmativity · Accepted Answer · 2022-08-05 06:31:59Z

3

It's quite simple with Aggregate:

IEnumerable<(int, char)> list = new[]
{
    (1, 'a'), (2, 'b'),
};

(List<int> ints, List<char> chars) =
    list.Aggregate((new List<int>(), new List<char>()), (a, x) =>
    {
        a.Item1.Add(x.Item1);
        a.Item2.Add(x.Item2);
        return a;
    });

That gives ints containing 1 and 2, and chars containing 'a' and 'b'.

That's the fastest way, but this is simpler:

List<int> ints = list.Select(x => x.Item1).ToList();
List<char> chars = list.Select(x => x.Item2).ToList();

answered Aug 5, 2022 at 6:31

Enigmativity


5 Comments

BrandonStudio Over a year ago

Hmm, I don't think this is better than a foreach block

Yong Shun Over a year ago

Sorry Enigmativity, it's my bad, you are using ValueTuple, so it is fine.

Klaus Gütter Over a year ago

I wonder if there is a way to do this without having to materialize the source enumerable. Will probably require some custom enumerator.

Enigmativity Over a year ago

@KlausGütter - Isn't this exactly what the enumerator is for? And Aggregate only runs once, so it's fairly efficient.

Klaus Gütter Over a year ago

@Enigmativity what I mean is: Enumerable.Aggregate fully enumerates the source (runs through all the elements) before it returns its result. Strictly speaking, this is not necessary to construct the desired tuple of projected enumerables. It is not as efficient as it could be when the source enumerable contains lots of elements (which might be expensive to enumerate) but the user intends to enumerate only part of the result enumerables.

Matthew Watson · Accepted Answer · 2022-08-05 08:22:26Z

2

There are two issues to consider:

You don't want to iterate over the input more than once.
You want to size the returned lists to the correct length when creating them if possible, to avoid multiple list resizing.

To efficiently find the length of an IEnumerable<T> you can use the .NET 6 Enumerable.TryGetNonEnumeratedCount().

Note that of course this will not work for some IEnumerable types, but it will work in many cases.

Also note that for small list sizes, calling Enumerable.TryGetNonEnumeratedCount() will likely make things slower, since a default-sized list would probably already be big enough to prevent resizing.
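To illustrate when the count is available cheaply, here is a minimal sketch. The class CountProbe and its members are hypothetical names made up for this example; only Enumerable.TryGetNonEnumeratedCount() is the real .NET 6 API. It assumes that an array exposes its length without enumeration, while a sequence produced by an iterator method does not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class CountProbe
{
    // A deferred sequence: nothing is produced until it is enumerated.
    public static IEnumerable<(int, char)> Generate()
    {
        yield return (1, 'a');
        yield return (2, 'b');
    }

    // Returns the count if it can be obtained without enumerating,
    // otherwise -1.
    public static int CheapCountOrMinusOne<T>(IEnumerable<T> sequence)
        => sequence.TryGetNonEnumeratedCount(out int count) ? count : -1;
}
```

With an array such as new[] { (1, 'a'), (2, 'b') } the helper returns 2; with CountProbe.Generate() it returns -1, so the caller has to fall back to a default-sized list.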

A method using this would look something like this:

public static (IEnumerable<T>, IEnumerable<U>) Deconstruct<T,U>(IEnumerable<(T,U)> sequence)
{
    List<T> listT;
    List<U> listU;

    if (sequence.TryGetNonEnumeratedCount(out int count))
    {
        listT = new List<T>(count);
        listU = new List<U>(count);
    }
    else
    {
        listT = new List<T>();
        listU = new List<U>();
    }

    foreach (var item in sequence)
    {
        listT.Add(item.Item1);
        listU.Add(item.Item2);
    }

    return (listT, listU);
}

This code isn't very elegant because there's no short way of writing the code to initialise the lists to the correct size. But it is probably about as efficient as you are likely to get.

You could possibly make it slightly more performant by returning arrays rather than lists if you know the count:

public static (IEnumerable<T>, IEnumerable<U>) Deconstruct<T,U>(IEnumerable<(T,U)> sequence)
{
    if (sequence.TryGetNonEnumeratedCount(out int count))
    {
        var arrayT = new T[count];
        var arrayU = new U[count];

        int i = 0;

        foreach (var item in sequence)
        {
            arrayT[i] = item.Item1;
            arrayU[i] = item.Item2;
            ++i;
        }

        return (arrayT, arrayU);
    }
    else
    {
        var listT = new List<T>();
        var listU = new List<U>();

        foreach (var item in sequence)
        {
            listT.Add(item.Item1);
            listU.Add(item.Item2);
        }

        return (listT, listU);
    }
}

I would only go to such lengths if performance testing indicated that it's worth it!

edited Aug 5, 2022 at 8:22

answered Aug 5, 2022 at 8:05

Matthew Watson


3 Comments

BrandonStudio Over a year ago

That's inspiring. I've got another question for you: will it bring apparent extra costs if I use foreach (var (t, u) in sequence) instead?

Matthew Watson Over a year ago

I don't think using foreach (var (t, u) in sequence) will noticeably slow things down (it may even make it faster), but if in doubt use BenchmarkDotNet to test it!

BrandonStudio Over a year ago

OK, I wondered because that will call the Deconstruct method with 2 out parameters.
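For reference, the deconstructing loop discussed in these comments could look like the sketch below. TupleSplit and Split are hypothetical names. As far as I know, System.ValueTuple does not declare a Deconstruct method, and the compiler reads Item1 and Item2 directly when deconstructing a value tuple, so no method call with out parameters should be involved; treat that as an assumption and benchmark if it matters.

```csharp
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class TupleSplit
{
    public static (List<T>, List<U>) Split<T, U>(IEnumerable<(T, U)> sequence)
    {
        var listT = new List<T>();
        var listU = new List<U>();

        // Deconstruct each tuple into two locals in the loop header.
        foreach (var (t, u) in sequence)
        {
            listT.Add(t);
            listU.Add(u);
        }

        return (listT, listU);
    }
}
```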

Klaus Gütter · Accepted Answer · 2022-08-06 10:42:08Z

0

If the original is a materialized collection like List<(int, char)> or (int, char)[] you can do the following:

var result = (list.Select(i => i.Item1), list.Select(i => i.Item2));

If the original is just an IEnumerable<(int, char)>, you should convert it to a List first (otherwise the source will get enumerated twice):

var list = source.ToList();
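The double enumeration can be made visible with a small sketch. EnumerationDemo, Source, Passes and CountPasses are hypothetical names; the iterator increments a counter every time it is started from the beginning.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, for illustration only.
public static class EnumerationDemo
{
    // Counts how many times the source was enumerated from the start.
    public static int Passes;

    public static IEnumerable<(int, char)> Source()
    {
        Passes++;
        yield return (1, 'a');
        yield return (2, 'b');
    }

    public static int CountPasses(bool materializeFirst)
    {
        Passes = 0;
        IEnumerable<(int, char)> source = Source();
        if (materializeFirst)
            source = source.ToList(); // one pass over the iterator

        var result = (source.Select(i => i.Item1), source.Select(i => i.Item2));

        // Consume both projections completely.
        _ = result.Item1.ToArray();
        _ = result.Item2.ToArray();
        return Passes;
    }
}
```

EnumerationDemo.CountPasses(false) returns 2 because each Select walks the deferred source on its own, while EnumerationDemo.CountPasses(true) returns 1.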

There are cases where this (and all other answers up to now):

does not work at all: when the source is an infinite sequence
or is inefficient: if the source sequence is big, but you intend to enumerate only a few elements of each of the result enumerables

If this is of no concern for the use case given, stop reading here.

It is possible to overcome this restriction with some implementation effort. Basically, the "derived enumerables" have to be implemented in a way that they request just the required items from the source enumerable and no more.

The following solution uses a class TupleEnumerable that fetches only the required elements from the source and remembers the fetched elements for use by the two derived enumerables.

using System;
using System.Collections;
using System.Collections.Generic;

public class TupleEnumerable<T1, T2> : IDisposable
{
    readonly IEnumerator<(T1, T2)> _source;
    readonly List<(T1, T2)> _preFetched = new();
    private bool _finished;

    public TupleEnumerable(IEnumerable<(T1, T2)> source)
    {
        _source = source.GetEnumerator();
    }

    public void Dispose()
    {
        _source.Dispose();
        _preFetched.Clear();
        _finished = true;
    }

    // Try to get the element if it already has been fetched
    // or otherwise use the source enumerator to fetch more.
    private bool TryGet(int index, out (T1, T2) tuple)
    {
        if (index < _preFetched.Count)
        {
            tuple = _preFetched[index];
            return true;
        }

        if (_finished)
        {
            tuple = default;
            return false;
        }

        _finished = !_source.MoveNext();
        if (_finished)
        {
            Console.WriteLine("**Source finished");
            tuple = default;
            return false;
        }
        Console.WriteLine($"**Source: {_source.Current}");

        _preFetched.Add(_source.Current);
        tuple = _source.Current;
        return true;
    }

    // This method returns a tuple of "derived" enumerables
    public (IEnumerable<T1>, IEnumerable<T2>) GetEnumerables()
        => (new ProjectedEnumerable<T1>(this, t => t.Item1),
            new ProjectedEnumerable<T2>(this, t => t.Item2));

    // This is our own implementation of IEnumerable<T>
    class ProjectedEnumerable<T> : IEnumerable<T>
    {
        private readonly TupleEnumerable<T1, T2> _tupleEnumerable;
        private readonly Func<(T1, T2), T> _projection;

        public ProjectedEnumerable(TupleEnumerable<T1, T2> tupleEnumerable, Func<(T1, T2), T> projection)
        {
            _tupleEnumerable = tupleEnumerable;
            _projection = projection;
        }

        public IEnumerator<T> GetEnumerator()
        {
            return new ProjectedEnumerator<T>(_tupleEnumerable, _projection);
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }

    // This is our own implementation of IEnumerator<T>
    class ProjectedEnumerator<T> : IEnumerator<T>
    {
        private readonly TupleEnumerable<T1, T2> _tupleEnumerable;
        private readonly Func<(T1, T2), T> _projection;
        private int _index;
        private T _current;

        public ProjectedEnumerator(TupleEnumerable<T1, T2> tupleEnumerable, Func<(T1, T2), T> projection)
        {
            _tupleEnumerable = tupleEnumerable;
            _projection = projection;
        }

        public bool MoveNext()
        {
            if (_tupleEnumerable.TryGet(_index, out var current))
            {
                _current = _projection(current);
                _index++;
                return true;
            }
            else
            {
                _current = default;
                return false;
            }
        }

        public void Reset()
        {
            _index = 0;
            _current = default;
        }

        public T Current => _current;

        object IEnumerator.Current => Current;

        public void Dispose()
        {
        }
    }
}

Usage:

IEnumerable<(int, char)> list = new[]
{
    (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')
};

using var c = new TupleEnumerable<int, char>(list);
var (enumerable1, enumerable2) = c.GetEnumerables();

Note: As Theodor Zoulias pointed out in the comments, the semantics of the TupleEnumerable<T1, T2> are different from a standard enumerable. Enumerating a TupleEnumerable<T1, T2> any number of times will result in a single enumeration of the underlying source. It effectively doubles as a memoizer.
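The same idea can be sketched more compactly with iterator local functions. This is not the class above but a simplified variant under stated assumptions: LazyUnzip, Unzip and Letters are hypothetical names, the code is not thread-safe, and the source enumerator is only disposed once the source has been read to the end.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class LazyUnzip
{
    public static (IEnumerable<T1>, IEnumerable<T2>) Unzip<T1, T2>(IEnumerable<(T1, T2)> source)
    {
        var enumerator = source.GetEnumerator();
        var buffer = new List<(T1, T2)>(); // elements fetched so far (memoized)
        bool finished = false;

        // Returns the element at index, pulling from the source only if needed.
        bool TryGet(int index, out (T1, T2) item)
        {
            while (index >= buffer.Count && !finished)
            {
                if (enumerator.MoveNext())
                {
                    buffer.Add(enumerator.Current);
                }
                else
                {
                    finished = true;
                    enumerator.Dispose(); // only reached when the source ends
                }
            }

            if (index < buffer.Count)
            {
                item = buffer[index];
                return true;
            }

            item = default;
            return false;
        }

        // Lazily projects the shared buffer; both results share one source pass.
        IEnumerable<T> Project<T>(Func<(T1, T2), T> selector)
        {
            for (int i = 0; ; i++)
            {
                if (!TryGet(i, out var item))
                    yield break;
                yield return selector(item);
            }
        }

        return (Project(t => t.Item1), Project(t => t.Item2));
    }

    // Hypothetical infinite source: (0, 'a'), (1, 'b'), (2, 'c'), ...
    public static IEnumerable<(int, char)> Letters()
    {
        for (int i = 0; ; i++)
            yield return (i, (char)('a' + i % 26));
    }
}
```

With var (ints, chars) = LazyUnzip.Unzip(LazyUnzip.Letters()), ints.Take(3) yields 0, 1, 2 and chars.Take(2) yields 'a', 'b', even though the source never ends. The trade-off is the same as noted above: every element fetched stays in the buffer.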

edited Aug 6, 2022 at 10:42

answered Aug 5, 2022 at 6:07

Klaus Gütter


5 Comments

BrandonStudio Over a year ago

This solution calls Select twice. Is it efficient?

Klaus Gütter Over a year ago

As list is assumed to be already materialized: yes. The Select is just a trivial member access to the tuple member in memory.

Theodor Zoulias Over a year ago

What's the problem that the TupleEnumerable<T1, T2> is attempting to solve? If the idea is to avoid storing everything in a List<(T1, T2)>, you are doing it anyway in the _preFetched field. You are just delaying the inevitable IMHO.

Klaus Gütter Over a year ago

@TheodorZoulias If the source enumerable can deliver one million elements and enumerating each element takes 1 ms, but in the end you want to enumerate only the first 5 elements of each of the derived enumerables, this makes a difference of 1000 seconds vs. 5 ms. So indeed, it makes no difference if the usage pattern is to completely enumerate everything, but that is not necessarily so.

Theodor Zoulias Over a year ago

It should be noted that the semantics of the TupleEnumerable<T1, T2> are different from a standard enumerable. Enumerating a TupleEnumerable<T1, T2> any number of times, will result in a single enumeration of the underlying source. It effectively doubles as a memoizer.
