What is the fastest / most efficient way of getting all the distinct items from a list?
I have a List<string> that possibly has multiple repeating items in it and only want the unique values within the list.
You can use the Distinct method to return an IEnumerable<T> of distinct items:
var uniqueItems = yourList.Distinct();
And if you need the sequence of unique items returned as a List<T>, you can add a call to ToList:
var uniqueItemsList = yourList.Distinct().ToList();
yourList.Distinct().ToList() requires two full iterations over the enumerable, and additionally is based off IEqualityComparer, which is slower than GetHashCode.
Use a HashSet<T>. For example:
var items = "A B A D A C".Split(' ');
var unique_items = new HashSet<string>(items);
foreach (string s in unique_items)
    Console.WriteLine(s);
prints:
A
B
D
C
Apart from the Distinct extension method of LINQ, you could use a HashSet<T> object that you initialise with your collection. This is most likely more efficient than the LINQ way, since it uses hash codes (GetHashCode) rather than an IEqualityComparer.
In fact, if it's appropriate for your situation, I would just use a HashSet for storing the items in the first place.
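To illustrate storing items in a HashSet<string> from the start, here is a minimal sketch. The class and method names (HashSetFirst, Collect) are made up for this example. It relies on HashSet<T>.Add returning false when the item is already present, so duplicates never get stored:

```csharp
using System;
using System.Collections.Generic;

public static class HashSetFirst
{
    // Hypothetical helper: collects items directly into a set,
    // reporting each duplicate as it is rejected.
    public static HashSet<string> Collect(IEnumerable<string> incoming)
    {
        HashSet<string> items = new HashSet<string>();
        foreach (string s in incoming)
        {
            // Add returns false if the set already contains the item.
            if (!items.Add(s))
                Console.WriteLine("Skipped duplicate: " + s);
        }
        return items;
    }
}
```

Calling HashSetFirst.Collect("A B A D A C".Split(' ')) should yield a set of four items and report the two repeated "A" entries as skipped.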
HashSet won't maintain any ordering, which may or may not be an issue for the OP.
Distinct should/does iterate the list in order (although I'm not sure if that's actually guaranteed in any spec).
HashSet is the way to go if you want good performance.
In .NET 2.0, I'm pretty sure about this solution:
public IEnumerable<T> Distinct<T>(IEnumerable<T> source)
{
    List<T> uniques = new List<T>();
    foreach (T item in source)
    {
        if (!uniques.Contains(item)) uniques.Add(item);
    }
    return uniques;
}
If source contains 100,000 items with few duplicates, then in each of the 100,000 iterations List<T>.Contains scans a list on the order of 100,000 items, meaning you are scanning on the order of 100,000 * 100,000 items in total. Quadratic time complexity can become quite slow.
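One way to avoid the quadratic scan while staying within .NET 2.0 (where HashSet<T> is not available) might be to track seen items in a Dictionary<T, bool>. This is only a sketch: the names DistinctHelper and DistinctInOrder are made up for this example, and it assumes source contains no null items, since Dictionary does not accept null keys.

```csharp
using System;
using System.Collections.Generic;

public static class DistinctHelper
{
    // Hypothetical helper, not a framework method.
    // Returns the distinct items in order of first appearance.
    // Assumption: source contains no null items (Dictionary rejects null keys).
    public static List<T> DistinctInOrder<T>(IEnumerable<T> source)
    {
        // Keys are looked up by hash code, so each check is O(1) on average.
        // The bool value is unused; only the keys matter.
        Dictionary<T, bool> seen = new Dictionary<T, bool>();
        List<T> uniques = new List<T>();
        foreach (T item in source)
        {
            if (!seen.ContainsKey(item))
            {
                seen.Add(item, true);
                uniques.Add(item);
            }
        }
        return uniques;
    }
}
```

Each item costs one hash lookup on average rather than a scan of the whole result list, so the total work grows roughly linearly with the size of source.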
Given ["A", "B", "C", "C", "D", "D"], unique items would return ["A", "B"], whereas distinct items would return ["A", "B", "C", "D"].
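If what you actually want is the "unique" reading (items that occur exactly once), a LINQ sketch along these lines could work. The names UniqueVsDistinct and OccurringOnce are made up for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UniqueVsDistinct
{
    // Hypothetical helper: returns only the items that appear exactly once.
    public static List<T> OccurringOnce<T>(IEnumerable<T> source)
    {
        return source.GroupBy(x => x)             // one group per distinct value
                     .Where(g => g.Count() == 1)  // keep values seen exactly once
                     .Select(g => g.Key)
                     .ToList();
    }
}
```

For the list above, OccurringOnce gives "A" and "B", while Distinct() gives "A", "B", "C" and "D".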