I'm reading data from files (such as CSV and Excel) and need to ensure that each row in the file is unique.
Each row will be represented as an object[]. This cannot be changed due to the current architecture. Each object in the array can be of a different type (decimal, string, int, etc.).
A file can look like this:
foo 1 5 // Not unique
bar 1 5
bar 2 5
foo 1 5 // Not unique
A file is likely to have 200,000+ rows and 4-100 columns.
The code I have right now looks like this:
IList<object[]> rows = new List<object[]>();

using (var reader = _deliveryObjectReaderFactory.CreateReader(deliveryObject))
{
    // Read the file row by row.
    while (reader.Read())
    {
        // Get the values for the current row.
        var values = reader.GetValues();

        // Compare the new row against every row read so far.
        foreach (var row in rows)
        {
            bool rowsAreDifferent = false;

            // Compare column by column; a single differing value is enough.
            for (int i = 0; i < row.Length; i++)
            {
                var earlierValue = row[i];
                var newValue = values[i];

                if (earlierValue.ToString() != newValue.ToString())
                {
                    rowsAreDifferent = true;
                    break;
                }
            }

            if (!rowsAreDifferent)
                throw new Exception("Rows are not unique");
        }

        rows.Add(values);
    }
}
So, my question: can this be done more efficiently, for example by computing a hash per row and checking uniqueness of the hash instead?
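For reference, this is a minimal sketch of the hash-based idea I have in mind: a custom IEqualityComparer&lt;object[]&gt; would let a HashSet&lt;object[]&gt; reject duplicate rows with roughly one hash lookup per row instead of a scan over all previously read rows. The RowEqualityComparer name is made up for illustration, and the comparison mirrors my current ToString-based check.

using System;
using System.Collections.Generic;

// Compares two rows element by element and derives a combined hash code,
// so a HashSet<object[]> can treat rows with equal values as duplicates.
sealed class RowEqualityComparer : IEqualityComparer<object[]>
{
    public bool Equals(object[] x, object[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null || x.Length != y.Length) return false;

        for (int i = 0; i < x.Length; i++)
        {
            // Mirrors the original ToString comparison; Convert.ToString
            // also tolerates null values.
            if (Convert.ToString(x[i]) != Convert.ToString(y[i]))
                return false;
        }
        return true;
    }

    public int GetHashCode(object[] row)
    {
        unchecked
        {
            int hash = 17;
            foreach (var value in row)
                hash = hash * 31 + Convert.ToString(value).GetHashCode();
            return hash;
        }
    }
}

Inside the reading loop it would be used like this (the reader calls are the same as in my code above):

var seen = new HashSet<object[]>(new RowEqualityComparer());

while (reader.Read())
{
    var values = reader.GetValues();

    // Add returns false if an equal row is already in the set.
    if (!seen.Add(values))
        throw new Exception("Rows are not unique");

    rows.Add(values);
}

Is something along these lines the right direction, or is there a better approach for this amount of data?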