
I want to compare two CSV files and write the differences to a file. I currently use the code below to remove a row. Can I change this code so that it compares two CSV files, or is there a better way in C# to compare CSV files?

    List<string> lines = new List<string>();
    using (StreamReader reader = new StreamReader(System.IO.File.OpenRead(path)))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Only lines that contain the separator are kept at all
            if (line.Contains(csvseperator))
            {
                string[] split = line.Split(Convert.ToChar(scheidingsteken));

                if (split[selectedRow] == value)
                {
                    // Matching row: do nothing, so it is left out of the output
                }
                else
                {
                    line = string.Join(csvseperator, split);
                    lines.Add(line);
                }
            }
        }
    }

    // Overwrite the original file with the remaining lines
    using (StreamWriter writer = new StreamWriter(path, false))
    {
        foreach (string line in lines)
            writer.WriteLine(line);
    }
  • If you want to find added, deleted, and changed lines, have a look at edit distance: en.wikipedia.org/wiki/Edit_distance Commented Oct 11, 2017 at 12:48
  • I can't use that. Commented Oct 11, 2017 at 13:07
  • Why are you so sad? Why can't you use it? The simplest edit distance (the Levenshtein one) is easy to implement: en.wikipedia.org/wiki/Levenshtein_distance Commented Oct 11, 2017 at 13:09
  • You really shouldn't use empty if blocks in your code. Changing the condition solves this issue. Commented Oct 11, 2017 at 13:15
  • What do you want your program to output when two CSV files contain exactly the same data, but in a different order? Also, do records need to match 100%? Or is 1,Pete,2 equal to 1,"Pete",2? Commented Oct 11, 2017 at 13:30
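
The Levenshtein distance suggested in the comments above can be sketched in a few lines. This is a minimal illustration, not code from the question: the class name `EditDistance` and the method name `Levenshtein` are made up for this example, and it compares two strings character by character (for instance, two CSV lines). A distance of 0 means the lines are identical; small distances suggest a changed line rather than an added or deleted one.

```csharp
using System;

public static class EditDistance
{
    // Hypothetical helper: dynamic-programming Levenshtein distance.
    // Returns the minimum number of single-character insertions,
    // deletions, and substitutions needed to turn a into b.
    public static int Levenshtein(string a, string b)
    {
        // previous[j] = distance between the first i-1 chars of a
        // and the first j chars of b; current[j] is the same for i chars.
        int[] previous = new int[b.Length + 1];
        int[] current = new int[b.Length + 1];

        // Turning an empty prefix of a into j chars of b takes j insertions.
        for (int j = 0; j <= b.Length; j++)
            previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            // Turning i chars of a into an empty string takes i deletions.
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1,   // insertion
                             previous[j] + 1),     // deletion
                    previous[j - 1] + cost);       // substitution or match
            }

            // The row just computed becomes the previous row.
            int[] tmp = previous;
            previous = current;
            current = tmp;
        }

        return previous[b.Length];
    }
}
```

For example, `EditDistance.Levenshtein("kitten", "sitting")` returns 3.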

2 Answers

Here is another way to find differences between CSV files, using Cinchoo ETL, an open-source library.

Given the sample CSV files below:

sample1.csv

id,name
1,Tom
2,Mark
3,Angie

sample2.csv

id,name
1,Tom
2,Mark
4,Lu

METHOD 1:

With Cinchoo ETL, the code below finds the rows that differ between the two files, comparing all columns:

var input1 = new ChoCSVReader("sample1.csv").WithFirstLineHeader().ToArray();
var input2 = new ChoCSVReader("sample2.csv").WithFirstLineHeader().ToArray();

using (var output = new ChoCSVWriter("sampleDiff.csv").WithFirstLineHeader())
{
    output.Write(input1.OfType<ChoDynamicObject>().Except(input2.OfType<ChoDynamicObject>(), ChoDynamicObjectEqualityComparer.Default));
    output.Write(input2.OfType<ChoDynamicObject>().Except(input1.OfType<ChoDynamicObject>(), ChoDynamicObjectEqualityComparer.Default));
}

sampleDiff.csv

id,name
3,Angie
4,Lu

Sample fiddle: https://dotnetfiddle.net/nwLeJ2

METHOD 2:

If you want to compare rows by the id column only:

var input1 = new ChoCSVReader("sample1.csv").WithFirstLineHeader().ToArray();
var input2 = new ChoCSVReader("sample2.csv").WithFirstLineHeader().ToArray();

using (var output = new ChoCSVWriter("sampleDiff.csv").WithFirstLineHeader())
{
    output.Write(input1.OfType<ChoDynamicObject>().Except(input2.OfType<ChoDynamicObject>(), new ChoDynamicObjectEqualityComparer(new string[] { "id" })));
    output.Write(input2.OfType<ChoDynamicObject>().Except(input1.OfType<ChoDynamicObject>(), new ChoDynamicObjectEqualityComparer(new string[] { "id" })));
}

Sample fiddle: https://dotnetfiddle.net/t6mmJW


4 Comments

This is a great tool. Thank you. In my case I have a "master" CSV file and a "detail" CSV file. How can I do the same as above, where CSV 1 is the master and CSV 2 is the detail, to output a file saying record "3" has been "deleted" and record "4" is new? Obviously with a new column to show this status. Could you please add an example to your answer? Oh, and my CSVs have a unique ID column like your samples. Appreciated.
@Chinchoo Where does the result file get stored to?
It can be routed to a file, a stream, etc.
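
The master/detail request in the comments above (a status column marking record 3 as deleted and record 4 as new) can be sketched without any library. This is an illustrative sketch only and does not use Cinchoo ETL: the class `CsvStatusDiff` and its methods are hypothetical names, the ID is assumed to be unique and to sit in the first column, and lines are split naively on the separator, so quoted fields containing the separator are not handled. The extra "changed" status is an addition beyond what the comment asked for.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CsvStatusDiff
{
    // Hypothetical helper: compares master and detail CSV lines (header first),
    // keyed on the first column, and returns lines with an added "status" column.
    public static List<string> Compare(IEnumerable<string> master,
                                       IEnumerable<string> detail,
                                       char separator = ',')
    {
        List<string> masterLines = master.Where(l => l.Length > 0).ToList();
        List<string> detailLines = detail.Where(l => l.Length > 0).ToList();

        // Index the data rows of each file by ID (assumes unique IDs).
        Dictionary<string, string> masterRows = IndexById(masterLines.Skip(1), separator);
        Dictionary<string, string> detailRows = IndexById(detailLines.Skip(1), separator);

        var result = new List<string>();
        if (masterLines.Count > 0)
            result.Add(masterLines[0] + separator + "status");

        // Rows in the master that are missing from, or differ in, the detail.
        foreach (string line in masterLines.Skip(1))
        {
            string id = line.Split(separator)[0];
            string other;
            if (!detailRows.TryGetValue(id, out other))
                result.Add(line + separator + "deleted");
            else if (other != line)
                result.Add(other + separator + "changed");
        }

        // Rows that exist only in the detail.
        foreach (string line in detailLines.Skip(1))
        {
            string id = line.Split(separator)[0];
            if (!masterRows.ContainsKey(id))
                result.Add(line + separator + "new");
        }

        return result;
    }

    private static Dictionary<string, string> IndexById(IEnumerable<string> lines, char separator)
    {
        var rows = new Dictionary<string, string>();
        foreach (string line in lines)
            rows[line.Split(separator)[0]] = line;
        return rows;
    }
}
```

With sample1.csv as the master and sample2.csv as the detail, the result is the three lines `id,name,status`, `3,Angie,deleted`, and `4,Lu,new`. Passing `File.ReadLines(...)` for both inputs and handing the result to `File.WriteAllLines(...)` would route it to a file.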

If you only want to compare one column you can use this code:

    // Requires: using System; using System.Collections.Generic; using System.IO;
    List<string> lines = new List<string>();
    List<string> lines2 = new List<string>();

    // The using blocks close both readers even if an exception is thrown
    using (StreamReader reader = new StreamReader(System.IO.File.OpenRead(pad)))
    using (StreamReader read = new StreamReader(System.IO.File.OpenRead(pad2)))
    {
        string line;
        string line2;

        // Zero-based indexes of the cells you want to compare
        int comp1 = 1;
        int comp2 = 1;

        // Reads both files in lockstep and stops at the end of the shorter one
        while ((line = reader.ReadLine()) != null && (line2 = read.ReadLine()) != null)
        {
            // Skip line pairs where either line lacks the separator
            if (line.Contains(seperator) && line2.Contains(seperator))
            {
                string[] split = line.Split(Convert.ToChar(seperator));
                string[] split2 = line2.Split(Convert.ToChar(seperator));

                if (split[comp1] != split2[comp2])
                {
                    // It is not the same
                }
                else
                {
                    // It is the same
                }
            }
        }
    }

2 Comments

This only checks the 2nd column of each line, and ignores lines if one CSV contains more lines than the other.
How can I fix this?
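
One possible way to address the limitation raised in the comments above is to compare whole lines as sets instead of reading the two files in lockstep. The sketch below is an assumption-laden illustration rather than a fix endorsed by the answer's author: `CsvLineDiff`, `Diff`, and `DiffFiles` are hypothetical names, lines are compared as exact strings (so `1,Pete,2` and `1,"Pete",2` count as different), and row order is ignored. Lines found only in the first file are prefixed with `-`, lines found only in the second with `+`.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvLineDiff
{
    // Hypothetical helper: returns the lines that occur in only one input.
    // Works for inputs of different lengths because no lockstep reading is used.
    public static List<string> Diff(IEnumerable<string> first, IEnumerable<string> second)
    {
        List<string> a = first.ToList();
        List<string> b = second.ToList();

        // Hash sets give fast "is this line in the other file?" lookups.
        var inA = new HashSet<string>(a);
        var inB = new HashSet<string>(b);

        var result = new List<string>();
        result.AddRange(a.Where(l => !inB.Contains(l)).Select(l => "-" + l));
        result.AddRange(b.Where(l => !inA.Contains(l)).Select(l => "+" + l));
        return result;
    }

    // Reads both files, computes the difference, and writes it to outputPath.
    public static void DiffFiles(string path1, string path2, string outputPath)
    {
        File.WriteAllLines(outputPath, Diff(File.ReadLines(path1), File.ReadLines(path2)));
    }
}
```

Calling `CsvLineDiff.DiffFiles(pad, pad2, "diff.txt")` (the output file name is just an example) would write every line that appears in only one of the two files, including any extra lines at the end of the longer file.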
