0

I know there is more similar question but I was not able to find the answer to mine. I have two CSV files. Both files contain image metadata for the same images, however, the first file image IDs are outdated. So I need to take the IDs from the second file and replace outdated IDs with new ones. I was thinking to compare image Longitude, Latitude, and Altitude rows values, and where it matches in both files I take image id from the second file. The IDs would be used in the new object. And the sequence of lines in files is different and the first file contains more lines than the second one.

The files structure looks as follows:

First file:

ImgID,Longitude,Latitude,Altitude
01,44.7282372307,27.5786807185,14.1536407471
02,44.7287939869,27.5777060219,13.2340240479
03,44.7254687824,27.582636255,16.5887145996
04,44.7254294913,27.5826908925,16.5794525146
05,44.728785278,27.5777185252,13.2553100586
06,44.7282279311,27.5786933339,14.1576690674
07,44.7253847039,27.5827526969,16.6026000977
08,44.7287777782,27.5777295052,13.2788238525
09,44.7282196988,27.5787045314,14.1649169922
10,44.7253397041,27.5828151049,16.6300048828
11,44.728769439,27.5777417846,13.3072509766

Second file:

ImgID,Longitude,Latitude,Altitude
5702,44.7282372307,27.5786807185,14.1536407471
5703,44.7287939869,27.5777060219,13.2340240479
5704,44.7254687824,27.582636255,16.5887145996
5705,44.7254294913,27.5826908925,16.5794525146
5706,44.728785278,27.5777185252,13.2553100586
5707,44.7282279311,27.5786933339,14.1576690674

How this can be done in C#? Is there is some handy library to work with?

8
  • Do you have any code to go along with this scenario or are you in a planning phase? Commented Aug 6, 2020 at 18:49
  • Your question is very vague. What exactly don't you know how to do? As for libraries, I would just use System.IO.StreamReader to read files and string.Split(',') to get separated values. Commented Aug 6, 2020 at 18:51
  • How big are your files, generally speaking? Can they be GB+, or are they small enough for you to afford to read them all to memory? Commented Aug 6, 2020 at 19:09
  • They can be read in memory Commented Aug 6, 2020 at 19:20
  • Regarding a good library for CSV I recommend CSVHelper (which you can add to your project from NuGet). Regarding the question itself there are ambiguous things you need to clarify. 1) Do you want to simply replace IDs in old data with IDs in new data where other properties match? 2) If not, do you want to combine both, and replace IDs in old data? 3) What if there are coordinates in old data that has no matching coordinates in new data? In that case do you leave the old ID as it is? Commented Aug 6, 2020 at 19:22

2 Answers 2

2

I would use the CSVHelper library for CSV read/write as it is a complete nice library. For this, you should declare a class to hold your data, and its property names must match your CSV file's column names.

public class ImageData
{
    public int ImgID { get; set; }
    public double Longitude { get; set; }
    public double Latitude { get; set; }
    public double Altitude { get; set; }
}

Then to see if two lines are equal, what you need to do is see if each property in each line in one file matches the other. You could do this by simply comparing properties, but I'd rather write a comparer for this, like so:

public class ImageDataComparer : IEqualityComparer<ImageData>
{
    public bool Equals(ImageData x, ImageData y)
    {
        return (x.Altitude == y.Altitude && x.Latitude == y.Latitude && x.Longitude == y.Longitude);
    }

    public int GetHashCode(ImageData obj)
    {
        unchecked
        {
            int hash = (int)2166136261;
            hash = (hash * 16777619) ^ obj.Altitude.GetHashCode();
            hash = (hash * 16777619) ^ obj.Latitude.GetHashCode();
            hash = (hash * 16777619) ^ obj.Longitude.GetHashCode();
            return hash;
        }
    }
}

Simple explanation is that we override the Equals() method and dictate that two instances of ImageData class are equal if the three property values are matching. I will show the usage in a bit.

The CSV read/write part is pretty easy (the library's help page has some good examples and tips, please read it). I can write two methods for reading and writing like so:

public static List<ImageData> ReadCSVData(string filePath)
{
    List<ImageData> records;
    using (var reader = new StreamReader(filePath))
    {
        using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
        {
            csv.Configuration.HasHeaderRecord = true;
            records = csv.GetRecords<ImageData>().ToList();
        }
    }
    return records;
}

public static void WriteCSVData(string filePath, List<ImageData> records)
{
    using (var writer = new StreamWriter(filePath))
    {
        using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
        {
            csv.WriteRecords(records);
        }
    }
}

You can actually write generic <T> read/write methods so the two methods are usable with different classes, if that's something useful for you.

Next is the crucial part. First, read the two files to memory using the methods we just defined.

var oldData = ReadCSVData(Path.Combine(Directory.GetCurrentDirectory(), "OldFile.csv"));
var newData = ReadCSVData(Path.Combine(Directory.GetCurrentDirectory(), "NewFile.csv"));

Now, I can go through each line in the 'old' data, and see if there's a corresponding record in 'new' data. If so, I grab the ID from the new data and replace the ID of old data with it. Notice the usage of the comparer we wrote.

foreach (var line in oldData)
{
    var replace = newData.FirstOrDefault(x => new ImageDataComparer().Equals(x, line));
    if (replace != null && replace.ImgID != line.ImgID)
    {
        line.ImgID = replace.ImgID;
    }
}

Next, simply overwrite the old data file.

WriteCSVData(Path.Combine(Directory.GetCurrentDirectory(), "OldFile.csv"), oldData);

Results

I'm using a simplified version of your data to easily verify our results.

Old Data

ImgID,Longitude,Latitude,Altitude
1,1,2,3
2,2,3,4
3,3,4,5
4,4,5,6
5,5,6,7
6,6,7,8
7,7,8,9
8,8,9,10
9,9,10,11
10,10,11,12
11,11,12,13

New Data

ImgID,Longitude,Latitude,Altitude
5702,1,2,3
5703,2,3,4
5704,3,4,5
5705,4,5,6
5706,5,6,7
5707,6,7,8

Now our expected results should be that the first 6 lines of the old files should have the ids updated, and that's what we get:

Updated Old Data

ImgID,Longitude,Latitude,Altitude
5702,1,2,3
5703,2,3,4
5704,3,4,5
5705,4,5,6
5706,5,6,7
5707,6,7,8
7,7,8,9
8,8,9,10
9,9,10,11
10,10,11,12
11,11,12,13
Sign up to request clarification or add additional context in comments.

Comments

2

An alternate way to do it, if for some reason you didn't want to use the CSVHelper, is to write a method that compares two lines of data and determines if they're equal (by ignoring the first column data):

public static bool DataLinesAreEqual(string first, string second)
{
    if (first == null || second == null) return false;
    var xParts = first.Split(',');
    var yParts = second.Split(',');
    if (xParts.Length != 4 || yParts.Length != 4) return false;
    return xParts.Skip(1).SequenceEqual(yParts.Skip(1));
}

Then we can read all the lines from both files into arrays, and then we can update our first file lines with those from the second file if our method says they're equal:

var csvPath1 = @"c:\temp\csvData1.csv";
var csvPath2 = @"c:\temp\csvData2.csv";

// Read lines from both files
var first = File.ReadAllLines(csvPath1);
var second = File.ReadAllLines(csvPath2);

// Select the updated line where necessary
var updated = first.Select(f => second.FirstOrDefault(s => DataLinesAreEqual(f, s)) ?? f);

// Write the updated result back to the first file
File.WriteAllLines(csvPath1, updated);

2 Comments

Thank you very much for your help. I used the way with CSVHelper library.
Good, I think that's the best way to go... There are a lot of potential problems dealing with CSV files. Better not to reinvent the wheel!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.