I need to build a method that enhances one CSV file with values from another. The method needs to:

  • take the "original" csv file
  • for each row, take the value in column 0 and look for a matching record in column 0 of the "enhancement" csv file
  • if there is a match, overwrite column 1 of that row in the "original" file with column 1 of the matching row in the "enhancement" file

I'm trying the pattern below, which seems workable, but it is so slow that I can't even verify it. File size should not be an issue, because one file is 1 MB and the other is 2 MB, so I must be making some wrong assumptions about how to do this efficiently. What would be a better way of doing this?

public static string[] LoadReadyCsv()
{
    string[] scr = System.IO.File.ReadAllLines(@Path...CsvScr);
    string[] aws = System.IO.File.ReadAllLines(@Path...CsvAws);
    Regex CSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");

    foreach (var s in scr)
    {
        string[] fieldsScr = CSVParser.Split(s);

        foreach (var a in aws)
        {
            string[] fieldsAws = CSVParser.Split(a);

            if (fieldsScr[0] == fieldsAws[0])
            {
                fieldsScr[1] = fieldsAws[1];
            }
        }
    }

    return scr;
}

EDIT: I've added an example below, as requested.

"Original file"

ean, skunum, prodname
111, empty, bread
222, empty, cheese

"Enhancement file"

ean, skunum, prodname
111, 555, foo
333, 444, foo

New "Original file"

ean,skunum,prodname
111, 555, bread
222, empty, cheese
  • What do you mean by a matching record? What is the condition for overwriting: that a value exists, or that it is greater? Commented Nov 28, 2015 at 11:34
  • "Enhance" basically means "overwrite" this record under any condition. Commented Nov 28, 2015 at 11:35
  • Is there a reason for that complex regex? Help me understand what you are trying to achieve with it. Commented Nov 28, 2015 at 11:36
  • Can you give us a sample csv, an enhancer csv, and the expected output? A very short one will also help us understand. Commented Nov 28, 2015 at 11:37
  • Test it on smaller files, like the ones shown in your example. Commented Nov 28, 2015 at 13:08

1 Answer

You can read the csv using OleDb and load it into a DataTable. Then you can modify the table and update it, which will save the results back to the file. Use the code below:

using System;
using System.Data;
using System.Data.OleDb;
using System.IO;
using System.Windows.Forms;

public class CSVReader
{
    // Loads a csv file into a DataSet through the Jet text driver.
    public DataSet ReadCSVFile(string fullPath, bool headerRow)
    {
        // The text driver treats the folder as the data source and the file as a table.
        string path = fullPath.Substring(0, fullPath.LastIndexOf("\\") + 1);
        string filename = fullPath.Substring(fullPath.LastIndexOf("\\") + 1);
        DataSet ds = new DataSet();

        try
        {
            if (File.Exists(fullPath))
            {
                string ConStr = string.Format("Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=\"Text;HDR={1};FMT=Delimited\"", path, headerRow ? "Yes" : "No");
                // Brackets keep file names containing dots or spaces valid as table names.
                string SQL = string.Format("SELECT * FROM [{0}]", filename);
                OleDbDataAdapter adapter = new OleDbDataAdapter(SQL, ConStr);
                adapter.Fill(ds, "TextFile");
                ds.Tables[0].TableName = "Table1";

                // Rename columns only when a table was actually loaded.
                foreach (DataColumn col in ds.Tables["Table1"].Columns)
                {
                    col.ColumnName = col.ColumnName.Replace(" ", "_");
                }
            }
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.Message);
        }
        return ds;
    }
}

To update the original DataTable from the enhancement DataTable, use a LINQ join:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            DataColumn col = null;

            DataTable original = new DataTable();
            col = original.Columns.Add("ean", typeof(int));
            col.AllowDBNull = true;
            col = original.Columns.Add("skunum", typeof(int));
            col.AllowDBNull = true;
            col = original.Columns.Add("prodname", typeof(string));
            col.AllowDBNull = true;

            original.Rows.Add(new object[] {111, null, "bread"});
            original.Rows.Add(new object[] {222, null, "cheese"});

            DataTable enhancement = new DataTable();
            col = enhancement.Columns.Add("ean", typeof(int));
            col.AllowDBNull = true;
            col = enhancement.Columns.Add("skunum", typeof(int));
            col.AllowDBNull = true;
            col = enhancement.Columns.Add("prodname", typeof(string));
            col.AllowDBNull = true;

            enhancement.Rows.Add(new object[] {111, 555, "foo"});
            enhancement.Rows.Add(new object[] {333, 444, "foo"});

            var joinedObject = (from o in original.AsEnumerable()
                                join e in enhancement.AsEnumerable() on o.Field<int>("ean") equals e.Field<int>("ean")
                                select new { original = o, enhancement = e }).ToList();

            // Copy only skunum: the expected output in the question keeps the original prodname.
            foreach (var row in joinedObject)
            {
                row.original["skunum"] = row.enhancement["skunum"];
            }
        }
    }
}
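
For comparison, here is a minimal sketch of the same merge without OleDb. It is not part of the code above, and the names `CsvEnhancer`, `Enhance` and `SplitLine` are made up for illustration. The nested loops in the question re-split every enhancement line for every original line, and they change a local `fieldsScr` array that is never written back to `scr`. The sketch indexes the enhancement file once in a `Dictionary` and rebuilds each matched line. It assumes plain comma-separated values with no quoted fields that contain commas, so swap in a proper CSV parser if that does not hold for your data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvEnhancer
{
    // Naive split on commas, trimming spaces around each field.
    // Assumption: no quoted fields containing commas.
    private static string[] SplitLine(string line)
    {
        return line.Split(',').Select(f => f.Trim()).ToArray();
    }

    public static string[] Enhance(string[] original, string[] enhancement)
    {
        // Index the enhancement rows once: column 0 -> column 1.
        var lookup = new Dictionary<string, string>();
        foreach (string line in enhancement)
        {
            string[] fields = SplitLine(line);
            if (fields.Length > 1)
            {
                lookup[fields[0]] = fields[1]; // on duplicate keys the last row wins
            }
        }

        var result = new string[original.Length];
        for (int i = 0; i < original.Length; i++)
        {
            string[] fields = SplitLine(original[i]);
            string replacement;
            if (fields.Length > 1 && lookup.TryGetValue(fields[0], out replacement))
            {
                fields[1] = replacement;
                result[i] = string.Join(", ", fields); // rebuild the matched line
            }
            else
            {
                result[i] = original[i]; // unmatched lines are kept verbatim
            }
        }
        return result;
    }

    public static void Main()
    {
        string[] original = { "ean, skunum, prodname", "111, empty, bread", "222, empty, cheese" };
        string[] enhancement = { "ean, skunum, prodname", "111, 555, foo", "333, 444, foo" };

        foreach (string line in Enhance(original, enhancement))
        {
            Console.WriteLine(line);
        }
    }
}
```

With real files, the two arrays would come from `File.ReadAllLines` and the result could be saved with `File.WriteAllLines`. Each file is then split only once, so the work grows with the total number of lines rather than with the product of the two line counts.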
3 Comments

Thank you. I upvoted your answer for the suggestion of using OleDb, which I need to try when I get some time. In the meantime the original solution works for me. It is slow, but so far I can afford that; I just switched from Regex to the VisualBasic TextFieldParser library.
I updated the code to show how to use a LINQ join to modify the original table. You can then use the datatable update method to save back to the original csv file.
I finally got around to trying OleDb. It worked very well and sped up the query several times over. DataSets are now becoming my good friend for data manipulation. Thank you, jdweng!
