
I am trying to merge multiple .csv files from a directory into one .csv file using CsvHelper. The directory contains 50 .csv files with two different file structures, one with 7 columns and one with 6. Every file has the exact same first 5 headers, but the last two columns change depending on the file.

Example of CSV file format 1: [image not shown]

Example of CSV file format 2: [image not shown]

Every file in the directory has one of these two structures, with different data in the columns. The new output file will contain data from all the columns except Action, Code and Error Message. If I use only files with the structure of example 1, the file comes together perfectly. However, if I include files with both structures and try to use 'ErrorIPAddress' from example 2 in my new file, I get the following error:

An unhandled exception of type 'CsvHelper.TypeConversion.CsvTypeConverterException' occurred in CsvHelper.dll

On this line: `IEnumerable dataRecord = reader.GetRecords().ToList();`

My question is: how do I use a column from one file that is not in the other? I have tried mapping it with Map(m => m.ErrorIPAddress).Index(5); and I believe this line is causing the issue, because the error goes away if I comment it out. Obviously I then don't get the data I need into the new .csv. If I try to map by name with Map(m => m.ErrorIPAddress).Name("ErrorIPAddress"); I get an error message saying that ErrorIPAddress is not in the .csv file, which is expected for the files that do not have that column.

Output .csv format: [image not shown]

The final column will be generated by the ErrorIPAddress column in format 2.

  • I suggest you separate the problem of reading different CSV formats from the problem of merging the data sets. For reading the CSV files, create two methods, each responsible for reading only one type of CSV. (Merging should not be part of these two methods.) For each CSV file, try reading it with the first method; if that fails, try the second. Once you have read the relevant CSV files, use another method (which you will have to implement) to merge the data sets according to your requirements. Commented May 7, 2017 at 12:29
  • What column format should the resulting merged CSV have? Commented May 7, 2017 at 12:40
  • @robaudas I have added it to the question. The column will be generated by a simple count on ErrorIPAddress in example 2. Commented May 7, 2017 at 12:51
  • Ok I'm confused, you aren't merging CSV files. You are reading two similar CSV files and creating a report. Are you trying to write a single block of code that will handle multiple file formats and that is what is causing you problems? Commented May 7, 2017 at 12:55
  • @robaudas Ah okay, apologies. I'm taking two similar .csv files and taking info from both to make one .csv file. It is similar to a report, condensing the data. Commented May 7, 2017 at 12:58

2 Answers

2

I'm assuming you are using a single class definition with all the fields that looks something like this:

public class StudentWebAccess
{
    public int StudentID { get; set; }
    public string Gender { get; set; }
    public int Grade { get; set; }        
    public int IPAddress { get; set; } // Also ErrorIPAddress?
    public DateTime DateTime { get; set; }
    public string Action { get; set; }
    public string Code { get; set; } // Also ErrorMessage?
}

So in order to read file format 2 you are using CsvClassMap but aren't matching the properties and field names correctly. It should look something like this:

public class CsvFile2Map : CsvClassMap<StudentWebAccess>
{
    public CsvFile2Map()
    {            
        Map(m => m.IPAddress).Name("ErrorIPAddress");
        Map(m => m.Code).Name("ErrorMessage");
    }
}

If your class file uses ErrorIPAddress instead of IPAddress you have to reverse the mapping.

Map(m => m.ErrorIPAddress).Name("IPAddress");
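
For completeness, here is a minimal sketch of how such a map might be registered before reading. It assumes a CsvHelper 2.x release (the generation that has the CsvClassMap base class used above) and relies on the StudentWebAccess and CsvFile2Map types shown in this answer. The Format2Reader class name is hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using CsvHelper;

// Hypothetical helper; assumes CsvHelper 2.x and the StudentWebAccess /
// CsvFile2Map types defined earlier in this answer.
public static class Format2Reader
{
    // Reads one format 2 file into StudentWebAccess records through CsvFile2Map.
    public static List<StudentWebAccess> Read(string path)
    {
        using (var textReader = new StreamReader(path))
        using (var csv = new CsvReader(textReader))
        {
            // Register the map before reading so header names are resolved through it.
            csv.Configuration.RegisterClassMap<CsvFile2Map>();

            // ToList() forces the read while the file is still open.
            return csv.GetRecords<StudentWebAccess>().ToList();
        }
    }
}
```

As far as I know, once a class map is registered only the members it maps are populated, so the remaining columns would need their own Map calls in CsvFile2Map. Later CsvHelper versions renamed the base class and changed the CsvReader constructor, so this sketch would need adjusting there.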

4 Comments

This was more the answer I was looking for. However, can I ask some questions and make a few comments? Firstly, Code in file format 1 isn't the same as Error Message. I have my class dataRecord for the headers contained in format 1, very similar to what you've provided. I also have a custom map taking the values I want from the .csv file. Does this mean I need to create something similar for format 2, with a new class containing all the fields in format 2 and a new map similar to the above?
A point to note as well: I need to keep the IP addresses from both files separate, as they are used to calculate two different columns in the new file. The IP address from format 1 will be for requests and the IP address from format 2 will be for errors.
I feel like I answered your question. But to address your overall solution, I would just have 2 separate class definitions (one for each file), read the files into 2 separate collections, and then execute the logic to create the report using 2 collections instead of one. This avoids mapping altogether but assumes the scope of your application is solely to generate the report.
Marked the answer as correct, as it does indeed satisfy what I had posted. I appreciate your comments helping me to understand! Thank you.
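
The two-collection approach from the comments above could be sketched roughly as follows. This is only an illustration under stated assumptions: the StudentID and IPAddress column names come from the class guessed in the answer, the report layout (StudentID,Requests,Errors) is made up because the expected output image is not available, and the comma split is naive (no quoted fields). RequestRow, ErrorRow and ReportBuilder are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical record types, one per file format, holding only what the report needs.
public class RequestRow { public string StudentID; public string IPAddress; }
public class ErrorRow { public string StudentID; public string ErrorIPAddress; }

public static class ReportBuilder
{
    // Reads one file and adds its rows to the collection that matches its header.
    public static void ReadFile(string path, List<RequestRow> requests, List<ErrorRow> errors)
    {
        string[] lines = File.ReadAllLines(path);
        if (lines.Length == 0) return;

        // Naive split: assumes no quoted fields or embedded commas.
        string[] header = lines[0].Split(',').Select(h => h.Trim()).ToArray();
        int id = Array.IndexOf(header, "StudentID");           // assumed column name
        int errorIp = Array.IndexOf(header, "ErrorIPAddress"); // present only in format 2
        int ip = Array.IndexOf(header, "IPAddress");           // assumed column name
        if (id < 0) return; // not one of the expected formats

        foreach (string line in lines.Skip(1))
        {
            string[] f = line.Split(',');
            if (f.Length < header.Length) continue; // skip blank or short rows
            if (errorIp >= 0)
                errors.Add(new ErrorRow { StudentID = f[id], ErrorIPAddress = f[errorIp] });
            else if (ip >= 0)
                requests.Add(new RequestRow { StudentID = f[id], IPAddress = f[ip] });
        }
    }

    // Builds one output line per student with separate request and error counts.
    public static List<string> BuildReport(List<RequestRow> requests, List<ErrorRow> errors)
    {
        var requestCounts = requests.GroupBy(r => r.StudentID).ToDictionary(g => g.Key, g => g.Count());
        var errorCounts = errors.GroupBy(e => e.StudentID).ToDictionary(g => g.Key, g => g.Count());

        var report = new List<string> { "StudentID,Requests,Errors" };
        var students = requestCounts.Keys.Union(errorCounts.Keys).OrderBy(s => s, StringComparer.Ordinal);
        foreach (string student in students)
        {
            int req, err;
            requestCounts.TryGetValue(student, out req); // stays 0 when the student has no requests
            errorCounts.TryGetValue(student, out err);   // stays 0 when the student has no errors
            report.Add(string.Format("{0},{1},{2}", student, req, err));
        }
        return report;
    }
}
```

Calling ReadFile for every file in the directory and then passing both lists to BuildReport keeps the request and error IP addresses separate, which is what the report needs. The same header check could also be used to decide which CsvHelper map to register for a file.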
1

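You do not need an external library. The code below reads each .csv file through OLEDB and merges the rows into a single DataTable, adding any columns that are not already present: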

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Globalization;
using System.Data;
using System.Data.OleDb;
using System.IO;


namespace ConsoleApplication1
{
    class Program
    {
        const string FOLDER = @"c:\temp\test";
        static void Main(string[] args)
        {
            CSVReader reader = new CSVReader();

            //table containing merged csv files
            DataTable dt = new DataTable();

            //get csv files one at a time
            foreach (string file in Directory.GetFiles(FOLDER, "*.csv"))
            {
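                //read csv file into a new dataset
                DataSet ds = reader.ReadCSVFile(file, true);
                //skip files that could not be read (ReadCSVFile returns an empty dataset on failure)
                if (ds.Tables.Count == 0) continue;
                //datatable containing new csv file
                DataTable dt1 = ds.Tables[0];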

                //add new columns to datatable dt if doesn't exist
                foreach(DataColumn col in dt1.Columns.Cast<DataColumn>())
                {
                    //test if column exists and add if it doesn't
                    if (!dt.Columns.Contains(col.ColumnName))
                    {
                        dt.Columns.Add(col.ColumnName, typeof(string));
                    }
                }

                //array of column names in new table
                string[] columnNames = dt1.Columns.Cast<DataColumn>().Select(x => x.ColumnName).ToArray();

                //copy row from dt1 into dt
                foreach(DataRow row in dt1.AsEnumerable())
                {
                    //add new row to table dt
                    DataRow newRow = dt.Rows.Add();

                    //add data from dt1 into dt
                    for(int i = 0; i < columnNames.Count(); i++)
                    {
                        newRow[columnNames[i]] = row[columnNames[i]];
                    }
                }
            }

        }
    }
    public class CSVReader
    {

        public DataSet ReadCSVFile(string fullPath, bool headerRow)
        {

            //split the full path into folder (the OLEDB data source) and file name (the table)
            string path = Path.GetDirectoryName(fullPath);
            string filename = Path.GetFileName(fullPath);
            DataSet ds = new DataSet();

            try
            {

                //read csv file using OLEDB Net Library
                if (File.Exists(fullPath))
                {
                    string ConStr = string.Format("Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=\"Text;HDR={1};FMT=Delimited\"", path, headerRow ? "Yes" : "No");
                    //brackets keep file names with spaces valid in the query
                    string SQL = string.Format("SELECT * FROM [{0}]", filename);
                    OleDbDataAdapter adapter = new OleDbDataAdapter(SQL, ConStr);
                    adapter.Fill(ds, "TextFile");
                    ds.Tables[0].TableName = "Table1";

                    //replace spaces in column names with underscore
                    //(kept inside the if so a missing file does not dereference a null table)
                    foreach (DataColumn col in ds.Tables["Table1"].Columns)
                    {
                        col.ColumnName = col.ColumnName.Replace(" ", "_");
                    }
                }
            }

            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
            return ds;
        }
    }
}

2 Comments

I don't want to just take a body of code; I would rather understand fully what it does. Can you give me a brief explanation?
Added comments to the code. The code reads the csv files using the OLEDB .NET library.
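
The code in this answer builds the merged DataTable but does not write it out. One possible way to turn the table into CSV text is sketched below; DataTableCsvWriter is a hypothetical helper name, and the quoting follows the usual CSV convention of doubling embedded quotes.

```csharp
using System;
using System.Data;
using System.Linq;
using System.Text;

// Hypothetical helper that serializes a DataTable to CSV text.
public static class DataTableCsvWriter
{
    // Quotes a field when it contains a comma, a quote, or a line break.
    static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0) return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    public static string ToCsv(DataTable table)
    {
        var sb = new StringBuilder();

        // Header line built from the column names.
        sb.AppendLine(string.Join(",", table.Columns.Cast<DataColumn>().Select(c => Escape(c.ColumnName))));

        foreach (DataRow row in table.Rows)
        {
            // DBNull (a column the row's source file did not have) becomes an empty field.
            sb.AppendLine(string.Join(",", row.ItemArray.Select(v => Escape((v is DBNull) ? "" : Convert.ToString(v)))));
        }
        return sb.ToString();
    }
}
```

At the end of Main this could be used as File.WriteAllText(outputPath, DataTableCsvWriter.ToCsv(dt)), where outputPath is a placeholder. Writing the result outside the source folder avoids picking up the merged file as input on the next run.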
