
I wrote the following (static) class to read files easily in C#. However, since it is a static class, I wonder whether there would be any gain in making it non-static (and therefore passing arguments such as the separator and the path as fields of the class).

Besides, the OrderedDictionary class may not be a good choice (it is not generic, but it does preserve the order of the columns), and I don't know whether there is a nice workaround for this.

using System;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.IO;

namespace MySolution.FileUtils
{
    /// <summary>
    /// Various helpers to read through the lines of a file.
    /// </summary>
    public static class LinesEnumerator
    {
        public static string ReadFirstLine(string path)
        {
            using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            using (StreamReader sr = new StreamReader(fs))
            {
                return sr.ReadLine();
            }
        }

        /// <summary>
        /// Enumerates the lines of a file.
        /// </summary>
        /// <param name="path">The path of the file.</param>
        /// <param name="maxLines">The maximum number of lines to read.</param>
        /// <returns>The lines of a file, as an IEnumerable</returns>
        public static IEnumerable<string> LinesReader(string path, int maxLines = Int32.MaxValue)
        {
            string line;
            int lineNumber = 0;
            using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            using (StreamReader sr = new StreamReader(fs))
            {
                while ((line = sr.ReadLine()) != null && lineNumber < maxLines)
                {
                    lineNumber++;
                    yield return line;
                }
            }
        }

        /// <summary>
        /// Reads the lines of a CSV file and enumerates them as ordered dictionaries keyed by the header row.
        /// </summary>
        /// <param name="path">Path to the CSV file</param>
        /// <param name="separator">Separator used by the CSV file</param>
        /// <returns>A collection of ordered dictionaries, corresponding to the lines of the file</returns>
        public static IEnumerable<OrderedDictionary> DictReaderCSV(string path, char separator)
        {
            using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            using (StreamReader sr = new StreamReader(fs))
            {
                string[] header = sr.ReadLine().Split(separator);
                string line = "";
                while ((line = sr.ReadLine()) != null)
                {
                    string[] splittedLine = line.Split(separator);
                    OrderedDictionary od = new OrderedDictionary();
                    for (int i = 0; i < header.Length; i++)
                        od.Add(header[i], splittedLine[i]);
                    yield return od;
                }
            }
        }
    }
}

4 Answers

Naming Items

You should avoid abbreviations like Dict in DictReaderCSV; spelling it out as Dictionary is cleaner.

Next, it's much more natural to use nouns for class names and verbs for methods. As such, I would name your class CSVDictionaryReader and your main method EnumerateEntries(). LinesEnumerator didn't give any hint that the class is also responsible for parsing CSV.

Unused code

ReadFirstLine doesn't seem to be used in the current code.

Streams

The StreamReader class already provides constructors that accept a file path, so there is no need to manually create a FileStream just to pass it to the StreamReader.

What would give much more flexibility, however, is for the class to allow the user to provide an already constructed Stream or TextReader. Perhaps the user wants to parse a 100 MB CSV as it's arriving over the wire, without writing it to disk.
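
For example, the path-based method could become a thin wrapper around a TextReader-based overload. A rough sketch only, reusing the question's method name and assuming it sits in the same class with the same usings:

public static IEnumerable<string> LinesReader(TextReader reader)
{
    // Works with any reader: a StreamReader over a file, a StringReader in a test,
    // or a reader wrapping a network stream.
    string line;
    while ((line = reader.ReadLine()) != null)
        yield return line;
}

public static IEnumerable<string> LinesReader(string path)
{
    using (var reader = new StreamReader(path))
        foreach (var line in LinesReader(reader))
            yield return line;
}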

Parsing CSV

CSV files allow for quotes to be used so that entries can contain the separator. A parser should be prepared to correctly deal with this.

Id,Name,Age
1,John,20
2,"Jack, the second", 21

If you need a classic CSV reader, there are plenty of good choices on NuGet; if you don't have special requirements, I would just use one of the common ones.

That said, I made a few changes to simplify the code and reduce duplication:

public static class CsvReader
{
    public static IEnumerable<string> GetLines(string path, int limit) => GetLines(path).Take(limit);

    public static IEnumerable<string> GetLines(string path)
    {
        using (var streamReader = new StreamReader(path))
        {
            string line;

            while ((line = streamReader.ReadLine()) != null)
                yield return line;
        }
    }

    public static IEnumerable<OrderedDictionary> GetCharSeparatedDictionaries(string path, char separator)
    {
        var lineEnumerator = GetLines(path).GetEnumerator();

        if(!lineEnumerator.MoveNext())
            yield break;

        var header = lineEnumerator.Current.Split(separator);

        if(header.Distinct().Count() != header.Length)
            // Bail out on duplicate headers; OrderedDictionary.Add would throw on a duplicate key later.
            yield break;

        while (lineEnumerator.MoveNext())
        {
            var splittedLine = lineEnumerator.Current.Split(separator);

            // Not sure why you would want a dictionary for each line (?)
            var orderedDictionary = new OrderedDictionary();

            // Assuming the file is valid and no line contains fewer fields than the header; if that assumption is wrong, add a check.
            for (var i = 0; i < header.Length; i++)
                orderedDictionary.Add(header[i], splittedLine[i]);

            yield return orderedDictionary;
        }
    }
}

I appreciate that you may have used File.Open to specify a particular sharing mode; if so, just put it back, unless there is a suitable overload of StreamReader.

There are a few weak points where exceptions could be thrown; I have highlighted some with comments and covered others with defensive code (an empty file when fetching the headers, duplicate headers). If you have already validated your input file, just remove the checks.

LINQ is a great code saver for the limit method: Take removes the duplication.

Name-wise, I removed CSV because your dictionary factory method takes any separator, not just a comma; speaking of which, I would also rename the class to something less specific.

If you want the class to be testable, you can inject the file reader so that you can mock it.
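
As a rough sketch of what I mean (CharSeparatedReader and the delegate are placeholder names, not from your code), the class could take a factory for the reader instead of opening the file itself:

public class CharSeparatedReader
{
    private readonly Func<TextReader> _openReader;

    // In production pass () => new StreamReader(path);
    // in a unit test pass () => new StringReader("a;b\n1;2") - no file system involved.
    public CharSeparatedReader(Func<TextReader> openReader)
    {
        _openReader = openReader;
    }

    public IEnumerable<string> GetLines()
    {
        using (var reader = _openReader())
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }
}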

Good comment from @D. Jurcau about the separator in the content; one more reason to use a good existing CSV parser and then add your logic for the dictionary.

  • oh no, a for and an if without {} - your code review needs a code review ;-) Commented Aug 20, 2016 at 19:47
  • @t3chb0t less distractions from the real code :) I used to write VB.NET, I don't like parenthesis! But drop lines for debugging are a must. Commented Aug 20, 2016 at 19:51

I suggest you use only the LinesReader to read lines and replace ReadFirstLine with a call to FirstOrDefault():

return LinesReader(path).FirstOrDefault();

This way you avoid code repetition.


In DictReaderCSV you can do the same and reuse the reader:

var lines = LinesReader(path);
var header = lines.FirstOrDefault();
foreach(var line in lines.Skip(1))
{
    ...
}

Making it non-static and providing an abstraction would enable you to use it via Dependency Injection.
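
A minimal sketch of such an abstraction (ILinesReader and FileLinesReader are names I'm inventing for illustration):

public interface ILinesReader
{
    IEnumerable<string> ReadLines(string path);
}

public class FileLinesReader : ILinesReader
{
    public IEnumerable<string> ReadLines(string path)
    {
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }
}

Consumers then depend on ILinesReader, so tests can supply a fake that yields canned lines while the DI container registers FileLinesReader as the real implementation.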

  • I would avoid calling FirstOrDefault and then Skip; this will enumerate the collection twice and therefore open the file twice and double up the creation of the FileStream/StreamReader instances. Commented Aug 20, 2016 at 19:28
  • @Uno oh, you're absolutely right, I meant to add .ToList() in the first line but somehow forgot to do it. Commented Aug 20, 2016 at 19:30
  • That would work, but it might defeat the purpose of using an iterator to read line by line as it will load the whole file into memory. Commented Aug 20, 2016 at 19:33
  • @Uno as the OP didn't say anything about the size of the files I assume they are digestible and only a few kilobytes or megabytes in size - in this case the YAGNI principle applies ;-) or premature optimization is the root of all evil. Commented Aug 20, 2016 at 19:35
  • True, but at this point I would remove the iterator and ToList() in the LinesReader as it will save the creation of the state machine. (Assuming LinesReader is not used somewhere else with a different intent, but that would be odd.) Commented Aug 20, 2016 at 19:40

I don't like the overhead of

IEnumerable<OrderedDictionary>

You are repeating the column names for every row, and there is no check for a row's field count not matching the header's.

Consider

IEnumerable<IEnumerable<string>> 

Or

IEnumerable<string[]>

Somehow report it if a line does not match the header.

Split accepts a char[]; might as well support that too.
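
Putting those points together, a sketch of what I mean (ReadRows and the use of InvalidDataException are just illustrative choices on my part):

public static IEnumerable<string[]> ReadRows(string path, params char[] separators)
{
    using (var reader = new StreamReader(path))
    {
        string headerLine = reader.ReadLine();
        if (headerLine == null)
            yield break; // empty file

        string[] header = headerLine.Split(separators);
        yield return header; // the column names appear only once, as the first row

        string line;
        int lineNumber = 1;
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            string[] fields = line.Split(separators);
            if (fields.Length != header.Length)
                throw new InvalidDataException(
                    $"Line {lineNumber}: expected {header.Length} fields but found {fields.Length}.");

            yield return fields;
        }
    }
}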
