I am working on an assignment where I build an array of strings read from a text file. I can't use the regex \W character class, because an apostrophe (') or hyphen (-) that is part of a word must stay in the word, and \W splits on both. However, I do need to split on everything else that is not a letter, including digits. So my strings should contain only a-z, A-Z, - and '.
The code I have is below. It gives me almost the correct output, but the array contains empty entries, apparently where it reaches the end of a line (\r\n). I can't figure out how to exclude those while keeping the split that I have. Any advice?
    try
    {
        using (StreamReader reader = new StreamReader("file.txt"))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] words = SplitWords(line.ToLower());
                foreach (string aString in words)
                {
                    Console.WriteLine(aString);
                }
            }
        }
    }
    catch (Exception e)
    {
        Console.WriteLine("The file could not be read:");
        Console.WriteLine(e.Message);
    }

    static string[] SplitWords(string lines)
    {
        return Regex.Split(lines, @"[^-'a-zA-Z]");
    }
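
One possible direction, offered as a sketch and not a definitive fix: StreamReader.ReadLine() already strips the line terminator, so the empty entries more likely come from Regex.Split itself. It returns an empty string between two adjacent delimiters (for example the comma and space in "stop, well"), at the start or end of a line that begins or ends with a delimiter, and for blank lines. The class and method names below (WordSplitter, SplitWordsNoEmpties, MatchWords) are hypothetical and not part of the original code. The first method keeps the same pattern and filters out empty results; the second inverts the idea and matches runs of allowed characters instead of splitting.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
static class WordSplitter
{
    // Same split pattern as SplitWords, then drop the empty strings that
    // Regex.Split produces for adjacent, leading, or trailing delimiters.
    public static string[] SplitWordsNoEmpties(string line)
    {
        return Regex.Split(line, @"[^-'a-zA-Z]")
                    .Where(s => s.Length > 0)
                    .ToArray();
    }

    // Alternative: match runs of letters, hyphens, and apostrophes directly,
    // so no empty entries are produced in the first place.
    public static string[] MatchWords(string line)
    {
        return Regex.Matches(line, @"[-'a-zA-Z]+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

With either method, an input such as "don't stop, well-known 42 times" should yield the four entries don't, stop, well-known, and times. Note that both variants still accept tokens made only of hyphens or apostrophes (for example a lone "-"), which may or may not be acceptable for the assignment.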