
I am working on an assignment where I generate an array of strings read from a text file. I can't use the regex \W class, because if a word contains an apostrophe (') or a hyphen (-), that character must stay part of the word, and \W splits on both. However, I do need to split on everything else that is not a letter, including digits. So my strings should contain only a-z, A-Z, - and '.

The code I have is below and it gives me almost the correct output, but I have empty cells in the array where it is reading the end of the line (or new line). I can't figure out how to exclude those (\n\r) while keeping the split that I have. Advice?

try
{
    using (StreamReader reader = new StreamReader("file.txt"))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] words = SplitWords(line.ToLower());
            foreach (string aString in words)
            {
                Console.WriteLine(aString);
            }
        }
    }
}
catch (Exception e)
{
    Console.WriteLine("The file could not be read:");
    Console.WriteLine(e.Message);
}

static string[] SplitWords(string lines)
{
    return Regex.Split(lines, @"[^-'a-zA-Z]");
}

2 Answers


Try this:

return Regex.Split(lines, @"[^-'a-zA-Z]")
                              .Where(x=>!string.IsNullOrWhiteSpace(x)).ToArray();

This uses string.IsNullOrWhiteSpace with LINQ's Where to copy only the non-empty elements into a new array.
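
To see where the empty entries come from, here is a small sketch. The sample line and the SplitRaw helper name are made up for illustration. ReadLine already strips the line terminator, so the empties most likely come from two separators in a row, a separator at the start or end of a line, or a blank line in the file.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Same pattern as in the question: split on anything that is not
// a letter, a hyphen, or an apostrophe. (SplitRaw is a hypothetical helper.)
static string[] SplitRaw(string line) => Regex.Split(line, @"[^-'a-zA-Z]");

// Same split, with the empty entries filtered out.
static string[] SplitWords(string line) =>
    SplitRaw(line).Where(s => !string.IsNullOrWhiteSpace(s)).ToArray();

string sample = "it's a well-known fact, right?"; // hypothetical sample line

// ", " is two separators in a row and "?" is a separator at the very end,
// so the raw split contains two empty strings.
Console.WriteLine(SplitRaw(sample).Length);              // 7
Console.WriteLine(SplitWords(sample).Length);            // 5
Console.WriteLine(string.Join("|", SplitWords(sample))); // it's|a|well-known|fact|right

// A blank line contains no separators, so Regex.Split returns the input
// itself: an array holding one empty string.
Console.WriteLine(SplitRaw("").Length);   // 1
Console.WriteLine(SplitWords("").Length); // 0
```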

1 Comment

Thanks! This gives me just what I wanted. Waiting on timer to mark this for my accepted answer. Thank you for the rapid response.

You could do this with a little LINQ. Use this to exclude any empty strings:

static string[] SplitWords(string lines)
{
    return Regex.Split(lines, @"[^-'a-zA-Z]")
                .Where(s => s.Length > 0)
                .ToArray();
}

Or this to exclude any strings consisting solely of whitespace:

static string[] SplitWords(string lines)
{
    return Regex.Split(lines, @"[^-'a-zA-Z]")
                .Where(s => !s.All(Char.IsWhiteSpace))
                .ToArray();
}
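
A different approach, not taken from the answers here, is to match the words directly instead of splitting on everything else, so that no empty strings are produced in the first place. This is only a sketch; the MatchWords name and the sample line are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Collect every run of letters, hyphens, and apostrophes.
// (MatchWords is a hypothetical helper name.)
static string[] MatchWords(string line) =>
    Regex.Matches(line, @"[-'a-zA-Z]+")
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

Console.WriteLine(string.Join("|", MatchWords("it's a well-known fact, right?")));
// it's|a|well-known|fact|right

Console.WriteLine(MatchWords("").Length); // 0
```

Like the split versions, this still treats a lone hyphen or apostrophe as a word, so that case would need extra handling if the assignment rules it out.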
