
I need to split a string into lines on word boundaries, so that each line has at most 25 characters. For example:

string ORIGINAL_TEXT = "Please write a program that breaks this text into small chucks. Each chunk should have a maximum length of 25 ";

The output should be:

"Please write a program",

"that breaks this text",

"into small chucks. Each",

"chunk should have a",

"maximum length of 25"

I tried using Substring, but it breaks words in the middle, like this:

"Please write a program th" - wrong

"Please write a program" - correct

"Please write a program" is only 22 characters. It could take 3 more characters and stay within 25, but that would break the word "that".

string[] splitSampArr = splitSamp.Split(',', '.', ';');
string[] myText = new string[splitSampArr.Length + 1];

int i = 0;
foreach (string splitSampArrVal in splitSampArr)
{
    if (splitSampArrVal.Length > 25)
    {
        myText[i] = splitSampArrVal.Substring(0, 25);
        i++;
    }
    myText[i] = splitSampArrVal;

    i++;
}
You split the string on . (and , and ;), but your expected output contains "into small chucks. Each", with a dot inside. (Commented Feb 6, 2016 at 13:23)
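The Substring idea from the question can still work if each cut backs off to the last space inside the 25-character window instead of cutting at exactly 25. A minimal sketch, assuming words are separated by plain spaces (' ') only; SubstringWrapper and Wrap are hypothetical names, and cutting a word that is longer than the limit is a choice the question does not specify.

```csharp
using System.Collections.Generic;

// Hypothetical helper: keeps the Substring approach, but cuts at the last space
// inside the window so that no word is split in half.
public static class SubstringWrapper
{
    public static List<string> Wrap(string text, int maxLength)
    {
        var lines = new List<string>();
        string rest = text.Trim();
        while (rest.Length > maxLength)
        {
            // Search backward from index maxLength. A space exactly at maxLength
            // means the first maxLength characters already end on a whole word.
            int cut = rest.LastIndexOf(' ', maxLength);
            if (cut <= 0)
            {
                // No space in the window: the word is longer than maxLength,
                // so this sketch cuts it hard (assumption).
                cut = maxLength;
            }
            lines.Add(rest.Substring(0, cut).TrimEnd());
            rest = rest.Substring(cut).TrimStart();
        }
        if (rest.Length > 0)
        {
            lines.Add(rest);
        }
        return lines;
    }
}
```

Called as SubstringWrapper.Wrap(ORIGINAL_TEXT, 25), it produces the five lines listed above.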

2 Answers


You can achieve that with:

@"(\b.{1,25})(?:\s+|$)"

See the regex demo

This regex captures into Group 1 from 1 to 25 characters other than a newline (.{1,25}), preceded by a word boundary (\b), so matching starts at a whole word. It then matches either one or more whitespace characters (\s+) or the end of the string ($). Because {1,25} is greedy, the engine first takes up to 25 characters and then backtracks until the next character is whitespace or the end of the string, so each chunk ends on a whole word.

See a code demo:

using System;
using System.Linq;
using System.Collections.Generic;
using System.Text.RegularExpressions;
public class Test
{
    public static void Main()
    {
        var str = "Please write a program that breaks this text into small chucks. Each chunk should have a maximum length of 25 ";
        var chunks = Regex.Matches(str, @"(\b.{1,25})(?:\s+|$)")
                 .Cast<Match>().Select(p => p.Groups[1].Value)
                 .ToList();
        Console.WriteLine(string.Join("\n", chunks));
    }
}

5 Comments

I see a problem with this regex. If I set the length to, say, 5, it only matches words up to 5 characters long. For example, "This is my invoice" is broken into only 2 lines, but I expected three.
@MukeshKumar You get 3 lines, so there is no problem.
@WiktorStribiżew, this works perfectly in the regex builder, but when I try it in C# with a larger number, I never get chunks longer than 127 characters. (I need 32K characters, but it breaks the text like this even with smaller numbers like 320.) var goodsChunks = Regex.Matches(application.Goods ?? "", @"(\b.{1,32000})(?:\s+|$)") .Cast<Match>().Select(p => p.Groups[1].Value) .ToList(); Any ideas?
@Mike I do not know what your input string looks like and what you expect to get.
Apologies, I think the problem is that my text has line breaks in it.
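The line-break explanation fits how . behaves in .NET: without RegexOptions.Singleline it does not match \n, so no chunk can span a line break. A minimal sketch, assuming it is acceptable to turn every line break into a space, collapses all whitespace runs first and then applies the first answer's pattern with the limit as a parameter (RegexChunker and Chunk are hypothetical names). Note that the pattern still finds no match inside a word longer than the limit, so such a word is silently dropped: with a limit of 5, "This is my invoice" yields only "This" and "is my".

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper around the first answer's pattern.
public static class RegexChunker
{
    public static List<string> Chunk(string text, int maxLength)
    {
        // Collapse every whitespace run (including "\r\n" and "\n") to one space,
        // because '.' does not match '\n' and would otherwise stop at each line break.
        string flattened = Regex.Replace(text ?? "", @"\s+", " ").Trim();

        // Same pattern as the answer, with the upper bound taken from maxLength.
        string pattern = @"(\b.{1," + maxLength + @"})(?:\s+|$)";

        return Regex.Matches(flattened, pattern)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToList();
    }
}
```

If the original line breaks must be preserved instead, one option is to split the text on line breaks first and chunk each line separately.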
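This answer splits the sentence on spaces and appends the words to a StringBuilder, starting a new line whenever the next word would push the current line past 25 characters: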
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace ConsoleApplication3
{
    class Program
    {
        static void Main(string[] args)
        {
            var sentence = "Please write a program that breaks this text into small chucks. Each chunk should have a maximum length of 25 ";
            StringBuilder sb = new StringBuilder();
            int count = 0;
            var words = sentence.Split(' ');
            foreach (var word in words)
            {
                if (count + word.Length > 25)
                {
                    sb.Append(Environment.NewLine);
                    count = 0;
                }
                sb.Append(word + " ");
                count += word.Length + 1;
            }
            Console.WriteLine(sb.ToString());
            Console.ReadKey();
        }
    }
}
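The same greedy loop can be packaged as a method that returns the lines instead of printing them. A sketch under these assumptions: WordWrapper and Wrap are hypothetical names, any whitespace separates words, and a word longer than maxLength is kept whole on its own line (unlike the regex in the first answer, which drops it). It also avoids the trailing space the loop above leaves on every line, and the blank first line it prints when the first word is longer than 25 characters.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: greedy word wrap that returns one string per line.
public static class WordWrapper
{
    public static List<string> Wrap(string text, int maxLength)
    {
        var lines = new List<string>();
        var current = new StringBuilder();

        // A null separator splits on any whitespace; RemoveEmptyEntries drops
        // the empty token produced by the trailing space in the sample text.
        string[] words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        foreach (string word in words)
        {
            // +1 for the space that would separate this word from the previous one.
            if (current.Length > 0 && current.Length + 1 + word.Length > maxLength)
            {
                lines.Add(current.ToString());
                current.Clear();
            }
            if (current.Length > 0)
            {
                current.Append(' ');
            }
            // A word longer than maxLength ends up alone on its own line.
            current.Append(word);
        }
        if (current.Length > 0)
        {
            lines.Add(current.ToString());
        }
        return lines;
    }
}
```

Calling WordWrapper.Wrap(sentence, 25) on the question's sentence returns the five lines listed in the question, and string.Join(Environment.NewLine, ...) turns them back into a single block of text.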
