10

I am attempting to build a string extension method that trims a string to a certain length without breaking a word. I wanted to check whether there is anything built into the framework, or a more clever method than mine. Here's mine so far (not thoroughly tested):

public static string SmartTrim(this string s, int length)
{
    StringBuilder result = new StringBuilder();

    if (length >= 0)
    {
        if (s.IndexOf(' ') > 0)
        {
            string[] words = s.Split(' ');
            int index = 0;

            while (index < words.Length - 1 && result.Length + words[index + 1].Length <= length)
            {
                result.Append(words[index]);
                result.Append(" ");
                index++;
            }

            if (result.Length > 0)
            {
                result.Remove(result.Length - 1, 1);
            }
        }
        else
        {
            result.Append(s.Substring(0, length));
        }
    }
    else
    {
        throw new ArgumentOutOfRangeException("length", "Value cannot be negative.");
    }

    return result.ToString();
}
4
  • 1
    I would not split. I would loop over the string searching for the next word break and stop if the position of the found break is after the given length; otherwise, add the word before it to the StringBuilder. To find the word before the found break, you will need to store the position of the previously found break (or zero). Does that make sense? Commented Aug 17, 2010 at 17:03
  • 1
    You may not care for your application, but keep in mind that the built-in Trim functions are actually checking for char.IsWhiteSpace, not just space. Commented Aug 17, 2010 at 17:11
  • @Marc - good note. I was questioning my wording while typing it. Commented Aug 17, 2010 at 18:25
  • See also stackoverflow.com/questions/1613896/… Commented Nov 22, 2010 at 16:02
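
The scanning idea from the first comment, combined with the note about char.IsWhiteSpace, might look roughly like the sketch below. It is a simplification: instead of appending words to a StringBuilder, it only remembers the last break that still fits. The method name SmartTrimByScan and the choice to return an empty string when no break fits are assumptions made for illustration.

```csharp
using System;

public static class StringScanExtensions
{
    // Hypothetical name; scans for word breaks instead of calling Split.
    public static string SmartTrimByScan(this string s, int length)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        if (s.Length <= length) return s;

        // Position of the last whitespace character at or before index `length`.
        // Index `length` is valid here because s.Length > length.
        int lastBreak = -1;
        for (int i = 0; i <= length; i++)
        {
            if (char.IsWhiteSpace(s[i]))
            {
                lastBreak = i;
            }
        }

        // Assumption: return an empty string when no word break fits.
        // TrimEnd drops trailing whitespace left by consecutive breaks.
        return lastBreak < 0 ? string.Empty : s.Substring(0, lastBreak).TrimEnd();
    }
}
```

For example, "foo bar baz".SmartTrimByScan(5) yields "foo", and a length of 7 yields "foo bar".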

7 Answers

14

I'd use string.LastIndexOf - at least if we only care about spaces. Then there's no need to create any intermediate strings...

As yet untested:

public static string SmartTrim(this string text, int length)
{
    if (text == null)
    {
        throw new ArgumentNullException("text");
    }
    if (length < 0)
    {
        throw new ArgumentOutOfRangeException();
    }
    if (text.Length <= length)
    {
        return text;
    }
    int lastSpaceBeforeMax = text.LastIndexOf(' ', length);
    if (lastSpaceBeforeMax == -1)
    {
        // Perhaps define a strategy here? Could return empty string,
        // or the original
        throw new ArgumentException("Unable to trim word");
    }
    return text.Substring(0, lastSpaceBeforeMax);        
}

Test code:

public class Test
{
    static void Main()
    {
        Console.WriteLine("'{0}'", "foo bar baz".SmartTrim(20));
        Console.WriteLine("'{0}'", "foo bar baz".SmartTrim(3));
        Console.WriteLine("'{0}'", "foo bar baz".SmartTrim(4));
        Console.WriteLine("'{0}'", "foo bar baz".SmartTrim(5));
        Console.WriteLine("'{0}'", "foo bar baz".SmartTrim(7));
    }
}

Results:

'foo bar baz'
'foo'
'foo'
'foo'
'foo bar'

2 Comments

So how do you refactor if the requirement is any word break, not just a space? Specifically the most common (where a word could break, but the character not have white-space around it) is the hyphen... Just curious.
@AllenG: If it's still in a small set, text.LastIndexOfAny(Delimiters) would be the best option.
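
A minimal sketch of the LastIndexOfAny variant mentioned in the reply above. The delimiter set (space and hyphen), the name SmartTrimAny, and the empty-string fallback are assumptions for illustration, not part of the original answer.

```csharp
using System;

public static class DelimiterTrimExtensions
{
    // Assumed delimiter set for illustration: space and hyphen.
    private static readonly char[] Delimiters = { ' ', '-' };

    // Hypothetical name; same shape as SmartTrim, but with several delimiters.
    public static string SmartTrimAny(this string text, int length)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        if (text.Length <= length) return text;

        // Search backwards from index `length` for any of the delimiters.
        int lastBreak = text.LastIndexOfAny(Delimiters, length);

        // Assumption: fall back to an empty string when nothing fits.
        return lastBreak == -1 ? string.Empty : text.Substring(0, lastBreak);
    }
}
```

Note that the delimiter itself is dropped, so "well-known fact".SmartTrimAny(6) gives "well" rather than "well-". Whether a trailing hyphen should be kept is a separate design decision.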
2

How about a regex-based solution? You will probably want to test it further and add some bounds checking, but this is what springs to mind:

using System;
using System.Text.RegularExpressions;

namespace Stackoverflow.Test
{
    static class Test
    {
        private static readonly Regex regWords = new Regex("\\w+", RegexOptions.Compiled);

        static void Main()
        {
            Console.WriteLine("The quick brown fox jumped over the lazy dog".SmartTrim(8));
            Console.WriteLine("The quick brown fox jumped over the lazy dog".SmartTrim(20));
            Console.WriteLine("Hello, I am attempting to build a string extension method to trim a string to a certain length but with not breaking a word. I wanted to check to see if there was anything built into the framework or a more clever method than mine".SmartTrim(100));
        }

        public static string SmartTrim(this string s, int length)
        {
            var matches = regWords.Matches(s);
            foreach (Match match in matches)
            {
                if (match.Index + match.Length > length)
                {
                    int ln = match.Index + match.Length > s.Length ? s.Length : match.Index + match.Length;
                    return s.Substring(0, ln);
                }
            }
            return s;
        }
    }
}


2

Try this out. It's null-safe, won't break if length is longer than the string, and involves less string manipulation.

Edit: Per recommendations, I've removed the intermediate string. I'll leave the answer up as it could be useful in cases where exceptions are not wanted.

public static string SmartTrim(this string s, int length)
{
    if(s == null || length < 0 || s.Length <= length)
        return s;

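    // Edit à la Jon Skeet. Removes unnecessary intermediate string. Thanks!
    // string temp = s.Length > length + 1 ? s.Remove(length+1) : s;
    // Start at index `length` (always valid here because s.Length > length):
    // starting at length + 1 throws when s.Length == length + 1 and can
    // return a result one character longer than requested.
    int lastSpace = s.LastIndexOf(' ', length);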
    return lastSpace < 0 ? string.Empty : s.Remove(lastSpace);
}

3 Comments

Not bad, but still creates one intermediate string in some cases :)
I think you can do this too: s.LastIndexOf(' ', length); And you don't have to do your string temp = ... line.
@mlsteeves: Agreed. @Jon's solution handles LastIndexOf better. I hadn't known about the other overload.
1
string strTemp = "How are you doing today";
int nLength = 12;
strTemp = strTemp.Substring(0, strTemp.Substring(0, nLength).LastIndexOf(' '));

I think that should do it. When I ran that, it ended up with "How are you".

So your function would be:

public static string SmartTrim(this string s, int length)
{
    return s.Substring(0, s.Substring(0, length).LastIndexOf(' '));
}

I would definitely add some exception handling though, such as making sure the integer length is no greater than the string length and not less than 0.

2 Comments

This will fail in various cases, e.g. if the length is longer than you need, or is one word of exactly the right length, or can't be successfully trimmed.
Yeah, you put that comment as I was making the edit. :) I figured I would leave the exception handling to him.
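
For reference, here is one way the failure cases listed in the first comment could be guarded while keeping the nested-Substring idea. This is only a sketch: the name SmartTrimGuarded and the decision to return an empty string when no trim is possible are assumptions, and other strategies (throwing, or returning the original) are equally defensible.

```csharp
using System;

public static class GuardedTrimExtensions
{
    // Hypothetical name; guarded variant of the nested-Substring approach.
    public static string SmartTrimGuarded(this string s, int length)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));

        // Case 1: the requested length is longer than needed.
        if (s.Length <= length) return s;

        // Case 2: a word ends exactly at the limit.
        if (s[length] == ' ') return s.Substring(0, length);

        int lastSpace = s.Substring(0, length).LastIndexOf(' ');

        // Case 3: no space in range, so the text cannot be trimmed on a word
        // boundary. Assumption: return an empty string in that case.
        return lastSpace < 0 ? string.Empty : s.Substring(0, lastSpace);
    }
}
```

With the sample string above, lengths of 11 and 12 both give "How are you", and a single long word with a short limit gives an empty string instead of an exception.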
1

Obligatory LINQ one liner, if you only care about whitespace as word boundary:

return new String(s.TakeWhile((ch,idx) => (idx < length) || (idx >= length && !Char.IsWhiteSpace(ch))).ToArray());


1

Use it like this:

var substring = source.GetSubstring(50, new string[] { " ", "." });

This method returns a substring that ends at the first of the given separators found at or after the requested length:

public static string GetSubstring(this string source, int length, params string[] options)
{
    if (string.IsNullOrWhiteSpace(source))
    {
        return string.Empty;
    }

    if (source.Length <= length)
    {
        return source;
    }

    var indices =
        options.Select(
            separator => source.IndexOf(separator, length, StringComparison.CurrentCultureIgnoreCase))
            .Where(index => index >= 0)
            .ToList();

    if (indices.Count > 0)
    {
        return source.Substring(0, indices.Min());
    }

    return source;
}


0

I'll toss in some LINQ goodness even though others have answered this adequately:

public string TrimString(string s, int maxLength)
{
    var pos = s.Select((c, idx) => new { Char = c, Pos = idx })
        .Where(item => char.IsWhiteSpace(item.Char) && item.Pos <= maxLength)
        .Select(item => item.Pos)
        .LastOrDefault(); // SingleOrDefault would throw if several spaces matched

    return pos > 0 ? s.Substring(0, pos) : s;
}

I left out the parameter checking that others have merely to accentuate the important code...
