6

Is there a common C/C++ library (or common technique) for taking one or more lines of input text and splitting the words into separate lines, where each line of output has a maximum width and words are not split across lines? Collapsing or preserving whitespace is fine either way. Punctuation must be preserved. A small, compact library is preferred.

I could easily spend an afternoon putting something together that works, but would like to know if there is something common out there so I don't reinvent the wheel. Bonus points if the input line can contain a format specifier to indicate an indentation level for the output lines.

Example input: "Shankle drumstick corned beef, chuck turkey chicken pork chop venison beef strip steak cow sausage. Tail short loin shoulder ball tip, jowl drumstick rump. Tail tongue ball tip meatloaf, bresaola short loin tri-tip fatback pork loin sirloin shank flank biltong. Venison short loin andouille."

Example output (target width = 60)

123456789012345678901234567890123456789012345678901234567890   Line added to show where 60 is
Shankle drumstick corned beef, chuck turkey chicken pork
chop venison beef strip steak cow sausage. Tail short loin
shoulder ball tip, jowl drumstick rump. Tail tongue ball tip
meatloaf, bresaola short loin tri-tip fatback pork loin
sirloin shank flank biltong. Venison short loin andouille.
6
  • I don't think there's anything in the standard library that does this, so you'll probably have to write your own code to do it (shouldn't be hard at all) (but if there is, I really want to know). Commented Jul 31, 2011 at 19:09
  • 1
    Also this will only work with fixed-width fonts without adding way more complexity. Commented Jul 31, 2011 at 19:11
  • Do you want to store the lines that were split in an array or just print it? Commented Jul 31, 2011 at 19:12
  • @Ram - It's just for printing. Commented Jul 31, 2011 at 19:18
  • 2
    example with width=50 doesn't make sense. I guess it's 60. Commented Jul 31, 2011 at 19:23

8 Answers

1

I think what you may be looking for is:

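/* Assumes the original is stored in src; needs <stdio.h> and <string.h>. */
char temp[61];                        /* 60 characters plus the terminator */
size_t x = 0, len = strlen(src);
while (x < len)
{
    size_t cnt = len - x;             /* characters still to print */
    if (cnt > 60)
    {
        cnt = 60;                     /* src[x + 60] may itself be the breaking space */
        while (cnt > 0 && src[x + cnt] != ' ') cnt--;
        if (cnt == 0) cnt = 60;       /* no space found: break the long word */
    }
    memcpy(temp, src + x, cnt);
    temp[cnt] = '\0';
    printf("%s\n", temp);
    x += cnt;
    while (src[x] == ' ') x++;        /* skip the space(s) at the break */
}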

1

Here is a small function that does what you want. It returns a list of the lines. The code below is written without std:: prefixes, so it assumes using namespace std; or, better, using std::list; using std::string; using std::size_t; along with #include <list> and #include <string>.

list<string> wraptext(string input, size_t width) {
    size_t curpos = 0;
    size_t nextpos = 0;

    list<string> lines;
    string substr = input.substr(curpos, width + 1);

    while (substr.length() == width + 1 && (nextpos = substr.rfind(' ')) != input.npos) {
        lines.push_back(input.substr(curpos, nextpos));
        curpos += nextpos + 1;
        substr = input.substr(curpos, width + 1);
    }

    if (curpos != input.length())
        lines.push_back(input.substr(curpos, input.npos));

    return lines;
}

This program using that function:

int main() {
    string input = "Shankle drumstick corned beef, chuck turkey chicken pork chop venison beef strip steak cow sausage. Tail short loin shoulder ball tip, jowl drumstick rump. Tail tongue ball tip meatloaf, bresaola short loin tri-tip fatback pork loin sirloin shank flank biltong. Venison short loin andouille.";

    list<string> l = wraptext(input, 60);

    for (auto i = l.begin(); i != l.end(); ++i)
        cout << *i << endl;

    cin.get();
}

Prints your example text:

Shankle drumstick corned beef, chuck turkey chicken pork
chop venison beef strip steak cow sausage. Tail short loin
shoulder ball tip, jowl drumstick rump. Tail tongue ball tip
meatloaf, bresaola short loin tri-tip fatback pork loin
sirloin shank flank biltong. Venison short loin andouille.


1

If you want to do the job in C, you could try the w_wrap.c and w_wrap.h that I posted to Fidonet C_ECHO 20 years ago or so.

If you want to do the job in C++, it seems like you could simplify the code a bit:

#include <sstream>
#include <string>
#include <iostream>

void wrap(std::string const &input, size_t width, std::ostream &os, size_t indent = 0)
{ 
    std::istringstream in(input);

    os << std::string(indent, ' '); 
    size_t current = indent;
    std::string word;

    while (in >> word) {
        if (current + word.size() > width) {
            os << "\n" << std::string(indent, ' ');
            current = indent;
        }
        os << word << ' ';
        current += word.size() + 1;
    }
}

#ifdef TEST 
int main() { 
    char const *in = "Shankle drumstick corned beef, chuck turkey chicken pork chop"
               " venison beef strip steak cow sausage. Tail short loin shoulder"
               " ball tip, jowl drumstick rump. Tail tongue ball tip meatloaf,"
               " bresaola short loin tri-tip fatback pork loin sirloin shank"
               " flank biltong. Venison short loin andouille.";

    wrap(in, 60, std::cout);
    return 0;
}
#endif

To add indentation, you'd use something like:

wrap(in, 60, std::cout, 5);

Given that you're doing I/O, it probably doesn't matter much in this case, but if you were doing this under other circumstances, you might want to consider a different algorithm. Rather than copy one word at a time until you exceed the specified width, you can go directly to the maximum line width in the input, and walk backwards through the input string from there until you find whitespace. At least given typical word lengths, you'll only walk back somewhere around 3 characters on average, rather than walking forward through an average of (say) 60 characters. This would be particularly relevant using something like C strings, where you were storing a pointer to the beginning of each line, without copying the content.
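A minimal sketch of that backward-scanning idea, assuming words are separated by plain spaces and that width is greater than zero. The helper name wrap_backward is hypothetical and not part of the answer; it returns the lines in a vector instead of printing them, and it hard-breaks any word longer than the width.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: jump ahead `width` characters, then walk backwards
// to the nearest space. Assumes width > 0 and space-separated words.
std::vector<std::string> wrap_backward(const std::string &text, std::size_t width)
{
    std::vector<std::string> lines;
    std::size_t start = 0;

    while (start < text.size()) {
        // Skip spaces left over from the previous break.
        start = text.find_first_not_of(' ', start);
        if (start == std::string::npos)
            break;

        // Everything that remains fits on one line.
        if (text.size() - start <= width) {
            lines.push_back(text.substr(start));
            break;
        }

        // text[start + width] is the first character that does not fit.
        // If it is a space, the line may be exactly `width` characters long.
        std::size_t brk = text.rfind(' ', start + width);
        if (brk == std::string::npos || brk < start) {
            // No space inside the window: hard-break the over-long word.
            brk = start + width;
        }

        // Drop any run of spaces directly before the break point.
        std::size_t end = brk;
        while (end > start && text[end - 1] == ' ')
            --end;

        lines.push_back(text.substr(start, end - start));
        start = brk;
    }
    return lines;
}
```

Each pass calls rfind once from the far edge of the window, so the number of characters examined per line depends on the length of the last word rather than on the line width.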

1 Comment

Love the technique for handling indent. The only difference I would make is to pass the input via a stream to the wrap() function.
0

Here's my approach. It's certainly not the fastest, but I tried to make it as readable as possible. The result is the same as your example.

#include <iostream>
#include <string>


std::string splitInLines(std::string source, std::size_t width, std::string whitespace = " \t\r")
{
    std::size_t  currIndex = width - 1;
    std::size_t  sizeToElim;
    while ( currIndex < source.length() )
    {
        currIndex = source.find_last_of(whitespace,currIndex + 1); 
        if (currIndex == std::string::npos)
            break;
        currIndex = source.find_last_not_of(whitespace,currIndex);
        if (currIndex == std::string::npos)
            break;
        sizeToElim = source.find_first_not_of(whitespace,currIndex + 1) - currIndex - 1;
        source.replace( currIndex + 1, sizeToElim , "\n");
        currIndex += (width + 1); //due to the recently inserted "\n"
    }
    return source;
}

int main() {
    std::string source = "Shankle drumstick corned beef, chuck turkey chicken pork chop venison beef strip steak cow sausage. Tail short loin shoulder ball tip, jowl drumstick rump. Tail tongue ball tip meatloaf, bresaola short loin tri-tip fatback pork loin sirloin shank flank biltong. Venison short loin andouille.";
    std::string result = splitInLines(source , 60);
    std::cout << result;
    return 0;
}

2 Comments

I don't like that you are re-defining the meaning of white space.
I ultimately rolled my own, but there was some inspiration from this answer.
0

Yes: load it into a character array, then use strtok to break it into words, using a space as the word separator.

5 Comments

Right, but from there it's an easy jump to get the length of the next word, check if the current position + the length will exceed the max width, and switch to the next line if it does. Print away!
strtok is not really fit to do the job, separates tokens with '\0'.
It chops up the string into multiple zero-terminated strings. That's the whole point of tokenizing.
Hmm. I must be misunderstanding the question.
Ah yes, the question changed. Yes, Josh, that is the other piece of the puzzle.
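A rough sketch of how the strtok idea and the follow-up comment could fit together, assuming words are separated by plain spaces. The helper name wrap_strtok is hypothetical. Because strtok overwrites separators with '\0', the sketch tokenizes a private copy of the input and rebuilds the wrapped text in a new string.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenize a copy of the text with strtok and
// re-join the words, starting a new line when the next word will not fit.
std::string wrap_strtok(const char *text, std::size_t width)
{
    // strtok writes '\0' into its argument, so work on a copy.
    std::vector<char> buf(text, text + std::strlen(text) + 1);

    std::string out;
    std::size_t col = 0;   // characters on the current output line

    for (char *word = std::strtok(buf.data(), " "); word != nullptr;
         word = std::strtok(nullptr, " ")) {
        std::size_t len = std::strlen(word);
        if (col != 0 && col + 1 + len > width) {
            out += '\n';   // word plus its space does not fit
            col = 0;
        } else if (col != 0) {
            out += ' ';
            ++col;
        }
        out += word;
        col += len;
    }
    return out;
}
```

Note that strtok keeps its position in hidden static state, so this sketch is not safe to call from several threads at once.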
0

You can use a function like this:

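/* Needs <stdio.h> and <string.h>. */
void put_multiline(const char *s, int width)
{
  int n;
  size_t i = 0;                 /* characters already printed on the current line */
  char t[100];                  /* words longer than 99 characters get split */
  while (1 == sscanf(s, "%99s%n", t, &n))
  {
    size_t len = strlen(t);
    /* start a new line if the word plus its leading space will not fit */
    if (i && i + 1 + len > (size_t)width) { puts(""); i = 0; }
    if (i) { putchar(' '); i++; }
    fputs(t, stdout);
    i += len;
    s += n;                     /* advance past the word just read */
  }
}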

strtok will destroy your string; this solution does not. This function also works on all whitespace, not only spaces and tabs.


0

You could probably use regex substitution: replace /(.{1,60})(?: +|$)/ with $1\n, advance the string pointer and repeat. Note that the quantifier has to be greedy here: a non-greedy match would break at the first space instead of filling the line.

If properly implemented, the conversion could even be done in place.


0

Here is a regex-based approach. Unlike the approaches in the other answers, it also handles newlines in the input string gracefully.

#include <regex>
#include <iostream>
#include <string>

int main() {
  auto test = std::string{"Shankle drumstick corned beef, chuck turkey chicken pork chop venison beef strip steak cow sausage. Tail short loin shoulder ball tip, jowl drumstick rump. Tail tongue ball tip meatloaf, bresaola short loin tri-tip fatback pork loin sirloin shank flank biltong. Venison short loin andouille."};

  // Consume up to 60 characters that are followed by spaces or the end of the input string
  auto line_wrap = std::regex{"(.{1,60})(?: +|$)"};

  // Replace the space or the end of the input string with a new line
  test = regex_replace(test, line_wrap, "$1\n");

  // Trim the new line added for the end of the input string
  test.resize(test.size() - 1);

  std::cout << test << std::endl;
}
