
I am trying to figure out the meaning of this regular expression:

.{0,70}(?:\\S(?:-| |$)|$)

I tried to work out what it means using regexper.

What I understood:

1) 1 to 70 characters except new line

2) then there can be end of line (at the end of the expression we have "|$)") OR

3) in the non-capturing group, the second alternative is "\S(?:-| |$)". It says it can not be non-whitespace characters "-" or SPACE or "end of line".

My understanding might be incorrect. I am not able to work out how it works. Could you please explain it with some test data examples?

  • Have you tried looking at a table of regex symbols? Or have you understood parts of it? Be clearer about what's causing the confusion. Commented Nov 25, 2013 at 12:20
  • Where did you get it from? Commented Nov 25, 2013 at 12:20
  • Sorry, SO is not a place where we will write a tutorial for you. There are many great ones already, like this one. Voting to put your question on hold until you specify which part exactly confuses you. Commented Nov 25, 2013 at 12:25
  • I have updated my question with my current understanding. It would be helpful if you could give some examples of how it works. Commented Nov 25, 2013 at 12:35

3 Answers


A step by step explanation

  • .{0,70} repeats "." between 0 and 70 times ("." = any character except a line terminator, by default)
  • (?:...) is a non-capturing group (it groups without capturing the matched text)
  • \\S is how "\S" is written in a Java string literal (\S = a non-whitespace character)
  • (?:...) is a second non-capturing group
  • -| |$ matches "-" or " " (space) or $ ($ = the end of a line)
  • |$ or the end of a line

For more information about Java regex, see the docs.
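The pieces of the breakdown above can be checked one at a time. The following is a minimal sketch, not part of the original answer; the class name `EscapeDemo` and the helper `matchesWhole` are made up for illustration. It shows that the Java string literal `"\\S"` reaches the regex engine as `\S`, and that `.{0,70}` accepts zero characters but does not cross a newline by default.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // The Java string literal "\\S" holds the two characters \S,
    // which the regex engine reads as "one non-whitespace character".
    static final String SOURCE = ".{0,70}(?:\\S(?:-| |$)|$)";

    // Hypothetical helper: true if the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(SOURCE);                         // the regex as the engine sees it
        System.out.println(matchesWhole("\\S", "a"));       // a letter is non-whitespace
        System.out.println(matchesWhole("\\S", " "));       // a space is whitespace
        System.out.println(matchesWhole(".{0,70}", ""));    // zero repetitions are allowed
        System.out.println(matchesWhole(".{0,70}", "\n"));  // "." skips line terminators by default
    }
}
```

The first line printed contains a single backslash before the S, which is the form shown by tools such as regexper or regex101.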


3 Comments

Just to clarify: . = any one symbol. \s matches whitespace (spaces, tabs and newlines). \S is the negation of \s.
Just to add: . can also match spaces and, in some cases, newlines as well.
I realise this does give a breakdown of the expression, but I struggle to believe it will give the OP any more insight into what the expression actually does than he already has.

We can ignore the fact that the groups are non-capturing, because that has no influence on whether something matches, so we have:

.{0,70}(\\S(-| |$)|$)

.{0,70} 0-70 non new-line characters

Followed by either (surrounded in single quotes so the space is visible):

  • '\S-' a non-whitespace character and a -
  • '\S ' a non-whitespace character and a space
  • '\S$' a non-whitespace character and end of input
  • '$' end of input
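The four endings listed above can be tried in isolation. This is only a sketch under the assumption that we test the trailing group on its own, without the leading `.{0,70}`; the class name `EndingDemo` and the method `isValidEnding` are hypothetical.

```java
import java.util.regex.Pattern;

class EndingDemo {
    // Only the trailing group of the pattern, without the leading .{0,70}.
    static final Pattern ENDING = Pattern.compile("(?:\\S(?:-| |$)|$)");

    // Hypothetical helper: true if the whole string is one of the four endings.
    static boolean isValidEnding(String s) {
        return ENDING.matcher(s).matches();
    }

    public static void main(String[] args) {
        // "d-", "d ", "d" and "" correspond to the four bullets;
        // " -" and "dd" are counter-examples.
        for (String s : new String[] {"d-", "d ", "d", "", " -", "dd"}) {
            System.out.println("'" + s + "' -> " + isValidEnding(s));
        }
    }
}
```

The first four inputs are accepted. `" -"` is rejected because the character before the hyphen is whitespace, and `"dd"` is rejected because the second `d` is neither a hyphen, a space, nor the end of input.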

So I would say it's trying to match runs of up to 70 characters that end at either a - or a space.

I'm not sure what sort of input you would use this with. Potentially something that takes a passage of text and splits it into lines no longer than 72 characters (with the final character being a space between words or a - in a hyphenated word)?

For the sake of an example, if you reduce the .{0,70} to .{0,10}, then you could use it on the following input:

"Hello how are you? My name is Dr Bob Scott-Thomas"

To split it into:

              |<- limit: 10 + 2 = 12 characters (.{0,10}, then \S, then the separator)
"Hello how "  |
"are you? My "|
"name is Dr " |
"Bob Scott-"  |
"Thomas"      |

RegExr Example
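To try the splitting idea in Java, one option is to loop over `Matcher.find()` and collect each match. The sketch below is an illustration only: the class `WrapDemo`, the method `split`, and its `limit` parameter (standing in for the 70) are assumptions, not something from the original pattern's source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WrapDemo {
    // Hypothetical helper: collects successive matches, using a configurable
    // limit in place of the original 70.
    static List<String> split(String text, int limit) {
        Pattern p = Pattern.compile(".{0," + limit + "}(?:\\S(?:-| |$)|$)");
        Matcher m = p.matcher(text);
        List<String> pieces = new ArrayList<>();
        while (m.find()) {
            // The pattern can also match the empty string at the very end
            // of the input; skip that match.
            if (!m.group().isEmpty()) {
                pieces.add(m.group());
            }
        }
        return pieces;
    }

    public static void main(String[] args) {
        String text = "Hello how are you? My name is Dr Bob Scott-Thomas";
        // Print every piece with its length so the break points are easy to check.
        for (String piece : split(text, 10)) {
            System.out.println("\"" + piece + "\" (" + piece.length() + ")");
        }
    }
}
```

Note that a piece can be up to `limit + 2` characters long, because the `\S` and the trailing hyphen or space are matched after the `.{0,limit}` part. That is the same reason the full pattern yields lines of at most 72 characters.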

1 Comment

Thanks for the simple and clear example. It helped me to understand its meaning perfectly.

I just plugged it into regex101 and this is what it says:

. 0 to 70 times [greedy] Any character (except newline)
(?:\S(?:-| |$)|$) Non-capturing Group
1st Alternative: \S(?:-| |$)
  \S Any char except whitespaces [^\t \r\n\f\v]
  (?:-| |$) Non-capturing Group
  1st Alternative: -
    Literal -
  2nd Alternative:  
    Space (ASCII 32)
  3rd Alternative: $
    $ End of string
2nd Alternative: $
  $ End of string
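The "[greedy]" note in this breakdown is what makes the pattern useful for wrapping: the engine first takes as many characters as the quantifier allows, then gives them back one at a time until one of the endings fits. A small sketch of that behaviour follows, with the limit reduced to 5 for readability. The class `GreedyDemo` and the helper `firstMatch` are made up for illustration, and the reluctant `.{0,5}?` variant is not in the original pattern; it is included only for contrast.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Hypothetical helper: the first match of the regex in the input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "a b c d e f g";
        // Greedy: take up to 5 characters, then back off until an ending fits.
        System.out.println("[" + firstMatch(".{0,5}(?:\\S(?:-| |$)|$)", input) + "]");
        // Reluctant variant (not in the original): take as few characters as possible.
        System.out.println("[" + firstMatch(".{0,5}?(?:\\S(?:-| |$)|$)", input) + "]");
    }
}
```

The greedy form yields `a b c ` (the longest piece that still ends on a space), while the reluctant form stops at `a `.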
