Ruby parsing and regex

Question

Picked up Ruby recently and have been fiddling around with it. I wanted to learn how to use regex or other Ruby tricks to check for certain words, whitespace characters, valid format etc in a given text line.

Let's say I have an order list that looks strictly like this in this format:

cost: 50 items: book,lamp

One space after each colon, no space after each comma, no trailing whitespace at the end and stuff like that. How can I check for errors in this format using Ruby? This for example should fail my checks:

cost:     60 items:shoes,football

My plan was to split the string on " " and check that the first word is "cost:", that the second word is a number, and so on. But I realized that splitting on " " doesn't help me check for extra whitespace, as it just eats it up. It also doesn't help me check for trailing whitespace. How do I go about doing this?
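The behaviour described above can be reproduced with a short sketch. The sample line and the variable names are made up for illustration; the point is only that `split(" ")` collapses runs of whitespace, while splitting on a regex with a negative limit keeps the empty fields that reveal extra or trailing spaces.

```ruby
# Sample line with two spaces after "cost:" and one trailing space.
line = "cost:  60 items:shoes "

# A single-space string separator is awk-style: leading whitespace and
# runs of whitespace are ignored, so the formatting errors disappear.
awk_style = line.split(" ")
p awk_style      #=> ["cost:", "60", "items:shoes"]

# A regex separator keeps the empty field between the two spaces,
# but trailing empty fields are still dropped.
regex_split = line.split(/ /)
p regex_split    #=> ["cost:", "", "60", "items:shoes"]

# A negative limit keeps trailing empty fields as well.
keep_trailing = line.split(/ /, -1)
p keep_trailing  #=> ["cost:", "", "60", "items:shoes", ""]
```

Empty strings in the last result mark the places where the line breaks the format, though a single anchored regex is usually simpler than inspecting the fields one by one.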

However you do it, this is a very nice example for developing your TDD (test-driven development) and testing skills. I would suggest that you spend some time playing with minitest and your problem.
– Felix, Commented Sep 5, 2016 at 7:43

Cary Swoveland · Accepted Answer · 2016-09-05 06:25:20Z

You could use the following regular expression.

r = /
    \A                # match the beginning of the string
    cost:\s           # match "cost:" followed by one whitespace character
    \d+\s             # match one or more digits followed by one whitespace character
    items:\s          # match "items:" followed by one whitespace character
    [[:alpha:]]+      # match one or more lowercase or uppercase letters
    (?:,[[:alpha:]]+) # match a comma followed by one or more letters,
                      # in a non-capture group (?: ... )
    *                 # repeat the non-capture group zero or more times
    \z                # match the end of the string
    /x                # free-spacing regex definition mode

"cost: 50 items: book,lamp"         =~ r #=> 0   (a match, beginning at index 0)
"cost: 50 items: book,lamp,table"   =~ r #=> 0   (a match, beginning at index 0)
"cost:     60 items:shoes,football" =~ r #=> nil (no match)

The regex can of course be written in the normal manner:

r = /\Acost:\s\d+\sitems:\s[[:alpha:]]+(?:,[[:alpha:]]+)*\z/

or

r = /\Acost: \d+ items: [[:alpha:]]+(?:,[[:alpha:]]+)*\z/

though a whitespace character (\s) cannot be replaced by a plain space in the free-spacing mode definition (/x), because unescaped spaces in the pattern are ignored in that mode.
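As a rough sketch of how the literal-space version might be used, the pattern can be wrapped in a small predicate. The names `ORDER_LINE` and `valid_order_line?` are hypothetical, and `match?` assumes Ruby 2.4 or later. The last two calls illustrate one difference between the variants: `\s` accepts any whitespace character, including a tab, whereas a literal space does not.

```ruby
# Hypothetical constant and helper; the pattern is the literal-space
# version of the regex shown above.
ORDER_LINE = /\Acost: \d+ items: [[:alpha:]]+(?:,[[:alpha:]]+)*\z/

def valid_order_line?(line)
  ORDER_LINE.match?(line)
end

p valid_order_line?("cost: 50 items: book,lamp")          #=> true
p valid_order_line?("cost:     60 items:shoes,football")  #=> false
p valid_order_line?("cost: 50 items: book,lamp ")         #=> false (trailing space)
p valid_order_line?("cost: 50 items: book, lamp")         #=> false (space after comma)

# The \s variant also accepts a tab in place of a space.
with_s = /\Acost:\s\d+\sitems:\s[[:alpha:]]+(?:,[[:alpha:]]+)*\z/
p with_s.match?("cost:\t50 items: book,lamp")             #=> true
p valid_order_line?("cost:\t50 items: book,lamp")         #=> false
```

If exactly one space is required, the literal-space form is therefore the stricter choice.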

edited Sep 5, 2016 at 6:25

answered Sep 5, 2016 at 5:26


3 Comments

Stefan Over a year ago

To explicitly match a space in free-spacing mode you could use "\ " (a backslash followed by a space), \u0020 or [ ].
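A minimal sketch of that suggestion, assuming the same order-line format as in the answer: the free-spacing regex below uses an escaped space and a `[ ]` character class instead of `\s`, so only a real space is accepted. The variable name `strict` is made up for this example.

```ruby
strict = /
  \A
  cost:\ \d+\      # "cost:", one literal space, digits, one literal space
  items:[ ]        # "items:" followed by one literal space
  [[:alpha:]]+     # first item
  (?:,[[:alpha:]]+)*  # zero or more further items, each after a comma
  \z
/x

p strict.match?("cost: 50 items: book,lamp")    #=> true
p strict.match?("cost:\t50 items: book,lamp")   #=> false (tab, not a space)
p strict.match?("cost:50 items: book,lamp")     #=> false (missing space)
```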

move_slow_break_things Over a year ago

Thanks @CarySwoveland. Just pushing this further, is there a way to include the presence of underscores and numbers and hyphens (No spaces) in the items list? Like if the item was a "version_2_vaccuum" instead of something simple like "lamp". Will adding a "\w" after the [:alpha:] work?

Cary Swoveland Over a year ago

Just change [[:alpha:]]+ in the regex to \w+ in both places (\w being a "word character", which is an uppercase or lowercase letter, digit or underscore).
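A short sketch of that change, using the literal-space form of the regex. Note that `\w` does not include the hyphen, so the second pattern, which is an extension not given in the comment above, adds an escaped hyphen to a character class for the hyphen case the question mentions. The variable names and the "night-light" item are invented for illustration.

```ruby
# \w+ in both places: letters, digits and underscores.
r_word = /\Acost: \d+ items: \w+(?:,\w+)*\z/

# Assumed extension: also allow hyphens inside item names.
r_hyphen = /\Acost: \d+ items: [\w\-]+(?:,[\w\-]+)*\z/

p r_word.match?("cost: 50 items: version_2_vaccuum,lamp")  #=> true
p r_word.match?("cost: 50 items: night-light")             #=> false
p r_hyphen.match?("cost: 50 items: night-light,lamp")      #=> true
p r_hyphen.match?("cost: 50 items: night light")           #=> false (space in item)
```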
