
Picked up Ruby recently and have been fiddling around with it. I want to learn how to use regexes or other Ruby tricks to check a given line of text for certain words, whitespace characters, a valid format, etc.

Let's say I have an order list whose lines must follow exactly this format:

cost: 50 items: book,lamp

One space after each colon, no space after each comma, no trailing whitespace at the end, and so on. How can I check for errors in this format using Ruby? This, for example, should fail my checks:

cost:     60 items:shoes,football   

My plan was to split the string on " " and check whether the first word was "cost:", whether the second was a number, and so on. But I realized that splitting on " " doesn't help me detect extra whitespace, because it just swallows it, and it doesn't let me detect trailing whitespace either. How do I go about doing this?
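To see the problem concretely: when `String#split` is given a single-space string, Ruby uses awk-style splitting, treating runs of whitespace as one separator and ignoring leading and trailing whitespace. Splitting on the regex `/ /` with a negative limit keeps an empty string for every extra space, trailing ones included. A minimal sketch using the bad line from the question:

```ruby
line = "cost:     60 items:shoes,football   "

# With " " as the separator, runs of whitespace count as one separator
# and leading/trailing whitespace is dropped, so the evidence is lost.
p line.split(" ")
# => ["cost:", "60", "items:shoes,football"]

# Splitting on a regex that matches exactly one space, with a negative
# limit so trailing empty fields are kept, turns each extra space into "".
p line.split(/ /, -1)
# => ["cost:", "", "", "", "", "60", "items:shoes,football", "", "", ""]
```

The empty strings show where the format is broken, but checking them field by field gets fiddly quickly, which is why a single anchored regex is usually simpler.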

  • However you do it, this is a very nice example for developing your TDD (test-driven development)/testing skills. At the time of commenting, I would propose that you go ahead and spend some time playing with minitest and your problem. Commented Sep 5, 2016 at 7:43
  • : is a colon. ; is a semi-colon. Commented Sep 5, 2016 at 11:18
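Following the minitest suggestion, a test file might look like the sketch below. It assumes Minitest is installed (it normally ships with Ruby as a bundled gem) and tests a hypothetical `valid_order?` helper; the regex inside that helper is just one possible implementation, using literal spaces so that only single spaces are accepted.

```ruby
require "minitest/autorun"

# Hypothetical helper under test: true only if the whole line is
# "cost: <digits> items: <item>[,<item>...]" with single spaces.
def valid_order?(line)
  /\Acost: \d+ items: [[:alpha:]]+(?:,[[:alpha:]]+)*\z/.match?(line)
end

class ValidOrderTest < Minitest::Test
  def test_accepts_well_formed_line
    assert valid_order?("cost: 50 items: book,lamp")
  end

  def test_rejects_extra_spaces_after_colon
    refute valid_order?("cost:     60 items: shoes,football")
  end

  def test_rejects_missing_space_after_items
    refute valid_order?("cost: 60 items:shoes,football")
  end

  def test_rejects_trailing_whitespace
    refute valid_order?("cost: 60 items: shoes,football   ")
  end

  def test_rejects_space_after_comma
    refute valid_order?("cost: 60 items: shoes, football")
  end
end
```

Writing one small test per rule in the format makes it easy to see which rule a new regex breaks.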

1 Answer


You could use the following regular expression.

r = /
    \A                # match beginning of string
    cost:\s           # match "cost:" followed by a whitespace character
    \d+\s             # match > 0 digits followed by a whitespace character
    items:\s          # match "items:" followed by a whitespace character
    [[:alpha:]]+      # match > 0 lowercase or uppercase letters
    (?:,[[:alpha:]]+) # match a comma followed by > 0 lowercase or uppercase 
                      # letters in a non-capture group (?: ... )
    *                 # perform the match on non-capture group >= 0 times
    \z                # match the end of the string
    /x                # free-spacing regex definition mode

"cost: 50 items: book,lamp"         =~ r #=> 0   (a match, beginning at index 0)
"cost: 50 items: book,lamp,table"   =~ r #=> 0   (a match, beginning at index 0)
"cost:     60 items:shoes,football" =~ r #=> nil (no match)

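If you only need a yes/no answer, `String#match?` (Ruby 2.4 and later) returns true or false instead of an index or nil. A minimal sketch reusing the free-spacing regex above; the constant name ORDER_FORMAT is only illustrative:

```ruby
# The free-spacing regex from the answer, assigned to a constant.
ORDER_FORMAT = /
  \A                 # beginning of string
  cost:\s            # "cost:" followed by a whitespace character
  \d+\s              # one or more digits followed by a whitespace character
  items:\s           # "items:" followed by a whitespace character
  [[:alpha:]]+       # first item: one or more letters
  (?:,[[:alpha:]]+)* # zero or more further items, each preceded by a comma
  \z                 # end of string
/x

lines = [
  "cost: 50 items: book,lamp",
  "cost: 50 items: book,lamp,table",
  "cost:     60 items:shoes,football   "
]

lines.each do |line|
  # match? returns true/false and, unlike =~, does not set $~
  puts "#{line.inspect} => #{line.match?(ORDER_FORMAT)}"
end
```

This prints true for the first two lines and false for the third.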
The regex can, of course, also be written in the conventional (non-free-spacing) way:

r = /\Acost:\s\d+\sitems:\s[[:alpha:]]+(?:,[[:alpha:]]+)*\z/

or

r = /\Acost: \d+ items: [[:alpha:]]+(?:,[[:alpha:]]+)*\z/

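though in the free-spacing definition (/x) a whitespace character (\s) cannot simply be replaced by a space, because literal spaces in the pattern are ignored in that mode. Note also that \s matches any whitespace character (a tab or newline, for example, as well as a space), so the form written with literal spaces is the stricter one if exactly one space is required.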
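A small check of both points, with the patterns cut down to the "cost:" part for brevity (the variable names are illustrative):

```ruby
loose  = /\Acost:\s\d+\z/     # \s: any whitespace character
strict = /\Acost: \d+\z/      # a literal space: exactly one space
spaced = /\A cost: \d+ \z/x   # under /x the spaces are ignored,
                              # so this is the same as /\Acost:\d+\z/

p loose.match?("cost: 50")    # => true
p loose.match?("cost:\t50")   # => true  (a tab also counts as \s)
p strict.match?("cost:\t50")  # => false
p spaced.match?("cost: 50")   # => false (the space is not part of the pattern)
p spaced.match?("cost:50")    # => true
```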


3 Comments

To explicitly match a space in free-spacing mode, you could use \  (a backslash followed by a space), \u0020 or [ ].
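The three options from that comment, applied to a cut-down "cost:" pattern in free-spacing mode. This is a sketch: each pattern should accept exactly one space after the colon and reject none or two.

```ruby
patterns = [
  /\A cost:\ \d+ \z/x,       # backslash-space: an escaped, literal space
  /\A cost:\u0020\d+ \z/x,   # the space written as a Unicode escape
  /\A cost:[ ]\d+ \z/x       # a space inside a character class is not ignored
]

patterns.each do |re|
  # [one space, no space, two spaces]
  p [re.match?("cost: 50"), re.match?("cost:50"), re.match?("cost:  50")]
end
# each line prints [true, false, false]
```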
Thanks @CarySwoveland. Pushing this further, is there a way to also allow underscores, digits and hyphens (but no spaces) in the item names? For example, if an item were "version_2_vacuum" instead of something simple like "lamp". Will adding "\w" after the [:alpha:] work?
Just change [[:alpha:]]+ in the regex to \w+ in both places (\w being a "word character", which is an uppercase or lowercase letter, digit or underscore). Note that \w does not match a hyphen, so to allow hyphens as well, use [\w-]+ instead.
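Putting that change together, with a hyphen added to the character class because \w alone does not match one. A minimal sketch; the ITEM and ORDER names are illustrative, and interpolating one Regexp into another is just a way to avoid writing the item pattern twice:

```ruby
ITEM  = /[\w-]+/   # letters, digits, underscore or hyphen; no spaces
ORDER = /\Acost: \d+ items: #{ITEM}(?:,#{ITEM})*\z/

p "cost: 50 items: version_2_vacuum,lamp".match?(ORDER)  # => true
p "cost: 50 items: lamp-shade,book".match?(ORDER)        # => true
p "cost: 50 items: version 2 vacuum".match?(ORDER)       # => false
```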
