3

I have several strings that contain one or more digits, optionally followed by one or more letters (letter case doesn't matter). The strings match the following regex pattern:

[0-9]+[a-zA-Z]*

and may look like:

"15791"
"14810A"
"10480ABCD"
"5ABCDEFGH"

If one of the strings above contains non-numerical characters, how do I split the numbers (first part) into an integer and the letters (second part) into a string?

I know I can split a string like this:

array = "1,2,3,4".split(',')

But this doesn't help since I don't have a separator.

Good question and well-written: succinct, complete, unambiguous. Commented Mar 18, 2015 at 17:22

4 Answers

11

Use a regex built from a positive lookbehind and a positive lookahead in String#split.

> "10480ABCD".split(/(?<=\d)(?=[A-Za-z])/)
=> ["10480", "ABCD"]
  • (?<=\d) Positive lookbehind, which asserts that the match must be preceded by a digit.

  • (?=[A-Za-z]) Positive lookahead, which asserts that the match must be followed by a letter.

Together, the regex matches the zero-width boundary between a digit and a letter. Splitting the input at that boundary gives the desired output.
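As a quick check, the same boundary split can be run over all four sample strings from the question. This is only a minimal sketch; the names BOUNDARY, samples and results are made up for illustration.

```ruby
# Zero-width boundary between a digit and a letter (illustrative constant name).
BOUNDARY = /(?<=\d)(?=[A-Za-z])/

samples = ["15791", "14810A", "10480ABCD", "5ABCDEFGH"]

# The match consumes no characters, so nothing is lost when splitting.
# A string with no letters has no boundary and comes back as a single element.
results = samples.map { |s| s.split(BOUNDARY) }

results.each { |parts| p parts }
```

Because both assertions are zero-width, the digits and letters on either side of the boundary are kept intact.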

OR

Use String#scan:

> "10480ABCD".scan(/\d+|[A-Za-z]+/)
=> ["10480", "ABCD"]

4 Comments

Wow, that was prompt! Tried it in irb and works like a charm! Thanks so much, I'll accept the answer in 12 minutes :)
avinash, if the string will only contain digits and letters, it would be best to stick with \d and \D to keep the expression simple, like in my response. :)
The OP clearly mentions that the string must satisfy the [0-9]+[a-zA-Z]* pattern. Changing [A-Za-z]+ to \D+ won't make much difference.
Since we know the string will only contain digits and letters, using \d and \D keeps things simple and readable. As an analogy: 3 and 4 can be added using 3 + 4 or 1 + 1 + 1 + 4. The results are the same, but simplicity is preferred.
9

The splitter is the non-numerical characters themselves:

"10480ABCD".split(/(\D+)/)
# => ["10480", "ABCD"]

1 Comment

Clever. Puzzled readers (if any): one line of the docs for String#split reads, "If pattern contains [capture] groups, the respective matches will be returned in the array as well."
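To make the difference concrete, here is a small sketch comparing the same split with and without the capture group (the variable names are only illustrative):

```ruby
str = "10480ABCD"

# Without a capture group, the matched letters act as a separator and are dropped.
without_group = str.split(/\D+/)

# With a capture group, the matched letters are returned in the array as well.
with_group = str.split(/(\D+)/)

p without_group  # ["10480"]
p with_group     # ["10480", "ABCD"]
```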
0

You can always use match:

re = /(\d+)([a-z]*)/i
str = "10480ABCD"

m = re.match(str)
m    #=> #<MatchData "10480ABCD" 1:"10480" 2:"ABCD">
m[0] #=> "10480ABCD"
m[1] #=> "10480"
m[2] #=> "ABCD"

Use MatchData#[] with a start index and a length to extract both capture groups at once:

re.match(str)[1, 2]
#=> ["10480", "ABCD"]
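Since the question asks for the number as an integer, the captures can be converted after matching. The following is only a sketch: split_number_and_letters is a hypothetical helper name, and anchoring the pattern with \A and \z is an added assumption so that strings not shaped like digits-then-letters return nil.

```ruby
# Hypothetical helper: returns [Integer, String] for digits-then-letters input,
# or nil when the string does not follow that shape.
def split_number_and_letters(str)
  m = /\A(\d+)([a-z]*)\z/i.match(str)
  return nil unless m

  # m[1] holds the digits, m[2] the (possibly empty) run of letters.
  [m[1].to_i, m[2]]
end

p split_number_and_letters("15791")      # [15791, ""]
p split_number_and_letters("10480ABCD")  # [10480, "ABCD"]
```

Returning an empty string for the letters part keeps the result shape the same whether or not letters are present.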


0

[Edit: for some reason @Humza deleted his answer, so I've undeleted mine. I had previously posted this, but then deleted it when I noticed that Humza had already posted a similar answer.]

I feel like I must be missing something, as it seems to have a straightforward solution:

def extract(str)
  str.scan(/\d+|[A-Z]+/i)
end

extract "15791"     #=> ["15791"] 
extract "14810A"    #=> ["14810", "A"] 
extract "10480ABCD" #=> ["10480", "ABCD"]
extract "5ABCDEFGH" #=> ["5", "ABCDEFGH"] 

