1

I have one string, that looks like this:

a[abcdefghi,2,3,jklmnopqr]

The leading "a" is fixed and never changes; the content within the brackets varies but follows a pattern. It will always be an alphabetical string, possibly followed by numbers separated by commas, and then by more strings and/or numbers.

I'd like to break it into chunks, each consisting of a string and any numbers that follow it, up to the "]" or the next string.

Probably best explained through examples and expected ideal results:

a[abcdefghi]               -> "abcdefghi"
a[abcdefghi,2]             -> "abcdefghi,2"
a[abcdefghi,2,3,jklmnopqr] -> "abcdefghi,2,3" and "jklmnopqr"
a[abcdefghi,2,3,jklmnopqr,stuvwxyz]     -> "abcdefghi,2,3" and "jklmnopqr" and "stuvwxyz"
a[abcdefghi,2,3,jklmnopqr,1,9,stuvwxyz] -> "abcdefghi,2,3" and "jklmnopqr,1,9" and "stuvwxyz"
a[abcdefghi,1,jklmnopqr,2,stuvwxyz,3,4] -> "abcdefghi,1" and "jklmnopqr,2" and "stuvwxyz,3,4"

Ideally a malformed string would be partially caught (but this is a nice extra):

a[2,3,jklmnopqr,1,9,stuvwxyz] -> "jklmnopqr,1,9" and "stuvwxyz"

I'm using JavaScript and I realize a regex won't bring me all the way to the solution I'd like, but it could be a big help. The alternative is to do a lot of manual string parsing, which I can do but which doesn't seem like the best answer.

Advice, tips appreciated.

UPDATE: Yes, I did mean alphabetical (A-Za-z) instead of alphanumeric. Edited to reflect that. Thanks for letting me know.

3 Answers

2

You'd probably want to do this in 2 steps. First, match against:

a\[([^[\]]*)\]

and extract group 1. That'll be the stuff in the square brackets.

Next, repeatedly match against:

[a-z]+(,[0-9]+)*

That'll match things like "abcdefghi,2,3". After the first match you'll need to see if the next character is a comma and if so skip over it. (BTW: if you really meant alphanumeric rather than alphabetic like your examples, use [a-z0-9]*[a-z][a-z0-9]* instead of [a-z]+.)
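The two steps above could be sketched in JavaScript roughly as follows. The function name `extractGroups` is made up for illustration, and the sketch assumes lowercase letters and unsigned integers only. Because `match` with the `g` flag scans for each successive match, the separating commas are skipped without any extra bookkeeping.

```javascript
// Hypothetical helper: the name and return shape are illustrative assumptions.
function extractGroups(input) {
  // Step 1: pull out everything between the square brackets.
  var outer = /a\[([^[\]]*)\]/.exec(input);
  if (outer === null) {
    return []; // no a[...] wrapper found
  }
  // Step 2: collect every "word plus trailing numbers" chunk.
  // With the g flag, match() returns all matches, or null if there are none.
  var groups = outer[1].match(/[a-z]+(,[0-9]+)*/g);
  return groups === null ? [] : groups;
}

console.log(extractGroups("a[abcdefghi,2,3,jklmnopqr,1,9,stuvwxyz]"));
```

Leading numbers, as in the malformed example from the question, are simply never matched, so the remaining groups are still returned.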

Alternatively, split the string on commas and reassemble into your word with number groups.
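A minimal sketch of the split-and-reassemble alternative, assuming the bracket contents have already been extracted. The name `groupByWord` is hypothetical, and the choice to silently skip tokens that are neither words nor numbers is an assumption of this sketch, not something the question specifies.

```javascript
// Hypothetical helper: groups comma-separated tokens into word-led chunks.
function groupByWord(inner) {
  var groups = [];
  var current = null; // tokens of the chunk being built; null until the first word
  inner.split(",").forEach(function (token) {
    if (/^[a-z]+$/.test(token)) {
      // A word starts a new chunk; flush the previous one first.
      if (current !== null) {
        groups.push(current.join(","));
      }
      current = [token];
    } else if (/^[0-9]+$/.test(token) && current !== null) {
      // A number extends the chunk that is currently open.
      current.push(token);
    }
    // Anything else (leading numbers, empty tokens) is skipped.
  });
  if (current !== null) {
    groups.push(current.join(","));
  }
  return groups;
}

console.log(groupByWord("abcdefghi,2,3,jklmnopqr,1,9,stuvwxyz"));
```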


2 Comments

The first step does not appear to work. Using javascript, this returns null: /a\[([^[\]])\]/.exec("a[abcdefghi]")
@michael: Sorry, I'd forgotten a * in there. Should be fixed now.
1

Why wouldn't a regex bring you all the way to a solution? The following regex works against the given data, but it makes a few assumptions (at least two alphas followed by comma separated single digits).

([a-z]{2,}(?:,\d)*)

Example:

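var re = new RegExp('[a-z]{2,}(?:,\\d)*', 'g');
// match() with the g flag returns every match; a single exec() call returns only the first
var matches = "a[abcdefghi,2,3,jklmnopqr,1,9,stuvwxyz]".match(re);
// matches is ["abcdefghi,2,3", "jklmnopqr,1,9", "stuvwxyz"]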

2 Comments

Afraid it doesn't work. From what I gather the assumptions go against the examples provided. In javascript it returns an array with two identical values of "abcdefghi,2,3", from this: /([a-z]{2,}(?:,\d)*)/.exec("a[abcdefghi,2,3,jklmnopqr,stuvwxyz]")
The regex works and was tested. However, a slight translation is needed for Javascript (the \d needs to be escaped). Fixed with an example.
0

Assuming you can easily break out the string between the brackets, something like this might be what you're after:

> re = new RegExp('[a-z]+(?:,\\d)*(?:,?)', 'gi')
> while (match = re.exec("abcdefghi,2,3,jklmnopqr,1,9,stuvwxyz")) { print(match[0]) }
abcdefghi,2,3,
jklmnopqr,1,9,
stuvwxyz

This has the advantage of working partially in your malformed case:

> while (match = re.exec("2,3,jklmnopqr,1,9,stuvwxyz")) { print(match[0]) }
jklmnopqr,1,9,
stuvwxyz

The first character class [a-z] can be modified if you meant for it to be truly alphanumeric.
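The loop above prints each chunk with its trailing comma still attached. One way to collect the chunks into an array and trim that comma is sketched below; the name `collectChunks` is hypothetical, and the sketch keeps this answer's regex unchanged, so it still assumes single-digit numbers.

```javascript
// Hypothetical helper: runs the exec loop and strips each trailing comma.
function collectChunks(inner) {
  var re = new RegExp('[a-z]+(?:,\\d)*(?:,?)', 'gi');
  var chunks = [];
  var match;
  // exec() advances re.lastIndex on every call because of the g flag,
  // and returns null once no further match is found.
  while ((match = re.exec(inner)) !== null) {
    chunks.push(match[0].replace(/,$/, ""));
  }
  return chunks;
}

console.log(collectChunks("abcdefghi,2,3,jklmnopqr,1,9,stuvwxyz"));
```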
