I have this string:

token1=value1Token2=value2Token3[12]=value3

where each tokenX may be a string containing digits (e.g. myToken12 or my2Token) and each valueX consists only of digits or symbols (e.g. 123123 or {{1, 2}, 3, 4}).

I'd like to transform it into this array:

['token1=value1', 'token2=value2', 'token3[12]=value3']

An example of a string I might have:

String s = "na23me=12341234las4tName={{0,0,0},{0,0,0},{0,0,0}}stree2t[696]=764545457OK";

I tried with split and Matcher ...

A similar question has already been posted, but this one is more general (token=value): in the previous question the value was just a number, and in another post it was only symbols. I'd like a general answer here.

Thanks.

MORE DETAIL:

With this string:

String s = "na23me=12341234las4tName=654567stree2t[696]=764545457OK";

this solution:

String[] tokens = s.split("(?<==\\d{1,1000})(?=[a-zA-Z])");

works, but a value might also look like {1, 2, 3}, and I'd like to handle that case too. That is why this question is different.
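
To make the limitation concrete, here is a minimal sketch that runs the digit-only split on both sample strings. The class and method names are hypothetical, chosen only for illustration. With the brace-valued input the boundary after }} is missed, because the lookbehind accepts only digits after the =.

```java
import java.util.Arrays;
import java.util.List;

class DigitOnlySplitDemo {
    // The split from the question: only digits may sit between '=' and the next name.
    static final String DIGITS_ONLY = "(?<==\\d{1,1000})(?=[a-zA-Z])";

    // Hypothetical helper: applies the digit-only split and returns the pieces as a list.
    static List<String> splitDigitsOnly(String s) {
        return Arrays.asList(s.split(DIGITS_ONLY));
    }

    public static void main(String[] args) {
        String plain = "na23me=12341234las4tName=654567stree2t[696]=764545457OK";
        String braces = "na23me=12341234las4tName={{0,0,0},{0,0,0},{0,0,0}}stree2t[696]=764545457OK";

        for (String t : splitDigitsOnly(plain)) {
            System.out.println("plain  => <" + t + ">");
        }
        for (String t : splitDigitsOnly(braces)) {
            System.out.println("braces => <" + t + ">");
        }
    }
}
```

The first input yields four pieces; the second yields only three, with las4tName={{...}} and stree2t[696]=764545457 left glued together.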

ONE LEVEL MORE: what if I also want to accept a quoted string such as "TEST" as a value?

e.g.:

conf=0ticket[0,9]="TEST"config={0,0,0}platform_id=121212

I've tried this:

String[] tokens = buffer.split("(?<==\\d{1,1000}|\\W|\"\\w\")(?=[a-zA-Z])");

EXPECTED:

In the end the quoted "STRING" just has to be treated as a value, which shouldn't be a big issue since it is enclosed in double quotes.

['conf=0', 'ticket[0,9]="TEST"', 'config={0,0,0}', 'platform_id=121212']

It doesn't work. Any ideas?

  • Show what you have tried already and tell us where it failed. Commented Dec 12, 2014 at 14:52
  • because valueX could be just numbers and symbols (e.g.: {) NOT char Commented Dec 12, 2014 at 15:07
  • split your input according to (?i)(?<!^)(?=token) Commented Dec 12, 2014 at 15:09
  • So an example input is: token1=111Token2=222Token3[12]=33my2Token=444? Commented Dec 12, 2014 at 15:11
  • @Kasper In your example from a few comments earlier, token1={{0, 0, 1}, 2, 2}Token2={{0, 0, 1}, 2, 2} Token3[12]={{0, 0, 1}, 2, 2}my2Token={{0, 0, 1}, 2, 2}, there is a space between {{0, 0, 1}, 2, 2} and Token3, but in the description in your question I don't see any spaces between value1 and Token2. So which is wrong, the description of your data or the example? Commented Dec 12, 2014 at 16:00

1 Answer

Looks like your input string is getting more and more complex. Here is one regex that seems to be working for all of your provided inputs:

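void getTokens(String s) {
   // Split after "=value" (a quoted string or a run of non-letters) and before the next letter
   String[] toks = s.split( "(?<==(?>\"[^\"=]{1,1000}\"|\\P{L}{1,1000})) *(?=\\p{L})" );
   // Print each token wrapped in angle brackets
   for (String tok: toks)
      System.out.printf("=> <%s>%n", tok);
}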

Testing:

getTokens("conf=0ticket[0,9]=\"TEST\"config={0,0,0}platform_id=121212");
=> <conf=0>
=> <ticket[0,9]="TEST">
=> <config={0,0,0}>
=> <platform_id=121212>

getTokens("na23me=12341234las4tName={{0,0,0},{0,0,0},{0,0,0}}stree2t[696]=764545457OK");
=> <na23me=12341234>
=> <las4tName={{0,0,0},{0,0,0},{0,0,0}}>
=> <stree2t[696]=764545457>
=> <OK>

getTokens("na23me=12341234las4tName=654567stree2t[696]=764545457OK");
=> <na23me=12341234>
=> <las4tName=654567>
=> <stree2t[696]=764545457>
=> <OK>

Explanation:

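The regex uses a lookbehind and a lookahead for splitting:

  • (?<==(?>\"[^\"=]{1,1000}\"|\\P{L}{1,1000})) is a positive lookbehind that makes sure the current position is preceded by a = followed by one of these:
    • A double-quoted string of at most 1000 characters (containing neither " nor =), OR
    • 1 to 1000 characters that are not Unicode letters
  • (?>foo|bar) is called an atomic group: once the group has matched, the regex engine will not backtrack into it to try another alternative
  • " *" (a space followed by *) permits optional spaces between a value and the next token name
  • (?=\\p{L}) is a positive lookahead that makes sure a Unicode letter follows the current position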
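
The pieces described above can also be assembled from named parts, which may make the regex easier to read and adjust. This is only a sketch: the class, constant and method names are hypothetical, and the regex itself is the one from the answer, unchanged.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class TokenSplitDemo {
    // Lookbehind: '=' followed by a double-quoted string or by a run of non-letters.
    // The atomic group and the {1,1000} bounds are copied from the answer as-is.
    private static final String AFTER_VALUE =
            "(?<==(?>\"[^\"=]{1,1000}\"|\\P{L}{1,1000}))";
    // Optional spaces at the split point, as in the original regex.
    private static final String OPTIONAL_SPACES = " *";
    // Lookahead: the next token name must start with a Unicode letter.
    private static final String BEFORE_NAME = "(?=\\p{L})";

    private static final Pattern BOUNDARY =
            Pattern.compile(AFTER_VALUE + OPTIONAL_SPACES + BEFORE_NAME);

    // Hypothetical helper: returns the tokens as a list instead of printing them.
    static List<String> splitTokens(String s) {
        return Arrays.asList(BOUNDARY.split(s));
    }

    public static void main(String[] args) {
        String[] inputs = {
            "conf=0ticket[0,9]=\"TEST\"config={0,0,0}platform_id=121212",
            "na23me=12341234las4tName={{0,0,0},{0,0,0},{0,0,0}}stree2t[696]=764545457OK",
            "na23me=12341234las4tName=654567stree2t[696]=764545457OK"
        };
        for (String input : inputs) {
            for (String tok : splitTokens(input)) {
                System.out.printf("=> <%s>%n", tok);
            }
            System.out.println();
        }
    }
}
```

Compiling the pattern once avoids recompiling it on every call, which String.split would otherwise do for a regex like this. Note that a trailing fragment without = (such as OK in the sample inputs) still comes back as its own element, so callers may want to filter it out.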