
I've got a bit of a problem coming up with the correct regex. I need one for the following text:

    Feld1 = 1134 2000 0101 0202 0303
    Name1 = Ein Kleiner Namens Test
    Daten1 = 2200220
    VWZ =
    Name2. =
    Daten2 = 1100110

The regex has to find all keys and their values and store them in the matches. So far so good: ([\s]+(?[^\s]+)[\s]+=[\s]+(?[^\r\n]+)) did a very nice job there, with one exception: if a value is empty, it doesn't recognize that, and it treats the key and value of the next line as the value for that key.
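A minimal Python reproduction of the failure, assuming the sample uses \r\n line endings. The group names in the question's pattern did not survive, so Key and Value below are assumptions (Python writes named groups as (?P<...>)), and the leading [\s]+ is dropped so the first line can match at the very start of the string:

```python
import re

# Sample input from the question, assuming \r\n line endings.
TEXT = (
    "Feld1 = 1134 2000 0101 0202 0303\r\n"
    "Name1 = Ein Kleiner Namens Test\r\n"
    "Daten1 = 2200220\r\n"
    "VWZ =\r\n"
    "Name2. =\r\n"
    "Daten2 = 1100110"
)

# Simplified version of the question's pattern: \s+ on both sides of "=".
# Group names Key/Value are assumed; the original names were lost.
NAIVE = re.compile(r"(?P<Key>\S+)\s+=\s+(?P<Value>[^\r\n]+)")

naive_pairs = [(m["Key"], m["Value"]) for m in NAIVE.finditer(TEXT)]
# Because \s+ after "=" also matches "\r\n", the empty VWZ value swallows
# the next line: ("VWZ", "Name2. =") is produced and Name2. is never reported.
```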


I experimented and found a regex that fixes this problem, BUT it also puts the space after the = into the match. Try as I might, I can't find a regex that handles both situations so that I get just the correct key/value pairs.

So what am I doing wrong, and how do I need to modify the regex to achieve my goal?

  • Why do you think you need a regular expression for that, and why only one? A regular expression engine can be used in parsing, but a single expression is not a parser. Maybe you could greatly simplify the problem if you look at it rationally. Besides, you did not really describe the set of valid input data: How many blank spaces are allowed? How are the lines separated? And so on... Commented Mar 12 at 20:19
  • @SergeyAKryukov Sorry, I didn't notice that the screenshot didn't show it. Lines are separated by \r\n, and there is one space before and one after the = (although, as can be seen with VWZ, there can be more than one space before it; after it there is always exactly one). Honestly, I didn't see a much simpler way than a regex. Commented Mar 12 at 21:17
  • At least split the input into separate lines, and parse each line with the same function. Even if you match the entire input perfectly with one regular expression, you will have trouble finding the keys and values in all those groups. With a single regular expression, you lose the reuse. Commented Mar 12 at 22:57
  • Key/value pairs: for each line, split on "="; token[0] is the key, and token[1], if any, is the value. Commented Mar 13 at 4:50
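The line-by-line split suggested in the last two comments can be sketched in Python as follows. The helper name parse_lines is hypothetical, and stripping surrounding whitespace from the value (rather than removing exactly one space) is an assumption:

```python
# Sample input from the question, assuming \r\n line endings.
TEXT = (
    "Feld1 = 1134 2000 0101 0202 0303\r\n"
    "Name1 = Ein Kleiner Namens Test\r\n"
    "Daten1 = 2200220\r\n"
    "VWZ =\r\n"
    "Name2. =\r\n"
    "Daten2 = 1100110"
)


def parse_lines(text):
    """Hypothetical helper: split into lines, then split each line on its first '='."""
    pairs = []
    for line in text.splitlines():  # splitlines() handles \r\n as well as \n
        if "=" not in line:
            continue  # not a key/value line
        key, value = line.split("=", 1)  # token[0] is the key, token[1] the value
        pairs.append((key.strip(), value.strip()))
    return pairs


pairs = parse_lines(TEXT)
```

Because each line is handled on its own, an empty value such as VWZ comes back as "" and can never run into the next line.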

1 Answer


The problem is that \s matches a newline as well, so your [\s]+ after = also consumes a newline, leaving the next line as the value.

You can instead use =[^\n\S]* to consume non-newline space characters after =:

^\s*(?<Key>\S+)\s*=[^\n\S]*(?<Value>.*)

Demo: https://regex101.com/r/5l0yNA/1
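A sketch of the answer's pattern translated to Python's re, assuming \r\n line endings. Two adjustments are assumptions on top of the answer: named groups are spelled (?P<...>) as Python requires, and the value uses [^\r\n]* instead of .*, because in Python "." also matches the \r of a \r\n line ending and would leave it at the end of non-empty values. The helper name parse_pairs is hypothetical:

```python
import re

# Sample input from the question, assuming \r\n line endings.
TEXT = (
    "Feld1 = 1134 2000 0101 0202 0303\r\n"
    "Name1 = Ein Kleiner Namens Test\r\n"
    "Daten1 = 2200220\r\n"
    "VWZ =\r\n"
    "Name2. =\r\n"
    "Daten2 = 1100110"
)

# The answer's pattern in Python syntax:
# - re.MULTILINE lets ^ match at the start of every line, not just the string;
# - [^\n\S]* means "whitespace that is not \n", so it stops before the line break;
# - [^\r\n]* (instead of .*) keeps a trailing \r out of the captured value.
PAIR = re.compile(
    r"^\s*(?P<Key>\S+)\s*=[^\n\S]*(?P<Value>[^\r\n]*)",
    re.MULTILINE,
)


def parse_pairs(text):
    """Hypothetical helper: return (key, value) tuples; empty values come back as ""."""
    return [(m["Key"], m["Value"]) for m in PAIR.finditer(text)]


pairs = parse_pairs(TEXT)
```

With \n-only input, [^\r\n]* and .* behave the same; the difference only shows up with \r\n line endings.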
