I have a regex capture, and I would like to exclude a character (a space, in this particular case) from the middle of the captured string. Can this be done in one step, by modifying the regex?

(Quick and dirty) example:

Text: Key name = value
My regex: (.*) = (.*)
Output: \1 = "Key name" and \2 = "value"
Desired output: \1 = "Keyname" and \2 = "value"

Update: I'm not sure what regex engine will run this regex, since it's part of a larger software product. If you have a solution, please specify which engines it will run on, and on which it will not.

Update2: The aforementioned product takes a regex as an input and then uses the matched values further, which is why a one-step solution is needed. There is no opportunity to insert an intermediate processing step in the pipeline.

  • Can't you replace that after the match? Commented Dec 15, 2015 at 10:05
  • What is the language? It is difficult to render appropriate help without knowing the programming language the regex will be used in. As the regex tag info states, all questions with this tag should also include a tag specifying the applicable programming language or tool. Commented Dec 15, 2015 at 10:05
  • @stribizhev, the full solution may depend on the language, but the answer to the question doesn't. You can't do that in a single regex match in any regex flavor. You have to match the whole thing and remove the spaces afterward. Commented Dec 15, 2015 at 10:42
  • @AlanMoore: Why do you address me? I know that. Commented Dec 15, 2015 at 10:43
  • @stribizhev: On reflection, I realize that was a canned comment that you posted simply because there's no "flavor" tag. It's good general advice, but you should make it clear that it is general advice. Because you seem to be implying that it's relevant in this case, when it isn't. Commented Dec 15, 2015 at 11:37

2 Answers

This is a possible theoretical pure-regex implementation using the end-of-previous-match \G anchor:

/(?:\G(\w+)\h*(?:=\h*(\w+))?)+/g

Online demo

Legend

(?:           # Non capturing group 1
  \G          # matches at the position where the previous match ended
  (\w+)       # capture group 1: a regex word of 1+ chars
  \h*         # zero or more horizontal spaces (space, tabs)
  (?:         # Non capturing group 2
    =\h*      # literal '=' followed by zero or more hspaces
    (\w+)     # capture group 2: a regex word of 1+ chars
  )?          # make the non capturing group 2 optional
)+            # repeat the non capturing group 1, one or more

In the substitution section of the demo:

  • \1 actually contains Keyname (the 2 terms are separated by a fake space)
  • \2 is value
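
A minimal sketch of why the key only comes out joined after a substitution step. It assumes Python, whose built-in re module supports neither \G nor \h, so the anchoring is emulated with pattern.match(text, pos) and \h* is approximated by [ \t]*. The names STEP and chained_matches are hypothetical and used only for illustration.

```python
import re

# One "step" of the pattern, without \G; [ \t]* stands in for \h*
# because Python's re module supports neither \G nor \h.
STEP = re.compile(r"(\w+)[ \t]*(?:=[ \t]*(\w+))?")

def chained_matches(text):
    """Collect successive matches, each anchored where the previous one ended."""
    pos, found = 0, []
    while pos < len(text):
        m = STEP.match(text, pos)  # match() anchors at pos, similar to \G
        if m is None:
            break
        found.append(m)
        pos = m.end()
    return found

matches = chained_matches("Key name = value")
# Group 1 holds one word per match; the words must be joined afterwards.
key = "".join(m.group(1) for m in matches)
value = next((m.group(2) for m in matches if m.group(2) is not None), None)
print(key, value)
```

In this emulation the input yields two separate matches, and group 1 of each match holds a single word. The joined key exists only once the pieces are concatenated, which is what the substitution in the demo does.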

NOTE: I don't recommend using this unless actually needed (why?).

There are several possible two-step approaches. The simplest, as already suggested in the comments, is to match with the OP's regex and then strip the spaces from the first capturing group.
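
As a rough illustration of the two-step approach, here is a sketch in Python (an assumption, since the question does not name a language or engine). It matches with the regex from the question and then removes the spaces from the first group.

```python
import re

text = "Key name = value"

# Step 1: match with the regex from the question.
m = re.fullmatch(r"(.*) = (.*)", text)

# Step 2: strip the spaces from the first capturing group.
key = m.group(1).replace(" ", "")
value = m.group(2)
print(key, value)
```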

7 Comments

The question states clearly that a one-step solution is needed. Is there any particular reason behind you saying that you don't recommend your solution?
Your solution doesn't produce the expected result, and I don't understand it enough to modify it myself. Could you edit it (if it's possible) to satisfy the requirements in the question?
\G works fine by itself, there's no need to wrap it in a lookbehind.
@GsusRecovery I'm not sure why the motives have to be explained for a question to be better received, but since you ask, it's because I do not have the opportunity to run a second step. As stated in the question, this regex will be used in a software product, and the product takes a regex as an input, and then uses the matched values further.
@GsusRecovery yeah, I hoped that I would get this result by stating that I need a one-step solution. :)

I would come up with something like:

(?<key>[\w]+)\s*=\s*(?<value>.+)
# match one or more word characters and capture them in a group called "key"
# followed by zero or more whitespace characters (\s*)
# followed by an equals sign
# followed by zero or more whitespace characters (\s*)
# capture the rest of the line in a group called "value"

... and process the captured output afterwards. Note that with the \w character class no whitespace will be matched (do you have keys with whitespace in them?).
See a working demo here. But as mentioned in the comments, it depends on your programming language.
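
A small sketch of the named-group pattern, assuming Python, where named groups are written (?P<name>...) rather than (?<name>...). The second pattern, which allows spaces inside the key, is a hypothetical variant and not part of the answer above. It still needs the post-processing step.

```python
import re

text = "Key name = value"

# Python's re module spells named groups (?P<name>...) rather than (?<name>...).
strict = re.search(r"(?P<key>\w+)\s*=\s*(?P<value>.+)", text)
print(strict.group("key"))  # \w+ cannot span the space inside the key

# Hypothetical variant: allow spaces inside the key, then strip them afterwards.
loose = re.search(r"(?P<key>\w[\w ]*?)\s*=\s*(?P<value>.+)", text)
key = loose.group("key").replace(" ", "")
value = loose.group("value")
print(key, value)
```

With the \w+ version only the last word before the equals sign ends up in the key group, so a key containing a space has to be matched more loosely and cleaned up afterwards.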
