How can I exclude a character from a regex capturing group? [duplicate]

Question

I have a regex capture, and I would like to exclude a character (a space, in this particular case) from the middle of the captured string. Can this be done in one step, by modifying the regex?

(Quick and dirty) example:

Text: Key name = value
My regex: (.*) = (.*)
Output: \1 = "Key name" and \2 = "value"
Desired output: \1 = "Keyname" and \2 = "value"

Update: I'm not sure what regex engine will run this regex, since it's part of a larger software product. If you have a solution, please specify which engines it will run on, and on which it will not.

Update2: The aforementioned product takes a regex as an input, and then uses the matched values further, which is the reason for which a one-step solution is asked for. There is no opportunity to insert an intermediate processing step in the pipeline.

What is the language? It is difficult to render appropriate help without knowing the programming language the regex will be used in. As the regex tag info states, all questions with this tag should also include a tag specifying the applicable programming language or tool.
– Wiktor Stribiżew, Commented Dec 15, 2015 at 10:05
@stribizhev, the full solution may depend on the language, but the answer to the question doesn't. You can't do that in a single regex match in any regex flavor. You have to match the whole thing and remove the spaces afterward.
– Alan Moore, Commented Dec 15, 2015 at 10:42
@stribizhev: On reflection, I realize that was a canned comment that you posted simply because there's no "flavor" tag. It's good general advice, but you should make it clear that it is general advice, because you seem to be implying that it's relevant in this case, when it isn't.
– Alan Moore, Commented Dec 15, 2015 at 11:37

Giuseppe Ricupero · Accepted Answer · 2015-12-15 12:06:11Z

0

This is a possible theoretical pure-regex implementation using the end-of-previous-match \G anchor:

/(?:\G(\w+)\h*(?:(?:=\h*)(\w+))?)+/g

Online demo

Legend

(?:           # Non-capturing group 1
  \G          # Matches at the position where the previous match ended
  (\w+)       # Capture group 1: a regex word of 1+ chars
  \h*         # Zero or more horizontal whitespace chars (spaces, tabs)
  (?:         # Non-capturing group 2
    =\h*      # Literal '=' followed by zero or more horizontal whitespace chars
    (\w+)     # Capture group 2: a regex word of 1+ chars
  )?          # Make non-capturing group 2 optional
)+            # Repeat non-capturing group 1 one or more times

In the substitution section of the demo:

\1 actually contains Keyname (the 2 terms are separated by a fake space)
\2 is value
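
To make it easier to see what the pattern does match by match, here is a minimal Python sketch of the same idea. It is an emulation under stated assumptions, not the demo itself: Python's standard `re` module has no `\G` or `\h`, so `\h` is written as `[ \t]` and `\G` is emulated by anchoring every attempt at the end of the previous match. The names `STEP` and `chained_matches` are made up for this illustration.

```python
import re

# Python's `re` has no \G or \h, so both are emulated here (an assumption
# of this sketch, not part of the original pattern):
#   \h  -> [ \t]
#   \G  -> anchor each attempt at the end of the previous match
STEP = re.compile(r"(\w+)[ \t]*(?:=[ \t]*(\w+))?")


def chained_matches(text):
    """Return (group 1, group 2) for each consecutive match from offset 0."""
    results = []
    pos = 0
    while pos < len(text):
        m = STEP.match(text, pos)  # match() anchors at `pos`, like \G
        if m is None:
            break
        results.append((m.group(1), m.group(2)))
        pos = m.end()
    return results


matches = chained_matches("Key name = value")
print(matches)                              # [('Key', None), ('name', 'value')]
print("".join(key for key, _ in matches))   # Keyname
```

In this emulation "Keyname" only appears once the group 1 values of two separate matches are concatenated; no single match holds it in group 1.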

NOTE: I don't recommend using this unless it is actually needed (why is a one-step solution required?).

There are several possible two-step approaches. The simplest, as has surely already been stated, is to strip the spaces from the first capturing group of the OP's regex.
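
For reference, a minimal sketch of that two-step approach in Python, assuming the host language lets you post-process the captures (which the question says is not possible in the OP's product). The function name `parse_pair` is hypothetical.

```python
import re


def parse_pair(line):
    """Step 1: match with the OP's regex. Step 2: strip spaces from group 1."""
    m = re.fullmatch(r"(.*) = (.*)", line)
    if m is None:
        return None
    key = m.group(1).replace(" ", "")  # remove the spaces inside the key
    return key, m.group(2)


print(parse_pair("Key name = value"))  # ('Keyname', 'value')
```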

edited Dec 15, 2015 at 12:06

answered Dec 15, 2015 at 11:14

Giuseppe Ricupero


7 Comments

Dan Nestor Over a year ago

The question states clearly that a one-step solution is needed. Is there any particular reason behind you saying that you don't recommend your solution?

Dan Nestor Over a year ago

Your solution doesn't produce the expected result, and I don't understand it enough to modify it myself. Could you edit it (if it's possible) to satisfy the requirements in the question?

Alan Moore Over a year ago

\G works fine by itself, there's no need to wrap it in a lookbehind.

Dan Nestor Over a year ago

@GsusRecovery I'm not sure why the motives have to be explained for a question to be better received, but since you ask, it's because I do not have the opportunity to run a second step. As stated in the question, this regex will be used in a software product, and the product takes a regex as an input, and then uses the matched values further.

Dan Nestor Over a year ago

@GsusRecovery yeah, I hoped that I would get this result by stating that I need a one-step solution. :)


Jan · Accepted Answer · 2015-12-15 10:46:48Z

-1

I would come up with something like:

(?<key>[\w]+)\s*=\s*(?<value>.+)
# one or more word characters, captured in a group named "key"
# followed by zero or more whitespace characters (\s)
# followed by an equals sign
# followed by zero or more whitespace characters (\s)
# the rest of the line, captured in a group named "value"

... and process the captured output afterwards. But with the \w character class, no whitespace will be matched (do you have keys with whitespace in them?).
See a working demo here. But as mentioned in the comments, it depends on your programming language.
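
As an illustration of the whitespace caveat, here is a small sketch using Python's standard `re` module. Note that Python spells named groups `(?P<name>...)` rather than `(?<name>...)`; other engines differ. The constant name `PAIR` is made up for this example.

```python
import re

# Same pattern as above, rewritten with Python's named-group syntax.
PAIR = re.compile(r"(?P<key>\w+)\s*=\s*(?P<value>.+)")

m = PAIR.search("Key name = value")
print(m.group("key"))    # name
print(m.group("value"))  # value
```

Because \w cannot match the space, the unanchored search captures only the last word before the equals sign, so a key such as "Key name" is truncated to "name".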

edited Dec 15, 2015 at 10:46

answered Dec 15, 2015 at 10:34

Jan
