6

I want to replace dashes which appear between letters with a space using regex. For example to replace ab-cd with ab cd

The following matches the character-character sequence, however also replaces the characters [i.e. ab-cd results in a d, rather than ab cd as i desire]

 new_term = re.sub(r"[A-z]\-[A-z]", " ", original_term)

How i adapt the above to only replace the - part?

2
  • Can you do this by simple replacing - with a space in the given string? Is using regex necessary? Commented Oct 16, 2015 at 22:11
  • 1
    @JeffBridgman yes - i only want to replace when the dash occurs between characters, and not when between space. i.e. to replace ab-cd, but not to change ab - cd - [replace doesn't have that control]. Commented Oct 16, 2015 at 22:13

5 Answers 5

7

You need to capture the characters before and after the - to a group and use them for replacement, i.e.:

import re
subject = "ab-cd"
subject = re.sub(r"([a-z])\-([a-z])", r"\1 \2", subject , 0, re.IGNORECASE)
print subject
#ab cd

DEMO

http://ideone.com/LAYQWT


REGEX EXPLANATION

([A-z])\-([A-z])

Match the regex below and capture its match into backreference number 1 «([A-z])»
   Match a single character in the range between “A” and “z” «[A-z]»
Match the character “-” literally «\-»
Match the regex below and capture its match into backreference number 2 «([A-z])»
   Match a single character in the range between “A” and “z” «[A-z]»

\1 \2

Insert the text that was last matched by capturing group number 1 «\1»
Insert the character “ ” literally « »
Insert the text that was last matched by capturing group number 2 «\2»
Sign up to request clarification or add additional context in comments.

Comments

5

Use references to capturing groups:

>>> original_term = 'ab-cd'
>>> re.sub(r"([A-z])\-([A-z])", r"\1 \2", original_term)
'ab cd'

This assumes, of course, that you can't just do original_term.replace('-', ' ') for whatever reason. Perhaps your text uses hyphens where it should use en dashes or something.

1 Comment

You shouldn't use [A-z] since regex ranges uses ascii table index. For this specific range you will match A-Z[\]^_`a-z. However, you can use (?i) if you want to use a-z as key insensitive. For instance, you can have (?i)([a-z])\-([a-z]). Anyway, I know OP original regex is that... but just saying.
1

re.sub() always replaces the whole matched sequence with the replacement.

A solution to only replace the dash are lookahead and lookbehind assertions. They don't count to the matched sequence.

new_term = re.sub(r"(?<=[A-z])\-(?=[A-z])", " ", original_term)

The syntax is explained in the Python documentation for the re module.

Comments

1

You need to use look-arounds:

 new_term = re.sub(r"(?<=[A-Za-z])-(?=[A-Za-z])", " ", original_term)

Or capturing groups:

 new_term = re.sub(r"([A-Za-z])-(?=[A-Za-z])", r"\1 ", original_term)

See IDEONE demo

Note that [A-z] also matches some non-letters (namely [, \, ], ^, _, and `), thus, I suggest replacing it with [A-Z] and use a case-insensitive modifier (?i).

Note that you do not have to escape a hyphen outside a character class.

Comments

-1

I think there's a simple way to replace dashes using Visual Basic, in a multiline textbox:

Regex.Replace(ReadText.Text, "[-]", " ")

1 Comment

This may be accurate, but it's not Python, which is what this question is about.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.