3

I am trying to replace a certain part of a match that a regex found. The relevant strings have the following format:

"<Random text>[Text1;Text2;....;TextN]<Random text>"

So basically there can be N texts separated by ";" inside the brackets. My goal is to change each ";" into a "," (but only in the strings that have this format) so that I can keep ";" as the separator for a CSV file. The result should be:

"<Random text>[Text1,Text2,...,TextN]<Random text>"

I can match the relevant strings with something like

re.compile(r'\[".*?((;).*?){1,4}"\]')

but if I try to use the sub method it replaces the whole string.

I have searched Stack Overflow and I am pretty sure that "capture groups" might be the solution, but I am not really getting there. Can anyone help me?

I ONLY want to change the ";" in the ["Text1;...;TextN"]-parts of my text file.

8
  • Is ["Text1;Text2;....;TextN"] a string or a list? Commented Jan 21, 2020 at 9:14
  • It is a string! Commented Jan 21, 2020 at 9:15
  • Why is str.replace(";", ",") not valid for this? Commented Jan 21, 2020 at 9:17
  • What are the constraints on N? Commented Jan 21, 2020 at 9:21
  • Text1, Text2, ... TextN are placeholders. So the string can e.g. look like ["Hello;Bye;This123"]. I know where it starts due to [" and where it ends due to "]. Commented Jan 21, 2020 at 9:32

3 Answers

6

Try this regex:

;(?=(?:(?!\[).)*])

Replace each match with a ,

Explanation:

  • ; - matches a ;
  • (?=(?:(?!\[).)*]) - makes sure that the above ; is followed by a closing ] somewhere later in the string but before any opening bracket [
    • (?=....) - positive lookahead
    • (?:(?!\[).)* - 0+ occurrences of any character that is not a [
    • ] - matches a ]
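
As a rough illustration of how this pattern could be applied from Python, here is a minimal sketch using the standard re module. The sample string and variable names are made up for the example; only the pattern itself comes from the answer.

```python
import re

# Pattern from the answer: a ';' that is followed by a ']' later on the
# same line, with no '[' between the ';' and that ']'.
pattern = re.compile(r';(?=(?:(?!\[).)*])')

# Hypothetical sample input, not taken from the asker's real file.
sample = 'a;b["Text1;Text2;Text3"]c;d'

# Only the semicolons inside the bracketed part satisfy the lookahead.
result = pattern.sub(',', sample)
print(result)
```

The semicolons outside the brackets stay untouched here because the first one reaches a [ before any ], and the last one has no ] after it at all.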

2

If you want to match a ; that is followed by a closing ] with no [ in between, you could use:

;(?=[^[]*])
  • ; Match literally
  • (?= Positive lookahead, assert what is on the right is
    • [^[]* Negated character class, match 0+ times any char except [
  • ] Match literally
  • ) Close lookahead

Note that this will also match if there is no leading [.
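
A short sketch of this pattern with the standard re module, including the case without a leading [, might look like the following. The input strings and variable names are invented for illustration.

```python
import re

# Pattern from the answer: a ';' followed by any run of non-'[' characters
# and then a ']'.
pattern = re.compile(r';(?=[^[]*])')

# Semicolons inside the bracketed part are replaced, the others are kept.
inside = pattern.sub(',', 'x;y["A;B;C"]z;w')

# There is no opening '[' here, yet the ';' is still replaced because a ']'
# follows it. This is the caveat mentioned above.
stray = pattern.sub(',', 'no;opening]bracket')

print(inside)
print(stray)
```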


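If you also want to make sure that there is a leading [, you could use the third-party regex module from PyPI, which supports \G (continue at the end of the previous match) and \K (discard what has been matched so far), to match a single ;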

(?:\[(?=[^[\]]*])|\G(?!^))[^;[\]]*\K;

import regex

pattern = r"(?:\[(?=[^[\]]*])|\G(?!^))[^;[\]]*\K;"
test_str = ("[\"Text1;Text2;....;TextN\"];asjkdjksd;ajksdjksad[\"Text1;Text2;....;TextN\"]\n\n"
    ".[\"Text1;Text2\"]...long text...[\"Text1;Text2;Text3\"]....long text...[\"Text1;...;TextN\"]...long text...\n\n"
    "I ONLY want to change the \";\" in the [\"Text1;...;TextN\"]")

result = regex.sub(pattern, ",", test_str)
print(result)

Output

["Text1,Text2,....,TextN"];asjkdjksd;ajksdjksad["Text1,Text2,....,TextN"]

.["Text1,Text2"]...long text...["Text1,Text2,Text3"]....long text...["Text1,...,TextN"]...long text...

I ONLY want to change the ";" in the ["Text1,...,TextN"]
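
If installing the third-party regex module is not an option, one possible alternative is to match each whole ["..."] block with the standard re module and pass a function to re.sub that rewrites only the matched text. This is a sketch under the assumption that the block content contains no quotes or brackets; the helper name fix_block and the sample text are hypothetical.

```python
import re

# Match one complete ["..."] block. Assumption: the text between the quotes
# contains no '"', '[' or ']' characters.
block = re.compile(r'\["[^"\[\]]*"\]')

def fix_block(match):
    # Replace semicolons only inside the matched block.
    return match.group(0).replace(';', ',')

# Hypothetical sample input.
text = 'a;b["Text1;Text2;Text3"];c["X;Y"]d'

result = block.sub(fix_block, text)
print(result)
```

Because the replacement is computed per match, semicolons outside the blocks are never touched, and a block is only rewritten when both the leading [" and the trailing "] are present.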


1

You can try this code sample:

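x = 'anbhb["Text1;Text2;...;TextN"]nbgbyhuyg["Text1;Text2;...;TextN"][]nhj,kji,'
chars = list(x)
inside = False  # True while we are between [" and "]
for i, ch in enumerate(chars):
    if x.startswith('["', i):
        inside = True
    elif x.startswith('"]', i):
        inside = False
    elif inside and ch == ';':
        # Only semicolons inside a ["..."] block are replaced
        chars[i] = ','
x = ''.join(chars)

print(x)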

1 Comment

If I use str.replace(";",",") it will change every ";" in my text-file. I only want to replace the ";" in the parts which have the following format: ["Text1;Text2;Text3;..;TextN"]
