0

I am new to regex. I have read various tutorials, but I still cannot get my simple code to run.

My files are named like "c1c2c4_aa_1", "c1c2c3_aa_2", "c1c2c8_aa_3", "c1c3c4_aa_4", ... "c1c2c4_bb_41", "c1c8c9_cc_58", "c1c3c11_aa_19"

I want to find all the ones that include "aa" (such as "c1c2c3_aa_3") and rename them to use "zz" instead (such as "c1c2c3_zz_3").

So the first string before "_" and the last number should stay fixed, and only the "aa" in the middle should change.

"c1", "c2", "c3" are some conditions. The last numbers are quite random, so I do not know them in advance and cannot list them.

I am interested in using regex.

I tried this:

con_list1 = ["c1", "c2", ... "c8"]
con_list2 = ["c1", "c2", ... "c11"]
con_list3 = ["c1", "c2", ... "c10"]

for con1 in con_list1:
    for con2 in con_list2:
        for con3 in con_list3:
            if(os.path.exists("./" + con1 + con2 + con3 + "_aa(.*)")):
                os.rename("./" + con1 + con2 + con3 + "_aa(.*)", "./" + con1 + con2 + con3 + "_zz(.*)")

I want the trailing number of each renamed file to stay the same:

"c1c2c3_aa_3" -> "c1c2c3_zz_3"
"c1c2c3_aa_13" -> "c1c2c3_zz_13"

I would also like to learn how to use regex and (.*) correctly.

However, the code above does not work.

I would appreciate help implementing this.

2
  • Unrelated - but if you want a fun way to train your regex-fu: try regexcrossword.com Commented Nov 2, 2022 at 8:25
  • Did any solution finally help? Commented Nov 5, 2022 at 20:52

4 Answers

1

If you have a list like con_list1 = ["c1c2c4_aa_1", "c1c2c3_aa_2", "c1c2c8_aa_3", "c1c3c4_aa_4"] you may try something like:

import re


con_list1 = ["c1c2c4_aa_1", "c1c2c3_aa_2", "c1c2c8_aa_3", "c1c3c4_aa_4"]

regex = "_aa_"
subst = "_zz_"

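for test_str in con_list1:
    # Replace only the first "_aa_" and show the result.
    result = re.sub(regex, subst, test_str, count=1)
    print(result)

but the simplest way is:

con_list1 = ["c1c2c4_aa_1", "c1c2c3_aa_2", "c1c2c8_aa_3", "c1c3c4_aa_4"]
for test_str in con_list1:
    # str.replace returns a new string, so the result has to be used.
    print(test_str.replace('_aa_', '_zz_'))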
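To apply the same replacement to files on disk rather than to a list of strings, a minimal sketch could look like the following. The helper name rename_aa_to_zz and its directory argument are hypothetical, and it assumes that "_aa_" appears only once in each name.

```python
import os


def rename_aa_to_zz(directory="."):
    """Rename every entry whose name contains '_aa_' so that it contains '_zz_' instead."""
    renamed = []
    for name in sorted(os.listdir(directory)):
        if "_aa_" not in name:
            continue  # leave names such as "c1c2c4_bb_41" untouched
        new_name = name.replace("_aa_", "_zz_", 1)
        # os.listdir returns bare names, so join them with the directory.
        os.rename(os.path.join(directory, name), os.path.join(directory, new_name))
        renamed.append((name, new_name))
    return renamed
```

The function returns the list of (old, new) name pairs, which makes it easy to print what was changed before trusting it on real data.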

0

Try this pattern to find all names: [a-z0-9]+_aa_[0-9]+

names = [name for name in files_names_list if re.fullmatch(r'[a-z0-9]+_aa_[0-9]+', name, flags=re.I)]

files_names_list is a list containing all your file names.

I hope I understood you correctly.


0

Assuming the files to rename exist in the current directory, would you please try the following:

import os, re
for f in os.listdir('.'):
    m = re.match(r'((?:c\d{1,2}){3})_aa_(\d{1,2})$', f)
    if m:
        newname = m.group(1) + '_zz_' + m.group(2)
        os.rename(f, newname)
  • ((?:c\d{1,2}){3}) matches three repetitions of the set of c + one or two digits.
  • (\d{1,2}) matches one or two digits.
  • As the regexes above are enclosed by parentheses, the matched substrings are captured by m.group(1) and m.group(2) individually.
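The same capture-group idea also works with re.sub, which is closer to the (.*) attempt in the question: whatever the parentheses match is reused in the replacement through \1 and \2. The sketch below only computes the new name and does not touch any files; the names PATTERN and new_name_for are hypothetical, and the looser (.*) prefix is an assumption that accepts any text before "_aa_".

```python
import re

# Group 1: everything before "_aa_"; group 2: the trailing number.
PATTERN = re.compile(r'^(.*)_aa_(\d+)$')


def new_name_for(name):
    """Return the name with "_aa_" swapped for "_zz_", or None if it does not match."""
    if PATTERN.match(name) is None:
        return None
    # \1 and \2 insert the captured prefix and number unchanged.
    return PATTERN.sub(r'\1_zz_\2', name)


print(new_name_for("c1c2c3_aa_13"))
```

A regex pattern has no effect inside os.path.exists or os.rename, which treat their arguments as literal paths, so the matching has to happen on the names returned by os.listdir first.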


0

You can use

import os, re

con_list1 = ["c1", "c2", "c3","c4","c5","c6","c7","c8"]
con_list2 = ["c1", "c2", "c3","c4","c5","c6","c7","c8", "c9","c10", "c11"]
con_list3 = ["c1", "c2", "c3","c4","c5","c6","c7","c8", "c9","c10"]
regex = re.compile(f'^((?:{"|".join(map(re.escape, con_list1))})(?:{"|".join(map(re.escape, con_list2))})(?:{"|".join(map(re.escape, con_list3))}))_aa_')

rootdir = "YOUR_ROOT_DIR"
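for root, dirs, files in os.walk(rootdir):
    for file in files:
        if regex.search(file):
            # os.walk yields bare file names, so join them with the current directory.
            os.rename(os.path.join(root, file), os.path.join(root, regex.sub(r'\g<1>_zz_', file)))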

Note: os.walk() searches all subdirectories recursively. If you do not need that behavior, see Non-recursive os.walk().

This is not the most efficient way to create a dynamic pattern (a regex TRIE would be better), but it shows a viable approach. The regex will look like

^((?:c1|c2|c3|c4|c5|c6|c7|c8)(?:c1|c2|c3|c4|c5|c6|c7|c8|c9|c10|c11)(?:c1|c2|c3|c4|c5|c6|c7|c8|c9|c10))_aa_

See the regex demo. Note that each item in your condition lists is re.escaped to make sure special chars do not prevent your file names from matching.

Details:

  • ^ - start of string
  • ((?:c1|c2|c3|c4|c5|c6|c7|c8)(?:c1|c2|c3|c4|c5|c6|c7|c8|c9|c10|c11)(?:c1|c2|c3|c4|c5|c6|c7|c8|c9|c10)) - Group 1: a value from con_list1, then a value from con_list2 and then a value from con_list3. \g<1> refers to this group value; if _zz_ is not a placeholder for text starting with a digit, you can even use \1 instead.
  • _aa_ - an _aa_ fixed string.
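To see which names the generated pattern accepts before renaming anything, it can be tried on plain strings first. This is only a sketch: the helper alternation and the samples list are hypothetical, and the condition lists are built with range() to mirror the ones above.

```python
import re

con_list1 = [f"c{i}" for i in range(1, 9)]    # c1 .. c8
con_list2 = [f"c{i}" for i in range(1, 12)]   # c1 .. c11
con_list3 = [f"c{i}" for i in range(1, 11)]   # c1 .. c10


def alternation(items):
    """Join the escaped items into one non-capturing alternation group."""
    return "(?:" + "|".join(map(re.escape, items)) + ")"


regex = re.compile(
    "^(" + alternation(con_list1) + alternation(con_list2) + alternation(con_list3) + ")_aa_"
)

samples = ["c1c2c4_aa_1", "c1c3c11_aa_19", "c1c2c4_bb_41", "c8c11c10_aa_7"]
# Names that do not match are returned unchanged by sub().
converted = [regex.sub(r"\g<1>_zz_", s) for s in samples]
print(converted)
```

With the lists as given, "c1c3c11_aa_19" is left unchanged because c11 is not in con_list3, so the lists need to cover every condition that can appear in each position.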

