Replace table names in sql with python and regex

Question

I need to update a number sql queries to point to a new set of tables. I am having a hard time finding the word ends, i.e. the end of the table names, with python's re. The white space character and the end of the table names can be a newline or a space. The script is not finding matches for the regex.sub() method. I have tried using "\s" instead of "\b". This does find the table names well but then when I replace the string I am losing the original white space character. What am I doing wrong?

import os
import re

scan_dir = r"xxxx"
out_dir = r"xxxx"

process_tables = {"MM_WMS_WORK_LOCATION": "DESIGNER_MM_WMS_WORK_LOCATION",
                  "MM_WMS_DESIGN": "DESIGNER_MM_WMS_DESIGN",
                  "MM_WMS_WORK_REQUEST": "DESIGNER_MM_WMS_WORK_REQUEST",
                  "MM_WMS_COMPATIBLE_UNIT": "DESIGNER_MM_WMS_COMP_UNIT",
                  "MM_WMS_DN_COST_FACTOR": "DESIGNER_MM_WMS_DN_COST_FACTOR",
                  "MM_WMS_CU_COST_FACTOR": "DESIGNER_MM_WMS_CU_COST_FACTOR",
                  "MM_WMS_WR_USERS": "DESIGNER_MM_WMS_WR_USERS",
                  "MM_PX_CURRENT_STATE": "DESIGNER_MM_PX_CURRENT_STATE"}
sql_list = os.listdir(scan_dir)

for sql in sql_list:
    if sql[-4:] == ".sql":
        with open(os.path.join(scan_dir, sql), "r") as sql_in:
            sql_out = sql_in.read()
            for old_name, new_name in process_tables.items():
                regex = re.compile(old_name + "\b", re.IGNORECASE)
                sql_out = regex.sub(new_name.lower(), sql_out)
        with open(os.path.join(out_dir, sql[:-4] + " Designer Test.sql"), "w") as file_out:
            file_out.write(sql_out)

It might help to describes what's going wrong with your script. Are you getting some error? What's the behavior of your script? What's the expected behavior? — enumaris
– enumaris, Commented May 14, 2018 at 19:47

Tomalak · Accepted Answer · 2018-05-14 20:39:15Z

1

Python string literals use the backslash as an escape character. The string literal "\b" will contain the backspace character (ASCII 0x8), just like "\n" will contain a newline. It will not contain "backslash b", as you would need for the regex to work.

To get "backslash b" you need to escape the backslash in the string literal, or use a raw string, which is "\\b" or r"\b", respectively.

regex = re.compile(old_name + r"\b", re.IGNORECASE)

Commonly, raw strings are used for regex in Python, so you don't need to double every backslash.

Read https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals

Side note: "\s" works because - in contrast to "\b" or "\n" - it means nothing special in a Python string literal, so it actually becomes "backslash s" for the regex.

edited May 14, 2018 at 20:39

answered May 14, 2018 at 20:26

Tomalak

340k68 gold badges547 silver badges635 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Replace table names in sql with python and regex

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related