0

I need to update a number sql queries to point to a new set of tables. I am having a hard time finding the word ends, i.e. the end of the table names, with python's re. The white space character and the end of the table names can be a newline or a space. The script is not finding matches for the regex.sub() method. I have tried using "\s" instead of "\b". This does find the table names well but then when I replace the string I am losing the original white space character. What am I doing wrong?

import os
import re

scan_dir = r"xxxx"
out_dir = r"xxxx"

process_tables = {"MM_WMS_WORK_LOCATION": "DESIGNER_MM_WMS_WORK_LOCATION",
                  "MM_WMS_DESIGN": "DESIGNER_MM_WMS_DESIGN",
                  "MM_WMS_WORK_REQUEST": "DESIGNER_MM_WMS_WORK_REQUEST",
                  "MM_WMS_COMPATIBLE_UNIT": "DESIGNER_MM_WMS_COMP_UNIT",
                  "MM_WMS_DN_COST_FACTOR": "DESIGNER_MM_WMS_DN_COST_FACTOR",
                  "MM_WMS_CU_COST_FACTOR": "DESIGNER_MM_WMS_CU_COST_FACTOR",
                  "MM_WMS_WR_USERS": "DESIGNER_MM_WMS_WR_USERS",
                  "MM_PX_CURRENT_STATE": "DESIGNER_MM_PX_CURRENT_STATE"}
sql_list = os.listdir(scan_dir)

for sql in sql_list:
    if sql[-4:] == ".sql":
        with open(os.path.join(scan_dir, sql), "r") as sql_in:
            sql_out = sql_in.read()
            for old_name, new_name in process_tables.items():
                regex = re.compile(old_name + "\b", re.IGNORECASE)
                sql_out = regex.sub(new_name.lower(), sql_out)
        with open(os.path.join(out_dir, sql[:-4] + " Designer Test.sql"), "w") as file_out:
            file_out.write(sql_out)
2
  • It might help to describes what's going wrong with your script. Are you getting some error? What's the behavior of your script? What's the expected behavior? Commented May 14, 2018 at 19:47
  • Thanks for the input. I have edited the post. Commented May 14, 2018 at 20:00

1 Answer 1

1

Python string literals use the backslash as an escape character. The string literal "\b" will contain the backspace character (ASCII 0x8), just like "\n" will contain a newline. It will not contain "backslash b", as you would need for the regex to work.

To get "backslash b" you need to escape the backslash in the string literal, or use a raw string, which is "\\b" or r"\b", respectively.

regex = re.compile(old_name + r"\b", re.IGNORECASE)

Commonly, raw strings are used for regex in Python, so you don't need to double every backslash.

Read https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals


Side note: "\s" works because - in contrast to "\b" or "\n" - it means nothing special in a Python string literal, so it actually becomes "backslash s" for the regex.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.