Python - Parse SQL Query to return only fields names using regular expression

Question

I'm trying to parse an SQL statement (stored in a file) and only save the name of the fields so for the following:

SELECT
Involved_Party_ID=pt.ProgramId,
Involved_Party_Type=2, 
First_Name=REPLACE(REPLACE(REPLACE(pt.ProgramName, '|', '/'), CHAR(10), ' '), CHAR(13), ' '),
Registration_Status=pt.RegistrationStatus,
Involved_Party_Status=CONVERT(SMALLINT, pt.ProgramStatus), 
Creation_Date=pt.CreationDate,
Incorporation_Country=CONVERT(VARCHAR(50), CASE WHEN pd.IncorporationCountry='UK' THEN 'GB' ELSE pd.IncorporationCountry END),
FROM
    SomeTable AS pt
GO

The desired output would be:

Involved_Party_ID
Involved_Party_Type
First_Name
Registration_Status
Involved_Party_Status
Creation_Date
Incorporation_Country

Here's my code:

import re

File1 = open("File1 .sql", "r")
File2 = open("File2 .sql", "w")
    for line in File1:
        if re.match('\s*SELECT\s*', line):
            continue
        if re.match('\s*FROM\s*', line):
            break
        if re.match('(\n).*?(?=(=))', line):
            Field_Name = re.search('(\n).*?(?=(=))', line)
            File2 .write(Field_Name.group(0) + '\n')

I first tried using this regular expression:

'.*?(?=(=))'

But then my result came out as:

Involved_Party_ID
Involved_Party_Type
First_Name
Registration_Status
Involved_Party_Status
Creation_Date
Incorporation_Country
CONVERT(VARCHAR(50), CASE WHEN pd.IncorporationCountry

Now that I've added (\n) to my regular expression the file returns completely empty although online regular-expression testing sites return the desired outcome. (I'm not concerned about the whitespaces matching the regexp as I'm only retrieving the first result per line)

Your line only ends with \n, they do not start with \n. — Wiktor Stribiżew
– Wiktor Stribiżew, Commented Sep 19, 2022 at 12:47
CONVERT(VARCHAR(50), CASE WHEN pd.IncorporationCountry= matches your regex (without the \n) — Nick
– Nick, Commented Sep 19, 2022 at 12:49
You could try anchoring to beginning of line instead and only searching for non-= characters e.g. ^[^=]* — Nick
– Nick, Commented Sep 19, 2022 at 12:50

Wiktor Stribiżew · Accepted Answer · 2022-09-19 12:55:58Z

1

Judging by the patterns you use with re.match, you can do without regex here. Just skip the line that starts with SELECT, stop matching at the line starting with FROM and collect the parts of lines between them before the first =:

File1 = open("File1 .sql", "r")
File2 = open("File2 .sql", "w")
for line in File1:
    if line.strip().startswith("SELECT"):
        continue
    elif line.strip().startswith("FROM"):
        break
    else:
        result = line.strip().split("=", 1)[0]
        File2.write(result + '\n')

The output I get is

Involved_Party_ID
Involved_Party_Type
First_Name
Registration_Status
Involved_Party_Status
Creation_Date
Incorporation_Country

See this Python demo.

answered Sep 19, 2022 at 12:55

Wiktor Stribiżew

631k41 gold badges502 silver badges633 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Dina Kleper Over a year ago

This did the trick. Thank you.

Andrej Kesely · Accepted Answer · 2022-09-19 12:51:03Z

0

Try (regex101):

s = """\
SELECT
Involved_Party_ID=pt.ProgramId,
Involved_Party_Type=2, 
First_Name=REPLACE(REPLACE(REPLACE(pt.ProgramName, '|', '/'), CHAR(10), ' '), CHAR(13), ' '),
Registration_Status=pt.RegistrationStatus,
Involved_Party_Status=CONVERT(SMALLINT, pt.ProgramStatus), 
Creation_Date=pt.CreationDate,
Incorporation_Country=CONVERT(VARCHAR(50), CASE WHEN pd.IncorporationCountry='UK' THEN 'GB' ELSE pd.IncorporationCountry END),
FROM
    SomeTable AS pt
GO"""

for v in re.findall(r"^([^=\s]+)=", s, flags=re.M):
    print(v)

Prints:

Involved_Party_ID
Involved_Party_Type
First_Name
Registration_Status
Involved_Party_Status
Creation_Date
Incorporation_Country

answered Sep 19, 2022 at 12:51

Andrej Kesely

196k15 gold badges60 silver badges105 bronze badges

Collectives™ on Stack Overflow

Python - Parse SQL Query to return only fields names using regular expression

2 Answers 2

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related