
I'm looking to find and modify some SQL syntax around the convert function. Basically, I want every convert(A,B) or CONVERT(A,B) in all my files to be found and rewritten as B::A.

So far I have tried selecting them with re.findall(r"\bconvert\b\(.*?,.*\)", l, re.IGNORECASE), but it only returns a small subset of what I want, and I also have trouble actually manipulating the A and B parts I mentioned.

For example, take this sample line (the nested structure is not important here; I'm only trying to get the outer layer working, if possible):

convert(varchar, '/' || convert(nvarchar, es.Item_ID) || ':' || convert(nvarchar, o.Option_Number) || '/') as LocPath

...should become...

'/' || es.Item_ID::nvarchar || ':' || o.Option_Number::nvarchar || '/' :: varchar as LocPath

Example2:

SELECT LocationID AS ItemId, convert(bigint, -1),

...should become...

SELECT LocationID AS ItemId, -1::bigint,

I think this should be possible with some kind of re.sub with groups. I currently have the following code structure inside a for loop, where line is each line in the file:

matchConvert = ["convert(", "CONVERT("]
a = next((a for a in matchConvert if a in line), False)
if a:
    print("convert() line")
    #line = re.sub(re.escape(a) + r'', '', line)

Edit: In the end I went with a non-regex solution and handled each line by identifying each block and manipulating the blocks accordingly.

  • I see the convert can be nested, if so, regex will not work here. Commented Jul 19, 2022 at 21:10
  • I seem to have missed one space here: || ':'||. It should have been:|| ':' || Commented Jul 19, 2022 at 21:16
  • @Bharel: yes it can, if we need to handle up to (say) N=3 levels of nesting, we just process the line and apply the regex 3 times. Commented Jul 19, 2022 at 21:36
  • Related: Matching Nested Structures With Regular Expressions in Python Commented Jul 19, 2022 at 21:39
  • Yeah If I need to nest this I can simply run it multiple times. Right now I'm still just looking to get the outer layer working. Commented Jul 19, 2022 at 21:42
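The repeated-application idea from the comments can be sketched as follows. This is a minimal sketch, not a general solution: the names SIMPLE_CONVERT and rewrite_converts are hypothetical, and it assumes that the arguments of an innermost convert call contain no commas or parentheses (including inside string literals). Calls that do not fit that shape are left untouched. It also does not add parentheses around the converted expression, so the output mirrors the format requested in the question; in PostgreSQL, `::` appears to bind more tightly than `||`, so an expression argument may need to be parenthesized by hand.

```python
import re

# Assumption: arguments contain no commas or parentheses of their own
# (for example inside string literals); such calls are left untouched.
SIMPLE_CONVERT = re.compile(
    r"\bconvert\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)",
    re.IGNORECASE,
)


def rewrite_converts(line: str, max_passes: int = 10) -> str:
    """Rewrite convert(A, B) as B::A, innermost calls first."""
    for _ in range(max_passes):
        # Each pass rewrites every call whose arguments are already flat.
        line, replaced = SIMPLE_CONVERT.subn(r"\2::\1", line)
        if replaced == 0:
            break
    return line


print(rewrite_converts("SELECT LocationID AS ItemId, convert(bigint, -1),"))
```

Each pass removes one level of nesting, so max_passes only needs to be at least the deepest nesting level expected in the files.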

4 Answers


This may be an X/Y problem, meaning you’re asking how to do something with Regex that may be better solved with parsing (meaning using/modifying/writing a SQL parser). An indication that this is the case is the fact that “convert” calls can be nested. I’m guessing Regex is going to be more of a headache than it’s worth here in the long run if you’re working with a lot of files and they’re at all complicated.


1 Comment

Yeah, I'm not expecting a solution to cover all potential cases, and regex may have limitations given how much the parameters vary here. I'll test a different strategy and see if that works better.

The task:

Swap the parameters of all the 'convert' functions in the given string. Parameters can contain any character, including nested 'convert' functions.

A solution:

def convert_py(s):
    #capturing start:
    left=s.index('convert')
    start=s[:left]
    #capturing part_1:
    c=0
    line=''
    for n1,i in enumerate(s[left+8:],start=len(start)+8):
        if i==',' and c==0:
            part_1=line
            break
        if i==')':
            c-=1
        if i=='(':
            c+=1
        line+=i
    #capturing part_2:
    c=0
    line=''
    for n2,i in enumerate(s[n1+1:],start=n1+1):
        if i==')':
            c-=1
        if i=='(':
            c+=1
        if c<0:
            part_2=line
            break
        line+=i
    #capturing end:
    end=s[n2+1:]
    #capturing result:
    result=start+part_2.lstrip()+' :: '+part_1+end
    return result

def multi_convert_py(s):
    converts=s.count('convert')
    for n in range(converts):
        s=convert_py(s)
    return s

Notes:

  • Unlike the solution based on the re module, which is presented in another answer, this version should not fail if there are more than two parameters in the 'convert' function in the given string. However, it splits only at the first top-level comma, for example: convert(a,b, c) --> b, c :: a
  • I am afraid that unforeseen cases may arise that will lead to failure. Please let me know if you find any flaws.
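The depth-counting idea behind this answer (a comma only separates parameters when the bracket counter is zero) can be isolated in a small helper. This is a hedged sketch: split_top_level is a hypothetical name, not part of the answer's code, and it assumes that parentheses and commas never appear inside quoted SQL string literals.

```python
def split_top_level(args: str) -> list[str]:
    """Split an argument string on commas that are not inside parentheses."""
    parts = []
    current = []
    depth = 0
    for ch in args:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            # A comma at depth 0 ends the current parameter.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


print(split_top_level("varchar, f(x, y) || z"))
```

With such a helper, a caller could check that exactly two parameters were found before swapping them, instead of silently treating everything after the first comma as the second parameter.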


The task:

Swap the parameters of all the 'convert' functions in the given string. Parameters can contain any character, including nested 'convert' functions.

A solution based on the re module:

def convert_re(s):
    import re
    start,part_1,part_2,end=re.search(r'''
                               (.*?)   
                               convert\(
                               ([^,)(]+\(.+?\)[^,)(]*|[^,)(]+)
                               ,
                               ([^,)(]+\(.+?\)[^,)(]*|[^,)(]+)
                               \)
                               (.*)                                     
                                       ''',s,re.X).groups()


    result=start+part_2.lstrip()+' :: '+part_1+end
    return result

def multi_convert_re(s):
    converts=s.count('convert')
    for n in range(converts):
        s=convert_re(s)
    return s

Description of the 'convert_re' function:

Regular expression:

  • start is the first group with what comes before 'convert'

  • Then follows convert\(, which has no group and matches the name of the function and the opening '('

  • part_1 is the second group ([^,)(]+\(.+?\)[^,)(]*|[^,)(]+). This should match the first parameter. It can be either a function call (a name made of any characters except , ) (, then parentheses with anything inside except a newline, optionally followed by more characters other than , ) () or a plain run of characters other than , ) (

  • Then follows a comma ,, which has no group

  • part_2 is the third group. It uses the same pattern as the second, but should catch everything that is left inside the outer function call

  • Then follows ), which has no group

  • end is the fourth group (.*) with what's left before the new line.

The resulting string is then created by swapping part_1 and part_2, putting ' :: ' between them, removing spaces on the left from part_2 and adding start to the beginning and end to the end.
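To make the group layout described above easier to inspect, the same pattern can be written with named groups. This is only an illustrative sketch: the names ARG and PATTERN are hypothetical, and the pattern itself is copied from the answer, so it keeps the same limitations (exactly two parameters, at most one level of parentheses inside a parameter).

```python
import re

# One parameter: either "name(...)" optionally followed by more text,
# or a plain run of characters other than comma and parentheses.
ARG = r"[^,)(]+\(.+?\)[^,)(]*|[^,)(]+"

PATTERN = re.compile(
    rf"""
    (?P<start>.*?)        # text before the first convert call
    convert\(
    (?P<part_1>{ARG})     # first parameter
    ,
    (?P<part_2>{ARG})     # second parameter
    \)
    (?P<end>.*)           # everything after the closing bracket
    """,
    re.X,
)

m = PATTERN.search("SELECT LocationID AS ItemId, convert(bigint, -1),")
if m is not None:
    print(m.groupdict())
    print(m["start"] + m["part_2"].lstrip() + " :: " + m["part_1"] + m["end"])
```

Checking the match against None before using it avoids the AttributeError that a bare .groups() call raises when a line contains no matching call.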

Description of the 'multi_convert_re' function

Calls the 'convert_re' function once for each occurrence of "convert" counted in the string, so that none are left.

Notes:

  • N.B.: The code implies that the 'convert' function in the string has exactly two parameters.
  • The code works on the given examples, but I'm afraid there may still be unforeseen flaws when it comes to other examples. Please let me know if you find any.
  • I have posted another solution, not based on the re module, in a separate answer. Its results may differ from this one.

18 Comments

wow that's pretty impressive I appreciate it. There are other cases SELECT LocationID AS ItemId, convert(bigint, -1),, here's another one ` WHERE le.LocPath not like '%/' || convert(nvarchar, es.Item_ID) || ':%' ` which isn't picked up by this yet. Also the result should be without the start+ in the beginning.
It looks like the other cases not working is just having other text before the function name (e.g. textext convert(A,B) texttext), I'll add another group before start to cover for that and see if that works. before,start,mid_1,mid_2,end=re.search(r'''(.*)(\w+?\()(.+)(?<=\)),(.+)(\).*)''',string,re.X).groups() result=before+mid_2.lstrip()+':: '+mid_1+end It looks like that's not quite working still.
Make the first group 'before' non-greedy, add '?' after '*' : (.*?)
Looks like that's still not picking up the new case, but that does fix the issue I mentioned about breaking the original case.
Also, this regular expression will match only if the first parameter is a function and in a string ends with ')'. This is a rather narrow search... If there may be other cases where the first parameter is not a function, there should be another regular expression

Here's my solution based on @Иван-Балван's code. Breaking this structure into blocks makes further specification a lot easier than I previously thought and I'll be using this method for a lot of other operations as well.

# Return the net bracket depth of my_string (0 when "(" and ")" occur equally often)
def checkBracket(my_string):
    count = 0
    for c in my_string:
        if c == "(":
            count+=1
        elif c == ")":
            count-=1
    return count


# Modify the first convert in line
# Based on suggestions from stackoverflow.com/questions/73040953
def modifyConvert(l):
    # find the location of convert()
    count = l.index('convert(')

    # select the group before convert() call
    before = l[:count]

    group=""
    n1=0
    n2=0
    A=""
    B=""
    operate = False
    operators = ["|", "<", ">", "="]
    # look for A group before comma
    for n1, i in enumerate(l[count+8:], start=len(before)+8):
        # bracket depth of the arguments scanned so far
        checkIndex = checkBracket(l[count+8:][:n1-len(before)-8])
        if i == ',' and checkIndex == 0:
            A = group
            break
        group += i

    # look for B group after comma
    group = ""
    for n2, i in enumerate(l[n1+1:], start=n1+1):
        checkIndex = checkBracket(l[count+n1-len(before):][:n2-n1+1])
        if i == ',' and checkIndex == 0:
            return l
        elif checkIndex < 0:
            B = group
            break
        group += i
        
        # mark operators
        if i in operators:
            operate = True

    # select the group after convert() call
    after = l[n2+1:]

    # (B) if it contains operators
    if operate:
        return before + "(" + B.lstrip() + ') :: ' + A + after
    else:
        return before + B.lstrip() + '::' + A + after


# Modify cast syntax with convert(a,b). return line.
def convertCast(l):

    # Call helper for nested cases
    i = l.count('convert(')
    while i>0:
        i -= 1
        l = modifyConvert(l)

    return l
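The parenthesization rule used in modifyConvert (wrap B in brackets when it contains an operator) can be shown in isolation. This is a sketch under assumptions: to_cast and OPERATORS are hypothetical names, the operator characters are the same four as in the answer's list, and the reasoning assumes PostgreSQL-style precedence, where `::` binds more tightly than operators such as `||`.

```python
# Same operator characters as in the answer's `operators` list.
OPERATORS = ("|", "<", ">", "=")


def to_cast(type_name: str, expr: str) -> str:
    """Build expr::type_name, wrapping expr in brackets if it has operators."""
    expr = expr.strip()
    type_name = type_name.strip()
    if any(op in expr for op in OPERATORS):
        # Without brackets the cast would apply only to the last operand.
        return f"({expr})::{type_name}"
    return f"{expr}::{type_name}"


print(to_cast("bigint", " -1"))
print(to_cast("varchar", "'/' || es.Item_ID"))
```

Arithmetic operators such as + or * are not in the list, so expressions using them would still need to be handled separately.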
