1

I'm trying to extract all the table names in a query using sqlparse but I'm having trouble with subqueries or insert statements wrapped in parenthesis.

I have the following query,

import sqlparse 
    
sql = """ 
          create test.test_table as (
                            select 1
                            from fake.table
                        );
        """
    

When I look for the token that included the statement in the parenthesis using

y = sqlparse.parse(sql)

for i in y[0].tokens:
    if isinstance(i, Identifier):
        print(i)
        print(i.get_real_name())

I get the following results,

test.test_table as (
                            select 1
                            from fake.table
                        )
test_table

The results come back as just one identifier token. When I attempt to get the table name from inside the parenthesis all the comes back is test.test_table. What I'm ultimately trying to do is extract both table names test.test_table and fake.table

Does anyone have any insight on how I can go about doing this?

1 Answer 1

1

This may help, I've been working with Select SQL statements that contain subqueries in them, so usually the format has been:

Select blah from (select blah from table_name) alias

so firstly I ignore the first Select statement, and look for tokens that contain the word Select:

for item in parsed.tokens:

    if 'SELECT' in identifier.value.upper():
        subquery = identifier.value

The subquery will return

(select blah from table_name) alias

I then have a separate function which removes the outermost parentheses and the alias to give just the subquery script:

def subquery_parsing(subquery, full_tables, tables, alias):
    #print(subquery)
    #new subquery string ready to parse
    res_sub = """"""

    #captures the alias outside the parantheses
    alias = """"""
    
    #record the number of parentheses as they open and close
    paren_cnt = 0


    for char in subquery:
        #if ( and there's already been a ( , include it
        if char == '(' and paren_cnt > 0:
            res_sub += char
    
        #if (, add to the count
        if char == '(':
            paren_cnt += 1
   
        # if ) and there's at least 2 (, include it
        if char == ')' and paren_cnt > 1:
            res_sub += char
          
        # if ), subtract from the count        
        if char == ')':
            paren_cnt -= 1
    
        # capture the script
        if char != '(' and char != ')' and paren_cnt >0:
            res_sub += char
    
        # capture the alias
        if char != '(' and char != ')'  and char != ' ' and paren_cnt == 0:
            alias += char

this returns

select blah from table_name

Which you should then be able to run through the sqlparse.parse again and get the table name.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.