
I have to make multiple SQL queries of entire tables and concatenate them into one big data table.

I have a dictionary where the key is a team name and the value is an acronym that serves as the prefix of the corresponding MySQL table:

    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine('mysql+mysqlconnector://%s:%s@%s/%s' % (mysql_user, mysql_password, mysql_host, mysql_dbname), echo=False, pool_recycle=1800)

    mysql_conn = engine.connect()

    team_dfs = []

    nba_dict = {'New York Knicks': 'nyk',
                'Boston Celtics': 'bos',
                'Golden State Warriors': 'gsw'}

    for name, abbr in nba_dict.items():
        query = f'''
        SELECT *
        FROM {abbr}_record
        '''
        df = pd.read_sql_query(query, mysql_conn)

        df['team_name'] = name

        team_dfs.append(df)

    team_dfs = pd.concat(team_dfs)

Is there a better way to refactor this code and make it more efficient?

  • Don't separate records per team if you are going to query them together. Avoid SELECT *. Don't try to preemptively optimize your code around the data; get the data into a properly structured database and the performance and simple code will follow from that. Commented Dec 9, 2021 at 3:49
  • 100% agree. I'm not the DB admin, and the goal of the above task is to create a gold layer table where all this data is combined in one place. Commented Dec 9, 2021 at 3:58
  • Since this is a one-time task to fix a broken schema, does efficiency matter? Commented Dec 10, 2021 at 6:12

2 Answers


Your database layout, with a separate table for each team, is doomed to inefficiency whenever you need to retrieve data for more than one team at a time. You would be much better off putting all that data in one table, with a column identifying the team for each row.

Why inefficient? More tables mean more work, and more queries mean more work.

I suggest you push back, hard, on the designer of this database table structure. Its design is, bluntly, wrong.
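For illustration, a consolidated table might look roughly like this. This is a hypothetical sketch reusing the connection from the question's code; the non-key columns are assumptions, since the real per-team schema isn't shown:

    # Hypothetical sketch of the single-table layout suggested above: one table
    # holding every team's rows, keyed by a team column, instead of one
    # {abbr}_record table per team. Non-key columns are assumptions.
    from sqlalchemy import text

    ddl = '''
    CREATE TABLE team_record (
        team      VARCHAR(3) NOT NULL,   -- 'nyk', 'bos', 'gsw', ...
        game_date DATE       NOT NULL,
        wins      INT,
        losses    INT,
        PRIMARY KEY (team, game_date)
    )
    '''
    mysql_conn.execute(text(ddl))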

If you must live with this structure, I suggest you create the following view. It will fake the single-table approach and give you your "gold layer". You get away with this because pro sports franchises don't come and go that often. You do this just once in your database.

CREATE OR REPLACE VIEW teams_record AS
SELECT 'nyk' AS team, nyk_record.* FROM nyk_record
UNION ALL
SELECT 'bos' AS team, bos_record.* FROM bos_record
UNION ALL
SELECT 'gsw' AS team, gsw_record.* FROM gsw_record
UNION ALL .... the other teams

Then you can do SELECT * FROM teams_record ORDER BY team to get all your data.
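With the view in place, the per-team loop from the question collapses to a single read. A minimal sketch, reusing mysql_conn from the question's code:

    # Minimal sketch: one query against the view replaces the per-team loop.
    import pandas as pd

    all_teams = pd.read_sql_query('SELECT * FROM teams_record ORDER BY team', mysql_conn)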


If your nba_dict is fixed, you can use UNION to combine the per-team result sets directly in SQL.

abbr = list(nba_dict.values())
names = list(nba_dict.keys())

query = f'''
SELECT '{names[0]}' AS team_name, t.* FROM {abbr[0]}_record t UNION
SELECT '{names[1]}' AS team_name, t.* FROM {abbr[1]}_record t UNION
SELECT '{names[2]}' AS team_name, t.* FROM {abbr[2]}_record t
'''
df = pd.read_sql_query(query, mysql_conn)
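If the dictionary isn't fixed, one variation (my sketch, not part of the answer above) is to build the statement from nba_dict, tagging each SELECT with its team name and using UNION ALL so identical rows from different tables aren't silently dropped:

    # Sketch: generate the combined query from nba_dict so it works for any
    # number of teams. UNION ALL keeps every row; plain UNION deduplicates.
    union_query = ' UNION ALL '.join(
        f"SELECT '{name}' AS team_name, t.* FROM {abbr}_record t"
        for name, abbr in nba_dict.items()
    )
    df = pd.read_sql_query(union_query, mysql_conn)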

4 Comments

Is this a more efficient method, performance-wise? Since in this case you're making one big query rather than 7 separate queries.
What are your definitions of efficiency and performance?
My definition is speed
UNION deduplicates. UNION ALL doesn't. Deduplication is probably not the right way to go here.
