I want to merge an Excel file with a SQL table in pandas. Here's my code:
import pandas as pd
import pymysql
from sqlalchemy import create_engine
data1 = pd.read_excel('data.xlsx')
engine = create_engine('...cloudprovider.com/...')
data2 = pd.read_sql_query("select id, column3, column4 from customer", engine)
data = data1.merge(data2, on='id', how='left')
It works. To make it clearer:
data1.columns outputs Index(['id', 'column1', 'column2'], dtype='object')
data2.columns outputs Index(['id', 'column3', 'column4'], dtype='object')
data.columns outputs Index(['id', 'column1', 'column2', 'column3', 'column4'], dtype='object')
Since data2 keeps getting bigger, I can't query the whole table anymore, so I want to query only the rows of data2 whose id exists in data1. How should I do this?
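One possible way to do this is to pass the ids from data1 into an IN clause. The sketch below is not tested against your MySQL server: it uses an in-memory SQLite engine and a made-up data1 as stand-ins so it can run on its own. The helper name fetch_customers_by_id and the chunk_size value are my own assumptions, not something from your code. It relies on SQLAlchemy's bindparam(..., expanding=True), which renders one placeholder per list element.

```python
import pandas as pd
from sqlalchemy import bindparam, create_engine, text


def fetch_customers_by_id(engine, ids, chunk_size=500):
    """Fetch only the customer rows whose id is in `ids` (hypothetical helper)."""
    # expanding=True lets a Python list be bound to a single IN parameter.
    stmt = text(
        "select id, column3, column4 from customer where id in :ids"
    ).bindparams(bindparam("ids", expanding=True))

    ids = list(ids)
    frames = []
    with engine.connect() as conn:
        # Query in chunks so a very long id list does not produce one huge statement.
        for start in range(0, len(ids), chunk_size):
            chunk = ids[start:start + chunk_size]
            frames.append(pd.read_sql_query(stmt, conn, params={"ids": chunk}))

    if not frames:
        return pd.DataFrame(columns=["id", "column3", "column4"])
    return pd.concat(frames, ignore_index=True)


# Stand-in for the real database: an in-memory SQLite engine (assumption).
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text(
        "create table customer (id integer primary key, column3 text, column4 text)"
    ))
    conn.execute(text(
        "insert into customer values (1, 'a', 'b'), (2, 'c', 'd'), (3, 'e', 'f')"
    ))

# Stand-in for pd.read_excel('data.xlsx') (assumption).
data1 = pd.DataFrame(
    {"id": [1, 3, 99], "column1": ["x", "y", "z"], "column2": [10, 20, 30]}
)

# .tolist() converts numpy integers to plain Python ints for the DB driver.
ids = data1["id"].dropna().unique().tolist()
data2 = fetch_customers_by_id(engine, ids)
data = data1.merge(data2, on="id", how="left")
```

With your own setup you would keep your existing create_engine(...) and read_excel calls and only swap the read_sql_query line for the helper. The chunking is optional for small id lists.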
If data1['id'] is large (thousands of ids), you might find the solution here efficient: stackoverflow.com/questions/48392311/… It'll allow you to use proper joins, in addition to IN operator queries.
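For the join route, one way it could look (I have not checked that this matches the linked answer exactly) is to upload the ids into a scratch table and let the database do the join. This is only a sketch: the table name tmp_data1_ids and the helper name are hypothetical, it assumes your database user is allowed to create and drop tables, and it again uses an in-memory SQLite engine as a stand-in for the real server.

```python
import pandas as pd
from sqlalchemy import create_engine, text


def fetch_customers_by_join(engine, data1, ids_table="tmp_data1_ids"):
    """Upload the ids to a scratch table and join against it (hypothetical helper)."""
    ids = data1[["id"]].dropna().drop_duplicates()
    with engine.begin() as conn:
        # Write the ids into a scratch table, replacing any leftover copy.
        ids.to_sql(ids_table, conn, if_exists="replace", index=False)
        # ids_table is a trusted constant here, never user input.
        query = text(
            "select c.id, c.column3, c.column4 "
            f"from customer c join {ids_table} t on t.id = c.id"
        )
        result = pd.read_sql_query(query, conn)
        # Clean up the scratch table once the rows are fetched.
        conn.execute(text(f"drop table {ids_table}"))
    return result


# Stand-in for the real database: an in-memory SQLite engine (assumption).
join_engine = create_engine("sqlite://")
with join_engine.begin() as conn:
    conn.execute(text(
        "create table customer (id integer primary key, column3 text, column4 text)"
    ))
    conn.execute(text(
        "insert into customer values (1, 'a', 'b'), (2, 'c', 'd'), (3, 'e', 'f')"
    ))

# Stand-in for pd.read_excel('data.xlsx') (assumption).
excel_data = pd.DataFrame(
    {"id": [1, 3, 99], "column1": ["x", "y", "z"], "column2": [10, 20, 30]}
)

joined = fetch_customers_by_join(join_engine, excel_data)
merged = excel_data.merge(joined, on="id", how="left")
```

Compared with a long IN list, this keeps the SQL statement short no matter how many ids data1 holds, at the cost of one extra write to the database.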