I have a database that contains two tables in the data, cdr and mtr. I want a join of the two based on columns ego_id and alter_id, and I want to output this into another table in the same database, complete with the column names, without the use of pandas.
Here's my current code:
mtr_table = Table('mtr', MetaData(), autoload=True, autoload_with=engine)
print(mtr_table.columns.keys())
cdr_table = Table('cdr', MetaData(), autoload=True, autoload_with=engine)
print(cdr_table.columns.keys())
query = db.select([cdr_table])
query = query.select_from(mtr_table.join(cdr_table,
((mtr_table.columns.ego_id == cdr_table.columns.ego_id) &
(mtr_table.columns.alter_id == cdr_table.columns.alter_id))),
)
results = connection.execute(query).fetchmany()
Currently, for my test code, what I do is to convert the results as a pandas dataframe and then put it back in the original SQL database:
df = pd.DataFrame(results, columns=results[0].keys())
df.to_sql(...)
but I have two problems:
- loading everything into a pandas dataframe would require too much memory when I start working with the full database
- the columns names are (apparently) not included in
resultsand would need to be accessed byresults[0].keys()
I've checked this other stackoverflow question but it uses the ORM framework of sqlalchemy, which I unfortunately don't understand. If there's a simpler way to do this (like pandas' to_sql), I think this would be easier.
What's the easiest way to go about this?