
I'm trying to create a Python class that loads every table in a database into a pandas DataFrame.

This is how I do it now, which is not very generic (every table is hard-coded):

import MySQLdb
import pandas.io.sql as psql  # frame_query was later deprecated in favor of pd.read_sql

class sql2df():
    def __init__(self, db, password='123', host='127.0.0.1', user='root'):
        self.db = db

        mysql_cn = MySQLdb.connect(host=host,
                                   port=3306, user=user, passwd=password,
                                   db=self.db)

        # One hard-coded query per table
        self.table1 = psql.frame_query('select * from table1', mysql_cn)
        self.table2 = psql.frame_query('select * from table2', mysql_cn)
        self.table3 = psql.frame_query('select * from table3', mysql_cn)

Now I can access all tables like so:

my_db = sql2df('mydb')
my_db.table1

I want something like:

class sql2df():
    def __init__(self, db, password='123',host='127.0.0.1',user='root'):
        self.db = db

        mysql_cn= MySQLdb.connect(host=host,
                        port=3306,user=user, passwd=password, 
                        db=self.db)
        tables = """SELECT TABLE_NAME FROM information_schema.TABLES WHERE TABLE_SCHEMA = '%s'""" % self.db
        # <some kind of iteration that gives back all the tables in df as class attributes>

Suggestions are most welcome...

3 Answers

I would use SQLAlchemy for this:

import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine("mysql+mysqldb://root:123@127.0.0.1/%s" % db)

Note the syntax is dialect+driver://username:password@host:port/database.
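Instead of %-formatting the URL string, SQLAlchemy 1.4+ can build it from its parts with `URL.create`, which also escapes special characters in the password when rendered. A minimal sketch, assuming the question's default credentials and a hypothetical database name `mydb`; the resulting `url` can be passed straight to `sqlalchemy.create_engine(url)` (that step needs the MySQLdb driver installed, building the URL does not):

```python
from sqlalchemy.engine import URL

# Each component of dialect+driver://username:password@host:port/database.
# Credentials mirror the question's defaults; "mydb" is a hypothetical name.
url = URL.create(
    drivername="mysql+mysqldb",  # dialect+driver
    username="root",
    password="123",
    host="127.0.0.1",
    port=3306,
    database="mydb",
)

# render_as_string(hide_password=False) shows the full URL including the password.
print(url.render_as_string(hide_password=False))
```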

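def db_to_frames_dict(engine):
    meta = sqlalchemy.MetaData()
    meta.reflect(bind=engine)
    tables = meta.sorted_tables
    # Key by the table name (a str), not the Table object, so the keys can
    # later be used as attribute names. Since pandas 0.14, read_sql accepts
    # the engine directly.
    # Note: frame_query is deprecated in favor of read_sql.
    return {t.name: pd.read_sql('SELECT * FROM %s' % t.name, engine)
            for t in tables}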

This returns a dictionary, but you could equally well expose these as attributes (e.g. by updating the instance's __dict__ and defining __getitem__):

class SQLAsDataFrames:
    def __init__(self, engine):
        self.__dict__ = db_to_frames_dict(engine)  # allows .table_name access
    def __getitem__(self, key):                    # allows [table_name] access
        return self.__dict__[key]
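An alternative to assigning `self.__dict__` wholesale is `setattr` in a loop, which sets the attributes dynamically through the normal attribute protocol. A minimal sketch with a hypothetical class `A`, using plain lists as stand-ins for the DataFrames; note that keys which are not valid Python identifiers (e.g. containing spaces or hyphens) can still be set this way but can then only be read via `getattr` or item access:

```python
class A:
    """Expose every entry of a dict both as an attribute and as an item."""

    def __init__(self, d):
        # Set one instance attribute per key; for plain string keys this has
        # the same effect as self.__dict__.update(d).
        for name, value in d.items():
            setattr(self, name, value)

    def __getitem__(self, key):
        # Allows obj["table1"] as well as obj.table1.
        return self.__dict__[key]


# Plain lists stand in for the DataFrames returned by db_to_frames_dict.
frames = {"table1": [1, 2, 3], "table2": ["a", "b"]}
a = A(frames)
print(a.table1)     # [1, 2, 3]
print(a["table2"])  # ['a', 'b']
```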

In pandas 0.14 the SQL code was rewritten to accept SQLAlchemy engines, and IIRC there are also helpers for working with all tables and for reading a whole table: read_sql_table(table_name, engine), or equivalently read_sql(table_name, engine).
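A sketch of the same dictionary built with those helpers, assuming pandas and SQLAlchemy are installed: `sqlalchemy.inspect(engine).get_table_names()` lists the tables and `pd.read_sql_table` loads each one by name, so no hand-written SELECT is needed. An in-memory SQLite database with hypothetical tables stands in for MySQL so the sketch runs anywhere:

```python
import pandas as pd
import sqlalchemy


def db_to_frames_dict(engine):
    """Return {table_name: DataFrame} for every table visible to `engine`."""
    # inspect() returns an Inspector; get_table_names() lists the tables
    # in the engine's default schema.
    table_names = sqlalchemy.inspect(engine).get_table_names()
    # read_sql_table loads a whole table by name.
    return {name: pd.read_sql_table(name, engine) for name in table_names}


# Demo against an in-memory SQLite database standing in for MySQL.
engine = sqlalchemy.create_engine("sqlite://")
pd.DataFrame({"x": [1, 2]}).to_sql("table1", engine, index=False)
pd.DataFrame({"y": ["a"]}).to_sql("table2", engine, index=False)

frames = db_to_frames_dict(engine)
print(sorted(frames))
```

For MySQL, only the `create_engine` URL changes; the function itself is dialect-agnostic.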


17 Comments

Thanks! I didn't know about SQLAlchemy! The problem I still have is how to set class attributes 'dynamically', e.g. what you refer to as 'updating the class dict'.
@bowlby for attributes: class A: def __init__(self, d): self.__dict__.update(d) where d is the dict above. Maybe def __getitem__(self, key): return self.__dict__[key] for good measure.
@Andy Hayden I think you are missing the closing ) on the SELECT.
How do I call sql2df? What is the expected argument, e.g. sql2df(?), and how do I then access the tables?
If I try t = [] and then sql2df(t), I get: global name 'sqlalchmey' is not defined
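To answer the last two comments: the constructor takes an SQLAlchemy engine, not a list, and the NameError comes from the misspelled module name, so `import sqlalchemy` (spelled correctly) is needed. A self-contained usage sketch, assuming a MySQL URL like `mysql+mysqldb://root:123@127.0.0.1/mydb` in practice; an in-memory SQLite database with a hypothetical `table1` is used here so it runs without a server:

```python
import pandas as pd
import sqlalchemy


def db_to_frames_dict(engine):
    # Reflect the schema, then load each table keyed by its name (a str).
    meta = sqlalchemy.MetaData()
    meta.reflect(bind=engine)
    return {t.name: pd.read_sql("SELECT * FROM %s" % t.name, engine)
            for t in meta.sorted_tables}


class SQLAsDataFrames:
    def __init__(self, engine):
        self.__dict__.update(db_to_frames_dict(engine))  # allows .table_name access

    def __getitem__(self, key):
        return self.__dict__[key]                       # allows ["table_name"] access


# For the question's database this would be e.g.
#   engine = sqlalchemy.create_engine("mysql+mysqldb://root:123@127.0.0.1/mydb")
engine = sqlalchemy.create_engine("sqlite://")
pd.DataFrame({"id": [1, 2, 3]}).to_sql("table1", engine, index=False)

my_db = SQLAsDataFrames(engine)
print(my_db.table1)
print(my_db["table1"].shape)
```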

Here is what I have now. Imports:

import sqlalchemy
from sqlalchemy import create_engine
from sqlalchemy import Table, Column, Date, Integer, String, MetaData, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, backref
import pandas as pd

engine = sqlalchemy.create_engine("mysql+mysqldb://root:123@127.0.0.1/%s" % 'surveytest')

def db_to_frames_dict(engine):
    meta = sqlalchemy.MetaData()
    meta.reflect(bind=engine)
    tables = meta.sorted_tables
    return {t: pd.read_sql('SELECT * FROM %s' % t.name, engine.connect())
               for t in tables}
# Note: frame_query is deprecated in favor of read_sql

I haven't started fiddling with this part yet:

class SQLAsDataFrames:
    def __init__(self, engine):
        self.__dict__ = db_to_frames_dict(engine)  # allows .table_name access
    def __getitem__(self, key):                    # allows [table_name] access
        return self.__dict__[key]

And the error (at least it looks like it is finding the table names):

frames=db_to_frames_dict(engine)
frames

Error on sql SELECT * FROM tbl_original_survey_master
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-6b0006e1ce47> in <module>()
----> 1 frames=db_to_frames_dict(engine)
... (more traceback) ...
---> 53             con.rollback()
     54         except Exception:  # pragma: no cover
     55             pass

AttributeError: 'Connection' object has no attribute 'rollback'

Thank you for sticking with this!

1 Comment

As above, the SQLAlchemy Connection object is apparently not a DBAPI connection (pandas calls con.rollback() on it): stackoverflow.com/questions/20401392/… Apologies again for the poor initial state of the answer!

Thanks for all the help, this is what I ended up using:

import sqlalchemy
from sqlalchemy import create_engine
from sqlalchemy import Table, Column,Date, Integer, String, MetaData, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, backref
import pandas as pd

engine = sqlalchemy.create_engine("mysql+mysqldb://root:123@127.0.0.1/%s" % 'mydb')

def db_to_frames_dict(engine):
    meta = sqlalchemy.MetaData()
    meta.reflect(engine)
    tables = meta.tables.keys()
    cnx = engine.raw_connection()

    return {t: pd.read_sql('SELECT * FROM %s' % t, cnx )
               for t in tables}

class SQLAsDataFrames:
    def __init__(self, engine):
        self.__dict__ = db_to_frames_dict(engine)  # allows .table_name access
    def __getitem__(self, key):                    # allows [table_name] access
        return self.__dict__[key]
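One caveat with the raw 'SELECT * FROM %s' % t query: it breaks for table names that are reserved words or contain spaces (pandas' read_sql_table sidesteps this by reading the table by name). If you keep the string query, a sketch of quoting each name with the dialect's own rules via SQLAlchemy's `identifier_preparer.quote` follows; the table names are hypothetical, and a bare MySQL dialect object is enough, so no connection or driver is needed:

```python
from sqlalchemy.dialects import mysql

# The MySQL dialect knows its reserved words and quoting character (backtick).
quote = mysql.dialect().identifier_preparer.quote

# Hypothetical table names: an ordinary one, a reserved word, one with a space.
queries = ["SELECT * FROM %s" % quote(name) for name in ["table1", "order", "my table"]]
for q in queries:
    print(q)
```

With a real engine, `engine.dialect.identifier_preparer.quote` gives the same function for whichever database the engine points at.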
