Is there a way to retain the SqlAlchemy attribute names when you query the data into a pandas dataframe?
Here's a simple mapping of my database. For the school table, I've renamed 'SchoolDistrict', the DB name, to a shorter 'district'. I'm several layers removed from the DBA, so changing them in the source isn't feasible.
class School(Base):
__tablename__ = 'DimSchool'
id = Column('SchoolKey', Integer, primary_key=True)
name = Column('SchoolName', String)
district = Column('SchoolDistrict', String)
class StudentScore(Base):
__tablename__ = 'FactStudentScore'
SchoolKey = Column('SchoolKey', Integer, ForeignKey('DimSchool.SchoolKey'), primary_key = True)
PointsPossible = Column('PointsPossible', Integer)
PointsReceived = Column('PointsReceived', Integer)
school = relationship("School", backref='studentscore')
So when I query something like:
query = session.query(StudentScore, School).join(School)
df = pd.read_sql(query.statement, query.session.bind)
I end up with the underlying 'SchoolDistrict' name for the column, not my attribute name, in the returned DataFrame df.
EDIT: An even more annoying case is when duplicate column names exist across tables. For example:
class Teacher(Base):
__tablename__ = 'DimTeacher'
id = Column('TeacherKey', Integer, primary_key=True)
fname = Column('FirstName', String)
lname = Column('FirstName', String)
class Student(Base):
__tablename__ = 'DimStudent'
id = Column('StudentKey', Integer, primary_key=True)
fname = Column('FirstName', String)
lname = Column('FirstName', String)
So a query across both tables (like the one below) produces a dataframe with duplicate FirstName and LastName columns.
query = session.query(StudentScore, Student, Teacher).join(Student).join(Teacher)
Would it be possible to rename these columns at the moment of the query? Right now I'm having trouble keeping my head straight with these two systems of column names.
select aslike in this SO answer or you could just change the name of the column after reading the dataframe. Hope it helps.read_sqlis using the sql statement as provided bysqlalchemy, and it is going to have the wrong name unless you change the statement (sqlalchemymaps it afterwards, i.e. after the query is run, when processing the results). And in pandas there is no way to do a mapping like the one you propose; why there would be, if you can change the name of the column? Good luck anyway!sqlalchemyclasses. You know what I mean?