1

I need to read an .mdb file that is stored in Azure Blob Storage using Python and export the resulting DataFrame as a CSV. I am able to read a CSV file this way, but not the .mdb file. Is there another way to do this? Suggestions that don't use Python are welcome too.

What I tried:

from azure.storage.blob import BlockBlobService
import pandas as pd
import tables

STORAGEACCOUNTNAME = '<storage_account_name>'
STORAGEACCOUNTKEY = '<storage_account_key>'
LOCALFILENAME = '<local_file_name>'  # e.g. 'test.mdb'
CONTAINERNAME = '<container_name>'
BLOBNAME = '<blob_name>'

blob_service = BlockBlobService(account_name=STORAGEACCOUNTNAME, account_key=STORAGEACCOUNTKEY)
blob_service.get_blob_to_path(CONTAINERNAME, BLOBNAME, LOCALFILENAME)

# LOCALFILENAME is the local file path; this works for a CSV blob but fails for the .mdb file
dataframe_blobdata = pd.read_csv(LOCALFILENAME)

2 Answers

1

The easiest way to do this is to install mdbtools (brew install mdbtools on macOS) and use it to convert the database to a SQLite database, which is much easier to use from Python via import sqlite3. That way you won't run into the problem of trying to use pyodbc, realizing you don't have an MDB driver, and then having to find one, pay for it, install it, and use it.

I had this problem and wrote a small script to wrap mdbtools and convert the mdb database to sqlite. The important piece is below:

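import sqlite3
import subprocess

def run_command(args):
    # Minimal stand-in for the helper in the full script (an assumption):
    # run an mdbtools command and return its stdout as text.
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout

def list_tables(filename):
    # mdb-tables prints all table names separated by the given delimiter.
    delimiter = ", "
    u = run_command(["mdb-tables", "-d", delimiter, filename])
    tables = u.split(delimiter)
    return [stripped for t in tables if (stripped := t.strip()) != ""]

def export_sqlite(dbname, tablenames, filename):
    print(f"creating {filename}")
    con = sqlite3.connect(filename)
    cur = con.cursor()

    # Populate it.
    create = "mdb-schema --indexes --relations --default-values --not-null".split(" ")
    for table in tablenames:
        print(f"creating table {table}")
        table_create = run_command(create + [dbname, "-T", table, "sqlite"])
        # The schema output can hold several statements (table plus indexes),
        # which execute() rejects, so run it as a script.
        cur.executescript(table_create)

        sql = run_command(["mdb-export", "-I", "sqlite", "-S", "1", dbname, table])
        count = 0
        for ins in sql.split(";\n"):
            if ins.strip():  # skip the empty chunk after the last statement
                cur.execute(ins)
                count += 1

        print(f"inserted {count} records into {table}")
        con.commit()
    con.close()

def main(filename):
    tables = list_tables(filename)
    export_sqlite(filename, tables, filename + '.sqlite')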

Sharing the full code here in case it helps someone.
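
Once the .sqlite file exists, getting to CSV (the original goal) needs only the standard library. This is a minimal sketch, assuming the converted database is at a path you pass in; the function name export_tables_to_csv and the one-file-per-table layout are illustrative and not part of the script above.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables_to_csv(sqlite_path, out_dir):
    """Write every user table in a SQLite file to <out_dir>/<table>.csv."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    con = sqlite3.connect(sqlite_path)
    try:
        # sqlite_master lists the tables; skip SQLite's internal ones.
        names = [
            row[0]
            for row in con.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        written = []
        for name in names:
            # Quote the identifier, since Access table names may contain spaces.
            quoted = '"' + name.replace('"', '""') + '"'
            cur = con.execute(f"SELECT * FROM {quoted}")
            target = out / f"{name}.csv"
            with open(target, "w", newline="", encoding="utf-8") as f:
                writer = csv.writer(f)
                writer.writerow([col[0] for col in cur.description])  # header row
                writer.writerows(cur)
            written.append(target)
        return written
    finally:
        con.close()
```

If you would rather end up with a DataFrame, pandas.read_sql_query accepts the same sqlite3 connection, and DataFrame.to_csv covers the export step.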

0

Reading .mdb files from Python requires a third-party package, pyodbc, together with an Access ODBC driver. Below is sample code for reading an .mdb file and writing a table to CSV.

import csv
import pyodbc

MDB = 'c:/path/to/my.mdb'
DRV = '{Microsoft Access Driver (*.mdb)}'
PWD = 'mypassword'

conn = pyodbc.connect('DRIVER=%s;DBQ=%s;PWD=%s' % (DRV,MDB,PWD))
curs = conn.cursor()

SQL = 'SELECT * FROM mytable;' # insert your query here
curs.execute(SQL)

rows = curs.fetchall()

curs.close()
conn.close()

# you could change the 'w' to 'a' for subsequent queries
# newline='' lets the csv module control line endings; the with block closes the file
with open('mytable.csv', 'w', newline='') as f:
    csv_writer = csv.writer(f, lineterminator='\n')
    csv_writer.writerows(rows)

For further information, see the related Stack Overflow questions SO1 and SO2.

2 Comments

I think the Windows dependencies won't work in the Azure environment, since it runs on Linux.
If it is a Linux environment, try this: stackoverflow.com/a/15400363
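
For a Linux host, one route that avoids the Windows ODBC driver is the mdbtools command line mentioned in the first answer: mdb-export writes a table as CSV to standard output by default. This is a rough sketch, assuming mdbtools is installed and on PATH; the helper names and the run parameter (there only so the command runner can be swapped out) are hypothetical.

```python
import subprocess


def mdb_export_command(mdb_path, table):
    """Build the mdb-export call; with no backend flag it emits CSV on stdout."""
    return ["mdb-export", mdb_path, table]


def export_table_to_csv(mdb_path, table, csv_path, run=subprocess.run):
    """Run mdb-export and save its CSV output to csv_path."""
    with open(csv_path, "w", newline="") as f:
        # check=True raises CalledProcessError if mdb-export exits non-zero.
        run(mdb_export_command(mdb_path, table), stdout=f, check=True)
    return csv_path
```

Calling export_table_to_csv('test.mdb', 'mytable', 'mytable.csv') after downloading the blob would then leave a CSV that pd.read_csv can load.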
