Python Script to Count Tables, Columns and Rows in SQLite Database

One of my colleagues asked me for some metrics regarding the size of our project’s database to put in a report, so I wrote a short little script Python to summarize the dimensions of SQLite tables. It prints a list of the tables in the database, along with the number of columns, rows and cells in each.

The command line is as follows:

python.exe DatabaseSizeSummary.py database.sqlite output.txt

Output is to a specified output file, or to standard out if no file is specified. It’s tab delimited for easy Excel import. Here is a sample of the output text:

TableName	Columns	Rows	Cells
Boreholes	25	820	20500
Canals	28	14	392
Pipelines	25	785	19625
Reservoirs	27	387	10449
Siphons	23	23	529

Number of Tables:	5
Total Number of Columns:	128
Total Number of Rows:	2029
Total Number of Cells:	51495

And here is the same output with a little bit of Excel formatting:

output screenshot

Interesting code includes getting the list of tables:

tableListQuery = "SELECT name FROM sqlite_master WHERE type='table' ORDER BY Name"
cursor.execute(tableListQuery)
tables = map(lambda t: t[0], cursor.fetchall())

and getting the number of columns in each table:

columnsQuery = "PRAGMA table_info(%s)" % table
cursor.execute(columnsQuery)
numberOfColumns = len(cursor.fetchall())

You can download the script here, and if you add any nifty features, do share!

PS: This was the first time I wanted to share a code file on my blog, and I didn’t know what the best way to do so was. I tried Google Drive, but couldn’t get a direct download link to the file. I also considered my Bitbucket account, but ended up using the public folder of my Dropbox account. However, I don’t like the URL much, and I’m sure there is a more generally accepted way of doing this. What do you use to share code files on your blog?

Update: I’m trying out gist.github.com, so below is a version controlled, editable version of the script. Thanks to @simondlr and @konradblum for telling me about this awesome service – you can read more about it’s integration with WordPress here: en.support.wordpress.com/gist/


# <author>Pieter Muller</author>
# <date>2012-11-14</date>
import sys
import sqlite3 as sqlite
tablesToIgnore = ["sqlite_sequence"]
outputFilename = None
def Print(msg):
if (outputFilename != None):
outputFile = open(outputFilename,'a')
print >> outputFile, msg
outputFile.close()
else:
print msg
def Describe(dbFile):
connection = sqlite.connect(dbFile)
cursor = connection.cursor()
Print("TableName\tColumns\tRows\tCells")
totalTables = 0
totalColumns = 0
totalRows = 0
totalCells = 0
# Get List of Tables:
tableListQuery = "SELECT name FROM sqlite_master WHERE type='table' ORDER BY Name"
cursor.execute(tableListQuery)
tables = map(lambda t: t[0], cursor.fetchall())
for table in tables:
if (table in tablesToIgnore):
continue
columnsQuery = "PRAGMA table_info(%s)" % table
cursor.execute(columnsQuery)
numberOfColumns = len(cursor.fetchall())
rowsQuery = "SELECT Count() FROM %s" % table
cursor.execute(rowsQuery)
numberOfRows = cursor.fetchone()[0]
numberOfCells = numberOfColumns*numberOfRows
Print("%s\t%d\t%d\t%d" % (table, numberOfColumns, numberOfRows, numberOfCells))
totalTables += 1
totalColumns += numberOfColumns
totalRows += numberOfRows
totalCells += numberOfCells
Print( "" )
Print( "Number of Tables:\t%d" % totalTables )
Print( "Total Number of Columns:\t%d" % totalColumns )
Print( "Total Number of Rows:\t%d" % totalRows )
Print( "Total Number of Cells:\t%d" % totalCells )
cursor.close()
connection.close()
if __name__ == "__main__":
if (len(sys.argv) == 2):
dbFile = sys.argv[1]
Describe(dbFile)
elif (len(sys.argv) == 3):
dbFile = sys.argv[1]
outputFilename = sys.argv[2]
Describe(dbFile)
else:
print "\n\tUsage:"
print "\n\t\tDBDescribe.py {dbFile}"
print "\t\t\tPrints summary of {dbFile} to standard output."
print "\n\t\tDBDescribe.py {dbFile} {outputFile}"
print "\t\t\tAppends summary of {dbFile} to {outputFile}."

UPDATE 2013-02-28: I figured out that pysqlite2 is deprecated and has been replaced by the sqlite3 library, which is bundled with Python 2.5 and up by default, and thus requires no download. I’ve removed the links to pysqlite2 and modified the downloadable script to use sqlite3 instead.

 

This entry was posted in Software Development and tagged , , , , . Bookmark the permalink.

1 Response to Python Script to Count Tables, Columns and Rows in SQLite Database

  1. Pingback: Python, Pythonista und SQLite: Teil 1 | Nilslog

Leave a comment