One of my colleagues asked me for some metrics regarding the size of our project’s database to put in a report, so I wrote a short little script Python to summarize the dimensions of SQLite tables. It prints a list of the tables in the database, along with the number of columns, rows and cells in each.
The command line is as follows:
python.exe DatabaseSizeSummary.py database.sqlite output.txt
Output is to a specified output file, or to standard out if no file is specified. It’s tab delimited for easy Excel import. Here is a sample of the output text:
TableName Columns Rows Cells Boreholes 25 820 20500 Canals 28 14 392 Pipelines 25 785 19625 Reservoirs 27 387 10449 Siphons 23 23 529 Number of Tables: 5 Total Number of Columns: 128 Total Number of Rows: 2029 Total Number of Cells: 51495
And here is the same output with a little bit of Excel formatting:
Interesting code includes getting the list of tables:
tableListQuery = "SELECT name FROM sqlite_master WHERE type='table' ORDER BY Name" cursor.execute(tableListQuery) tables = map(lambda t: t[0], cursor.fetchall())
and getting the number of columns in each table:
columnsQuery = "PRAGMA table_info(%s)" % table cursor.execute(columnsQuery) numberOfColumns = len(cursor.fetchall())
You can download the script here, and if you add any nifty features, do share!
PS: This was the first time I wanted to share a code file on my blog, and I didn’t know what the best way to do so was. I tried Google Drive, but couldn’t get a direct download link to the file. I also considered my Bitbucket account, but ended up using the public folder of my Dropbox account. However, I don’t like the URL much, and I’m sure there is a more generally accepted way of doing this. What do you use to share code files on your blog?
Update: I’m trying out gist.github.com, so below is a version controlled, editable version of the script. Thanks to @simondlr and @konradblum for telling me about this awesome service – you can read more about it’s integration with WordPress here: en.support.wordpress.com/gist/
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| # <author>Pieter Muller</author> | |
| # <date>2012-11-14</date> | |
| import sys | |
| import sqlite3 as sqlite | |
| tablesToIgnore = ["sqlite_sequence"] | |
| outputFilename = None | |
| def Print(msg): | |
| if (outputFilename != None): | |
| outputFile = open(outputFilename,'a') | |
| print >> outputFile, msg | |
| outputFile.close() | |
| else: | |
| print msg | |
| def Describe(dbFile): | |
| connection = sqlite.connect(dbFile) | |
| cursor = connection.cursor() | |
| Print("TableName\tColumns\tRows\tCells") | |
| totalTables = 0 | |
| totalColumns = 0 | |
| totalRows = 0 | |
| totalCells = 0 | |
| # Get List of Tables: | |
| tableListQuery = "SELECT name FROM sqlite_master WHERE type='table' ORDER BY Name" | |
| cursor.execute(tableListQuery) | |
| tables = map(lambda t: t[0], cursor.fetchall()) | |
| for table in tables: | |
| if (table in tablesToIgnore): | |
| continue | |
| columnsQuery = "PRAGMA table_info(%s)" % table | |
| cursor.execute(columnsQuery) | |
| numberOfColumns = len(cursor.fetchall()) | |
| rowsQuery = "SELECT Count() FROM %s" % table | |
| cursor.execute(rowsQuery) | |
| numberOfRows = cursor.fetchone()[0] | |
| numberOfCells = numberOfColumns*numberOfRows | |
| Print("%s\t%d\t%d\t%d" % (table, numberOfColumns, numberOfRows, numberOfCells)) | |
| totalTables += 1 | |
| totalColumns += numberOfColumns | |
| totalRows += numberOfRows | |
| totalCells += numberOfCells | |
| Print( "" ) | |
| Print( "Number of Tables:\t%d" % totalTables ) | |
| Print( "Total Number of Columns:\t%d" % totalColumns ) | |
| Print( "Total Number of Rows:\t%d" % totalRows ) | |
| Print( "Total Number of Cells:\t%d" % totalCells ) | |
| cursor.close() | |
| connection.close() | |
| if __name__ == "__main__": | |
| if (len(sys.argv) == 2): | |
| dbFile = sys.argv[1] | |
| Describe(dbFile) | |
| elif (len(sys.argv) == 3): | |
| dbFile = sys.argv[1] | |
| outputFilename = sys.argv[2] | |
| Describe(dbFile) | |
| else: | |
| print "\n\tUsage:" | |
| print "\n\t\tDBDescribe.py {dbFile}" | |
| print "\t\t\tPrints summary of {dbFile} to standard output." | |
| print "\n\t\tDBDescribe.py {dbFile} {outputFile}" | |
| print "\t\t\tAppends summary of {dbFile} to {outputFile}." | |
UPDATE 2013-02-28: I figured out that pysqlite2 is deprecated and has been replaced by the sqlite3 library, which is bundled with Python 2.5 and up by default, and thus requires no download. I’ve removed the links to pysqlite2 and modified the downloadable script to use sqlite3 instead.
Pingback: Python, Pythonista und SQLite: Teil 1 | Nilslog