I'm reading and processing a fairly large CSV using Pandas and Python 3.7. Header names in the CSV have periods in them ('full stops', Britons say). That's a problem when you want to address data cells by column name as an attribute, for example on the rows returned by itertuples().
So I decided on the approach below. First, a wee test.csv:
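To make the problem concrete, here is a minimal sketch. The inline CSV text read through `io.StringIO` is an assumption standing in for the real file; everything else is plain Pandas. Bracket access still works with a period in the name, but the row tuples from `itertuples()` cannot carry `birth.place` as a field name, so that column ends up under a positional placeholder instead.

```python
import io
import pandas as pd

# Inline stand-in for the real file (assumption: same columns as the sample CSV).
csv_text = (
    '"name","birth.place","not.important"\n'
    '"John","",""\n'
    '"Paul","Liverpool","blue"\n'
)
df = pd.read_csv(io.StringIO(csv_text), usecols=['name', 'birth.place'])

# Bracket access is unaffected by the period in the column name.
print(df['birth.place'])

# Attribute-style access is where it hurts: 'birth.place' is not a valid
# Python identifier, so the named tuple does not expose it under that name.
first_row = next(df.itertuples())
print(first_row._fields)
```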
"name","birth.place","not.important"
"John","",""
"Paul","Liverpool","blue"
Here's the code:
# -*- coding: utf-8 -*-
import pandas as pd
infile = 'test.csv'
useful_cols = ['name', 'birth.place']
df = pd.read_csv(infile, usecols=useful_cols, encoding='utf-8-sig', engine='python')
# replace '.' by '_' (regex=False so the '.' is taken literally, not as a regex wildcard)
df.columns = df.columns.str.replace('.', '_', regex=False)
# we may want to iterate over useful_cols later, so to keep things consistent:
useful_cols = [s.replace('.', '_') for s in useful_cols]
# now we can do this..
print(df['birth_place'])
# ... and this
for row in df.itertuples():
    print(row.birth_place)
# ain't that nice?
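A variant of the same idea, as a sketch only: `DataFrame.rename` accepts a callable for `columns`, so a single helper can clean both the header and the `useful_cols` list. The helper name `clean` and the inline CSV via `io.StringIO` are assumptions for illustration, not part of my real script.

```python
import io
import pandas as pd

# Inline stand-in for test.csv (assumption, so the sketch runs on its own).
csv_text = (
    '"name","birth.place","not.important"\n'
    '"John","",""\n'
    '"Paul","Liverpool","blue"\n'
)
useful_cols = ['name', 'birth.place']
df = pd.read_csv(io.StringIO(csv_text), usecols=useful_cols)


def clean(name):
    """Hypothetical helper: swap periods for underscores in one column name."""
    return name.replace('.', '_')


# rename() calls clean() once per column label and returns a new DataFrame.
df = df.rename(columns=clean)
# Apply the same helper to the list so both stay consistent.
useful_cols = [clean(s) for s in useful_cols]

for row in df.itertuples():
    print(row.birth_place)
```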
It works, but since Pandas is such a powerful library and the use case is quite common, I'm wondering if there isn't an even better way of doing this.
Any thoughts?