RolfBly
Handling periods ('.') in CSV column names with Pandas

I'm reading and processing a fairly large CSV using Pandas and Python 3.7. Header names in the CSV have periods in them ('full stops', Britons say). That's a problem when you want to address data cells by column name.
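To show concretely what goes wrong (this snippet is just an illustration, not part of my actual script):

```python
import pandas as pd

df = pd.DataFrame({"birth.place": ["Liverpool"]})

# Bracket indexing still works fine:
print(df["birth.place"])

# But dotted attribute access can't:  df.birth.place  first looks for a
# column named 'birth' and raises AttributeError.

# And itertuples() silently renames invalid Python identifiers
# to positional names:
row = next(df.itertuples())
print(row._fields)   # ('Index', '_1') -- no 'birth.place' field
```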

So I decided on the approach below. First, a wee test.csv:

"name","birth.place","not.important"
"John","",""
"Paul","Liverpool","blue"

Here's the code:

 
# -*- coding: utf-8 -*-

import pandas as pd

infile = 'test.csv'
useful_cols = ['name', 'birth.place']
df = pd.read_csv(infile, usecols=useful_cols, encoding='utf-8-sig', engine='python')

# replace '.' with '_' (regex=False so the '.' is taken literally,
# not as the regex "any character")
df.columns = df.columns.str.replace('.', '_', regex=False)

# we may want to iterate over useful_cols later, so to keep things consistent:
useful_cols = [s.replace('.', '_') for s in useful_cols]

# now we can do this...
print(df['birth_place'])

# ... and this
for row in df.itertuples():
    print(row.birth_place)

# ain't that nice?

It works, but since Pandas is such a powerful library and the use case is quite common, I'm wondering if there isn't an even better way of doing this.
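One alternative I've come across is to fold the rename into the read itself with DataFrame.rename and a callable, so the sanitized names exist from the start (sketched here with the sample data inlined via StringIO rather than the real file):

```python
import pandas as pd
from io import StringIO

csv_text = (
    '"name","birth.place","not.important"\n'
    '"John","",""\n'
    '"Paul","Liverpool","blue"\n'
)

# Read only the useful columns, then rename in the same expression;
# rename(columns=callable) applies the function to every column label.
df = (
    pd.read_csv(StringIO(csv_text), usecols=["name", "birth.place"])
      .rename(columns=lambda c: c.replace(".", "_"))
)

print(df.columns.tolist())   # ['name', 'birth_place']
```

That avoids keeping a second, separately-sanitized copy of useful_cols in sync by hand.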

Any thoughts?
