RolfBly
Handling periods ('.') in CSV column names with Pandas

I'm reading and processing a fairly large CSV using Pandas and Python 3.7. Header names in the CSV have periods in them ('full stops', Britons say). That's a problem when you want to address data cells by column name.
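To show concretely what goes wrong (this snippet is just an illustration, not part of my actual script):

```python
import pandas as pd

df = pd.DataFrame({"birth.place": ["Liverpool"]})

# Bracket indexing still works fine:
print(df["birth.place"])

# But dotted attribute access can't:  df.birth.place  first looks for a
# column named 'birth' and raises AttributeError.

# And itertuples() silently renames invalid Python identifiers
# to positional names:
row = next(df.itertuples())
print(row._fields)   # ('Index', '_1') -- no 'birth.place' field
```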

So I decided on the approach below. First, a wee test.csv:

"name","birth.place","not.important"
"John","",""
"Paul","Liverpool","blue"

Here's the code:

 
# -*- coding: utf-8 -*-

import pandas as pd

infile = 'test.csv'
useful_cols = ['name', 'birth.place']
df = pd.read_csv(infile, usecols=useful_cols, encoding='utf-8-sig', engine='python')

# replace '.' with '_' (regex=False so the '.' is taken literally,
# not as the regex "any character")
df.columns = df.columns.str.replace('.', '_', regex=False)

# we may want to iterate over useful_cols later, so to keep things consistent:
useful_cols = [s.replace('.', '_') for s in useful_cols]

# now we can do this...
print(df['birth_place'])

# ... and this
for row in df.itertuples():
    print(row.birth_place)

# ain't that nice?

It works, but since Pandas is such a powerful library and the use case is quite common, I'm wondering if there isn't an even better way of doing this.
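One alternative I've come across is to fold the rename into the read itself with DataFrame.rename and a callable, so the sanitized names exist from the start (sketched here with the sample data inlined via StringIO rather than the real file):

```python
import pandas as pd
from io import StringIO

csv_text = (
    '"name","birth.place","not.important"\n'
    '"John","",""\n'
    '"Paul","Liverpool","blue"\n'
)

# Read only the useful columns, then rename in the same expression;
# rename(columns=callable) applies the function to every column label.
df = (
    pd.read_csv(StringIO(csv_text), usecols=["name", "birth.place"])
      .rename(columns=lambda c: c.replace(".", "_"))
)

print(df.columns.tolist())   # ['name', 'birth_place']
```

That avoids keeping a second, separately-sanitized copy of useful_cols in sync by hand.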

Any thoughts?
