csv - how do I get the value of a row in a certain line using python and pandas?

Question

My csv looks like this:

name;        street;         number;
------------------------------------
Jimmy;       Nice Street 24; 24;
Carl;        Great Street;   128;
Tim;         Long Street 5;   - ;
...

I read that csv with pandas like this:

data = pd.read_csv(r'export.csv')
x = data[['name', 'street', 'number']]

As you can see, the users did not enter their address correctly in lines 1 and 3: the house number ended up in the street field.

So what I want to do is check each street value for a house number. If there is one, remove it from the street column and put it in the number column, provided that column doesn't hold a number yet. Afterwards, all lines should look like line 2.

I am new to python and pandas and can't figure out the smoothest way to do this. Any input is much appreciated!

mensik · Accepted Answer · 2017-05-30 11:45:31Z

1

I would consider removing the trailing separators from the source csv, but it is not necessary.

This code will do the magic:

import pandas as pd
import re


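def check_street_no(row):
    # look for digits at the very end of the street value
    number_match = re.search(r'\d+$', row['street'])
    if number_match is not None:
        # overwrite 'number' with those digits ...
        row['number'] = number_match.group()
        # ... and cut them (plus the spaces before them) off the street
        row['street'] = re.sub(r' *\d+$', '', row['street'])
    return row

# skiprows=[1] skips the dashed line, skipinitialspace drops the padding after ';'
data = pd.read_csv(r'streets.csv', sep=';', skiprows=[1], skipinitialspace=True)
# axis=1 calls the function once per row
data = data.apply(check_street_no, axis=1)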
print(data)

Output:

    name        street number
0  Jimmy   Nice Street     24
1   Carl  Great Street    128
2    Tim   Long Street      5
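
Note that the function above always overwrites number with whatever it finds at the end of street, while the question asks to fill it in only if there isn't a number yet. A minimal sketch of that variant follows. It assumes that "no number" shows up as NaN, an empty string or '-' (as in the Tim row). The names move_house_number, TRAILING_NUMBER and CSV_TEXT are made up for illustration, and the sample data is pasted inline so the snippet runs without the csv file.

```python
import io
import re

import pandas as pd

# Inline copy of the sample from the question (assumption: the real file
# has the same layout, including the dashed line and the trailing ';').
CSV_TEXT = """name;        street;         number;
------------------------------------
Jimmy;       Nice Street 24; 24;
Carl;        Great Street;   128;
Tim;         Long Street 5;   - ;
"""

# optional whitespace followed by digits at the very end of the string
TRAILING_NUMBER = re.compile(r"\s*(\d+)$")


def move_house_number(row):
    """Cut a trailing number off 'street'; fill 'number' only if it is empty."""
    row = row.copy()  # do not modify the row object handed in by apply
    match = TRAILING_NUMBER.search(row["street"])
    if match is None:
        return row
    row["street"] = row["street"][: match.start()]
    # Assumption: NaN, '' and '-' all mean "no number entered yet".
    current = "" if pd.isna(row["number"]) else str(row["number"]).strip()
    if current in ("", "-"):
        row["number"] = match.group(1)
    return row


data = pd.read_csv(io.StringIO(CSV_TEXT), sep=";", skiprows=[1], skipinitialspace=True)
# the trailing ';' produces an extra, completely empty column; drop it
data = data.dropna(how="all", axis=1)
data = data.apply(move_house_number, axis=1)
print(data)
```

With the sample data this gives the same three rows as above; the difference only shows when street and number disagree, in which case the existing number is kept.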

answered May 30, 2017 at 11:45

mensik


jezrael · Accepted Answer · 2017-05-30 11:49:25Z

0

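You can use str.extract to split the street column into a street and a number part, then combine_first to replace the NaNs of the rows that did not match with the original values. To restore the original column order, reindex the result on df.columns: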

df = pd.read_csv(r'export.csv', sep=';', skiprows=[1], skipinitialspace=True)

# if necessary, remove columns that contain only NaNs
df = df.dropna(how='all', axis=1)
df1 = df['street'].str.extract(r'(?P<street>[a-zA-Z\s]+) (?P<number>\d+)', expand=True)
print(df1)
        street number
0  Nice Street     24
1          NaN    NaN
2  Long Street      5

# reindex(columns=...) is used because reindex_axis was removed in pandas 1.0
df = df1.combine_first(df).reindex(columns=df.columns)
print(df)
    name        street number
0  Jimmy   Nice Street     24
1   Carl  Great Street    128
2    Tim   Long Street      5
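
One thing to keep in mind: combine_first gives priority to the extracted values, so a number found in street replaces whatever is already in number. If an existing number should win instead, something along these lines might work. It is only a sketch, assuming that a valid number consists of digits only and that anything else (such as '-') counts as missing. The names extracted, has_number and CSV_TEXT are made up for illustration, and the data is pasted inline so it runs without the csv file.

```python
import io

import pandas as pd

# Inline copy of the sample from the question (assumed layout).
CSV_TEXT = """name;        street;         number;
------------------------------------
Jimmy;       Nice Street 24; 24;
Carl;        Great Street;   128;
Tim;         Long Street 5;   - ;
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=";", skiprows=[1], skipinitialspace=True)
df = df.dropna(how="all", axis=1)  # drop the empty column from the trailing ';'

# Split "<street> <digits>" into two columns; rows without trailing digits give NaN.
extracted = df["street"].str.extract(r"^(?P<street>.*?)\s*(?P<number>\d+)$")

# True where the number column already holds digits only.
has_number = df["number"].astype(str).str.strip().str.fullmatch(r"\d+", na=False)

# Keep existing numbers, otherwise take the one extracted from the street.
df["number"] = df["number"].where(has_number, extracted["number"])
# Use the cleaned street where the pattern matched, else keep the original.
df["street"] = extracted["street"].fillna(df["street"])
print(df)
```

Since the columns are updated in place, the column order stays as it was and no reindexing is needed.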

edited May 30, 2017 at 11:49

answered May 30, 2017 at 11:44

jezrael
