
For example, we have a CSV file containing:

name age address
john 25 koramangala banglore #@ sales maneger %$ india
harshuth rao 36 belandur banglore #@ maneger %$ india
vijay kumar 45 ulsoor banglore #@ sales maneger %$ india
suhas 25 koramangala banglore #@analist %$ india
mithun 22 venkatapura banglore #@ execitive %$ india

How can I split the address so that each part goes into its own column, like this?

name           age  city                  country     position 
john           25   koramangala banglore  india       sales maneger
harshuth rao   36   belandur banglore     india       maneger
vijay kumar    45   ulsoor banglore       india       sales maneger
suhas          25   koramangala banglore  india       analist
mithun         22   venkatapura banglore  india       execitive

The code I am using is:

import re
import csv
with open("/home/vipul/Desktop/example.csv", 'rb') as f:
    mycsv = csv.reader(f)
    for row in mycsv:
        text = row[0]
        txt = re.findall(r'(\w+[\s\w]*)\b', text)
        print txt

This is how the file looks in a text editor:

name ,age ,address
john,25,koramangala banglore +ACMAQA- sales maneger +ACUAJA- india
harshuth rao ,36,belandur banglore +ACMAQA-  maneger +ACUAJA- india 
vijay kumar,45,ulsoor banglore +ACMAQA- sales maneger +ACUAJA- india
suhas,25,koramangala banglore +ACMAQA-analist +ACUAJA- india
mithun,22,venkatapura banglore +ACMAQA- execitive +ACUAJA- india
  • Read the space-separated CSV as a pandas DataFrame and split the city column on the special characters. Commented Feb 20, 2018 at 3:13
  • I am new to coding and don't know much. Can you help me? Commented Feb 20, 2018 at 3:15

2 Answers


First, load your data using pd.read_csv:

import pandas as pd

df = pd.read_csv("/home/vipul/Desktop/example.csv", sep=',')

print(df)
           name   age                                             address
0           john    25  koramangala banglore +ACMAQA- sales maneger +A...
1  harshuth rao     36  belandur banglore +ACMAQA-  maneger +ACUAJA- i...
2    vijay kumar    45  ulsoor banglore +ACMAQA- sales maneger +ACUAJA...
3          suhas    25  koramangala banglore +ACMAQA-analist +ACUAJA- ...
4         mithun    22  venkatapura banglore +ACMAQA- execitive +ACUAJ...

Next, use str.split to break the address into its parts, then pd.concat to join them back onto the original columns:

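# Raw string avoids invalid-escape warnings; the pattern matches each "+...-" marker and surrounding whitespace.
v = df.pop('address').str.split(r'\s*\+.*?-\s*', expand=True)
v.columns = ['city', 'position', 'country']

# axis must be passed by keyword in recent pandas versions.
df = pd.concat([df, v], axis=1)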

print(df)
           name   age                   city       position country
0           john    25  koramangala banglore  sales maneger   india
1  harshuth rao     36     belandur banglore        maneger  india 
2    vijay kumar    45       ulsoor banglore  sales maneger   india
3          suhas    25  koramangala banglore        analist   india
4         mithun    22  venkatapura banglore      execitive   india
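
The two steps above can be tried without the file on disk. This is a minimal sketch, assuming the sample rows from the question are pasted into a string; the `raw` variable and the `.str.strip()` cleanup of headers and values are illustrative additions, not part of the original answer.

```python
import io

import pandas as pd

# Sample rows copied from the question; `raw` is a hypothetical stand-in for the file.
raw = """name ,age ,address
john,25,koramangala banglore +ACMAQA- sales maneger +ACUAJA- india
harshuth rao ,36,belandur banglore +ACMAQA-  maneger +ACUAJA- india
vijay kumar,45,ulsoor banglore +ACMAQA- sales maneger +ACUAJA- india
suhas,25,koramangala banglore +ACMAQA-analist +ACUAJA- india
mithun,22,venkatapura banglore +ACMAQA- execitive +ACUAJA- india"""

df = pd.read_csv(io.StringIO(raw))
# Header cells carry trailing spaces ("name ", "age "), so normalise them first.
df.columns = df.columns.str.strip()

# Split on each "+...-" marker together with any surrounding whitespace.
parts = df.pop("address").str.split(r"\s*\+.*?-\s*", expand=True)
parts.columns = ["city", "position", "country"]

df = pd.concat([df, parts], axis=1)

# Trim leftover whitespace in the text columns (e.g. "harshuth rao ").
for col in ["name", "city", "position", "country"]:
    df[col] = df[col].str.strip()

print(df)
```

Stripping the header names matters here because the file's header is `name ,age ,address`, so without it the columns would be called `"name "` and `"age "`.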

Finally, save to CSV:

df.to_csv("/home/vipul/Desktop/new.csv", index=False)  # index=False keeps the 0..4 row labels out of the file
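
A side observation that may simplify the split: `+ACMAQA-` and `+ACUAJA-` are how `#@` and `%$` are written in UTF-7, which matches the markers shown at the top of the question. The sketch below assumes the file really is UTF-7 encoded; the inline `raw_bytes` sample is a hypothetical stand-in for the file's contents and is not from the original answer. It decodes first and then splits on the literal markers.

```python
import io
import re

import pandas as pd

# Hypothetical stand-in for open(path, "rb").read(); assumes the file is UTF-7.
raw_bytes = (
    b"name ,age ,address\n"
    b"john,25,koramangala banglore +ACMAQA- sales maneger +ACUAJA- india\n"
    b"suhas,25,koramangala banglore +ACMAQA-analist +ACUAJA- india\n"
)

# Python ships a "utf-7" codec; decoding turns +ACMAQA- into "#@" and +ACUAJA- into "%$".
text = raw_bytes.decode("utf-7")

df = pd.read_csv(io.StringIO(text))
df.columns = df.columns.str.strip()

# Split on the decoded literal markers, swallowing surrounding whitespace.
pattern = r"\s*(?:" + re.escape("#@") + "|" + re.escape("%$") + r")\s*"
parts = df.pop("address").str.split(pattern, expand=True)
parts.columns = ["city", "position", "country"]
df = pd.concat([df, parts], axis=1)

print(df)
```

If the file turns out not to be UTF-7, the regex split on `+...-` shown earlier still applies unchanged.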

13 Comments

It's a CSV file and I don't know how to do it. I am new to Python and pandas. How do I import the file, remove the address column, add 2 more columns, and save the file? Please help!
@VipulRao Start with df = pd.read_csv("/home/vipul/Desktop/example.csv") and then run this code. If that doesn't work, please open your CSV in sublime text or notepad and paste 5 rows in your question.
@COLDSPEED I tried the code and it's not working. I have edited my question.
@coldspeed I don't understand. Can you show me how to import and save? Can you give me the entire code, please?
@VipulRao See edit. I have literally served the answer to you on a platter. Please try to understand what is happening so you can figure these things out yourself next time.

Pass a regular expression as the sep argument of read_csv, so that the commas and both markers are all treated as delimiters:

import io
import pandas as pd

t = """name ,age , address
john,25,koramangala banglore +ACMAQA- sales maneger +ACUAJA- india
harshuth rao ,36,belandur banglore +ACMAQA-  maneger +ACUAJA- india 
vijay kumar,45,ulsoor banglore +ACMAQA- sales maneger +ACUAJA- india
suhas,25,koramangala banglore +ACMAQA-analist +ACUAJA- india
mithun,22,venkatapura banglore +ACMAQA- execitive +ACUAJA- india"""

# A regex separator requires the python engine; the raw string keeps the backslashes intact.
df = pd.read_csv(io.StringIO(t),
                 sep=r'\s*\+ACMAQA-\s*|\s*\+ACUAJA-\s*|\s*,\s*', engine='python')
# The header has 3 fields but each row has 5, so the extra leading fields land in the index.
df = df.reset_index()
df.columns = ["name", "age", "city", "position", "country"]
print(df)


    name          age   city                   position        country
0   john           25   koramangala banglore   sales maneger   india
1   harshuth rao   36   belandur banglore      maneger         india
2   vijay kumar    45   ulsoor banglore        sales maneger   india
3   suhas          25   koramangala banglore   analist         india
4   mithun         22   venkatapura banglore   execitive       india
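
To see what that separator does to a single line, the same pattern can be tried with `re.split` from the standard library. This is only an illustrative check, not part of the original answer; the `SEP`, `header` and `line` names are made up for the example.

```python
import re

# Same separator pattern as in the read_csv call: either marker, or a comma,
# each with optional surrounding whitespace.
SEP = re.compile(r"\s*\+ACMAQA-\s*|\s*\+ACUAJA-\s*|\s*,\s*")

header = "name ,age , address"
line = "suhas,25,koramangala banglore +ACMAQA-analist +ACUAJA- india"

print(SEP.split(header))  # ['name', 'age', 'address']
print(SEP.split(line))    # ['suhas', '25', 'koramangala banglore', 'analist', 'india']
```

The header yields three fields while each data row yields five, which appears to be why the answer calls `reset_index()` and then assigns all five column names by hand.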

1 Comment

Gosh damn, I wouldn't want to be an analist... also see your column headers
