0

Hello everyone I am learning python I am new I have a column in a csv file with this example of value: enter image description here

I want to divide the column programme based on that semi column into two columns for example

program 1: H2020-EU.3.1.
program 2: H2020-EU.3.1.7.

This is what I wrote initially

import csv
import os
with open('IMI.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    
    with open('new_IMI.csv', 'w') as new_file:
              csv_writer = csv.writer(new_file, delimiter='\t')
              
    #for line in csv_reader:
       # csv_writer.writerow(line)

please note that after i do the split of columns I need to write the file again as a csv and save it to my computer

Please guide me

3
  • pandas provides a very easy to use .to_csv method. Perhaps do some research on this. This can be used with the .read_csv method which will read the CSV into a DataFrame object where the column can easily be split, then written back out. Commented Dec 7, 2021 at 12:37
  • I did this but I was blocked : import pandas as pd df = pd.read_csv('IMI2.csv', sep=';',encoding='cp1252' it will bring parser error and if I do this it won't put data in the right format: import pandas as pd df = pd.read_csv('IMI2.csv', sep=';',encoding='cp1252', names=['colA', 'colB', 'colC', 'colD','colE', 'colG','colH', 'colI', 'colJ', 'colK' , 'colL', 'colM', 'colN', 'colO', 'colP', 'colQ', 'colR', 'colS', 'colT', 'colU']) df Commented Dec 7, 2021 at 12:41
  • Does this answer your question? How to split a column in csv file into multiple column in python jupyter? Please avoid asking the same question twice Commented Dec 7, 2021 at 12:53

2 Answers 2

1

Using .loc to iterate through each row of a dataframe is somewhat inefficient. Better to split an entire column, with the expand=True to assign to the new columns. Also as stated, easy to use pandas here:

Code:

import pandas as pd

df = pd.read_csv('IMI.csv')
df[['programme1','programme2']] = df['programme'].str.split(';', expand=True)
df.drop(['programme'], axis=1, inplace=True)

df.to_csv('IMI.csv', index=False)

Example of output:

Before:

print(df)
       id     acronym  status                   programme           topics
0  945358  BIGPICTURE  SIGNED  H2020-EU.3.1.;H2020-EU3.1.7  IMI2-2019-18-01
1  821362      EBiSC2  SIGNED  H2020-EU.3.1.;H2020-EU3.1.7  IMI2-2017-13-06
2  116026     HARMONY  SIGNED                H202-EU.3.1.  IMI2-2015-06-04

After:

print(df)
       id     acronym  status           topics    programme1     programme2
0  945358  BIGPICTURE  SIGNED  IMI2-2019-18-01  H2020-EU.3.1.  H2020-EU3.1.7
1  821362      EBiSC2  SIGNED  IMI2-2017-13-06  H2020-EU.3.1.  H2020-EU3.1.7
2  116026     HARMONY  SIGNED  IMI2-2015-06-04  H2020-EU.3.1.           None
Sign up to request clarification or add additional context in comments.

Comments

0

You can use pandas library instead of csv.

import pandas as pd
df = pd.read_csv('IMI.csv')
p1 = {}
p2 = {}

for i in range(len(df)):
    if ';' in df['programme'].loc[i]:
        p1[df['id'].loc[i]] = df['programme'].loc[i].split(';')[0]
        p2[df['id'].loc[i]] = df['programme'].loc[i].split(';')[1]

df['programme1'] = df['id'].map(p1)
df['programme2'] = df['id'].map(p2)

and if you want to delete programme column:

df.drop('programme', axis=1)

To save new csv file:

df.to_csv('new_file.csv', inplace=True)

3 Comments

if you want to iterate through each row, probably better to use for i, row in df.iterrows(): as the way you have it is dependant on if the index is numerical sequence from 0 to n.
secondly, if you want to drop the column, you'll need to reassign the df or use inplace=True
About first comment, i wrote this for data shown in the picture. And for second, yes, my bad. I forgot to write `inplace=True'.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.