Merging records from two '.CSV' files in python

Question

I have two '.csv' files in the below format: First File :

Roll_num Class_Name
1         ABC
2         DEF
5         PQR
27        UVW

Second File :

Roll_num Marks Grade
1        75    A
2        60    C
27       68    B
61       45    E

Now i want to add a column in the second file appending a column 'Class_Name' from First File. The data in both the files has duplicates in it and is not sorted.

I have written the following code that writes our required data from 2 files into a new file.

import csv

path="xyz"
file_read=open(path + "ClassName.CSV", "r")
reader_ClassName = csv.reader(file_read)

read_all_data=open(path + "Student.CSV", "r")
reader_Student =csv.reader(read_all_data)
write_all_data=open( path +"Student_Classname.CSV", "w")

for line_Student in reader_Student:
        Roll_Student=line_Student[0]
        for line_ClassName in reader_ClassName:
            Roll_ClassName=line_ClassName[0]
            ClassName=line_ClassName[1]         
            if(Roll_ClassName == Roll_Student):
                string= Roll_Student +","+ClassName  +"\n"
                print string
                write_all_data.write(string)
                break

Output Expected :

Roll_num Marks Grade Class_Name
1        75    A     ABC
2        60    C     DEF
27       68    B     UVW
61       45    E     LMN

Output our code generates:

   Roll_num Marks Grade Class_Name
    1        75    A     ABC
    2        60    C     DEF

There is some issue in reading the Third line from Second inner 'for' loop. We have hundreds of thousands of records in both the files.

Is there missing data? There is no value "LMN" in the input, but it appears in the output. — Marichyasana
– Marichyasana, Commented Dec 10, 2014 at 7:29
There is more data in both the files.So I have just given example. — Rohit
– Rohit, Commented Dec 10, 2014 at 11:21

Phuc Tran · Accepted Answer · 2014-12-10 08:00:49Z

1

I suggest to avoid loop in loop by reading the whole ClassName.csv file and put into an dictionary first. I suggest the idea below

mydict = {}
for each_line in ClassName_csvfile:
     rollnum = get_roll_num()
     classname = get_class_name()
     mydict[rollnum]=classname 

for each_line in Student_csv_file:
     rollnum = get_roll_num()
     mark = get_mark()
     grade = get_grade()
     Classname = ''
     if mydict.has_key(rollnum):
        Classname = mydict[rollnum]
     writetofile(rollnum, mark, grade, Classname)

Update: you can use if rollnum in mydict: instead of mydict.has_key(rollnum) if you are using Python 2.3+. I am using python 2.7 and both works

P/s: Sorry for not commenting as it requires me 50 reputations

edited Dec 10, 2014 at 8:00

answered Dec 10, 2014 at 7:41

Phuc Tran

17012 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Kannan Mohan · Accepted Answer · 2014-12-10 08:08:00Z

0

I have named first CSV file as hash.csv and second CSV file as data.csv. Below script will help you.

import re

# Building up a hash with roll_num and class_name from hash.csv
chash = dict([ re.split('\s+', x.strip()) for x in open('hash.csv').readlines()][1:])

# Building a list of students record from data.csv
data = [ re.split('\s+', x.strip()) for x in open('data.csv').readlines() ][1:]

# Iterating through each data
for x in data:
    if x[0] in chash:
        x.append(chash[x[0]])
        print('{0:<5} {1:<5} {2:<5} {3:<5}'.format(*x))
    else:
        print('{0:<5} {1:<5} {2:<5}'.format(*x))

Output:

1     75    A  ABC
2     60    C  DEF
27    68    B  UVW
61    45    E

edited Dec 10, 2014 at 8:08

answered Dec 10, 2014 at 7:47

Kannan Mohan

1,85012 silver badges16 bronze badges

Collectives™ on Stack Overflow

Merging records from two '.CSV' files in python

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related