0

I have two '.csv' files in the below format: First File :

Roll_num Class_Name
1         ABC
2         DEF
5         PQR
27        UVW

Second File :

Roll_num Marks Grade
1        75    A
2        60    C
27       68    B
61       45    E

Now i want to add a column in the second file appending a column 'Class_Name' from First File. The data in both the files has duplicates in it and is not sorted.

I have written the following code that writes our required data from 2 files into a new file.

import csv

path="xyz"
file_read=open(path + "ClassName.CSV", "r")
reader_ClassName = csv.reader(file_read)

read_all_data=open(path + "Student.CSV", "r")
reader_Student =csv.reader(read_all_data)
write_all_data=open( path +"Student_Classname.CSV", "w")

for line_Student in reader_Student:
        Roll_Student=line_Student[0]
        for line_ClassName in reader_ClassName:
            Roll_ClassName=line_ClassName[0]
            ClassName=line_ClassName[1]         
            if(Roll_ClassName == Roll_Student):
                string= Roll_Student +","+ClassName  +"\n"
                print string
                write_all_data.write(string)
                break  

Output Expected :

Roll_num Marks Grade Class_Name
1        75    A     ABC
2        60    C     DEF
27       68    B     UVW
61       45    E     LMN

Output our code generates:

   Roll_num Marks Grade Class_Name
    1        75    A     ABC
    2        60    C     DEF

There is some issue in reading the Third line from Second inner 'for' loop. We have hundreds of thousands of records in both the files.

4
  • 1
    Is there missing data? There is no value "LMN" in the input, but it appears in the output. Commented Dec 10, 2014 at 7:29
  • Are values in Roll_num unique in each file? Commented Dec 10, 2014 at 7:40
  • Try it with less data. Commented Dec 10, 2014 at 8:20
  • There is more data in both the files.So I have just given example. Commented Dec 10, 2014 at 11:21

2 Answers 2

1

I suggest to avoid loop in loop by reading the whole ClassName.csv file and put into an dictionary first. I suggest the idea below

mydict = {}
for each_line in ClassName_csvfile:
     rollnum = get_roll_num()
     classname = get_class_name()
     mydict[rollnum]=classname 

for each_line in Student_csv_file:
     rollnum = get_roll_num()
     mark = get_mark()
     grade = get_grade()
     Classname = ''
     if mydict.has_key(rollnum):
        Classname = mydict[rollnum]
     writetofile(rollnum, mark, grade, Classname)

Update: you can use if rollnum in mydict: instead of mydict.has_key(rollnum) if you are using Python 2.3+. I am using python 2.7 and both works

P/s: Sorry for not commenting as it requires me 50 reputations

Sign up to request clarification or add additional context in comments.

Comments

0

I have named first CSV file as hash.csv and second CSV file as data.csv. Below script will help you.

import re

# Building up a hash with roll_num and class_name from hash.csv
chash = dict([ re.split('\s+', x.strip()) for x in open('hash.csv').readlines()][1:])

# Building a list of students record from data.csv
data = [ re.split('\s+', x.strip()) for x in open('data.csv').readlines() ][1:]

# Iterating through each data
for x in data:
    if x[0] in chash:
        x.append(chash[x[0]])
        print('{0:<5} {1:<5} {2:<5} {3:<5}'.format(*x))
    else:
        print('{0:<5} {1:<5} {2:<5}'.format(*x))

Output:

1     75    A  ABC
2     60    C  DEF
27    68    B  UVW
61    45    E

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.