0

I am trying to solve a simple problem...I have a file called data.csv with the following data:

enroll_code,student_id
10030,55000
10030,55804
10250,55804
10510,55000

What I am trying to do is load the file, read its contents, and count the number of rows for each enroll_code. Without using Pandas, how can this be done? Here's what I have tried so far:

file = open('data.csv')
csv_reader = csv.reader(file)
next(csv_reader)
for key, value in csv_reader.items():
    print(key, len([item for item in csv_reader if item]))
3
  • Why are you doing csv_reader.items()? Commented Feb 28, 2021 at 2:53
  • Just curious, but why don't you want to use pandas? Commented Feb 28, 2021 at 2:53
  • @KillerToilet more than one way to skin a cat... Commented Feb 28, 2021 at 2:56

2 Answers

1

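The problem is in how you read the CSV file: csv.reader returns an iterator over rows (each row is a list of strings), so it has no .items() method. Here is a snippet that reads the file with csv.DictReader and counts the rows for each enroll_code.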

import csv

with open("data.csv", 'r') as file:
    csv_file = csv.DictReader(file)
    count = {}
    for row in csv_file:
        entry = dict(row)
        if entry['enroll_code'] in count:
            count[entry['enroll_code']] += 1
        else:
            count[entry['enroll_code']] = 1
    print(count)

{'10030': 2, '10250': 1, '10510': 1}

The counting logic lives inside the for loop: a dictionary keyed by enroll_code is incremented once per row. All the best.
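
The same tally can also be written with collections.Counter from the standard library, which removes the if/else branch. This is a minimal sketch rather than part of the original answer: the helper name count_enrollments is hypothetical, and the in-memory sample simply copies the data.csv contents from the question so the example runs without a file on disk.

```python
import csv
import io
from collections import Counter


def count_enrollments(lines):
    """Count rows per enroll_code from an iterable of CSV lines (hypothetical helper)."""
    reader = csv.DictReader(lines)  # the first line is used as the header
    return Counter(row["enroll_code"] for row in reader)


# Sample data copied from the question. A real file opened with
# open("data.csv", newline="") can be passed in the same way.
sample = io.StringIO(
    "enroll_code,student_id\n"
    "10030,55000\n"
    "10030,55804\n"
    "10250,55804\n"
    "10510,55000\n"
)

counts = count_enrollments(sample)
for code, n in counts.items():
    print(f"{code}: {n}")
```

Because the keys come straight from the CSV, they are strings ('10030'), not integers; convert with int() if numeric keys are needed.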


3 Comments

I don't know why, but it seems like he wants to try it without using any data science library.
@Frank for simple CSV file operation, developers don't want to use data science library. Even I would prefer the CSV library instead of NumPy, pandas if it is an operation like this.
Yes. I am using pandas in my project actually. I am just confused by this problem setting. :d
0

Without using Pandas. How can this be done?

Say you read a .csv like

0,1,1,2,3,

Short Answer

NumPy.

import numpy as np
tmp = np.loadtxt(path, dtype=str, delimiter=",")

To get the length of the data, just print the shape of tmp.

print(tmp.shape)

To do it without using any libraries:

def csv_reader(datafile):
    data = []
    with open(datafile, "r") as f:
        header = f.readline().split(",")  # read the header row
        counter = 0
        for line in f:
            data.append(line) # you can split the line later.
            fields = line.split(",")
            print("line: ",line, " ", fields)
            counter += 1

    return data

if __name__ == '__main__':
    csv_reader("0.csv")
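
The line-by-line parser above can be adapted to produce per-code counts. The following is a sketch under stated assumptions, not the answerer's code: the helper name count_codes is hypothetical, the sample lines copy the question's data, and the plain split(",") assumes no quoted fields or embedded commas.

```python
def count_codes(lines):
    """Tally first-column values, skipping the header (hypothetical helper)."""
    counts = {}
    iterator = iter(lines)
    next(iterator, None)  # skip the header row, if there is one
    for line in iterator:
        line = line.strip()  # drop the trailing newline
        if not line:
            continue  # ignore blank lines
        code = line.split(",")[0]  # first column is enroll_code
        counts[code] = counts.get(code, 0) + 1
    return counts


# Sample lines copied from the question; an open file object
# (which yields lines when iterated) can be passed instead.
sample_lines = [
    "enroll_code,student_id\n",
    "10030,55000\n",
    "10030,55804\n",
    "10250,55804\n",
    "10510,55000\n",
]

counts = count_codes(sample_lines)
for code, n in counts.items():
    print(f"{code}: {n}")
```

Using dict.get with a default of 0 keeps the loop free of an explicit membership test.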


4 Comments

let's try it without using any data science library...
@RichardBroyles Updated. Check it out.
Wasn't what I had hoped for. Here's the output:
line: 10030,55000 ['10030', '55000\n']
line: 10030,55804 ['10030', '55804\n']
line: 10250,55804 ['10250', '55804\n']
line: 10510,55000 ['10510', '55000\n']
Looking for something like this:
10030: 2
10250: 1
10510: 1
@RichardBroyles Hi, I am only posting the idea here. You need to modify the parser according to the data format in your CSV file.
