2

Basically I'm trying to add together the count column values of item columns with the same name in a csv file. I then need to sort the results in ascending alphabetic order by item column values. For example:

Leading Cause, Deaths
Diabetes Mellitus, 123
Influenza and Pneumonia, 325
Diabetes Mellitus, 100

I need to add the values 123 and 100 to get a new row for Diabetes.

It should look like this:
    Diabetes Mellitus, 223.

This is the code I have so far:

import csv
import sys

with open(sys.argv[1], 'r') as File:
    reader = csv.reader(File)
    itemindex = sys.argv[2]
    countindex = sys.argv[3]
    item column = 0
    count column = 0
    first row = True
    dictionary = {}

    for row in reader:
       if firstrow == True:
          firstrow = False
          itemcolumn = row.index(itemindex)
          countcolumn = row.index(countindex)
       else:
           if item column in dictionary:
               # Add the item at the row's count column (converted to an int) to the
               # prexisting entry with that item column.
           else:
               #create a new entry in the dictionary

       print(itemindex + "," + countindex)

for key, value in sorted(dictionary)
    print(key + "," + value)

The commented parts are the ones I'm stuck on.

3 Answers 3

1

If you have access you could use the pandas package to handle the csv.

Text file titled values.txt

Leading Cause, Deaths
Diabetes Mellitus, 123
Influenza and Pneumonia, 325
Diabetes Mellitus, 1008

The desired data frame can be achieved with:

import pandas as pd

data = pd.read_csv('values.txt')
print(data)

sum_data = data.groupby(['Leading Cause']).sum()
print(sum_data)

print(sum_data.loc['Diabetes Mellitus'])

Which would output

             Leading Cause   Deaths
0        Diabetes Mellitus      123
1  Influenza and Pneumonia      325
2        Diabetes Mellitus     1008

                          Deaths
Leading Cause                   
Diabetes Mellitus           1131
Influenza and Pneumonia      325

 Deaths    1131
Name: Diabetes Mellitus, dtype: int64
Sign up to request clarification or add additional context in comments.

1 Comment

pandas is a great package to manage data within csv files, and even for excel files.
0

Using a collections.defaultdict, which is a specialized dictionary class, would make it easy:

import collections
import csv
import os
import sys

try:
    filename = sys.argv[1]
    itemindex = int(sys.argv[2])
    countindex = int(sys.argv[3])
except IndexError:
    print('Error:\n  Usage: {} <file name> <item index> <count index>'.format(
            os.path.basename(sys.argv[0])))
    sys.exit(-1)

with open(filename, 'r', newline='') as file:
    reader = csv.reader(file, skipinitialspace=True)
    next(reader)  # Skip first row.

    counter = collections.defaultdict(int)
    for row in reader:
        disease, deaths = row[itemindex], int(row[countindex])
        counter[disease] += deaths

for key, value in sorted(counter.items()):
    print('{}, {}'.format(key, value))

Sample usage:

python3 script_name.py diseases.csv 0 1    

Sample output:

Diabetes Mellitus, 223
Influenza and Pneumonia, 325

Comments

0

You could try not using the libraries provided and just loop through the data as text. Parsing the text.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.