Finding average of every column from CSV file using Python?

Question

I have a CSV file with several columns and several rows; please see the picture [1]. The picture shows only the first two baskets, but the original CSV file has hundreds of them. [1]: https://i.sstatic.net/R2ZTo.png

I would like to calculate the average for every fruit in every basket using Python. Here is my code, but it doesn't work as it should. Are there better ideas? I also tried to fix this by importing and using numpy, but I didn't succeed.

I would appreciate any help or suggestions! I'm totally new to this.

import csv
from operator import itemgetter


fileLineList = []
averageFruitsDict = {} # Creating an empty dictionary here.

with open('Fruits.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        fileLineList.append(row)

for column in fileLineList:
    highest = 0
    lowest = 0
    total = 0
    average = 0
    for column in row:
        if column.isdigit():
            column = int(column)
            if column > highest:
                highest = column
            if column < lowest or lowest == 0:
                lowest = column
            total += column    
    average = total / 3
  
    averageFruitsDict[row[0]] = [highest, lowest, round(average)]

averageFruitsList = []


for key, value in averageFruitsDict.items():
    averageFruitsList.append([key, value[2]])


print('\nFruits in Baskets\n')
print(averageFruitsList)

--- So I'm now trying this code:

import pandas as pd

fruits = pd.read_csv('fruits.csv', sep=';')
print(list(fruits.columns))
fruits['Unnamed: 0'].fillna(method='ffill', inplace = True)
fruits.groupby('Unnamed: 0').mean()
fruits.groupby('Bananas').mean()
fruits.groupby('Apples').mean()
fruits.groupby('Oranges').mean()
fruits.to_csv('results.csv', index=False)

It creates a new CSV file and the file looks correct. I don't get any errors, but I can't make it calculate the mean of every fruit for every basket. Thankful for all help!

row is not defined before the line for column in row: did you mean for row in fileLineList: and then for column in row: ?
– Abhi_J, Commented Mar 15, 2021 at 10:22
Read the CSV as a pandas data frame and use df["columnname"].mean().
– user6304394, Commented Mar 15, 2021 at 10:23
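
A minimal sketch of the loop structure Abhi_J's comment points at: iterate over rows first, then over the cells of each row. It assumes every data row holds a label followed by integer values; the sample data and the function name row_averages are hypothetical and do not come from the original Fruits.csv. It also divides by the number of values actually found instead of a fixed 3.

```python
import csv
import io

# Hypothetical sample data; the real Fruits.csv layout is an assumption.
sample = "Name,Mon,Tue,Wed\nBananas,3,5,4\nApples,10,20,30\n"


def row_averages(csvfile):
    """Return {first cell: mean of the integer cells} for each row."""
    averages = {}
    for row in csv.reader(csvfile):        # outer loop: one row at a time
        # inner step: keep only the cells that are plain integers
        numbers = [int(cell) for cell in row if cell.isdigit()]
        if numbers:                        # header rows have no numbers
            averages[row[0]] = sum(numbers) / len(numbers)
    return averages


result = row_averages(io.StringIO(sample))
print(result)
```

With a real file, the same function could be called as row_averages(open('Fruits.csv', newline='')), ideally inside a with block.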

jojo_040 · Accepted Answer · 2021-03-15 10:42:28Z

1

Using the image you posted, I created an identical test CSV called fruit.csv and put together this quick solution using pandas.

import pandas as pd
fruit = pd.read_csv('fruit.csv')

The unnamed column contains the basket numbers with NaNs in between, so we fill each NaN with the preceding value. We can then group by the basket number (using the 'Unnamed: 0' column) and apply the mean to all other columns.

fruit['Unnamed: 0'] = fruit['Unnamed: 0'].ffill()

fruit.groupby('Unnamed: 0').mean()

This gives the desired output: the average of each fruit for each basket (please note that I made up the values for basket 3).
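
For readers without the image, here is a self-contained sketch of the same steps. The CSV text is made up and only mimics the assumed layout, in which the basket label appears on the first row of each basket and is empty on the following rows; the name basket_means is illustrative.

```python
import io

import pandas as pd

# Made-up data in the assumed layout: the first column has no header
# and is filled in only on the first row of each basket.
csv_text = (
    ",Bananas,Apples,Oranges\n"
    "Basket 1,2,4,6\n"
    ",4,6,8\n"
    "Basket 2,10,20,30\n"
    ",20,40,50\n"
)

fruit = pd.read_csv(io.StringIO(csv_text))

# pandas names the header-less first column 'Unnamed: 0'.
# Forward-fill so that every row carries its basket label.
fruit["Unnamed: 0"] = fruit["Unnamed: 0"].ffill()

# One row per basket, one mean per fruit column.
basket_means = fruit.groupby("Unnamed: 0").mean()
print(basket_means)
```

The result has to be assigned to a name (or printed), because groupby(...).mean() returns a new data frame and leaves fruit unchanged.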

edited Mar 15, 2021 at 10:42

answered Mar 15, 2021 at 10:37


10 Comments

siniviikuna Over a year ago

Thanks very much! This looks just like I hoped the final result would be. However, I'm getting an error: Exception has occurred: KeyError 'Unnamed: 0'. If I have understood correctly, that error means that the key doesn't exist? The file path should be correct. I'm not familiar with pandas (yet!), but does it automatically add the average values to the original CSV file, or do I have to add fruit.to_csv('fruit.csv', index=False) to the code? I'm trying to get the KeyError fixed so that I can try the saving. I'm sorry if my questions are stupid, but I'm still very much a beginner. Thank you!

jojo_040 Over a year ago

I'd have to see your code in order to tell you where that error comes from. If you replicate what I have done, but with the correct file name (i.e. 'Fruits.csv'), there should be no issue. To answer your second question: no, the changes will not be made to the original CSV automatically. However, you can store the grouped dataframe (with averages) as a new dataframe, i.e. new_df = fruit.groupby('Unnamed: 0').mean(), and then create a CSV with the results using new_df.to_csv('new_file_name.csv').
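
A short sketch of that save step, under the same assumed layout as the answer. The sample data is made up, and the output goes to a temporary directory so the example does not overwrite anything; new_df and new_file_name.csv follow the names used in the comment above.

```python
import io
import os
import tempfile

import pandas as pd

# Made-up data in the assumed layout (basket label only on the first row).
csv_text = (
    ",Bananas,Apples,Oranges\n"
    "Basket 1,2,4,6\n"
    ",4,6,8\n"
    "Basket 2,10,20,30\n"
    ",20,40,50\n"
)
fruit = pd.read_csv(io.StringIO(csv_text))
fruit["Unnamed: 0"] = fruit["Unnamed: 0"].ffill()

# Keep the grouped result; groupby(...).mean() does not modify fruit.
new_df = fruit.groupby("Unnamed: 0").mean()

# Write the averages, not the original frame. The basket labels live in
# the index, so the index is kept (no index=False here).
out_path = os.path.join(tempfile.mkdtemp(), "new_file_name.csv")
new_df.to_csv(out_path)

# Read the file back to check what was written.
saved = pd.read_csv(out_path, index_col=0)
print(saved)
```

Calling fruits.to_csv(...) on the original frame, as in the question's second attempt, writes the forward-filled input rather than the averages, because the groupby results there are never stored.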

jojo_040 Over a year ago

Hi, no problem, happy to help. There appears to be an issue with how you reference the column name, which is throwing the KeyError. Can you please list the column titles using fruits.columns and tell me what you get? Thanks.

jojo_040 Over a year ago

My suspicion is that when reading in the CSV, yours has for some reason given that 'Unnamed' column a slightly different title, which is throwing the error (it could be all lower case, for example, or contain no spaces). This is easily resolved by printing the list of column titles using print(list(fruits.columns)) and referencing the correct column title.

jojo_040 Over a year ago

Can you try referencing the column for another purpose, just to see whether there are KeyErrors there too, i.e. print(fruits['Unnamed: 0'])? A KeyError in pandas generally means it can't find the thing you are looking for, so it must be something to do with the column title.
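
One possible cause of such a KeyError, offered as an assumption: the asker's later code passes sep=';', which suggests the file is semicolon-separated. Read with the default comma separator, a file like that collapses into a single column, so 'Unnamed: 0' never appears. The sketch below uses made-up data to compare the column titles in both cases.

```python
import io

import pandas as pd

# Made-up semicolon-separated data in the assumed layout.
csv_text = ";Bananas;Apples;Oranges\nBasket 1;2;4;6\n;4;6;8\n"

# Default separator is a comma: the whole header becomes one column title.
wrong = pd.read_csv(io.StringIO(csv_text))

# Matching separator: the header-less first column is named 'Unnamed: 0'.
right = pd.read_csv(io.StringIO(csv_text), sep=";")

print(list(wrong.columns))
print(list(right.columns))
```

Printing list(fruits.columns) right after read_csv, as suggested above, shows immediately which of the two situations applies.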
