
I have a CSV file with several columns and several rows (see the picture: https://i.sstatic.net/R2ZTo.png). The picture shows only the first two baskets, but the original CSV file has hundreds of them.

I would like to calculate the average of every fruit in every basket using Python. Here is my code, but it doesn't work as it should. Any better ideas? I also tried to fix this by importing and using numpy, but I didn't succeed.

I would appreciate any help or suggestions! I'm completely new to this.

import csv
from operator import itemgetter


fileLineList = []
averageFruitsDict = {} # Creating an empty dictionary here.

with open('Fruits.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        fileLineList.append(row)

for column in fileLineList:
    highest = 0
    lowest = 0
    total = 0
    average = 0
    for column in row:
        if column.isdigit():
            column = int(column)
            if column > highest:
                highest = column
            if column < lowest or lowest == 0:
                lowest = column
            total += column    
    average = total / 3
  
    averageFruitsDict[row[0]] = [highest, lowest, round(average)]

averageFruitsList = []


for key, value in averageFruitsDict.items():
    averageFruitsList.append([key, value[2]])


print('\nFruits in Baskets\n')
print(averageFruitsList)

Update: I'm now trying this code:

import pandas as pd

fruits = pd.read_csv('fruits.csv', sep=';')
print(list(fruits.columns))
fruits['Unnamed: 0'].fillna(method='ffill', inplace = True)
fruits.groupby('Unnamed: 0').mean()
fruits.groupby('Bananas').mean()
fruits.groupby('Apples').mean()
fruits.groupby('Oranges').mean()
fruits.to_csv('results.csv', index=False)

It creates a new CSV file that looks correct and I don't get any errors, but I can't get it to calculate the mean of every fruit for every basket. Thanks for any help!
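The pandas attempt above mixes two separate issues: groupby(...).mean() returns a new DataFrame, so calling it without assigning the result leaves fruits unchanged and fruits.to_csv(...) writes the original rows (with the basket column filled in); and grouping by 'Bananas' puts together rows that share a banana count, not rows from the same basket. A minimal sketch with made-up numbers, assuming the basket column has already been forward-filled:

```python
import pandas as pd

# Made-up numbers; the basket column is assumed to be forward-filled already.
fruits = pd.DataFrame({
    'Unnamed: 0': [1, 1, 1, 2, 2, 2],
    'Bananas': [4, 2, 3, 5, 7, 0],
    'Apples': [2, 4, 6, 1, 3, 2],
    'Oranges': [6, 3, 0, 2, 4, 6],
})

# groupby(...).mean() builds and returns a new DataFrame; `fruits` itself
# still has one row per original line, so keep the result in a variable.
per_basket = fruits.groupby('Unnamed: 0').mean()

# Grouping by a fruit column puts rows with the same banana count together,
# so the index holds banana counts, not basket numbers.
per_banana_count = fruits.groupby('Bananas').mean()

print(per_basket)
print(per_banana_count.index.tolist())
```

A single groupby on the basket column already averages every fruit column, so no per-fruit groupby calls are needed.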

  • row is never set by your outer loop: after the reading loop it still holds the last row of the file, so for column in row: processes that same row on every iteration. Did you mean for row in fileLineList: and then for column in row:? Commented Mar 15, 2021 at 10:22
  • Read the CSV into a pandas DataFrame and use df["columnname"].mean(). Commented Mar 15, 2021 at 10:23
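Following the first comment, here is a minimal sketch of a csv-module-only version that loops over the rows properly and averages each fruit per basket. It assumes the layout visible in the picture and described in the answer: an empty first header cell, fruit names in the other header cells, and the basket label only on the first row of each basket. The function name basket_fruit_averages and the sample data are hypothetical.

```python
import csv
import io
from collections import defaultdict


def basket_fruit_averages(lines, delimiter=','):
    """Return {basket: {fruit: average}} for CSV lines in the assumed layout.

    Assumed layout (hypothetical, based on the picture): an empty first
    header cell, fruit names in the other header cells, and the basket
    label only on the first row of each basket.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    fruits = header[1:]
    totals = defaultdict(lambda: [0.0] * len(fruits))  # per-basket sums per fruit
    counts = defaultdict(int)                            # rows seen per basket
    basket = None
    for row in reader:              # the outer loop binds `row` to each line
        if not row:
            continue                # skip blank lines
        if row[0].strip():
            basket = row[0].strip() # a new basket starts; carry its label forward
        for i, cell in enumerate(row[1:]):  # inner loop over this row's cells
            totals[basket][i] += float(cell)
        counts[basket] += 1
    return {
        b: {fruit: totals[b][i] / counts[b] for i, fruit in enumerate(fruits)}
        for b in totals
    }


# Made-up sample data in the assumed layout.
sample = """,Bananas,Apples,Oranges
1,4,2,6
,2,4,3
,3,6,0
2,5,1,2
,7,3,4
,0,2,6
"""
result = basket_fruit_averages(io.StringIO(sample))
print(result)
```

For the real file you would pass the open file object, e.g. inside with open('Fruits.csv', newline='') as f:, and use delimiter=';' if the file is semicolon-separated, as the sep=';' in the pandas attempt suggests. Dividing by the number of rows actually seen avoids the hard-coded / 3 in the original loop.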

1 Answer

Using the image you posted, I recreated an identical test CSV called fruit.csv and was able to put together this quick solution using pandas.

import pandas as pd
fruit = pd.read_csv('fruit.csv')

[Image: the test DataFrame read from fruit.csv]

The unnamed column contains the basket numbers with NaNs in between, so we fill each NaN with the preceding value. This lets us group by the basket number (the 'Unnamed: 0' column) and apply the mean to all other columns.

fruit['Unnamed: 0'] = fruit['Unnamed: 0'].ffill()

fruit.groupby('Unnamed: 0').mean()

This gives you the desired output: the average of each fruit for each basket (note that I made up the values for basket 3).

[Image: the resulting per-basket fruit averages]
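A self-contained version of the steps above, using a small made-up comma-separated sample in place of fruit.csv (the numbers are hypothetical; the layout follows the description above). It assigns the forward-filled column back rather than using fillna(method='ffill', inplace=True), since newer pandas versions deprecate the method= argument and warn about in-place changes on a selected column.

```python
import io
import pandas as pd

# Made-up sample: empty first header, basket number only on a basket's first row.
sample = """,Bananas,Apples,Oranges
1,4,2,6
,2,4,3
,3,6,0
2,5,1,2
,7,3,4
,0,2,6
"""

fruit = pd.read_csv(io.StringIO(sample))   # empty first header -> 'Unnamed: 0'

# Replace each NaN in the basket column with the preceding basket number.
fruit['Unnamed: 0'] = fruit['Unnamed: 0'].ffill()

# One groupby on the basket column averages every fruit column at once.
averages = fruit.groupby('Unnamed: 0').mean()
print(averages)
```

Because the basket column contained NaNs when it was read, pandas stores it as floats, so the resulting index holds 1.0 and 2.0 rather than 1 and 2.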


10 Comments

Thanks very much! This looks just like I hoped the final result would be. However, I'm getting an error: Exception has occurred: KeyError 'Unnamed: 0'. If I understand correctly, that error means the key doesn't exist? The file path should be correct, though. I'm not familiar with pandas (yet!), but does it automatically add the average values to the original CSV file, or do I have to add fruit.to_csv('fruit.csv', index=False) to the code? I'm trying to fix the KeyError so that I can try saving. Sorry if my questions are stupid, I'm still a beginner. Thank you!
I'd have to see your code to tell you where that error comes from. If you replicate what I have done but with the correct file name, i.e. 'Fruits.csv', there should be no issue. To answer your second question: no, the changes will not be made to the original CSV automatically. However, you can store the grouped DataFrame (with averages) as a new DataFrame, i.e. new_df = fruit.groupby('Unnamed: 0').mean(), and then create a CSV with the results using new_df.to_csv('new_file_name.csv').
Hi, no problem, happy to help. There appears to be an issue with how you reference the column name, which is raising the KeyError. Can you please print the column titles using fruits.columns and tell me what you get? Thanks
My suspicion is that when your CSV is read in, that 'Unnamed' column gets a slightly different title for some reason, which raises the error (it could be all lower case, for example, or contain no spaces). This is easily resolved by printing the list of column titles with print(list(fruits.columns)) and referencing the correct column title.
Can you try referencing the column for another purpose just to check that there are no KeyErrors there, i.e. print(fruits['Unnamed: 0'])? A KeyError in pandas generally means it can't find what you are looking for, so it must be something to do with the column title.
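One plausible cause of the KeyError, offered as an assumption rather than something confirmed in this thread: the question's pandas code reads its file with sep=';', while the answer's read_csv call uses the default comma separator, and a semicolon-delimited file read with sep=',' produces no 'Unnamed: 0' column at all. The sketch below uses made-up data and writes to new_file_name.csv inside a temporary directory. It shows the separator effect and then saves the grouped averages to a new file, leaving the original data untouched.

```python
import io
import os
import tempfile

import pandas as pd

# Made-up semicolon-delimited sample in the question's layout.
sample = """;Bananas;Apples;Oranges
1;4;2;6
;2;4;3
2;5;1;2
;7;3;4
"""

# Default sep=',' finds no commas, so the whole header line becomes one
# column name and there is no 'Unnamed: 0' column (hence a KeyError).
wrong = pd.read_csv(io.StringIO(sample))
print(list(wrong.columns))

# With the matching separator the empty first header becomes 'Unnamed: 0'.
fruit = pd.read_csv(io.StringIO(sample), sep=';')
print(list(fruit.columns))

fruit['Unnamed: 0'] = fruit['Unnamed: 0'].ffill()
new_df = fruit.groupby('Unnamed: 0').mean()

# Write the averages to a NEW file; `fruit` and the source data are unchanged.
# Keep the index (the default): after groupby it holds the basket numbers.
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, 'new_file_name.csv')
    new_df.to_csv(out_path)
    saved = pd.read_csv(out_path, index_col=0)

print(saved)
```

Note that index=False, as proposed in the earlier comment, would drop the basket numbers from the grouped result, because after groupby they live in the index rather than in a column.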