0

I want to create a column in a pandas dataframe that would add the values of the other columns (which are 0 or 1s). the column is called "sum"

my HEADPandas looks like:

     Application  AnsSr sum  Col1 Col2 Col3 .... Col(n-2) Col(n-1) Col(n)
date 28-12-11      0.0   0.0  28/12/11      ....    ...Dates...    28/12/11
~00c        0      0.0   0.0  0    0     0  ....    0       0       0
~00pr       0      0.0   0.0  0    0     0  ....    0       0       0
~00te       0      0.0   0.0  0    0     1  ....    0       0       1

in an image from pythoneverywhere: enter image description here

expected result (assuming there would be no more columns

     Application  AnsSr sum  Col1  Col2  Col3 .... Col(n-2) Col(n-1) Col(n)
date 28-12-11      0.0   nan  28/12/11        ....    ...Dates...    28/12/11
~00c        0      0.0   0.0   0    0     0  ....    0      0        0
~00pr       0      0.0   0.0   0    0     0  ....    0      0        0
~00te       0      0.0   2     0    0     1  ....    0      0        1

as you see the values of 'sum' are kept 0 even if there are 1s values in some columns. what Am I doing wrong?

The basics of the code are:

theMatrix=pd.DataFrame([datetime.today().strftime('%Y-%m-%d')],['Date'],['Application'])
theMatrix['Ans'] = 0
theMatrix['sum'] = 0

so far so good then I add all the values with loc. and then I want to add up values with

theMatrix.fillna(0, inplace=True)
# this being the key line:
theMatrix['sum'] = theMatrix.sum(axis=1)
theMatrix.sort_index(axis=0, ascending=True, inplace=True)

As you see in the result (attached image) the sum remains 0. I had a look to here or here and to the pandas documentation at no avail. Actually the expression:

theMatrix['sum'] = theMatrix.sum(axis=1)

I got it from there.

changing this last line by:

theMatrix['sum'] = theMatrix[3:0].sum(axis=1)

in order to avoid to sum the first three columns gives as result:

     Application  AnsSr sum  Col1  Col2  Col3 .... Col(n-2) Col(n-1) Col(n)
date 28-12-11      0.0   nan  28/12/11        ....    ...Dates...    28/12/11
~00c        0      0.0   nan   1    1     0  ....    0      0        0
~00pr       0      0.0   1.0   0    0     0  ....    0      0        1
~00te       0      0.0   0     0    0     0  ....    0      0        0

please observe two things: a) how in row '~00c' sum is nan but there are 1s in that row. b) before the calculating the sum the code theMatrix.fillna(0, inplace=True) should have change all possible nan into 0 so the sum should never be nan since in theory there are no nan values in any of the columns[3:]

it wouldnt work.

some idea?

thanks

PS: Later edition, just in case you wondere how the dataframe is populated: reading and parsing an XML and the lines are:

# myDocId being the name of the columns
# concept being the index.
theMatrix.loc[concept,myDocId]=1 
12
  • 1
    Please post samples in text format, images samples are NOT recommended. Commented Dec 11, 2018 at 14:19
  • Thanks for the coment. Image is only a small image of the dataframe. all the code is in text Commented Dec 11, 2018 at 14:27
  • Yes, please add TEXT sample of input and expected data in your post. Commented Dec 11, 2018 at 14:30
  • 1
    Your first slice [3:] is cutting the columns off, your still summing data in the date column providing you the nan value in the first column. In the second column, it may or may not be a value error, or issue in the data itself. I would try the updated code below to see if that would help. I try not to call functions on data of incorrect type. Commented Dec 11, 2018 at 16:44
  • 1
    Hi @johnnyb, its a hell of a lot of data, but you gave me an idea: pd.set_option('display.max_columns', None) pd.set_option('display.max_rows', None) and go over all the data after printing. Commented Dec 11, 2018 at 16:46

2 Answers 2

1

If I understand correctly, this can help you:

import pandas as pd
import datetime

#create dataframe following your example
theMatrix=pd.DataFrame([datetime.datetime.today().strftime('%Y-%m-%d')],['Date'],['Application'])
theMatrix['Ans'] = 0
theMatrix['col1'] = 1
theMatrix['col2'] = 1

# create 'sum' column with summed values from certain columns
theMatrix['sum'] = theMatrix['col1'] + theMatrix['col2']
Sign up to request clarification or add additional context in comments.

3 Comments

Could it be a problem that the first row is actually not "addable" since it is populated with dates. I am using pythoneverywhere and in the console the 'sum' column would not add the values
theMatrix['sum'] = theMatrix[3:].sum(axis=1) adds up some values, but not all, because there are some values in sum that now are NaN even if a) there are some 1s in the columns and b) if there are any it should remain 0
@JoseBerlines Updated to match your dataset a bit clearer.
1

Any data you choose to sum, just add to a list, and use that list to provide to your sum function, with axis=1. This will provide you the desired outcome. Here is a sample related to your data.

Sample File Data:

Date,a,b,c
bad, bad, bad, bad # Used to simulate your data better
2018-11-19,1,0,0
2018-11-20,1,0,0
2018-11-21,1,0,1
2018-11-23,1,nan,0 # Nan here is just to represent the missing data
2018-11-28,1,0,1
2018-11-30,1,nan,1 # Nan here is just to represent the missing data
2018-12-02,1,0,1

Code:

import pandas as pd
df = pd.read_csv(yourdata.filename) # Your method of loading the data
#rows_to_sum = ['a','b','c'] # The rows you wish to summarize
rows_to_sum = df.columns[1:] # Alternate method to select remainder of rows.
df = df.fillna(0) # used to fill the NaN you were talking about below. 
df['sum'] = df[rows_to_sum][1:].astype(int).sum(axis=1) # skip the correct amount of rows here. 
# Also, the use of astype(int), is due to the bad data read from the top. So redefining it here, allows you to sum it appropriately. 
print(df)

Output:

       Date     a     b     c  sum
        bad   bad   bad   bad  NaN
 2018-11-19     1     0     0  1.0
 2018-11-20     1     0     0  1.0
 2018-11-21     1     0     1  2.0
 2018-11-23     1     0     0  1.0
 2018-11-28     1     0     1  2.0
 2018-11-30     1     0     1  2.0
 2018-12-02     1     0     1  2.0

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.