I have a pandas DataFrame with 118 columns and I'd like to add a new column, 'x119'. I tried several methods that all seem to work, such as:

df = df.assign(x119=F)

or:

df.loc[:,'x119'] = F

The methods seem to add the column to the df dataframe but when I use:

df.describe()

I still get 118 columns. Has anyone encountered this situation? The column seems to exist when I call df['x119'], but it is not shown in the output of df.describe().

EDIT: The values of F are categorical with numeric values 1, 2, 3. The column 'x119' did not exist in df before, and when I use df2=df and then df2.describe() it works fine and I can see all columns.

  • It is categorical data with numeric labels: 1, 2, 3. (Commented Sep 6, 2017 at 6:06)

2 Answers

Case 1: all datatypes are numeric:

df.describe() works fine after df.assign(...) for numeric datatypes. Here is a reproducible example:

>>> import pandas as pd
>>> df = pd.DataFrame([[1,2],[3,4]], columns=list('AB'))
>>> df
   A  B
0  1  2
1  3  4
>>> import numpy as np 
>>> df["C"] = np.nan 
>>> df
   A  B   C
0  1  2 NaN
1  3  4 NaN
>>> df.describe()
              A         B    C
count  2.000000  2.000000  0.0
mean   2.000000  3.000000  NaN
std    1.414214  1.414214  NaN
min    1.000000  2.000000  NaN
25%    1.500000  2.500000  NaN
50%    2.000000  3.000000  NaN
75%    2.500000  3.500000  NaN
max    3.000000  4.000000  NaN
>>> df.assign(D=5)
   A  B   C  D
0  1  2 NaN  5
1  3  4 NaN  5
>>> df.describe()
              A         B    C
count  2.000000  2.000000  0.0
mean   2.000000  3.000000  NaN
std    1.414214  1.414214  NaN
min    1.000000  2.000000  NaN
25%    1.500000  2.500000  NaN
50%    2.000000  3.000000  NaN
75%    2.500000  3.500000  NaN
max    3.000000  4.000000  NaN
>>> df  = df.assign(D=5)
>>> df.describe()
              A         B    C    D
count  2.000000  2.000000  0.0  2.0
mean   2.000000  3.000000  NaN  5.0
std    1.414214  1.414214  NaN  0.0
min    1.000000  2.000000  NaN  5.0
25%    1.500000  2.500000  NaN  5.0
50%    2.000000  3.000000  NaN  5.0
75%    2.500000  3.500000  NaN  5.0
max    3.000000  4.000000  NaN  5.0
  • Make sure you assign the result of df.assign back to df, as in df = df.assign(...). df.assign returns a new DataFrame and does not modify df in place.

Case 2: mixed numeric and object datatypes:

For mixed object and numeric datatypes, you need to call df.describe(include='all'), as mentioned in the Notes section of the pandas documentation:

For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns. If include='all' is provided as an option, the result will include a union of attributes of each type.

>>> df["E"] = ['1','2']
>>> df
   A  B   C  D  E
0  1  2 NaN  5  1
1  3  4 NaN  5  2
>>> df.describe()
              A         B    C    D
count  2.000000  2.000000  0.0  2.0
mean   2.000000  3.000000  NaN  5.0
std    1.414214  1.414214  NaN  0.0
min    1.000000  2.000000  NaN  5.0
25%    1.500000  2.500000  NaN  5.0
50%    2.000000  3.000000  NaN  5.0
75%    2.500000  3.500000  NaN  5.0
max    3.000000  4.000000  NaN  5.0
>>> df
   A  B   C  D  E
0  1  2 NaN  5  1
1  3  4 NaN  5  2
>>> 

so you need to call describe as follows:

>>> df.describe(include='all')
               A         B    C    D    E
count   2.000000  2.000000  0.0  2.0    2
unique       NaN       NaN  NaN  NaN    2
top          NaN       NaN  NaN  NaN    2
freq         NaN       NaN  NaN  NaN    1
mean    2.000000  3.000000  NaN  5.0  NaN
std     1.414214  1.414214  NaN  0.0  NaN
min     1.000000  2.000000  NaN  5.0  NaN
25%     1.500000  2.500000  NaN  5.0  NaN
50%     2.000000  3.000000  NaN  5.0  NaN
75%     2.500000  3.500000  NaN  5.0  NaN
max     3.000000  4.000000  NaN  5.0  NaN
>>> 
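
The example above uses an object column, while the question describes categorical data with numeric labels. A minimal sketch of that case follows, assuming F is a pandas Categorical (the question does not show how F was built, so the frame and the values here are hypothetical). It suggests that a category column is skipped by the default describe() even though its labels are numbers, and that include='all' or include=['category'] brings it back:

```python
import pandas as pd

# Hypothetical stand-in for the asker's data: two numeric columns
# plus a categorical column whose labels happen to be numbers.
df = pd.DataFrame({"x1": [0.5, 1.5, 2.5], "x2": [10, 20, 30]})
F = pd.Categorical([1, 2, 3])  # assumption: F has category dtype
df["x119"] = F

# Default describe() summarizes numeric columns only.
default_cols = list(df.describe().columns)

# include='all' summarizes every column, whatever its dtype.
all_cols = list(df.describe(include="all").columns)

# include=['category'] restricts the summary to categorical columns.
cat_cols = list(df.describe(include=["category"]).columns)

print(default_cols)
print(all_cols)
print(cat_cols)
```

If the labels should be treated as ordinary numbers instead, converting the column with df['x119'].astype(int) would also make it appear in the default summary.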
3 Comments

Unfortunately it is not empty. If I use df2=df and then call df2.describe(), it works fine.
Thank you! The solution was the include='all' which included the categorical numeric data as well!
@AR_ yes that was it :)

I think the problem may be that the x119 column was already in df, so the assignment only overwrote its values.

You can check it by:

print (df['x119'])

The simplest way to add a new column is:

print (len(df.columns))
df['x119'] = F
print (len(df.columns))
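
The check above can be made a little more explicit. This is only a sketch with a hypothetical one-column frame and a hypothetical categorical F: it tests membership in df.columns before assigning, compares the column counts, and looks at the dtype of the new column, which is what decides whether the default df.describe() will show it:

```python
import pandas as pd

# Hypothetical small frame standing in for the 118-column one.
df = pd.DataFrame({"x1": [1.0, 2.0, 3.0]})
F = pd.Categorical([1, 2, 3])  # assumption: F is categorical

# Did the column exist before the assignment?
existed_before = "x119" in df.columns

n_before = len(df.columns)
df["x119"] = F
n_after = len(df.columns)

# The dtype tells you whether the default describe() will include it.
dtype_name = str(df["x119"].dtype)

print(existed_before, n_before, n_after, dtype_name)
```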

9 Comments

Thank you for your answer, just edited my post to clarify.
OK, if you check the number of columns, is it always the same?
Tried exactly what you suggested. Got 117 and 118 respectively. This is so weird :/
And is print (len(df.columns)) the same before and after?
I didn't use include='all' for categorical data. Thank you for your time!