
I want to assign values to the diagonal of a dataframe. The fastest way I can think of is to use numpy's np.diag_indices and do an indexed assignment on the values array. However, the values array is only a view, and therefore ready to accept assignment, when the dataframe is of a single dtype.

Consider the dataframes d1 and d2:

import numpy as np
import pandas as pd
d1 = pd.DataFrame(np.ones((3, 3), dtype=int), columns=['A', 'B', 'C'])
d2 = pd.DataFrame(dict(A=[1, 1, 1], B=[1., 1., 1.], C=[1, 1, 1]))

d1

   A  B  C
0  1  1  1
1  1  1  1
2  1  1  1

d2

   A    B  C
0  1  1.0  1
1  1  1.0  1
2  1  1.0  1

Then let's get our indices:

i, j = np.diag_indices(3)
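For reference, this is what np.diag_indices hands back and how it behaves on a plain NumPy array, with no DataFrame involved. np.fill_diagonal is a separate NumPy function (not used in the question) that performs the same assignment in place in one call; the arrays a and b are illustrative only.

```python
import numpy as np

i, j = np.diag_indices(3)   # row indices [0, 1, 2] and column indices [0, 1, 2]

a = np.ones((3, 3), dtype=int)
a[i, j] = 0                 # integer-array indexing pairs i and j: (0, 0), (1, 1), (2, 2)

b = np.ones((3, 3), dtype=int)
np.fill_diagonal(b, 0)      # same result, written into b in place

print(a)
print(np.array_equal(a, b))
```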

d1 is of a single dtype, so this works:

d1.values[i, j] = 0
d1

   A  B  C
0  0  1  1
1  1  0  1
2  1  1  0

But it does not work on d2:

d2.values[i, j] = 0
d2

   A    B  C
0  1  1.0  1
1  1  1.0  1
2  1  1.0  1
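One way to see why the write is lost on d2: with int64 and float64 columns, .values has to build a new array of a common dtype (float64 here), so the assignment lands in that temporary array rather than in d2. A minimal check, reusing the question's d2:

```python
import numpy as np
import pandas as pd

d2 = pd.DataFrame(dict(A=[1, 1, 1], B=[1., 1., 1.], C=[1, 1, 1]))
i, j = np.diag_indices(3)

v = d2.values      # mixed dtypes: pandas combines the columns into a new array
print(v.dtype)     # float64, so the int64 columns were upcast (copied)
v[i, j] = 0        # modifies only the temporary array v
print(d2)          # d2 still contains only ones
```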

I need to write a function that fails when df is of mixed dtype. How do I test for that? And if the dataframe is of a single dtype, can I trust that this assignment via the view will always work?

2 Comments

  • You can inspect d1.dtypes, which is itself a Series, and check whether all of its values are the same. Commented Oct 17, 2017 at 15:52
  • You mean d2.dtypes.nunique()>1? Commented Oct 17, 2017 at 15:52

3 Answers


You could use the internal _is_mixed_type property:

In [3600]: d2._is_mixed_type
Out[3600]: True

In [3601]: d1._is_mixed_type
Out[3601]: False

Or check the number of unique dtypes:

In [3602]: d1.dtypes.nunique()>1
Out[3602]: False

In [3603]: d2.dtypes.nunique()>1
Out[3603]: True
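Putting the nunique() check to work on the question's use case, here is a minimal sketch of a function that refuses mixed-dtype frames. set_diagonal is a hypothetical name, and raising TypeError is an assumption about what "fail" should mean. Rather than writing through df.values (whether that array is a writable view depends on the pandas version; under Copy-on-Write, for example, it may come back read-only), it fills the diagonal of an explicit copy and returns a new DataFrame.

```python
import numpy as np
import pandas as pd


def set_diagonal(df, value):
    """Hypothetical helper: return a copy of df with its diagonal set to value.

    Raises TypeError for mixed-dtype frames (assumed meaning of "fail").
    """
    if df.dtypes.nunique() > 1:
        found = ", ".join(sorted(str(dt) for dt in set(df.dtypes)))
        raise TypeError(f"DataFrame has mixed dtypes: {found}")
    # Fill an explicit copy instead of relying on df.values being a writable view.
    arr = df.to_numpy(copy=True)
    np.fill_diagonal(arr, value)  # sets arr[k, k] for k < min(n_rows, n_cols)
    return pd.DataFrame(arr, index=df.index, columns=df.columns)


d1 = pd.DataFrame(np.ones((3, 3), dtype=int), columns=['A', 'B', 'C'])
d2 = pd.DataFrame(dict(A=[1, 1, 1], B=[1., 1., 1.], C=[1, 1, 1]))

result = set_diagonal(d1, 0)
print(result)

try:
    set_diagonal(d2, 0)
    refused = False
except TypeError as exc:
    refused = True
    print("refused:", exc)
```

Because it returns a new frame, the caller reassigns (e.g. d1 = set_diagonal(d1, 0)); the original d1 is left untouched.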

A bit of a detour: _is_mixed_type checks how the blocks are consolidated.

In [3618]: len(d1.blocks)>1
Out[3618]: False

In [3619]: len(d2.blocks)>1
Out[3619]: True

In [3620]: d1.blocks    # same as d1.as_blocks()
Out[3620]:
{'int32':    A  B  C
 0  0  1  1
 1  1  0  1
 2  1  1  0}

In [3621]: d2.blocks
Out[3621]:
{'float64':      B
 0  1.0
 1  1.0
 2  1.0, 'int64':    A  C
 0  1  1
 1  1  1
 2  1  1}
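DataFrame.blocks and as_blocks() were deprecated and are no longer available in recent pandas versions, so the session above will not run there as written. If you only want to see which columns share a dtype, a rough public-API equivalent is to group the column labels by df.dtypes (the helper name columns_by_dtype is hypothetical). Note that this groups by dtype, not by pandas' internal blocks: an unconsolidated frame can hold several blocks of the same dtype.

```python
from collections import defaultdict

import numpy as np
import pandas as pd


def columns_by_dtype(df):
    """Hypothetical helper: map each dtype name to the columns that have it."""
    groups = defaultdict(list)
    for col, dtype in df.dtypes.items():
        groups[str(dtype)].append(col)
    return dict(groups)


d1 = pd.DataFrame(np.ones((3, 3), dtype=int), columns=['A', 'B', 'C'])
d2 = pd.DataFrame(dict(A=[1, 1, 1], B=[1., 1., 1.], C=[1, 1, 1]))

print(columns_by_dtype(d1))   # one dtype for all three columns
print(columns_by_dtype(d2))   # {'int64': ['A', 'C'], 'float64': ['B']}
```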

2 Comments

I have seen this so many times in the source code :) but couldn't come up with this answer. +1
I'm trying to use the method is_mixed_type, but it says it's not defined. Has this changed in more recent Python versions?
Compare the number of distinct column dtypes with 1:

def check_type(df):
    return len(set(df.dtypes)) == 1

or

def check_type(df):
    return df.dtypes.nunique() == 1
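A quick sanity check that the two versions agree, run on the question's frames plus an empty DataFrame. The empty case is an addition not covered in the answer: with no columns there are no dtypes, so both versions return False. The functions are renamed here only so both can be defined side by side.

```python
import numpy as np
import pandas as pd


def check_type_set(df):
    # count distinct dtype objects with a Python set
    return len(set(df.dtypes)) == 1


def check_type_nunique(df):
    # same idea using pandas' Series.nunique()
    return df.dtypes.nunique() == 1


d1 = pd.DataFrame(np.ones((3, 3), dtype=int), columns=['A', 'B', 'C'])
d2 = pd.DataFrame(dict(A=[1, 1, 1], B=[1., 1., 1.], C=[1, 1, 1]))
empty = pd.DataFrame()  # no columns, hence no dtypes at all

results = {name: (check_type_set(df), check_type_nunique(df))
           for name, df in [('d1', d1), ('d2', d2), ('empty', empty)]}
print(results)
```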

2 Comments

A set(..) is never equal to 1. It can be equal to {1} (although not here). You probably want len(set(df.dtypes)).
@WillemVanOnsem should be len

You can inspect DataFrame.dtypes to check the types of the columns. For instance:

>>> d1.dtypes
A    int64
B    int64
C    int64
dtype: object
>>> d2.dtypes
A      int64
B    float64
C      int64
dtype: object

Given that there is at least one column, you can check this with:

np.all(d1.dtypes == d1.dtypes.iloc[0])

For your dataframes:

>>> np.all(d1.dtypes == d1.dtypes.iloc[0])
True
>>> np.all(d2.dtypes == d2.dtypes.iloc[0])
False

You can, of course, first check whether there is at least one column, so we can construct a function:

def all_columns_same_type(df):
    dtypes = df.dtypes
    # .iloc[0] selects the first dtype by position; plain dtypes[0] relies on a
    # positional fallback that newer pandas versions deprecate
    return not dtypes.empty and np.all(dtypes == dtypes.iloc[0])
