
I am trying to concat the following dataframes:

df1

                                price   side timestamp
timestamp           
2016-01-04 00:01:15.631331072   0.7286  2   1451865675631331
2016-01-04 00:01:15.631399936   0.7286  2   1451865675631400
2016-01-04 00:01:15.631860992   0.7286  2   1451865675631861
2016-01-04 00:01:15.631866112   0.7286  2   1451865675631866

and:

df2

                                bid     bid_size offer  offer_size
timestamp               
2016-01-04 00:00:31.331441920   0.7284  4000000 0.7285  1000000
2016-01-04 00:00:53.631324928   0.7284  4000000 0.7290  4000000
2016-01-04 00:01:03.131234048   0.7284  5000000 0.7286  4000000
2016-01-04 00:01:12.131444992   0.7285  1000000 0.7286  4000000
2016-01-04 00:01:15.631364096   0.7285  4000000 0.7290  4000000

With

data = pd.concat([df1, df2], axis=1)

But I get the following error:

InvalidIndexError                         Traceback (most recent call last)
<ipython-input-38-2e88458f01d7> in <module>()
----> 1 data = pd.concat([df1,df2], axis=1)
      2 data = data.fillna(method='pad')
      3 data = data.fillna(method='bfill')
      4 data['timestamp'] =  data.index.values#converting to datetime
      5 data['timestamp'] = pd.to_datetime(data['timestamp'])#converting to datetime

/usr/local/lib/python2.7/site-packages/pandas/tools/merge.pyc in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    810                        keys=keys, levels=levels, names=names,
    811                        verify_integrity=verify_integrity,
--> 812                        copy=copy)
    813     return op.get_result()
    814 

/usr/local/lib/python2.7/site-packages/pandas/tools/merge.pyc in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy)
    947         self.copy = copy
    948 
--> 949         self.new_axes = self._get_new_axes()
    950 
    951     def get_result(self):

/usr/local/lib/python2.7/site-packages/pandas/tools/merge.pyc in _get_new_axes(self)
   1013                 if i == self.axis:
   1014                     continue
-> 1015                 new_axes[i] = self._get_comb_axis(i)
   1016         else:
   1017             if len(self.join_axes) != ndim - 1:

/usr/local/lib/python2.7/site-packages/pandas/tools/merge.pyc in _get_comb_axis(self, i)
   1039                 raise TypeError("Cannot concatenate list of %s" % types)
   1040 
-> 1041         return _get_combined_index(all_indexes, intersect=self.intersect)
   1042 
   1043     def _get_concat_axis(self):

/usr/local/lib/python2.7/site-packages/pandas/core/index.pyc in _get_combined_index(indexes, intersect)
   6120             index = index.intersection(other)
   6121         return index
-> 6122     union = _union_indexes(indexes)
   6123     return _ensure_index(union)
   6124 

/usr/local/lib/python2.7/site-packages/pandas/core/index.pyc in _union_indexes(indexes)
   6149 
   6150         if hasattr(result, 'union_many'):
-> 6151             return result.union_many(indexes[1:])
   6152         else:
   6153             for other in indexes[1:]:

/usr/local/lib/python2.7/site-packages/pandas/tseries/index.pyc in union_many(self, others)
    959             else:
    960                 tz = this.tz
--> 961                 this = Index.union(this, other)
    962                 if isinstance(this, DatetimeIndex):
    963                     this.tz = tz

/usr/local/lib/python2.7/site-packages/pandas/core/index.pyc in union(self, other)
   1553                 result.extend([x for x in other._values if x not in value_set])
   1554         else:
-> 1555             indexer = self.get_indexer(other)
   1556             indexer, = (indexer == -1).nonzero()
   1557 

/usr/local/lib/python2.7/site-packages/pandas/core/index.pyc in get_indexer(self, target, method, limit, tolerance)
   1890 
   1891         if not self.is_unique:
-> 1892             raise InvalidIndexError('Reindexing only valid with uniquely'
   1893                                     ' valued Index objects')
   1894 

InvalidIndexError: Reindexing only valid with uniquely valued Index objects  

I have removed extra columns, dropped duplicates, and dropped NA values wherever there could be a conflict, but I simply do not know what's wrong.

  • What does pd.concat do? Commented Jan 29, 2016 at 12:05
  • @gmoshkin, I am assuming that pd is an alias for pandas (import pandas as pd) and that df1 and df2 are pandas DataFrame objects. Commented Nov 16, 2017 at 2:30
  • Try passing ignore_index=True to pd.concat. Commented Apr 17, 2020 at 18:58
  • Something else to look for: I had a similar error because I had duplicate columns in my dataframe. Commented Apr 14, 2021 at 18:28
  • Attention: this error can also be caused by duplicated columns in one of the dataframes (github.com/pandas-dev/pandas/pull/38654). Commented May 20, 2021 at 15:40

17 Answers


Duplicated column names!

In my case the problem was that I had duplicated column names.
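A minimal sketch (with a toy frame, not the asker's data) of how to detect and drop duplicated column labels before concatenating:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=['a', 'b', 'b'])  # 'b' appears twice

print(df.columns.is_unique)               # False: duplicated labels present
df = df.loc[:, ~df.columns.duplicated()]  # keep the first of each label
print(list(df.columns))                   # ['a', 'b']
```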


9 Comments

See stackoverflow.com/a/40435354/3362993 for code to remove duplicated column names
This is a nice hint! In my case, I use df1 = df1.append(df2) to concatenate. However, 2 out of the 16 columns are different, causing the issue.
I think the error usually occurs with column names, because most people concatenate over the row axis (axis=0, or unspecified). With axis=0 the indices don't need to be unique, but the columns do; with axis=1 the columns can have duplicates, but the indices must be unique. The reasoning is the same in both cases: otherwise it is ambiguous how to align the duplicated labels across the different dataframes, e.g. which rows should be matched up when axis=1.
This happened in my case because I converted all column names to upper case, making two of them the same.
I tried having AI debug this for me. It ran, but it totally messed up the outcome; I wish it had told me there were duplicated column names in my data instead of working around them just to make the program run. This answer saved me: it turned out there were duplicate columns hidden away.

You can mitigate this error without having to change your data or remove duplicates. Just create a new index with DataFrame.reset_index:

df = df.reset_index()

The old index is kept as a column in your dataframe, but if you don't need it you can do:

df = df.reset_index(drop=True)

Some prefer:

df.reset_index(inplace=True, drop=True)
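A minimal sketch, with toy frames, of how resetting both indices lets an axis=1 concat align the rows by position instead of by (duplicated) label:

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1, 2, 3]}, index=[0, 0, 1])  # duplicate index labels
df2 = pd.DataFrame({'y': [4, 5, 6]}, index=[0, 1, 1])

# pd.concat([df1, df2], axis=1) raises InvalidIndexError here;
# after reset_index both frames carry a fresh unique RangeIndex
out = pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)
print(out)
#    x  y
# 0  1  4
# 1  2  5
# 2  3  6
```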

9 Comments

This should be marked as the right answer, since it solves the problem without losing information. Duplicate records are not always wrong in a dataset.
I still get the error, even I perform this transformation on the dataset
What if this does not work? pd.concat([df_1a.reset_index()[common_features], df_1b.reset_index()[common_features]]) > InvalidIndexError: Reindexing only valid with uniquely valued Index objects
In my case, I had a duplicated column (hope it helps someone else!).
For me, I needed to convert a series from object to datetime. It works fine on one series, but on the other three I get the above error. So my question is: why does my df allow me to do it for one series but not another?

pd.concat requires that the indices be unique. To remove rows with duplicate indices, use

df = df.loc[~df.index.duplicated(keep='first')]

import pandas as pd
from pandas import Timestamp

df1 = pd.DataFrame(
    {'price': [0.7286, 0.7286, 0.7286, 0.7286],
     'side': [2, 2, 2, 2],
     'timestamp': [1451865675631331, 1451865675631400,
                  1451865675631861, 1451865675631866]},
    index=pd.DatetimeIndex(['2000-1-1', '2000-1-1', '2001-1-1', '2002-1-1']))


df2 = pd.DataFrame(
    {'bid': [0.7284, 0.7284, 0.7284, 0.7285, 0.7285],
     'bid_size': [4000000, 4000000, 5000000, 1000000, 4000000],
     'offer': [0.7285, 0.729, 0.7286, 0.7286, 0.729],
     'offer_size': [1000000, 4000000, 4000000, 4000000, 4000000]},
    index=pd.DatetimeIndex(['2000-1-1', '2001-1-1', '2002-1-1', '2003-1-1', '2004-1-1']))


df1 = df1.loc[~df1.index.duplicated(keep='first')]
#              price  side         timestamp
# 2000-01-01  0.7286     2  1451865675631331
# 2001-01-01  0.7286     2  1451865675631861
# 2002-01-01  0.7286     2  1451865675631866

df2 = df2.loc[~df2.index.duplicated(keep='first')]
#                bid  bid_size   offer  offer_size
# 2000-01-01  0.7284   4000000  0.7285     1000000
# 2001-01-01  0.7284   4000000  0.7290     4000000
# 2002-01-01  0.7284   5000000  0.7286     4000000
# 2003-01-01  0.7285   1000000  0.7286     4000000
# 2004-01-01  0.7285   4000000  0.7290     4000000

result = pd.concat([df1, df2], axis=0)
print(result)
               bid  bid_size   offer  offer_size   price  side     timestamp
2000-01-01     NaN       NaN     NaN         NaN  0.7286     2  1.451866e+15
2001-01-01     NaN       NaN     NaN         NaN  0.7286     2  1.451866e+15
2002-01-01     NaN       NaN     NaN         NaN  0.7286     2  1.451866e+15
2000-01-01  0.7284   4000000  0.7285     1000000     NaN   NaN           NaN
2001-01-01  0.7284   4000000  0.7290     4000000     NaN   NaN           NaN
2002-01-01  0.7284   5000000  0.7286     4000000     NaN   NaN           NaN
2003-01-01  0.7285   1000000  0.7286     4000000     NaN   NaN           NaN
2004-01-01  0.7285   4000000  0.7290     4000000     NaN   NaN           NaN

Note there is also DataFrame.join, which can join DataFrames based on their indices and handles non-unique indices according to the how parameter. Rows with duplicate indices are not removed.

In [94]: df1.join(df2)
Out[94]: 
             price  side         timestamp     bid  bid_size   offer  \
2000-01-01  0.7286     2  1451865675631331  0.7284   4000000  0.7285   
2000-01-01  0.7286     2  1451865675631400  0.7284   4000000  0.7285   
2001-01-01  0.7286     2  1451865675631861  0.7284   4000000  0.7290   
2002-01-01  0.7286     2  1451865675631866  0.7284   5000000  0.7286   

            offer_size  
2000-01-01     1000000  
2000-01-01     1000000  
2001-01-01     4000000  
2002-01-01     4000000  

In [95]: df1.join(df2, how='outer')
Out[95]: 
             price  side     timestamp     bid  bid_size   offer  offer_size
2000-01-01  0.7286     2  1.451866e+15  0.7284   4000000  0.7285     1000000
2000-01-01  0.7286     2  1.451866e+15  0.7284   4000000  0.7285     1000000
2001-01-01  0.7286     2  1.451866e+15  0.7284   4000000  0.7290     4000000
2002-01-01  0.7286     2  1.451866e+15  0.7284   5000000  0.7286     4000000
2003-01-01     NaN   NaN           NaN  0.7285   1000000  0.7286     4000000
2004-01-01     NaN   NaN           NaN  0.7285   4000000  0.7290     4000000

1 Comment

If the columns are duplicated, it's df = df.loc[:,~df.columns.duplicated(keep='first')]

This post comes up top when you search for the error, but the answers are not complete, so let me add mine. There is another reason this error can happen: if you have duplicate columns in your data frames, concatenating will raise it. In fact, even in the original question there are two columns called timestamp (one in the index, one as a regular column). So it is worth checking that len(df.columns) == len(set(df.columns)) for all the data frames you are trying to concatenate.
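The check above can be sketched like this, with hypothetical frames for illustration:

```python
import pandas as pd

frames = [
    pd.DataFrame([[1, 2]], columns=['a', 'b']),
    pd.DataFrame([[3, 4]], columns=['a', 'a']),  # duplicated label
]

# flag any frame whose column labels are not unique before concatenating
results = [len(df.columns) == len(set(df.columns)) for df in frames]
print(results)  # [True, False]
```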

3 Comments

dupe columns was the problem for me.
Or you can use: assert df.columns.is_unique, df.loc[:,df.columns.duplicated()] : this asserts that the columns are unique and shows the duplicate columns if they are not unique
This was also the issue for me. Thanks!

As a complement to Nicholas Morley's answer: when you find that even this does not work:

df = df.reset_index(drop=True)

you should check whether the column names are unique. When they are not, even resetting the index does not help; the duplicated columns have to be removed first.

Comments


This is because you have duplicated columns. Before concatenating, drop the duplicated columns in each DataFrame as follows:

df = df.loc[:,~df.columns.duplicated()].reset_index(drop=True)

Comments


This also happens when you have duplicates in the column names.

Comments


The indexes of your two dataframes don't match.

When pandas performs a concat operation horizontally (axis=1), it tries to find rows with the same index and join them side by side. So the row with index 1 from df1 will be matched with the row with index 1 from df2. Therefore, reset the index in one of them, or in both if both have problematic indexes:

pd.concat([df1.reset_index(drop=True), df2], axis=1)

or

pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)

Comments


Same Indices Between the Two DFs

Another reason for this issue might be that df1 and df2 share index labels. For example, both dfs might contain the same index idx1.

To check whether this is the problem, see if the following prints a non-empty list (cum_df and df_from_2002 are my own frames):

print([org_name for org_name in cum_df.index if org_name in df_from_2002.index])

My suggested solution is to rename the indices (so df1 keeps idx1 and you change idx1 to idx2 in df2), concatenate (df1 = pd.concat([df1, df2])), and then combine the two rows (in case you need their sum) with:

df1.loc[idx1] = df1.loc[[idx1, idx2]].sum()

and finally remove idx2:

df1.drop([idx2], inplace=True)
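An alternative sketch (using groupby rather than the rename/sum/drop steps above, with toy frames): an axis=0 concat tolerates overlapping index labels, and the duplicates can then be summed in one step:

```python
import pandas as pd

df1 = pd.DataFrame({'v': [1, 2]}, index=['idx1', 'idx3'])
df2 = pd.DataFrame({'v': [10, 20]}, index=['idx1', 'idx4'])

# stack the frames, then sum rows that share an index label
combined = pd.concat([df1, df2]).groupby(level=0).sum()
print(combined)
#        v
# idx1  11
# idx3   2
# idx4  20
```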

Comments


This happened to me when I was trying to concat two dataframes that have duplicated column names.

Let's say that I want to remove the first of the duplicated columns:

duplicated_column = 'column'

# selecting a duplicated label returns a DataFrame holding every copy;
# transpose it and drop the first row, i.e. the first copy
df_tmp = df[duplicated_column].T
df_tmp = df_tmp.iloc[1:, :]

df = df.drop([duplicated_column], axis=1)  # drops all copies of the label
df = pd.concat([df, df_tmp.T], axis=1)     # add the remaining copies back

Comments


The problem for me was duplicate column labels, just as many others here mentioned. To keep only the first of each set of duplicated columns, I used:

df = df.T[~df.T.index.duplicated(keep='first')].T

Comments


The answers here helped, but in some cases concat worked fine for me even with duplicate columns present, while in other cases it raised the InvalidIndexError.

It turned out that concat works if the duplicate columns appear in the same order in both frames, but raises an error if the order differs.

Example where it works fine:

df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7], 'c': [9, 10, 11]})
df1 = pd.DataFrame({'a': [12], 'b': [13], 'c': [14]})
df.rename(columns={
    'c': 'b'
}, inplace=True)
df1.rename(columns={
    'c': 'b'
}, inplace=True)
print(pd.concat([df, df1]))

Output:
    a   b   b
0   1   5   9
1   2   6  10
2   3   7  11
0  12  13  14

Example where it doesn't work:

df = pd.DataFrame({'b': [1, 2, 3], 'a': [5, 6, 7], 'c': [9, 10, 11]})
df1 = pd.DataFrame({'a': [12], 'b': [13], 'c': [14]})
df.rename(columns={
    'c': 'b'
}, inplace=True)
df1.rename(columns={
    'c': 'b'
}, inplace=True)
print(pd.concat([df, df1]))

Output:
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely 
valued Index objects

Comments


The solution below applies if your concat uses axis=0, i.e. you want to append rows, not columns.

Reason: one or both of your dataframes might have duplicate columns. df1's columns could be A, B, C, C and df2's columns could be B, D.

In this case df1 has a duplicate column C, and for this reason you might get this error. Drop one of the C columns in df1 and the issue should be resolved. You can check for duplicate column labels with:

df1.columns.is_unique

Comments


The same happened to me, but then I noticed that my index (datetime) had a different last date in each frame. When I fixed the dates so that both dataframes used the same timeframe interval, pd.concat() worked fine.

Comments

df1 = pd.DataFrame.from_dict({'A':[1,2,3], 'B':[110,120,230]})
df2 = pd.DataFrame.from_dict({'B':[11,12,23], 'C':[10,11,12]})
df2.rename(columns={'C':'B'}, inplace=True)
pd.concat([df1,df2])

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

The above code reproduces the error when concatenating rows of dataframes. The error occurs because df2 ends up with two columns named 'B':

    A   B
0   1   110
1   2   120
2   3   230

    B   B
0   11  10
1   12  11
2   23  12

Comments


In my case, one of the pandas DataFrames had multiple columns with the same name.

Comments


Best solution, from this page: https://pandas.pydata.org/pandas-docs/version/0.20/merging.html

df = pd.concat([df1, df2], axis=1, join_axes=[df1.index])

(Note: join_axes was deprecated in pandas 0.25 and removed in 1.0, so this only works on old pandas versions.)

1 Comment

TypeError: concat() got an unexpected keyword argument 'join_axes'
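Since join_axes is gone, a sketch of the equivalent in current pandas (toy frames for illustration) is to concatenate and then reindex to the desired axis:

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2]}, index=['r1', 'r2'])
df2 = pd.DataFrame({'b': [3, 4]}, index=['r2', 'r3'])

# old: pd.concat([df1, df2], axis=1, join_axes=[df1.index])
out = pd.concat([df1, df2], axis=1).reindex(df1.index)
print(out)
#       a    b
# r1  1.0  NaN
# r2  2.0  3.0
```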
