
I have two DataFrames, structured something like this:

# df1
                        data1   data2
id      feature_count   
12345   1               111     888
        2               222     999
        3               333     101010
45678   0               444     111111
        2               555     121212
        3               666     131313
        4               777     141414

and

# df2
        descriptor
id
12345   "foo"
45678   "bar"

Based on this solution, it seems like I should simply be able to call df1.join(df2) to get the desired result:

#joined
                        data1   data2   descriptor
id      feature_count   
12345   1               111     888     "foo"
        2               222     999     "foo"
        3               333     101010  "foo"
45678   0               444     111111  "bar"
        2               555     121212  "bar"
        3               666     131313  "bar"
        4               777     141414  "bar"
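For anyone trying to reproduce this, here is a sketch that rebuilds the two frames from the printouts above and runs the join. It assumes integer columns and plain string descriptors (the quotes around "foo" and "bar" are taken to be display only). Because every id appears exactly once in this sample df2, the join on the shared id level is expected to succeed, which matches the first comment below.

```python
import pandas as pd

# df1: two-level MultiIndex (id, feature_count), rebuilt from the printout
df1 = pd.DataFrame(
    {
        "data1": [111, 222, 333, 444, 555, 666, 777],
        "data2": [888, 999, 101010, 111111, 121212, 131313, 141414],
    },
    index=pd.MultiIndex.from_tuples(
        [(12345, 1), (12345, 2), (12345, 3),
         (45678, 0), (45678, 2), (45678, 3), (45678, 4)],
        names=["id", "feature_count"],
    ),
)

# df2: one descriptor per id; its index name "id" matches a level name of df1
df2 = pd.DataFrame(
    {"descriptor": ["foo", "bar"]},
    index=pd.Index([12345, 45678], name="id"),
)

# join() aligns df2's "id" index with the "id" level of df1's MultiIndex
joined = df1.join(df2)
print(joined)
```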

However, what I actually get (with pandas 1.0.5) is:

NotImplementedError: Index._join_level on non-unique index is not implemented

This seems like it shouldn't be complicated, but I'm clearly misunderstanding something. All I'm looking for is to append df2's column of unique mappings onto df1, matched on df1's first index level (id), where every id is guaranteed to have an entry in df2.

  • I can't duplicate your error. I have version 1.0.5 too. Your example code works for me. Does your example code work for you or do you get the error on the larger dataset? Commented Sep 26, 2020 at 0:04
  • It was a much larger dataset pulled in from a SQL query. After I accepted the answer (and a different error proved to be more informative) I think that the query had unexpected duplication (which I'm surprised Pandas let me assign an index to). Commented Sep 28, 2020 at 17:13
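Since pandas accepts a non-unique index without complaint, a cheap guard is to check df2 before joining; as far as I can tell from the pandas source, the NotImplementedError above is raised when the single-level index being joined to a MultiIndex level contains duplicates. The sketch below uses a made-up df2 in which id 45678 came back twice from the query (hypothetical data, not the real result set). Keeping the first row per id is only one option and assumes the duplicate rows are interchangeable.

```python
import pandas as pd

# Hypothetical df2 where the SQL query returned id 45678 twice
df2 = pd.DataFrame(
    {"descriptor": ["foo", "bar", "bar"]},
    index=pd.Index([12345, 45678, 45678], name="id"),
)

# pandas does not enforce index uniqueness, so check it explicitly
print(df2.index.is_unique)  # False

# keep=False flags every copy of a duplicated id, not just the later repeats
dupes = df2.index[df2.index.duplicated(keep=False)].unique()
print(dupes.tolist())  # [45678]

# One possible fix (assumes duplicates are interchangeable): keep the first row per id
df2_unique = df2[~df2.index.duplicated(keep="first")]
print(df2_unique.index.is_unique)  # True
```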

1 Answer


Since you only need to map one column, just do:

df1['descriptor'] = df1.index.get_level_values('id').map(df2['descriptor'])
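A self-contained version of the one-liner, with the sample frames rebuilt from the question (integer ids assumed). get_level_values pulls the id level out as a flat Index, and Index.map accepts a Series, using its index as the lookup keys; ids missing from df2 would come back as NaN.

```python
import pandas as pd

# Sample frames reconstructed from the question's printout
df1 = pd.DataFrame(
    {
        "data1": [111, 222, 333, 444, 555, 666, 777],
        "data2": [888, 999, 101010, 111111, 121212, 131313, 141414],
    },
    index=pd.MultiIndex.from_tuples(
        [(12345, 1), (12345, 2), (12345, 3),
         (45678, 0), (45678, 2), (45678, 3), (45678, 4)],
        names=["id", "feature_count"],
    ),
)
df2 = pd.DataFrame(
    {"descriptor": ["foo", "bar"]},
    index=pd.Index([12345, 45678], name="id"),
)

# Flat Index of ids (one per df1 row), looked up in the df2["descriptor"] Series
df1["descriptor"] = df1.index.get_level_values("id").map(df2["descriptor"])
print(df1)
```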

In general, you can temporarily reset the other index level, join the dataframes, and set it back:

df1.reset_index('feature_count').join(df2).set_index('feature_count', append=True)
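The same steps on the rebuilt sample frames, written as a method chain (a sketch; the frames are reconstructed from the question's printout). After reset_index("feature_count") both frames have a plain id index, so join is an ordinary index-on-index join, and append=True puts feature_count back as the second level.

```python
import pandas as pd

# Sample frames reconstructed from the question's printout
df1 = pd.DataFrame(
    {
        "data1": [111, 222, 333, 444, 555, 666, 777],
        "data2": [888, 999, 101010, 111111, 121212, 131313, 141414],
    },
    index=pd.MultiIndex.from_tuples(
        [(12345, 1), (12345, 2), (12345, 3),
         (45678, 0), (45678, 2), (45678, 3), (45678, 4)],
        names=["id", "feature_count"],
    ),
)
df2 = pd.DataFrame(
    {"descriptor": ["foo", "bar"]},
    index=pd.Index([12345, 45678], name="id"),
)

result = (
    df1.reset_index("feature_count")          # index: id; feature_count becomes a column
    .join(df2)                                # plain left join on the id index
    .set_index("feature_count", append=True)  # restore (id, feature_count)
)
print(result)
```

One caveat, to the best of my understanding: an ordinary index join does not require df2's index to be unique, so if df2 did contain repeated ids, this version would not raise but would repeat the matching df1 rows once per duplicate. Comparing row counts before and after the join is a quick sanity check.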

Output:

                     data1   data2 descriptor
id    feature_count                          
12345 1                111     888      "foo"
      2                222     999      "foo"
      3                333  101010      "foo"
45678 0                444  111111      "bar"
      2                555  121212      "bar"
      3                666  131313      "bar"
      4                777  141414      "bar"

1 Comment

Your first case still gave me an error, but your second worked like a charm. Thanks!
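A plausible reason the first approach still failed on the real data: if df2 had repeated ids (as the question's comments suggest), Index.map cannot use a Series with a non-unique index as a lookup table and raises instead. The sketch below reproduces that with a hypothetical duplicated id; pd.errors.InvalidIndexError is, as far as I know, what recent pandas versions raise in this case.

```python
import pandas as pd

# Stands in for df1.index.get_level_values("id")
ids = pd.Index([12345, 12345, 45678], name="id")

# Hypothetical df2 with id 45678 duplicated
df2 = pd.DataFrame(
    {"descriptor": ["foo", "bar", "bar"]},
    index=pd.Index([12345, 45678, 45678], name="id"),
)

try:
    ids.map(df2["descriptor"])
    failed = False
except pd.errors.InvalidIndexError as exc:
    # The Series' index must be unique for it to act as a mapping
    print("map failed:", exc)
    failed = True
```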
