4

Consider the following single-index DataFrame:

      energy    fat
1      2000      28
2      1900      17
3      2200      30
4      1750      15
5      1800      18
6      1600      12

I also have a MultiIndex Series:

1  vitamin-c    0.0004
   vitamin-a    0.0150
2  vitamin-c    0.0030
3  vitamin-d    1.2000
   vitamin-e    1.0007
   vitamin-c    1.2020
4  vitamin-a    0.0780
5  vitamin-b    0.9650
6  vitamin-e    1.9801
   vitamin-c    1.0011

How can I join the two so the result looks like this:

      energy    fat          vitamins
1      2000      28     vitamin-c    0.0004
                        vitamin-a    0.0150
2      1900      17     vitamin-c    0.0030
3      2200      30     vitamin-d    1.2000
                        vitamin-e    1.0007
                        vitamin-c    1.2020
4      1750      15     vitamin-a    0.0780
5      1800      18     vitamin-b    0.9650
6      1600      12     vitamin-e    1.9801
                        vitamin-c    1.0011

I tried df.join(series, how='inner'), but all I got was the following error message:

"ValueError: cannot join with no level specified and no overlapping names"

Can someone please explain what I'm doing wrong here and how I can combine the two? Thank you!

1
  • Can you provide a reproducible example? Code to generate your Series would be very helpful. Commented Nov 6, 2017 at 20:20
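
For anyone who wants to follow along, here is one way the data in the question could be built. This is a sketch, not the asker's actual code: the variable names df and s and the use of pd.MultiIndex.from_tuples are assumptions, and the values are copied from the tables in the question.

```python
import pandas as pd

# Single-index DataFrame; values copied from the question.
df = pd.DataFrame(
    {
        "energy": [2000, 1900, 2200, 1750, 1800, 1600],
        "fat": [28, 17, 30, 15, 18, 12],
    },
    index=[1, 2, 3, 4, 5, 6],
)

# MultiIndex Series: (row id, vitamin) -> amount.
# The index levels are left unnamed, as they appear in the question.
s = pd.Series(
    [0.0004, 0.0150, 0.0030, 1.2000, 1.0007, 1.2020, 0.0780, 0.9650, 1.9801, 1.0011],
    index=pd.MultiIndex.from_tuples(
        [
            (1, "vitamin-c"), (1, "vitamin-a"),
            (2, "vitamin-c"),
            (3, "vitamin-d"), (3, "vitamin-e"), (3, "vitamin-c"),
            (4, "vitamin-a"),
            (5, "vitamin-b"),
            (6, "vitamin-e"), (6, "vitamin-c"),
        ]
    ),
)

print(df)
print(s)
```

Neither index has a name at this point, which is the situation the error message in the question complains about.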

3 Answers

5

Option 1
I don't recommend moving things into the index that don't belong there.
That said, you can use pd.DataFrame.join as long as the index levels are named and the names match, so that pandas knows which level to join on.

df.rename_axis('ord').join(s.rename_axis(['ord', 'vit']).rename('val'))

               energy  fat     val
ord vit                           
1   vitamin-c    2000   28  0.0004
    vitamin-a    2000   28  0.0150
2   vitamin-c    1900   17  0.0030
3   vitamin-d    2200   30  1.2000
    vitamin-e    2200   30  1.0007
    vitamin-c    2200   30  1.2020
4   vitamin-a    1750   15  0.0780
5   vitamin-b    1800   18  0.9650
6   vitamin-e    1600   12  1.9801
    vitamin-c    1600   12  1.0011

The same thing, split over a few more lines for readability:

s = s.rename_axis(['ord', 'vit']).rename('val')
df = df.rename_axis('ord')

df.join(s)

               energy  fat     val
ord vit                           
1   vitamin-c    2000   28  0.0004
    vitamin-a    2000   28  0.0150
2   vitamin-c    1900   17  0.0030
3   vitamin-d    2200   30  1.2000
    vitamin-e    2200   30  1.0007
    vitamin-c    2200   30  1.2020
4   vitamin-a    1750   15  0.0780
5   vitamin-b    1800   18  0.9650
6   vitamin-e    1600   12  1.9801
    vitamin-c    1600   12  1.0011
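
A self-contained version of Option 1 might look like the sketch below. The way df and s are constructed is an assumption (the question does not show it); the level names 'ord' and 'vit' and the Series name 'val' are the ones used above.

```python
import pandas as pd

# Assumed construction of the question's data.
df = pd.DataFrame(
    {
        "energy": [2000, 1900, 2200, 1750, 1800, 1600],
        "fat": [28, 17, 30, 15, 18, 12],
    },
    index=[1, 2, 3, 4, 5, 6],
)
s = pd.Series(
    [0.0004, 0.0150, 0.0030, 1.2000, 1.0007, 1.2020, 0.0780, 0.9650, 1.9801, 1.0011],
    index=pd.MultiIndex.from_tuples(
        [
            (1, "vitamin-c"), (1, "vitamin-a"),
            (2, "vitamin-c"),
            (3, "vitamin-d"), (3, "vitamin-e"), (3, "vitamin-c"),
            (4, "vitamin-a"),
            (5, "vitamin-b"),
            (6, "vitamin-e"), (6, "vitamin-c"),
        ]
    ),
)

# Give the DataFrame index and the first Series level the same name ('ord'),
# and give the Series itself a name so it can become a column.
left = df.rename_axis("ord")
right = s.rename_axis(["ord", "vit"]).rename("val")

# The shared level name 'ord' tells pandas which level to join on.
result = left.join(right)
print(result)
```

The energy and fat values are repeated once per vitamin, because each row of df matches several entries of s.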

Option 2
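We can also use pd.concat with loc and pd.Index.get_level_values. Here df.loc repeats each row of df once per matching entry in s, and set_index gives those rows the index of s so the two pieces line up: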

pd.concat(
    [df.loc[s.index.get_level_values(0)].set_index(s.index), s.rename('val')],
    axis=1
)

             energy  fat     val
1 vitamin-c    2000   28  0.0004
  vitamin-a    2000   28  0.0150
2 vitamin-c    1900   17  0.0030
3 vitamin-d    2200   30  1.2000
  vitamin-e    2200   30  1.0007
  vitamin-c    2200   30  1.2020
4 vitamin-a    1750   15  0.0780
5 vitamin-b    1800   18  0.9650
6 vitamin-e    1600   12  1.9801
  vitamin-c    1600   12  1.0011
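
Option 2 can be easier to follow when it is broken into steps. The sketch below assumes the same construction of df and s as elsewhere on this page; the intermediate names first_level, repeated and aligned are made up for illustration.

```python
import pandas as pd

# Assumed construction of the question's data.
df = pd.DataFrame(
    {
        "energy": [2000, 1900, 2200, 1750, 1800, 1600],
        "fat": [28, 17, 30, 15, 18, 12],
    },
    index=[1, 2, 3, 4, 5, 6],
)
s = pd.Series(
    [0.0004, 0.0150, 0.0030, 1.2000, 1.0007, 1.2020, 0.0780, 0.9650, 1.9801, 1.0011],
    index=pd.MultiIndex.from_tuples(
        [
            (1, "vitamin-c"), (1, "vitamin-a"),
            (2, "vitamin-c"),
            (3, "vitamin-d"), (3, "vitamin-e"), (3, "vitamin-c"),
            (4, "vitamin-a"),
            (5, "vitamin-b"),
            (6, "vitamin-e"), (6, "vitamin-c"),
        ]
    ),
)

# Step 1: the first index level of s, one label per entry of s.
first_level = s.index.get_level_values(0)

# Step 2: look those labels up in df; rows are repeated for repeated labels.
repeated = df.loc[first_level]

# Step 3: replace the repeated single-level index with the MultiIndex of s.
aligned = repeated.set_index(s.index)

# Step 4: the two pieces now share an index, so they concatenate column-wise.
result = pd.concat([aligned, s.rename("val")], axis=1)
print(result)
```

Unlike Option 1, this does not require the index levels to be named, but it does assume that every first-level label in s exists in df.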

4 Comments

This is very smart!
rename_axis is a nice way to do it!
Many thanks to all of you for your answers, and to @piRSquared for the comprehensive explanation. Is rename_axis() any better than index.names = ...? (the solution suggested by Andy Hayden)
@solub It serves the same purpose. The difference is that rename_axis lets you rename the index levels "inline", which helps with method chaining, fewer lines of code, and readability. In the end it is a subjective call and entirely up to you.
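
The difference discussed in the last comment can be seen in a small sketch. The helper make_df and the trimmed-down two-row frame are hypothetical, used only to keep the example short.

```python
import pandas as pd


def make_df():
    # Hypothetical helper: a two-row slice of the question's DataFrame.
    return pd.DataFrame({"energy": [2000, 1900], "fat": [28, 17]}, index=[1, 2])


# rename_axis returns a new object, so it can sit in the middle of a chain.
original = make_df()
chained = original.rename_axis("ord")

# Assigning to index.names changes the existing object in place.
in_place = make_df()
in_place.index.names = ["ord"]

print(original.index.name)  # still unnamed
print(chained.index.name)
print(in_place.index.name)
```

Both approaches end with the same index name; only rename_axis leaves the original object untouched.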
4

If you add names to the index/multiindex you can use a join:

In [11]: df
Out[11]:
   energy  fat
n
1    2000   28
2    1900   17
3    2200   30
4    1750   15
5    1800   18
6    1600   12

In [12]: df2
Out[12]:
                val
n vitamin
1 vitamin-c  0.0004
  vitamin-a  0.0150
2 vitamin-c  0.0030
3 vitamin-d  1.2000
  vitamin-e  1.0007
  vitamin-c  1.2020
4 vitamin-a  0.0780
5 vitamin-b  0.9650
6 vitamin-e  1.9801
  vitamin-c  1.0011

In [13]: df.join(df2)
Out[13]:
             energy  fat     val
n vitamin
1 vitamin-c    2000   28  0.0004
  vitamin-a    2000   28  0.0150
2 vitamin-c    1900   17  0.0030
3 vitamin-d    2200   30  1.2000
  vitamin-e    2200   30  1.0007
  vitamin-c    2200   30  1.2020
4 vitamin-a    1750   15  0.0780
5 vitamin-b    1800   18  0.9650
6 vitamin-e    1600   12  1.9801
  vitamin-c    1600   12  1.0011

Note: you can name the index levels by setting .index.names:

In [21]: df.index.names = ["n"]  # or .name = "n"

In [22]: df2.index.names = ["n", "vitamin"]

2

Source data:

In [96]: s
Out[96]:
id   vitamins
1.0  vitamin-c    0.0004
     vitamin-a    0.0150
2.0  vitamin-c    0.0030
3.0  vitamin-d    1.2000
     vitamin-e    1.0007
     vitamin-c    1.2020
4.0  vitamin-a    0.0780
5.0  vitamin-b    0.9650
6.0  vitamin-e    1.9801
     vitamin-c    1.0011
Name: val, dtype: float64

In [97]: df
Out[97]:
   energy  fat
1    2000   28
2    1900   17
3    2200   30
4    1750   15
5    1800   18
6    1600   12

Solution:

In [99]: s.reset_index() \
          .merge(df, left_on='id', right_index=True) \
          .set_index(['id','energy','fat','vitamins'])
Out[99]:
                             val
id  energy fat vitamins
1.0 2000   28  vitamin-c  0.0004
               vitamin-a  0.0150
2.0 1900   17  vitamin-c  0.0030
3.0 2200   30  vitamin-d  1.2000
               vitamin-e  1.0007
               vitamin-c  1.2020
4.0 1750   15  vitamin-a  0.0780
5.0 1800   18  vitamin-b  0.9650
6.0 1600   12  vitamin-e  1.9801
               vitamin-c  1.0011
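
A runnable sketch of this merge-based approach follows. The construction of s and df is an assumption; the level names 'id' and 'vitamins' and the Series name 'val' are taken from the output above. Integer ids are used here, whereas the output above shows floats.

```python
import pandas as pd

# Assumed construction of the data shown above.
df = pd.DataFrame(
    {
        "energy": [2000, 1900, 2200, 1750, 1800, 1600],
        "fat": [28, 17, 30, 15, 18, 12],
    },
    index=[1, 2, 3, 4, 5, 6],
)
s = pd.Series(
    [0.0004, 0.0150, 0.0030, 1.2000, 1.0007, 1.2020, 0.0780, 0.9650, 1.9801, 1.0011],
    index=pd.MultiIndex.from_tuples(
        [
            (1, "vitamin-c"), (1, "vitamin-a"),
            (2, "vitamin-c"),
            (3, "vitamin-d"), (3, "vitamin-e"), (3, "vitamin-c"),
            (4, "vitamin-a"),
            (5, "vitamin-b"),
            (6, "vitamin-e"), (6, "vitamin-c"),
        ],
        names=["id", "vitamins"],
    ),
    name="val",
)

# Turn both index levels of s into ordinary columns: id, vitamins, val.
flat = s.reset_index()

# Match the 'id' column against the index of df.
merged = flat.merge(df, left_on="id", right_index=True)

# Move everything except 'val' into the index to get the nested layout.
result = merged.set_index(["id", "energy", "fat", "vitamins"])
print(result)
```

If energy and fat should stay as regular columns, calling set_index(['id', 'vitamins']) on merged instead is one option.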
