How to use Python pandas to group rows with the same name under one label, without removing other columns

Question

Here is an example:

     name        year  date        start  opp
0    A.J. Price  2015  2014-12-02  No     MIL
1    A.J. Price  2015  2014-12-04  No     NYK
2    A.J. Price  2015  2014-12-05  No     TOR
3    A.J. Price  2015  2014-12-08  No     BRK
4    A.J. Price  2015  2014-12-09  No     TOR
318  Aaron       2015  2014-12-15  No     ATL
319  Aaron       2015  2014-12-18  No     NYK
320  Aaron       2015  2014-12-19  No     MEM

How can I turn the data frame above into a hierarchical layout like the one below, where each name appears only once?

0   A.J. Price  2015     2014-12-02 No  MIL
                2015     2014-12-04 No  NYK
                2015     2014-12-05 No  TOR 
                2015     2014-12-08 No  BRK
                2015     2014-12-09 No  TOR  
318 Aaron       2015     2014-12-15 No  ATL
                2015     2014-12-18 No  NYK
                2015     2014-12-19 No  MEM

IIUC, df.set_index('name', append=True, inplace=True) should work; see the docs.
– EdChum, Commented Apr 8, 2016 at 16:55

Community · Accepted Answer · 2017-05-23 10:28:08Z

With the help given by EdChum, here is how it can be done:

In [11]: df
Out[11]:
   name         year       date        start  opp
0  A.J. Price      2015    2014-12-02   No    MIL
1  A.J. Price      2015    2014-12-04   No    NYK
2  A.J. Price      2015    2014-12-05   No    TOR
3  A.J. Price      2015    2014-12-08   No    BRK
4  A.J. Price      2015    2014-12-09   No    TOR
5  Aaron           2015    2014-12-15   No    ATL
6  Aaron           2015    2014-12-18   No    NYK
7  Aaron           2015    2014-12-19   No    MEM

In [12]: df.set_index('name',inplace=True)
In [13]: df.set_index('year',append=True, inplace=True)

In [14]: df
Out[14]:
                          date  start  opp
name        year
A.J. Price  2015    2014-12-02   No    MIL
            2015    2014-12-04   No    NYK
            2015    2014-12-05   No    TOR
            2015    2014-12-08   No    BRK
            2015    2014-12-09   No    TOR
Aaron       2015    2014-12-15   No    ATL
            2015    2014-12-18   No    NYK
            2015    2014-12-19   No    MEM
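The two set_index calls above can also be collapsed into one by passing a list of column names. Here is a minimal, self-contained sketch using the rows from the question; the variable name indexed is only illustrative, and the call returns a new frame instead of modifying df in place:

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "name": ["A.J. Price"] * 5 + ["Aaron"] * 3,
    "year": [2015] * 8,
    "date": ["2014-12-02", "2014-12-04", "2014-12-05", "2014-12-08",
             "2014-12-09", "2014-12-15", "2014-12-18", "2014-12-19"],
    "start": ["No"] * 8,
    "opp": ["MIL", "NYK", "TOR", "BRK", "TOR", "ATL", "NYK", "MEM"],
})

# Passing a list builds both index levels in one call and returns a new frame.
indexed = df.set_index(["name", "year"])
print(indexed)
```

The data is unchanged. Only the display differs, because pandas hides repeated labels in the outer index level when printing.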

Update:

When writing out multi-index tables (or pivot tables), to_csv writes the full hierarchical index on every row, so the repeated names still appear in the file.

to_excel, however, merges the cells of the hierarchical index, so each name appears only once in the spreadsheet.
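The to_csv behaviour can be checked without touching the disk. This is a small sketch on a shortened, three-row version of the data (the shortened frame is an assumption made for brevity):

```python
import pandas as pd

# Shortened version of the question's data, indexed by name and year.
df = pd.DataFrame({
    "name": ["A.J. Price", "A.J. Price", "Aaron"],
    "year": [2015, 2015, 2015],
    "opp": ["MIL", "NYK", "ATL"],
}).set_index(["name", "year"])

# With no path argument, to_csv returns the CSV text as a string.
csv_text = df.to_csv()
print(csv_text)
```

Every data row in the printed text starts with its full name, even though print(df) shows each name only once.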

So if the concern is how to get the multi-index back when reading the CSV again, use the index_col argument of read_csv:

pd.read_csv('input.csv', index_col=[0,1])
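A round trip makes the effect of index_col visible. In this sketch an in-memory buffer stands in for 'input.csv', and the three-row frame is an illustrative assumption:

```python
import io

import pandas as pd

original = pd.DataFrame({
    "name": ["A.J. Price", "A.J. Price", "Aaron"],
    "year": [2015, 2015, 2015],
    "opp": ["MIL", "NYK", "ATL"],
}).set_index(["name", "year"])

# An in-memory buffer stands in for the file 'input.csv'.
buffer = io.StringIO(original.to_csv())

# index_col=[0, 1] turns the first two CSV columns back into index levels.
restored = pd.read_csv(buffer, index_col=[0, 1])
print(restored)
```

Without index_col, name and year would come back as ordinary columns and the frame would get a default integer index.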

Here is another link that will help you to write the csv the way you wanted.
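One possible way to get a CSV where each name appears only once is to blank the repeated names before writing. This is only a sketch and not necessarily what the linked answer does; it assumes rows with the same name are adjacent, and the names repeated and out are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["A.J. Price", "A.J. Price", "Aaron"],
    "year": [2015, 2015, 2015],
    "opp": ["MIL", "NYK", "ATL"],
})

# Flag rows whose name equals the name on the row directly above.
repeated = df["name"].eq(df["name"].shift())

# Blank those names on a copy; the original frame keeps its full values.
out = df.copy()
out["name"] = out["name"].mask(repeated, "")

# index=False leaves the default integer index out of the file.
csv_text = out.to_csv(index=False)
print(csv_text)
```

A file written this way is meant for viewing. Reading it back gives empty name fields on the blanked rows, so keep the full names if the CSV will be processed again.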

edited May 23, 2017 at 10:28

CommunityBot

answered Apr 8, 2016 at 17:34

Abbas

1 Comment

Kenneth Chan Over a year ago

This works in the console (print df), but when I write this dataframe to a CSV file (df.to_csv), all the repeated names are still there in the CSV.
