How to have multiple rows under the same row index using pandas

Question

I'm writing a script to normalise data from RT-PCR. I am reading the data from a TSV file and I'm struggling to put it into a pandas DataFrame so that it's usable. The issue is that several rows share the same row index label. Is it possible to make it a hierarchical structure?

I'm using Python 3.6. I've tried .groupby() and .pivot() but I can't seem to get it to do what I want.

import pandas as pd

def calculate_peaks(file_path):
    peaks_tsv = pd.read_csv(file_path, sep='\t', header=0, index_col=0)

My input file is this: input file image

My expected output:

               EMB.brep1.peak  EMB.brep1.length  EMB.brep2.peak  EMB.brep2.length  EMB.brep3.peak  EMB.brep3.length
primer name
Hv161       0           19276            218.41           20947            218.39           21803            218.26
            1           22906            221.35           26317            221.17           26787            221.21
Hv223       0            4100            305.24            5247            305.37            4885            305.25
            1            2593            435.25            3035            435.30            2819            435.32
            2            4864            597.40            5286            597.20            4965            596.60

Actual Output:

             EMB.brep1.peak  EMB.brep1.length  EMB.brep2.peak  EMB.brep2.length  EMB.brep3.peak  EMB.brep3.length
primer name
Hv161                 19276            218.41           20947            218.39           21803            218.26
Hv161                 22906            221.35           26317            221.17           26787            221.21
Hv223                  4100            305.24            5247            305.37            4885            305.25
Hv223                  2593            435.25            3035            435.30            2819            435.32
Hv223                  4864            597.40            5286            597.20            4965            596.60

Possible duplicate of pandas DataFrame print index value only once
– G. Anderson, commented Aug 1, 2019 at 16:07

Quang Hoang · Accepted Answer · 2019-08-01 16:07:15Z

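Read the file without index_col=0 so that 'primer name' stays a regular column, number the rows within each primer with groupby().cumcount(), and then set both columns as the index: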

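# Keep 'primer name' as a regular column (no index_col) so it can be grouped on
peaks_tsv = pd.read_csv(file_path, sep='\t', header=0)

# Number the rows within each primer: 0, 1, 2, ...
peaks_tsv['idx'] = peaks_tsv.groupby('primer name').cumcount()

# Use (primer name, idx) as a two-level MultiIndex
peaks_tsv.set_index(['primer name', 'idx'], inplace=True)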
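
For anyone who wants to try this without the original file, here is a minimal self-contained sketch. The inline `tsv_text` string is a hypothetical stand-in for the TSV file, built from the first two data columns shown in the question; the variable names `peaks`, `hv223` and `second_hv161_peak` are only illustrative. It also shows one way the resulting two-level index might be queried with `.loc`.

```python
import io

import pandas as pd

# Hypothetical stand-in for the TSV file (values copied from the question,
# limited to the EMB.brep1 columns to keep the example short).
tsv_text = (
    "primer name\tEMB.brep1.peak\tEMB.brep1.length\n"
    "Hv161\t19276\t218.41\n"
    "Hv161\t22906\t221.35\n"
    "Hv223\t4100\t305.24\n"
    "Hv223\t2593\t435.25\n"
    "Hv223\t4864\t597.40\n"
)

# Read without index_col so 'primer name' stays a regular column.
peaks = pd.read_csv(io.StringIO(tsv_text), sep="\t", header=0)

# Running counter within each primer, then a (primer name, idx) MultiIndex.
peaks["idx"] = peaks.groupby("primer name").cumcount()
peaks = peaks.set_index(["primer name", "idx"])

# All rows for one primer (the outer level is dropped from the result).
hv223 = peaks.loc["Hv223"]

# A single value, addressed by (primer name, idx) and column name.
second_hv161_peak = peaks.loc[("Hv161", 1), "EMB.brep1.peak"]

print(peaks)
```

When printed, pandas shows each repeated outer label only once, which gives the grouped layout asked for in the question.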
