How to remove repeated header rows in a pandas DataFrame

Question

I have a CSV file that I load into a database table using Python, with pandas for the transformations.

The file has repeated headers after every certain number of rows, as shown:

    ProductID   Title         Date       Volume SalesAmount
    123         Face wash    6-17-2019     7       35
    124         Cleanser     6-17-2019     6       40
    125         Hair Spray   6-17-2019     3       33
    ProductID   Title         Date       Volume SalesAmount
    126         Hair Gel     6-17-2019     5       20
    127         Shampoo      6-17-2019     4       24
    128         Nail Varnish 6-17-2019     0        0
    ProductID   Title         Date       Volume SalesAmount
    129         Nail Color   6-17-2019     9       18
    130         Moisturizer  6-17-2019     3       27

My desired output has a single header at the top:

    ProductID   Title         Date       Volume SalesAmount
    123         Face wash    6-17-2019     7       35
    124         Cleanser     6-17-2019     6       40
    125         Hair Spray   6-17-2019     3       33
    126         Hair Gel     6-17-2019     5       20
    127         Shampoo      6-17-2019     4       24
    128         Nail Varnish 6-17-2019     0        0
    129         Nail Color   6-17-2019     9       18
    130         Moisturizer  6-17-2019     3       27

I'm able to achieve this by index, by excluding the rows in the pandas DataFrame, but I want to know how to achieve the same using string comparison/regex in pandas, or any better way of doing it.

df = df[df['ProductID'].ne('ProductID')]?

– Quang Hoang, Commented Jun 17, 2019 at 17:56
Is the number of rows between headers fixed? Do you know it beforehand?

– user3483203, Commented Jun 17, 2019 at 18:03

Quang Hoang · Accepted Answer · 2019-06-17 18:02:39Z

4

A slightly more systematic approach than the one in the comment, taking all columns into account:

df[df.ne(df.columns).any(axis=1)]

Output:

  ProductID         Title       Date Volume SalesAmount
0       123     Face wash  6-17-2019      7         35
1       124      Cleanser  6-17-2019      6         40
2       125    Hair Spray  6-17-2019      3         33
4       126      Hair Gel  6-17-2019      5         20
5       127       Shampoo  6-17-2019      4         24
6       128  Nail Varnish  6-17-2019      0          0
8       129    Nail Color  6-17-2019      9         18
9       130   Moisturizer  6-17-2019      3         27
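To show how the one-liner above works, here is a minimal step-by-step sketch. The inline sample data, the use of `io.StringIO`, and the variable names `differs`, `is_data_row` and `clean` are assumptions made for illustration; they are not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical sample mirroring the question's data (a real file would be read from disk).
raw = """ProductID,Title,Date,Volume,SalesAmount
123,Face wash,6-17-2019,7,35
124,Cleanser,6-17-2019,6,40
ProductID,Title,Date,Volume,SalesAmount
126,Hair Gel,6-17-2019,5,20
"""
df = pd.read_csv(io.StringIO(raw))

# Step 1: compare every cell with the label of its own column.
# The result is a boolean frame that is True where the cell differs from the label.
differs = df.ne(df.columns)

# Step 2: a row is real data if at least one of its cells differs from its label.
# A repeated header row matches in every column, so it yields False here.
is_data_row = differs.any(axis=1)

# Step 3: boolean indexing keeps only the data rows; the index is renumbered.
clean = df[is_data_row].reset_index(drop=True)

# Because header text was mixed into each column, the values were read as strings.
# Convert the numeric columns back once the header rows are gone.
for col in ["ProductID", "Volume", "SalesAmount"]:
    clean[col] = pd.to_numeric(clean[col])
```

Requiring a mismatch in any column, rather than checking one column only, means a data row that merely happens to contain a column name in one cell is still kept.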

Tejas Over a year ago

How does this work? I'm new to Python, so an explanation would be helpful.

Naik · Accepted Answer · 2019-06-17 17:59:27Z

1

One solution could be dropping those rows:

import pandas as pd

df = pd.read_csv('my_data.csv')

# Keep only the rows whose ProductID cell is not the repeated header text
df = df[df['ProductID'] != 'ProductID']
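The single-column check above could be wrapped in a small helper, sketched below. The function name `drop_repeated_headers`, the whitespace stripping, and the inline sample data are all assumptions added for illustration; the stripping only matters if the repeated headers in a real file carry stray spaces.

```python
import io

import pandas as pd


def drop_repeated_headers(df: pd.DataFrame, key: str) -> pd.DataFrame:
    """Hypothetical helper: drop rows where column `key` repeats its own name."""
    # Strip surrounding whitespace so a padded ' ProductID' also counts as a header.
    is_header = df[key].astype(str).str.strip() == key
    return df[~is_header].reset_index(drop=True)


# Hypothetical sample: two repeated headers, the second one padded with a space.
raw = """ProductID,Title,Date,Volume,SalesAmount
123,Face wash,6-17-2019,7,35
ProductID,Title,Date,Volume,SalesAmount
126,Hair Gel,6-17-2019,5,20
 ProductID,Title,Date,Volume,SalesAmount
129,Nail Color,6-17-2019,9,18
"""
df = pd.read_csv(io.StringIO(raw))
clean = drop_repeated_headers(df, "ProductID")
```

Checking a single key column is simpler than comparing all columns, but it assumes no genuine data row ever has the column's own name as its value.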
