
I have a CSV file that I load into a database table using Python, and I use pandas for the transformations.

The file repeats its header row every few rows, as shown below:

    ProductID   Title          Date        Volume   SalesAmount
    123         Face wash      6-17-2019   7        35
    124         Cleanser       6-17-2019   6        40
    125         Hair Spray     6-17-2019   3        33
    ProductID   Title          Date        Volume   SalesAmount
    126         Hair Gel       6-17-2019   5        20
    127         Shampoo        6-17-2019   4        24
    128         Nail Varnish   6-17-2019   0        0
    ProductID   Title          Date        Volume   SalesAmount
    129         Nail Color     6-17-2019   9        18
    130         Moisturizer    6-17-2019   3        27

My desired output has a single header at the top:

    ProductID   Title          Date        Volume   SalesAmount
    123         Face wash      6-17-2019   7        35
    124         Cleanser       6-17-2019   6        40
    125         Hair Spray     6-17-2019   3        33
    126         Hair Gel       6-17-2019   5        20
    127         Shampoo        6-17-2019   4        24
    128         Nail Varnish   6-17-2019   0        0
    129         Nail Color     6-17-2019   9        18
    130         Moisturizer    6-17-2019   3        27

I can already do this by index, by excluding those rows from the pandas DataFrame, but I would like to know how to do the same with a string comparison or a regex in pandas, or whether there is a better way altogether.
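For comparison, an index-based version might look roughly like the sketch below. The original code is not shown, so this is a hypothetical reconstruction: it assumes the header repeats after every 3 data rows, as in the sample, and the in-memory `raw` string and the `ROWS_PER_BLOCK` constant are illustrative only. It breaks as soon as the block size changes, which is a good reason to match on the header text instead.

```python
import io

import pandas as pd

# Illustrative in-memory copy of the sample data (assumed comma-separated)
raw = """ProductID,Title,Date,Volume,SalesAmount
123,Face wash,6-17-2019,7,35
124,Cleanser,6-17-2019,6,40
125,Hair Spray,6-17-2019,3,33
ProductID,Title,Date,Volume,SalesAmount
126,Hair Gel,6-17-2019,5,20
127,Shampoo,6-17-2019,4,24
128,Nail Varnish,6-17-2019,0,0
ProductID,Title,Date,Volume,SalesAmount
129,Nail Color,6-17-2019,9,18
130,Moisturizer,6-17-2019,3,27
"""
df = pd.read_csv(io.StringIO(raw))

# Hypothetical assumption: a repeated header follows every 3 data rows,
# so the repeats sit at positions 3, 7, 11, ... of the parsed frame.
ROWS_PER_BLOCK = 3
repeat_positions = list(range(ROWS_PER_BLOCK, len(df), ROWS_PER_BLOCK + 1))

# Drop those positions by their index labels
clean = df.drop(index=df.index[repeat_positions])
print(clean)
```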

  • df = df[df['ProductID'].ne('ProductID')]? Commented Jun 17, 2019 at 17:56
  • Is the number of rows between the repeated headers fixed? Do you know it beforehand? Commented Jun 17, 2019 at 18:03

2 Answers

A slightly more systematic approach than the one in the comment, since it takes all columns into account:

df[df.ne(df.columns).any(axis=1)]

Output:

  ProductID         Title       Date Volume SalesAmount
0       123     Face wash  6-17-2019      7         35
1       124      Cleanser  6-17-2019      6         40
2       125    Hair Spray  6-17-2019      3         33
4       126      Hair Gel  6-17-2019      5         20
5       127       Shampoo  6-17-2019      4         24
6       128  Nail Varnish  6-17-2019      0          0
8       129    Nail Color  6-17-2019      9         18
9       130   Moisturizer  6-17-2019      3         27
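To see why this works, the one-liner can be split into its two steps. The sketch below uses a small in-memory sample (the `raw` string is illustrative, not from the question). `df.ne(df.columns)` compares every cell with the name of the column it sits in, because pandas lines up a list-like `other` with the columns; a repeated header row therefore comes out `False` in every column. `.any(axis=1)` then keeps each row that has at least one `True`, which is every real data row.

```python
import io

import pandas as pd

# Illustrative sample: one data row, one repeated header, one data row
raw = """ProductID,Title,Date,Volume,SalesAmount
123,Face wash,6-17-2019,7,35
ProductID,Title,Date,Volume,SalesAmount
126,Hair Gel,6-17-2019,5,20
"""
df = pd.read_csv(io.StringIO(raw))

# Step 1: cell-by-cell "is this value different from its own column name?"
is_different = df.ne(df.columns)
print(is_different)

# Step 2: a row survives if at least one of its cells differs from the header
keep = is_different.any(axis=1)
clean = df[keep]
print(clean)
```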

  • How does this work? I'm new to Python, so an explanation would be helpful.

One solution is to drop those rows:

import pandas as pd

df = pd.read_csv('my_data.csv')

# Keep only rows whose ProductID cell is not the literal header text
df = df[df['ProductID'] != 'ProductID']
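Since the question also asks about regex, here is a sketch of a regex-based variant of the same filter, followed by cleanup you will probably want before loading into a database: resetting the index and converting the columns that the repeated header text forced to strings back to numbers and dates. The in-memory `raw` sample stands in for `my_data.csv`, `Series.str.fullmatch` needs pandas 1.1 or later, and the `%m-%d-%Y` date format is an assumption based on the sample values.

```python
import io

import pandas as pd

# Illustrative sample standing in for my_data.csv (assumed comma-separated)
raw = """ProductID,Title,Date,Volume,SalesAmount
123,Face wash,6-17-2019,7,35
124,Cleanser,6-17-2019,6,40
125,Hair Spray,6-17-2019,3,33
ProductID,Title,Date,Volume,SalesAmount
126,Hair Gel,6-17-2019,5,20
127,Shampoo,6-17-2019,4,24
128,Nail Varnish,6-17-2019,0,0
ProductID,Title,Date,Volume,SalesAmount
129,Nail Color,6-17-2019,9,18
130,Moisturizer,6-17-2019,3,27
"""
df = pd.read_csv(io.StringIO(raw))

# Regex variant: flag rows whose ProductID cell is exactly the header text,
# tolerating surrounding whitespace; astype(str) makes .str safe on any dtype.
is_header = df["ProductID"].astype(str).str.fullmatch(r"\s*ProductID\s*")
clean = df[~is_header].reset_index(drop=True)

# The repeated header strings left every column holding text; convert back.
numeric_cols = ["ProductID", "Volume", "SalesAmount"]
clean[numeric_cols] = clean[numeric_cols].apply(pd.to_numeric)
clean["Date"] = pd.to_datetime(clean["Date"], format="%m-%d-%Y")

print(clean.dtypes)
```

Converting the dtypes before writing to the database lets the table columns be created as integers and dates rather than text.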
