
I am fairly new to Python but understand what it is I need help with.

I am trying to import a csv file which looks like this:

Title of Project
name1, Info 1
name2, Info 2
name3, Info 3
name4, Info 4
name5, Info 5
a,b,c,d,e,f,g,h,i,j,k,l,m
a1,b1,c1,d1,e1,f1,g1,h1,i1,j1,k1,l1,m1
...
...
an,bn,cn,dn,en,fn,gn,hn,in,jn,kn,ln,mn

All of the commas shown above are actually tab characters ('\t') in the real file, and I don't know how to deal with that.

So overall, I want to remove the title and the next 5 rows and only work with the data from row 7 to the nth row in the file for plotting purposes.

The row that holds 'a,b,c,d,etc' needs to be the header row, and the data under it should be split into the corresponding columns.

I need help with syntax and understanding the most efficient way to do this. Any help or guidance would be much appreciated!

Thanks.

  • That's not a properly formatted CSV file, although that can probably be worked around. However, you haven't indicated which module you intend to use to plot the data, so no one can help with that. Commented Feb 3, 2022 at 22:11

1 Answer


You can use Python's built-in CSV module to handle this.

My example reads your sample data with a tab delimiter, discards the first 6 rows, and writes the remaining rows to a new comma-delimited file.

You first need to open the file for reading and create a CSV reader object:

import csv

with open('input.csv', newline='') as f:
    reader = csv.reader(f, delimiter='\t')
    ...
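To see what delimiter='\t' actually changes, here is a small self-contained sketch that feeds a couple of made-up tab-separated lines (a hypothetical stand-in for input.csv) to csv.reader through io.StringIO instead of a real file, once with the tab delimiter and once with the default comma delimiter:

```python
import csv
import io

# Hypothetical stand-in for input.csv: fields separated by tab characters.
sample = "a\tb\tc\na1\tb1\tc1\n"

# delimiter='\t' tells the reader to split each line on tabs instead of commas.
rows = list(csv.reader(io.StringIO(sample, newline=''), delimiter='\t'))
print(rows)  # [['a', 'b', 'c'], ['a1', 'b1', 'c1']]

# With the default comma delimiter, each tab-separated line stays one single field.
default_rows = list(csv.reader(io.StringIO(sample, newline='')))
print(default_rows)  # [['a\tb\tc'], ['a1\tb1\tc1']]
```

That second result is usually the symptom people see when a tab-delimited file is read as if it were comma-delimited.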

You can iterate over the rows of the reader a couple of different ways.

In your case, to skip a number of rows, call next(reader) a number of times in sequence, ignoring the data it returns:

    ...
    for _ in range(6):
        next(reader)  # manually advance the reader, discarding the title and 5 info rows
    ...
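If you'd rather not write the skipping loop yourself, itertools.islice can do the same job: islice(reader, 6, None) yields everything after the first 6 rows. A minimal sketch, using a hypothetical in-memory sample laid out like the file in the question (1 title row, 5 info rows, then the table):

```python
import csv
import io
from itertools import islice

# Hypothetical sample mirroring the question's layout.
sample = (
    "Title of Project\n"
    "name1\tInfo 1\n"
    "name2\tInfo 2\n"
    "name3\tInfo 3\n"
    "name4\tInfo 4\n"
    "name5\tInfo 5\n"
    "a\tb\tc\n"
    "a1\tb1\tc1\n"
)

reader = csv.reader(io.StringIO(sample, newline=''), delimiter='\t')

# islice(iterable, start, stop): skip rows 0-5, keep everything from row 6 onward.
table_rows = list(islice(reader, 6, None))
print(table_rows)  # [['a', 'b', 'c'], ['a1', 'b1', 'c1']]
```

One small difference: calling next(reader) on a file with fewer than 6 rows raises StopIteration, while islice simply yields nothing.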

You can also iterate the rows of the reader with a standard for-loop. I'm creating the all_rows list to hold the rows you really care about:

    ...
    all_rows = []
    for row in reader:
        all_rows.append(row)
    ...
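Because that loop just copies every remaining row, list(reader) is an equivalent shortcut. The sketch below (with a hypothetical in-memory sample) also shows that a reader is a one-pass iterator: once it has been consumed, you need a fresh reader to read the data again.

```python
import csv
import io

sample = "a\tb\tc\na1\tb1\tc1\n"  # hypothetical tab-separated data

# Loop version, as in the answer.
reader = csv.reader(io.StringIO(sample, newline=''), delimiter='\t')
all_rows = []
for row in reader:
    all_rows.append(row)

# The reader is now exhausted; iterating it again yields nothing.
leftover = list(reader)

# Shortcut on a fresh reader: list() drains it and collects the same rows.
reader = csv.reader(io.StringIO(sample, newline=''), delimiter='\t')
same_rows = list(reader)

print(all_rows == same_rows)  # True
```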

For the standard reader object, there's nothing special about a header row: whether one exists, and what it means, is entirely up to you. To the reader, it's just a row like any other:

print(all_rows)
[
 ['a' , 'b' , 'c' , 'd' , 'e' , 'f' , 'g' , 'h' , 'i' , 'j' , 'k' , 'l' , 'm'],
 ['a1', 'b1', 'c1', 'd1', 'e1', 'f1', 'g1', 'h1', 'i1', 'j1', 'k1', 'l1', 'm1'],
 ['an', 'bn', 'cn', 'dn', 'en', 'fn', 'gn', 'hn', 'in', 'jn', 'kn', 'ln', 'mn']
]
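Since the question asks for the a, b, c... row to act as headers with the data split into columns, one plain-Python way to get there is to peel off the first row as the header and transpose the rest with zip(*data). A sketch, assuming all_rows looks like the printout above (a shortened, hypothetical version is hard-coded here). For real plotting you would probably also convert each value with something like float(), which is an assumption about your data, so it's left out. csv.DictReader is a related built-in option: it uses the first row it reads as field names and returns each later row as a dict.

```python
# Hard-coded, shortened, hypothetical stand-in for all_rows.
all_rows = [
    ['a', 'b', 'c'],
    ['a1', 'b1', 'c1'],
    ['a2', 'b2', 'c2'],
]

# The first row is the header; everything after it is data.
header, *data = all_rows

# zip(*data) turns rows into columns; pair each column with its header name.
columns = {name: list(values) for name, values in zip(header, zip(*data))}

print(columns['b'])  # ['b1', 'b2']
```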

Now you can do whatever you need with all_rows, such as feeding it to a plotter or saving it as a cleaned-up CSV:

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(all_rows)
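Putting the steps together, here is a sketch of the whole read/skip/write flow as one function. The name clean_csv and its skip parameter are hypothetical; the input and output paths are passed in rather than hard-coded. The usage part writes a throwaway sample into a temporary directory so the example runs on its own.

```python
import csv
import os
import tempfile


def clean_csv(src_path, dst_path, skip=6):
    """Hypothetical helper: read a tab-delimited file, drop the first `skip` rows,
    write the remaining rows to a comma-delimited file, and return them."""
    with open(src_path, newline='') as src:
        reader = csv.reader(src, delimiter='\t')
        for _ in range(skip):
            next(reader, None)  # default of None avoids StopIteration on short files
        kept = list(reader)
    with open(dst_path, 'w', newline='') as dst:
        csv.writer(dst).writerows(kept)  # default delimiter is a comma
    return kept


# Usage with a throwaway sample file standing in for input.csv.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'input.csv')
    dst = os.path.join(tmp, 'output.csv')
    with open(src, 'w', newline='') as f:
        f.write("Title of Project\n")
        for i in range(1, 6):
            f.write(f"name{i}\tInfo {i}\n")
        f.write("a\tb\tc\na1\tb1\tc1\n")
    kept = clean_csv(src, dst)
    with open(dst, newline='') as f:
        output_text = f.read()

print(kept)  # [['a', 'b', 'c'], ['a1', 'b1', 'c1']]
```

Note that csv.writer ends each row with '\r\n' by default, which is why the files are opened with newline=''.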

Here are the official docs for the csv module. They're a good reference to remind you how it works once you have the hang of it, but I don't think they're very good for getting a basic understanding of what's going on.

I've also written my own how-to guide for working with CSV in Python. It spends a little more time on the basics, since those simple things are what I see people here on SO getting tripped up on.


1 Comment

This is the perfect response and has helped me think through this approach immensely. Thank you good sir! I will most certainly take a look at your guides.
