Importing and formatting a CSV in Python for plotting

Question

I am fairly new to Python but understand what it is I need help with.

I am trying to import a csv file which looks like this:

Title of Project
name1, Info 1
name2, Info 2
name3, Info 3
name4, Info 4
name5, Info 5
a,b,c,d,e,f,g,h,i,j,k,l,m
a1,b1,c1,d1,e1,f1,g1,h1,i1,j1,k1,l1,m1
...
...
an,bn,cn,dn,en,fn,gn,hn,in,jn,jn,kn,ln,mn

All of the commas are actually delimited by '\t' and I don't know how to deal with that?

So overall, I want to remove the title and the next 5 rows and only work with the data from row 7 to the nth row in the file for plotting purposes.

The row that holds 'a,b,c,d,etc' need to be headers and the data under it should be separated accordingly into the columns.

I need help with syntax and understanding the most efficient way to do this. Any help or guidance would be much appreciated!

Thanks.

That's not a properly-formatted CSV file - although that can probably be worked-around. However you haven't indicated what module you intend to use to plot the data, so no one can help with that. — martineau
– martineau, Commented Feb 3, 2022 at 22:11

Zach Young · Accepted Answer · 2022-02-04 05:51:18Z

You can use Python's built-in CSV module to handle this.

My example reads your sample data with a tab-delimiter, discards the first 6 rows, and saves the rest of the rows to write out to a new comma-delimited file.

You first need to open the file for reading and create a CSV reader object:

import csv

with open('input.csv', newline='') as f:
    reader = csv.reader(f, delimiter='\t')
    ...

You can iterate over the rows of the reader a couple of different ways.

In your case, to skip a number of rows, call next(reader) a number of times in sequence, ignoring the data it returns:

    ...
    for i in range(1, 7):
        next(reader)  # manually advance reader / discard rows
    ...

You can also iterate the rows of the reader with a standard for-loop. I'm creating the all_rows list to hold the rows you really care about:

    ...
    all_rows = []
    for row in reader:
        all_rows.append(row)
    ...

For the standard reader object, there's nothing special about headers, it's totally up to you if it exists or has any special value... to the reader, it's just a row like any other:

print(all_rows)

[
 ['a' , 'b' , 'c' , 'd' , 'e' , 'f' , 'g' , 'h ', 'i' , 'j' , 'k' , 'l' , 'm'],
 ['a1', 'b1', 'c1', 'd1', 'e1', 'f1', 'g1', 'h1', 'i1', 'j1', 'k1', 'l1', 'm1'],
 ['an', 'bn', 'cn', 'dn', 'en', 'fn', 'gn', 'hn', 'in', 'jn', 'kn', 'ln', 'mn']
]

Now, you can do whatever you need with all_rows, feed it to a plotter, or save it as a cleaned-up CSV:

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(all_rows)

Here's the official docs for the CSV module. It's a good reference to remind of you how it works, once you have the hang of it, but I don't think it's really very good to get a basic understanding of what's going on.

I've written my own how-to CSV w/Python guide. I take a little more time to go over more basic things, but it's the simple things I see people here on SO getting tripped up on.

This is the perfect response and has helped me think through this approach immensely. Thank you good sir! I will most certainly take a look at your guides.

Collectives™ on Stack Overflow

Importing and formatting a CSV in Python for plotting

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related