
I am struggling to convert a comma-separated list into a multi-column (7) DataFrame.

print(type(mylist))

<type 'list'>

print(mylist)


['AN,2__AAS000,26,20150826113000,-283.000,20150826120000,-283.000',         'AN,2__AE000,26,20150826113000,0.000,20150826120000,0.000',.........

The following creates a DataFrame with a single column:

df = pd.DataFrame(mylist)

I have reviewed the built-in CSV functionality in Pandas; however, my CSV data is held in a list. How can I simply convert the list into a 7-column DataFrame?

Thanks in advance.

  • I can't reproduce your error: l=[['AA','2__000',26,20150826113000,-283.000,20150826120000,-283.000],['BB','2__DI9',26,20150826113000,0.000,20150826120000,0.000],['CC','2__GH6',26,20150826113000,-269.000,20150826120000,-269.000]] pd.DataFrame(l) works fine. Commented Aug 26, 2015 at 10:42
  • Can you post the output from print(mylist)? Commented Aug 26, 2015 at 10:42
  • I have limited the results above as there are 2k rows. The DataFrame is created, however when I print(df) I get all the data followed by [1922 rows x 1 columns] Commented Aug 26, 2015 at 10:47
  • Can you post just the first few rows then? You have to show how the data is stored in your list so we can reproduce your error. Commented Aug 26, 2015 at 10:49
  • As further background, the data is originally from a file which had a mixture of CSV data and some metadata, which I stripped out before passing the CSV rows to a list. Commented Aug 26, 2015 at 10:50

3 Answers


You need to split each string in your list:

import pandas as pd

df = pd.DataFrame([sub.split(",") for sub in l])
print(df)

Output:

   0         1   2               3         4               5         6
0  AN  2__AS000  26  20150826113000  -283.000  20150826120000  -283.000
1  AN   2__A000  26  20150826113000     0.000  20150826120000     0.000
2  AN  2__AE000  26  20150826113000  -269.000  20150826120000  -269.000
3  AN  2__AE000  26  20150826113000  -255.000  20150826120000  -255.000
4  AN   2__AE00  26  20150826113000  -254.000  20150826120000  -254.000
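As a possible follow-up (not part of the original answer): after the split every column still holds strings, so you may also want to name the columns and convert the numeric ones. The column names below are only placeholders for illustration:

# hypothetical column names; replace them with whatever the fields really mean
df.columns = ["code", "id", "num", "start_time", "start_value", "end_time", "end_value"]

# split() leaves everything as strings, so convert the numeric columns explicitly
for col in ["num", "start_value", "end_value"]:
    df[col] = pd.to_numeric(df[col])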

If you know how many lines to skip in your csv, you can do it all with read_csv using skiprows=lines_of_metadata:

import pandas as pd

df = pd.read_csv("in.csv", skiprows=3, header=None)
print(df)

Or, if each line of the metadata starts with a certain character, you can use comment:

df = pd.read_csv("in.csv",header=None,comment="#")  

If you need to specify more than one character, you can use itertools.dropwhile, which will drop the leading lines starting with that prefix:

import pandas as pd
from itertools import dropwhile
import csv

with open("in.csv") as f:
    # skip the leading metadata lines that start with "#!!"
    f = dropwhile(lambda x: x.startswith("#!!"), f)
    r = csv.reader(f)
    df = pd.DataFrame.from_records(r)

Using your input data, with some lines starting with #!! added:

#!! various
#!! metadata
#!! lines
AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000
AN,2__A000,26,20150826113000,0.000,20150826120000,0.000
AN,2__AE000,26,20150826113000,-269.000,20150826120000,-269.000
AN,2__AE000,26,20150826113000,-255.000,20150826120000,-255.000
AN,2__AE00,26,20150826113000,-254.000,20150826120000,-254.000

Outputs:

    0         1   2               3         4               5         6
0  AN  2__AS000  26  20150826113000  -283.000  20150826120000  -283.000
1  AN   2__A000  26  20150826113000     0.000  20150826120000     0.000
2  AN  2__AE000  26  20150826113000  -269.000  20150826120000  -269.000
3  AN  2__AE000  26  20150826113000  -255.000  20150826120000  -255.000
4  AN   2__AE00  26  20150826113000  -254.000  20150826120000  -254.000

7 Comments

Great work, appreciate the help, this worked perfectly. I'm very happy.
@user636322, no worries, I added a couple of ways to do it with read_csv. What does the metadata actually look like? Do you know how many lines there are, or do the lines start with a common character?
The metadata is basically repeating header information throughout the csv file. I'm not able to predict the location, so I just used a loop to remove it specifically (if row.startswith('xxx')).
@user636322, you can still do it when reading from the csv. What is the xxx in startswith('xxx')?
I'm actually selecting the valid data with the loop, and thereby eliminating the invalid data; in the example above, row.startswith('AN').
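For reference, here is a minimal sketch of the approach described in the comments above: select only the rows that start with the known data prefix ('AN') while reading the file. The file name and everything apart from that prefix are assumptions:

import csv
import pandas as pd

with open("in.csv") as f:
    # keep only the data rows; anything not starting with "AN" is treated as metadata
    valid = (line for line in f if line.startswith("AN"))
    df = pd.DataFrame.from_records(csv.reader(valid))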

You can convert the list into a 7-column DataFrame in the following way:

import pandas as pd

df = pd.read_csv(filename, sep=',')
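As an addition (not part of this answer): the question's data is already a list of comma-separated strings rather than a file, so one way to still use read_csv is to join the list and wrap it in io.StringIO, roughly like this:

import io
import pandas as pd

# mylist holds strings such as 'AN,2__AAS000,26,...'
df = pd.read_csv(io.StringIO("\n".join(mylist)), header=None)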

1 Comment

Try to add some description with your code. What does it do? Why does it work?

I encountered a similar problem and solved it this way.

import pandas as pd

def lrsplit(line):
    # keep the first and last '-'-separated fields and rejoin
    # everything in between as the middle field
    left, *mid, right = line.split('-')
    return left, '-'.join(mid), right.strip()

example = pd.DataFrame(lrsplit(line) for line in open("example.csv"))
example.columns = ['location', 'position', 'company']
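For reference, the input file example.csv would presumably contain '-'-separated lines, reconstructed here from the result below:

india-manager-intel
india-sales-manager-amazon
banglore-ccm- head - county-jp morgan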

Result:

   location             position    company
0     india              manager      intel
1     india        sales-manager     amazon
2  banglore  ccm- head - county   jp morgan

