How do I convert csv string to list in pandas?

Question

I'm working with a csv file that has the following format:

"Id","Sequence"
3,"1,3,13,87,1053,28576,2141733,508147108,402135275365,1073376057490373,9700385489355970183,298434346895322960005291,31479360095907908092817694945,11474377948948020660089085281068730"
7,"1,2,1,5,5,1,11,16,7,1,23,44,30,9,1,47,112,104,48,11,1,95,272,320,200,70,13,1,191,640,912,720,340,96,15,1,383,1472,2464,2352,1400,532,126,17,1,767,3328,6400,7168,5152,2464,784,160,19,1,1535,7424"
8,"1,2,4,5,8,10,16,20,32,40,64,80,128,160,256,320,512,640,1024,1280,2048,2560,4096,5120,8192,10240,16384,20480,32768,40960,65536,81920,131072,163840,262144,327680,524288,655360,1048576,1310720,2097152"
11,"1,8,25,83,274,2275,132224,1060067,3312425,10997342,36304451,301432950,17519415551,140456757358,438889687625,1457125820233,4810267148324,39939263006825,2321287521544174,18610239435360217"

I'd like to read this into a data frame where df['Id'] is integer-like and each value of df['Sequence'] is list-like.

I currently have the following kludgy code:

import pandas as pd

def clean(seq_string):
    # "1,3,13" -> [1, 3, 13]
    return list(map(int, seq_string.split(',')))

# Read data
training_data_file = "data/train.csv"
train = pd.read_csv(training_data_file)
train['Sequence'] = list(map(clean, train['Sequence'].values))

This appears to work, but I feel like the same could be achieved natively using pandas and numpy.

Does anyone have a recommendation?

alecxe · Accepted Answer · 2016-07-03 14:53:06Z

5

You can specify a converter for the Sequence column through the converters argument of read_csv:

converters: dict, default None

Dict of functions for converting values in certain columns. Keys can either be integers or column labels

train = pd.read_csv(training_data_file, converters={'Sequence': clean})
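For anyone who wants to try the converter approach without the original file, here is a minimal self-contained sketch. The in-memory SAMPLE_CSV string is a hypothetical, shortened stand-in for data/train.csv, and clean is the asker's function rewritten as a list comprehension.

```python
import io

import pandas as pd

# Hypothetical in-memory sample standing in for data/train.csv
SAMPLE_CSV = '"Id","Sequence"\n3,"1,3,13,87"\n7,"1,2,1,5,5"\n'


def clean(seq_string):
    """Turn a comma-separated string such as "1,3,13" into a list of ints."""
    return [int(token) for token in seq_string.split(',')]


# The converter is applied to every cell of the Sequence column while parsing.
train = pd.read_csv(io.StringIO(SAMPLE_CSV), converters={'Sequence': clean})

print(train['Sequence'].iloc[0])
print(train.dtypes)
```

The Sequence column ends up with object dtype because each cell holds a Python list. Since the elements are plain Python ints, very large values such as those in the question's data are stored without overflow.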

edited Jul 3, 2016 at 14:53

answered Jul 3, 2016 at 14:51

1 Comment

erip Over a year ago

Beautiful. Thought it would be something simple like this. :) Cheers!

akuiper · Accepted Answer · 2016-07-03 15:10:16Z

1

This also works, except that each Sequence value is a list of strings rather than a list of ints:

df = pd.read_csv(training_data_file)
df['Sequence'] = df['Sequence'].str.split(',')

To convert each element to int:

df = pd.read_csv(training_data_file)
df['Sequence'] = df['Sequence'].str.split(',').apply(lambda s: list(map(int, s)))
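The difference between the two variants can be checked on a small sample. This is only a sketch: SAMPLE_CSV is a hypothetical, shortened stand-in for the real file, and the names as_strings and as_ints are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical in-memory sample standing in for data/train.csv
SAMPLE_CSV = '"Id","Sequence"\n8,"1,2,4,5,8"\n11,"1,8,25,83"\n'

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Step 1: split each cell on commas; every element is still a string.
as_strings = df['Sequence'].str.split(',')

# Step 2: convert each token to int, one list at a time.
as_ints = as_strings.apply(lambda tokens: [int(t) for t in tokens])

print(as_strings.iloc[0])
print(as_ints.iloc[0])
```

The second step is still a per-row Python loop under the hood, so this is mainly a matter of style compared with passing a converter to read_csv.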

edited Jul 3, 2016 at 15:10

answered Jul 3, 2016 at 14:57

2 Comments

erip Over a year ago

And if I wanted to convert it to a list of int, I could just append .convert_objects(convert_numeric=True), right?

akuiper Over a year ago

It seems that method has been deprecated, so you would need to loop through each list and convert the elements manually, which brings you back to the original solution.

Uwais Iqbal · Accepted Answer · 2019-07-04 09:47:56Z

1

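An alternative solution is to use literal_eval from the ast module, which parses a string containing a Python literal. Note that a comma-separated string such as "1,3,13" parses as a tuple rather than a list, and a single value such as "5" parses as a plain int, so the result has to be converted explicitly:

from ast import literal_eval

def clean(x):
    # "1,3,13" parses as a tuple; a lone "5" parses as an int
    value = literal_eval(x)
    return list(value) if isinstance(value, tuple) else [value]

train = pd.read_csv(training_data_file, converters={'Sequence': clean})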
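To see what literal_eval actually hands back for different inputs, a quick check along these lines may help. The sample strings are illustrative only; the first two mimic what a cell of the Sequence column could contain.

```python
from ast import literal_eval

# Illustrative inputs only; the first two mimic cells of the Sequence column.
samples = ["1,3,13", "5", "[1, 3, 13]"]

for text in samples:
    value = literal_eval(text)
    print(f"{text!r} -> {value!r} ({type(value).__name__})")

# Anything that is not a plain literal is rejected rather than executed.
try:
    literal_eval("__import__('os')")
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
else:
    rejected = False
```

Only the bracketed form comes back as a list, which is why the return type is worth checking before relying on list-like behavior downstream.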

answered Jul 4, 2019 at 9:47
