Pandas DataFrame column to list [duplicate]

Question

I am pulling a subset of data from a column based on conditions in another column being met.

I can get the correct values back but it is in pandas.core.frame.DataFrame. How do I convert that to list?

import pandas as pd

tst = pd.read_csv('C:\\SomeCSV.csv')

lookupValue = tst['SomeCol'] == "SomeValue"
ID = tst[lookupValue][['SomeCol']]
#How To convert ID to a list

I am hesitant to edit a question this old and with so many views, but it should be pointed out that although the title talks about a "dataframe to list", the question is about a "series to list". And note that while tst is a dataframe, tst['SomeCol'] is a series. The distinction matters in that the tolist() method works directly on a series, but not on a dataframe. — JohnE
– JohnE, Commented Dec 7, 2016 at 14:36
Note that it may actually be more convenient to use the DataFrame than lists. — Burrito
– Burrito, Commented Feb 2, 2018 at 20:20
If you came here looking to find out how to convert a DATAFRAME to a list (of lists), check out this thread: stackoverflow.com/questions/28006793/… — cs95
– cs95, Commented Apr 17, 2019 at 10:16

AMC · Accepted Answer · 2020-05-01 11:29:15Z

348

You can use the Series.to_list method.

For example:

import pandas as pd

df = pd.DataFrame({'a': [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9],
                   'b': [3, 5, 6, 2, 4, 6, 7, 8, 7, 8, 9]})

print(df['a'].to_list())

Output:

[1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9]

To drop duplicates you can do one of the following:

>>> df['a'].drop_duplicates().to_list()
[1, 3, 5, 7, 4, 6, 8, 9]
>>> list(set(df['a'])) # as pointed out by EdChum
[1, 3, 4, 5, 6, 7, 8, 9]

edited May 1, 2020 at 11:29

AMC

2,6977 gold badges15 silver badges35 bronze badges

answered May 20, 2014 at 0:09

Akavall

86.8k58 gold badges214 silver badges261 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

user3646105 Over a year ago

Thanks that works! I want to delete the duplicates out of the ID list. I tried using set(ID) but gives an error saying TypeError: unhashable type: 'list'

EdChum Over a year ago

you can just do list(set(df['a'])

Coder Over a year ago

@MartienLubberink why is it better? wouldnt a set data structure use up more space?

MarredCheese · Accepted Answer · 2017-08-10 14:33:52Z

I'd like to clarify a few things:

As other answers have pointed out, the simplest thing to do is use pandas.Series.tolist(). I'm not sure why the top voted answer leads off with using pandas.Series.values.tolist() since as far as I can tell, it adds syntax/confusion with no added benefit.
tst[lookupValue][['SomeCol']] is a dataframe (as stated in the question), not a series (as stated in a comment to the question). This is because tst[lookupValue] is a dataframe, and slicing it with [['SomeCol']] asks for a list of columns (that list that happens to have a length of 1), resulting in a dataframe being returned. If you remove the extra set of brackets, as in tst[lookupValue]['SomeCol'], then you are asking for just that one column rather than a list of columns, and thus you get a series back.
You need a series to use pandas.Series.tolist(), so you should definitely skip the second set of brackets in this case. FYI, if you ever end up with a one-column dataframe that isn't easily avoidable like this, you can use pandas.DataFrame.squeeze() to convert it to a series.
tst[lookupValue]['SomeCol'] is getting a subset of a particular column via chained slicing. It slices once to get a dataframe with only certain rows left, and then it slices again to get a certain column. You can get away with it here since you are just reading, not writing, but the proper way to do it is tst.loc[lookupValue, 'SomeCol'] (which returns a series).
Using the syntax from #4, you could reasonably do everything in one line: ID = tst.loc[tst['SomeCol'] == 'SomeValue', 'SomeCol'].tolist()

Demo Code:

import pandas as pd
df = pd.DataFrame({'colA':[1,2,1],
                   'colB':[4,5,6]})
filter_value = 1

print "df"
print df
print type(df)

rows_to_keep = df['colA'] == filter_value
print "\ndf['colA'] == filter_value"
print rows_to_keep
print type(rows_to_keep)

result = df[rows_to_keep]['colB']
print "\ndf[rows_to_keep]['colB']"
print result
print type(result)

result = df[rows_to_keep][['colB']]
print "\ndf[rows_to_keep][['colB']]"
print result
print type(result)

result = df[rows_to_keep][['colB']].squeeze()
print "\ndf[rows_to_keep][['colB']].squeeze()"
print result
print type(result)

result = df.loc[rows_to_keep, 'colB']
print "\ndf.loc[rows_to_keep, 'colB']"
print result
print type(result)

result = df.loc[df['colA'] == filter_value, 'colB']
print "\ndf.loc[df['colA'] == filter_value, 'colB']"
print result
print type(result)

ID = df.loc[rows_to_keep, 'colB'].tolist()
print "\ndf.loc[rows_to_keep, 'colB'].tolist()"
print ID
print type(ID)

ID = df.loc[df['colA'] == filter_value, 'colB'].tolist()
print "\ndf.loc[df['colA'] == filter_value, 'colB'].tolist()"
print ID
print type(ID)

Result:

df
   colA  colB
0     1     4
1     2     5
2     1     6
<class 'pandas.core.frame.DataFrame'>

df['colA'] == filter_value
0     True
1    False
2     True
Name: colA, dtype: bool
<class 'pandas.core.series.Series'>

df[rows_to_keep]['colB']
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df[rows_to_keep][['colB']]
   colB
0     4
2     6
<class 'pandas.core.frame.DataFrame'>

df[rows_to_keep][['colB']].squeeze()
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df.loc[rows_to_keep, 'colB']
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df.loc[df['colA'] == filter_value, 'colB']
0    4
2    6
Name: colB, dtype: int64
<class 'pandas.core.series.Series'>

df.loc[rows_to_keep, 'colB'].tolist()
[4, 6]
<type 'list'>

df.loc[df['colA'] == filter_value, 'colB'].tolist()
[4, 6]
<type 'list'>

Yushan · Accepted Answer · 2017-08-01 12:36:03Z

21

You can use pandas.Series.tolist

e.g.:

import pandas as pd
df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})

Run:

>>> df['a'].tolist()

You will get

>>> [1, 2, 3]

edited Aug 1, 2017 at 12:36

Yushan

6436 silver badges19 bronze badges

answered Aug 20, 2016 at 11:57

zhql0907

3693 silver badges11 bronze badges

Comments

ShikharDua · Accepted Answer · 2016-04-21 22:10:58Z

4

The above solution is good if all the data is of same dtype. Numpy arrays are homogeneous containers. When you do df.values the output is an numpy array. So if the data has int and float in it then output will either have int or float and the columns will loose their original dtype. Consider df

a  b 
0  1  4
1  2  5 
2  3  6 

a    float64
b    int64

So if you want to keep original dtype, you can do something like

row_list = df.to_csv(None, header=False, index=False).split('\n')

this will return each row as a string.

['1.0,4', '2.0,5', '3.0,6', '']

Then split each row to get list of list. Each element after splitting is a unicode. We need to convert it required datatype.

def f(row_str): 
  row_list = row_str.split(',')
  return [float(row_list[0]), int(row_list[1])]

df_list_of_list = map(f, row_list[:-1])

[[1.0, 4], [2.0, 5], [3.0, 6]]

answered Apr 21, 2016 at 22:10

ShikharDua

10.1k1 gold badge28 silver badges22 bronze badges

2 Comments

JohnE Over a year ago

Easier way is just to do df['b'].values. If you select the column before using .values it will avoid a conversion and preserve the original dtype. This is also much more efficient.

tommy.carstensen Over a year ago

Which one is the above solution? All of the answers appear above this one. Thanks!

Collectives™ on Stack Overflow

Pandas DataFrame column to list [duplicate]

4 Answers 4

3 Comments

Comments

Comments

2 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

3 Comments

Comments

Comments

2 Comments

Linked

Related