How do I find the maximum value in an array within a dataframe column?

Question

I have a dataframe (df) that looks like this:

a      b
loc.1  [1, 2, 3, 4, 7, 5, 6]
loc.2  [3, 4, 3, 7, 7, 8, 6]
loc.3  [1, 4, 3, 1, 7, 8, 6]
...

I want to find the maximum of the array in column b and append this to the original data frame. My thought was something like this:

for line in df: 
    split = map(float,b.split(','))
    count_max = max(split)
print count

Ideal output should be:

a      b                      max_val
loc.1  [1, 2, 3, 4, 7, 5, 6]  7
loc.2  [3, 4, 3, 7, 7, 8, 6]  8
loc.3  [1, 4, 3, 1, 7, 8, 6]  8
...

But this does not work, as I cannot use b.split as it is not defined...

Instead of b, I think you should have df.loc[:, 'b'].

– Colin Burke, Commented Apr 17, 2018 at 17:45

pd.DataFrame(df['b'].values.tolist()).max(1)

– BENY, Commented Apr 17, 2018 at 17:46

jezrael · Accepted Answer · answered 2018-04-17 17:46, edited 2018-04-17 18:49:10Z by jpp

4

If the column holds lists with no NaNs, the best option is to call max in a list comprehension or with map:

a['max'] = [max(x) for x in a['b']]

a['max'] = list(map(max, a['b']))

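Pure pandas solution (pass index=a.index so the result aligns with a when its index is not the default 0, 1, 2, ...):

a['max'] = pd.DataFrame(a['b'].values.tolist(), index=a.index).max(axis=1)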

Sample:

import numpy as np
import pandas as pd

array = {'loc.1': np.array([  1,2,3,4,7,5,6]),
         'loc.2': np.array([  3,4,3,7,7,8,6]),
         'loc.3': np.array([  1,4,3,1,7,8,6])}

L = [(k, v) for k, v in array.items()]
a = pd.DataFrame(L, columns=['a','b']).set_index('a')

a['max'] = [max(x) for x in a['b']]
print(a)
                           b  max
a                                
loc.1  [1, 2, 3, 4, 7, 5, 6]    7
loc.2  [3, 4, 3, 7, 7, 8, 6]    8
loc.3  [1, 4, 3, 1, 7, 8, 6]    8
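
The DataFrame-expansion variant above also copes with lists of different lengths, which the question does not show but which is common in practice. The following is a minimal sketch under that assumption: the ragged sample data is hypothetical, shorter rows are padded with NaN when expanded, and max(axis=1) skips NaN by default, so the result comes back as floats.

```python
import pandas as pd

# Hypothetical data (not from the original post): rows of different lengths.
df = pd.DataFrame({'a': ['loc.1', 'loc.2', 'loc.3'],
                   'b': [[1, 2, 3], [3, 8], [7]]}).set_index('a')

# Expanding the lists into columns pads the shorter rows with NaN.
# Passing index=df.index keeps the result aligned with df's own index.
expanded = pd.DataFrame(df['b'].tolist(), index=df.index)

# max(axis=1) ignores NaN by default (skipna=True), giving one value per row.
df['max'] = expanded.max(axis=1)
print(df)
```

If integer results are needed, the column can be cast back with astype(int) once it is known that no row was empty.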

EDIT:

You can also compute the max while building the list of tuples, before creating the DataFrame:

L = [(k, v, max(v)) for k, v in array.items()]
a = pd.DataFrame(L, columns=['a','b', 'max']).set_index('a')

print(a)
                           b  max
a                                
loc.1  [1, 2, 3, 4, 7, 5, 6]    7
loc.2  [3, 4, 3, 7, 7, 8, 6]    8
loc.3  [1, 4, 3, 1, 7, 8, 6]    8

edited Apr 17, 2018 at 18:49 by jpp

answered Apr 17, 2018 at 17:46 by jezrael

1 Comment

jezrael Over a year ago

@chrisz - Thank you.

FadeoN · Accepted Answer · 2018-04-17 18:28:15Z

1

Apply max to each list in column b:

df["max_val"] = df["b"].apply(max)
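
The line above assumes every cell of b already holds a list or array. The question's attempt with split(',') hints that the column might instead hold text such as "[1, 2, 3]" (for example after reading a CSV); that is only an assumption, since the post does not say. A minimal sketch for that case, using hypothetical string data, parses each cell first:

```python
import ast

import pandas as pd

# Hypothetical data: column b stored as text rather than as real lists.
df = pd.DataFrame({'a': ['loc.1', 'loc.2', 'loc.3'],
                   'b': ['[1, 2, 3, 4, 7, 5, 6]',
                         '[3, 4, 3, 7, 7, 8, 6]',
                         '[1, 4, 3, 1, 7, 8, 6]']})

# ast.literal_eval safely converts the text "[1, 2, 3]" into a Python list
# without executing arbitrary code.
df['b'] = df['b'].apply(ast.literal_eval)

# With real lists in place, the built-in max works on each row.
df['max_val'] = df['b'].apply(max)
print(df)
```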

answered Apr 17, 2018 at 18:28 by FadeoN

1 Comment

jpp Over a year ago

This works, but vectorised calculations should be preferred.

jpp · Accepted Answer · 2018-04-17 17:50:05Z

0

You can use numpy arrays for a vectorised calculation, provided every list in the column has the same length:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['loc.1', 'loc.2', 'loc.3'],
                   'b': [[1, 2, 3, 4, 7, 5, 6],
                         [3, 4, 3, 7, 7, 8, 6],
                         [1, 4, 3, 1, 7, 8, 6]]})

df['maxval'] = np.array(df['b'].values.tolist()).max(axis=1)

print(df)

#        a                      b  maxval
# 0  loc.1  [1, 2, 3, 4, 7, 5, 6]       7
# 1  loc.2  [3, 4, 3, 7, 7, 8, 6]       8
# 2  loc.3  [1, 4, 3, 1, 7, 8, 6]       8
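
The array conversion above only yields a regular 2-D array when all rows have the same length. The sketch below wraps it in a guard that falls back to a plain Python loop for ragged rows; row_max is a hypothetical helper name, not part of the original answer, and it assumes no row is empty.

```python
import numpy as np
import pandas as pd


def row_max(series):
    """Per-row maximum of a Series of lists (hypothetical helper)."""
    lists = series.tolist()
    lengths = {len(x) for x in lists}
    if len(lengths) == 1:
        # Equal lengths: build a 2-D array and reduce along each row.
        return np.array(lists).max(axis=1)
    # Ragged rows cannot form a 2-D array, so loop at the Python level.
    return np.array([max(x) for x in lists])


df = pd.DataFrame({'a': ['loc.1', 'loc.2', 'loc.3'],
                   'b': [[1, 2, 3, 4, 7, 5, 6],
                         [3, 4, 3, 7, 7, 8, 6],
                         [1, 4, 3, 1, 7, 8, 6]]})

df['maxval'] = row_max(df['b'])
print(df)
```

The fallback gives up the vectorised speed, but it keeps the call from failing or silently producing an object array when one row is shorter than the rest.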

answered Apr 17, 2018 at 17:50 by jpp
