
I have a dataframe (df) that looks like this:

a                          b
loc.1  [1, 2, 3, 4, 7, 5, 6]
loc.2  [3, 4, 3, 7, 7, 8, 6]
loc.3  [1, 4, 3, 1, 7, 8, 6]
...

I want to find the maximum of the array in column b and add it as a new column to the original DataFrame. My thought was something like this:

for line in df: 
    split = map(float,b.split(','))
    count_max = max(split)
print(count_max)

Ideal output should be:

a                          b  max_val
loc.1  [1, 2, 3, 4, 7, 5, 6]        7
loc.2  [3, 4, 3, 7, 7, 8, 6]        8
loc.3  [1, 4, 3, 1, 7, 8, 6]        8
...

But this does not work: I cannot call b.split because b is not defined...

  • Instead of b, I think you should have df.loc[:, 'b']. Commented Apr 17, 2018 at 17:45
  • pd.DataFrame(df['b'].values.tolist()).max(1) Commented Apr 17, 2018 at 17:46
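If b actually holds strings such as '[1, 2, 3]' rather than lists (for example after reading the data from a CSV file, which the split(',') attempt in the question hints at), the strings have to be parsed before taking the max. A minimal sketch, assuming every cell is a valid Python list literal, using the standard library's ast.literal_eval:

```python
import ast

import pandas as pd

# Assumption: column b was read from a file, so each cell is a string, not a list
df = pd.DataFrame({'a': ['loc.1', 'loc.2', 'loc.3'],
                   'b': ['[1, 2, 3, 4, 7, 5, 6]',
                         '[3, 4, 3, 7, 7, 8, 6]',
                         '[1, 4, 3, 1, 7, 8, 6]']})

# ast.literal_eval safely turns each "[...]" string into a Python list
df['b'] = df['b'].apply(ast.literal_eval)

# With real lists in place, the plain max per row works
df['max_val'] = [max(x) for x in df['b']]
print(df)
```

Unlike eval, ast.literal_eval only accepts literals, so it will not execute arbitrary code stored in the file.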

3 Answers


If you are working with lists that contain no NaNs, the best option is to use max in a list comprehension or with map:

a['max'] = [max(x) for x in a['b']]

a['max'] = list(map(max, a['b']))

Pure pandas solution:

# pass index=a.index so the result aligns with a's index (loc.1, loc.2, ...)
a['max'] = pd.DataFrame(a['b'].values.tolist(), index=a.index).max(axis=1)
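A short sketch of where the pure pandas version helps, using hypothetical data in which one list contains a NaN and the lists have different lengths. pd.DataFrame pads shorter rows with NaN, and DataFrame.max skips NaN by default (skipna=True), so each row still gets a sensible maximum. Passing index= keeps the result aligned with the original frame:

```python
import numpy as np
import pandas as pd

# Hypothetical data: loc.2 contains a NaN, loc.3 is shorter than the others
df = pd.DataFrame({'b': [[1, 2, 3, 4, 7, 5, 6],
                         [3, np.nan, 3, 7, 8],
                         [1, 4, 3]]},
                  index=['loc.1', 'loc.2', 'loc.3'])

# Expand the lists into columns; shorter rows are padded with NaN
wide = pd.DataFrame(df['b'].tolist(), index=df.index)

# DataFrame.max ignores NaN by default, so padding does not affect the result
df['max'] = wide.max(axis=1)
print(df)
```

Because NaN is involved, the new column has a float dtype (7.0, 8.0, 4.0).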

Sample:

import numpy as np
import pandas as pd

array = {'loc.1': np.array([  1,2,3,4,7,5,6]),
         'loc.2': np.array([  3,4,3,7,7,8,6]),
         'loc.3': np.array([  1,4,3,1,7,8,6])}

L = [(k, v) for k, v in array.items()]
a = pd.DataFrame(L, columns=['a','b']).set_index('a')

a['max'] = [max(x) for x in a['b']]
print (a)
                           b  max
a                                
loc.1  [1, 2, 3, 4, 7, 5, 6]    7
loc.2  [3, 4, 3, 7, 7, 8, 6]    8
loc.3  [1, 4, 3, 1, 7, 8, 6]    8

EDIT:

You can also compute the max inside the list comprehension that builds the DataFrame:

L = [(k, v, max(v)) for k, v in array.items()]
a = pd.DataFrame(L, columns=['a','b', 'max']).set_index('a')

print (a)
                           b  max
a                                
loc.1  [1, 2, 3, 4, 7, 5, 6]    7
loc.2  [3, 4, 3, 7, 7, 8, 6]    8
loc.3  [1, 4, 3, 1, 7, 8, 6]    8

1 Comment

@chrisz - Thank you.

Try this:

df["max_val"] = df["b"].apply(max)

1 Comment

This works, but vectorised calculations should be preferred.
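Along the lines of that comment, one alternative that stays within pandas operations instead of a Python-level apply is Series.explode (available since pandas 0.25) followed by a groupby on the index. A sketch, assuming no list is empty (an empty list explodes to NaN, and astype(int) would then fail):

```python
import pandas as pd

df = pd.DataFrame({'a': ['loc.1', 'loc.2', 'loc.3'],
                   'b': [[1, 2, 3, 4, 7, 5, 6],
                         [3, 4, 3, 7, 7, 8, 6],
                         [1, 4, 3, 1, 7, 8, 6]]})

# explode gives one row per list element, repeating the original index label;
# astype(int) turns the resulting object column into a numeric one
exploded = df['b'].explode().astype(int)

# Group the elements back by their original row label and take the max
df['max_val'] = exploded.groupby(level=0).max()
print(df)
```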

If every list has the same length, you can convert column b to a 2-D NumPy array and take a vectorised max along axis 1:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['loc.1', 'loc.2', 'loc.3'],
                   'b': [[1, 2, 3, 4, 7, 5, 6],
                         [3, 4, 3, 7, 7, 8, 6],
                         [1, 4, 3, 1, 7, 8, 6]]})

df['maxval'] = np.array(df['b'].values.tolist()).max(axis=1)

print(df)

#        a                      b  maxval
# 0  loc.1  [1, 2, 3, 4, 7, 5, 6]       7
# 1  loc.2  [3, 4, 3, 7, 7, 8, 6]       8
# 2  loc.3  [1, 4, 3, 1, 7, 8, 6]       8
