Issues with list comprehension in Lambda function

Question

I'm trying to build a lambda function within an apply that:

checks whether the elements of a given list are all present in another list
if they are, appends a value to another column in the same dataframe

Example:

initial dataframe:

id list_1 list_2
1 [1,2,3] []
2 [1,2,4] []
3 [1,3,4] []

Now, I want to check if [1,2] is present within list_1. If it is, append 'test' to the list in list_2.

final_dataframe:

id list_1 list_2
1 [1,2,3] ['test']
2 [1,2,4] ['test']
3 [1,3,4] []

Here is my first attempt:

df.apply(lambda row: row['list_2'].append(['test' if all(elem in [1,2] for elem in row['list_1'])]), axis = 1)

But I'm getting an invalid syntax error. I feel like this is probably pretty simple, but I can't figure out what the issue is.

Here is the full error:

  File "<ipython-input-126-f45b74393598>", line 3
    movies_test.apply(lambda row: row['displayable'].append(['comedy_drama' if all(elem in ['comedy','drama'] for elem in row['test'])]), axis = 1)
                                                                                                                                      ^
SyntaxError: invalid syntax

Invalid syntax errors usually point you exactly to where you need to fix something. Share the full error traceback so we can see it. Either way, this function is way too long for a lambda and is totally unreadable. Code shouldn't have to fit on one line at all costs; you'll have a better time debugging if you can actually read the code.
– Ofer Sadan, Commented Sep 13, 2019 at 20:15
That's fair, it seemed simple enough while writing it but I guess it got a little out of hand. Would you recommend I build a separate function and pass it to my apply call?
– madsthaks, Commented Sep 13, 2019 at 20:17
I would, yes. Also, the reason you're getting the error is that you have a one-line if expression without an else, which isn't allowed.
– Ofer Sadan, Commented Sep 13, 2019 at 20:18

Martijn Pieters · Accepted Answer · 2019-09-13 20:39:24Z

You are using an expression (lambdas can only hold an expression), and so must use a conditional expression. Expressions always produce an object, so conditional expressions must have a true expression and a false expression, in the form

<true> if <condition> else <false>

You left out the else <false> part.
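
As a minimal sketch of the rule, using a hypothetical `values` list that is not part of the question: the complete form evaluates to a value, while the form without `else` is rejected by the parser. The incomplete expression is compiled from a string here only so that the error can be caught at runtime.

```python
values = [1, 2, 3]  # hypothetical data, for illustration only

# A conditional expression needs both branches and always yields a value.
label = 'test' if 1 in values else None

# Without the else branch the source does not parse at all. Compiling it
# from a string lets us catch the SyntaxError instead of crashing.
try:
    compile("'test' if 1 in values", '<example>', 'eval')
    missing_else_is_error = False
except SyntaxError:
    missing_else_is_error = True

print(label, missing_else_is_error)
```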

You are making three additional mistakes:

You are treating a single column as a list of lists
You are appending whether or not your test is true
You are appending a list to a list, where you wanted to append a string instead.

Your test should just see if both elements are in the list; you can use set operations; you want to know if {1, 2} is a subset of the column values:

{1, 2}.issubset(row['list_1'])

and only then append something to the other column, so you want to execute row['list_2'].append() only if the above is true. And you want to append a single string, so call .append('test').
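
A small sketch of why the subset test and the plain string append matter, using the first row of the question's example as plain Python lists (the variable names are illustrative):

```python
list_1 = [1, 2, 3]

# The test from the question asks: is every element of list_1 either 1 or 2?
# That fails here because 3 is not in [1, 2].
original_test = all(elem in [1, 2] for elem in list_1)

# The intended test asks: are both 1 and 2 present in list_1?
subset_test = {1, 2}.issubset(list_1)

# Appending a list nests it; appending a string adds a single element.
nested = []
nested.append(['test'])
flat = []
flat.append('test')

print(original_test, subset_test, nested, flat)
```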

For the else part, you could return None, so that no append call is made:

row['list_2'].append('test') if {1, 2}.issubset(row['list_1']) else None

or, in situ in the df.apply() call:

df.apply(lambda row: row['list_2'].append('test') if {1, 2}.issubset(row['list_1']) else None, axis=1)
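
For reference, a self-contained sketch of that call, assuming the DataFrame is rebuilt from the question's example. Note that `axis=1` is needed so that the lambda receives rows rather than columns.

```python
import pandas as pd

# DataFrame rebuilt from the question's example.
df = pd.DataFrame({
    'id': [1, 2, 3],
    'list_1': [[1, 2, 3], [1, 2, 4], [1, 3, 4]],
    'list_2': [[], [], []],
})

# axis=1 hands each row to the lambda. The append mutates the list object
# stored in the cell; the return value of apply() is not used.
df.apply(
    lambda row: row['list_2'].append('test')
    if {1, 2}.issubset(row['list_1']) else None,
    axis=1,
)

print(df['list_2'].tolist())
```

This relies on a side effect inside apply(), which is fragile: as far as I recall, pandas releases before 1.1 could call the function twice on the first row, which would append the value twice there.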

However, it would be better if you used the apply() only to return a boolean value, so you can select your rows with it, then use a separate action to append to the selected rows:

test = df.list_1.apply(lambda c: {1, 2}.issubset(c))
df.list_2[test].apply(lambda c: c.append('test'))

Here, test holds a series of boolean False and True values, corresponding with the rows where list_1 values are a superset of {1, 2}. That series can be used to select rows in df.list_2, where you can do other operations, including appending to the list objects in each cell.
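
A runnable sketch of the two-step approach, again assuming the question's example data. The plain for loop is a substitution on my part for the second apply() call, to make the in-place mutation explicit; it should have the same effect.

```python
import pandas as pd

# DataFrame rebuilt from the question's example.
df = pd.DataFrame({
    'id': [1, 2, 3],
    'list_1': [[1, 2, 3], [1, 2, 4], [1, 3, 4]],
    'list_2': [[], [], []],
})

# Step 1: a boolean Series, True where list_1 contains both 1 and 2.
test = df['list_1'].apply(lambda c: {1, 2}.issubset(c))

# Step 2: mutate only the list objects in the selected rows.
for cell in df.loc[test, 'list_2']:
    cell.append('test')

print(test.tolist())
print(df['list_2'].tolist())
```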

It's a whole lot more readable, and easier to change if you wanted to switch from appending to nested list objects to just assigning a different value; e.g. setting the df.outcome column to 'tested' when the subset test passes, or to 'failed' if not, using numpy.where():

test = df.list_1.apply(lambda c: {1, 2}.issubset(c))
df['outcome'] = np.where(test, 'tested', 'failed')
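
A self-contained version of that idea, assuming the question's example data and that the outcome column does not exist yet. The sketch uses bracket assignment because attribute-style assignment cannot create a new column.

```python
import numpy as np
import pandas as pd

# DataFrame rebuilt from the question's example (list_2 is not needed here).
df = pd.DataFrame({
    'id': [1, 2, 3],
    'list_1': [[1, 2, 3], [1, 2, 4], [1, 3, 4]],
})

# Boolean Series: True where list_1 contains both 1 and 2.
test = df['list_1'].apply(lambda c: {1, 2}.issubset(c))

# np.where picks 'tested' where the mask is True and 'failed' elsewhere.
df['outcome'] = np.where(test, 'tested', 'failed')

print(df['outcome'].tolist())
```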

Paul M. · Accepted Answer · 2019-09-13 20:19:10Z

This:

['test' if all(elem in [1,2] for elem in row['list_1'])]

Is an incomplete ternary expression. The syntax should be:

a if condition else b

you have

a if condition

Mason Caiby · Accepted Answer · 2019-09-13 20:31:11Z

This should get you your desired result. It looks like your list comprehension is missing some brackets.

mask = [key for key, value in df['list_1'].items() if 1 in value and 2 in value]
df.loc[mask, 'list_2'].apply(lambda x: x.append('test'))

edited Sep 13, 2019 at 20:31

answered Sep 13, 2019 at 20:12
