1

I'm trying to build a lambda function within an apply that:

  • checks whether an input list is contained in another list
  • if it is, appends a value to another column in the same dataframe

Example:

initial dataframe:

id list_1 list_2
1 [1,2,3] []
2 [1,2,4] []
3 [1,3,4] []

Now, I want to check if [1,2] is present within list_1. If it is, append 'test' to the list in list_2.

final_dataframe:

id list_1 list_2
1 [1,2,3] ['test']
2 [1,2,4] ['test']
3 [1,3,4] []

Here is my first attempt:

df.apply(lambda row: row['list_2'].append(['test' if all(elem in [1,2] for elem in row['list_1'])]), axis = 1)

But I'm getting an invalid syntax error. I feel like this is probably pretty simple, but I can't figure out what the issue is.

Here is the full error:

  File "<ipython-input-126-f45b74393598>", line 3
    movies_test.apply(lambda row: row['displayable'].append(['comedy_drama' if all(elem in ['comedy','drama'] for elem in row['test'])]), axis = 1)
                                                                                                                                      ^
SyntaxError: invalid syntax
  • Invalid syntax errors usually point you exactly where you need to fix something. Share the full error traceback so we can see it. Either way, this function is way too long for a lambda and is totally unreadable. Code shouldn't fit in one line at all costs, you'll have a better time debugging if you can actually read the code. Commented Sep 13, 2019 at 20:15
  • That's fair, it seemed simple enough while writing it but I guess it got a little out of hand. Would you recommend I build a separate function and pass it to my apply call? Commented Sep 13, 2019 at 20:17
  • I would, yes. Also, the reason you're getting the error is that you have a one-line conditional expression without an else; you can't do that. Commented Sep 13, 2019 at 20:18

3 Answers

4

You are using an expression (lambdas can only hold an expression), and so must use a conditional expression. Expressions always produce an object, so conditional expressions must have a true expression and a false expression, in the form

<true> if <condition> else <false>

You left out the else <false> part.
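A minimal sketch of that rule in plain Python, without pandas. The label() helper and the sample lists are illustrative only and are not part of the question:

```python
# A conditional expression needs both branches: <true> if <condition> else <false>.
def label(values):
    # Evaluates to 'test' when both 1 and 2 are present, otherwise to None.
    return 'test' if 1 in values and 2 in values else None

print(label([1, 2, 3]))
print(label([1, 3, 4]))

# Without the else branch the parser rejects the expression before anything runs.
try:
    compile("'test' if True", "<example>", "eval")
    parsed = True
except SyntaxError:
    parsed = False
print("parsed without else:", parsed)
```

The exact wording of the SyntaxError message depends on the Python version, so the sketch only records whether parsing succeeded.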

You are making three additional mistakes:

  • You are treating a single column as a list of lists
  • You are appending whether or not your test is true
  • You are appending a list to a list, where you wanted to append a string instead.

Your test should just see if both elements are in the list; you can use set operations; you want to know if {1, 2} is a subset of the column values:

{1, 2}.issubset(row['list_1'])

and only then append something to the other column, so you want to execute row['list_2'].append() only if the above is true. And you want to append a single string, so call .append('test').
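To see why the original test does not do what was intended, here is a small sketch on a single plain list (the variable names are illustrative). The all(...) expression from the question asks whether every element of list_1 is in [1, 2], which is the reverse of the check that was wanted:

```python
list_1 = [1, 2, 3]

# Test from the question: is every element of list_1 in [1, 2]?
# This fails for [1, 2, 3] because 3 is not in [1, 2].
original = all(elem in [1, 2] for elem in list_1)

# Intended test: is every element of {1, 2} in list_1?
intended = {1, 2}.issubset(list_1)

target = []
if intended:
    # Append the string itself, not a one-element list such as ['test'].
    target.append('test')

print(original, intended, target)
```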

For the else part, you could return None, so that no append call is made:

row['list_2'].append('test') if {1, 2}.issubset(row['list_1']) else None

or, in situ in the df.apply() call:

df.apply(lambda row: row['list_2'].append('test') if {1, 2}.issubset(row['list_1']) else None, axis=1)
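If the lambda gets hard to read, the same logic can live in a named function passed to apply(), as suggested in the comments on the question. This is only a sketch, assuming the example dataframe from the question; the function name append_if_subset and its parameters are hypothetical:

```python
import pandas as pd

def append_if_subset(row, required=frozenset({1, 2}), value='test'):
    """Append value to row['list_2'] when row['list_1'] contains every required element."""
    if required.issubset(row['list_1']):
        row['list_2'].append(value)

df = pd.DataFrame({
    'id': [1, 2, 3],
    'list_1': [[1, 2, 3], [1, 2, 4], [1, 3, 4]],
    # Build each empty list separately so the rows do not share one list object.
    'list_2': [[] for _ in range(3)],
})

# axis=1 passes each row to the function; without it, apply() works column by column.
df.apply(append_if_subset, axis=1)
print(df)
```

The function mutates the list objects stored in the cells, so the return value of apply() (a Series of None values) can be ignored.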

However, it would be better if you used the apply() only to return a boolean value, so you can select your rows with it, then use a separate action to append to the selected rows:

test = df.list_1.apply(lambda c: {1, 2}.issubset(c))
df.list_2[test].apply(lambda c: c.append('test'))

Here, test holds a series of boolean False and True values, corresponding with the rows where list_1 values are a superset of {1, 2}. That series can be used to select rows in df.list_2, where you can do other operations, including appending to the list objects in each cell.
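The boolean-mask approach can be sketched end to end as follows, assuming the example dataframe from the question. The name mask is illustrative, and a plain for loop stands in for the second apply() call, since the append is only wanted for its side effect:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'list_1': [[1, 2, 3], [1, 2, 4], [1, 3, 4]],
    # One fresh list per row; [[]] * 3 would reuse a single list three times.
    'list_2': [[] for _ in range(3)],
})

# Step 1: a boolean Series, True where list_1 contains both 1 and 2.
mask = df['list_1'].apply(lambda c: {1, 2}.issubset(c))

# Step 2: mutate only the list objects in the selected rows.
for cell in df.loc[mask, 'list_2']:
    cell.append('test')

print(df)
```

This works because the cells hold references to list objects, so appending through the selected rows changes the lists that the dataframe itself points to.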

It's a whole lot more readable, and easier to change if you wanted to switch from appending to nested list objects to just assigning a different value; e.g. setting the df.outcome column to 'tested' when the subset test passes, or to 'failed' if not, using numpy.where():

test = df.list_1.apply(lambda c: {1, 2}.issubset(c))
df['outcome'] = np.where(test, 'tested', 'failed')

1

This:

['test' if all(elem in [1,2] for elem in row['list_1'])]

Is an incomplete ternary expression. The syntax should be:

a if condition else b

you have

a if condition


0

This should get you your desired result. It looks like your list comprehension is missing some brackets.

mask = [key for key, value in df['list_1'].items() if 1 in value and 2 in value]
df.loc[mask, 'list_2'].apply(lambda x: x.append('test'))
