
In a dataframe with two columns, I can easily create a third without a function if it is a simple numerical operation such as multiplication: df["new"] = df["one"] * df["two"].

However, what if I need to pass more than two parameters to a function, and those parameters are columns from a dataframe?

Passing one column at a time is simple using df.apply(my_func), but what if the function's definition requires three columns:

def WordLength(col1, col2, col3):
    return max(len(col1), len(col2), len(col3))

For example, a function WordLength would return the maximum word length across the three columns passed into it.

I know this doesn't work, but I imagine something like the following to write the result of a three-parameter function into a dataframe column:

df["word_length"]= df.apply(WordLength, [[param1,param2,param3]])

Update: Jon, when trying to use your method of passing in three parameters (values from three dataframe columns for a given row), I am getting the following error:

def get(name,start_date,end_date):
    try:
        df = ...

response = df.apply(get, axis=1, args=('name', 'date', 'today')) 

Error relating to arguments - I don't understand why it mentions 4 arguments when I have passed in three and the function only requires three arguments...

Error:

TypeError: ('getprice() takes exactly 3 arguments (4 given)', u'occurred at index 0')

1
  • Please try to add an example of input/output to your question to make it easier to address. Commented Jul 11, 2016 at 8:13

2 Answers

3

I think you need a lambda function in your apply:

def WordLength(words):
    # words is one row (a Series); .iloc selects by position
    return max(len(words.iloc[0]), len(words.iloc[1]), len(words.iloc[2]))

df['wordlength'] = df[['col1', 'col2', 'col3']].apply(lambda x: WordLength(x), axis=1)

Output:

    col1            col2        col3                wordlength
0   word1           word10      wordover9000        12
1   anotherword     wooooord    test                11
2   yetanotherword  letter      Ihavenootheridea    16
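
If WordLength has to keep the three-parameter signature from the question, one option is to let the lambda pull each column out of the row by name. This is only a minimal sketch; the sample data is copied from the output table above and is purely illustrative:

```python
import pandas as pd

def WordLength(col1, col2, col3):
    # Longest of the three words passed in
    return max(len(col1), len(col2), len(col3))

# Illustrative sample data matching the output table
df = pd.DataFrame({
    "col1": ["word1", "anotherword", "yetanotherword"],
    "col2": ["word10", "wooooord", "letter"],
    "col3": ["wordover9000", "test", "Ihavenootheridea"],
})

# axis=1 hands each row to the lambda; the lambda decides which column
# feeds which parameter, so the argument order is explicit
df["wordlength"] = df.apply(
    lambda row: WordLength(row["col1"], row["col2"], row["col3"]),
    axis=1,
)
```

Because the columns are named inside the lambda, reordering them there is all it takes to change which value lands in which parameter.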

1

Unless you really want a function to do this, you can use DataFrame operations, e.g.:

df[['col1', 'col2', 'col3']].applymap(len).max(axis=1)
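
Note that DataFrame.applymap is deprecated in recent pandas releases (2.1 and later) in favour of DataFrame.map. One way to express the same idea without depending on either name is to take the string lengths column by column with the .str accessor. A rough sketch, assuming every value in the three columns is a string (the sample frame is made up for illustration):

```python
import pandas as pd

# Illustrative sample data; not from the original question
df = pd.DataFrame({
    "col1": ["word1", "anotherword", "yetanotherword"],
    "col2": ["word10", "wooooord", "letter"],
    "col3": ["wordover9000", "test", "Ihavenootheridea"],
})

cols = ["col1", "col2", "col3"]

# With the default axis=0, apply passes each column as a Series,
# so .str.len() computes all lengths for that column at once
lengths = df[cols].apply(lambda s: s.str.len())

# Row-wise maximum across the three length columns
df["wordlength"] = lengths.max(axis=1)
```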

Alternatively, you can use the args argument of apply to pass in the names of the columns to be processed, and make the target function take a variable number of arguments for unpacking, e.g.:

def max_word_length(row, *cols):
    return row[list(cols)].map(len).max()

# Make sure `axis=1` so rows are passed in and we can access columns
df.apply(max_word_length, axis=1, args=('col1', 'col2', 'col3'))
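
With axis=1, apply calls the function with the row as its first positional argument and then appends everything in args, in order. That is presumably why a function declared with three parameters reports four arguments in the question's update. Below is a sketch of a function that accepts the row first and then looks values up in a fixed order; build_url, the column contents and the example.com address are all hypothetical:

```python
import pandas as pd

def build_url(row, name_col, start_col, end_col):
    # row arrives first; the remaining parameters are the strings
    # given in args, matched by position
    return "https://example.com/{}?start={}&end={}".format(
        row[name_col], row[start_col], row[end_col]
    )

# Hypothetical data using the column names from the question's update
df = pd.DataFrame({
    "name": ["abc", "xyz"],
    "date": ["2016-01-01", "2016-02-01"],
    "today": ["2016-07-11", "2016-07-11"],
})

# Four arguments reach build_url: the row plus the three names in args
df["url"] = df.apply(build_url, axis=1, args=("name", "date", "today"))
```

Swapping two names in args swaps the corresponding values, so the order of args is what fixes the order of the parameters.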

4 Comments

Yes, I'd really like to pass multiple columns from a pandas dataframe into a function and output the result to one dataframe column. That is why I specifically asked the question. The example doesn't matter; what matters is being able to pass multiple columns to the function.
@yoshiserry okay - gimme a few minutes
@yoshiserry I think that should cover the bases and is fairly similar to the syntax you want without overcomplicating it.
Jon, if you use applymap and pass col1, col2, and col3 to a function, how do you know which column ends up as which parameter in the function? For example, I want to pass three items to a function which creates a URL out of them, so the parameters need to be in a specific order for the function to work. Similarly, if you had three numbers in a row of a dataframe and wanted to do (item1 + item2) * item3, they would also need to be in a specific order. How do you specify the order?
