return various variables in nested function based on conditional statement

Question

I have an original dataframe from which I create a modified dataframe. In some cases I want to select a subset of the data rather than use the dataframe as a whole, and I want all of this handled inside a single function. Is it possible to return different variables based on a conditional, or would that be incorrect?

The function below works fine when I run:

modified_df = modify_data(protein_embeddings, protein_df, subset = False)

but when I try executing:

gal_subset_first, gal_subset_second = modify_data(protein_embeddings, protein_df, subset = True)

I get the error:

ValueError: too many values to unpack (expected 2)

The Function

def modify_data(embeddings, df, subset = False):

    """
    Modifies Original Dataframe with respective embeddings

    :return: Final Dataframe to be used in data split and modelling
    """
    #Original_DF
    OD_df = df.copy(deep = True)
    OD_df = df.reset_index()
    OD_df.loc[:,'task'] = 'stability'
    
    #Embeddings Df
    embeddings_df = pd.DataFrame(data=embeddings)
    embeddings_df = embeddings_df.reset_index()   
    
    embedded_df = pd.merge(embeddings_df, OD_df, on='index')
    embedded_df = embedded_df.drop(['index', 'sequence', 'temperature'], axis = 1)
    
    def subsetting(embedded_df, sample_no, row_no):
        "Select a Subset of rows desired from original dataframe"
        #Selecting subset
        embedded_df = embedded_df.sample(n = sample_no)
        subset_first = gal_subset[:row_no]
        subset_second = gal_subset[row_no:]
    
        return subset_first, subset_second

    if subset == True:
        gal_subset_first, gal_subset_second = subsetting(embedded_df, sample_no = 2000, row_no = 1000)
    else:
        pass  
         
    
    return embedded_df

It helps if you add an example data frame and a full working minimal code example (including imports etc.), so people can try out your code. But from what I can tell, you're never returning gal_subset_first, gal_subset_second after calling subsetting in your if statement. Try replacing that line with return subsetting(embedded_df, sample_no = 2000, row_no = 1000).
– char, Commented Sep 29, 2020 at 12:50

Yann · Accepted Answer · 2020-09-29 12:59:58Z

Your function returns a data frame, which is iterable. When you assign the result to one variable, the whole data frame is bound to that variable. However, if you assign the result to multiple variables, Python iterates over the returned value (for a data frame, that means its column labels) and raises an error if the number of items does not match the number of variables.

Compare the code samples:

def f():
    return (1,2,3)

a = f()  # a is a tuple (1, 2, 3)
a, b = f()  # raises the same exception ValueError: too many values to unpack (expected 2)
a, b, c = f()  # a=1 b=2 c=3 because the number of returned values matches the number of the assigned variables.
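
The same behaviour can be seen with a data frame directly. The following is a minimal sketch, assuming pandas is installed; demo_df and its columns are hypothetical stand-ins for embedded_df, not the asker's data. It suggests why unpacking the returned frame into two names fails whenever the frame has more than two columns.

```python
import pandas as pd

# Hypothetical three-column frame standing in for embedded_df
demo_df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Iterating over a DataFrame yields its column labels, not its rows
columns_seen = list(demo_df)

# Unpacking three column labels into two names raises ValueError
error_message = None
try:
    first, second = demo_df
except ValueError as exc:
    error_message = str(exc)

# With exactly two columns the unpacking "works", but it only
# hands back the column labels, which is rarely what is wanted
x, y = demo_df[["a", "b"]]
```

So even when the counts happen to match, unpacking a data frame does not give two sub-frames; the function has to return a tuple explicitly.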


2 Comments

machine_apprentice Over a year ago

Yes, I understand this concept, but I'm just curious about how I can return more than one variable if I'm using a nested function, for example.

Yann Over a year ago

The return value of the nested function doesn't affect the outer function. When you return from subsetting, that value does not automatically become the return value of modify_data. If you would like to return the result of the nested function, write return subsetting(embedded_df, sample_no = 2000, row_no = 1000).
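
Putting the suggestion together, one possible version of the function is sketched below. It rests on several assumptions: pandas and NumPy are available, the toy protein_embeddings and protein_df are invented purely for illustration, and sample_no and row_no are exposed as keyword arguments (the original hard-codes 2000 and 1000, kept here as defaults). The nested function also refers to its own sampled frame rather than the undefined gal_subset.

```python
import numpy as np
import pandas as pd


def modify_data(embeddings, df, subset=False, sample_no=2000, row_no=1000):
    """Merge embeddings into df; optionally return two sampled subsets."""
    # reset_index returns a new frame, so no separate copy is needed
    original_df = df.reset_index()
    original_df["task"] = "stability"

    embeddings_df = pd.DataFrame(data=embeddings).reset_index()

    embedded_df = pd.merge(embeddings_df, original_df, on="index")
    embedded_df = embedded_df.drop(["index", "sequence", "temperature"], axis=1)

    def subsetting(frame, sample_no, row_no):
        """Sample sample_no rows, then split them at position row_no."""
        sampled = frame.sample(n=sample_no)
        return sampled.iloc[:row_no], sampled.iloc[row_no:]

    if subset:
        # Hand the nested function's tuple back to the caller
        return subsetting(embedded_df, sample_no=sample_no, row_no=row_no)
    return embedded_df


# Hypothetical toy data, small enough to inspect by eye
rng = np.random.default_rng(0)
protein_embeddings = rng.normal(size=(10, 3))
protein_df = pd.DataFrame(
    {"sequence": ["MKV"] * 10, "temperature": np.linspace(30, 75, 10)}
)

modified_df = modify_data(protein_embeddings, protein_df)
gal_subset_first, gal_subset_second = modify_data(
    protein_embeddings, protein_df, subset=True, sample_no=6, row_no=4
)
```

A function whose return type depends on a flag works, but callers have to remember which shape to expect; splitting it into two functions, or always returning a tuple, are alternatives worth considering.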
