I have an original dataframe from which I create a modified dataframe. In some cases, however, I want to select a subset of the data rather than use the dataframe as a whole, and I want all of this to happen inside a single function. Is it possible to return different variables based on a conditional, or would that be incorrect?

The function below works fine when I run:

modified_df = modify_data(protein_embeddings, protein_df, subset = False)

but when I try executing:

gal_subset_first, gal_subset_second = modify_data(protein_embeddings, protein_df, subset = True)

I get the error:

ValueError: too many values to unpack (expected 2)

The Function

def modify_data(embeddings, df, subset = False):

    """
    Modifies the original dataframe with its respective embeddings

    :return: Final Dataframe to be used in data split and modelling
    """
    #Original_DF
    OD_df = df.copy(deep = True)
    OD_df = df.reset_index()
    OD_df.loc[:,'task'] = 'stability'
    
    #Embeddings Df
    embeddings_df = pd.DataFrame(data=embeddings)
    embeddings_df = embeddings_df.reset_index()   
    
    embedded_df = pd.merge(embeddings_df, OD_df, on='index')
    embedded_df = embedded_df.drop(['index', 'sequence', 'temperature'], axis = 1)
    
    def subsetting(embedded_df, sample_no, row_no):
        "Select a Subset of rows desired from original dataframe"
        #Selecting subset
        gal_subset = embedded_df.sample(n = sample_no)
        subset_first = gal_subset[:row_no]
        subset_second = gal_subset[row_no:]
    
        return subset_first, subset_second

    if subset == True:
        gal_subset_first, gal_subset_second = subsetting(embedded_df, sample_no = 2000, row_no = 1000)
    else:
        pass  
         
    
    return embedded_df
  • It helps if you add an example data frame and a full working minimal code example (including imports etc), so people can try out your code. But from what I can tell you're never returning gal_subset_first, gal_subset_second after calling subsetting in your if statement. Try replacing that line with return subsetting(embedded_df, sample_no = 2000, row_no = 1000). Commented Sep 29, 2020 at 12:50

1 Answer

Your function returns a single data frame, which is itself iterable. When you assign the result to one variable, the whole data frame is bound to that variable. However, if you assign the result to multiple variables, Python iterates over the returned value (for a data frame, that means its column labels) and raises a ValueError unless the number of items matches the number of variables.

Compare the code samples:

def f():
    return (1,2,3)

a = f()  # a is a tuple (1, 2, 3)
a, b = f()  # raises the same exception ValueError: too many values to unpack (expected 2)
a, b, c = f()  # a=1 b=2 c=3 because the number of returned values matches the number of the assigned variables.
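
The same thing happens with a data frame. Here is a minimal sketch using a made-up three-column frame; the column names are placeholders, not taken from the question:

```python
import pandas as pd

# Hypothetical frame standing in for the question's embedded_df
df = pd.DataFrame({
    "emb_0": [0.1, 0.2],
    "emb_1": [0.3, 0.4],
    "task": ["stability", "stability"],
})

# Iterating over a DataFrame yields its column labels, not its rows
labels = list(df)

try:
    first, second = df  # three labels, but only two target names
except ValueError as exc:
    message = str(exc)
```

With exactly two columns the unpacking would succeed, but the two names would hold column labels rather than two data frames, which is probably not what you want either.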

2 Comments

Yes, I understand this concept, but I'm curious about how I can return more than one variable if I'm using a nested function, for example.
The return value of the nested function doesn't affect the main function. When you return from subsetting, that value does not automatically become the return value of modify_data. If you would like to return the result of the nested function, write return subsetting(embedded_df, sample_no = 2000, row_no = 1000).
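
Putting that suggestion together, one possible shape for the function is sketched below. It assumes the embeddings are array-like with one row per row of df, and that df has a default integer index plus the sequence and temperature columns from the question. The sample_no and row_no parameters on modify_data, and the toy data at the bottom, are additions made up for illustration.

```python
import numpy as np
import pandas as pd


def subsetting(embedded_df, sample_no, row_no):
    """Draw sample_no random rows and split them at position row_no."""
    sampled = embedded_df.sample(n=sample_no)
    return sampled.iloc[:row_no], sampled.iloc[row_no:]


def modify_data(embeddings, df, subset=False, sample_no=2000, row_no=1000):
    """Merge embeddings into df; optionally return two random subsets."""
    od_df = df.copy(deep=True).reset_index()
    od_df["task"] = "stability"

    embeddings_df = pd.DataFrame(data=embeddings).reset_index()

    embedded_df = pd.merge(embeddings_df, od_df, on="index")
    embedded_df = embedded_df.drop(["index", "sequence", "temperature"], axis=1)

    if subset:
        # Pass the helper's tuple straight back to the caller
        return subsetting(embedded_df, sample_no, row_no)
    return embedded_df


# Toy data, made up for illustration only
protein_df = pd.DataFrame({
    "sequence": ["AAA", "CCC", "GGG", "TTT"],
    "temperature": [50.0, 60.0, 70.0, 80.0],
})
protein_embeddings = np.arange(8).reshape(4, 2)

modified_df = modify_data(protein_embeddings, protein_df)
gal_subset_first, gal_subset_second = modify_data(
    protein_embeddings, protein_df, subset=True, sample_no=4, row_no=2
)
```

Returning a tuple in one branch and a single data frame in the other is legal Python, but the caller has to know which shape to expect. If that becomes awkward, two separate functions may be easier to work with.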
