I implemented a function that finds the first occurrence of a value in a pandas DataFrame, but I feel the implementation is kind of ugly. Would you have a nicer way to implement it?

mots is an array of strings.

# Probably the worst implementation in the world...
def find_singular_value(self, mots):
    # Boolean table: True wherever a cell's value is one of mots
    bool_table = self.document.isin(mots)
    for i in range(bool_table.shape[0]):
        for j in range(bool_table.shape[1]):
            if bool_table.iat[i, j]:
                # Return the cell just to the right of the first match
                return self.document.iat[i, j + 1]

2 Answers

Here's a solution for getting the j+1 value. It uses DataFrame.unstack and Series.shift:

df = self.document.unstack()
vals = df[df.isin(mots).shift().fillna(False)]

vals will contain all of the j+1 values in self.document. You can then select the first one as in my previous answer. Hopefully this works for you.
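As a small worked example of the same shift idea, here is a minimal sketch on a made-up DataFrame (the data and the names `right_of_match` and `vals` are assumptions for illustration, not taken from the question). Note that unstack() lays the values out column by column, so shift() on the unstacked Series steps to the next row; the sketch below instead shifts the mask along axis=1 so that it lands on the next column (j+1) of the same row.

```python
import pandas as pd

# Hypothetical data standing in for self.document
df = pd.DataFrame({
    "a": ["nom", "x"],
    "b": ["Alice", "nom"],
    "c": ["y", "Bob"],
})
mots = ["nom"]

# True wherever a cell matches, then moved one column to the right,
# so True now marks the cell just after each match
right_of_match = df.isin(mots).shift(1, axis=1, fill_value=False)

# Boolean indexing on the underlying 2-D array returns the selected
# cells in row-major order (row by row, left to right)
vals = df.to_numpy()[right_of_match.to_numpy(dtype=bool)]

first_value = vals[0] if len(vals) else None
print(first_value)
```

A match in the last column has no right-hand neighbour; with fill_value=False it is simply ignored rather than wrapping around to the next row.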

1 Comment

Thanks! You helped me pick up a lot of new pandas keywords :)

This one-liner should give you what you need.

self.document[self.document.isin(mots)].melt()["value"].dropna().values[0]

It applies your isin mask to the original df, then finds the first non-NaN value using pd.melt and dropna.

Here's a simple breakdown:

>>> df = pd.DataFrame({"a":[1,2,3],"b":[4,5,6],"c":[7,8,9]})
>>> df.isin([4,6])
       a      b      c
0  False   True  False
1  False  False  False
2  False   True  False
>>> df[df.isin([4,6])]
    a    b   c
0 NaN  4.0 NaN
1 NaN  NaN NaN
2 NaN  6.0 NaN
>>> df[df.isin([4,6])].melt()
  variable  value
0        a    NaN
1        a    NaN
2        a    NaN
3        b    4.0
4        b    NaN
5        b    6.0
6        c    NaN
7        c    NaN
8        c    NaN
>>> df[df.isin([4,6])].melt()["value"]
0    NaN
1    NaN
2    NaN
3    4.0
4    NaN
5    6.0
6    NaN
7    NaN
8    NaN
Name: value, dtype: float64
>>> df[df.isin([4,6])].melt()["value"].dropna()
3    4.0
5    6.0
Name: value, dtype: float64
>>> df[df.isin([4,6])].melt()["value"].dropna().values
array([ 4.,  6.])
>>> df[df.isin([4,6])].melt()["value"].dropna().values[0]
4.0

6 Comments

Hmm, this doesn't seem to output anything (empty value). I'll investigate this later with pd.melt. Thanks for your answer anyway!
Is "value" supposed to be a string like this? Shouldn't it be True?
Yes, "value" should be a string. Using melt transforms the dataframe into two columns: 'variable' and 'value'. Then I take the 'value' series, drop the NaN values and return the first result.
I've added a breakdown of the operations to my answer. Does this help?
Oh yes, that looks good, but I'm actually taking the cell just after the one I found (j+1)! I can't find a way to do that with your method... Would you have a way to get the coordinates of the cell found with isin?
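One possible way to get those coordinates, as a minimal sketch rather than a definitive answer: run numpy's nonzero on the isin mask, which returns the row and column positions of every True cell in row-major order. The helper names `first_match_position` and `value_after_first_match` and the sample data are hypothetical, and returning None when the match is in the last column is an assumption (the original loop would raise an IndexError there).

```python
import numpy as np
import pandas as pd

def first_match_position(df, mots):
    """Return (row, col) integer positions of the first match, or None."""
    # np.nonzero scans the 2-D mask row by row, left to right
    rows, cols = np.nonzero(df.isin(mots).to_numpy())
    if rows.size == 0:
        return None
    return int(rows[0]), int(cols[0])

def value_after_first_match(df, mots):
    """Return the cell just to the right of the first match (j+1), or None."""
    pos = first_match_position(df, mots)
    # Assumption: no match, or a match in the last column, yields None
    if pos is None or pos[1] + 1 >= df.shape[1]:
        return None
    i, j = pos
    return df.iat[i, j + 1]

# Hypothetical data standing in for self.document
df = pd.DataFrame({
    "a": ["x", "nom"],
    "b": ["nom", "Alice"],
    "c": ["Bob", "z"],
})
print(first_match_position(df, ["nom"]))
print(value_after_first_match(df, ["nom"]))
```

Because the positions come back as plain integers, the same pair can be reused with df.iat or df.iloc to read any neighbouring cell, not only the one to the right.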