Making a list out of values in a DataFrame depending on values in another column

Question

I have a pandas DataFrame as shown below. The frame has many more columns, but they are not relevant to the task.

id    pos      value       sente
1     a         I           21
2     b         have        21
3     b         a           21
4     a         cat         21
5     d         !           21
1     a         My          22
2     a         cat         22
3     b         is          22
4     a         cute        22
5     d         .           22

I would like to build a list from certain columns, so that the first sentence (sente=21), and every other sentence, looks something like the list below. That is, every sentence gets its own separate list.

`[('I', 'a', '1'), ..., ('!','d','5')]`

I already have a function that does this for one sentence, but I cannot figure out how to do it for all sentences in the frame (a sentence being the rows that share the same sente value).

`class SentenceGetter(object):
    def __init__(self, data):
        self.n_sent = 1
        self.data = data
        self.empty = False

    def get_next(self):
        try:
            # Only the hard-coded sentence (sente == 21) is selected here
            s = self.data[self.data["sente"] == 21]
            self.n_sent += 1
            # Parentheses keep the multi-line return value a single tuple
            return (
                s["id"].values.tolist(),
                s["pos"].values.tolist(),
                s["value"].values.tolist(),
            )
        except Exception:
            self.empty = True
            return None, None, None

foo = SentenceGetter(df)
sent, pos, token = foo.get_next()
# "in" is a reserved word in Python, so the result needs another name
triples = list(zip(token, pos, sent))

`

As my frame is very large, constructions like this are not an option:

df.loc[((df["sente"] == df["sente"].shift(-1)) & (df["sente"] == df["sente"].shift(+1))), ["pos","value","id"]]

Any ideas?

jpp · Accepted Answer · 2018-04-29 19:09:56Z


If you are open to using the standard library, collections.defaultdict offers an O(n) solution:

from collections import defaultdict

d = defaultdict(list)

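# itertuples() yields (index, sente, value, pos, id); skip the index, key on sente
for _, num, *data in df[['sente', 'value', 'pos', 'id']].itertuples():
    d[num].append(tuple(data))  # tuple() so the entries match the result below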

Result:

defaultdict(list,
            {21: [('I', 'a', 1),
                  ('have', 'b', 2),
                  ('a', 'b', 3),
                  ('cat', 'a', 4),
                  ('!', 'd', 5)],
             22: [('My', 'a', 1),
                  ('cat', 'a', 2),
                  ('is', 'b', 3),
                  ('cute', 'a', 4),
                  ('.', 'd', 5)]})
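For reference, here is a self-contained sketch of the same single-pass grouping that can be run as is. The sample frame is rebuilt from the table in the question, and the names `sentences` and `all_sentences` are only illustrative. It passes `index=False, name=None` to `itertuples` so that each row arrives as a plain tuple without the index, which is a small variation on the loop above rather than the answer's exact code.

```python
from collections import defaultdict

import pandas as pd

# Sample frame rebuilt from the table in the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "pos": ["a", "b", "b", "a", "d", "a", "a", "b", "a", "d"],
    "value": ["I", "have", "a", "cat", "!", "My", "cat", "is", "cute", "."],
    "sente": [21] * 5 + [22] * 5,
})

# One pass over the rows: each (value, pos, id) triple is appended
# to the list that belongs to its sentence number.
sentences = defaultdict(list)
rows = df[["sente", "value", "pos", "id"]].itertuples(index=False, name=None)
for sente, value, pos, id_ in rows:
    sentences[sente].append((value, pos, id_))

# If only the per-sentence lists are needed, drop the keys.
all_sentences = list(sentences.values())
```

Because a dict keeps insertion order, `all_sentences` lists the sentences in the order in which their first row appears in the frame.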

edited Apr 29, 2018 at 19:09

answered Apr 29, 2018 at 18:02


YOLO · Accepted Answer · 2018-04-29 18:07:34Z


You can also use groupby and apply functions.

Method 1: returns a DataFrame

(df
 .groupby('sente')
 .apply(lambda g: [tuple(x) for x in g[['value', 'pos', 'id']].values])  # g is one sentence's rows
 .reset_index()
 .rename(columns={0: 'values'}))

   sente                                             values
0     21  [(I, a, 1), (have, b, 2), (a, b, 3), (cat, a, ...
1     22  [(My, a, 1), (cat, a, 2), (is, b, 3), (cute, a...

Method 2: returns a dictionary

(df
 .groupby('sente')
 .apply(lambda g: [tuple(x) for x in g[['value', 'pos', 'id']].values])
 .to_dict())  # the result is a Series indexed by sente, so to_dict() can be called directly
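As a variation on the dictionary result, the groups can also be iterated directly instead of going through `apply`. The sketch below is an alternative to the method above, not the answer's own code; the sample frame is rebuilt from the question and the name `by_sentence` is only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "pos": ["a", "b", "b", "a", "d", "a", "a", "b", "a", "d"],
    "value": ["I", "have", "a", "cat", "!", "My", "cat", "is", "cute", "."],
    "sente": [21] * 5 + [22] * 5,
})

# Iterating a groupby yields (key, sub-frame) pairs; itertuples with
# index=False and name=None turns each row into a plain tuple.
by_sentence = {
    key: list(group[["value", "pos", "id"]].itertuples(index=False, name=None))
    for key, group in df.groupby("sente")
}
```

Here `by_sentence[21]` holds the triples of the first sentence, and `list(by_sentence.values())` gives one list per sentence.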

answered Apr 29, 2018 at 18:07


wwii · Accepted Answer · 2018-04-29 18:56:28Z

Essentially the same as @YOLO's answer:

def f(df):
    s = df[['value','pos','id']].apply(tuple, axis=1)
    return s.tolist()
g = df.groupby('sente')
q = g.apply(f)

>>> type(q)
<class 'pandas.core.series.Series'>
>>> q[21]
[('I', 'a', 1), ('have', 'b', 2), ('a', 'b', 3), ('cat', 'a', 4), ('!', 'd', 5)]
>>> q[22]
[('My', 'a', 1), ('cat', 'a', 2), ('is', 'b', 3), ('cute', 'a', 4), ('.', 'd', 5)]

>>> q.tolist()
[[('I', 'a', 1), ('have', 'b', 2), ('a', 'b', 3), ('cat', 'a', 4), ('!', 'd', 5)], [('My', 'a', 1), ('cat', 'a', 2), ('is', 'b', 3), ('cute', 'a', 4), ('.', 'd', 5)]]
>>>
>>> q.to_dict()
{21: [('I', 'a', 1), ('have', 'b', 2), ('a', 'b', 3), ('cat', 'a', 4), ('!', 'd', 5)], 22: [('My', 'a', 1), ('cat', 'a', 2), ('is', 'b', 3), ('cute', 'a', 4), ('.', 'd', 5)]}
>>>
