I have a pandas DataFrame as shown below. The frame has many more columns, but they are not relevant to the task.
id pos value sente
1 a I 21
2 b have 21
3 b a 21
4 a cat 21
5 d ! 21
1 a My 22
2 a cat 22
3 b is 22
4 a cute 22
5 d . 22
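To make the example reproducible, here is a minimal sketch that rebuilds the sample frame above. It contains only the four columns shown, and I am assuming `id` and `sente` are stored as integers, which may not match the real data.

```python
import pandas as pd

# Sample data copied from the table above; the real frame has more columns.
# Assumption: "id" and "sente" are integer columns.
df = pd.DataFrame({
    "id":    [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "pos":   ["a", "b", "b", "a", "d", "a", "a", "b", "a", "d"],
    "value": ["I", "have", "a", "cat", "!", "My", "cat", "is", "cute", "."],
    "sente": [21] * 5 + [22] * 5,
})
```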
I would like to build a list from certain columns so that the first sentence (sente=21), and every other sentence, looks something like the following. In other words, every sentence should get its own entry.
`[('I', 'a', '1'), ..., ('!','d','5')]`
I already have a function that does this for one sentence, but I cannot figure out how to do it for all sentences (rows that share the same sente value) in the frame.
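`class SentenceGetter(object):
    def __init__(self, data):
        self.n_sent = 1
        self.data = data
        self.empty = False

    def get_next(self):
        try:
            # Select the rows of a single sentence (hard-coded to sente == 21).
            s = self.data[self.data["sente"] == 21]
            self.n_sent += 1
            # The parentheses keep the three lists in one return statement.
            return (
                s["id"].values.tolist(),
                s["pos"].values.tolist(),
                s["value"].values.tolist(),
            )
        except KeyError:
            # A required column is missing from the frame.
            self.empty = True
            return None, None, None

foo = SentenceGetter(df)
sent, pos, token = foo.get_next()
# "in" is a reserved word in Python, so the result needs another name.
sentence = list(zip(token, pos, sent))
`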
As my frame is very large, constructions like this are not an option:
`df.loc[((df["sente"] == df["sente"].shift(-1)) & (df["sente"] == df["sente"].shift(+1))), ["pos","value","id"]]`
Any ideas?
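For reference, this is a rough sketch of the result I am after, written with `groupby` on `sente`. I have not verified that it scales to the full frame. The helper name `sentences_from_frame` is made up for illustration, and the small frame is just the sample data from above with `id` assumed to be an integer column.

```python
import pandas as pd

# Sample data from the question; the real frame is much larger.
df = pd.DataFrame({
    "id":    [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "pos":   ["a", "b", "b", "a", "d", "a", "a", "b", "a", "d"],
    "value": ["I", "have", "a", "cat", "!", "My", "cat", "is", "cute", "."],
    "sente": [21] * 5 + [22] * 5,
})


def sentences_from_frame(frame):
    """Return one list of (value, pos, id) tuples per distinct sente value.

    Hypothetical helper, not part of pandas. The ids are cast to str so the
    tuples match the desired output ('I', 'a', '1').
    """
    return [
        list(zip(group["value"], group["pos"], group["id"].astype(str)))
        # sort=False keeps the sentences in order of first appearance.
        for _, group in frame.groupby("sente", sort=False)
    ]


sentences = sentences_from_frame(df)
```

Here `sentences[0]` would hold the tuples for sente=21 and `sentences[1]` those for sente=22, so each sentence has its own entry.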