I have the following data structure:

[image: input data table]

The columns s and d indicate the transition of the object in column x. I want to build a transition string for each object in column x, e.g. as a new column like this:

[image: expected output with the new column]

Is there a lean way to do it using pandas, without using too many loops?

This was the code I tried:

obj = df['x'].tolist()
rows = []

for o in obj:
    locs = df[df['x'] == o]['s'].tolist()
    str_locs = '->'.join(str(l) for l in locs)
    print(str_locs)
    d = dict()
    d['x'] = o
    d['new'] = str_locs
    rows.append(d)

tmp = pd.DataFrame(rows)

This gives the output tmp as:

    x   new
    a   1->2->4->8
    a   1->2->4->8
    a   1->2->4->8
    a   1->2->4->8
    b   1->2
    b   1->2
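
The repeated rows come from looping over df['x'].tolist(), which visits each object once per row. A minimal sketch of the same loop over the distinct values instead; the frame below is made-up stand-in data (an assumption, since the original table is only available as an image), and like the attempt above it only reads column s:

```python
import pandas as pd

# Made-up stand-in data (assumption): the original table was posted as an image
df = pd.DataFrame({"x": ["a", "a", "a", "a", "b", "b"],
                   "s": [1, 2, 4, 8, 1, 2]})

rows = []
# unique() yields each object once, in order of first appearance
for o in df["x"].unique():
    locs = df.loc[df["x"] == o, "s"]
    rows.append({"x": o, "new": "->".join(str(l) for l in locs)})

# One row per object instead of one row per input row
tmp = pd.DataFrame(rows)
print(tmp)
```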
  • What have you tried so far? Can you post some code so that we can reproduce the issue? Commented Apr 29, 2021 at 12:05
  • Hi, I've got the optimal solution from @Hamza usman ghani, but I'll add the code I had tried anyways in the post. Commented Apr 29, 2021 at 13:49

1 Answer

Example df:

import pandas as pd

df = pd.DataFrame({"x": ["a", "a", "a", "a", "b", "b"], "s": [1, 2, 4, 8, 5, 11], "d": [2, 4, 8, 9, 11, 12]})

print(df)

       x   s   d
    0  a   1   2
    1  a   2   4
    2  a   4   8
    3  a   8   9
    4  b   5  11
    5  b  11  12

The following code generates a transition string for every object in column x:

  • Group by column x and collect the [s, d] pairs of each object as a list of lists.
  • Flatten each list of lists into a single list, keeping the row order.
  • Remove consecutive duplicates from the flattened list using itertools.groupby.
  • Join the remaining items with -> to make a single string.
  • Finally, map the resulting Series onto column x of the input df.

from itertools import groupby

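# Collect the [s, d] pairs of each object in x as a list of lists
grp = df.groupby('x')[['s', 'd']].apply(lambda g: g.values.tolist())
# Flatten each list of pairs into one list of strings
grp = grp.apply(lambda pairs: [str(item) for pair in pairs for item in pair])
# Drop consecutive duplicates and join the remaining items with "->"
sr = grp.apply(lambda items: "->".join(key for key, _ in groupby(items)))
# Map each object's transition string back onto the rows of df
df["new"] = df["x"].map(sr)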
print(df)

       x   s   d            new
    0  a   1   2  1->2->4->8->9
    1  a   2   4  1->2->4->8->9
    2  a   4   8  1->2->4->8->9
    3  a   8   9  1->2->4->8->9
    4  b   5  11      5->11->12
    5  b  11  12      5->11->12
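
The flatten, de-duplicate and join steps can also be folded into a single helper, which may be easier to read. This is a sketch of one possible variant rather than the code above: the helper name transition_string is made up for illustration, and it assumes the rows of each object are already in transition order.

```python
from itertools import groupby

import pandas as pd


def transition_string(group):
    """Hypothetical helper: build e.g. '1->2->4->8->9' from one object's s/d pairs."""
    # Flatten the rows [[s1, d1], [s2, d2], ...] into s1, d1, s2, d2, ...
    flat = group[["s", "d"]].to_numpy().ravel()
    # itertools.groupby yields one key per run of equal consecutive values
    return "->".join(str(key) for key, _ in groupby(flat))


df = pd.DataFrame({"x": ["a", "a", "a", "a", "b", "b"],
                   "s": [1, 2, 4, 8, 5, 11],
                   "d": [2, 4, 8, 9, 11, 12]})

# One transition string per object, indexed by the values of x
sr = df.groupby("x")[["s", "d"]].apply(transition_string)
# Broadcast each object's string back onto its rows
df["new"] = df["x"].map(sr)
print(df)
```

Note that itertools.groupby only collapses adjacent repeats, so a value that reappears later in the chain (for example 1->2->1) is kept.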

2 Comments

Thanks! That works really great! (Only a small thing I noticed, in the second last line, it should be .map(grp) instead of sr, but probably just a typo). Thanks again!
My pleasure and Yes, it was a typo while optimizing the code. I've edited the code :)
