
I have a table in a pandas DataFrame df:

 id_x             id_y
  a                 b
  b                 c
  c                 d
  d                 a
  b                 a
...and so on, around 1000 rows in total.

I want to find all the combinations for each id_x with id_y, something like chaining.

i.e. a has the combinations a-b, b-c, c-d; similarly, b has the combinations b-c, c-d, d-a, and a-b should also count as a combination for b (a-b = b-a).
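Treating a-b and b-a as the same combination amounts to treating each row as an undirected pair. A minimal sketch, assuming the five sample rows stand in for the full table: sorting the two ids within each row makes a-b and b-a identical, so the direction-only duplicates can be dropped with drop_duplicates.

```python
import numpy as np
import pandas as pd

# Sample rows from the question (assumed; the real table has ~1000 rows)
df = pd.DataFrame({'id_x': ['a', 'b', 'c', 'd', 'b'],
                   'id_y': ['b', 'c', 'd', 'a', 'a']})

# Sort the two ids inside each row so that a-b and b-a both become (a, b)
pairs = pd.DataFrame(np.sort(df[['id_x', 'id_y']].to_numpy(), axis=1),
                     columns=['id_x', 'id_y'])

# Drop pairs that only differed in direction
unique_pairs = pairs.drop_duplicates().reset_index(drop=True)
print(unique_pairs)
```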

I then want to create a DataFrame df2 like this:

id   combinations  count
a          b,c,d     3
b          c,d,a     3
c          d,a,b     3
d          a,b,c     3
and so on (one row per distinct product_id).
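With a-b = b-a, "chaining" means collecting every id connected to a given id through any sequence of rows, i.e. its connected component in an undirected graph. A minimal sketch, assuming that interpretation and using the sample rows only: a breadth-first search per id builds df2 with the id, combinations and count columns. The seen set keeps loops such as a-b-c-d-a from being followed forever.

```python
from collections import defaultdict, deque

import pandas as pd

# Sample rows from the question (assumed to stand in for the full table)
df = pd.DataFrame({'id_x': ['a', 'b', 'c', 'd', 'b'],
                   'id_y': ['b', 'c', 'd', 'a', 'a']})

# Undirected adjacency: a-b = b-a, so each row links both ids to each other
neighbours = defaultdict(set)
for x, y in zip(df['id_x'], df['id_y']):
    neighbours[x].add(y)
    neighbours[y].add(x)


def chained(start):
    """Breadth-first search: all ids reachable from start by chaining rows."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in seen:  # skip ids already visited, so cycles terminate
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen - {start})  # the id itself is not its own combination


rows = []
for id_ in sorted(neighbours):
    combos = chained(id_)
    rows.append({'id': id_, 'combinations': ','.join(combos), 'count': len(combos)})

df2 = pd.DataFrame(rows)
print(df2)
```

The combinations are listed alphabetically here rather than in the chain order shown in the expected output; the ids and counts are the same.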

If possible, I would also like each combination in a separate column of the DataFrame:

id   c1  c2  c3  ...  count
a    b   c   d        3
b    c   d   a        3
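Once each id has a list of combined ids, the wide layout is a matter of building one row per list. A minimal sketch, assuming a hypothetical dict `combinations` that maps each id to its list (filled here with the values from the expected output above): building a DataFrame from a list of lists pads shorter lists with missing values, and the columns are then renamed c1, c2, c3, ...

```python
import pandas as pd

# Hypothetical input: each id mapped to its chained ids
# (values copied from the expected output in the question)
combinations = {'a': ['b', 'c', 'd'],
                'b': ['c', 'd', 'a'],
                'c': ['d', 'a', 'b'],
                'd': ['a', 'b', 'c']}

ids = list(combinations)
# One row per id; lists of different lengths are padded with missing values
wide = pd.DataFrame([combinations[i] for i in ids], index=ids)
# Name the columns c1, c2, c3, ...
wide.columns = [f'c{k + 1}' for k in range(wide.shape[1])]
wide['count'] = [len(combinations[i]) for i in ids]
# Turn the index into an ordinary 'id' column
wide = wide.rename_axis('id').reset_index()
print(wide)
```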

What approach should I follow? My Python skills are at a beginner level. Thanks in advance.

  • You'll need to be more explicit about what you want to do. Also, try writing some code to do it. Commented Nov 21, 2016 at 12:11
  • It is more complicated. I think you should add the complete expected output for this input; it is a bit unclear what exactly you need. Thank you. Commented Nov 21, 2016 at 12:35
  • @jezrael In short, a chaining rule: if a->b, b->c and c->d, then the chain for a should be a -> b, c, d. Commented Nov 21, 2016 at 12:37
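Read strictly as directed links (a->b, b->c, c->d), the chaining rule is reachability: every id that can be reached from a by following id_x -> id_y. A minimal sketch, assuming only the direction id_x -> id_y counts and using the sample rows; the helper name reachable is hypothetical. The seen set is what stops a cycle such as a->b->c->d->a from looping forever.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'id_x': ['a', 'b', 'c', 'd', 'b'],
                   'id_y': ['b', 'c', 'd', 'a', 'a']})

# Directed links only: id_x -> id_y
successors = defaultdict(set)
for x, y in zip(df['id_x'], df['id_y']):
    successors[x].add(y)


def reachable(start):
    """Iterative depth-first search following id_x -> id_y links.

    The seen set ensures each id is expanded once, so cycles
    such as a -> b -> c -> d -> a terminate.
    """
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in successors[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    seen.discard(start)  # a chain that loops back to start does not make start its own combination
    return sorted(seen)


print(reachable('a'))
```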

1 Answer


You could try something like:

import pandas as pd

# generate the dataframe
pdf = pd.DataFrame(dict(id_x=['a', 'b', 'c', 'd', 'b'], id_y=['b', 'c', 'd', 'a', 'a']))

# generate a second dataframe with the column names swapped
pdf_swapped = pdf.rename(columns=dict(id_x='id_y', id_y='id_x'))

# append both dataframes to each other (pd.concat aligns the columns by name)
pdf_doubled = pd.concat([pdf, pdf_swapped])

# evaluate the frequency of each combination
result = pdf_doubled.groupby('id_x').apply(lambda x: x.id_y.value_counts())

This gives the following result:

a     b    2
      d    1
b     a    2
      c    1
c     b    1
      d    1
d     c    1
      a    1

To find out how frequent the combination a-b is, you can simply do:

result['a', 'b']
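The same counts can also be obtained without apply by calling value_counts on the grouped id_y column, which returns a Series indexed by (id_x, id_y). A minimal sketch, assuming the sample data and the doubling step from the answer; .loc with a single id then lists every partner of that id with its frequency.

```python
import pandas as pd

pdf = pd.DataFrame({'id_x': ['a', 'b', 'c', 'd', 'b'],
                    'id_y': ['b', 'c', 'd', 'a', 'a']})

# Add every pair in the reverse direction too, as in the answer
swapped = pdf.rename(columns={'id_x': 'id_y', 'id_y': 'id_x'})
doubled = pd.concat([pdf, swapped], ignore_index=True)

# SeriesGroupBy.value_counts: counts per (id_x, id_y) without apply
result = doubled.groupby('id_x')['id_y'].value_counts()

print(result.loc[('a', 'b')])  # frequency of the pair a-b
print(result.loc['a'])         # every partner of a with its frequency
```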

2 Comments

@ For a, the combinations are b and d, but I wanted b, c and d: since a->b, b->c and c->d, the chain for a should be a -> b, c, d.
I see. How should loops like a->b, b->c, c->d, d->a be handled?
