I have created a matrix:

items = [0, 1, 2, 3]
item_to_item = pd.DataFrame(index=items, columns=items)

I've filled it with values so that:

  1. It is symmetric
  2. Its diagonal is all 0's

For example:

   0  1  2  3
0  0  4  5  9
1  4  0  3  7
2  5  3  0  3
3  9  7  3  0

I want to create a DataFrame of all possible pairs (from [0, 1, 2, 3]) that contains no (x, x) pairs and, if (x, y) is included, does not also contain (y, x), because the matrix is symmetric and both pairs hold the same value. In the end I want the following DataFrame (or NumPy 2D array):

item, item, value
 0     1     4
 0     2     5
 0     3     9
 1     2     3
 1     3     7
 2     3     3

2 Answers

NumPy's np.triu returns the upper triangle with all other elements set to zero. You can use it to construct a DataFrame and then replace the zeros with NaN so that they are dropped when you stack the columns (note that any genuine zero above the diagonal would be dropped as well):

pd.DataFrame(np.triu(df), index=df.index, columns=df.columns).replace(0, np.nan).stack()
Out: 
0  1    4.0
   2    5.0
   3    9.0
1  2    3.0
   3    7.0
2  3    3.0
dtype: float64

You can use reset_index at the end to convert indices to columns.
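As a minimal sketch of the full chain, including the reset_index step: the variant below masks with a boolean upper triangle (k=1 excludes the diagonal) instead of replacing zeros, so a genuine zero above the diagonal would survive. The column names item_a, item_b and value are illustrative choices, and the explicit dropna is there on the assumption that some pandas versions keep NaN rows when stacking.

```python
import numpy as np
import pandas as pd

items = [0, 1, 2, 3]
values = [[0, 4, 5, 9],
          [4, 0, 3, 7],
          [5, 3, 0, 3],
          [9, 7, 3, 0]]
df = pd.DataFrame(values, index=items, columns=items)

# True strictly above the diagonal; k=1 skips the diagonal itself.
mask = np.triu(np.ones(df.shape, dtype=bool), k=1)

pairs = (
    df.where(mask)      # keep the upper triangle, NaN everywhere else
      .stack()          # long format indexed by (row label, column label)
      .dropna()         # drop masked cells explicitly
      .astype(int)      # where() upcast the values to float
      .reset_index()    # move both index levels into columns
)
pairs.columns = ["item_a", "item_b", "value"]  # illustrative names
print(pairs)
```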

Another alternative is to stack the DataFrame first, reset the index, and then use a callable to keep only the rows where the first label is smaller than the second:

df.stack().reset_index()[lambda x: x['level_0'] < x['level_1']]
Out: 
    level_0  level_1  0
1         0        1  4
2         0        2  5
3         0        3  9
6         1        2  3
7         1        3  7
11        2        3  3

This one requires pandas 0.18.1 or later, which is when indexing with a callable was introduced.
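The same idea can be sketched with named axes, so that the result does not end up with level_0 and level_1 columns. The names item_a, item_b and value are assumptions made for illustration, and rename_axis with index= and columns= keywords needs a newer pandas than the callable indexing itself.

```python
import pandas as pd

items = [0, 1, 2, 3]
values = [[0, 4, 5, 9],
          [4, 0, 3, 7],
          [5, 3, 0, 3],
          [9, 7, 3, 0]]
df = pd.DataFrame(values, index=items, columns=items)

pairs = (
    df.rename_axis(index="item_a", columns="item_b")  # name both axes
      .stack()                                        # Series indexed by (item_a, item_b)
      .rename("value")                                # name the value column
      .reset_index()
      .loc[lambda x: x["item_a"] < x["item_b"]]       # keep each unordered pair once
      .reset_index(drop=True)                         # renumber rows 0..n-1
)
print(pairs)
```

The `<` comparison only needs the labels to be unique and orderable: each unordered pair is kept exactly once and the (x, x) rows are dropped.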

Here's a NumPy solution with np.triu_indices:

In [453]: item_to_item
Out[453]: 
   0  1  2  3
0  0  4  5  9
1  4  0  3  7
2  5  3  0  3
3  9  7  3  0

In [454]: r,c = np.triu_indices(len(items),1)

In [455]: pd.DataFrame(np.column_stack((r,c, item_to_item.values[r,c])))
Out[455]: 
   0  1  2
0  0  1  4
1  0  2  5
2  0  3  9
3  1  2  3
4  1  3  7
5  2  3  3

3 Comments

Do you know how I can keep the original ids? I've noticed they change to a sequence from 0 to len(items).
@EranMoshe If you mean using the row index labels, you could do: np.column_stack((item_to_item.index[r], item_to_item.index[c],..)). Let me know if it works for you.
Works great, my friend!
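A minimal sketch of the label-preserving variant discussed in the comments. The string labels "a" to "d" and the column names are hypothetical, chosen only to show that the original ids are kept. It builds the result from a dict rather than np.column_stack, since column_stack would coerce string labels and integer values to one common dtype.

```python
import numpy as np
import pandas as pd

items = ["a", "b", "c", "d"]  # hypothetical non-positional ids
values = [[0, 4, 5, 9],
          [4, 0, 3, 7],
          [5, 3, 0, 3],
          [9, 7, 3, 0]]
item_to_item = pd.DataFrame(values, index=items, columns=items)

# Positions of the cells strictly above the diagonal.
r, c = np.triu_indices(len(item_to_item), k=1)

pairs = pd.DataFrame({
    "item_a": item_to_item.index[r],         # map row positions back to labels
    "item_b": item_to_item.columns[c],       # map column positions back to labels
    "value": item_to_item.to_numpy()[r, c],  # pick the matching cells
})
print(pairs)
```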
