How to generate this simple dataframe from these numbers?

Question

I have N numbers, call it 3 for now: A1, A2, A3. I'd like to generate the following dataframe in Pandas:

Category	1	2	3	4	5	6	7
1	A1	A1+A2	A1+A2+A3	A2+A3	A3	0	0
2	0	A2	A2+A3	A2+A3+A1	A3+A1	A1	0
3	0	0	A3	A3+A1	A3+A1+A2	A1+A2	A2

I am completely at a loss as to how to generate the logic for this.

My only thoughts are: take the 1st column, add A2 to all values, then add A3. Then add A1, although if it's already there then remove it. Seems thorny though.

I'd like to run this for general N.

For N = 4

Category	1	2	3	4	5	6	7	8	9	10
1	A1	A1+A2	A1+A2+A3	A1+A2+A3+A4	A2+A3+A4	A3+A4	A4	0	0	0
2	0	A2	A2+A3	A2+A3+A4	A2+A3+A4+A1	A3+A4+A1	A4+A1	A1	0	0
3	0	0	A3	A3+A4	A3+A4+A1	A3+A4+A1+A2	A4+A1+A2	A1+A2	A2	0
4	0	0	0	A4	A4+A1	A4+A1+A2	A4+A1+A2+A3	A1+A2+A3	A2+A3	A3

Would the dataframe have a different shape for another N, or always 3 rows, 8 columns? And what would the dataframe look like for e.g. N=4? — Ennpeh
– Ennpeh, Commented Jul 29 at 10:47
Don't generate the data frame directly. Generate a numpy 2d array and then convert. About the logic it's relatively straightforward. You have a main diagonal, the next diagonal just adds the next number in the sequence until you have N numbers added, then you start removing the first number for further diagonals. Numbers cycle, so after A3 comes A1 again — Cincinnatus
– Cincinnatus, Commented Jul 29 at 10:48
What problem are you trying to solve? This might be a known problem with existing solutions, you just need to search for the right thing. On that note, beware the XY problem, not that I think this is one. — wjandrea
– wjandrea, Commented Jul 29 at 13:12
I agree NumPy makes more sense here, since the structure you want to generate has homogenous dtype and uses numeric indices. — wjandrea
– wjandrea, Commented Jul 29 at 13:20
@wjandrea, This problem comes from reinsurance and writing patterns. We are reinsuring a set of insurers but our reinsurance contracts start in different months. Our reinsurance policies are known as 'risks attaching during' (RAD). We assume all of our insurers write A1% of their book in January, and A2% of their book in February etc. Thus, the first row tells us about the amount written onto our books as a result of January incepting reinsurance contracts. This builds up and tails off as the contracts expire. We also write reinsurance contracts in February which builds up similarly. — Matta
– Matta, Commented Jul 31 at 11:55

wjandrea · Accepted Answer · 2025-08-01 15:25:04Z

5

The pattern I found in there is as follows:

import numpy as np

N = 3

for S in range(N):  # S for "shift"
    A = np.zeros((N, 3*N - 2))

    M = N + S 
    M_ = 2*N + S 
    
    A[:S+1, S:M] = 1
    A[S+1:, M:M_] = 1

so basically for A_n you start from column n, and fill n next rows. For the rest of the rows you fill next n columns. Here's how those matrices look for N = 3:

[[1. 1. 1. 0. 0. 0. 0.]
 [0. 0. 0. 1. 1. 1. 0.]
 [0. 0. 0. 1. 1. 1. 0.]]

[[0. 1. 1. 1. 0. 0. 0.]
 [0. 1. 1. 1. 0. 0. 0.]
 [0. 0. 0. 0. 1. 1. 1.]]

[[0. 0. 1. 1. 1. 0. 0.]
 [0. 0. 1. 1. 1. 0. 0.]
 [0. 0. 1. 1. 1. 0. 0.]]

Then, you can simply multiply those by the numbers you desire and sum them up:

import numpy as np
import pandas as pd 

N = 3
As = [1, 2, 3]
result = np.zeros((N, 3*N - 2))

for S in range(N):  # S for "shift"
    A = np.zeros((N, 3*N - 2))

    M = N + S 
    M_ = 2*N + S 
    
    A[:S+1, S:M] = 1
    A[S+1:, M:M_] = 1

    result += As[S] * A

pd.DataFrame(result)

    0   1   2   3   4   5   6
0   1.0 3.0 6.0 5.0 3.0 0.0 0.0
1   0.0 2.0 5.0 6.0 4.0 1.0 0.0
2   0.0 0.0 3.0 4.0 6.0 3.0 2.0

edited Aug 1 at 15:25

wjandrea

33.9k10 gold badges69 silver badges105 bronze badges

answered Jul 29 at 12:27

Jokilos

864 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Matta Jul 31 at 11:56

It is, much appreciated. Marking correct since N is small but huge appreciation for the meatier answers for large N.

chrslg · Accepted Answer · 2025-07-29 12:55:34Z

5

Let's play with pad and sliding_window_view

Firstly I define an example array convenient to check sums

A=np.array([1,10,100,1000])

(so if I see 1011, I know it is probably the sum of elements 3, 1, and 0)

Firstly, I want to build a 2d array of all rolling of this A (one per line of result)

I can do it, for example, by stacking 2 A together, and then create a sliding window view of that stach

AA=np.hstack([A,A])
# array([   1,   10,  100, 1000,    1,   10,  100, 1000])
B=np.lib.stride_tricks.sliding_window_view(AA[:-1], A.shape)
#array([[   1,   10,  100, 1000],
#       [  10,  100, 1000,    1],
#       [ 100, 1000,    1,   10],
#       [1000,    1,   10,  100]])

I need that [:-1d] to avoid a 5th line identical to the first

Now I need to pad some 0 around that, so that I can slide another window for the sum

Bpad = np.pad(B, ((0,0), (len(A)-1, len(A)-1)))
#array([[   0,    0,    0,    1,   10,  100, 1000,    0,    0,    0],
#       [   0,    0,    0,   10,  100, 1000,    1,    0,    0,    0],
#       [   0,    0,    0,  100, 1000,    1,   10,    0,    0,    0],
#       [   0,    0,    0, 1000,    1,   10,  100,    0,    0,    0]])

From there I can compute another sliding window view, to have the "convolution" you want (pretty sure I could do it with np.convolve, but since I started using stride_tricks...)

Bslide = np.lib.stride_tricks.sliding_window_view(Bpad, (len(A),), axis=1)

Which gives

array([[[   0,    0,    0,    1],
        [   0,    0,    1,   10],
        [   0,    1,   10,  100],
        [   1,   10,  100, 1000],
        [  10,  100, 1000,    0],
        [ 100, 1000,    0,    0],
        [1000,    0,    0,    0]],

       [[   0,    0,    0,   10],
        [   0,    0,   10,  100],
        [   0,   10,  100, 1000],
        [  10,  100, 1000,    1],
        [ 100, 1000,    1,    0],
        [1000,    1,    0,    0],
        [   1,    0,    0,    0]],

       [[   0,    0,    0,  100],
        [   0,    0,  100, 1000],
        [   0,  100, 1000,    1],
        [ 100, 1000,    1,   10],
        [1000,    1,   10,    0],
        [   1,   10,    0,    0],
        [  10,    0,    0,    0]],

       [[   0,    0,    0, 1000],
        [   0,    0, 1000,    1],
        [   0, 1000,    1,   10],
        [1000,    1,   10,  100],
        [   1,   10,  100,    0],
        [  10,  100,    0,    0],
        [ 100,    0,    0,    0]]])

We can now sum it along the last axis to have the sum you want

Bconv = Bslide.sum(axis=2)
#array([[   1,   11,  111, 1111, 1110, 1100, 1000],
#       [  10,  110, 1110, 1111, 1101, 1001,    1],
#       [ 100, 1100, 1101, 1111, 1011,   11,   10],
#       [1000, 1001, 1011, 1111,  111,  110,  100]])

Almost there. We just need some 0 padding and sliding again

C=C=np.pad(Bconv, ((0,0), (len(A)-1, len(A)-1)))
#array([[   0,    0,    0,    1,   11,  111, 1111, 1110, 1100, 1000,    0,    0,    0],
#       [   0,    0,    0,   10,  110, 1110, 1111, 1101, 1001,    1,    0,    0,    0],
#       [   0,    0,    0,  100, 1100, 1101, 1111, 1011,   11,   10,    0,    0,    0],
#       [   0,    0,    0, 1000, 1001, 1011, 1111,  111,  110,  100,    0,    0,    0]])

D=np.lib.stride_tricks.as_strided(C[:,len(A)-1], shape=(len(C), C.shape[1]-len(A)+1), strides=(C.strides[0]-C.strides[1], C.strides[1]))

Which results in

array([[   1,   11,  111, 1111, 1110, 1100, 1000,    0,    0,    0],
       [   0,   10,  110, 1110, 1111, 1101, 1001,    1,    0,    0],
       [   0,    0,  100, 1100, 1101, 1111, 1011,   11,   10,    0],
       [   0,    0,    0, 1000, 1001, 1011, 1111,  111,  110,  100]])

The last line deserve some explanation.

as_strided is what sliding_window_view uses under the hood. It means, "use the same data, but with different shape and strides.

Important thing to understand is that a numpy array is a "pointer" to some data, a dtype, a shape and strides.

If an array arr points to some raw data, has shape (w1,w2,...,wn), and strides (s1, s2, ..., sn), that means that arr[i1, i2, ..., in] is to be found at data[i1*s1+i2*s2+...+in*sn]

Usually we don't need to care about that. But point this, all N-Dim arrays are 1D array under the hood. A bit like a 2D image can be represented as a 1D array of W×H pixels, whose pixel (x,y) is at address arr[y*W+x*1].

So here, D is made of the data found in C. Well, not exactly, in C[:, len(A)-1], that is where the first 1 is found in C.

Its shape is the same as D but with 3 elements less per row (we need either 3 0 to the right, 0 to the left, or in between, but not both)

And because its strides for axis 1 (second element of the stride pair) is the same as the one of C, that means that D[0,1] is the element after D[0,0] on the same row in C. Generally speaking D[y,x+k] is the element that is found k columns after the one where you find D[y,x] in C

And where it becomes trickier (but after all, the module is name stride_tricks), is what happens when you change row. D[1,0] is what you find in C one row under and one column before where you find D[0,0] in C (because 1st stride of D is the 1st stride of C minus the second 1. So going one row down in D is the same as going one row down and one column left in C)

Just for fun, one-liner

N=len(A)
S=A.strides[0]
D=np.lib.stride_tricks.as_strided(np.pad(np.lib.stride_tricks.sliding_window_view(np.pad(np.lib.stride_tricks.sliding_window_view(np.hstack([A,A])[:-1], (N,)), ((0,0), (N-1, N-1))), (N,), axis=1).sum(axis=2), ((0,0), (N-1, N-1)))[:,N-1], shape=(N, 3*N-2), strides=(S*(4*N-3)-S, S))

(Well, almost one-liner. I use some useful aliases to avoid an even longer line. N is 4, and S, probably, 8)

Dataframe

And of course, since you asked a dataframe

df=pd.DataFrame(D, columns = list(np.arange(1,3*N-1)))
df['category']=list(range(1,N+1))

It may seems convoluted (pun intended, since that is some convolution variants), but there are no for loops. Which in the long run beats any solution using a for loop

edited Jul 29 at 12:55

answered Jul 29 at 12:35

chrslg

15.2k11 gold badges26 silver badges42 bronze badges

6 Comments

simon Jul 29 at 12:42

I tried with A = np.array([1,10,100,1000,10000]) and it gives me an empty column both to the very left and very right. Likewise, with A = np.array([1,10,100]), results don't seem to be correct (first row starts with 11 rather than 1).

simon Jul 29 at 12:57

Indeed, I tried before you changed Bslide = np.lib.stride_tricks.sliding_window_view(Bpad, (4,), axis=1) to Bslide = np.lib.stride_tricks.sliding_window_view(Bpad, (len(A),), axis=1). Looks good now.

chrslg Jul 29 at 13:06

(Note that I've used [1, 10, 100, 1000] as array to make it easy to debug. But it would have been even easier to visualize with [1,20,300,4000] :D)

simon Jul 30 at 14:36

Using the walrus operator := to define your aliases N, S in-line, you can make your one-liner a true one-liner without making the line much longer (I tried with Python 3.11):

np.lib.stride_tricks.as_strided(np.pad(np.lib.stride_tricks.sliding_window_view(np.pad(np.lib.stride_tricks.sliding_window_view(np.hstack([A,A])[:-1], (N:=len(A),)), ((0,0), (N-1, N-1))), (N,), axis=1).sum(axis=2), ((0,0), (N-1, N-1)))[:,N-1], shape=(N, 3*N-2), strides=((S:=A.strides[0])*(4*N-3)-S, S))

chrslg Jul 30 at 14:47

@simon, yes indeed. I usually avoid := in answers (not in my code. I love it), because it obfuscate things (especially when they are already naturally obfuscated). But for the one-liner, since the whole point of one-liner is to be one-liner, not to be readable, you're right, I should have used it :D

|

simon · Accepted Answer · 2025-07-30 10:02:49Z

Here is an implementation that requires looping once over the given numbers (called a_s below):

import numpy as np
import pandas as pd

def blocked_matrix_for(a_s, as_dataframe=False):
    rows, cols = len(a_s), (3 * len(a_s) - 2)
    # Provide zero-padded block diagonal matrix of zeros and ones
    block = np.eye(2, 3).repeat(rows, axis=0).repeat(rows, axis=1)[:, :cols]
    result = np.zeros((rows, cols))
    for i, a in enumerate(a_s):
        # Cut lower/upper bounds, roll left/right bounds, multiply value, add up
        lo, up = (rows - 1 - i), (2 * rows - 1 - i)
        result += a * np.roll(block[lo:up], shift=i, axis=1)
    if as_dataframe:
        index = pd.Index(range(1, rows + 1), name="Category")
        result = pd.DataFrame(result, index=index, columns=range(1, cols + 1))
    return result

This produces:

np.set_printoptions(linewidth=120)  # Avoid line breaks

print(blocked_matrix_for((1, 10, 100), as_dataframe=True).astype(int))
#           1   2    3    4    5   6   7
# Category                              
# 1         1  11  111  110  100   0   0
# 2         0  10  110  111  101   1   0
# 3         0   0  100  101  111  11  10
print(blocked_matrix_for((1, 10, 100, 1000)).astype(int))
# [[   1   11  111 1111 1110 1100 1000    0    0    0]
#  [   0   10  110 1110 1111 1101 1001    1    0    0]
#  [   0    0  100 1100 1101 1111 1011   11   10    0]
#  [   0    0    0 1000 1001 1011 1111  111  110  100]]
print(blocked_matrix_for((1, 10, 100, 1000, 10000)).astype(int))
# [[    1    11   111  1111 11111 11110 11100 11000 10000     0     0     0     0]
#  [    0    10   110  1110 11110 11111 11101 11001 10001     1     0     0     0]
#  [    0     0   100  1100 11100 11101 11111 11011 10011    11    10     0     0]
#  [    0     0     0  1000 11000 11001 11011 11111 10111   111   110   100     0]
#  [    0     0     0     0 10000 10001 10011 10111 11111  1111  1110  1100  1000]]

The basic idea is to provide 2×2 block matrices for all numbers, then cutting and rolling them to the required ranges and positions, and adding them up. There are probably also solutions that avoid the loop and that waste less intermediate memory.

Timings

Here is some timing code of the currently available four answers that provide the required results. I left out the creation of Pandas dataframes from the intermediate NumPy arrays but otherwise made no code adjustments. This includes mozway's answer as well as the changes to Jokilos's answer from earlier today (2025-07-30):

from timeit import Timer
import matplotlib.pyplot as plt
import numpy as np

rand = np.random.default_rng(42)

def chrslg(A):
    N=len(A)
    S=A.strides[0]
    return np.lib.stride_tricks.as_strided(np.pad(np.lib.stride_tricks.sliding_window_view(np.pad(np.lib.stride_tricks.sliding_window_view(np.hstack([A,A])[:-1], (N,)), ((0,0), (N-1, N-1))), (N,), axis=1).sum(axis=2), ((0,0), (N-1, N-1)))[:,N-1], shape=(N, 3*N-2), strides=(S*(4*N-3)-S, S))

def simon(a_s):
    rows, cols = len(a_s), (3 * len(a_s) - 2)
    block = np.eye(2, 3).repeat(rows, axis=0).repeat(rows, axis=1)[:, :cols]
    result = np.zeros((rows, cols))
    for i, a in enumerate(a_s):
        lo, up = (rows - 1 - i), (2 * rows - 1 - i)
        result += a * np.roll(block[lo:up], shift=i, axis=1)
    return result

def jokilos(As):
    N = len(As)
    result = np.zeros((N, 3 * N - 2))
    for shift in range(N):
        A = np.zeros((N, 3 * N - 2))
        S = shift
        M = N + S 
        M_ = 2 * N + S 
        A[:(S + 1), S:M] = 1
        A[(S + 1):, M:M_] = 1
        result += As[shift] * A
    return result

def mozway(A):
    N = len(A)
    X = np.arange(N)
    Y = (X[:, None] + (X[:, None] < X)*N)[..., None] + X[None, None]
    out = np.zeros((N, 3*N-2), dtype=A.dtype)
    np.add.at(out, (X[:, None], Y), A[:, None, None])
    return out

functions = (chrslg, simon, jokilos, mozway)

# Sanity check: Do all produce the same result?
vals = rand.normal(size=10)
ref = functions[0]
for fct in functions[1:]:
    assert np.allclose(ref(vals), fct(vals))

# Actual timings
n_s = [1, 3, 10, 30, 100, 300, 1000]
num_timings = 10
times_by_name = {fct.__name__: [] for fct in functions}
for n in n_s:
    vals = rand.normal(size=n)
    for fct in functions:
        t = Timer(lambda: fct(vals)).timeit(num_timings)
        times_by_name[fct.__name__].append(t)

# Plot results        
for name, times in times_by_name.items():
    plt.loglog(n_s, np.divide(times, num_timings), label=name.capitalize())
plt.xlabel("number of values")
plt.ylabel("time (s)")
plt.legend()
plt.show()

This produces on my system: So, for "small" matrices (i.e. up to about N=30 input values), Jokilos's approach seems to be the fastest; for "large" matrices after the crossing point, chrslg's approach appears to be the winner (by about an order of magnitude in the end – note that this is a log-log plot).

mozway · Accepted Answer · 2025-07-30 09:16:54Z

Following the same observation as @Jokilos, it is possible to solve this problem with indexing.

Assuming A = np.array([1, 10, 100, 1000]), the values to be added follow this pattern:

array([[[   1,    1,    1,    1,    0,    0,    0,    0,    0,    0],
        [   0,    0,    0,    0,    1,    1,    1,    1,    0,    0],
        [   0,    0,    0,    0,    1,    1,    1,    1,    0,    0],
        [   0,    0,    0,    0,    1,    1,    1,    1,    0,    0]],

       [[   0,   10,   10,   10,   10,    0,    0,    0,    0,    0],
        [   0,   10,   10,   10,   10,    0,    0,    0,    0,    0],
        [   0,    0,    0,    0,    0,   10,   10,   10,   10,    0],
        [   0,    0,    0,    0,    0,   10,   10,   10,   10,    0]],

       [[   0,    0,  100,  100,  100,  100,    0,    0,    0,    0],
        [   0,    0,  100,  100,  100,  100,    0,    0,    0,    0],
        [   0,    0,  100,  100,  100,  100,    0,    0,    0,    0],
        [   0,    0,    0,    0,    0,    0,  100,  100,  100,  100]],

       [[   0,    0,    0, 1000, 1000, 1000, 1000,    0,    0,    0],
        [   0,    0,    0, 1000, 1000, 1000, 1000,    0,    0,    0],
        [   0,    0,    0, 1000, 1000, 1000, 1000,    0,    0,    0],
        [   0,    0,    0, 1000, 1000, 1000, 1000,    0,    0,    0]]])

Therefore, carefully designing an indexer should make it possible to assign the correct values directly in the output.

On can consider different points:

the output array has a (N, 3*N-2) shape
the first value appears once in the first row before being shifted, the second twice, the third three times, etc.
the shift is equal to N
in addition, each value is shifted by its own position (0 for the first row, 1 for the second, etc.)

One can express this as:

N = len(A)       # 4

X = np.arange(N) # array([0, 1, 2, 3])

mask = (X[:, None] < X)
# array([[False,  True,  True,  True],
#        [False, False,  True,  True],
#        [False, False, False,  True],
#        [False, False, False, False]])

# compute indexer
Y = (X[:, None] + mask*N)[..., None] + X[None, None]
# array([[[0, 1, 2, 3],
#         [4, 5, 6, 7],
#         [4, 5, 6, 7],
#         [4, 5, 6, 7]],
# 
#        [[1, 2, 3, 4],
#         [1, 2, 3, 4],
#         [5, 6, 7, 8],
#         [5, 6, 7, 8]],
# 
#        [[2, 3, 4, 5],
#         [2, 3, 4, 5],
#         [2, 3, 4, 5],
#         [6, 7, 8, 9]],
# 
#        [[3, 4, 5, 6],
#         [3, 4, 5, 6],
#         [3, 4, 5, 6],
#         [3, 4, 5, 6]]])

Which allows us to perform indexing. There are two approaches, using an intermediate 3D array and summing, or assigning and summing directly to an output 2D array with numpy.add.at:

tmp = np.zeros((N, N, 3*N-2), dtype=A.dtype)

tmp[X[:, None, None], X[None, :, None], Y] = A[:, None, None]
# array([[[   1,    1,    1,    1,    0,    0,    0,    0,    0,    0],
#         [   0,    0,    0,    0,    1,    1,    1,    1,    0,    0],
#         [   0,    0,    0,    0,    1,    1,    1,    1,    0,    0],
#         [   0,    0,    0,    0,    1,    1,    1,    1,    0,    0]],
#  
#        [[   0,   10,   10,   10,   10,    0,    0,    0,    0,    0],
#         [   0,   10,   10,   10,   10,    0,    0,    0,    0,    0],
#         [   0,    0,    0,    0,    0,   10,   10,   10,   10,    0],
#         [   0,    0,    0,    0,    0,   10,   10,   10,   10,    0]],
#  
#        [[   0,    0,  100,  100,  100,  100,    0,    0,    0,    0],
#         [   0,    0,  100,  100,  100,  100,    0,    0,    0,    0],
#         [   0,    0,  100,  100,  100,  100,    0,    0,    0,    0],
#         [   0,    0,    0,    0,    0,    0,  100,  100,  100,  100]],
#  
#        [[   0,    0,    0, 1000, 1000, 1000, 1000,    0,    0,    0],
#         [   0,    0,    0, 1000, 1000, 1000, 1000,    0,    0,    0],
#         [   0,    0,    0, 1000, 1000, 1000, 1000,    0,    0,    0],
#         [   0,    0,    0, 1000, 1000, 1000, 1000,    0,    0,    0]]])

out = tmp.sum(axis=0)
# array([[   1,   11,  111, 1111, 1110, 1100, 1000,    0,    0,    0],
#        [   0,   10,  110, 1110, 1111, 1101, 1001,    1,    0,    0],
#        [   0,    0,  100, 1100, 1101, 1111, 1011,   11,   10,    0],
#        [   0,    0,    0, 1000, 1001, 1011, 1111,  111,  110,  100]])

Without the intermediate 3D array, using numpy.add.at:

out = np.zeros((N, 3*N-2), dtype=A.dtype)

np.add.at(out, (X[:, None], Y), A[:, None, None])

out
# array([[   1,   11,  111, 1111, 1110, 1100, 1000,    0,    0,    0],
#        [   0,   10,  110, 1110, 1111, 1101, 1001,    1,    0,    0],
#        [   0,    0,  100, 1100, 1101, 1111, 1011,   11,   10,    0],
#        [   0,    0,    0, 1000, 1001, 1011, 1111,  111,  110,  100]])

full code and DataFrame creation

N = len(A)
X = np.arange(N)
Y = (X[:, None] + (X[:, None] < X)*N)[..., None] + X[None, None]
out = np.zeros((N, 3*N-2), dtype=A.dtype)
np.add.at(out, (X[:, None], Y), A[:, None, None])

df = pd.DataFrame(out).rename(columns=lambda x: x+1, index=lambda x: x+1)
print(df)

Output:

    1   2    3     4     5     6     7    8    9   10
1   1  11  111  1111  1110  1100  1000    0    0    0
2   0  10  110  1110  1111  1101  1001    1    0    0
3   0   0  100  1100  1101  1111  1011   11   10    0
4   0   0    0  1000  1001  1011  1111  111  110  100

strawdog · Accepted Answer · 2025-07-29 12:01:24Z

1

I didn't have enought time to review and optimize this code, but first that came to my mins was using deque, makeing dataframe, reindexing and shifting:

from collections import deque
import pandas as pd

l = ["a", "b", "c", "d"]
m = deque(l)
total = []
for i in range(len(m)):
    d = deque()
    data = []
    for j in range(len(m)*2-1):
        if j < len(m):
            d.append(m[j])
        else:
            d.popleft()
        data.append(list(d))
    total.append(data)
    m.rotate(-1)
df = pd.DataFrame(total)

res = df.T.reindex(range(0,len(l)*3-2),fill_value=0)
res = res.apply(lambda x:x.shift(x.name, fill_value=0), axis=0)
res = res.T
print(res)

     0       1          2             3             4             5             6          7       8    9
0  [a]  [a, b]  [a, b, c]  [a, b, c, d]     [b, c, d]        [c, d]           [d]          0       0    0
1    0     [b]     [b, c]     [b, c, d]  [b, c, d, a]     [c, d, a]        [d, a]        [a]       0    0
2    0       0        [c]        [c, d]     [c, d, a]  [c, d, a, b]     [d, a, b]     [a, b]     [b]    0
3    0       0          0           [d]        [d, a]     [d, a, b]  [d, a, b, c]  [a, b, c]  [b, c]  [c]

answered Jul 29 at 12:01

strawdog

7321 gold badge7 silver badges14 bronze badges

2 Comments

Reinderien Jul 29 at 12:09

This is wrong - those should be sums, not inner lists.

simon Jul 29 at 12:20

@Reinderien Shouldn't be too difficult to replace the list creations by additions though.

Collectives™ on Stack Overflow

How to generate this simple dataframe from these numbers?

5 Answers 5

1 Comment

Just for fun, one-liner

Dataframe

6 Comments

Timings

Comments

full code and DataFrame creation

Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

1 Comment

Just for fun, one-liner

Dataframe

6 Comments

Timings

Comments

full code and DataFrame creation

Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related