How to create pandas dataframe and fill it from function?

Question

I have that function:

def count (a,b):
    x = a*b

Values of 'a' and 'b' must be 1...99 for 'a' and 100...800 for 'b'. So the question is how to create pandas dataframe with a-values vertically and b-values horizontally and x-values inside that are counted with 'count' function (using all combinations of a and b)? It must look like that: example

Epsi95 · Accepted Answer · 2020-05-15 15:50:50Z

2

Hope this may help

import pandas as pd

def count(a,b):
    x = a*b
    return x

a = list(range(1,100))
b = list(range(100,801))
data = []
for i in a:
    temp = [i]
    for j in b:
        temp.append(count(i,j))
    data.append(temp)

df = pd.DataFrame(data, columns=["a/b"]+b)
# to save as csv
df.to_csv("data.csv", index=False)

answered May 15, 2020 at 15:50

Epsi95

9,1071 gold badge19 silver badges37 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

cegarza · Accepted Answer · 2020-05-15 15:50:15Z

0

This can be easily done with matrix multiplication:

import pandas as pd
import numpy as np

a = np.arange(1,100)
b = np.arange(100,801)

df = pd.DataFrame(np.matmul(a.reshape(-1,1),b.reshape(1,-1), index=a, columns=b)

answered May 15, 2020 at 15:50

cegarza

1358 bronze badges

1 Comment

Bill Over a year ago

This only works in the case of the specific function in the question. Maybe the OP wants a general method that works with any arbitrary function.

Bill · Accepted Answer · 2021-11-29 07:51:52Z

In this case where your function count is vectorizable, you can use Numpy's fromfunction with a little help from a lambda function to convert the grid co-ordinate indices arrays into your a, b values:

import numpy as np
import pandas as pd

def count(a, b):
    return a*b

a = list(range(1,100))
b = list(range(100,801))

data = np.fromfunction(
    lambda ii, jj: count(np.array(a)[ii], np.array(b)[jj]),
    shape=(len(a), len(b)), 
    dtype='int'
)
df1 = pd.DataFrame(data, index=a, columns=b)
print(df1.iloc[:5, :5])

   100  101  102  103  104
1  100  101  102  103  104
2  200  202  204  206  208
3  300  303  306  309  312
4  400  404  408  412  416
5  500  505  510  515  520

This takes about 731 µs ± 12.6 µs to execute.

The only other solutions I can think of are:

df2 = pd.DataFrame(None, index=a, columns=b)
ii, jj = np.meshgrid(a, b)
for i, j in zip(ii.ravel(), jj.ravel()):
    df2.loc[i, j] = count(i, j)  # this is slow
assert(np.array_equal(df1, df2))

This takes about 4.17 s ± 62.7 ms.

and

df3 = pd.DataFrame(None, index=a, columns=b)
df3 = df3.apply(lambda col: [count(i, col.name) for i in col.index])
assert(np.array_equal(df1, df3))

This takes about 97.3 ms ± 1.96 ms.

I actually think it would be great if there were a Pandas function for this, like pd.from_function(count, index=a, columns=b) perhaps.

Collectives™ on Stack Overflow

How to create pandas dataframe and fill it from function?

3 Answers 3

Comments

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related