
I need help writing a function that takes no input and, each time it is called, returns the next row of a DataFrame as a list.

I have tried some iterators, but that approach requires an input parameter.

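```
# Attempt: keep a row counter `i` and return row i of the DataFrame `data`.
# As written this fails: because `i` is assigned inside the function, Python
# treats it as a local variable and raises UnboundLocalError on the first line.
def get_next_data_as_list():
    out = list(data.iloc[i])
    i = i + 1
    return out

get_next_data_as_list()
```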

Example output:

```
[1619.5, 1620.0, 1621.0, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, 1.0, 10.0,
     24.0, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, 1615.0, 1614.0, 1613.0, 1612.0, 1611.0, 1610.0,
     1607.0, 1606.0, 1605.0, 1604.0, 1603.0, 1602.0, 1601.5, 1601.0, 1600.0, 7.0, 10.0, 1.0, 10.0, 20.0, 3.0, 20.0,
     27.0, 11.0, 14.0, 35.0, 10.0, 1.0, 10.0, 13.0]
```
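For comparison, the counter-based attempt in the question can also be made to work without a generator by keeping the counter in an enclosing scope instead of as a local variable. A minimal sketch, assuming `data` is a pandas DataFrame; `make_row_getter` is a hypothetical helper name, the toy DataFrame values are made up, and raising `IndexError` when the rows run out is an arbitrary choice for this sketch:

```python
import pandas as pd


def make_row_getter(df):
    """Hypothetical helper: return a zero-argument function that hands out df's rows in order."""
    i = 0  # position of the next row; lives in the enclosing scope, not in the inner function

    def get_next_data_as_list():
        nonlocal i  # rebind the enclosing counter instead of creating a new local `i`
        if i >= len(df):
            raise IndexError("no rows left")
        out = list(df.iloc[i])
        i += 1
        return out

    return get_next_data_as_list


# Toy stand-in for the question's `data` DataFrame (values made up).
data = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})
get_next_data_as_list = make_row_getter(data)

print(get_next_data_as_list())  # [1.0, 10.0]
print(get_next_data_as_list())  # [2.0, 20.0]
```

The `nonlocal` statement is what fixes the original error: it tells Python that `i` refers to the variable in `make_row_getter`, so reading it before the increment no longer raises `UnboundLocalError`.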

2 Answers


Thanks so much, MichaelD. I realized that every call to the generator function creates a new generator that starts again from the first row. I was able to fix it by writing an init generator function, calling it once, and assigning the resulting generator to a variable:

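```
def get_next_data_as_list_init():
    # Generator function: yields one row of `data` as a list per step.
    for i in range(len(data)):
        yield list(data.iloc[i])

# Create the generator once so it keeps its position between calls.
x = get_next_data_as_list_init()
```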

and then the main function:

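```
def get_next_data_as_list():
    # Advance the shared generator by one row; next(x) is equivalent to x.__next__()
    # and raises StopIteration once every row has been returned.
    return next(x)
```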

which returns only the next row each time it is called. Thank you very much, MichaelD!
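Put together, the approach in this answer looks like the runnable sketch below. The toy DataFrame is made up, and the `None` default passed to the built-in `next()` is an addition not in the answer: it makes the function return `None` instead of raising `StopIteration` once every row has been handed out.

```python
import pandas as pd

# Toy stand-in for the question's `data` DataFrame (values made up).
data = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})


def get_next_data_as_list_init():
    # Generator function: each call creates a NEW generator starting at row 0.
    for i in range(len(data)):
        yield list(data.iloc[i])


# Create the generator once, at module level, so its position persists between calls.
x = get_next_data_as_list_init()


def get_next_data_as_list():
    # next(x, None) returns None instead of raising StopIteration when rows run out.
    return next(x, None)


print(get_next_data_as_list())  # [1.0, 10.0]
print(get_next_data_as_list())  # [2.0, 20.0]
```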


One way to do this is to write your function as a generator:

```
# Assumes numpy and pandas were imported earlier as: import numpy as np; import pandas as pd
In [42]: df = pd.DataFrame({'x1':np.random.randn(5), 'x2':np.random.randn(5)})

In [43]: df
Out[43]:
         x1        x2
0  0.891725  0.653889
1  2.260866 -1.521131
2  0.453874  1.416261
3 -0.821557  0.586106
4  1.042644  0.556396

In [44]: def get_next_data_as_list():
    ...:     for i in range(len(df)):
    ...:         yield list(df.iloc[i])
    ...:

In [45]: for x in get_next_data_as_list():
    ...:     print(x)
    ...:
[0.8917247724868814, 0.6538894234684837]
[2.2608656845849993, -1.521131045383185]
[0.4538742078414329, 1.416260697660083]
[-0.8215569227294447, 0.5861059443795276]
[1.0426436741732399, 0.5563956233997533]
```

To be more explicit:

```
In [46]: x = get_next_data_as_list()
In [47]: x.__next__()
Out[47]: [0.8917247724868814, 0.6538894234684837]

In [48]: x.__next__()
Out[48]: [2.2608656845849993, -1.521131045383185]
```
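The built-in `next(x)` calls `x.__next__()` for you and is the more idiomatic spelling. pandas also provides `DataFrame.itertuples`, which already returns an iterator over rows, so a hand-written generator function is not strictly necessary. A minimal sketch with a made-up DataFrame; `list()` converts each tuple to match the list output asked for in the question:

```python
import pandas as pd

# Made-up DataFrame standing in for `df` from the session above.
df = pd.DataFrame({"x1": [0.5, -1.2, 2.0], "x2": [1.5, 0.3, -0.7]})

# itertuples(index=False, name=None) yields each row lazily as a plain tuple
# of values (without the index), so it can serve directly as the row iterator.
rows = iter(df.itertuples(index=False, name=None))

# The built-in next(rows) is the idiomatic spelling of rows.__next__().
print(list(next(rows)))  # [0.5, 1.5]
print(list(next(rows)))  # [-1.2, 0.3]
```

Because the rows are produced lazily, this also avoids building a list of all rows up front, which matters for a large DataFrame.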

Comments

Thanks very much for your response, @MichaelD! How can I return each row one at a time, only when the function is called, rather than printing all the rows at once? My dataset has over 1 million rows.
I only used print() as an example to show that each successive step of the generator returned by get_next_data_as_list() produces the next row. What do you want to do with each row? There may be a better way to do this with vectorized operations (which work on entire rows or columns at once), or you may be better off using apply or map, depending on the use case.
The function is going to pass the rows to an ML model for prediction. I understand your implementation; I just couldn't get a single row out without iterating.
If this answer helped, consider accepting it so that others searching for similar questions later will know it was the best answer to your question.
Done! I was trying to call the `__next__()` method inside the function:

```
def get_next_data_as_list():
    for i in range(len(df)):
        yield list(df.iloc[i])
    x = get_next_data_as_list()
    return x.__next__()
```

However, the output is `<generator object get_next_data_as_list at 0x00000221B06F1B10>`. Is there a workaround?
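The generator object in this comment comes from how Python treats `yield`: any function whose body contains `yield` is a generator function, so calling it does not execute the body at all; it only creates and returns a generator object. The `x = ...` and `return x.__next__()` lines are therefore not executed when the function is called. A small sketch reproducing this with a made-up two-row DataFrame, using the standard-library `inspect` module to confirm it; the fix is to create the generator once outside the function, as in the self-answer above.

```python
import inspect

import pandas as pd

# Made-up two-row DataFrame standing in for `df`.
df = pd.DataFrame({"x1": [0.5, -1.2], "x2": [1.5, 0.3]})


def get_next_data_as_list():
    # The `yield` below turns this whole function into a generator function.
    for i in range(len(df)):
        yield list(df.iloc[i])
    # Not executed by a plain call: calling the function only builds a generator.
    x = get_next_data_as_list()
    return x.__next__()


print(inspect.isgeneratorfunction(get_next_data_as_list))  # True
result = get_next_data_as_list()
print(type(result).__name__)  # generator
```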
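On the point about vectorization raised in the comments: many ML libraries' `predict` methods (scikit-learn's, for example) accept a 2-D array containing many rows, so with over a million rows it is usually cheaper to pass chunks of rows than to call the model once per row. A minimal sketch, where `predict` is a hypothetical stand-in for the real model (it just sums each row), `predict_in_chunks` is a hypothetical helper, and the default `chunk_size` is an arbitrary illustrative value:

```python
import numpy as np
import pandas as pd


def predict(batch: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for model.predict(): one output per input row.
    return batch.sum(axis=1)


def predict_in_chunks(df: pd.DataFrame, chunk_size: int = 100_000) -> np.ndarray:
    """Hypothetical helper: run `predict` over df in row chunks instead of one row at a time."""
    outputs = []
    for start in range(0, len(df), chunk_size):
        chunk = df.iloc[start:start + chunk_size].to_numpy()  # 2-D array of rows
        outputs.append(predict(chunk))
    if not outputs:  # empty DataFrame: nothing to predict
        return np.empty(0)
    return np.concatenate(outputs)


df = pd.DataFrame({"x1": [0.5, -1.2, 2.0], "x2": [1.5, 0.3, -0.7]})
print(predict_in_chunks(df, chunk_size=2))
```

Chunking keeps memory bounded while still giving the model many rows per call; if the whole DataFrame fits in memory, a single `predict(df.to_numpy())` call is the simplest option.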
