I have daily stock data in a list of n dataframes (each stock has its own dataframe). I want to select m rows at equal time intervals from each dataframe and append them to dataframes inside another list. Basically, the new list should contain m dataframes, where m is the number of days, and each dataframe should have length n, the number of stocks. I tried nested for loops, but it didn't work:

cross_section = []
cross_sections_list = []

for m in range(0, len(datalist[0]), 100):    
    for n in range(len(datalist)):
        cross_section.append(datalist[n].iloc[m])
        cross_sections_list.append(cross_section)

This code didn't do anything; my machine just got stuck on it. If there is another way, like multi-indexing for example, I would love to try it too.

For example

input:

[
             Adj Close   Ticker  
 Date                           
 2020-06-01  321.850006   AAPL  
 2020-06-02  323.339996   AAPL  
 2020-06-03  325.119995   AAPL  
 2020-06-04  322.320007   AAPL  
 2020-06-05  331.500000   AAPL  
 2020-06-08  333.459991   AAPL  
 2020-06-09  343.989990   AAPL  
 2020-06-10  352.839996   AAPL  ,

             Adj Close    Ticker  
 Date                           
 2020-06-01  182.830002   MSFT  
 2020-06-02  184.910004   MSFT  
 2020-06-03  185.360001   MSFT  
 2020-06-04  182.919998   MSFT  
 2020-06-05  187.199997   MSFT  
 2020-06-08  188.360001   MSFT  
 2020-06-09  189.800003   MSFT  
 2020-06-10  196.839996   MSFT  ]

output:

 [
             Adj Close   Ticker  
 Date                           
 2020-06-01  321.850006   AAPL  
 2020-06-01  182.830002   MSFT  ,

             Adj Close   Ticker  
 Date                           
 2020-06-03  325.119995   AAPL  
 2020-06-03  185.360001   MSFT  ,

             Adj Close   Ticker  
 Date                           
 2020-06-05  331.500000   AAPL  
 2020-06-05  187.199997   MSFT  ]

and so on.

Thank you

  • Do you mean to have the step set at 100? Commented Jul 6, 2020 at 22:25
  • Please add an example of data input and expected output. Commented Jul 6, 2020 at 22:46
  • user13802115, Yes Commented Jul 6, 2020 at 22:48
  • Edgar Ramirez, done it. thank you Commented Jul 6, 2020 at 23:11

2 Answers

It's not exactly clear what you want, but here is some code that hopefully helps.

import pandas as pd

list_of_df = datalist  # list of all the DataFrames (datalist from the question)

alldf = pd.concat(list_of_df)  # brings all DataFrames into one

list_of_grouped = [y for x, y in alldf.groupby('Date')]  # one DataFrame per date, collected in a list

number_of_days = alldf.groupby('Date').ngroups  # total number of groups (dates)

list_of_Dates = [x for x, y in alldf.groupby('Date')]  # list of all the dates that were grouped

count_of_stocks = []
for i in range(len(list_of_grouped)):
    count_of_stocks.append(len(list_of_grouped[i]))  # number of rows (stocks) in each grouped DataFrame

# Pair each date with its stock count to see how many stocks are in each date.
zipped = list(zip(list_of_Dates, count_of_stocks))
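To see what the grouping step produces, here is a minimal sketch using the first four rows of the sample data from the question. The names `dates`, `aapl` and `msft` are assumptions made for illustration, and the sketch relies on `groupby('Date')` accepting the name of the index.

```python
import pandas as pd

# First four rows of the sample data from the question.
dates = pd.DatetimeIndex(
    ["2020-06-01", "2020-06-02", "2020-06-03", "2020-06-04"], name="Date"
)
aapl = pd.DataFrame(
    {"Adj Close": [321.850006, 323.339996, 325.119995, 322.320007], "Ticker": "AAPL"},
    index=dates,
)
msft = pd.DataFrame(
    {"Adj Close": [182.830002, 184.910004, 185.360001, 182.919998], "Ticker": "MSFT"},
    index=dates,
)
list_of_df = [aapl, msft]

# Stack the per-stock frames, then split the result by date.
alldf = pd.concat(list_of_df)
list_of_grouped = [group for _, group in alldf.groupby("Date")]

# Each grouped frame holds one row per stock for that date.
count_of_stocks = [len(group) for group in list_of_grouped]

print(list_of_grouped[0])  # the 2020-06-01 rows for AAPL and MSFT
print(count_of_stocks)     # [2, 2, 2, 2]
```

Because the split is keyed on the date rather than on row position, a stock with a shorter history simply contributes fewer rows to some groups instead of shifting the alignment.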

2 Comments

Thanks, it worked great. I simplified it a bit. The only thing I couldn't do was the 3rd step. I will post the answer.
Glad I could help. Please see the following link on how to close the question out: stackoverflow.com/help/someone-answers
import pandas as pd

# Step 1: merge all DataFrames into one.
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.)
data_global = pd.concat(datalist)

# Step 2: group the data by date, giving one DataFrame per date.
data_by_date = [group for _, group in data_global.groupby('Date')]

# Step 3: take every n-th DataFrame (every 21st in this case).
each_nth_day = data_by_date[::21]
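The three steps can be wrapped in a small function and checked against the sample data from the question. This is only a sketch: `cross_sections` is a hypothetical helper name, and a step of 2 is used instead of 21 so that the result matches the expected output shown in the question.

```python
import pandas as pd


def cross_sections(frames, step):
    """Hypothetical helper: return one DataFrame per every `step`-th date."""
    combined = pd.concat(frames)
    # groupby sorts its keys, so the groups come out in date order.
    by_date = [group for _, group in combined.groupby("Date")]
    return by_date[::step]


# Sample data from the question.
dates = pd.DatetimeIndex(
    ["2020-06-01", "2020-06-02", "2020-06-03", "2020-06-04",
     "2020-06-05", "2020-06-08", "2020-06-09", "2020-06-10"],
    name="Date",
)
aapl = pd.DataFrame(
    {"Adj Close": [321.850006, 323.339996, 325.119995, 322.320007,
                   331.500000, 333.459991, 343.989990, 352.839996],
     "Ticker": "AAPL"},
    index=dates,
)
msft = pd.DataFrame(
    {"Adj Close": [182.830002, 184.910004, 185.360001, 182.919998,
                   187.199997, 188.360001, 189.800003, 196.839996],
     "Ticker": "MSFT"},
    index=dates,
)

sampled = cross_sections([aapl, msft], step=2)
for frame in sampled:
    print(frame, end="\n\n")
```

Note that this samples every `step`-th trading date present in the data, not a fixed number of calendar days, so weekends and holidays do not count toward the interval.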

Thanks for your help, user13802115.
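For comparison, the nested loops from the question can also be made to work. The likely problem there is that `cross_section` is created once and never reset, and the same list object is appended to `cross_sections_list` on every inner iteration, so the outer list ends up holding many references to one ever-growing list. Below is a minimal sketch of a corrected loop, assuming every dataframe covers the same dates in the same order; the sample frames and the step of 2 are chosen only for illustration.

```python
import pandas as pd

# First four rows of the sample data from the question.
dates = pd.DatetimeIndex(
    ["2020-06-01", "2020-06-02", "2020-06-03", "2020-06-04"], name="Date"
)
aapl = pd.DataFrame(
    {"Adj Close": [321.850006, 323.339996, 325.119995, 322.320007], "Ticker": "AAPL"},
    index=dates,
)
msft = pd.DataFrame(
    {"Adj Close": [182.830002, 184.910004, 185.360001, 182.919998], "Ticker": "MSFT"},
    index=dates,
)
datalist = [aapl, msft]
step = 2  # the question used 100; 2 suits this small sample

cross_sections_list = []
for m in range(0, len(datalist[0]), step):
    # Build a fresh cross-section for each sampled row position.
    # iloc[[m]] (double brackets) keeps a one-row DataFrame instead of a Series.
    cross_section = [df.iloc[[m]] for df in datalist]
    # Append once per date, after all stocks have been collected.
    cross_sections_list.append(pd.concat(cross_section))

print(cross_sections_list[1])  # the 2020-06-03 rows for AAPL and MSFT
```

This positional version breaks down if the stocks have different histories, since row `m` would then refer to different dates in different dataframes; grouping by date avoids that.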
