
I am scraping multiple tables from multiple pages of a website. The problem is that a row is missing from the first table. This is how the combined dataframe looks:


              mar2018  feb2018  jan2018  dec2017  nov2017                  oct2017  sep2017  aug2017
balls faced       345      561      295        0      645    balls faced       200       58        0
runs scored       156      281      183        0      389    runs scored        50       20        0
strike rate      52.3     42.6     61.1        0     52.2    strike rate        25       34        0
dot balls         223      387      173        0      476    dot balls         125       34        0
fours               8       12       19        0       22    sixes               2        0        0
doubles            20       38       16        0       36    fours               4        2        0
notout              2        0        0        0        4    doubles             2        0        0
                                                              notout              4        2        0

The row 'sixes' is missing from the first table but present in the subsequent ones. So I am trying to move the rows from 'fours' down to 'notout' one position lower, leaving NaNs in row 4 for the first five columns, mar2018 through nov2017.

I tried the following code, but it isn't working; it moves the values horizontally, not vertically downward.

df.iloc[4][0:6] = df.iloc[4][0:6].shift(1)

and also

df2 =  pd.DataFrame(index = 4)
df = pd.concat([df.iloc[:], df2, df.iloc[4:]]).reset_index(drop=True)

did not work.

df['mar2018'] = df['mar2018'].shift(1)

But this moves all the values of that column down by 1 row.

So, I was wondering: is it possible to shift the rows of specific columns down, starting from a specific index?

1 Answer

I think you need to reindex both DataFrames by the union of all index values, built with numpy.union1d:

import numpy as np

idx = np.union1d(df1.index, df2.index)   # union of both sets of row labels

df1 = df1.reindex(idx)   # labels missing from df1 ('sixes') become NaN rows
df2 = df2.reindex(idx)

print(df1)
             mar2018  feb2018  jan2018  dec2017  nov2017
balls faced    345.0    561.0    295.0      0.0    645.0
dot balls      223.0    387.0    173.0      0.0    476.0
doubles         20.0     38.0     16.0      0.0     36.0
fours            8.0     12.0     19.0      0.0     22.0
notout           2.0      0.0      0.0      0.0      4.0
runs scored    156.0    281.0    183.0      0.0    389.0
sixes            NaN      NaN      NaN      NaN      NaN
strike rate     52.3     42.6     61.1      0.0     52.2

print(df2)
             oct2017  sep2017  aug2017
balls faced      200       58        0
dot balls        125       34        0
doubles            2        0        0
fours              4        2        0
notout             4        2        0
runs scored       50       20        0
sixes              2        0        0
strike rate       25       34        0
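
Once both frames share the same index, they can be joined side by side, and the missing 'sixes' values in df1 simply stay NaN. A minimal sketch using pd.concat (assuming the single combined table from the question is the goal):

combined = pd.concat([df1, df2], axis=1)   # 8 rows x 8 columns, aligned on the stat labels
print(combined)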

If you have multiple DataFrames in a list, use functools.reduce to build the union and a list comprehension to reindex:

from functools import reduce

dfs = [df1, df2]
idx = reduce(np.union1d, [x.index for x in dfs])   # union across all frames

dfs1 = [df.reindex(idx) for df in dfs]

print(dfs1)
[             mar2018  feb2018  jan2018  dec2017  nov2017
balls faced    345.0    561.0    295.0      0.0    645.0
dot balls      223.0    387.0    173.0      0.0    476.0
doubles         20.0     38.0     16.0      0.0     36.0
fours            8.0     12.0     19.0      0.0     22.0
notout           2.0      0.0      0.0      0.0      4.0
runs scored    156.0    281.0    183.0      0.0    389.0
sixes            NaN      NaN      NaN      NaN      NaN
strike rate     52.3     42.6     61.1      0.0     52.2,      oct2017  sep2017  aug2017
balls faced      200       58        0
dot balls        125       34        0
doubles            2        0        0
fours              4        2        0
notout             4        2        0
runs scored       50       20        0
sixes              2        0        0
strike rate       25       34        0]

Comments

Thanks a lot @jezrael. The np.union1d method works when I print the tables from each page to separate CSVs and then join them. My approach while scraping was to concat all the tables into one dataframe and then clean that single dataframe. I just added a picture of the CSV file and edited the question so that it's clear. Is there any way to shift the cells within the same dataframe? (see the sketch after the comments)
@Johny I am confused. Your structure is weird; do you create it by concat? Is the data confidential? Is the webpage confidential? In my opinion it is best to create a list of all DataFrames first and then apply my solution.
@Jhonny - OK, thank you. But one thing: what is your code for extracting the tables? Do you use pd.read_html, BeautifulSoup, or something else?
I wasn't able to use pd.read_html as the URL is static across multiple pages. I am using BeautifulSoup - table = BeautifulSoup(url, 'html5lib').find_all('table')[4], then for tr in table: to extract each tr, and for td in tr: to extract each td.
@Jhonny - thank you. I create one DataFrame with table = soup.find_all('table')[4] and df = pd.read_html(str(table), header=0, index_col=0)[0], but how is it possible to extract more tables if the URL is the same?
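
As a follow-up to the comment above about shifting cells within the same DataFrame: this is a minimal sketch, assuming the scraped tables were concatenated side by side on a default RangeIndex, so the combined frame df has 8 rows, the five columns mar2018 to nov2017 come first, and their last row is already NaN padding:

cols = df.columns[:5]                            # mar2018 .. nov2017, the columns missing 'sixes'
df.loc[4:, cols] = df.loc[4:, cols].shift(1)     # rows 4+ move down one; row 4 becomes NaN

After the shift, the old 'fours' through 'notout' values occupy rows 5 to 7 and row 4 is free for the missing 'sixes' entries. The label-based reindex approach above is still more robust, since it does not depend on row positions.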