Database with pandas: adding new data

Question

I have a lot of Excel spreadsheets. I load them using pandas and process the data, and the output is written to a single Excel spreadsheet that is my "database".

The database has to follow a fixed pattern in its date index, e.g. 2017-01-01 (yyyy-mm-dd), 2017-01-02, 2017-01-03 ... 2017-12-31, and so on.

But the spreadsheets that are my inputs do not follow any rule for the dates. My processing deals with that and correctly matches the input spreadsheet's index to the output database's index, creating a new file: DataFrame.to_excel('database\databaseFinal.xlsx'). My problem is adding new values to the existing database while still processing the indexes so that they respect the pattern.

For example:

DATABASE.xlsx:

    date         Name1  Name2
    2017-01-01   23.2   18.4
    2017-01-02   21.5   27.7
    2017-01-03   0      0
    2017-01-04   0      0

Spreadsheet input used to update the database:

    date         Name1  
    2017-01-04   32.5

After processing and merging the data:

    date         Name1_x  Name2  Name1_y
    2017-01-01   23.2     18.4   0
    2017-01-02   21.5     27.7   0
    2017-01-03   0        0      0
    2017-01-04   0        0      32.5

What I want:

    date         Name1  Name2  
    2017-01-01   23.2   18.4  
    2017-01-02   21.5   27.7   
    2017-01-03   0      0      
    2017-01-04   32.5   0

For this problem the output must be an Excel file. I know there must be an easy and efficient way of dealing with this, but I don't want my work so far to have been in vain.

Sean · Accepted Answer · 2018-01-11 17:20:40Z

1

import pandas as pd

# Make the dataframe
df = pd.DataFrame([['2017-01-01', 23.2, 18.4],
                   ['2017-01-02', 21.5, 27.7],
                   ['2017-01-03', 0.0, 0.0],
                   ['2017-01-04', 0.0, 0.0]],
                  columns=["date", "Name1", "Name2"])
# Use the date column as the index
df = df.set_index("date")

# Change the value: a single .loc call with (row, column) avoids chained assignment
df.loc["2017-01-04", "Name1"] = 32.5
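
When the new values arrive as a whole DataFrame rather than as a single number, the same label-based assignment can be done in bulk. The following is a minimal sketch, assuming both frames are indexed by date. It uses `DataFrame.update`, which overwrites cells in place with the non-NaN values of another frame, matched on index and column labels. The names `database` and `new_data` are illustrative and do not come from the question.

```python
import pandas as pd

# Existing database, indexed by date (values taken from the question).
database = pd.DataFrame(
    {"Name1": [23.2, 21.5, 0.0, 0.0], "Name2": [18.4, 27.7, 0.0, 0.0]},
    index=pd.Index(
        ["2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04"], name="date"
    ),
)

# New input covering only one date and one column.
new_data = pd.DataFrame(
    {"Name1": [32.5]},
    index=pd.Index(["2017-01-04"], name="date"),
)

# update() aligns on index and column labels, modifies `database` in place
# and returns None. Cells that new_data does not cover are left untouched.
database.update(new_data)
print(database)
```

Note that `update` only changes rows that already exist in `database`; dates that are missing from the database would have to be added first.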


P.Tillmann · Accepted Answer · 2018-01-11 17:32:12Z

1

Instead of using merge, you can simply append and fill the NaN values with zero.

df1
         date  Name1  Name2
0  2017-01-01   23.2   18.4
1  2017-01-02   21.5   27.7
2  2017-01-03    0.0    0.0
3  2017-01-04    0.0    0.0
df2
         date  Name1
0  2017-01-04   32.5

df1.append(df2).fillna(0)
   Name1  Name2        date
0   23.2   18.4  2017-01-01
1   21.5   27.7  2017-01-02
2    0.0    0.0  2017-01-03
3    0.0    0.0  2017-01-04
0   32.5    0.0  2017-01-04
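
On pandas 2.0 and later, `DataFrame.append` no longer exists; `pd.concat` is the replacement. Here is a minimal sketch of the same step, rebuilding `df1` and `df2` from the listings above so that it runs on its own. The column order of the result may differ from the listing, which came from an older pandas version.

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": ["2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04"],
    "Name1": [23.2, 21.5, 0.0, 0.0],
    "Name2": [18.4, 27.7, 0.0, 0.0],
})
df2 = pd.DataFrame({"date": ["2017-01-04"], "Name1": [32.5]})

# Stack the rows of both frames. df2 has no Name2 column, so that cell
# comes out as NaN and is then filled with 0.
combined = pd.concat([df1, df2]).fillna(0)
print(combined)
```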

If you always want to keep the value from the second dataframe, you can use drop_duplicates with date as the subset:

df1.append(df2).fillna(0).drop_duplicates(subset=['date'], keep='last')
   Name1  Name2        date
0   23.2   18.4  2017-01-01
1   21.5   27.7  2017-01-02
2    0.0    0.0  2017-01-03
0   32.5    0.0  2017-01-04
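
One caveat with `keep='last'` after `fillna(0)`: the whole row from the second dataframe wins, so a column that is missing from the input (here `Name2`) ends up as 0 for that date even if the database already held a value. The sketch below is one possible alternative, not the method from this answer. It assumes the rule "non-zero input values replace database values, everything else is kept" and a daily date pattern. It uses `combine_first` on date-indexed frames, then `reindex` over a full `pd.date_range`. The row for 2017-01-06 is hypothetical and was added only to show the gap being filled.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "date": ["2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04"],
    "Name1": [23.2, 21.5, 0.0, 0.0],
    "Name2": [18.4, 27.7, 0.0, 0.0],
})
df2 = pd.DataFrame({
    "date": ["2017-01-04", "2017-01-06"],  # 2017-01-06 is a hypothetical extra row
    "Name1": [32.5, 30.1],
})

# Index both frames by real datetimes so that rows align on the date.
old = df1.assign(date=pd.to_datetime(df1["date"])).set_index("date")
new = df2.assign(date=pd.to_datetime(df2["date"])).set_index("date")

# Treat zeros in the input as "no data" so they never overwrite the database.
new = new.replace(0, np.nan)

# Prefer the input where it has a value, otherwise keep the database value.
merged = new.combine_first(old)

# Enforce the daily pattern: one row per calendar day, gaps filled with 0.
full_range = pd.date_range(
    merged.index.min(), merged.index.max(), freq="D", name="date"
)
result = merged.reindex(full_range).fillna(0)
print(result)
```

The resulting frame can then be written back out with `DataFrame.to_excel`, with the target path left to the caller.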

edited Jan 11, 2018 at 17:32

answered Jan 11, 2018 at 17:11


3 Comments

Matheus Martins Jerônimo Over a year ago

But in this case 2017-01-04 will be duplicated. Also, my input does not always have the date information, and there are cases where, rather than getting the next index value (2017-01-04), I get 2017-01-03.

P.Tillmann Over a year ago

What logic defines which values should be kept and which should be thrown away? Do you want to always keep the value from the second df?

Matheus Martins Jerônimo Over a year ago

Yes. I replace the zeros in the database with the non-zero values from the input.
