
I have a lot of Excel spreadsheets that I load with pandas. I process the data, and the output is written to a single Excel spreadsheet that serves as my "database".

The database has to follow a fixed pattern in the date index: one row per day in yyyy-mm-dd format, e.g. 2017-01-01, 2017-01-02, 2017-01-03 ... 2017-12-31, and so on.
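One way to enforce that daily pattern is to build the full date range with pd.date_range and reindex the database against it, so every missing day is created and filled with 0. This is a minimal sketch assuming the database is held in a DataFrame indexed by datetime dates; the name db, the sample values and the end date 2017-01-05 are hypothetical.

```python
import pandas as pd

# Hypothetical database with a gap: 2017-01-03 is missing
db = pd.DataFrame(
    {"Name1": [23.2, 21.5, 0.0], "Name2": [18.4, 27.7, 0.0]},
    index=pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-04"]),
)
db.index.name = "date"

# Complete daily index from the first date to an assumed end date
full_index = pd.date_range("2017-01-01", "2017-01-05", freq="D", name="date")

# Dates that were missing become new rows filled with 0
db = db.reindex(full_index, fill_value=0.0)
print(db)
```

Because reindex only adds or drops rows to match the target index, it can be run after every update to keep the database on the daily grid regardless of which dates the input covered.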

The spreadsheets I use as input, however, do not follow any rule for their dates. My processing handles this: it correctly matches the indexes of the input spreadsheet with those of the output database and creates a new file with df.to_excel('database\databaseFinal.xlsx'). My problem is adding new values to the existing database while still processing the indexes so that they respect the pattern.

For example:

DATABASE.xlsx:

    date         Name1  Name2
    2017-01-01   23.2   18.4
    2017-01-02   21.5   27.7
    2017-01-03   0      0
    2017-01-04   0      0

Input spreadsheet used to update the database:

    date         Name1  
    2017-01-04   32.5

After processing and merging the data, I get:

    date         Name1_x  Name2  Name1_y
    2017-01-01   23.2     18.4   0
    2017-01-02   21.5     27.7   0
    2017-01-03   0        0      0
    2017-01-04   0        0      32.5

What I want:

    date         Name1  Name2  
    2017-01-01   23.2   18.4  
    2017-01-02   21.5   27.7   
    2017-01-03   0      0      
    2017-01-04   32.5   0     
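The Name1_x/Name1_y columns come from merge: Name1 exists in both frames and is not the join key, so pandas keeps both copies and adds its default suffixes. Below is a minimal sketch of collapsing them back into a single Name1 column, assuming a left merge on date and the rule stated in the comments (a non-zero input value replaces the database value). The names db, inp and merged are hypothetical.

```python
import pandas as pd

db = pd.DataFrame({"date": ["2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04"],
                   "Name1": [23.2, 21.5, 0.0, 0.0],
                   "Name2": [18.4, 27.7, 0.0, 0.0]})
inp = pd.DataFrame({"date": ["2017-01-04"], "Name1": [32.5]})

# Left merge keeps every database row; the clashing Name1 gets _x/_y suffixes
merged = db.merge(inp, on="date", how="left").fillna({"Name1_y": 0.0})

# Use the input value where it is non-zero, otherwise keep the database value
merged["Name1"] = merged["Name1_y"].where(merged["Name1_y"] != 0, merged["Name1_x"])

# Drop the suffixed columns and restore the original column order
merged = merged.drop(columns=["Name1_x", "Name1_y"])[["date", "Name1", "Name2"]]
print(merged)
```

With more than one overlapping column the same where/drop step would be repeated per column, which is why approaches that avoid merge altogether may be simpler.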

The output must be an Excel file. I know there must be an easy and efficient way to do this, but I don't want my work to have been in vain.

2 Answers

    import pandas as pd

    # Make the dataframe, indexed by date
    df = pd.DataFrame([['2017-01-01', 23.2, 18.4],
                       ['2017-01-02', 21.5, 27.7],
                       ['2017-01-03', 0.0, 0.0],
                       ['2017-01-04', 0.0, 0.0]],
                      columns=["date", "Name1", "Name2"])
    df = df.set_index("date")

    # Change the value with a single .loc call; chained indexing such as
    # df.loc["2017-01-04"]["Name1"] = 32.5 may modify a copy instead of df
    df.loc["2017-01-04", "Name1"] = 32.5

Instead of using merge, you can simply append the second dataframe to the first and fill the NaN values with zero.

    df1
             date  Name1  Name2
    0  2017-01-01   23.2   18.4
    1  2017-01-02   21.5   27.7
    2  2017-01-03    0.0    0.0
    3  2017-01-04    0.0    0.0
    df2
             date  Name1
    0  2017-01-04   32.5

    pd.concat([df1, df2]).fillna(0)
             date  Name1  Name2
    0  2017-01-01   23.2   18.4
    1  2017-01-02   21.5   27.7
    2  2017-01-03    0.0    0.0
    3  2017-01-04    0.0    0.0
    0  2017-01-04   32.5    0.0

If you always want to keep the value from the second dataframe, you can use drop_duplicates with date as the subset:

    pd.concat([df1, df2]).fillna(0).drop_duplicates(subset=['date'], keep='last')
             date  Name1  Name2
    0  2017-01-01   23.2   18.4
    1  2017-01-02   21.5   27.7
    2  2017-01-03    0.0    0.0
    0  2017-01-04   32.5    0.0
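One thing to watch with drop_duplicates(keep='last') is that it keeps the entire last row, so a column the input lacks (here Name2) comes from the fillna(0) result rather than from the database. If the database already held a non-zero Name2 on that date, it would be overwritten with 0. A sketch of a cell-wise alternative uses combine_first, which prefers the calling frame's non-NaN values and falls back to the other frame. It assumes both frames can be indexed by date; the Name2 value 5.0 on 2017-01-04 is hypothetical, added only to show that it survives.

```python
import pandas as pd

df1 = pd.DataFrame({"date": ["2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04"],
                    "Name1": [23.2, 21.5, 0.0, 0.0],
                    "Name2": [18.4, 27.7, 0.0, 5.0]}).set_index("date")  # 5.0 is hypothetical
df2 = pd.DataFrame({"date": ["2017-01-04"], "Name1": [32.5]}).set_index("date")

# Non-NaN cells of df2 win; every other cell comes from df1
result = df2.combine_first(df1)
print(result)
```

Note that combine_first treats 0 as a real value. If zeros in the input mean "no data", they could be turned into NaN first, e.g. with df2.replace(0, float("nan")).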

3 Comments

But in this case 2017-01-04 will be duplicated. Also, my input does not always have the date information, and there are cases where, rather than getting the next index value (2017-01-04), I get 2017-01-03.
What logic defines which values should be kept and which should be thrown away? Do you always want to keep the value from the second df?
Yes. I replace the zeros in the database with the non-zero values from the input.
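Taken literally, that rule means a cell changes only when it is 0 in the database and non-zero in the input. A minimal sketch of that rule, assuming both frames are indexed by datetime dates: align the input to the database's index and columns, build a boolean mask, and apply it with DataFrame.mask. The names db, inp, aligned and to_replace are hypothetical, as is the extra 0.0 input row on 2017-01-02 (included to show that input zeros are ignored).

```python
import pandas as pd

idx = pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04"])
db = pd.DataFrame({"Name1": [23.2, 21.5, 0.0, 0.0],
                   "Name2": [18.4, 27.7, 0.0, 0.0]}, index=idx)
inp = pd.DataFrame({"Name1": [32.5, 0.0]},
                   index=pd.to_datetime(["2017-01-04", "2017-01-02"]))

# Align input to the database's dates and columns; missing cells become NaN
aligned = inp.reindex(index=db.index, columns=db.columns)

# Replace only cells that are 0 in the database and non-zero in the input
to_replace = (db == 0) & aligned.notna() & (aligned != 0)
db = db.mask(to_replace, aligned)
print(db)
```

Since the input is aligned by date label rather than by position, its rows can arrive in any order; dates the input has but the database lacks are simply ignored by the reindex step.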
