
I'm new to Python. I started using Jupyter Notebook for a project I'm doing to get into a programming school, and I wanted to work with COVID data. I took the raw data from the Johns Hopkins GitHub repository via URLs: I got data for confirmed cases, deaths, and recovered cases. Each data set is at a different URL. Everything works fine except the recovered cases. Apparently I can't access the data, since my code returns NaN values for every country. I pushed my code to GitHub so a friend could take a look, and he can access some of the data (not a lot) when I can't. I don't get why...

I have another issue: I tried to make a figure with different curves showing the progression of COVID cases in France (I picked France because I'm French), and there are several issues with those curves.

The "recovered" (green) and "deaths" (orange) curves are flat. I was expecting that for the recovered cases, since I can't access the data, but I don't get why it would happen with the deaths, since I have values. Also, I've been trying to find another way to display the dates (on the y axis). There are so many values (one entry a day for the whole COVID crisis) that they overlap each other. I rotated the labels to vertical, but it's not enough.

My code is available at : https://github.com/aaanoushka/Projet-OCR-Covid19/blob/main/Analyse_covid19_pays.ipynb?fbclid=IwAR3cjmCze1vJQ101l8wlD4tAx_slhOZQ1YgJ8jpnmso05CLmYoyFL2DofXc

I'd appreciate it so much if someone would be willing to take a look! Feel free to ask me anything; I'll try my best to give you any details needed.

Thank you

  • Hey, Anna, welcome to Stack Overflow. You'll have more success (and others will appreciate it) if you limit a post to a single question. In addition, including the specific code that isn't working as expected here in your post, rather than linking to GitHub, will make it easier for folks to read and answer. That said, in your code, the line recovered_df.head(30) shows the Country/Region column populated for every row... What do you mean you are getting NaN for country? Commented Feb 11, 2022 at 23:27
  • Oh, I'm sorry, I'm new to this! I really thought people would find it easier to go to GitHub, haha! Thanks for your advice! Commented Feb 12, 2022 at 19:35

1 Answer

The "recovered" (green) and "deaths" (orange) curves are flat.

There are two issues here.

  1. The data source you are using has discontinued publishing the 'recovery' statistic. You can read the details here. It seems that their concern is that there isn't really a globally consistent definition of 'recovery.' Some places only count confirmed recoveries. Other places say that if a patient is not reported as dead, then they must have recovered.

    You may be able to find another source of this data elsewhere.

  2. The death count is not flat on that plot. It is just very hard to see. If you comment out the confirmed case count plotting, you'll see what I mean:

    [Figure: plot of just recovery and deaths]

    Another way to check this is to compare the last element of confirmed and the last element of deaths:

    print("Most recent death count in France", deaths_fr.iloc[-1])
    print("Most recent case count in France", confirmed_fr.iloc[-1])
    

    Output:

    Most recent death count in France 135264
    Most recent case count in France 21511997
    

    If you plot these two on the same scale, the death count gets squashed against the bottom of the plot: going by the numbers above, there are roughly 160 times more cases than deaths.
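One possible way to keep both curves readable on a single figure is to give the deaths their own y-axis. The following is only a minimal sketch of that idea, not the code from the notebook: `confirmed_demo` and `deaths_demo` are made-up series standing in for `confirmed_fr` and `deaths_fr`, and the numbers in them are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Invented cumulative counts (assumption: stand-ins for confirmed_fr / deaths_fr)
dates = pd.date_range("2022-01-01", periods=5, freq="D")
confirmed_demo = pd.Series(
    [1_000_000, 1_200_000, 1_500_000, 1_900_000, 2_400_000], index=dates
)
deaths_demo = pd.Series([9_000, 9_100, 9_250, 9_400, 9_600], index=dates)

fig, ax_cases = plt.subplots()
ax_cases.plot(confirmed_demo.index, confirmed_demo.values, color="tab:blue")
ax_cases.set_ylabel("confirmed cases", color="tab:blue")

# twinx() adds a second y-axis that shares the same x-axis,
# so the deaths curve gets a scale of its own
ax_deaths = ax_cases.twinx()
ax_deaths.plot(deaths_demo.index, deaths_demo.values, color="tab:orange")
ax_deaths.set_ylabel("deaths", color="tab:orange")
```

With two y-axes, each curve fills the height of the plot, at the cost of the reader having to check which axis belongs to which curve. A logarithmic y-axis (`ax.set_yscale("log")`) on a single axis would be another option.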

Also, I've been trying to find another way to display the dates (on the y axis).

It looks like the indexes of the dataframes are defined as strings, and not as dates. Try converting them to dates:

deaths_fr.index = pd.to_datetime(deaths_fr.index)
recovered_fr.index = pd.to_datetime(recovered_fr.index)
confirmed_fr.index = pd.to_datetime(confirmed_fr.index)

I get more reasonable axis labels when I do that.

[Figure: axis as dates]
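If the automatic date labels are still too dense, the tick spacing can be set by hand. Here is a small sketch under a couple of assumptions: `demo` is a made-up series, and its string labels are assumed to look like `1/22/20` (month/day/two-digit year), which is how I believe the Johns Hopkins CSV columns are written. Check the format against your own index before reusing the `format=` argument.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; not needed inside Jupyter
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data: 400 daily values with string labels such as "1/22/20"
days = pd.date_range("2020-01-22", periods=400, freq="D")
labels = [f"{d.month}/{d.day}/{d.strftime('%y')}" for d in days]
demo = pd.Series(range(400), index=labels)

# Turn the string labels into real dates (assumed month/day/year order)
demo.index = pd.to_datetime(demo.index, format="%m/%d/%y")

fig, ax = plt.subplots()
ax.plot(demo.index, demo.values)

# One tick every three months, labelled like "Jan 2020"
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
fig.autofmt_xdate()  # slant the labels so they do not collide
```

Passing an explicit `format` avoids any ambiguity between day-first and month-first dates; the three-month interval is an arbitrary choice and can be adjusted to the length of the series.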


1 Comment

Wow, thank you so much! OK, I see what the problem was. I took this code from a YouTube video by someone who worked on those datasets in April 2020, so obviously I would have much bigger numbers than he did! Thank you anyway, this helped a lot!
