
I'm trying to do some exploratory data analysis on the data provided by CSSE at Johns Hopkins University. They host it on GitHub at https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports. I want to download all of the files with Python and save them to my current directory, so that I have the up-to-date data locally and can reload it whenever I need it. I'm planning two functions: fetch_covid_daily_data(), which goes to the site and downloads all the CSV files, and load_covid_daily_data(), which reads the data back from the current directory so I can process it with pandas.

I'm doing it this way because when I come back to my code, I can call fetch_covid_daily_data() again and it will pick up any new changes, such as another daily CSV being added.
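A minimal sketch of those two functions, assuming the GitHub contents API is used to list the directory (the `daily_reports` folder name and the `is_daily_report` helper are my own choices; note that the contents API caps a directory listing at 1,000 files, so a very long history may need the git tree API instead):

```python
import json
import re
import urllib.request
from pathlib import Path

import pandas as pd

API_URL = ("https://api.github.com/repos/CSSEGISandData/COVID-19/"
           "contents/csse_covid_19_data/csse_covid_19_daily_reports")
DATA_DIR = Path("daily_reports")


def is_daily_report(name):
    """Daily report files in the repo are named MM-DD-YYYY.csv."""
    return re.fullmatch(r"\d{2}-\d{2}-\d{4}\.csv", name) is not None


def fetch_covid_daily_data():
    """Download every daily-report CSV into DATA_DIR (re-run to pick up new files)."""
    DATA_DIR.mkdir(exist_ok=True)
    with urllib.request.urlopen(API_URL) as resp:
        listing = json.load(resp)
    for entry in listing:
        if is_daily_report(entry["name"]):
            urllib.request.urlretrieve(entry["download_url"], DATA_DIR / entry["name"])


def load_covid_daily_data():
    """Read the downloaded CSVs back into one DataFrame, tagged by report date."""
    frames = []
    for path in sorted(DATA_DIR.glob("*.csv")):
        df = pd.read_csv(path)
        df["report_date"] = path.stem  # e.g. "01-22-2020"
        frames.append(df)
    return pd.concat(frames, ignore_index=True)
```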

2 Answers


You can read data directly from an online CSV into a pandas DataFrame.

Example:

import pandas as pd

CONFIRMED_URL = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'

df = pd.read_csv(CONFIRMED_URL)

# df now contains data from time of call.

You can also create a class to fetch and manipulate all the data:


import pandas as pd


class Corona:

    def __init__(self):
        BASE_URL = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series'

        self.URLS = {
            'confirmed': f'{BASE_URL}/time_series_covid19_confirmed_global.csv',
            'deaths': f'{BASE_URL}/time_series_covid19_deaths_global.csv',
            'recovered': f'{BASE_URL}/time_series_covid19_recovered_global.csv',
        }

        self.data = {case: pd.read_csv(url) for case, url in self.URLS.items()}

    # create other useful methods to work with the data
    def current_status(self):
        # function to show current status
        pass


To get current data:

# corona.data is a dictionary with DataFrames as values
corona = Corona()
confirmed_df = corona.data['confirmed']

# if you want to save one to CSV
confirmed_df.to_csv('confirmed.csv', index=False)

# show the first five rows
print(confirmed_df.head())

# check which other DataFrames are available
print(corona.data.keys())
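The `current_status` stub above could, for example, report the latest worldwide total for each case type. A sketch, assuming (as in these time-series CSVs) that each day adds one column, so the rightmost column is the newest date:

```python
import pandas as pd


def current_status(data):
    """Latest worldwide total per case type.

    `data` is the dict of DataFrames built in __init__; each time-series
    CSV adds one column per day, so the last column holds the newest date.
    """
    return {case: int(df.iloc[:, -1].sum()) for case, df in data.items()}
```

Inside the class this would read `self.data` instead of taking a parameter.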

3 Comments

Hi, yes, I have done your first example multiple times, but my problem is that I want to collect all of those CSVs in the daily reports and join them together myself. I want to know if there's an easy way to do this, in case I come across data that's spread across multiple CSV files and I need to join them. I'm doing this on Google Colab, so I don't want to download the data.
I also love your idea of using a class!
You can easily do that too. What I love about classes is that they help organise your code. To answer your multiple-CSV question: if there is a pattern in the CSV names, you can still use the class above with a list comprehension to get all the CSVs and then merge/concat/join them into one. I'm happy to help if you provide a sample URL of the CSVs and what you would like to do. See stackoverflow.com/questions/20906474/…
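The list-comprehension-plus-concat approach from the comment above, as a sketch for the daily reports (the MM-DD-YYYY file-name pattern comes from the repository; `combine_daily` and the `report_date` column are my own naming):

```python
import pandas as pd

BASE = ('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/'
        'csse_covid_19_data/csse_covid_19_daily_reports')


def combine_daily(frames_by_date):
    """Stack per-day DataFrames into one, tagging each row with its report date."""
    frames = [df.assign(report_date=date) for date, df in frames_by_date.items()]
    return pd.concat(frames, ignore_index=True)


# On Colab you can read each day straight from GitHub, no local download:
# dates = ['01-22-2020', '01-23-2020']
# combined = combine_daily({d: pd.read_csv(f'{BASE}/{d}.csv') for d in dates})
```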

Assuming you have Git installed, you can clone the repository from your terminal:

git clone https://github.com/CSSEGISandData/COVID-19

hope this helps!
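If you'd rather drive this from Python so that a re-run picks up new daily CSVs, a sketch (the function names are my own; `git pull` updates an existing clone):

```python
import subprocess
from pathlib import Path

REPO_URL = 'https://github.com/CSSEGISandData/COVID-19'


def git_sync_command(repo_dir):
    """Clone on the first run, pull to update on later runs."""
    if Path(repo_dir).is_dir():
        return ['git', '-C', str(repo_dir), 'pull']
    return ['git', 'clone', REPO_URL, str(repo_dir)]


def sync_repo(repo_dir='COVID-19'):
    subprocess.run(git_sync_command(repo_dir), check=True)
```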

Comments
