What I have:

1) A list of GPS coordinates: latitude, longitude, and ID.

2) A function that scrapes the last 24 hours of hourly temperature and humidity data. It accepts 3 arguments (lat, lon, ID) and returns a dataframe with 3 columns (temperature, humidity, ID), indexed by the hourly timestamps as a DatetimeIndex.

What I want:

  • To edit the function so that, when it is called for each row via iterrows, every returned row carries the ID of the location it was scraped for

Here is the function that works for one lat/lon/ID set:

# grab only weather of interest
attributes = [u'temperature', u'humidity']

# 24 hours ago, rounded down to the hour
date = dt.datetime.now().replace(microsecond=0, second=0, minute=0) - dt.timedelta(hours=24)

# initialize
times = []
data = {}
for attr in attributes:
    data[attr] = []

def scrape_weather(LAT, LON, Id):
    for offset in range(1, 2):  # i.e. 1 day
        forecast = forecastio.load_forecast(api_key, LAT, LON,
                                            time=date + dt.timedelta(offset),
                                            units='ca')
        h = forecast.hourly()
        d = h.data
        for p in d:
            times.append(p.time)
            try:
                for i in attributes:
                    data[i].append(p.d[i])
            except:
                print(KeyError)

    df2 = pd.DataFrame(data)
    df1 = pd.DataFrame(times)

    df1.reset_index(drop=True, inplace=True)
    df2.reset_index(drop=True, inplace=True)
    dfweather = pd.concat([df1, df2], axis=1)

    dfweather['ID'] = Id
    dfweather = dfweather.set_index(pd.DatetimeIndex(dfweather[0]))
    dfweather = dfweather.drop([0], axis=1)

    return dfweather

This works fine when passing a single row of the dataframe with the lat/lon/Ids:

scrape_weather(df.at[0,'latitude'],df.at[0,'longitude'], df.at[0,'Id'])

But when I loop over every row:

for index, row in dummy_gps.iterrows():
    test = scrape_weather(row['latitude'], row['longitude'], row['Id'])

The expected result looks something like this:

                     temperature  humidity  ID
2019-05-14 07:00:00        22.58      0.34   1
2019-05-14 08:00:00        20.50      0.42   1
....
2019-05-14 07:00:00        22.58      0.34   2
2019-05-14 08:00:00        20.50      0.42   2
....

But instead the IDs are wrong: a single ID is copied onto every row, like this:

                     temperature  humidity  ID
2019-05-14 07:00:00        22.58      0.34   2
2019-05-14 08:00:00        20.50      0.42   2
....
2019-05-14 07:00:00        22.58      0.34   2
2019-05-14 08:00:00        20.50      0.42   2
....

So I am unsure where in the weather scraper function to add the ID logic so that each ID is associated with its own forecast.

1 Answer

New answer

import pandas as pd
import forecastio
import datetime as dt


def scrape_weather(row):
    forecast = forecastio.load_forecast(api_key,
                                        lat = row['latitude'], 
                                        lng = row['longitude'], 
                                        time = date,
                                        units = 'ca' )
    h = forecast.hourly()
    d = h.data
    dfweather = pd.DataFrame({'times': [p.time for p in d],
                              'temps': [p.temperature for p in d],
                              'humidity': [p.humidity for p in d],
                              'gatewayID': row['Id']
                             })

    return dfweather


# Sample dataframe
id_col = [1, 2, 3, 4, 5, 6, 7]
lng = ['86.44511', '-121.13295', '-162.74005', '22.34765', '-152.18709', '-152.18709', '-107.65340']
lat = ['-18.67825', '-20.84215', '57.31227', '6.15070', '-27.72616', '-27.72616', '6.15863']
df = pd.DataFrame({'Id':id_col, 'latitude':lat, 'longitude':lng})

api_key = '###############################'  # your API key (redacted here)

# 24 hours ago #round to closest hour
date = dt.datetime.now().replace(microsecond=0,second=0,minute=0) - dt.timedelta(hours=24)

# one DataFrame per input row, then stack them into a single frame
out = df.apply(scrape_weather, axis=1)
out = pd.concat([frame for frame in out])
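
A likely reason the function in the question stamped one ID on every row is that `times` and `data` are created once, outside the function, so each call appends to the rows left over from earlier calls and then overwrites the whole ID column. The sketch below reproduces that pattern without any network access. `fake_hourly`, `scrape_shared`, `scrape_local` and the sample coordinates are hypothetical stand-ins for illustration, not part of forecastio or of the code above.

```python
import datetime as dt
import pandas as pd

START = dt.datetime(2019, 5, 14, 7)

def fake_hourly(lat, lon, hours=3):
    """Hypothetical stand-in for the API call: (time, temperature, humidity) per hour."""
    return [(START + dt.timedelta(hours=h), 20.0 + h, 0.30) for h in range(hours)]

# Pattern from the question: containers created once, outside the function.
shared_times, shared_temps = [], []

def scrape_shared(lat, lon, Id):
    for time, temp, _ in fake_hourly(lat, lon):
        shared_times.append(time)   # rows from earlier calls are still in here
        shared_temps.append(temp)
    frame = pd.DataFrame({'temperature': shared_temps},
                         index=pd.DatetimeIndex(shared_times))
    frame['ID'] = Id                # stamps the current Id on every accumulated row
    return frame

def scrape_local(lat, lon, Id):
    rows = fake_hourly(lat, lon)    # fresh data for this call only
    frame = pd.DataFrame({'temperature': [r[1] for r in rows],
                          'humidity': [r[2] for r in rows]},
                         index=pd.DatetimeIndex([r[0] for r in rows]))
    frame['ID'] = Id
    return frame

gps = pd.DataFrame({'Id': [1, 2],
                    'latitude': [57.31, 6.15],
                    'longitude': [-162.74, 22.35]})

# Loop as in the question: only the last return value is kept.
for _, row in gps.iterrows():
    leaky = scrape_shared(row['latitude'], row['longitude'], int(row['Id']))

# Collect one frame per row and concatenate them.
fixed = pd.concat([scrape_local(r['latitude'], r['longitude'], int(r['Id']))
                   for _, r in gps.iterrows()])

print(leaky['ID'].tolist())  # [2, 2, 2, 2, 2, 2]
print(fixed['ID'].tolist())  # [1, 1, 1, 2, 2, 2]
```

Building the lists inside the function, as the answer's `scrape_weather(row)` does, avoids the carry-over. Note that `iterrows` upcasts the integer `Id` to float when the other columns are floats, hence the `int(...)` in this sketch.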

Old answer

If I understand you correctly, could you do something like this?

df = pd.DataFrame({'LAT':[1,2,3],'LON':[1,2,3],'ID':[1,2,3]})

def scrape_weather(row):
    temperature = row['LAT'] # change this to what you need to do
    humidity = row['LON'] # change this to what you need to do
    id = row['ID'] # change this to what you need to do
    return temperature, humidity, id

# result_type='expand' turns each returned tuple into one row with three columns
new_df = df.apply(scrape_weather, axis=1, result_type='expand')
new_df.columns = ['temp', 'hum', 'id']

Which gives me

   temp  hum  id
0     1    1   1
1     2    2   2
2     3    3   3

3 Comments

Cool idea, thanks. Let me give this a shot and I will come back.
Hey Bertil, can you take a look at my modified post? Using iterrows I managed to get almost the result I want, except the ID is not being handled correctly.
It works, thank you. I had to put the same try/except (returning KeyError) around each "temps = [p.temperature for p in d]" to handle hours with missing data. Thanks again!
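
For the missing-data case raised in the last comment, one option is to fill a placeholder per field instead of wrapping a whole list in try/except, so every column keeps exactly one value per hour. This is a minimal sketch that assumes each hourly point exposes its readings as a dict in `.d`, as the question's `p.d[i]` suggests. `FakePoint` and `points_to_frame` are hypothetical names used only for illustration.

```python
import datetime as dt
import pandas as pd

class FakePoint:
    """Hypothetical stand-in for an hourly data point: a time plus a dict of readings in .d."""
    def __init__(self, time, **readings):
        self.time = time
        self.d = readings

def points_to_frame(points, Id, attributes=('temperature', 'humidity')):
    # A missing reading becomes NaN, so all columns stay the same length.
    columns = {a: [p.d.get(a, float('nan')) for p in points] for a in attributes}
    frame = pd.DataFrame(columns, index=pd.DatetimeIndex([p.time for p in points]))
    frame['ID'] = Id
    return frame

start = dt.datetime(2019, 5, 14, 7)
points = [
    FakePoint(start, temperature=22.58, humidity=0.34),
    FakePoint(start + dt.timedelta(hours=1), temperature=20.50),  # humidity missing
]

frame = points_to_frame(points, Id=1)
print(frame)
```

With the try/except placed around the whole inner loop, as in the question, a point that has a temperature but no humidity would leave the two lists with different lengths. A per-field default sidesteps that.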
