How to use iterrows to loop a dataframe through a function that takes 3 arguments to populate a new dataframe

Question

What I have:

1) A list of GPS coordinates: latitude, longitude, and ID.

2) A function that scrapes the last 24 hours of hourly temperature and humidity data. It returns a dataframe with 3 columns (temperature, humidity, ID), indexed by the hourly timestamps as a DatetimeIndex. The function accepts 3 arguments: lat, lon, ID.

What I want:

To edit the function so that each call made from iterrows attaches the correct ID to the rows it returns.

Here is the function that works for one lat/lon/ID set:

import datetime as dt

import forecastio
import pandas as pd

# grab only weather of interest
attributes = [u'temperature', u'humidity']

# 24 hours ago, rounded down to the hour
date = dt.datetime.now().replace(microsecond=0, second=0, minute=0) - dt.timedelta(hours=24)

# initialize
times = []
data = {}
for attr in attributes:
    data[attr] = []

def scrape_weather(LAT, LON, Id):
    for offset in range(1,2): #i.e 1 day
        forecast = forecastio.load_forecast(api_key, LAT, LON,
                                            time=date + dt.timedelta(offset), units='ca')
        h = forecast.hourly()
        d = h.data
        for p in d:
            times.append(p.time)
            try:
                for i in attributes:
                    data[i].append(p.d[i])
            except:
                print(KeyError)

    df2 = pd.DataFrame(data)
    df1 = pd.DataFrame(times)

    df1.reset_index(drop=True, inplace=True)
    df2.reset_index(drop=True, inplace=True)
    dfweather = pd.concat([df1, df2], axis=1)

    dfweather['ID'] = Id
    dfweather = dfweather.set_index(pd.DatetimeIndex(dfweather[0]))
    dfweather = dfweather.drop([0], axis=1)

    return dfweather

This works fine when passing a single row of the dataframe with the lat/lon/Ids:

scrape_weather(df.at[0,'latitude'],df.at[0,'longitude'], df.at[0,'Id'])

But when I loop over every row with iterrows:

for index, row in dummy_gps.iterrows():
    test = scrape_weather(row['latitude'], row['longitude'], row['Id'])

I expect the results to look something like this:

                 temperature humidity ID

2019-05-14 07:00:00 22.58   0.34    1
2019-05-14 08:00:00 20.50   0.42    1
.... 
2019-05-14 07:00:00 22.58   0.34    2
2019-05-14 08:00:00 20.50   0.42    2
....

But instead the IDs are wrong: a single ID is copied onto every row, like this:

                 temperature humidity ID

2019-05-14 07:00:00 22.58   0.34    2
2019-05-14 08:00:00 20.50   0.42    2
.... 
2019-05-14 07:00:00 22.58   0.34    2
2019-05-14 08:00:00 20.50   0.42    2
....

So I am unsure where in the weather scraper function to add the ID logic so that each ID is associated with its own forecast.

Bertil Johannes Ipsen · Accepted Answer · 2019-05-15 20:46:52Z


New answer

import pandas as pd
import forecastio
import datetime as dt


def scrape_weather(row):
    forecast = forecastio.load_forecast(api_key,
                                        lat = row['latitude'], 
                                        lng = row['longitude'], 
                                        time = date,
                                        units = 'ca' )
    h = forecast.hourly()
    d = h.data
    dfweather = pd.DataFrame({'times': [p.time for p in d],
                              'temps': [p.temperature for p in d],
                              'humidity': [p.humidity for p in d],
                              'gatewayID': row['Id']
                             })

    return dfweather


# Sample dataframe
id_col = [1, 2, 3, 4, 5, 6, 7]
lng = ['86.44511', '-121.13295', '-162.74005', '22.34765', '-152.18709', '-152.18709', '-107.65340']
lat = ['-18.67825', '-20.84215', '57.31227', '6.15070', '-27.72616', '-27.72616', '6.15863']
df = pd.DataFrame({'Id':id_col, 'latitude':lat, 'longitude':lng})

api_key = '###############################'  # your API key (redacted)

# 24 hours ago #round to closest hour
date = dt.datetime.now().replace(microsecond=0,second=0,minute=0) - dt.timedelta(hours=24)

# one dataframe per row of df, collected in a Series
out = df.apply(scrape_weather, axis=1)
# stack them into a single dataframe
out = pd.concat(out.tolist())
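
For what it's worth, the original function stamps every row with one ID because `times` and `data` are created once at module level. Each call appends to the same lists, and `dfweather['ID'] = Id` then overwrites the ID on all accumulated rows. If you would rather keep `iterrows`, here is a minimal sketch of that pattern with the lists built inside the function. `fake_hourly` is a hypothetical stand-in for the forecastio call so the example runs offline; swap in your real request.

```python
import datetime as dt

import pandas as pd


def fake_hourly(lat, lon):
    """Hypothetical stand-in for the forecastio call: two hourly readings."""
    start = dt.datetime(2019, 5, 14, 7)
    return [(start + dt.timedelta(hours=h), 20.0 + h, 0.4) for h in range(2)]


def scrape_weather(lat, lon, id_):
    # Build the lists inside the function so every call starts empty.
    times, temps, hums = [], [], []
    for time, temp, hum in fake_hourly(lat, lon):
        times.append(time)
        temps.append(temp)
        hums.append(hum)
    # The scalar id_ is broadcast to every row of this call only.
    return pd.DataFrame(
        {"temperature": temps, "humidity": hums, "ID": id_},
        index=pd.DatetimeIndex(times),
    )


dummy_gps = pd.DataFrame(
    {
        "Id": [1, 2],
        "latitude": [-18.67825, -20.84215],
        "longitude": [86.44511, -121.13295],
    }
)

# Collect one frame per row, then concatenate once at the end.
frames = []
for _, row in dummy_gps.iterrows():
    # iterrows upcasts mixed int/float rows to float, so cast the Id back.
    frames.append(scrape_weather(row["latitude"], row["longitude"], int(row["Id"])))

result = pd.concat(frames)
print(result)
```

Appending to a list and calling `pd.concat` once also avoids reassigning a single variable on every pass of the loop, which would keep only the last frame.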

Old answer

If I understand you correctly, could you do something like this?

df = pd.DataFrame({'LAT':[1,2,3],'LON':[1,2,3],'ID':[1,2,3]})

def scrape_weather(row):
    temperature = row['LAT'] # change this to what you need to do
    humidity = row['LON'] # change this to what you need to do
    id = row['ID'] # change this to what you need to do
    return temperature, humidity, id

new_df = pd.DataFrame(columns=['temp', 'hum', 'id'])
new_df['temp'], new_df['hum'], new_df['id'] = df.apply(scrape_weather, axis=1)

Which gives me

    temp    hum     id
0   1       2       3
1   1       2       3
2   1       2       3
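
Note that unpacking the result of `apply` hands one returned tuple to each column, which is why every row above reads 1, 2, 3. If you want one output row per input row instead, a sketch using `result_type='expand'` (available in pandas 0.23 and later) could look like this. The column names are just the ones used above.

```python
import pandas as pd

df = pd.DataFrame({"LAT": [1, 2, 3], "LON": [1, 2, 3], "ID": [1, 2, 3]})


def scrape_weather(row):
    # Placeholder logic: return the three values for this row as a tuple.
    return row["LAT"], row["LON"], row["ID"]


# 'expand' turns each returned tuple into the columns of one output row.
new_df = df.apply(scrape_weather, axis=1, result_type="expand")
new_df.columns = ["temp", "hum", "id"]
print(new_df)
```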

edited May 15, 2019 at 20:46

answered May 14, 2019 at 22:58

Bertil Johannes Ipsen


3 Comments

Evan Over a year ago

cool idea. thanks. let me give this a shot. and will come back

Evan Over a year ago

Hey Bertil, can you take a look at my modified post? Using iterrows I managed to get the result I almost want, except the ID is not being treated correctly.

Evan Over a year ago

It works thank you - i had to use the same try - except return keyError around each "temps = [p.temperature for p in d]" to handle hours with missing data. thanks again!
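
One possible way to handle the hours with missing data mentioned in the last comment, without wrapping each list in try/except, is to read from the data point's raw dict (the `p.d` used in the question) with `.get`, which returns None for absent fields. This is only a sketch: `FakePoint` is a hypothetical stand-in for a forecastio data point so the example runs without an API key.

```python
import datetime as dt

import pandas as pd


class FakePoint:
    """Hypothetical stand-in for a forecastio hourly data point."""

    def __init__(self, time, **values):
        self.time = time
        self.d = values  # raw dict of whatever fields came back


points = [
    FakePoint(dt.datetime(2019, 5, 14, 7), temperature=22.58, humidity=0.34),
    FakePoint(dt.datetime(2019, 5, 14, 8), temperature=20.50),  # humidity missing
]

attributes = ["temperature", "humidity"]

# dict.get returns None for a missing field, which pandas treats as missing.
rows = [{attr: p.d.get(attr) for attr in attributes} for p in points]
dfweather = pd.DataFrame(rows, index=pd.DatetimeIndex([p.time for p in points]))
print(dfweather)
```

This keeps every hour in the output, so the timestamps and values stay aligned even when a field is absent.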
