0

I am trying to complete the below question and am running into some creating this function and assigning string values to the various dates. How would I go about creating this function to return the various string values.

1 Answer 1

2

Here is my suggestion :

EDIT : using this question for the regex to check time validity

import pandas as pd
import numpy as np
import re 


def compute_time_day_year(data_dict):
    """

    returns: pandas DataFrame with variables weekday (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday),
    time_of_day (Morning - 06:00-11:59, Afternoon - 12:00-17:59, Evening - 18:00-23:59, Night - 00:00 - 05:59),
    and season (Summer for June, July and August; Autumn for September, October, November; Winter for
    December, January, February; Spring for March, April, May). If either of the input parameters is in incorrect
    form, the function returns INVALID in all outputs
    """

    df = pd.DataFrame({'year': (data_dict['Year']),
                       'month': (data_dict['Crash_Month']),
                       'day': (data_dict['Crash_Day']),
                       'time': (data_dict['Crash_Time'])})
    df["invalid_time"] = df["time"].apply(lambda x: not re.match("^([0-1]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]$", x))

    df.loc[~df["invalid_time"], 'date'] = pd.to_datetime(df.loc[~df["invalid_time"],'year'].astype(str)
                               + df.loc[~df["invalid_time"],'month'].astype(str).str.zfill(2)
                               + df.loc[~df["invalid_time"],'day'].astype(str).str.zfill(2)
                               + ' '
                               + df.loc[~df["invalid_time"],'time'].astype(str), format='%Y%m%d %H:%M:%S')

    df.loc[~df["invalid_time"],'weekday'] = df.loc[~df["invalid_time"],"date"].dt.day_name(locale="english")
    df.loc[~df["invalid_time"],'season'] = (df.loc[~df["invalid_time"],'date'] - pd.DateOffset(months=1))\
        .dt.quarter
    df.loc[~df["invalid_time"],'season'] = df.loc[~df["invalid_time"],'season']\
        .map({1: 'Winter', 2: 'Spring', 3: 'Summer', 4: 'Autumn'})
    df.loc[~df["invalid_time"],"time_of_day"] = df.loc[~df["invalid_time"],"date"].dt.hour\
        .apply(lambda x: np.select(
            [0 < x <= 6,
             6 < x <= 12,
             12 < x <= 18,
             18 < x <= 24],
            ["Night", "Morning", "Afternoon", "Evening"]
        ))

    df.loc[df["invalid_time"],["weekday", "season", "time_of_day"]] = "Invalid"

    return df.loc[:, ["weekday", "season", "time_of_day"]]

data_dict = {'Year': [2018, 2019, 2020],
             'Crash_Month': [1, 2, 3],
             'Crash_Day': [4, 5, 6],
             'Crash_Time': ["8:00:00", '26:22:00', '8:12:00']}
compute_time_day_year(data_dict)

And it returns for this example:

   weekday   season time_of_day
0  Thursday   Autumn     Morning
1   Invalid  Invalid     Invalid
2    Friday   Winter     Morning

Hope it helps.

Sign up to request clarification or add additional context in comments.

7 Comments

thanks a lot for that. I actually imported the data from a CSV, hence why I reference fields as data_dict[...]. This coding doesn't seem to work for the data_dict CSV, do you have any other ideas as to how to code it
@nickp I changed my answer to match your question.
Thanks again. Just to clarify, I want an invalid output to come up if one of the values in Crash_Time is not a time. How would I code this? I have tried if type != time, which does not work
thanks so much! just one more thing, I only want the invalid output to come up for those entries that are invalid, and not the whole data frame being invalid. Is there a way to do this? At the moment, if I add in the If statement, it only returns 1 row saying INVALID, however, I would still like to be able to see all the other rows if they are not INVALID. Also, I would like to print the first 10 records from the data frame, I assume there is a way to do that?
You can filter the resulting df based on values for instance if df is the output of my example, so see only valid use df.loc[df["time_of_day"] != "invalid", :] and to see only invalid df.loc[df["time_of_day"] == "invalid", :]. If the answer suits you could you mark it as accepted? There is no obligation to do so :).
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.