On Pandas 1.3.4 and Python 3.9.

I'm having trouble filtering on part of a string. The "Date" column uses the format MM/DD/YYYY HH:MM:SS AM/PM, with the most recent entry on top. Single-digit days are not zero-padded (for example, November 3rd appears as 11/3 rather than 11/03). I want Python to read the "Date" column and keep only the rows for today's date.

This is what the original CSV looks like. This is what I want to do to the file: look for a specific date, regardless of the time on that date, and apply the equivalent of the =RIGHT() formula to the phone number. However, this is what I end up with using the following code.

from datetime import date
import pandas as pd


df = pd.read_csv(r'file.csv', dtype=str)

today = date.today()
d1 = today.strftime("%m/%#d/%Y")  # to find out what today is

df = pd.DataFrame(df, columns=['New Phone', 'Phone number', 'Date'])
df['New Phone'] = df['Phone number'].str[-10:]

df_today = df['Date'].str.contains(f'{d1}',case=False, na=False)

df_today.to_csv(r'file.csv', index=False)
  • You set df_today to the result of your filter; instead, you want to keep the rows of df for which the filter is true. contains returns True/False for each row, depending on whether it matches. Commented Nov 8, 2021 at 20:55
  • Why don't you convert your datetime column to datetime type, e.g. df = pd.read_csv(..., parse_dates=['Date'])? Then you can do, e.g. df[df.Date.dt.normalize()=='2021-11-03']. Commented Nov 8, 2021 at 20:57
  • I've got a solution, but @FinestRyeBread, what are the possible formats for the Date column? Are they all consistent? Commented Nov 8, 2021 at 21:24
  • @user17242583 they're all consistent in the format of MM/DD/YYYY HH:MM:SS AM/PM so they'll follow the format of something like 11/8/2021 3:31:00 PM or 11/20/2021 3:31:00 PM Commented Nov 8, 2021 at 21:32
  • df=df.assign(NewPhone=df['Phone number'].str[-10:]) # create new column; then df[df['date']==pd.Timestamp.today().date().strftime("%m/%d/%Y")] # filter out dates that are not today. Commented Nov 8, 2021 at 21:32

1 Answer

This line is wrong:

df_today = df['Date'].str.contains(f'{d1}',case=False, na=False)

All that line does is create a mask: a pandas Series containing True or False in each row, according to the condition used to build it. Because the mask itself is what gets written to the file, the spreadsheet contains only TRUE/FALSE values, and it shows only FALSE, as you showed, because none of the items in the Date column contain the string that the variable d1 holds.
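
To make the difference between a mask and a filtered DataFrame concrete, here is a minimal sketch. The sample rows and the fixed target string are made up for illustration; they only mimic the date format described in the question.

```python
import pandas as pd

# Hypothetical sample data in the question's date format
df = pd.DataFrame({
    "Phone number": ["+15551234567", "+15557654321", "+15550001111"],
    "Date": ["11/8/2021 3:31:00 PM", "11/7/2021 9:02:10 AM", "11/8/2021 8:15:45 AM"],
})

target = "11/8/2021"  # stand-in for the string held in d1

# str.contains returns a boolean Series (the mask), one value per row
mask = df["Date"].str.contains(target, regex=False, na=False)

# Indexing the DataFrame with the mask keeps only the rows where it is True
filtered = df[mask]

print(mask.tolist())
print(filtered)
```

Writing mask to a CSV gives a single column of True/False values, whereas writing filtered gives the matching rows with all their columns.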

Instead, try this:

from datetime import date
import pandas as pd

# Load the CSV file, and change around the columns
df = pd.DataFrame(pd.read_csv(r'file.csv', dtype=str), columns=['New Phone', 'Phone number', 'Date'])

# Take the last ten chars of each phone number
df['New Phone'] = df['Phone number'].str[-10:]

# Convert each date string to a pd.Timestamp, removing the time
df['Date'] = pd.to_datetime(df['Date'].str.split(r'\s+', n=1).str[0])

# Get the rows that are from today (compare against a Timestamp, not a string)
df_today = df[df['Date'] == pd.Timestamp(date.today())]

# Write the result to the CSV file
df_today.to_csv(r'file.csv', index=False)
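
As a way to check the approach without touching the real file, the sketch below runs the same steps on a small in-memory DataFrame. The sample rows are invented, "today" is pinned to a fixed date so the result is reproducible, and the explicit format string is an assumption based on the format the asker described in the comments.

```python
import pandas as pd

# Hypothetical sample data; the real data would come from pd.read_csv
df = pd.DataFrame({
    "Phone number": ["+15551234567", "+15557654321", "+15550001111"],
    "Date": ["11/8/2021 3:31:00 PM", "11/20/2021 3:31:00 PM", "11/8/2021 8:15:45 AM"],
})

# Take the last ten characters of each phone number (like =RIGHT(cell, 10))
df["New Phone"] = df["Phone number"].str[-10:]

# Parse the full date-time strings; non-padded days and hours are accepted when parsing
parsed = pd.to_datetime(df["Date"], format="%m/%d/%Y %I:%M:%S %p")

# Drop the time of day so only the calendar date is compared
days = parsed.dt.normalize()

# Fixed date for the example; in real use this would be pd.Timestamp.today().normalize()
today = pd.Timestamp("2021-11-08")

df_today = df[days == today]
print(df_today)
```

Keeping the parsed dates in a separate Series leaves the original "Date" strings, including the time, intact in the output.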