1

For some reason, none of the solutions previously posted about this seem to answer my question.

I am reading in an excel page with 150+ sheets. I am looping through them and preparing the data to be concatenated together. (doing things like deleting unneeded/blank columns, and transforming some data) However, for some reason, I cannot get rid of any of the newline characters, no matter what I try. Here are some variations that I've tried so you can see what DIDN'T work.

import pandas as pd
import os

os.chdir(r'C:\Users\agray\Downloads')

sheets_dict = pd.read_excel('2022_Advanced_Control.xlsx', sheet_name=None)
df_list = list(sheets_dict.values())
df_list_clean = []

The top part stays the same, this loop portion is what changes.

for df in df_list:
    df.columns = [c.replace(' ', '_') for c in df.columns]

    df.drop(df.columns.difference(['Prescription_Drug_Name','Drug_Tier', 'Drug_Notes']), 1, inplace=True)
    df.drop(df.tail(3).index, inplace=True)
    df.loc[:, 'Prescription_Drug_Name'] = df.loc[:, 'Prescription_Drug_Name'].replace("\n", "", inplace=True)
    df_list_clean.append(df)

This gives me a column that has nothing but blank values.

Here's another way I tried

for df in df_list:
    df.columns = [c.replace(' ', '_') for c in df.columns]

    df.drop(df.columns.difference(['Prescription_Drug_Name','Drug_Tier', 'Drug_Notes']), 1, inplace=True)
    df.drop(df.tail(3).index, inplace=True)
    df['Prescription_Drug_Name'] = df['Prescription_Drug_Name'].replace(r'\n','', regex=True, inplace=True)
    df_list_clean.append(df)

This version is only applying to a copy, so none of the changes it says it's making are actually being made to my df. Any ideas how to get rid of all these "/n" characters in my column? Thanks!

9
  • df['Prescription_Drug_Name'].str.strip()? Commented Nov 11, 2021 at 20:59
  • That only removes newline characters from the beginning and end of string. I have strings that look like this: AAAAAA/nBBBBBBBB/nCCCCCCCCCCCCC @Corralien Commented Nov 11, 2021 at 21:00
  • If you use inplace=True. you must not use df['Prescription_Drug_Name'] = Commented Nov 11, 2021 at 21:02
  • /n is not newline. Newline is \n. Commented Nov 11, 2021 at 21:05
  • the /n was a typo. all my text data is \n. Commented Nov 11, 2021 at 21:07

2 Answers 2

1

Use str.strip():

df['Prescription_Drug_Name'] = df['Prescription_Drug_Name'].str.replace(r'\n', '')
Sign up to request clarification or add additional context in comments.

Comments

1

I always advise against inplace=True. Make an explicit copy where you mean to. This version is only applying to a copy, so none of the changes... being made to my df. Why don't you clone your data like this:

for df in df_list:
    clean = df.copy()
    clean.columns = [c.replace(' ', '_') for c in df.columns]

    clean = clean.drop(df.columns.difference(['Prescription_Drug_Name','Drug_Tier', 'Drug_Notes']), 1)

    # drop last three rows
    clean = clean.iloc[:-3]

    # modify column, remove `inplace` here
    clean['Prescription_Drug_Name'] = clean['Prescription_Drug_Name'].replace(r'\n','', regex=True)
    df_list_clean.append(clean)

That being said, all of the above can be chained, so you can do something like this:

for df in df_list
    clean = (df.rename(columns=lambda x: x.replace(' ', '_'))
               .reindex(['Prescription_Drug_Name','Drug_Tier', 'Drug_Notes'], axis=1).dropna(axis=0, how='all')   # select only the columns
               .iloc[:-3]
               .assign(Prescription_Drug_Name=lambda x: x.replace(r'\n', '', regex=True)
             )
    df_list_clean.append(clean)

1 Comment

Thank you, this is actually unexpectedly useful. I really like that idea of using a copy within my loop. That'll save me some headaches of figuring out if a change was made to the original or a copy.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.