
I have a DataFrame A with one column, location_ms. I want to split each value by ; and then by : to get DataFrame B.

DataFrame A (Beginning): [image not included]

DataFrame B (Final): [image not included]

My code below seems very roundabout, and I would love to see a better implementation. Splitting on ; and then on : produces a Series in which each element is a list of lists; I then flatten those into a single list of lists to build the final DataFrame.

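import pandas as pd

def locpapersrc_table(df):
    # Split each value on ';', then split every piece on ':' -> one list of lists per row
    toflatten = df['location_ms'].str.split(';').apply(lambda x: [c.split(':') for c in x]).values.tolist()
    # Flatten the per-row lists into a single list of [left, right] pairs
    singlelistoflist = [item for sublist in toflatten for item in sublist]
    tmp = pd.DataFrame(singlelistoflist)
    return tmp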
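The flattening step described above can also be written with the standard library's itertools.chain.from_iterable instead of a nested list comprehension. This is a minimal sketch on hypothetical sample data (the real location_ms values are only shown as an image in the question); the names nested, pairs and result are illustrative only.

```python
import itertools

import pandas as pd

# Hypothetical sample data; the real values of location_ms are not shown in the question.
df = pd.DataFrame({"location_ms": ["a:1;b:2", "c:3", "d:4;e:5;f:6"]})

# Each element becomes a list of [left, right] pairs, e.g. [["a", "1"], ["b", "2"]].
nested = df["location_ms"].str.split(";").apply(lambda parts: [p.split(":") for p in parts])

# chain.from_iterable concatenates the per-row lists into one flat list of pairs.
pairs = list(itertools.chain.from_iterable(nested))

result = pd.DataFrame(pairs)
print(result)
```

This keeps the same logic as locpapersrc_table; it only swaps the comprehension for a library call that states the intent ("concatenate these lists") directly.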

This version2 is slower than locpapersrc_table, and it is just as roundabout.

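def version2(df):
    # One column per original row, one row per ;-separated piece (padded with missing values)
    xx = df["location_ms"].str.split(';', expand=True).T
    # Stack the columns into one, drop the padding, then split each piece on ':'
    tmp = pd.melt(xx).dropna().drop(['variable'], axis=1)['value'].str.split(':', expand=True)
    return tmp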
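Both versions boil down to three steps: split on ;, put each piece on its own row, then split each piece on :. A minimal sketch of those steps, assuming pandas 0.25 or later (where Series.explode was added) and hypothetical sample data; split_locations is an illustrative name, not part of the question's code.

```python
import pandas as pd

# Hypothetical sample data; the real values of location_ms are not shown in the question.
df = pd.DataFrame({"location_ms": ["a:1;b:2", "c:3", "d:4;e:5;f:6"]})

def split_locations(df):
    """Hypothetical helper: one row per ';' piece, one column per ':' field."""
    pieces = df["location_ms"].str.split(";").explode()  # one row per ;-separated piece
    out = pieces.str.split(":", expand=True)             # one column per :-separated field
    return out.reset_index(drop=True)                    # drop the repeated original index

result = split_locations(df)
print(result)
```

If you want to know which original row each piece came from, skip the reset_index call: explode repeats the original index label for every piece.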

Thank You!


1 Answer


Try something like this.

split_df = df['location_ms'].str.split(pat=";", expand=True)

Throw in something like this if you want to merge it back into the original dataframe.

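# Attach the new columns to the original rows (matched on the index), then drop the source column
df = df.merge(split_df, left_index=True, right_index=True)
df = df.drop(columns='location_ms')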
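To see what the single split and the merge-back produce, here is a minimal sketch on hypothetical sample data. Note that dropping a column needs columns= (or axis=1); a bare df.drop('location_ms') looks for a row label and raises a KeyError.

```python
import pandas as pd

# Hypothetical sample data; the real values of location_ms are not shown in the question.
df = pd.DataFrame({"location_ms": ["a:1;b:2", "c:3", "d:4;e:5;f:6"]})

# expand=True gives one column per ;-position; shorter rows are padded with missing values.
split_df = df["location_ms"].str.split(pat=";", expand=True)
print(split_df)

# Merging on the shared index attaches the new columns to the original rows.
merged = df.merge(split_df, left_index=True, right_index=True).drop(columns="location_ms")
print(merged)
```

The result is wide (one row per original row, one column per ;-position), which is why a second split on : is still needed for the question's layout.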

For your new problem (splitting by ; and :):

split_df = df['location_ms'].str.split(pat=";", expand=True)
# Split every ;-column on ':' and place the pieces side by side. pd.concat avoids the
# _x/_y suffix clashes that repeated merges hit once the 0/1 column names repeat.
parts = [split_df[col].str.split(pat=":", expand=True) for col in split_df.columns]
subsplit_df = pd.concat(parts, axis=1)
subsplit_df.columns = range(subsplit_df.shape[1])

You can merge it back in as above if you want.
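A minimal sketch of the two-level split written with pd.concat instead of repeated merges, on hypothetical sample data, to show the layout it produces: one row per original row with the :-fields laid out side by side. DataFrame B in the question instead has one row per ;-separated piece, which is the objection raised in the comments below.

```python
import pandas as pd

# Hypothetical sample data; the real values of location_ms are not shown in the question.
df = pd.DataFrame({"location_ms": ["a:1;b:2", "c:3", "d:4;e:5;f:6"]})

# First split: one column per ;-position.
split_df = df["location_ms"].str.split(pat=";", expand=True)

# Second split: each ;-column becomes two :-columns; concat lines them up on the index.
parts = [split_df[col].str.split(pat=":", expand=True) for col in split_df.columns]
wide = pd.concat(parts, axis=1)
wide.columns = range(wide.shape[1])  # replace the repeated 0/1 labels with 0..n-1
print(wide)
```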


4 Comments

You need to split by both delimiters, ";" and ":".
This does not work, since splitting by both characters gives you a list of lists, which then has to be manipulated into the format of the final DataFrame.
"I want to split by ; and ; to get DataFrame B" is a direct quote from your problem. I've edited the answer to match your new criteria.
Oops, sorry about that typo! If you look at the initial DataFrame and the code, that sentence would not make sense. Sorry about the mistake! This is more roundabout than the code that I have. You should be using apply instead of iterating through the DataFrame.
