0

I want to split each string in a column into its words and store every word in its own new column. For example:

df

Col1    Col2
38      'My Name is John'
11      'Hello friend'
134     'My favorite city is New Orleans'

desired df:

Col1    Col2     Col3        Col4    Col5    Col6    Col7
38      'My'     'Name'      'is'    'John'  NA      NA
11      'Hello'  'friend'    NA      NA      NA      NA
134     'My'     'favorite'  'city'  'is'    'New'   'Orleans'

Does anybody have any ideas for this? Thanks!

6 Answers

4

You can create it using this:

import pandas as pd

df = pd.DataFrame({'Col1': [38, 11, 134], 
                    'Col2':['My Name is John', 'Hello friend', 'My favorite city is New Orleans']})
 

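# Split on whitespace; expand=True puts each word in its own column and pads short rows with None
df1 = df.Col2.str.split(expand=True)

# Name the word columns Col2..Col7 as in the desired output, then put Col1 back in front
df1.columns = ['Col2', 'Col3', 'Col4', 'Col5', 'Col6', 'Col7']
df1 = pd.concat([df.Col1, df1], axis=1)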

3

The method str.split splits each string in the column into a list of words. You can then pad the lists so that they all have the same length and build a new DataFrame from them:

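import numpy as np
import pandas as pd

# One list of words per row of the question's Col2
words = df.Col2.str.split()
maxlen = words.map(len).max()

def pad_list(l):
    # Right-pad with None up to the length of the longest list
    return l + [None] * (maxlen - len(l))

words = pd.DataFrame(np.stack(words.map(pad_list).tolist(), axis=0))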

0

You can convert Col2 into a Series in which each element is a list of words, and then build a DataFrame from that Series.

import pandas as pd

# OP's data
x = ['My Name is John', 'Hello friend', 'My favorite city is New Orleans']
df = pd.DataFrame(x, columns=['Col2'])

# convert to series with lists
s = df['Col2'].apply(lambda x: x.split())

# lists to dataframe
df_new = pd.DataFrame(item for item in s)
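
This works because the DataFrame constructor accepts rows of unequal length and fills the missing trailing cells itself, so no manual padding is needed. Below is a minimal sketch on the question's data that also puts Col1 back in front. The names `words` and `result` and the generated Col2, Col3, ... labels are illustrative choices, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'Col1': [38, 11, 134],
    'Col2': ['My Name is John', 'Hello friend', 'My favorite city is New Orleans'],
})

# Rows of 4, 2 and 6 words: the constructor pads the short ones with missing values
words = pd.DataFrame(df['Col2'].str.split().tolist(), index=df.index)

# Label the word columns Col2, Col3, ... to follow the question's layout
words.columns = [f"Col{i + 2}" for i in range(words.shape[1])]

# Put Col1 back in front of the word columns
result = pd.concat([df['Col1'], words], axis=1)
print(result)
```

Passing `index=df.index` keeps the rows aligned with the original frame if its index is not the default 0, 1, 2.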

0

This function should do it. It does not use pandas itself: it only reads the column by its key and stores the new columns under new keys, so it should also work with a pandas DataFrame.

def splitcolumn(data, colname):
    # Find the largest number of words in any row
    ncol = 0
    for s in data[colname]:
        n = len(s.split(' '))
        if n > ncol:
            ncol = n
    # One empty list per new column: colname_0, colname_1, ...
    addcol = {}
    for i in range(ncol):
        addcol["{0}_{1}".format(colname, i)] = []
    # Fill the new columns row by row, using None where a row has no more words
    for s in data[colname]:
        elements = s.split(' ')
        for i in range(ncol):
            try:
                addcol["{0}_{1}".format(colname, i)].append(elements[i])
            except IndexError:
                addcol["{0}_{1}".format(colname, i)].append(None)
    for a in addcol.keys():
        data[a] = addcol[a]
    return data
                
data = {'Col2': ['My Name is John','Hello friend','My favorite city is New Orleans']}

data = splitcolumn(data,'Col2')

Unlike in your example, the column Col2 is neither overwritten nor removed; instead, the new columns are appended to the data as Col2_0, Col2_1, etc.
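
Since the function only reads `data[colname]` and assigns whole lists with `data[name] = ...`, the pandas compatibility can be checked directly. The sketch below uses a shortened variant, `split_column_compact` (a hypothetical name, not the function above), so that it stays self-contained; it is meant to give the same columns for a plain dict and for a DataFrame.

```python
import pandas as pd

def split_column_compact(data, colname):
    """Add colname_0, colname_1, ... holding the space-separated words."""
    rows = [s.split(' ') for s in data[colname]]
    ncol = max((len(r) for r in rows), default=0)
    for i in range(ncol):
        # None marks rows that have fewer than i + 1 words
        data["{0}_{1}".format(colname, i)] = [r[i] if i < len(r) else None for r in rows]
    return data

texts = ['My Name is John', 'Hello friend', 'My favorite city is New Orleans']

# Plain dict of lists
as_dict = split_column_compact({'Col2': list(texts)}, 'Col2')

# pandas DataFrame: column lookup and column assignment use the same syntax
as_frame = split_column_compact(pd.DataFrame({'Col2': texts}), 'Col2')

print(as_dict['Col2_1'])
print(as_frame.columns.tolist())
```

Note that both versions modify `data` in place and also return it.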

0

Try this:

import pandas as pd 

# Same rows as in the question
data = [[38, 'My Name is John'], [11, 'Hello friend'], [134, 'My favorite city is New Orleans']]
df = pd.DataFrame(data)
# Rename the first word columns so they do not clash with df's columns 0 and 1 in the join
df1 = df.join(df[1].str.split(' ', expand=True).rename(columns={0:'A', 1:'B', 2:'C'}))
# Drop the original text column
df1.drop([1], axis = 1, inplace = True)
print(df1)

Output:

     0      A         B     C     3     4        5
0   38     My      Name    is  John  None     None
1   11  Hello    friend  None  None  None     None
2  134     My  favorite  city    is   New  Orleans
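
The rename matters here because join refuses overlapping column labels unless suffixes are given, and both frames would otherwise carry the labels 0 and 1. It only relabels the first three word columns, though, which is why 3, 4 and 5 remain in the output. One possible alternative is `add_prefix`, which relabels every word column at once. A sketch, with the prefix `word_` and the names `words` and `out` chosen arbitrarily:

```python
import pandas as pd

data = [[38, 'My Name is John'], [11, 'Hello friend'], [134, 'My favorite city is New Orleans']]
df = pd.DataFrame(data)

# The word columns come back labelled 0..5; add_prefix turns them into word_0..word_5
words = df[1].str.split(' ', expand=True).add_prefix('word_')

# No label clashes with df's remaining column 0, so join needs no suffixes
out = df.drop(columns=[1]).join(words)
print(out)
```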

0

Try this:

import pandas as pd

split_ = df.Col2.str.split(expand=True)
split_.columns = [f"Col{x}" for x in range(2, split_.columns.size + 2)]

print(
    pd.concat([df.Col1, split_], axis=1)
)

   Col1   Col2      Col3  Col4  Col5  Col6     Col7
0    38     My      Name    is  John  None     None
1    11  Hello    friend  None  None  None     None
2   134     My  favorite  city    is   New  Orleans
