Create new columns from splitting string column

Question

What I'm looking to do is create a new column for each word in a string column. Here is an example:

df

Col1        Col2  
 38       'My Name is John'
 11       'Hello friend'
 134      'My favorite city is New Orleans'

Desired df:

Col1    Col2     Col3        Col4    Col5    Col6    Col7
38      'My'     'Name'      'is'    'John'  NA      NA
11      'Hello'  'friend'    NA      NA      NA      NA
134     'My'     'favorite'  'city'  'is'    'New'   'Orleans'

Does anybody have any ideas for this? Thanks!

Olasimbo · Accepted Answer · 2020-07-24 12:47:36Z

4

You can create it using this:

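import pandas as pd

df = pd.DataFrame({'Col1': [38, 11, 134],
                   'Col2': ['My Name is John', 'Hello friend', 'My favorite city is New Orleans']})

# split on whitespace; expand=True gives one column per word, with a missing value (None) where a row runs out of words
df1 = df.Col2.str.split(expand=True)

# the longest string has 6 words, so there are 6 new columns: name them Col2 .. Col7
df1.columns = ['Col2', 'Col3', 'Col4', 'Col5', 'Col6', 'Col7']

# put the original Col1 back in front
df1.insert(0, 'Col1', df.Col1)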
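To see what expand=True changes, here is a small standalone sketch (the Series s just holds the question's strings). Without it, str.split returns a Series of lists; with it, you get a DataFrame that is as wide as the longest string, with missing values in the shorter rows:

```python
import pandas as pd

s = pd.Series(['My Name is John', 'Hello friend',
               'My favorite city is New Orleans'])

# expand=False (the default): a Series whose values are lists of words
as_lists = s.str.split()

# expand=True: a DataFrame with one column per word position; it has as many
# columns as the longest string has words, and shorter rows get missing values
as_frame = s.str.split(expand=True)

print(as_lists.tolist()[1])   # ['Hello', 'friend']
print(as_frame.shape)         # (3, 6)
```

This is also why hard-coding six column names works for this data but would need updating if the longest string changed.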


qmeeus · Accepted Answer · 2020-07-24 12:40:40Z

3

The method str.split splits each string in the column into a list of words. Then you can pad the lists so that they all have the same length and build a new DataFrame from them:

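import numpy as np
import pandas as pd

# df is the DataFrame from the question
words = df.Col2.str.split()          # Series of word lists
maxlen = words.map(len).max()        # number of words in the longest string

def pad_list(l):
    # pad each list with None up to maxlen
    return l + [None] * (maxlen - len(l))

words = pd.DataFrame(np.stack(words.map(pad_list), axis=0))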
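If you'd rather not use NumPy, the padding step can be sketched with itertools.zip_longest, which fills short sequences with None by default. This is a swapped-in technique, not the answer's own; df is rebuilt from the question's data so the snippet runs on its own:

```python
from itertools import zip_longest

import pandas as pd

df = pd.DataFrame({'Col1': [38, 11, 134],
                   'Col2': ['My Name is John', 'Hello friend',
                            'My favorite city is New Orleans']})

words = df['Col2'].str.split()   # Series of word lists

# zip_longest(*words) walks the lists position by position, padding short rows
# with None; zip(*...) transposes back to one tuple per original row
padded = list(zip(*zip_longest(*words)))

result = pd.DataFrame(padded, index=df.index)
```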


drops · Accepted Answer · 2020-07-24 12:41:00Z

0

You can convert Col2 into a Series of word lists and then convert that Series back into a DataFrame.

import pandas as pd

# OP's data
x = ['My Name is John', 'Hello friend', 'My favorite city is New Orleans']
df = pd.DataFrame(x, columns=['Col2'])

# convert to series with lists
s = df['Col2'].apply(lambda x: x.split())

# lists to dataframe
df_new = pd.DataFrame(item for item in s)
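A sketch of the same idea that also keeps Col1 and uses the column names from the question (the question's df is rebuilt here so it runs on its own). The DataFrame constructor pads the shorter lists with missing values, and passing index=df.index keeps the rows aligned for the join:

```python
import pandas as pd

df = pd.DataFrame({'Col1': [38, 11, 134],
                   'Col2': ['My Name is John', 'Hello friend',
                            'My favorite city is New Orleans']})

# one list of words per row
s = df['Col2'].str.split()

# the constructor pads shorter lists with missing values;
# index=df.index keeps the rows aligned with df for the join below
words = pd.DataFrame(s.tolist(), index=df.index)

# name the word columns Col2, Col3, ... and put Col1 back in front
words.columns = [f'Col{i}' for i in range(2, words.shape[1] + 2)]
result = df[['Col1']].join(words)
```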


Martin Wettstein · Accepted Answer · 2020-07-24 12:41:39Z

This function should do it. It does not rely on pandas DataFrames; it reads the column by its key and stores each new column under a new key. So it works with a plain dict and should also work with a pandas DataFrame.

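def splitcolumn(data, colname):
    # find the largest number of words in any row
    ncol = 0
    for s in data[colname]:
        n = len(s.split(' '))
        if n > ncol:
            ncol = n
    # one empty list per new column: Col2_0, Col2_1, ...
    addcol = {}
    for i in range(ncol):
        addcol["{0}_{1}".format(colname, i)] = []
    # fill the new columns word by word, padding short rows with None
    for s in data[colname]:
        elements = s.split(' ')
        for i in range(ncol):
            try:
                addcol["{0}_{1}".format(colname, i)].append(elements[i])
            except IndexError:
                addcol["{0}_{1}".format(colname, i)].append(None)
    # store each new column under its own key
    for a in addcol.keys():
        data[a] = addcol[a]
    return data
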
data = {'Col2': ['My Name is John','Hello friend','My favorite city is New Orleans']}

data = splitcolumn(data,'Col2')

Unlike in your example, column Col2 is not overwritten or removed; instead, the new columns are added at the end of the data as Col2_0, Col2_1, etc.
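One detail of splitting with s.split(' ') (plain Python behavior, not specific to this function): two consecutive spaces produce an empty string, which would end up as its own column value. str.split() with no argument treats any run of whitespace as a single separator:

```python
text = 'Hello  friend'   # note the two spaces

# splitting on a single space keeps an empty string between the two spaces
print(text.split(' '))   # ['Hello', '', 'friend']

# splitting with no argument treats any run of whitespace as one separator
print(text.split())      # ['Hello', 'friend']
```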

Vignesh · Accepted Answer · 2020-07-24 12:51:14Z

0

Try this:

import pandas as pd 

data = [[38, 'My Name is John'], [11, 'Hello friend'], [134, 'My favorite city is New Orleans']] 
df = pd.DataFrame(data)
df1 = df.join(df[1].str.split(' ', expand=True).rename(columns={0:'A', 1:'B', 2:'C'}))
df1.drop([1], axis = 1, inplace = True) 
print(df1)

Output:

     0      A         B     C     3     4        5
0   38     My      Name    is  John  None     None
1   11  Hello    friend  None  None  None     None
2  134     My  favorite  city    is   New  Orleans
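Note that rename here only relabels columns 0, 1 and 2, which is why the output mixes A, B, C with 3, 4, 5. Here is a sketch that names every word column, assuming the Col1 ... Col7 names from the question are what you want. rename also accepts a function, which is applied to each column label:

```python
import pandas as pd

data = [[38, 'My Name is John'], [11, 'Hello friend'],
        [134, 'My favorite city is New Orleans']]
df = pd.DataFrame(data)

# the function is applied to every column label:
# word position i becomes 'Col{i + 2}', so the words land in Col2 .. Col7
words = df[1].str.split(' ', expand=True).rename(columns=lambda i: f'Col{i + 2}')

# keep the numeric column (label 0) as Col1 and leave out the original text column
df1 = df[[0]].rename(columns={0: 'Col1'}).join(words)
```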


sushanth · Accepted Answer · 2020-07-24 12:52:37Z

0

Try this:

import pandas as pd

split_ = df.Col2.str.split(expand=True)
split_.columns = [f"Col{x}" for x in range(2, split_.columns.size + 2)]

print(
    pd.concat([df.Col1, split_], axis=1)
)

   Col1   Col2      Col3  Col4  Col5  Col6     Col7
0    38     My      Name    is  John  None     None
1    11  Hello    friend  None  None  None     None
2   134     My  favorite  city    is   New  Orleans

edited Jul 24, 2020 at 12:52

answered Jul 24, 2020 at 12:45

