I want to create a new column in Pandas using a string sliced from another column in the dataframe.

For example:

Sample  Value  New_sample
AAB     23     A
BAB     25     B

Here New_sample is a new column formed from a simple [:1] slice of Sample.

I've tried a number of things to no avail - I feel I'm missing something simple.

What's the most efficient way of doing this?

4 Answers

149

You can use the .str accessor and apply a slice to it. This will be much quicker than the apply-based method below because it is vectorised (thanks @unutbu):

df['New_Sample'] = df.Sample.str[:1]

You can also apply a lambda function to the column, but this will be slower on larger dataframes:

In [187]:

df['New_Sample'] = df.Sample.apply(lambda x: x[:1])
df
Out[187]:
  Sample  Value New_Sample
0    AAB     23          A
1    BAB     25          B
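
For anyone who wants to try this end to end, here is a minimal self-contained sketch using the sample data from the question. The `missing` series is my own addition, not part of the question; it is there to show that the .str accessor passes missing values through, whereas the lambda version would raise a TypeError when it tries to slice None.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Sample": ["AAB", "BAB"], "Value": [23, 25]})

# Slice the first character of every string in the column
df["New_sample"] = df["Sample"].str[:1]

# Hypothetical extra case: a column containing a missing value.
# The .str accessor leaves the missing entry missing instead of raising.
missing = pd.Series(["AAB", None]).str[:1]

print(df)
print(missing)
```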

24

Adding a solution for a common variation, where the slice width varies across DataFrame rows:

#--Here I am extracting the ID part of the Email (i.e. the part before the @)

#--First find the position of @ in Email
#--(str.find returns -1 when there is no @, and the slice below would then drop the last character)
d['pos'] = d['Email'].str.find('@')

#--Use that position to slice Email row by row with a lambda function
d['new_var'] = d.apply(lambda x: x['Email'][0:x['pos']], axis=1)

#--Think of x['Email'] as a plain string to which the slice is applied

Hope this helps!
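
If the varying width is always determined by a delimiter, as with the @ here, a row-wise apply may not be needed at all. The sketch below is one possible alternative rather than the method from the answer above: it splits each string once on @ and keeps the first piece. The email addresses are made up for illustration.

```python
import pandas as pd

# Made-up addresses for illustration; the last one has no "@"
d = pd.DataFrame({"Email": ["alice@example.com", "bob@test.org", "no-at-sign"]})

# Split each string at the first "@" and keep the part before it.
# No helper 'pos' column is needed.
d["new_var"] = d["Email"].str.split("@", n=1).str[0]

print(d)
```

Note the different behaviour when there is no @ in the string: this version returns the whole string unchanged.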

2 Comments

Thanks for adding this common variation solution, just what I was looking for! And to combine into a single line: d['new_var'] = d.apply(lambda x: x['Email'][0:x['Email'].find('@')],axis=1)
You could also do d["new_var"] = np.vectorize(lambda x : x.split("@")[0])(np.array(d["Email"],dtype=str)), which would spare you an extra column.
18

You can also use str.slice() to slice the strings in a Series, as follows:

df['New_sample'] = df['Sample'].str.slice(0,1)

From pandas documentation:

Series.str.slice(start=None, stop=None, step=None)

Slice substrings from each element in the Series/Index.

To slice the index itself (if the index holds strings), you can try:

df.index = df.index.str.slice(0,1)
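
A small sketch putting the two spellings side by side, assuming the sample strings from the question. As far as I can tell, bracket slicing on .str and .str.slice() give the same result, so the choice is mostly a matter of taste; .str.slice() just spells out start, stop and step as named parameters.

```python
import pandas as pd

s = pd.Series(["AAB", "BAB"])

# Two spellings of the same slice
with_getitem = s.str[0:1]
with_slice = s.str.slice(0, 1)

# The step parameter works like the third part of a Python slice
every_other = s.str.slice(step=2)

# The same accessor is available on a string index
df = pd.DataFrame({"Value": [23, 25]}, index=["AAB", "BAB"])
df.index = df.index.str.slice(0, 1)

print(with_getitem.equals(with_slice))
print(every_other)
print(df)
```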

1 Comment

Is there any preference between df.somecolumn.str[0:1] and df.somecolumn.str.slice(0,1)?
0

Adding a solution for when your pandas dataframe index is made of tuples and you want to take the second element of each tuple and move it into its own column. Not sure if there is a shorter way to do this, but this way works:

df["newcol"] = df.index
df["newcol"] = df["newcol"].apply(lambda x: x[1])
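
There may be a shorter way if the tuple index is actually a MultiIndex, which is an assumption on my part since the answer does not say. In that case get_level_values can pull out one level directly. The index values and level names below are made up for illustration.

```python
import pandas as pd

# Hypothetical two-level index built from tuples
idx = pd.MultiIndex.from_tuples(
    [("x", "one"), ("y", "two")], names=["outer", "inner"]
)
df = pd.DataFrame({"Value": [23, 25]}, index=idx)

# Level 1 is the second element of each tuple
df["newcol"] = df.index.get_level_values(1)

print(df)
```

If the index is a plain object Index holding tuples rather than a MultiIndex, this will not work and the apply approach above is the one to use.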
