
I want to slice a column in a dataframe (which contains only strings) based on the integers from a series. Here is an example:

data = pandas.DataFrame(['abc','scb','dvb'])
indices = pandas.Series([0,1,0])

I then want to apply some function that takes, from each row's string, the character at the corresponding index, giving the following:

   0
0  a
1  c
2  d

2 Answers


You can do the indexing on plain Python lists first, using a list comprehension that picks one character from each string.

l1 = ['abc','scb','dvb']
l2 = [0,1,0]
l3 = [l1[i][l2[i]] for i in range(len(l1))]

This gives l3 as:

['a', 'c', 'd']

Now convert it to a DataFrame:

data = pd.DataFrame(l3)

This gives the desired DataFrame.
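The same comprehension can also run directly on the pandas objects from the question, without separate lists. This is a minimal sketch, assuming `data` and `indices` have the same length and every index is valid for its string. The names `picked`, `width` and `ranges` are illustrative, and the range-slice variant is an assumption about what a multi-letter result might look like.

```python
import pandas as pd

data = pd.DataFrame(['abc', 'scb', 'dvb'])
indices = pd.Series([0, 1, 0])

# Pair each string with its index and pick one character per row.
picked = pd.DataFrame([s[i] for s, i in zip(data[0], indices)])

# Range-slice variant (assumption: take `width` characters starting at each index).
width = 2
ranges = pd.DataFrame([s[i:i + width] for s, i in zip(data[0], indices)])
```

Using `zip` avoids indexing both lists by position, and swapping `s[i]` for a slice is the only change needed for multi-character results.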


3 Comments

This is an interesting idea. If you could implement it using numpy, it could be pretty fast...
Not a numpy geek yet but let me try. Thanks for the reply:)
Thanks, this seems to be the more generalizable solution. I say that because I have another case where I might want a range slice (i.e. multiple letters for each row in the final output DataFrame), and I could not find a way to adapt the other solution from @MaxU.

You can use the following vectorized approach:

In [191]: [tuple(x) for x in indices.reset_index().values]
Out[191]: [(0, 0), (1, 1), (2, 0)]

In [192]: data[0].str.extractall(r'(.)') \
                 .loc[[tuple(x) for x in indices.reset_index().values]]
Out[192]:
         0
  match
0 0      a
1 1      c
2 0      d

In [193]: data[0].str.extractall(r'(.)') \
                 .loc[[tuple(x) for x in indices.reset_index().values]] \
                 .reset_index(level=1, drop=True)
Out[193]:
   0
0  a
1  c
2  d

Explanation:

In [194]: data[0].str.extractall(r'(.)')
Out[194]:
         0
  match
0 0      a
  1      b
  2      c
1 0      s
  1      c
  2      b
2 0      d
  1      v
  2      b

In [195]: data[0].str.extractall(r'(.)').loc[ [ (0,0), (1,1) ] ]
Out[195]:
         0
  match
0 0      a
1 1      c
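The steps above can be collected into one self-contained script. This is a sketch under the assumption that the strings contain no newline characters, since the pattern `(.)` does not match them by default. The names `chars`, `keys` and `result` are illustrative and not part of the original answer.

```python
import pandas as pd

data = pd.DataFrame(['abc', 'scb', 'dvb'])
indices = pd.Series([0, 1, 0])

# One row per character, indexed by (original row label, match number).
chars = data[0].str.extractall(r'(.)')

# Build (row label, character position) keys from the indices Series.
keys = list(zip(indices.index, indices))

# Select the requested characters, then drop the 'match' index level.
result = chars.loc[keys].reset_index(level=1, drop=True)
```

Building the keys with `zip` produces the same tuples as `indices.reset_index().values` while skipping the intermediate DataFrame.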

Numpy solution:

In [259]: a = np.array([list(x) for x in data.values.reshape(1, len(data))[0]])

In [260]: a
Out[260]:
array([['a', 'b', 'c'],
       ['s', 'c', 'b'],
       ['d', 'v', 'b']],
      dtype='<U1')

In [263]: pd.Series(a[np.arange(len(data)), indices])
Out[263]:
0    a
1    c
2    d
dtype: object
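For reference, the NumPy approach can be written as a standalone script. This sketch assumes all strings have the same length: otherwise the list of character lists does not form a rectangular 2-D array and the indexing below does not apply. The names `chars` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(['abc', 'scb', 'dvb'])
indices = pd.Series([0, 1, 0])

# Split each string into characters; with equal-length strings
# this gives a 2-D array of shape (n_rows, string_length).
chars = np.array([list(s) for s in data[0]])

# Integer-array indexing: for row k, take column indices[k].
result = pd.Series(chars[np.arange(len(data)), indices.to_numpy()])
```

Iterating over `data[0]` gives the same strings as `data.values.reshape(1, len(data))[0]` for a single-column frame.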

1 Comment

Thanks, it runs pretty quickly on the larger dataset I am applying it to as well.
