I am trying to add a new column to my dataframe using pandas and have it repeat the same few values until it reaches the end of the index.

I have tried:

df['Fruit Type']=['Bananas','Oranges','Strawberries']

it says:

ValueError: length of values does not match length of index

My index is about 8000 rows long, so there is a mismatch between the length of the index and the number of new column values.

I want the column to look like:

Fruit Type: Bananas Oranges Strawberries Bananas Oranges Strawberries Bananas Oranges Strawberries

I found a solution after a while:

df.insert(0, 'Fruit Type', ['Bananas', 'Oranges','Strawberries']*int(((len(df))/3)))

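The 0 is the position of the new column, followed by the column name and then the column values. The * int(len(df) / 3) multiplier repeats the list once for every three rows, so this only works when the number of rows is an exact multiple of 3; otherwise the same ValueError is raised. Thanks to @acai for the multiplier at the end.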
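
For row counts that are not a multiple of 3, one option is to repeat the list one extra time and trim it to the length of the dataframe. This is only a minimal sketch: the 10-row frame and the names `repeats` and `values` are assumptions made for illustration, not part of the original question.

```python
import pandas as pd

# Hypothetical 10-row frame; 10 is deliberately not a multiple of 3.
df = pd.DataFrame({"column_a": list("abcdfexsni")})
fruits = ["Bananas", "Oranges", "Strawberries"]

# Repeat the list one more time than floor division gives, then trim
# the result so it has exactly one value per row.
repeats = len(df) // len(fruits) + 1
values = (fruits * repeats)[:len(df)]

# Insert as the first column, as in the original solution.
df.insert(0, "Fruit Type", values)
```

The slice guarantees the list length matches the index, so the insert should succeed regardless of how many rows the frame has.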

  • Did you try to create the column manually? df['Fruit Type']= (['Bananas','Oranges','Strawberries'] * int(len(df) / 3) + 1 )[ : len(df)] Commented Jun 11, 2018 at 19:14
  • This actually brings up an error: TypeError: can only concatenate list (not "int") to list. Commented Jun 11, 2018 at 19:25

2 Answers

Method 1:

Let's say your dataframe is 10 rows long (and you want to repeat your list of 3 fruits).

>>> df
  column_a
0        a
1        b
2        c
3        d
4        f
5        e
6        x
7        s
8        n
9        i

Using itertools.cycle, you can turn your list into an iterator and cycle through it until the end of the dataframe:

from itertools import cycle

# cycle() loops over the list endlessly; take one item per row
fruits = cycle(['Bananas','Oranges','Strawberries'])
df['Fruit_Type'] = [next(fruits) for _ in range(len(df))]

>>> df
  column_a    Fruit_Type
0        a       Bananas
1        b       Oranges
2        c  Strawberries
3        d       Bananas
4        f       Oranges
5        e  Strawberries
6        x       Bananas
7        s       Oranges
8        n  Strawberries
9        i       Bananas
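
A possible variation on Method 1 is to let `itertools.islice` take exactly `len(df)` items from the cycle instead of calling `next` in a list comprehension. The sketch below rebuilds the 10-row example frame itself so that it runs on its own; that setup is an assumption for illustration.

```python
from itertools import cycle, islice

import pandas as pd

# Rebuild the 10-row example frame so the snippet is self-contained.
df = pd.DataFrame({"column_a": list("abcdfexsni")})
fruits = ["Bananas", "Oranges", "Strawberries"]

# islice stops the otherwise endless cycle after len(df) items.
df["Fruit_Type"] = list(islice(cycle(fruits), len(df)))
```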

Method 2:

Here is an ugly hack that you can use as an alternative:

You can use numpy.tile to repeat your list as many whole times as fit in the dataframe (using the // operator), and then append the first few elements of the list needed to fill the remaining rows. Note that pandas.np was only an alias for numpy; it was deprecated in pandas 1.0 and removed in pandas 2.0, so call numpy directly:

import numpy as np

fruits = ['Bananas','Oranges','Strawberries']

df['Fruit Type'] = np.tile(fruits, len(df) // len(fruits)).tolist() + fruits[:len(df) % len(fruits)]

>>> df
  column_a    Fruit Type
0        a       Bananas
1        b       Oranges
2        c  Strawberries
3        d       Bananas
4        f       Oranges
5        e  Strawberries
6        x       Bananas
7        s       Oranges
8        n  Strawberries
9        i       Bananas
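
If NumPy is already in use, `numpy.resize` may be a shorter route, since it fills a larger target size with repeated copies of the input. This is a sketch under the assumption of the same 10-row example frame, rebuilt here so the snippet is self-contained.

```python
import numpy as np
import pandas as pd

# Rebuild the 10-row example frame so the snippet is self-contained.
df = pd.DataFrame({"column_a": list("abcdfexsni")})
fruits = ["Bananas", "Oranges", "Strawberries"]

# np.resize repeats the input as often as needed to reach len(df) items.
df["Fruit Type"] = np.resize(fruits, len(df))
```

Unlike the tile-plus-remainder expression, this needs no separate handling of the leftover rows.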

2 Comments

I tried method 1 and it works well. This might be a better solution than the one I mentioned above if you want to use the list throughout Python modules. Thanks @sacul
In method 1, it lacks a parenthesis to close range(). Can't edit.

Repeat the list as many whole times as fit into the dataframe (the integer division of the dataframe length by the list length). The difference between the length of the dataframe and the length of that repeated list is the number of elements you still need to append, taken from the start of the list.

Consider the example below, where the df has 10 rows.

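import pandas as pd

df = pd.DataFrame({
    'col': range(0, 10)
})
list_ = ['Bananas','Oranges','Strawberries']
# whole repetitions that fit into the dataframe
ser = list_ * (len(df) // len(list_))
# append the leftover elements from the start of the list
df['Fruit Type'] = ser + list_[:len(df) - len(ser)]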

Output:

    col Fruit Type
0   0   Bananas
1   1   Oranges
2   2   Strawberries
3   3   Bananas
4   4   Oranges
5   5   Strawberries
6   6   Bananas
7   7   Oranges
8   8   Strawberries
9   9   Bananas
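
The same arithmetic can be wrapped in a small helper. With 10 rows and 3 fruits, `divmod(10, 3)` gives 3 full repetitions (9 values) and a remainder of 1, so one more element is taken from the start of the list. The function name `repeat_to_length` is hypothetical, chosen here only for illustration.

```python
import pandas as pd


def repeat_to_length(values, n):
    """Cycle through `values` until the result has exactly n items."""
    if not values:
        raise ValueError("values must not be empty")
    # full: whole repetitions that fit; remainder: leftover rows to fill
    full, remainder = divmod(n, len(values))
    return values * full + values[:remainder]


df = pd.DataFrame({"col": range(10)})
df["Fruit Type"] = repeat_to_length(["Bananas", "Oranges", "Strawberries"], len(df))
```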
