1

Hope all is well. I'm building a dataset to feed into sklearn algorithms for classification. I couldn't find any easy datasets to start out with, so I'm making my own. I've run into a problem, though:

import numpy as np
import random

type_1 = [random.randrange(0, 30, 1) for i in range(50)]
type_1_label = [1 for i in range(50)]

type_2 = [random.randrange(31, 75, 1) for i in range(50)]
type_2_label = [-1 for i in range(50)]

zipped_1 = zip(type_1, type_1_label)
zipped_2 = zip(type_2, type_2_label)

ready = np.array(zipped_1)
print(ready[1])

The problem is this: when I zip type_1 with type_1_label, the output is a list of two-element pairs, as expected. I then feed it into a NumPy array and index it, which raises IndexError: too many indices for array. That does not make sense to me; surely NumPy can handle a two-column array in its N-dimensional array functions? Any help would be appreciated!

6
  • What's wrong with the available ones here: scikit-learn.org/stable/auto_examples/…? Commented May 20, 2016 at 7:57
  • Please use this tool to help you write a clearer question. Currently your indentation is a complete mess and we could use a full traceback. Also when I run this code (after fixing the indentation) I get no error. Commented May 20, 2016 at 7:59
  • Try to explore the awesome dataset repository: github.com/caesar0301/awesome-public-datasets and you can also create an account on kaggle.com. As @EdChum said you have already a lot of examples embedded with scikit-learn, don't hesitate to look over them. Commented May 20, 2016 at 8:05
  • Is it possible that you use print(ready[1]) because you are using Python 3? Commented May 20, 2016 at 8:36
  • yes man I am! trying to make the switch now after a clean reinstall of mac osx for other reasons haha, the shift is difficult Commented May 20, 2016 at 9:39

3 Answers

1

You can directly create the NumPy arrays you want as a result:

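# Column 0: random feature values in [0, 30); column 1 is overwritten with the label 1
ready1 = np.random.randint(0, 30, size=(50, 2))
ready1[:, 1] = 1

# Column 0: random feature values in [31, 75), as in the question; label -1
ready2 = np.random.randint(31, 75, size=(50, 2))
ready2[:, 1] = -1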
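
As a rough sketch of how the two arrays above might then be prepared for scikit-learn, the snippet below stacks them and splits the result into a feature matrix and a label vector. The names data, X and y, the fixed seed, and the shuffle step are illustrative assumptions and not part of the answer; the upper bound of 75 follows the range used in the question.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed, only to make the sketch reproducible

# Build the two classes: column 0 holds the feature, column 1 the label
ready1 = np.random.randint(0, 30, size=(50, 2))
ready1[:, 1] = 1
ready2 = np.random.randint(31, 75, size=(50, 2))
ready2[:, 1] = -1

# Stack both classes into one (100, 2) array and shuffle its rows in place
data = np.vstack((ready1, ready2))
np.random.shuffle(data)

# Split into a 2-D feature matrix and a 1-D label vector
X = data[:, :1]  # shape (100, 1)
y = data[:, 1]   # shape (100,)
```

Slicing with `[:, :1]` rather than `[:, 0]` keeps X two-dimensional, which is the shape scikit-learn estimators generally expect for features.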

1 Comment

I'm going to start doing this, it's much easier. Thank you!
0

TL;DR: zipped = list(zip(type_1, type_1_label))


Are you using Python 3? In Python 2, zip() returns a list, but in Python 3 it returns a zip object (a lazy iterator), and this makes all the difference when you try to put it into an ndarray:

In [45]: l1 = [1 for i in range(10)]

In [46]: t1 = [randrange(30) for i in range(10)]

In [47]: z1 = zip(t1,l1)

In [48]: z1
Out[48]: <zip at 0x7f3b88044688>

In [49]: a = np.array(z1) ; a
Out[49]: array(<zip object at 0x7f3b88044688>, dtype=object)

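As you can see, a contains a single object and has no dimensions (its shape is the empty tuple), which is why indexing it as in ready[1] raises IndexError: too many indices for array.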
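
A minimal sketch, assuming Python 3, that reproduces the error from the question in isolation. The input values and the names pairs and raised are made up for illustration.

```python
import numpy as np

pairs = zip([6, 18, 14], [1, 1, 1])  # illustrative values
a = np.array(pairs)                  # wraps the zip object itself, not its items

print(a.ndim)   # number of axes of the resulting array
print(a.shape)  # empty tuple for a zero-dimensional array

# A zero-dimensional array has no axis to index along
try:
    a[1]
    raised = False
except IndexError as exc:
    raised = True
    print("IndexError:", exc)
```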

What can you do to access the object inside? You can add an additional axis and then index as usual:

In [50]: a[None][0]
Out[50]: <zip at 0x7f3b88044688>

In [51]: for t in a[None][0]: print (t)
(6, 1)
(18, 1)
(14, 1)
(27, 1)
(14, 1)
(15, 1)
(10, 1)
(18, 1)
(5, 1)
(9, 1)
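
To make the role of the extra axis more concrete, here is a small sketch, again assuming Python 3 and made-up input values. It shows that a[None] only wraps the same zip object in a length-1 axis, that a.item() retrieves it without any indexing, and that the zip object can be iterated only once.

```python
import numpy as np

pairs = zip([6, 18, 14], [1, 1, 1])  # illustrative values
a = np.array(pairs)

# None (an alias of np.newaxis) adds a leading axis of length 1
expanded = a[None]
print(expanded.shape)

# item() returns the single wrapped Python object
same_object = a.item() is pairs

# The wrapped object is still a lazy iterator: the first pass consumes it
first = list(a[None][0])
second = list(a.item())
print(first)
print(second)
```

Because the array holds the iterator rather than its contents, the data never becomes rows of the array, which is why a slice such as a[1:50] has nothing to select.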

This is interesting, I hear you say, but how can I get the old behaviour back, when zip returned a list and NumPy was happy with it?

With Python 3 you have to explicitly convert to a list:

In [52]: z1 = list(zip(t1,l1))

In [53]: a = np.array(z1) ; a
Out[53]: 
array([[ 6,  1],
       [18,  1],
       [14,  1],
       [27,  1],
       [14,  1],
       [15,  1],
       [10,  1],
       [18,  1],
       [ 5,  1],
       [ 9,  1]])

and then it all works as usual.
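
For comparison, a short sketch of two ways to get the two-column array in Python 3. The list(zip(...)) form is the one from this answer; np.column_stack is offered here as an alternative that skips zip entirely and is not part of the original answer. The sample values are made up.

```python
import numpy as np

type_1 = [6, 18, 14, 27]      # illustrative feature values
type_1_label = [1, 1, 1, 1]   # matching labels

# Option 1: materialise the zip object into a list of tuples first
from_zip = np.array(list(zip(type_1, type_1_label)))

# Option 2: stack the two 1-D sequences as columns of a 2-D array
from_stack = np.column_stack((type_1, type_1_label))

print(from_zip.shape)
print(from_stack[1])
```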

3 Comments

All of the comments gave me new insight, but this was a good explanation and is making my transition from 2.7 to 3 much easier. Kudos to you.
What I don't understand, however, is what the additional axis in a[None][0] does. Surely the data should be in a[1:50]? Even if it returns the object instead of each data point, I should be able to access it with for t in ready[0]: print t, right?
@entercaspa A good reply to your legitimate questions would exceed the length of a comment. In short, the additional axis gives you an axis to index (NB: you can access an array's content only by indexing), and as it happens, creating an ndarray that contains a single object rather than a sequence of objects doesn't create any axes along which to index the content. This is different from what you expected, isn't it? But so it is. Re your last point, I don't understand what you mean. I'd say it depends on how you instantiated ready, but I understand that's not a satisfactory reply.
0

I don't know your Python version or other environment details, but I'm guessing that's where the problem is. Your code worked fine for me:

import numpy as np
import random
type_1 = [random.randrange(0, 30, 1) for i in range(50)]
type_1_label = [1 for i in range(50)]
type_2 = [random.randrange(31, 75, 1) for i in range(50)]
type_2_label = [-1 for i in range(50)]
zipped = zip(type_1, type_1_label)
zipped_2 = zip(type_2, type_2_label)
ready = np.array(zipped)
print(ready[1])

It printed this:

[14  1]

I am using the Python 2.7 Anaconda distribution.

1 Comment

yeah man im on 3.5 or something now, on 2.7 printing it was fine haha
