
I'm a data analyst who recently switched from R to Python. In R I can pass a matrix to a function that generates a scatter plot. Is that also possible in Python?

I looked at other posts related to my question, but they all seem to create a separate list for each variable first and then produce the scatter plot from those. I'd like to keep my variables in a single data structure, as I have below. However, I get an error when the program executes the line with the scatter function.

I would appreciate your input on this matter.

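import random

import numpy as np
import matplotlib.pyplot

# Build 15 (x, y) points, each with its own random slope and intercept
data = []
for n in range(15):
    x = random.uniform(0, 10)
    b = random.uniform(2, 5)
    m = random.uniform(.5, 6)
    y = x*m + b
    data.append((round(x, 2), round(y, 2)))

mat = np.matrix(data)

# The error is raised on the scatter call below
matplotlib.pyplot.scatter(mat[:, 0], mat[:, 1])
matplotlib.pyplot.show()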
  • Always add the full error message. Commented Jan 27, 2016 at 5:55
  • Where do you define mat before the last line? Commented Jan 27, 2016 at 6:03

2 Answers


I believe you are missing a step: you need a NumPy array rather than a standard Python list, so convert data with np.array before slicing out the columns:

import numpy as np
#...
for n in range(15):
     #...
     data.append((round(x,2),round(y,2)))
mat = np.array(data)

You will need to install numpy first, if you haven't already.
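
For reference, here is a minimal end-to-end sketch of that approach, not a definitive fix. The fixed seed and the names xs, ys, fig, ax and points are assumptions added for illustration. One thing it makes visible: a column sliced from an np.array is one-dimensional (shape (15,)), whereas the same slice of an np.matrix stays two-dimensional (shape (15, 1)). That difference may be what scatter tripped over in the question, but this is a guess since no error message was posted.

```python
import random

import numpy as np
import matplotlib.pyplot as plt

random.seed(0)  # assumption: fixed seed only so the example is repeatable

# Same data generation as in the question
data = []
for n in range(15):
    x = random.uniform(0, 10)
    b = random.uniform(2, 5)
    m = random.uniform(.5, 6)
    y = x * m + b
    data.append((round(x, 2), round(y, 2)))

mat = np.array(data)   # 2-D array with one row per point, shape (15, 2)
xs = mat[:, 0]         # first column, 1-D, shape (15,)
ys = mat[:, 1]         # second column, 1-D, shape (15,)

fig, ax = plt.subplots()
points = ax.scatter(xs, ys)  # scatter receives two plain 1-D arrays
```

Calling plt.show() afterwards displays the figure in an interactive session.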


I think your problem is with the data structure you are using. In the example you give, data is a list of tuples. I mention this because if you want to go further with Python for data analysis, you should get comfortable with lists, tuples, sets, dictionaries, and numpy arrays.

For the example you have, scatter needs two "array-like" objects: the x coordinates and the y coordinates. This means it will accept a sequence such as a list, a tuple, or a numpy array.

You have a list of tuples, so you have to create these "array-like" objects by taking the first or the second item of each element in the list. With a list comprehension this would look like:

matplotlib.pyplot.scatter([l[0] for l in data],[l[1] for l in data])

Another way of doing this is to use zip in reverse. zip creates an iterator that aggregates elements from each of the iterables given. This means:

list(zip([x1,x2,x3,x4],[y1,y2,y3,y4])) == [(x1,y1),(x2,y2),(x3,y3),(x4,y4)]

As you can see, this is exactly the opposite of what you need, but unpacking the list into zip with the '*' operator reverses it: zip(*data) yields one tuple of all the x values and one of all the y values. In Python 3 zip returns an iterator, so it cannot be indexed with [0]; unpack it straight into scatter instead:

matplotlib.pyplot.scatter(*zip(*data))
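
To make the zip approach concrete, here is a small sketch on made-up sample points (the values and the names xs, ys, xs_lc and ys_lc are illustrative only). It assumes Python 3 and shows that zip(*data) gives the same columns as the list comprehensions above.

```python
# Made-up sample points standing in for the question's data list
data = [(1.0, 2.5), (2.0, 4.1), (3.0, 6.2)]

# Transpose with zip: unpacking works on the iterator, indexing would not
xs, ys = zip(*data)

# Equivalent list comprehensions
xs_lc = [p[0] for p in data]
ys_lc = [p[1] for p in data]

print(xs)  # (1.0, 2.0, 3.0)
print(ys)  # (2.5, 4.1, 6.2)
print(list(xs) == xs_lc and list(ys) == ys_lc)  # True
```

Either pair can then be passed to scatter as the x and y arguments.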

And as eknumbat said, you can use numpy.array, which gives you more options, such as slicing out a whole column with mat[:,0].
