
I am a Python newbie struggling with how to import a CSV file and plot it with matplotlib.pyplot. I would like to see the relationship between hour (= how many hours people spent playing a video game) and level (= game level), and then draw a scatter plot in different colors for female (1) and male (0). So my x would be 'hour' and my y would be 'level'.

My CSV data file looks like this:

          hour gender level
0            8    1   20.00
1            9    1   24.95
2           12    0   10.67
3           12    0   18.00
4           12    0   17.50
5           13    0   13.07
6           10    0   14.45
...
...
499         12    1  19.47
500         16    0  13.28

Here's my code:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df=pd.read_csv('data.csv')
plt.plot(x,y, lavel='some relationship')
plt.title("Some relationship")
plt.xlabel('hour')
plt.ylabel('level')
plt.plot[gender(gender=1), '-b', label=female]
plt.plot[gender(gender=0), 'gD', label=male]
plt.axs()
plt.show()

I would like to draw the following graph, with two lines: one for male and one for female.

y=level|           @----->male
       | @
       | *         *----->female
       |________________ x=hour

However, I am not sure how to solve this problem. I keep getting the error NameError: name 'hour' is not defined.

  • You probably forgot commas around hour, so Python is looking for a variable named hour instead of interpreting it as a string. Commented Dec 4, 2017 at 11:39
  • You need to mind the syntax of Python. [ is different from (. Commented Dec 4, 2017 at 11:42
  • @maxou Thank you, I am not sure where to type commas Commented Dec 5, 2017 at 9:49
  • @ImportanceOfBeingErnest Yes! Good thing to know. Thank you. Commented Dec 5, 2017 at 9:49

1 Answer

You could do it this way:

import matplotlib.pyplot as plt
import pandas as pd

# Sample rows from the question; gender is coded 1 = female, 0 = male
df = pd.DataFrame(data={"hour": [8,9,12,12,12,13,10],
                        "gender": [1,1,0,0,0,0,0],
                        "level": [20, 24.95, 10.67, 18, 17.5, 13.07, 14.45]})

# Sort by hour so each line is drawn from left to right
df.sort_values("hour", ascending=True, inplace=True)

fig = plt.figure(dpi=80)
ax = fig.add_subplot(111, aspect='equal')

# One line per gender, selected with a boolean mask on the gender column
ax.plot(df.hour[df.gender==1], df.level[df.gender==1], c="red", label="female")
ax.plot(df.hour[df.gender==0], df.level[df.gender==0], c="blue", label="male")
ax.set_xlabel('hour')
ax.set_ylabel('level')
ax.legend()
plt.show()
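
Since the question asks for a scatter plot rather than connected lines, here is a minimal sketch of that variant. It assumes the coding from the question (1 = female, 0 = male) and reuses the seven sample rows; the colors, the `styles` dictionary, and the output file name are arbitrary choices for illustration, not anything from the original post.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line and call plt.show() for a window
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question; a real run would load the CSV instead.
df = pd.DataFrame({
    "hour": [8, 9, 12, 12, 12, 13, 10],
    "gender": [1, 1, 0, 0, 0, 0, 0],
    "level": [20.00, 24.95, 10.67, 18.00, 17.50, 13.07, 14.45],
})

# Coding taken from the question: 1 = female, 0 = male. Colors are arbitrary.
styles = {1: ("female", "red"), 0: ("male", "blue")}

fig, ax = plt.subplots()
for code, (name, color) in styles.items():
    # Boolean mask keeps only the rows for this gender code.
    subset = df[df["gender"] == code]
    ax.scatter(subset["hour"], subset["level"], color=color, label=name)

ax.set_xlabel("hour")
ax.set_ylabel("level")
ax.set_title("Some relationship")
ax.legend()
fig.savefig("hour_vs_level.png")  # hypothetical output file name
```

With scatter there is no need to sort by hour first, because the points are not connected.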

7 Comments

Thanks, but the number of each variable would be 500 not just 7. I would like to load the variable from my csv file.
@user What is the problem with loading it and then doing all the steps in my answer? You just have to replace df = pd.DataFrame(data... with df=pd.read_csv('data.csv')
Yes, I did it, but the error message said "NameError: name 'hour' is not defined."
Traceback (most recent call last): ax.plot(df.hour[df.gender==1], df.level[df.gender==1], c="red", label="male") in getattr return object.__getattribute__(self, name) AttributeError: 'DataFrame' object has no attribute 'hour'
as you said, I just replaced df=pd.DataFrame(data...) with df=pd.read_csv("data.csv")
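
One possible cause of the AttributeError above, offered as a guess because the actual file is not shown: if data.csv is separated by whitespace (as the listing in the question suggests) rather than by commas, pd.read_csv with its default separator reads every line as a single column, so no column named hour exists. The sketch below uses an in-memory stand-in for the file to show the difference; the sample text and the variable names `raw`, `wrong`, and `females` are assumptions for illustration.

```python
import io
import pandas as pd

# Stand-in for data.csv, ASSUMING it is whitespace-separated as displayed in the question.
raw = io.StringIO(
    "hour gender level\n"
    "8 1 20.00\n"
    "9 1 24.95\n"
    "12 0 10.67\n"
)

# Default comma separator: the whole header line becomes one column name.
wrong = pd.read_csv(raw)
print(list(wrong.columns))

raw.seek(0)
# sep=r"\s+" splits on runs of whitespace instead of commas.
df = pd.read_csv(raw, sep=r"\s+")
# Strip stray spaces in case the header names carry padding.
df.columns = df.columns.str.strip()
print(list(df.columns))

# Column access now works, so the boolean masks from the answer apply.
females = df[df["gender"] == 1]
print(females)
```

Printing df.columns right after loading the real file is a quick way to check which of these cases applies before plotting.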