2

So I have this code based on a simple data array that looks like this:

    5020 : 2015 7 11 11 42 54 782705
    5020 : 2015 7 11 11 44 55 575776
    5020 : 2015 7 11 11 46 56 560755
    5020 : 2015 7 11 11 48 57 104872

and the plotting code looks like the following:

    import scipy as sp
    import matplotlib.pyplot as plt
    data = sp.genfromtxt("E:/Python/data.txt", delimiter=" : ")
    x = data[:,0]
    y = data[:,1]
    plt.scatter(x,y)
    plt.title("Instagram")
    plt.xlabel("Time")
    plt.ylabel("Followers")
    plt.xticks([w*2*60 for w in range(10)],
    ['2-minute interval %i'%w for w in range(10)])
    plt.autoscale(tight=True)
    plt.grid()
    plt.show()

I'm looking for a simple way to use the datetime output as x intervals on the graph. I can't figure out how to make the plot understand the timestamps, and there's also this:

    In [15]:sp.sum(sp.isnan(y))
    Out[15]: 77

Which I guess is because of the spaces? I'm new to machine learning in Python, forgive my ignorance.

Thank you very much.

2 Answers

1

I would solve this by directly passing datetime.datetime objects to pyplot. Here is a short example:

    import datetime as dt
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates

    # Note: reading the data from the file is left to you
    x = [dt.datetime(2015, 7, 11, 11, 42, 54),
         dt.datetime(2015, 7, 11, 11, 44, 55),
         dt.datetime(2015, 7, 11, 11, 46, 56),
         dt.datetime(2015, 7, 11, 11, 48, 57)]

    # define the x limits
    xstart = dt.datetime(2015, 7, 11, 11, 40, 54)
    xstop = dt.datetime(2015, 7, 11, 11, 50, 54)

    y = [782705, 575776, 560755, 104872]

    fig, ax = plt.subplots()
    ax.scatter(x, y)
    # '%m/%d/%y' is the portable spelling of '%D'
    xfmt = mdates.DateFormatter('%m/%d/%y %H:%M:%S')
    ax.xaxis.set_major_formatter(xfmt)
    ax.set_title("Instagram")
    ax.set_xlabel("Time")
    ax.set_ylabel("Followers")
    ax.set_xlim(xstart, xstop)
    plt.xticks(rotation='vertical')
    plt.show()

0

Yes, it's because of the spaces. When the data is imported, the timestamp column (y in your code) can't be converted to a single number, so every entry is set to NaN.
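To see why the spaces matter, here is a minimal sketch in plain Python. It only imitates the behaviour of a lenient loader that falls back to NaN when a field is not a number; `to_float_or_nan` is a hypothetical helper written for this illustration, not the library's actual code.

```python
# One sample row from the question.
line = "5020 : 2015 7 11 11 42 54 782705"

# Splitting on " : " gives two text fields.
left, right = line.split(" : ")

def to_float_or_nan(text):
    """Hypothetical helper: return NaN when text is not a single number."""
    try:
        return float(text)
    except ValueError:
        return float("nan")

count = to_float_or_nan(left)   # "5020" is a valid number
stamp = to_float_or_nan(right)  # "2015 7 11 ..." contains spaces, so it is not
print(count, stamp)             # prints: 5020.0 nan
```

The second field holds seven numbers separated by spaces, so it cannot become one float, which matches the NaN count reported in the question.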

Try this, it's a little longer but should work:

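    x = []  # time in minutes: minute field + (seconds + fraction) / 60
    y = []  # the number before the colon

    with open('data.txt', 'r') as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            value, stamp = line.split(':')
            # stamp fields: year month day hour minute second fraction
            fields = stamp.split()  # split() also drops the trailing '\n'
            y.append(int(value))
            x.append(int(fields[4]) + float(fields[5] + '.' + fields[6]) / 60)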

Because of the spaces, I had to convert the data to float manually. I divided the seconds (including the fractional part from the last field) by 60 and added the result to the minutes, since I'm assuming you're only interested in that (2-minute intervals).

If the timestamp were written in a more standard format, you could parse it with datetime and extract the information more easily. For example:

    from datetime import datetime
    my_time = datetime.strptime('2015 7 11 11:42:54.782705', '%Y %m %d %H:%M:%S.%f')
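As a rough sketch, strptime can probably also read the question's existing space-separated layout directly, which yields datetime objects that can be passed to matplotlib as in the other answer. The helper name `parse_row` is hypothetical, and treating the last field as microseconds is an assumption about the data.

```python
from datetime import datetime

# Sample rows copied from the question; in practice they would be read from the file.
lines = [
    "5020 : 2015 7 11 11 42 54 782705",
    "5020 : 2015 7 11 11 44 55 575776",
    "5020 : 2015 7 11 11 46 56 560755",
    "5020 : 2015 7 11 11 48 57 104872",
]

def parse_row(line):
    """Hypothetical helper: split one 'value : timestamp' row into (int, datetime)."""
    value, stamp = line.split(":")
    # Spaces in the format match the spaces between the fields;
    # %f reads the last field as microseconds (an assumption about the data).
    when = datetime.strptime(stamp.strip(), "%Y %m %d %H %M %S %f")
    return int(value), when

rows = [parse_row(line) for line in lines]
values = [v for v, _ in rows]
times = [t for _, t in rows]
print(times[0])  # prints: 2015-07-11 11:42:54.782705
```

Parsing by format string avoids fixed character positions, so a two-digit month or a missing trailing newline does not shift the fields.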

2 Comments

I tried this and got: ValueError: invalid literal for float(): 2015 7 11 11 42 54 782705 EDIT: it's because of a non printable char '\n'
Ok then, I edited my post thinking it was another conversion factor. Glad it worked.
