I'm a numpy newbie, using numpy 1.10.2 and Python 2.7.6 on Linux. I have a file of 17M datetimes, like "2015-12-24 03:39:02.012". I want to plot the differences, d[n]-d[n-1], as a function of time.

What is the numpy-ish way to load this file into an ndarray, and then the matplotlib way to plot the diff against the datetime? (It doesn't matter whether each diff is paired with the earlier or the later timestamp.)

I don't need a blinding speed hack; I'd rather learn the idiomatic numpy techniques.

Data looks like:

2015-12-24 03:39:02.009
2015-12-24 03:39:02.012
2015-12-24 03:39:02.015
2015-12-24 03:39:02.018
2015-12-24 03:39:02.021
2015-12-24 03:39:02.024
2015-12-24 03:39:02.027
2015-12-24 03:39:02.030
2015-12-24 03:39:02.033
2015-12-24 03:39:02.036
2015-12-24 03:39:02.039
2015-12-24 03:39:02.042
2015-12-24 03:39:02.045
2015-12-24 03:39:02.048
2015-12-24 03:39:02.051
2015-12-24 03:39:02.054
2015-12-24 03:39:02.057
2015-12-24 03:39:02.060
2015-12-24 03:39:02.063
2015-12-24 03:39:02.066

... 17M lines

So, to be clear, I want to plot something like

datetime64(2015-12-24 03:39:02.009), 3 # second datetime-first datetime
datetime64(2015-12-24 03:39:02.012), 3 # third datetime-second datetime
datetime64(2015-12-24 03:39:02.015), 3 # fourth datetime-third datetime

...

What I'm really looking for is spikes in the interval and what time the spikes happened.

  • Can you use pandas? This is wonderfully easy in pandas. Commented Dec 28, 2015 at 18:35
  • I have pandas installed but haven't tried it yet. A pandas solution would be welcome, though I also hope to learn some raw numpy. Commented Dec 28, 2015 at 18:42
  • Cool. Now post 10-20 lines of your file so people have something to test on. Commented Dec 28, 2015 at 18:43

1 Answer


Pandas can read the file in one line:

from matplotlib import pyplot as plt
import pandas as pd

df = pd.read_csv('data.txt', header=None, parse_dates=[0], names=['date'])

The result looks like this:

(screenshot of the resulting DataFrame)

Calculate the difference between consecutive rows:

diff = df[1:] - df.shift()[1:]
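
The same d[n]-d[n-1] difference can probably be written more directly with `Series.diff()`. The following is only a minimal sketch: the in-memory `sample` text stands in for `data.txt`, and the names `sample`, `intervals` and `interval_seconds` are made up for illustration.

```python
import io

import pandas as pd

# A few sample rows standing in for data.txt (assumption: one timestamp per line).
sample = io.StringIO(
    "2015-12-24 03:39:02.009\n"
    "2015-12-24 03:39:02.012\n"
    "2015-12-24 03:39:02.015\n"
    "2015-12-24 03:39:02.021\n"
)
df = pd.read_csv(sample, header=None, parse_dates=[0], names=['date'])

# Series.diff() computes d[n] - d[n-1]; the first entry has no predecessor
# and comes out as NaT, so it is dropped here.
intervals = df['date'].diff().dropna()

# Timedelta values converted to float seconds.
interval_seconds = intervals.dt.total_seconds()
```

Each remaining row of `intervals` keeps the index of the later of the two timestamps it was computed from.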

Plot the result:

plt.plot(df[1:], diff.values)

(plot of the differences over time)

You can convert the values into seconds:

seconds = diff.date.values.astype(float) / 1e9
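
Since the question also asks about raw numpy, here is a rough sketch of the same idea without pandas, including one simple way to locate spikes. It is not the method used above, just an illustration: the `lines` list stands in for the file contents, and the 10 ms `threshold` is an arbitrary, hypothetical value you would tune to your data.

```python
import numpy as np

# Sample rows standing in for the file contents (assumption); for a real file
# the list could come from something like open(path).read().splitlines().
lines = [
    "2015-12-24 03:39:02.009",
    "2015-12-24 03:39:02.012",
    "2015-12-24 03:39:02.015",
    "2015-12-24 03:39:02.050",
    "2015-12-24 03:39:02.053",
]

# Parse the strings into a datetime64 array with millisecond resolution.
stamps = np.array(lines, dtype='datetime64[ms]')

# np.diff gives d[n] - d[n-1] as a timedelta64[ms] array (one element shorter).
deltas = np.diff(stamps)

# Dividing by a one-second timedelta yields plain float seconds.
seconds = deltas / np.timedelta64(1, 's')

# Hypothetical threshold: treat any gap longer than 10 ms as a spike.
threshold = 0.010
spike_times = stamps[1:][seconds > threshold]
```

Here each interval is paired with the later timestamp (`stamps[1:]`), matching the `df[1:]` used above. Passing `stamps[1:]` and `seconds` to `plt.plot` should then give the interval-versus-time plot, assuming a matplotlib version that accepts datetime64 arrays directly.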

2 Comments

Yes, this works, thanks! Now I need to get the y values in seconds, like 0.00315, 0.00300, etc.
This is just one line. Added.
