I need to plot two independent columns: the first one represents data, the second one represents time:

All_packets = df.select("ip_adr_src", "asn_val", "timestamp")
EB_packets = All_packets.filter("asn_val is not NULL")
EB_packets.show()
plotdf = EB_packets.select("asn_val", "timestamp")

I want to plot asn_val over time, grouped by ip_adr_src. If I have 6 distinct ip_adr_src values, I expect to get 6 curves.

+--------------------+---------------------------------+-------------+
|     ip_adr_src     |asn_val                          |    timestamp|
+--------------------+---------------------------------+-------------+
|14:15:92:cc:00:01...|                              707|1539071748441|
|14:15:92:cc:00:02...|                             1212|1539071752314|
|14:15:92:cc:00:00...|                             1616|1539071755578|
|14:15:92:cc:00:04...|                             1818|1539071757167|
|14:15:92:cc:00:03...|                             2020|1539071759297|
|14:15:92:cc:00:00...|                             2121|1539071760408|
|14:15:92:cc:00:09...|                             2323|1539071764035|
|14:15:92:cc:00:07...|                             2424|1539071765775|
|14:15:92:cc:00:00...|                             2525|1539071768560|
|14:15:92:cc:00:06...|                             5858|1539071845370|
|14:15:92:cc:00:00...|                             6060|1539071850129|
|14:15:92:cc:00:05...|                             6262|1539071855046|
|14:15:92:cc:00:00...|                             6969|1539071872523|
|14:15:92:cc:00:07...|                             6969|1539071872528|
|14:15:92:cc:00:08...|                             7171|1539071877609|
+--------------------+---------------------------------+-------------+

But all my attempts fail, and I get this error:

Dataframe doesn't have an object `'plot'`

I would be very grateful if you could help me.

1 Answer

I'm not sure I understood which columns you want to plot, but I suspect you need help with plotting in general. This is how I would plot the asn_val column against the timestamp column:

import matplotlib.pyplot as plt

# df is assumed to contain no NULL asn_val rows (e.g. plotdf from the question).
# Collect once so each asn_val stays paired with its own timestamp.
rows = df.select('timestamp', 'asn_val').collect()
x_ts = [row.timestamp for row in rows]
y_asn_val = [row.asn_val for row in rows]

plt.plot(x_ts, y_asn_val)

plt.ylabel('asn_val')
plt.xlabel('timestamp')
plt.title('ASN values over time')
plt.legend(['asn_val'], loc='upper left')

plt.show()

If you need to plot other columns, call plt.plot(x, y) once per column and add each column name to the list passed to plt.legend(your_cols, loc='upper left').
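For the one-curve-per-ip_adr_src case from the question, a minimal sketch could group the collected rows by source before plotting. The helper names group_by_source and plot_by_source are hypothetical, and so are the node-0/node-1 labels (the real addresses are truncated in the output shown in the question). The sample rows stand in for something like EB_packets.collect(), assuming the column order ip_adr_src, asn_val, timestamp.

```python
from collections import defaultdict


def group_by_source(rows):
    """Group (ip_adr_src, asn_val, timestamp) rows into one time-sorted series per source."""
    series = defaultdict(list)
    for ip, asn, ts in rows:
        series[ip].append((ts, asn))
    # Sort each series by timestamp so the curve is drawn left to right.
    return {ip: sorted(points) for ip, points in series.items()}


def plot_by_source(series):
    """Draw one labelled curve per source address."""
    # Imported here so the grouping helper can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    for ip, points in sorted(series.items()):
        xs = [ts for ts, _ in points]
        ys = [asn for _, asn in points]
        plt.plot(xs, ys, marker='o', label=ip)

    plt.xlabel('timestamp')
    plt.ylabel('asn_val')
    plt.title('ASN values over time, per source')
    plt.legend(loc='upper left')
    plt.show()


# Placeholder rows; with Spark this could be: rows = EB_packets.collect()
rows = [
    ("node-1", 707, 1539071748441),
    ("node-0", 2121, 1539071760408),
    ("node-0", 1616, 1539071755578),
]

series = group_by_source(rows)
# To display the figure: plot_by_source(series)
```

Collecting to the driver is only reasonable when the filtered data is small; for larger data, aggregating or sampling in Spark first would likely be needed.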
