I want to plot a time series using data from several different pandas DataFrames. I know how to plot a single time series and how to create subplots, but how can I plot data from several DataFrames on a single plot? My code is below. I scan a folder of JSON files and parse each file into a DataFrame so that I can plot it. When I run the code, it plots data from only one of the DataFrames instead of all ten. I know that ten DataFrames are created because I print each one to check.

import sys, re
import numpy as np
import smtplib
import matplotlib.pyplot as plt
from random import randint
import csv
import pylab as pl
import math
import pandas as pd
from pandas.tools.plotting import scatter_matrix
import argparse
import matplotlib.patches as mpatches
import os
import json



parser = argparse.ArgumentParser()
parser.add_argument('-file', '--f', help = 'folder where JSON files are stored')
if len(sys.argv) == 1:
    parser.print_help()
    sys.exit(1)
args = parser.parse_args()


dat = {}
i = 0

direc = args.f
directory = os.fsencode(direc)

fig1 = plt.figure()
ax1 = fig1.add_subplot(111)

for files in os.listdir(direc):
    filename = os.fsdecode(files)
    if filename.endswith(".json"):
        path = '/Users/Katie/Desktop/Work/' + args.f + "/" +filename
        with open(path, 'r') as data_file:
            data = json.load(data_file)
            for r in data["commits"]:
                dat[i] = (r["author_name"], r["num_deletions"], r["num_insertions"], r["num_lines_changed"],
                          r["num_files_changed"], r["author_date"])
                name = "df" + str(i).zfill(2)
                i = i + 1
                name = pd.DataFrame.from_dict(dat, orient='index').reset_index()
                name.columns = ["index", "author_name", "num_deletions",
                                          "num_insertions", "num_lines_changed",
                                          "num_files_changed",  "author_date"]
                del name['index']
                name['author_date'] = name['author_date'].astype(int)
                name['author_date'] =  pd.to_datetime(name['author_date'], unit='s')
                ax1.plot(name['author_date'], name['num_lines_changed'], '*',c=np.random.rand(3,))
                print(name)
                continue

    else:
        continue
plt.xticks(rotation='35')
plt.title('Number of Lines Changed vs. Author Date')
plt.show()

4 Answers

Quite straightforward, actually. Don't let pandas confuse you: underneath, every column is just a NumPy array, so you can pass columns from different DataFrames to the same axes.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df1 = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))

fig1 = plt.figure()
ax1 = fig1.add_subplot(111)

# Both calls draw on the same axes, so the two lines share one plot
ax1.plot(df1['A'])
ax1.plot(df2['B'])

plt.show()

3 Comments

Could you see my edit above with my code? I have tried all the methods listed and it only plots one of the pandas DataFrames.
Kate, honestly your code above is not very "pythonic" and does not look optimised. Why use a DataFrame if you overwrite it on every iteration of the loop? It gives no benefit over a dict or an array and is in fact slower. Alternatively, you could save every "name" as a column in one large DataFrame and then plot that. So, short answer: the issue is most likely with your loop and name assignments. You may think you refer to a different "name" each time while in fact it is always the last one, i.e. shallow vs. deep copy.
I was finally able to get it. My loops were off, but now I have it working and it is efficient enough for what I need.
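
For readers hitting the same problem, here is a minimal sketch of the loop restructuring discussed in the comments above. It is not the asker's final code (that was never posted). It assumes each JSON file has a "commits" list with the keys shown in the question and that author_date is a Unix timestamp in seconds. The helper names load_commits and plot_folder are hypothetical.

```python
import json
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd


def load_commits(json_path):
    """Build one DataFrame from the "commits" list of a single JSON file."""
    with open(json_path, "r") as fh:
        data = json.load(fh)
    df = pd.DataFrame(data["commits"])
    # Assumption: author_date holds Unix timestamps in seconds, as in the question
    df["author_date"] = pd.to_datetime(df["author_date"].astype(int), unit="s")
    return df


def plot_folder(folder):
    """Plot num_lines_changed vs. author_date for every JSON file on one axes."""
    fig, ax = plt.subplots()
    for json_path in sorted(Path(folder).glob("*.json")):
        df = load_commits(json_path)  # a fresh DataFrame per file, built once
        ax.plot(df["author_date"], df["num_lines_changed"], "*",
                label=json_path.name)
    ax.set_title("Number of Lines Changed vs. Author Date")
    ax.tick_params(axis="x", labelrotation=35)
    ax.legend()
    return ax
```

Calling plot_folder(args.f) followed by plt.show() would draw one set of markers per file. The DataFrame is built once per file, after all of its commits are read, and not once per commit inside the inner loop.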

The pd.DataFrame.plot method has an ax argument for this:

fig = plt.figure()
ax = plt.subplot(111)
df1['Col1'].plot(ax=ax)
df2['Col2'].plot(ax=ax)
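
The snippet above assumes df1 and df2 already exist. A self-contained sketch of the same idea, with made-up random data and illustrative column names and labels, might look like this:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data; the two frames deliberately have different lengths
rng = np.random.default_rng(0)
df1 = pd.DataFrame({"Col1": rng.integers(0, 100, size=50)})
df2 = pd.DataFrame({"Col2": rng.integers(0, 100, size=80)})

fig, ax = plt.subplots()
# Passing the same ax to both calls draws both series on one plot
df1["Col1"].plot(ax=ax, label="df1 Col1")
df2["Col2"].plot(ax=ax, label="df2 Col2")
ax.legend()
```

In a script, add plt.show() at the end to display the figure.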

1 Comment

Could you see my edit above with my code? I have tried all the methods listed and it only plots one of the pandas DataFrames.

If you are using the pandas plot method, DataFrame.plot returns a matplotlib axes object, so you can pass that axes to the next DataFrame.plot call through its ax argument.

import numpy as np
import pandas as pd

# pd.np was removed in pandas 2.0, so NumPy is imported directly
df1 = pd.DataFrame({'Frame 1': np.arange(5)*2}, index=np.arange(5))

df2 = pd.DataFrame({'Frame 2': np.arange(5)*.5}, index=np.arange(5))

ax = df1.plot(label='df1')
df2.plot(ax=ax)

Or if your dataframes have the same index, you can use pd.concat:

pd.concat([df1,df2],axis=1).plot()
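
As a rough illustration of why the shared index matters, here is a sketch with made-up frames of different lengths. pd.concat with axis=1 aligns rows on the index, so the shorter frame is padded with NaN where it has no matching index label.

```python
import numpy as np
import pandas as pd

# Made-up frames: df2 covers only the first three index labels of df1
df1 = pd.DataFrame({"Frame 1": np.arange(5) * 2}, index=np.arange(5))
df2 = pd.DataFrame({"Frame 2": np.arange(3) * 0.5}, index=np.arange(3))

# Rows are aligned on the index; missing positions become NaN
combined = pd.concat([df1, df2], axis=1)
ax = combined.plot()
```

The plot still contains one line per column, but the "Frame 2" line stops where its data ends.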

6 Comments

Could you see my edit above with my code? I have tried all the methods listed and it only plots one of the pandas DataFrames.
Is this only possible if both DataFrames have the same dimensions?
@RogérioOliveira Nope, this still works if the DataFrames have different dimensions.
I didn't know about that, @ScottBoston. Indeed, I'm a newcomer in pythonland. Currently, my data is stored in something like a JSON object: df = [{'x': [], 'y':[]}, {'x': [], 'y':[]}, {'x': [], 'y':[]}]. I can plot the entries one by one in the following way: DataFrame(df[0]).plot(x='x', y='y', kind='line'), DataFrame(df[1]).plot(x='x', y='y', kind='line'), and so on. It works fine with subplots, but I have no idea how to overlay them. Could you give me some tips? Where should I start to get the effect shown in your figure? Thanks a lot.
@RogérioOliveira Post a new question with this information, sample data, and expected output. That is a good start. I am sure the SO community will help.
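
One possible way to overlay a list of x/y records like the one described in the comment above is to reuse the axes returned by the first plot call. This is only a sketch: the data is made up, and the names records and "series i" are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up records in the shape described above; lengths differ on purpose
records = [
    {"x": [0, 1, 2, 3], "y": [0, 1, 4, 9]},
    {"x": [0, 2, 4], "y": [5, 3, 1]},
    {"x": [1, 2, 3, 4, 5], "y": [2, 2, 3, 3, 4]},
]

ax = None
for i, rec in enumerate(records):
    # With ax=None the first call creates the axes; later calls reuse it
    ax = pd.DataFrame(rec).plot(x="x", y="y", kind="line", ax=ax,
                                label=f"series {i}")
```

Because every call after the first receives the same ax, all the lines land on one plot even though the records have different lengths.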

Trust me: @omdv's answer is the only solution that has worked for me so far. When I passed ax to the pandas DataFrame plot function, nothing was displayed at all.

df_hdf = pd.read_csv(f_hd, header=None,names=['degree', 'rank', 'hits'],
            dtype={'degree': np.int32, 'rank': np.float32, 'hits': np.float32})
df_hdf_pt = pd.read_csv(pt_f_hd, header=None,names=['degree', 'rank', 'hits'],
            dtype={'degree': np.int32, 'rank': np.float32, 'hits': np.float32})
ax = plt.subplot()
ax.plot(df_hdf_pt['hits'])
ax.plot(df_hdf['hits'])
