Plotting data from multiple pandas data frames in one plot

Question

I am interested in plotting a time series with data from several different pandas DataFrames. I know how to plot data for a single time series and I know how to do subplots, but how would I plot data from several different DataFrames in a single plot? My code is below. Basically, I scan through a folder of JSON files and parse each file into a pandas DataFrame so that I can plot it. When I run this code, it only plots data from one of the DataFrames instead of all ten that are created. I know that ten DataFrames are created because I have a print statement to check that they are all correct.

import sys
import os
import json
import argparse
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt



parser = argparse.ArgumentParser()
parser.add_argument('-file', '--f', help = 'folder where JSON files are stored')
if len(sys.argv) == 1:
    parser.print_help()
    sys.exit(1)
args = parser.parse_args()


dat = {}
i = 0

direc = args.f
directory = os.fsencode(direc)

fig1 = plt.figure()
ax1 = fig1.add_subplot(111)

for files in os.listdir(direc):
    filename = os.fsdecode(files)
    if filename.endswith(".json"):
        path = os.path.join(direc, filename)  # same folder that os.listdir() scanned
        with open(path, 'r') as data_file:
            data = json.load(data_file)
            for r in data["commits"]:
                dat[i] = (r["author_name"], r["num_deletions"], r["num_insertions"], r["num_lines_changed"],
                          r["num_files_changed"], r["author_date"])
                name = "df" + str(i).zfill(2)
                i = i + 1
                name = pd.DataFrame.from_dict(dat, orient='index').reset_index()
                name.columns = ["index", "author_name", "num_deletions",
                                          "num_insertions", "num_lines_changed",
                                          "num_files_changed",  "author_date"]
                del name['index']
                name['author_date'] = name['author_date'].astype(int)
                name['author_date'] =  pd.to_datetime(name['author_date'], unit='s')
                # Plots the cumulative frame (every commit read so far, across all files)
                ax1.plot(name['author_date'], name['num_lines_changed'], '*', c=np.random.rand(3,))
                print(name)
                continue

    else:
        continue
plt.xticks(rotation=35)
plt.title('Number of Lines Changed vs. Author Date')
plt.show()

Oleg Medvedyev · Accepted Answer · 2017-06-23 20:38:17Z

8

Quite straightforward, actually. Don't let pandas confuse you: underneath, every column is just a NumPy array, so columns from different DataFrames can be passed to the same matplotlib axes.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df1 = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))

fig1 = plt.figure()
ax1 = fig1.add_subplot(111)

ax1.plot(df1['A'])  # column A of df1
ax1.plot(df2['B'])  # column B of df2, drawn on the same axes
plt.show()
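The same idea scales to the question's case of ten frames: loop over the frames and call ax.plot once per frame, so every call adds a line to the same axes. A minimal sketch follows, assuming the frames are collected in a dict keyed by a legend label; plot_frames and the sample columns t and v are hypothetical names, not part of the answer.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_frames(frames, x, y, ax=None):
    """Plot column y against column x for every DataFrame in `frames` on one Axes.

    `frames` is a dict mapping a legend label to a DataFrame (an assumption
    made for this sketch).
    """
    if ax is None:
        _, ax = plt.subplots()
    for label, df in frames.items():
        # Every ax.plot call adds another Line2D to the same Axes,
        # so nothing drawn earlier is replaced.
        ax.plot(df[x], df[y], marker="*", linestyle="none", label=label)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend()
    return ax


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical sample data: ten small frames standing in for the question's frames.
    frames = {
        f"df{i:02d}": pd.DataFrame({"t": np.arange(5), "v": rng.integers(0, 100, 5)})
        for i in range(10)
    }
    plot_frames(frames, "t", "v")
    plt.show()
```

Because matplotlib's default color cycle advances on each call, the frames get distinct colors without passing random ones.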

answered Jun 23, 2017 at 20:38

Oleg Medvedyev

1,604 · 15 silver badges · 17 bronze badges


3 Comments

K22 Over a year ago

Could you look at the edit above with my code? I have tried all of the methods listed and it only plots one of the pandas DataFrames.

Oleg Medvedyev Over a year ago

Kate, honestly, your code above is not very "pythonic" and does not look optimised. Why do you need a DataFrame if you overwrite it every time through the loop? It provides no benefit over a dict or array and is in fact slower. Alternatively, you can save every "name" as a column in one large DataFrame and then plot it. So, short answer: the issue is most likely with your loop and name assignments. You may think you are referring to different "name" objects while in fact it is always the last one, i.e. shallow vs. deep copy.

K22 Over a year ago

I was finally able to get it. My loops were off, but now I have it working and it is efficient enough for what I need.
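The asker's actual fix is not shown. As a minimal sketch of one way to fix the loop, assuming each JSON file has the question's "commits" structure with author_date stored in Unix seconds: build a fresh DataFrame per file (instead of reusing the ever-growing dat dict), and plot it only once the whole file has been read. The helper names load_commit_frames and plot_lines_changed are hypothetical.

```python
import json
import os
import tempfile

import pandas as pd
import matplotlib.pyplot as plt

COLUMNS = ["author_name", "num_deletions", "num_insertions",
           "num_lines_changed", "num_files_changed", "author_date"]


def load_commit_frames(directory):
    """Return {filename: DataFrame}, one DataFrame per .json file in `directory`.

    Assumes each file looks like {"commits": [{...}, ...]} with the keys in
    COLUMNS and author_date as Unix seconds (taken from the question's code).
    """
    frames = {}
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith(".json"):
            continue
        with open(os.path.join(directory, filename)) as fh:
            data = json.load(fh)
        # A fresh list per file, so commits from earlier files are not carried over.
        rows = [{col: commit[col] for col in COLUMNS} for commit in data["commits"]]
        df = pd.DataFrame(rows, columns=COLUMNS)
        df["author_date"] = pd.to_datetime(df["author_date"].astype("int64"), unit="s")
        frames[filename] = df
    return frames


def plot_lines_changed(frames):
    """Plot num_lines_changed vs. author_date for every frame on one shared Axes."""
    _, ax = plt.subplots()
    for name, df in frames.items():  # one plot call per complete file
        ax.plot(df["author_date"], df["num_lines_changed"], "*", label=name)
    ax.set_title("Number of Lines Changed vs. Author Date")
    ax.tick_params(axis="x", labelrotation=35)
    ax.legend()
    return ax


if __name__ == "__main__":
    # Hypothetical sample files written to a temporary folder for the demo.
    with tempfile.TemporaryDirectory() as tmp:
        for n in range(3):
            commits = [{"author_name": "a", "num_deletions": 1, "num_insertions": 2,
                        "num_lines_changed": 3 + n + k, "num_files_changed": 1,
                        "author_date": 1_500_000_000 + 86_400 * k} for k in range(4)]
            with open(os.path.join(tmp, f"repo{n}.json"), "w") as fh:
                json.dump({"commits": commits}, fh)
        frames = load_commit_frames(tmp)
    plot_lines_changed(frames)
    plt.show()
```

Keeping the frames in a dict also makes Oleg's alternative easy: pd.concat(frames) would stack them into one large DataFrame keyed by filename.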

Sergey Sergienko · Accepted Answer · 2017-06-23 20:40:29Z

4

The pd.DataFrame.plot method (like Series.plot) has an ax argument for this:

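import matplotlib.pyplot as plt

# df1 and df2 are existing DataFrames with columns 'Col1' and 'Col2'
fig = plt.figure()
ax = plt.subplot(111)
df1['Col1'].plot(ax=ax)  # draw on the shared axes
df2['Col2'].plot(ax=ax)  # same axes, so both lines end up in one plot
plt.show()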
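Combined with a loop, the ax argument covers any number of frames. A sketch, assuming the frames sit in a dict mapping a legend label to a DataFrame; overlay_with_pandas is a hypothetical helper name. Passing label together with a single y column sets that call's legend entry.

```python
import pandas as pd
import matplotlib.pyplot as plt


def overlay_with_pandas(frames, x, y):
    """Call DataFrame.plot once per frame, always passing the same Axes."""
    _, ax = plt.subplots()
    for name, df in frames.items():
        # ax=ax draws into the existing Axes instead of opening a new figure;
        # with a single y column, label=name becomes that line's legend entry.
        df.plot(x=x, y=y, ax=ax, label=name)
    ax.legend()
    return ax


if __name__ == "__main__":
    # Hypothetical frames of different lengths and x ranges.
    frames = {
        "first": pd.DataFrame({"t": [0, 1, 2], "v": [1.0, 2.0, 3.0]}),
        "second": pd.DataFrame({"t": [0, 1], "v": [4.0, 5.0]}),
        "third": pd.DataFrame({"t": [5, 6, 7, 8], "v": [0.0, 0.5, 1.0, 1.5]}),
    }
    overlay_with_pandas(frames, "t", "v")
    plt.show()
```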

answered Jun 23, 2017 at 20:40

Sergey Sergienko

365 · 4 silver badges · 8 bronze badges

1 Comment

K22 Over a year ago

Could you look at the edit above with my code? I have tried all of the methods listed and it only plots one of the pandas DataFrames.

Scott Boston · Accepted Answer · 2017-06-23 20:48:13Z

2

If you are using pandas plotting, DataFrame.plot returns the matplotlib Axes it drew on, so you can pass that Axes to the next DataFrame.plot call through its ax argument.

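import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame({'Frame 1': np.arange(5) * 2}, index=np.arange(5))
df2 = pd.DataFrame({'Frame 2': np.arange(5) * .5}, index=np.arange(5))

ax = df1.plot(label='df1')  # DataFrame.plot returns the Axes it drew on
df2.plot(ax=ax)             # draw df2 on that same Axes
plt.show()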

Output: (figure not preserved)

Or if your dataframes have the same index, you can use pd.concat:

pd.concat([df1,df2],axis=1).plot()
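pd.concat(..., axis=1) aligns the pieces on their index with an outer join, so inputs whose indexes differ still combine; index values missing from one input become NaN in its column. A small sketch, assuming the inputs are Series held in a dict (the dict keys become the column names); combine_on_index is a hypothetical helper.

```python
import pandas as pd
import matplotlib.pyplot as plt


def combine_on_index(series_by_name):
    """Put named Series side by side, aligned on their index (outer join).

    Dict keys become the column names; index values missing from one
    Series become NaN in that column.
    """
    return pd.concat(series_by_name, axis=1)


if __name__ == "__main__":
    # Hypothetical inputs whose indexes only partly overlap.
    s1 = pd.Series([1.0, 2.0, 3.0], index=[0, 1, 2])
    s2 = pd.Series([10.0, 20.0, 30.0], index=[1, 2, 3])
    wide = combine_on_index({"a": s1, "b": s2})
    print(wide)
    wide.plot(marker="o")  # markers keep isolated points visible next to NaN gaps
    plt.show()
```

The single .plot() call then draws one line per column on one set of axes; NaN rows are left as gaps rather than interpolated.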

answered Jun 23, 2017 at 20:48

Scott Boston

154k · 15 gold badges · 160 silver badges · 207 bronze badges

6 Comments

K22 Over a year ago

Could you look at the edit above with my code? I have tried all of the methods listed and it only plots one of the pandas DataFrames.

Rogério Oliveira Over a year ago

Is it only possible if both Dataframes have the same dimension?

Scott Boston Over a year ago

@RogérioOliveira Nope, if the dataframes have different dimension this still works.

Rogério Oliveira Over a year ago

I didn't know that, @ScottBoston. I'm a newcomer to Python. Currently I have something like a JSON object that holds my data: df = [{'x': [], 'y':[]}, {'x': [], 'y':[]}, {'x': [], 'y':[]}], and I can plot its entries one by one like this: DataFrame(df[0]).plot(x='x', y='y', kind='line') DataFrame(df[1]).plot(x='x', y='y', kind='line') and so on. It works fine when using subplots, but I have no idea how to overlay them. Could you give me some tips? Where should I start to get the effect shown in your figure? Thanks a lot.

Scott Boston Over a year ago

@RogérioOliveira Post a new question with this information sample data and expected outputs. That is a good start. I am sure the SO community will help.
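For the list-of-dicts structure in the comment above, the ax technique from this answer applies directly: create one Axes and pass it to every DataFrame(...).plot call. A sketch, assuming each dict holds equal-length 'x' and 'y' lists; overlay_records and the "series i" legend labels are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt


def overlay_records(records):
    """Overlay several {'x': [...], 'y': [...]} dicts on one shared Axes."""
    _, ax = plt.subplots()
    for i, rec in enumerate(records):
        # Same call as in the comment, plus ax=ax so every line lands on one Axes.
        pd.DataFrame(rec).plot(x="x", y="y", kind="line", ax=ax, label=f"series {i}")
    return ax


if __name__ == "__main__":
    # Hypothetical data in the shape described in the comment.
    records = [
        {"x": [0, 1, 2], "y": [0, 1, 4]},
        {"x": [0, 1, 2, 3], "y": [3, 2, 1, 0]},
        {"x": [1, 2], "y": [2, 2]},
    ]
    overlay_records(records)
    plt.show()
```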


Catarina Ferreira · Accepted Answer · 2018-12-03 10:18:25Z

0

Trust me: @omdv's answer is the only solution that has worked for me so far. In my case, the pandas DataFrame plot function didn't show any plot at all when I passed ax to it.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# f_hd and pt_f_hd are paths to two headerless CSV files
cols = ['degree', 'rank', 'hits']
dtypes = {'degree': np.int32, 'rank': np.float32, 'hits': np.float32}
df_hdf = pd.read_csv(f_hd, header=None, names=cols, dtype=dtypes)
df_hdf_pt = pd.read_csv(pt_f_hd, header=None, names=cols, dtype=dtypes)

ax = plt.subplot()
ax.plot(df_hdf_pt['hits'])  # both columns are drawn on the same axes
ax.plot(df_hdf['hits'])
plt.show()

edited Dec 3, 2018 at 10:18

Catarina Ferreira

1,864 · 5 gold badges · 19 silver badges · 27 bronze badges

answered Dec 3, 2018 at 0:26

idleCoder

1
