I have a script that takes multiple .csv files and outputs multiple bar plots. The data are daily rainfall totals, so the x-axis is the date in datetime format %d %m %Y. As written, the code tries to label all 365 days and the x-axis gets clogged. What code can I use to show only one label per month, in the format "Jan 01" for example?

import pandas as pd
import time
import os
import matplotlib.pyplot as plt

files = ['w.pod.csv',
't.pod.csv',
'r.pod.csv',
'n.pod.csv',
'm.pod.csv',
'k.pod.csv',
'j.pod.csv',
'h.pod.csv',
'g.pod.csv',
'c.pod.csv',
'b.pod.csv']

for f in files:
    fn = f.split('.')[0]
    dat = pd.read_csv(f)
    df0 = dat.loc[:, ['TimeStamp', 'RF']]
    # Change time format
    df0["time"] = pd.to_datetime(df0["TimeStamp"])
    df0["day"] = df0['time'].map(lambda x: x.day)
    df0["month"] = df0['time'].map(lambda x: x.month)
    df0["year"] = df0['time'].map(lambda x: x.year)
    df0.to_csv('{}_1.csv'.format(fn), na_rep="0")  # write to csv

    # Combine for daily rainfall
    df1 = pd.read_csv('{}_1.csv'.format(fn), encoding='latin-1',
              usecols=['day', 'month', 'year', 'RF', 'TimeStamp'])
    df2 = df1.groupby(['day', 'month', 'year'], as_index=False).sum()
    df2.to_csv('{}_2.csv'.format(fn), na_rep="0", header=None)  # write to csv

    # parse date
    df3 = pd.read_csv('{}_2.csv'.format(fn), header=None, index_col='datetime',
             parse_dates={'datetime': [1,2,3]},
             date_parser=lambda x: pd.datetime.strptime(x, '%d %m %Y'))

    def dt_parse(date_string):
        dt = pd.datetime.strptime(date_string, '%d %m %Y')
        return dt

    # sort datetime
    df4 = df3.sort_index()
    final = df4.reset_index()

    # rename columns
    final.columns = ['date', 'bleh', 'rf']

    final[['date','rf']].plot(kind='bar')
    plt.suptitle('{} Rainfall 2015-2016'.format(fn), fontsize=20)
    plt.xlabel('Date', fontsize=18)
    plt.ylabel('Rain / mm', fontsize=16)
    plt.savefig('{}.png'.format(fn))

This is an extension of my previous question: Automate making multiple plots in python using several .csv files


1 Answer

It is not easy, but this works:

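import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

# sample df with dates of one year; rf values are random integers
np.random.seed(100)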
N = 365
start = pd.to_datetime('2015-02-24')
rng = pd.date_range(start, periods=N)

final = pd.DataFrame({'date': rng, 'rf': np.random.randint(50, size=N)})  
print (final.head())
        date  rf
0 2015-02-24   8
1 2015-02-25  24
2 2015-02-26   3
3 2015-02-27  39
4 2015-02-28  23

fn = 'suptitle'
# rot: rotation of the x-axis labels, in degrees
ax = final.plot(x='date', y='rf', kind='bar', rot=45)
plt.suptitle('{} Rainfall 2015-2016'.format(fn), fontsize=20)
plt.xlabel('Date', fontsize=18)
plt.ylabel('Rain / mm', fontsize=16)
# set custom format of dates
ticklabels = final.date.dt.strftime('%Y-%m-%d')
ax.xaxis.set_major_formatter(ticker.FixedFormatter(ticklabels))

# show only every 30th label; hide the others
spacing = 30
visible = ax.xaxis.get_ticklabels()[::spacing]
for label in ax.xaxis.get_ticklabels():
    if label not in visible:
        label.set_visible(False)

plt.show()
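
Showing every 30th label does not line the labels up with month boundaries. As a possible alternative, here is a minimal sketch that puts one label on the first bar of each month, in the "Jan 01" style the question asks for. The helper names month_start_ticks and plot_daily_rainfall are made up for illustration, and the sketch assumes final has the columns 'date' and 'rf' sorted by date, as in the sample above. It also relies on a pandas bar plot placing its bars at integer positions 0, 1, 2, ...

```python
from datetime import date, timedelta


def month_start_ticks(dates, fmt="%b %d"):
    """Return (positions, labels) for the first entry of each month.

    `dates` is an ordered sequence of date-like objects with .year,
    .month and .strftime (pandas Timestamps qualify). Positions are
    integer bar indices.
    """
    positions, labels = [], []
    previous = None
    for i, d in enumerate(dates):
        key = (d.year, d.month)
        if key != previous:
            # First bar of a new month: remember where it is and its label.
            positions.append(i)
            labels.append(d.strftime(fmt))
            previous = key
    return positions, labels


def plot_daily_rainfall(final, title):
    """Bar plot of `final` (columns 'date' and 'rf') with monthly labels."""
    # Imported here so the helper above stays usable without matplotlib.
    import matplotlib.pyplot as plt

    ax = final.plot(x="date", y="rf", kind="bar")
    positions, labels = month_start_ticks(list(final["date"]))
    ax.set_xticks(positions)
    ax.set_xticklabels(labels, rotation=45)
    plt.suptitle(title, fontsize=20)
    ax.set_xlabel("Date", fontsize=18)
    ax.set_ylabel("Rain / mm", fontsize=16)
    return ax


# Small check with plain dates: ten days starting 24 Feb 2015.
sample = [date(2015, 2, 24) + timedelta(days=i) for i in range(10)]
print(month_start_ticks(sample))  # ([0, 5], ['Feb 24', 'Mar 01'])
```

If the data start mid-month, the first label is that first date rather than the 1st, as in the check above. Month abbreviations from %b follow the current locale. Calling plot_daily_rainfall(final, '{} Rainfall 2015-2016'.format(fn)) followed by plt.savefig(...) or plt.show() should then slot into the loop from the question.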


2 Comments

It almost works! I just get the error message Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'ticker' is not defined after the line ax.xaxis.set_major_formatter(ticker.FixedFormatter(ticklabels)). Do I need to import a package for this? @jezrael
You need only import matplotlib.ticker as ticker first.
