Issue accessing a URL with error "urlopen error Tunnel connection failed" using pandas and matplotlib

Question

The data source on the web is: https://www.newyorkfed.org/medialibrary/media/survey/empire/data/esms_seasonallyadjusted_diffusion.csv However, because I had trouble connecting, I saved it locally as 'esms_seasonallyadjusted_diffusion.csv' (which is faster anyway) and also uploaded it to GitHub: https://github.com/me50/hlar65/blob/master/ESMS_SeasonallyAdjusted_Diffusion.csv

Two questions:

1. When I try to access the web link from code (even though clicking on it downloads the file), I get a connection error: "URLError: <urlopen error Tunnel connection failed: 403 Forbidden>"
2. My code (I am a beginner!) looks clunky. Is there a cleaner, better way to write it?

Thanks everyone for your help '''

import pandas as pd
import numpy as np
from pandas.plotting import scatter_matrix
import scipy as sp
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sn


df = pd.read_csv('https://www.newyorkfed.org/medialibrary/media/survey/empire/data/esms_seasonallyadjusted_diffusion.csv')
 
df = df.rename(columns={'surveyDate':'Date',
                        'GACDISA': 'IndexAll', 
                        'NECDISA': 'NumberofEmployees',
                        'NOCDISA': 'NewOrders',
                        'PPCDISA': 'PricesPaid',
                        'PRCDISA': 'PricesReceived'})
headers = df.columns

df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

IndexAll = df['IndexAll']
NumberofEmployees = df['NumberofEmployees']
NewOrders = df['NewOrders']
PricesReceived = df['PricesReceived']

data = df[['IndexAll', 'NumberofEmployees', 'NewOrders', 'PricesReceived']]
data2 = data.copy()

ds = data2
FS_A = 14
FS_L = 16
FS_T = 20
FS_MT = 25

fig, ((ax0, ax1), (ax2,ax3)) = plt.subplots(nrows=2, ncols=2, figsize=(20,15))

# density=True : probability density i.e. prb of an outcome; False = actual # of frequency

ds['IndexAll'].plot(ax=ax0, color='red')
ax0.set_title('New York Empire Manufacturing Index', fontsize = FS_T)
ax0.set_ylabel('Date', fontsize = FS_A) 
ax0.set_xlabel('Empire Index', fontsize = FS_L) 
ax0.tick_params(labelsize=FS_A)

ds['NumberofEmployees'].plot(ax=ax1, color='blue')
ax1.set_title('Empire: Number of Employees', fontsize = FS_T)
ax1.set_ylabel('Date', fontsize = FS_L) 
ax1.set_xlabel('Number of Employees', fontsize = FS_L) 
ax1.tick_params(labelsize=FS_A)

ds['NewOrders'].plot(ax=ax2, color='green')
ax2.set_title('Empire: New Orders', fontsize = FS_T)
ax2.set_ylabel('Date', fontsize = FS_L) 
ax2.set_xlabel('New Orders', fontsize = FS_L) 
ax2.tick_params(labelsize=FS_A)

ds['PricesReceived'].plot(ax=ax3, color='black')
ax3.set_title('Empire: Prices Received', fontsize = FS_T)
ax3.set_ylabel('Date', fontsize = FS_L) 
ax3.set_xlabel('Prices Received', fontsize = FS_L) 
ax3.tick_params(labelsize=FS_A)

fig.tight_layout()
fig.suptitle('New York Manufacturing Index Main Components - Showing the Depths of COVID-19 in 2020', fontsize = FS_MT)
fig.tight_layout()
fig.subplots_adjust(top=0.88)
fig.subplots_adjust(bottom = -0.2) 
fig.savefig("Empire.png")
plt.show()

'''

The first question is not reproducible: df = pd.read_csv('https://www.newyorkfed.org/medialibrary/media/survey/empire/data/esms_seasonallyadjusted_diffusion.csv') works without issue to access the CSV file. The second question is a duplicate of your previous question. SO questions should focus on one question, not multiple questions. Also, do not repeatedly ask the same question. See "How to ask a good question".
– Trenton McKinney, Commented Oct 12, 2020 at 21:13
Does this answer your question? "How to plot columns from a dataframe, as subplots"
– Trenton McKinney, Commented Oct 12, 2020 at 21:13

Mehdi Golzadeh · Accepted Answer · 2020-10-11 23:52:39Z

For the first question: have you set a proxy? I think the error comes from your proxy settings.
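One way to check this is sketched below. It assumes pandas fetches the URL through urllib (which the URLError in the question suggests), so the proxies urllib detects are the ones in play. The helper names detected_proxies and read_csv_with_fallback are made up for this illustration; the local file name is the one mentioned in the question.

```python
import urllib.request

import pandas as pd

URL = ("https://www.newyorkfed.org/medialibrary/media/survey/empire/data/"
       "esms_seasonallyadjusted_diffusion.csv")
LOCAL_COPY = "esms_seasonallyadjusted_diffusion.csv"  # local file from the question


def detected_proxies():
    """Return the proxies urllib picks up from the environment or OS settings."""
    return urllib.request.getproxies()


def read_csv_with_fallback(source, fallback, **kwargs):
    """Try `source` first; on a connection or file error, read `fallback` instead."""
    try:
        return pd.read_csv(source, **kwargs)
    except OSError as err:  # URLError and HTTPError are subclasses of OSError
        print(f"Could not read {source!r} ({err}); using {fallback!r} instead")
        return pd.read_csv(fallback, **kwargs)
```

Calling print(detected_proxies()) shows whether an http or https proxy is configured; an empty dict means urllib found none. After that, df = read_csv_with_fallback(URL, LOCAL_COPY) keeps the script usable while the proxy issue is being sorted out.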

As for the second one, I can do some cleanup, but it depends heavily on the developer's code style. You can write a script in several different ways.

Note that:

You can parse the date in the read_csv call.

You can do everything you want to do with df in one step by chaining the methods inside parentheses.

You can define lists for your parameters and draw all the plots in a for loop.

import pandas as pd
import matplotlib.pyplot as plt

# Read the CSV, parse the date column, rename the columns and set the index in one chained expression
df = (
    pd
    .read_csv('https://www.newyorkfed.org/medialibrary/media/survey/empire/data/esms_seasonallyadjusted_diffusion.csv',
              parse_dates=['surveyDate'])
    .rename(columns={'surveyDate': 'Date',
                     'GACDISA': 'IndexAll',
                     'NECDISA': 'NumberofEmployees',
                     'NOCDISA': 'NewOrders',
                     'PPCDISA': 'PricesPaid',
                     'PRCDISA': 'PricesReceived'})
    .set_index('Date')
)

# Font sizes, same names as in the question (FS_MT is not used in this snippet)
FS_A = 14
FS_L = 16
FS_T = 20
FS_MT = 25

# One entry per subplot, in plotting order
titles = ['New York Empire Manufacturing Index', 'Empire: Number of Employees', 'Empire: New Orders', 'Empire: Prices Received']
xlabels = ['Empire Index', 'Number of Employees', 'New Orders', 'Prices Received']
colors = ['red', 'blue', 'green', 'black']
columns = ['IndexAll', 'NumberofEmployees', 'NewOrders', 'PricesReceived']
ds = df[columns]

k = 0  # index into the parameter lists above
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(20, 15))
for i in range(2):
    for j in range(2):
        ds[columns[k]].plot(ax=axes[i][j], color=colors[k])
        axes[i][j].set_title(titles[k], fontsize=FS_T)
        axes[i][j].set_ylabel('Date', fontsize=FS_A)
        axes[i][j].set_xlabel(xlabels[k], fontsize=FS_L)
        axes[i][j].tick_params(labelsize=FS_A)
        k += 1
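A possible variation on the loop above, as a sketch rather than the only way to do it: iterating over axes.flat together with the parameter lists via zip removes the nested loops and the manual counter k. The function name plot_panels and the hard-coded font sizes are assumptions for this example. It also puts 'Date' on the x-axis, on the assumption that this is what was intended, since the datetime index is what pandas plots along x.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_panels(df, columns, titles, colors, figsize=(20, 15)):
    """Draw one line plot per column on a 2x2 grid and return (fig, axes)."""
    fig, axes = plt.subplots(nrows=2, ncols=2, figsize=figsize)
    # axes.flat walks the 2x2 array row by row, so no manual counter is needed
    for ax, column, title, color in zip(axes.flat, columns, titles, colors):
        df[column].plot(ax=ax, color=color)
        ax.set_title(title, fontsize=20)
        ax.set_xlabel("Date", fontsize=16)   # the datetime index runs along x
        ax.set_ylabel(column, fontsize=16)   # the series values run along y
        ax.tick_params(labelsize=14)
    fig.tight_layout()
    return fig, axes
```

With the df, columns, titles and colors defined above, fig, axes = plot_panels(df, columns, titles, colors) followed by fig.savefig("Empire.png") would produce the same four panels.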
