I'm new to the world of data mining. I'm trying to calculate the correlation between 16 variables in a dataset of about 500 rows, and I have to do this with pandas. However, I'm already having trouble reading the CSV file (I'm on a Mac; I don't know whether that is the problem). This is the code I used:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('https://www.dropbox.com/s/2ps64ditghqj4xv/industrial_project.csv?dl=0', index_col=0)
corr = data.corr()
fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(corr,cmap='coolwarm', vmin=-1, vmax=1)
fig.colorbar(cax)
ticks = np.arange(0,len(data.columns),1)
ax.set_xticks(ticks)
plt.xticks(rotation=90)
ax.set_yticks(ticks)
ax.set_xticklabels(data.columns)
ax.set_yticklabels(data.columns)
plt.show()

And the error is:

Traceback (most recent call last):
  File "/Users/myname/eclipse2-workspace/Prova/ciao.py", line 4, in <module>
    data = pd.read_csv('https://www.dropbox.com/s/2ps64ditghqj4xv/industrial_project.csv?dl=0', index_col=0)
  File "/Users/myname/Library/Python/2.7/lib/python/site-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/myname/Library/Python/2.7/lib/python/site-packages/pandas/io/parsers.py", line 446, in _read
    data = parser.read(nrows)
  File "/Users/myname/Library/Python/2.7/lib/python/site-packages/pandas/io/parsers.py", line 1036, in read
    ret = self._engine.read(nrows)
  File "/Users/myname/Library/Python/2.7/lib/python/site-packages/pandas/io/parsers.py", line 1848, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 945, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 3, saw 2

I have tried many different approaches, but I can't get it to work!

Choose the title of your question carefully. You are having trouble reading a CSV file; it has nothing to do with correlation. Commented Sep 12, 2018 at 10:49

1 Answer

What you are trying to download is not a CSV file but an HTML page that displays a table with the information extracted from the CSV file. The parser is being fed HTML markup rather than delimited rows, which is why it fails while tokenizing. You have to use the link that is created when you click Download at the top right, and pass that one to .read_csv(). It should look like this:

url = 'https://UGLYUGLYTHINGS.dl.dropboxusercontent.com/cd/0/get/MOREUGLYTHINGSHERE/file?_download_id=ENCODED_ID_OF_THE_FILE&_notify_domain=www.dropbox.com&dl=1'

The uppercase parts of the string above stand for whatever Dropbox generates on its back end.
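If copying the generated download link by hand is inconvenient, one possible shortcut is to rewrite the share link so that its dl query parameter is 1 instead of 0. This is only a sketch: it assumes Python 3 (the question uses 2.7, where urllib.parse lives in a different module), it assumes Dropbox serves the raw file for dl=1 (a commonly relied-upon behaviour that the answer above does not state), and force_direct_download is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_direct_download(share_url):
    """Return share_url with its 'dl' query parameter set to '1'.

    Hypothetical helper: assumes the host treats dl=1 as "serve the raw file".
    """
    parts = urlsplit(share_url)
    # Keep any other query parameters and overwrite (or add) only 'dl'.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["dl"] = "1"
    return urlunsplit(parts._replace(query=urlencode(query)))


share_url = "https://www.dropbox.com/s/2ps64ditghqj4xv/industrial_project.csv?dl=0"
direct_url = force_direct_download(share_url)
print(direct_url)
```

The resulting direct_url can then be passed to .read_csv() in place of the original share link. If the download still returns HTML, fall back to the link produced by the Download button.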
Also, don't forget to pass ';' as the sep parameter to .read_csv(), as follows:

data = pd.read_csv(url,sep=';')

If you use the correct url, the rest of the code works.
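To see why the sep parameter matters, here is a small self-contained sketch that needs no network access. The sample text and its column names (a, b) are made up for illustration and stand in for the real file, whose contents are not shown in the question.

```python
import io

import pandas as pd

# Made-up semicolon-delimited sample standing in for the real file.
sample = "id;a;b\n1;0.5;2.0\n2;1.5;4.0\n3;2.5;6.5\n"

# With the default separator (','), each line is read as a single field,
# so the frame ends up with one column named 'id;a;b'.
wrong = pd.read_csv(io.StringIO(sample))
print(wrong.shape)

# With sep=';' the columns are split properly; index_col=0 uses 'id' as the index.
right = pd.read_csv(io.StringIO(sample), sep=";", index_col=0)
print(list(right.columns))

# The correlation matrix is square, with one row and column per numeric column.
corr = right.corr()
print(corr.shape)
```

Checking data.shape or data.columns right after reading is a quick way to confirm that the separator was chosen correctly before computing the correlation.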

Also, as mentioned in the comment above, please change the header/title of your question, because it may mislead someone. The issue lies in reading a remote file, rather than computing the correlation.


1 Comment

Thank you very much
