I am trying to find the correlation between all the columns in this dataset, excluding quality, and then plot the frequency distribution of wine quality.

I am doing it the following way, but how do I remove quality?

import pandas as pd
df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv', sep=';')
df.corr()

It returns this output:

[Image: correlation matrix returned by df.corr()]

How can I graph the frequency distribution of wine quality with pandas?

I previously used R for the correlation and it worked fine, but I am using this dataset to learn pandas and Python:

winecor = cor(wine[-12])
hist(wine$quality)

In R I get the following output, and I am looking for the same in Python.

[Images: R output of the cor() and hist() calls above]

  • Could you please reword what you are trying to achieve? Could you also post a sample picture of the histogram you want to get? Commented Jun 6, 2021 at 17:19
  • It is not clear where the numbers 3 through 9 come from. You do not have them anywhere in your dataset. Commented Jun 6, 2021 at 17:32
  • They are in the quality column, archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/… Commented Jun 6, 2021 at 17:35
  • The image above in the question is the output after applying corr. Commented Jun 6, 2021 at 17:36

2 Answers

1. Histogram

# Import plotting library
import matplotlib.pyplot as plt

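### Option 1 - histogram
# One bin per integer score 3..9: range(3, 11) gives edges 3..10,
# so scores 8 and 9 do not end up sharing the last bin
plt.hist(df['quality'], bins=range(3, 11), align='left')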
plt.show()

### Option 2 - bar plot (looks nicer)
# Get frequency per quality group
x = df.groupby('quality').size()
# Plot
plt.bar(x.index, x.values)
plt.show()
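
Since the question asks for a pandas way, the same bar plot can be sketched with pandas' own plotting wrapper. This is a minimal sketch, not part of the original answer: the `quality` Series below is made-up sample data standing in for `df['quality']`, so it runs without downloading the CSV.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample standing in for df['quality'] (illustrative values only)
quality = pd.Series([6, 5, 6, 7, 5, 6, 8, 4, 6, 5], name="quality")

# Count rows per score, then order by score rather than by frequency
counts = quality.value_counts().sort_index()

# pandas draws the bars through matplotlib
ax = counts.plot(kind="bar", rot=0)
ax.set_xlabel("quality")
ax.set_ylabel("frequency")
plt.show()
```

With the real data, replacing `quality` with `df['quality']` should give one bar per score present in the column.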

2. Correlation matrix

In order to get the correlation matrix of features, excluding quality:

# Option 1 - very similar to R
df.iloc[:, :-1].corr()

# Option 2 - more Pythonic
df.drop('quality', axis=1).corr()
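
To see that both options give the same result, here is a small self-contained sketch. The frame `df_demo` and its values are invented for illustration; the only assumption carried over from the real dataset is that quality is the last column. `drop(columns='quality')` is an equivalent spelling of `drop('quality', axis=1)`.

```python
import pandas as pd

# Tiny made-up frame: feature columns first, 'quality' last (illustrative values)
df_demo = pd.DataFrame({
    "alcohol": [9.0, 10.5, 11.2, 12.8],
    "pH": [3.0, 3.2, 3.1, 3.3],
    "quality": [5, 6, 6, 7],
})

# Option 1: drop the last column by position (only correct if quality is last)
by_position = df_demo.iloc[:, :-1].corr()

# Option 2: drop the column by name (does not depend on column order)
by_name = df_demo.drop(columns="quality").corr()

print(by_name)
```

Dropping by name is the safer choice if the column order might change, since the positional version silently removes whatever column happens to be last.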

2 Comments

Thank you. I edited my question to make it clearer. My main focus is on two parts: 1. What is the correlation between the attributes other than quality? 2. Graph the frequency distribution of wine quality using the quality column.
Awesome. Thanks for clearing up the question. I edited my answer. Did that help?

You can plot histograms with:

import matplotlib.pyplot as plt 

plt.hist(x=df['quality'], bins=30)
plt.show()

Read the documentation for plt.hist() for a full description of its parameters.
