Wine Quality Prediction

Introduction:

Welcome to our Wine Quality Prediction project! We use a dataset of measured wine properties, together with machine learning tools, to explore what separates a good wine from a mediocre one. The idea is to learn from many wines that have already been rated so that we can estimate the quality of a new one. That is useful to wine lovers and winemakers alike. This walkthrough covers the first steps: loading the data, cleaning it, and exploring it visually.

Importing Modules

  • numpy and pandas are used for data manipulation and analysis.
  • matplotlib.pyplot is for data visualization.
  • seaborn provides enhanced data visualization capabilities.
  • train_test_split from sklearn.model_selection is used to split the dataset into training and testing subsets.
  • MinMaxScaler from sklearn.preprocessing is employed to scale the feature values to a specific range.
  • metrics from sklearn contains various metrics to evaluate model performance.
  • SVC from sklearn.svm is the Support Vector Classifier.
  • XGBClassifier from xgboost is the XGBoost classifier.
  • LogisticRegression from sklearn.linear_model is for logistic regression.
  • warnings is imported so that warnings.filterwarnings('ignore') can suppress warning messages that may be displayed during execution.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn import metrics
from sklearn.svm import SVC
from xgboost import XGBClassifier
from sklearn.linear_model import LogisticRegression

import warnings
warnings.filterwarnings('ignore')

Importing the Dataset

df = pd.read_csv('winequality.csv')
print(df.head())

Understanding Your Data: Exploring Dataset Information

df.info()

Unveiling Data Insights: Analyzing Statistical Summary

df.describe().T

Spotting Data Gaps: Investigating Missing Values

df.isnull().sum()

Filling the Gaps: Imputing Missing Data Using Column Means in Python

# Fill missing values in each affected column with that column's mean
for col in df.columns:
    if df[col].isnull().sum() > 0:
        df[col] = df[col].fillna(df[col].mean())

df.isnull().sum().sum()
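
The loop above takes the mean of every column that has gaps, which only works for numeric columns. As a minimal sketch, assuming the data might also hold a text column (the `type` column and all values below are made up for illustration), the same mean imputation can be limited to numeric columns like this:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the wine data; every value here is illustrative.
toy = pd.DataFrame({
    "type": ["white", "red", "white", "red"],  # hypothetical text column
    "pH": [3.0, np.nan, 3.4, 3.2],
    "alcohol": [9.0, 10.0, np.nan, 11.0],
})

# Select only numeric columns, since a mean is not defined for text.
numeric_cols = toy.select_dtypes(include="number").columns

# Fill each numeric column's gaps with that column's own mean.
imputed = toy.copy()
imputed[numeric_cols] = imputed[numeric_cols].fillna(imputed[numeric_cols].mean())

remaining_missing = int(imputed.isnull().sum().sum())
print(imputed)
print("Missing values left:", remaining_missing)
```

Here the missing pH becomes the mean of the three known pH values, and the missing alcohol value is filled the same way, so no missing values remain.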

Visualizing Wine Data Distribution: Exploring Histograms of Key Features

df.hist(bins=20, figsize=(10, 10))
plt.show()

Quality vs. Alcohol Content: Analyzing Wine Quality with a Bar Chart

plt.bar(df['quality'], df['alcohol'])
plt.xlabel('quality')
plt.ylabel('alcohol')
plt.show()
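
One thing to keep in mind when reading this chart: plt.bar draws one bar per row, so all wines with the same quality score are drawn on top of each other and the visible bar height is effectively the highest alcohol value for that score. If the average is what we want to compare, a small sketch with made-up numbers (the toy frame below is an assumption, not the real data) could aggregate first:

```python
import pandas as pd

# Toy data: several wines share the same quality score.
toy = pd.DataFrame({
    "quality": [5, 5, 6, 6, 7],
    "alcohol": [9.0, 10.0, 10.5, 11.5, 12.0],
})

# Average alcohol per quality score: one value per bar.
mean_alcohol = toy.groupby("quality")["alcohol"].mean()

# Highest alcohol per quality score: what overlapping bars end up showing.
max_alcohol = toy.groupby("quality")["alcohol"].max()

print(mean_alcohol)
print(max_alcohol)
```

The resulting Series can then be plotted with plt.bar(mean_alcohol.index, mean_alcohol.values) to get one bar per quality score.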

Uncovering Relationships: Exploring Correlations in Wine Data with a Heatmap

plt.figure(figsize=(12, 12))
sb.heatmap(df.corr() > 0.7, annot=True, cbar=False)
plt.show()
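
Because the heatmap is drawn from df.corr() > 0.7, it shows True/False flags rather than the correlation values themselves. As a rough sketch on made-up numbers (the toy values are assumptions chosen so that two columns move together), the strongly correlated pairs can also be listed directly:

```python
import numpy as np
import pandas as pd

# Toy data: the two sulfur dioxide columns rise together, pH does not.
toy = pd.DataFrame({
    "free sulfur dioxide": [10.0, 20.0, 30.0, 40.0, 50.0],
    "total sulfur dioxide": [32.0, 58.0, 95.0, 118.0, 155.0],
    "pH": [3.3, 3.0, 3.4, 3.1, 3.2],
})

corr = toy.corr()

# Keep only the upper triangle so each pair appears once and the
# diagonal (every column against itself) is ignored.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack()

# Pairs whose correlation exceeds the same 0.7 threshold used in the heatmap.
strong = pairs[pairs > 0.7]
print(strong)
```

If the CSV contains a text column, recent pandas versions need df.corr(numeric_only=True) for the correlation step to run.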

Simplifying the Dataset: Removing ‘Total Sulfur Dioxide’ Feature from Wine Data

df = df.drop('total sulfur dioxide', axis=1)
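
The imports at the top include train_test_split and MinMaxScaler, which this walkthrough does not get to use. As a hedged sketch of what that next stage might look like, here is a split-then-scale step on randomly generated stand-in data; the column choice, the 80/20 split, and the random seeds are all assumptions made for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Random stand-in data; these are not real wine measurements.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "alcohol": rng.uniform(8.0, 14.0, size=10),
    "pH": rng.uniform(2.8, 3.8, size=10),
    "quality": rng.integers(3, 9, size=10),
})

features = toy.drop("quality", axis=1)
target = toy["quality"]

# Hold out 20% of the rows for testing (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42
)

# Fit the scaler on the training rows only, then reuse it on the test rows
# so that no information from the test set leaks into the scaling.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

After scaling, every training feature lies between 0 and 1; test values can fall slightly outside that range because the scaler never saw them.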

Conclusion:

In this project, we took the first steps toward predicting wine quality with machine learning. Starting with data import and exploration, we examined the dataset’s structure and identified missing values. Filling those gaps with column means left the dataset ready for analysis.

Visualizations such as histograms and a bar chart showed how the features are distributed and how alcohol content relates to quality. The heatmap flagged the pairs of attributes whose correlation exceeds 0.7.

As a final step, we removed the ‘total sulfur dioxide’ feature to simplify the dataset for later stages. This lays the groundwork for model development with the classifiers imported at the start.
