Wine Quality Prediction SkillBridges
Introduction:
Welcome to our Wine Quality Prediction project! We use the measured properties of each wine, such as its alcohol content, together with machine learning to figure out what makes a wine rate well. Think of it as a puzzle: by learning from many wines that already have a quality score, we can estimate the score of a new one. This helps both wine lovers and winemakers understand what goes into a good glass. So, let's explore together how data and machines team up to predict wine quality.
Importing Modules
numpy and pandas are used for data manipulation and analysis.
matplotlib.pyplot is for data visualization.
seaborn provides enhanced data visualization capabilities.
train_test_split from sklearn.model_selection is used to split the dataset into training and testing subsets.
MinMaxScaler from sklearn.preprocessing is employed to scale the feature values to a specific range ([0, 1] by default).
metrics from sklearn contains various metrics to evaluate model performance.
SVC from sklearn.svm is the Support Vector Classifier.
XGBClassifier from xgboost is the XGBoost classifier.
LogisticRegression from sklearn.linear_model is for logistic regression.
warnings is the standard-library module for controlling warning messages; warnings.filterwarnings('ignore') suppresses warnings that may be displayed during execution.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn import metrics
from sklearn.svm import SVC
from xgboost import XGBClassifier
from sklearn.linear_model import LogisticRegression
import warnings

warnings.filterwarnings('ignore')
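MinMaxScaler rescales every feature column to a fixed range; with its default range of [0, 1] the per-column arithmetic is x' = (x - min) / (max - min). As a minimal sketch of that arithmetic, the hypothetical NumPy-only helper min_max_scale below (not part of scikit-learn) applies it to a made-up toy matrix; mapping constant columns to 0 is a choice made here to avoid dividing by zero.

```python
import numpy as np

def min_max_scale(X):
    """Hypothetical helper: rescale each column of X to [0, 1].

    Computes (x - column_min) / (column_max - column_min) per column.
    Constant columns (zero range) are mapped to 0 to avoid division by zero.
    """
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Treat a zero range as 1 so a constant column becomes all zeros
    col_range[col_range == 0] = 1.0
    return (X - col_min) / col_range

# Toy feature matrix with illustrative values, not taken from the wine data
X = np.array([[7.4, 9.4],
              [7.8, 9.8],
              [11.2, 12.8]])
print(min_max_scale(X))
```

In a modeling pipeline, the minimum and maximum should be learned from the training split only (for example, MinMaxScaler's fit_transform on the training set and transform on the test set) so that no information from the test data leaks into training.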
Importing the Dataset
df = pd.read_csv('winequality.csv')
print(df.head())
Understanding Your Data: Exploring Dataset Information
df.info()
Unveiling Data Insights: Analyzing Statistical Summary
# Transpose (.T) so each feature is a row of summary statistics
print(df.describe().T)
Spotting Data Gaps: Investigating Missing Values
# Number of missing values in each column
print(df.isnull().sum())
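Raw counts are easier to judge next to the share of rows they affect. The sketch below is a hypothetical helper, missing_report, that lists only the columns with gaps, together with their missing count and percentage; the toy frame and its values are made up for illustration and only stand in for the wine data.

```python
import numpy as np
import pandas as pd

def missing_report(frame):
    """Hypothetical helper: count and percentage of missing values per column,
    keeping only columns that have at least one gap, largest first."""
    report = pd.DataFrame({
        'missing': frame.isnull().sum(),
        # The mean of a boolean mask is the fraction of rows that are missing
        'percent': frame.isnull().mean() * 100,
    })
    return report[report['missing'] > 0].sort_values('missing', ascending=False)

# Made-up toy frame standing in for the wine data
toy = pd.DataFrame({
    'fixed acidity': [7.4, np.nan, 7.8, 11.2],
    'pH': [3.51, 3.20, np.nan, np.nan],
    'quality': [5, 5, 6, 6],
})
print(missing_report(toy))
```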
Filling the Gaps: Imputing Missing Data Using Column Means in Python
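# Fill each column's gaps with that column's mean (assumes those columns are numeric)
for col in df.columns:
    if df[col].isnull().sum() > 0:
        df[col] = df[col].fillna(df[col].mean())
print(df.isnull().sum().sum())  # remaining missing values; expect 0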
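The same mean imputation can be written without an explicit loop, because pandas' fillna accepts a Series that maps column names to fill values. This is a minimal sketch with a hypothetical helper, impute_with_means; restricting the means to numeric columns is an assumption added here so that a non-numeric column (the made-up 'type' column in the toy frame) is left untouched instead of raising an error.

```python
import numpy as np
import pandas as pd

def impute_with_means(frame):
    """Hypothetical helper: fill gaps in numeric columns with each column's mean.

    Non-numeric columns are skipped because no mean is computed for them.
    """
    means = frame.mean(numeric_only=True)  # Series: column name -> mean
    # A Series passed to fillna fills each column with its matching entry
    return frame.fillna(means)

# Made-up toy frame; 'type' is a hypothetical non-numeric column
toy = pd.DataFrame({
    'type': ['white', 'red', 'red'],
    'pH': [3.0, np.nan, 3.4],
    'sulphates': [0.5, 0.7, np.nan],
})
filled = impute_with_means(toy)
print(filled)
print(filled.isnull().sum().sum())
```

Mean imputation keeps every row but shrinks the variance of the filled columns, so it works best when only a small fraction of values is missing.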
Visualizing Wine Data Distribution: Exploring Histograms of Key Features
df.hist(bins=20, figsize=(10, 10))
plt.show()
Quality vs. Alcohol Content: Analyzing Wine Quality with a Bar Chart
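# plt.bar draws one bar per row, so bars for the same quality overlap and
# the visible height is the highest alcohol value at that quality level
plt.bar(df['quality'], df['alcohol'])
plt.xlabel('quality')
plt.ylabel('alcohol')
plt.show()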
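Because plt.bar draws one overlapping bar per row, each visible bar shows the maximum alcohol value for its quality level rather than a typical value. A minimal sketch of one alternative is to aggregate first and plot the mean alcohol per quality score; the helper name mean_alcohol_by_quality and the toy values are hypothetical.

```python
import pandas as pd

def mean_alcohol_by_quality(frame):
    """Hypothetical helper: average alcohol content for each quality score."""
    return frame.groupby('quality')['alcohol'].mean().sort_index()

# Made-up toy values, not taken from the wine data
toy = pd.DataFrame({
    'quality': [5, 5, 6, 7, 7],
    'alcohol': [9.4, 9.8, 10.5, 12.0, 11.0],
})
means = mean_alcohol_by_quality(toy)
print(means)
# To draw it with the modules already imported:
# plt.bar(means.index, means.values)
```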
Uncovering Relationships: Exploring Correlations in Wine Data with a Heatmap
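plt.figure(figsize=(12, 12))
# Boolean mask: True where the Pearson correlation exceeds 0.7 (the diagonal is always True);
# numeric_only=True skips any non-numeric columns (pandas 1.5 or later)
sb.heatmap(df.corr(numeric_only=True) > 0.7, annot=True, cbar=False)
plt.show()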
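The boolean heatmap marks only correlations above +0.7 and includes each feature's trivial correlation with itself. As a sketch of one way to turn it into a list of candidates for removal, the hypothetical helper correlated_pairs below reports each feature pair once, skips the diagonal, and compares the absolute correlation so that strong negative relationships are caught too. The synthetic columns feature_a to feature_d are generated only to exercise the function and say nothing about the wine data.

```python
import numpy as np
import pandas as pd

def correlated_pairs(frame, threshold=0.7):
    """Hypothetical helper: feature pairs whose |Pearson r| exceeds threshold.

    Each pair is listed once (upper triangle only) and the diagonal is skipped.
    """
    corr = frame.select_dtypes(include='number').corr()
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if abs(r) > threshold:
                pairs.append((cols[i], cols[j], round(float(r), 3)))
    return pairs

# Synthetic data: b tracks a, d mirrors a, c is independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    'feature_a': a,
    'feature_b': 2 * a + rng.normal(scale=0.1, size=200),
    'feature_c': rng.normal(size=200),
    'feature_d': -a + rng.normal(scale=0.1, size=200),
})
print(correlated_pairs(toy))
```

From each reported pair, one feature can be dropped, since the other already carries most of the same information.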
Simplifying the Dataset: Removing ‘Total Sulfur Dioxide’ Feature from Wine Data
df = df.drop('total sulfur dioxide', axis=1)
Conclusion:
In this project we took the first steps toward predicting wine quality with machine learning. Starting with data import and exploration, we examined the dataset's structure and identified missing values, then filled them with column means so the dataset was complete and ready for analysis.
Histograms showed how each feature is distributed, and the bar chart related alcohol content to quality. The correlation heatmap flagged pairs of features with a correlation above 0.7, pointing to attributes that may be redundant.
As a final step, we removed the ‘total sulfur dioxide’ feature to simplify the dataset for the next stages. This preparation lays the groundwork for model development with the classifiers imported at the start (logistic regression, XGBoost, and the support vector classifier) and for a better understanding of what drives wine quality.