In this project, we will explore how to perform iris classification using logistic regression in Python. We'll use the Iris dataset, split it into training and testing sets, train a logistic regression classifier, and evaluate its accuracy. We'll explain each step of the code and provide the final working code at the end.
Loading and Preparing the Data
To begin, we need to import the necessary libraries:
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
We import pandas for creating a DataFrame, load_iris to load the Iris dataset, train_test_split to split the data into training and testing sets, LogisticRegression for logistic regression modeling, and accuracy_score to calculate the accuracy of the model.
Next, we load the Iris dataset and create a DataFrame:
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['species'] = iris.target_names[iris.target]
We load the Iris dataset using load_iris() and create a DataFrame df from the data. We also add a 'species' column to the DataFrame, mapping the target values to their corresponding target names.
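The mapping from target values to names relies on NumPy integer-array indexing. The short sketch below shows what that indexing does; the variable name species is only illustrative and is not part of the original code.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# target_names holds one name per class; target holds one integer label per sample
print(iris.target_names)
print(iris.target[:5])

# Indexing the names array with the integer labels returns one name per sample
species = iris.target_names[iris.target]  # "species" is an illustrative name
print(species[:5])
print(len(species))
```

Because the labels are the integers 0, 1 and 2, each one selects the name stored at that position in target_names.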
Displaying the DataFrame
We display the DataFrame to get an overview of the data:
print(df)
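We simply print the DataFrame df to the console. Because the dataset has 150 rows, pandas abbreviates the output and shows only the first and last few rows.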
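If the full printout is more than you need, a few standard pandas calls give a quicker overview. The following is a minimal sketch that rebuilds the same DataFrame; the variable name species_counts is an illustrative addition.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['species'] = iris.target_names[iris.target]

# Show only the first five rows
print(df.head())

# Number of rows and columns
print(df.shape)

# Number of samples per species ("species_counts" is an illustrative name)
species_counts = df['species'].value_counts()
print(species_counts)
```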
Splitting the Data
We split the dataset into features (x) and labels (y):
x = iris.data
y = iris.target
We assign iris.data to x and iris.target to y.
Next, we split the data into training and testing sets:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
We use train_test_split() to split x and y into training and testing sets. Here, we specify a test size of 0.2 (20% of the data) and set random_state to 42 for reproducibility.
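As an optional variation that is not part of the original code, train_test_split() also accepts a stratify argument that keeps the class proportions the same in both sets. The sketch below assumes you want that behavior; train_counts and test_counts are illustrative names.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
x = iris.data
y = iris.target

# stratify=y is an optional addition: it preserves the class balance in each split
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42, stratify=y
)

# 150 samples split 80/20
print(x_train.shape, x_test.shape)

# Samples per class in each split (illustrative names)
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print(train_counts, test_counts)
```

Without stratify, the split is purely random, so the three species may not be evenly represented in the test set.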
Training and Evaluating the Model
We create an instance of LogisticRegression with a higher max_iter value:
classifier = LogisticRegression(max_iter=1000)
We instantiate a logistic regression classifier named classifier and set max_iter to 1000, which gives the solver more iterations to converge than the default of 100.
Next, we train the classifier using the training data:
classifier.fit(x_train, y_train)
We use fit() to train the logistic regression classifier on x_train and y_train.
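To see what fitting produced, you can inspect a few attributes of the trained estimator. This is a small sketch using the same split and settings as above; checking n_iter_ and coef_ is an addition for illustration, not a step in the original walkthrough.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

classifier = LogisticRegression(max_iter=1000)
classifier.fit(x_train, y_train)

# Iterations the solver actually used; a value below max_iter suggests it converged
print(classifier.n_iter_)

# One row of coefficients per class, one column per feature
print(classifier.coef_.shape)

# Class labels the model learned
print(classifier.classes_)
```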
After training, we make predictions on the testing set:
y_pred = classifier.predict(x_test)
We use predict() to make predictions on x_test.
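predict() returns only the most likely class. If you also want to see how confident the model is, predict_proba() returns a probability for each class. The sketch below is an optional extension of the walkthrough; proba and top_class are illustrative names.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

classifier = LogisticRegression(max_iter=1000)
classifier.fit(x_train, y_train)

y_pred = classifier.predict(x_test)

# One row per test sample, one column per class; each row sums to 1
proba = classifier.predict_proba(x_test)
print(proba[:3].round(3))

# The class with the highest probability in each row
top_class = classifier.classes_[np.argmax(proba, axis=1)]

# Translate the first few predicted labels back to species names
print(iris.target_names[y_pred[:3]])
```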
Finally, we calculate the accuracy of the model:
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
We calculate the accuracy by comparing the predicted labels y_pred with the actual labels y_test using accuracy_score(). The accuracy is then printed to the console.
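Accuracy is a single number, so it does not show which species get confused with each other. As an optional addition to the original steps, scikit-learn's confusion_matrix() and classification_report() give a per-class view. A minimal sketch, with cm as an illustrative name:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

classifier = LogisticRegression(max_iter=1000)
classifier.fit(x_train, y_train)
y_pred = classifier.predict(x_test)

accuracy = accuracy_score(y_test, y_pred)

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Precision, recall and F1 score for each species
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

The diagonal of the confusion matrix counts the correct predictions, so dividing its sum by the total number of test samples gives the same value as accuracy_score().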
Final Code
Here's the complete Python code for performing iris classification with logistic regression:
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Load the Iris dataset
iris = load_iris()
# Create a DataFrame from the data
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['species'] = iris.target_names[iris.target]
# Display the DataFrame
print(df)
# Split the dataset into features (x) and labels (y)
x = iris.data
y = iris.target
# Split the data into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
# Create an instance of LogisticRegression with a higher max_iter value
classifier = LogisticRegression(max_iter=1000)
# Train the classifier
classifier.fit(x_train, y_train)
# Make predictions on the testing set
y_pred = classifier.predict(x_test)
# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
That's it! You now have a Python code snippet that allows you to perform iris classification using logistic regression. Feel free to modify the code or use it as a starting point for your own projects. Enjoy classifying the Iris dataset!