Logistic Regression Explained: A Step-by-Step Guide

Introduction

Logistic regression is a fundamental statistical and machine learning technique used for binary classification tasks. It’s the go-to method for problems where you want to predict the presence or absence of something, such as spam email detection, disease diagnosis, or customer churn prediction. This guide covers the essentials of logistic regression and walks you through a practical implementation using a sample dataset.

If you want to know about Linear Regression, you can refer to the articles Part 1 and Part 2.

Logistic Regression Explained

What is Logistic Regression?

Logistic regression is a predictive analysis technique used for binary classification problems. Unlike linear regression, which predicts continuous outcomes, logistic regression predicts the probability of a binary outcome (1 or 0, Yes or No, True or False). It uses the logistic function (also called the sigmoid function) to model the probability that a given input belongs to the positive class (usually denoted as “1”).

How Does Logistic Regression Work?

The logistic regression model computes a weighted sum of input features (plus a bias term), but instead of outputting this sum directly, it passes it through a logistic (sigmoid) function to produce the output probability. The logistic function is an S-shaped curve that maps any real-valued number into a value between 0 and 1, making it perfect for modeling probabilities.

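The logistic function formula is:

\[ \sigma(z) = \frac{1}{1 + e^{-z}} \]

where \(z\) is the weighted sum of the input features plus the bias term.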
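As a minimal sketch, the logistic function and the weighted sum it is applied to can be written in a few lines of NumPy. The weights, bias, and input values below are hypothetical and chosen only for illustration; they do not come from a trained model.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_probability(x, weights, bias):
    """Weighted sum of the features plus a bias, passed through the sigmoid."""
    z = np.dot(x, weights) + bias
    return sigmoid(z)

# Hypothetical parameters and input, for illustration only
weights = np.array([0.8, -0.4])
bias = 0.1
x = np.array([2.0, 1.0])

p = predict_probability(x, weights, bias)
print(f"Predicted probability of the positive class: {p:.3f}")
```

Large positive sums push the output toward 1, large negative sums push it toward 0, and a sum of exactly 0 gives 0.5.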

Why Use Logistic Regression?

Simplicity: Logistic regression is straightforward to understand and interpret, making it a good baseline model for binary classification problems.
Efficiency: It’s computationally inexpensive, making it suitable for problems with a large number of features.
Probability Estimates: It provides probabilities for outcomes, which can be a critical requirement in many applications like risk modeling.

Types of Logistic Regression

Logistic regression, despite its name, is a linear model for classification rather than regression. It estimates the probability of each outcome and is most often used for binary targets. It can, however, be extended to multiclass classification through techniques like the One-vs-Rest (OvR) scheme or multinomial logistic regression. Here are the main types of logistic regression:

1. Binary Logistic Regression

Binary logistic regression is the most common and is used when the target variable is binary, meaning it has only two possible outcomes. Examples include email spam detection (spam or not spam), loan default (yes or no), and disease diagnosis (positive or negative). The output is modeled as a probability that the given input point belongs to a particular class, which is then mapped to a binary outcome using a threshold classifier.
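The thresholding step mentioned above can be sketched as follows. The probabilities are made-up values, and the helper name `classify` is hypothetical; 0.5 is a common default cutoff, but the threshold can be raised or lowered depending on the cost of each kind of error.

```python
import numpy as np

def classify(probabilities, threshold=0.5):
    """Map predicted probabilities to class labels: 1 if p >= threshold, else 0."""
    return (np.asarray(probabilities) >= threshold).astype(int)

# Made-up predicted probabilities for four inputs
probs = np.array([0.05, 0.40, 0.55, 0.93])

labels_default = classify(probs)                 # cutoff at 0.5
labels_strict = classify(probs, threshold=0.9)   # only very confident positives

print(labels_default)  # [0 0 1 1]
print(labels_strict)   # [0 0 0 1]
```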

2. Multinomial Logistic Regression (Softmax Regression)

Multinomial logistic regression is used when the target variable can take three or more unordered categories. Unlike binary logistic regression, which outputs a single probability score, multinomial logistic regression can output multiple probabilities, one for each class. The class with the highest probability is taken as the model’s prediction. This type of logistic regression is particularly useful for solving multi-class classification problems, such as predicting the type of animal (dog, cat, fish).
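To illustrate how one probability per class is produced, here is a minimal sketch of the softmax function applied to per-class scores. The scores are hypothetical; in a fitted model they would come from one weighted sum per class.

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

classes = ["dog", "cat", "fish"]
scores = np.array([2.0, 1.0, 0.1])  # hypothetical per-class scores

probs = softmax(scores)
predicted = classes[int(np.argmax(probs))]  # class with the highest probability

for name, prob in zip(classes, probs):
    print(f"{name}: {prob:.3f}")
print("Prediction:", predicted)
```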

3. Ordinal Logistic Regression

Ordinal logistic regression is used for ordinal targets, where the categories have a natural order but the distances between categories are not known. This type is suitable for variables that express an order or rank, such as movie ratings from one to five stars, stages of cancer, or economic status levels. The model predicts the probability of the target variable falling into a particular category or below.
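One common way to model "this category or below" is the proportional-odds (cumulative logit) formulation, which is assumed here as an illustration: a single score is compared against a set of increasing cutpoints, and each cumulative probability is a sigmoid of the difference. The cutpoints and score below are hypothetical values for a five-star rating.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ordinal_probabilities(score, cutpoints):
    """Per-category probabilities under an assumed proportional-odds model.

    P(y <= k) = sigmoid(cutpoint_k - score) for each cutpoint, and
    P(y <= last category) = 1. Category probabilities are the differences
    between consecutive cumulative probabilities.
    """
    cumulative = sigmoid(np.asarray(cutpoints) - score)
    cumulative = np.append(cumulative, 1.0)
    return np.diff(cumulative, prepend=0.0)

# Hypothetical values: four increasing cutpoints separate five star ratings
cutpoints = np.array([-2.0, -0.5, 0.5, 2.0])
score = 1.0  # stands in for the weighted sum of the features

probs = ordinal_probabilities(score, cutpoints)
for stars, prob in enumerate(probs, start=1):
    print(f"{stars} star(s): {prob:.3f}")
```

Because the cutpoints are increasing, the cumulative probabilities are increasing too, so every category receives a positive probability.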

Key Differences and When to Use

Binary Logistic Regression is used when there are only two possible outcomes. It’s the simplest form of logistic regression and widely used in binary classification tasks.
Multinomial Logistic Regression is applicable when dealing with multiple classes that have no order. It models all classes jointly using a softmax function. A related alternative is the One-vs-Rest method, which trains one binary logistic regression per class.
Ordinal Logistic Regression is used when the categories are ordered. This type takes the order of categories into account but does not assume that the distances between them are equal. It’s especially useful when the classification involves ranking or ordering.

Cost Function in Logistic Regression

The cost function, also known as the loss function, plays a pivotal role in logistic regression, guiding the optimization of the model parameters to produce the best possible predictions. In logistic regression, the cost function is designed to measure the difference between the predicted values and the actual values in the training data. The goal during training is to minimize this cost function.

Understanding Logistic Regression Cost Function

Logistic regression uses a sigmoid function to predict the probability that a given input belongs to a particular class. The output of the sigmoid function is then used to make predictions: if the output probability is 0.5 or greater, the input is classified into the positive class, and otherwise into the negative class. The cost function is crucial for assessing how well the logistic regression model performs.

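The cost function used in logistic regression is the “log loss” or “binary cross-entropy loss” for binary classification problems. This cost function is defined as:

\[ J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] \]

where \(m\) is the number of training examples, \(y^{(i)}\) is the actual class of the \(i\)-th example, and \(h_\theta(x^{(i)})\) is the predicted probability that it belongs to class 1.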

This cost function effectively captures the cost associated with the prediction error. If the actual class \(y^{(i)} \) is 1 and the model predicts a probability close to 1, the cost is low. Conversely, if the model predicts a probability close to 0 when the actual class is 1, the cost is high. The same logic applies when the actual class \(y^{(i)} \) is 0.
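The behavior described above can be checked numerically. The sketch below computes the standard binary cross-entropy, the mean of -[y log(p) + (1 - y) log(1 - p)] over all examples, on made-up labels and probabilities. The small `eps` clipping is an added safeguard against taking log(0), not part of the definition.

```python
import numpy as np

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary cross-entropy averaged over all examples."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithms stay finite
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y = np.array([1, 1, 0, 0])  # made-up actual classes

confident_and_right = log_loss(y, [0.9, 0.8, 0.1, 0.2])
confident_and_wrong = log_loss(y, [0.1, 0.2, 0.9, 0.8])

print(f"Good predictions: {confident_and_right:.4f}")
print(f"Bad predictions:  {confident_and_wrong:.4f}")
```

The second value is much larger than the first, reflecting the heavy penalty for confident but wrong predictions.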

Properties of the Cost Function

Convexity: Unlike linear regression, the model output in logistic regression is non-linear in the parameters because of the sigmoid function. The log loss is nevertheless convex, so gradient descent with a suitable learning rate moves toward the global minimum rather than getting stuck in a local one.
Gradient Descent Optimization: The parameters of the model (θ) are adjusted by minimizing the cost function using an optimization algorithm such as gradient descent. At each step, gradient descent updates the parameters in the direction that most reduces the cost function.
Regularization: To prevent overfitting, regularization terms can be added to the cost function. L1 (lasso) and L2 (ridge) are common regularization techniques that add a penalty on the size of coefficients.
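The gradient descent and regularization points above can be sketched from scratch as follows. This is an illustrative implementation under stated assumptions, not how scikit-learn trains its models: it uses plain batch gradient descent, an L2 penalty of (l2 / 2) times the squared norm of θ, and synthetic data generated only for the demonstration. The function name `fit_logistic` and its parameters are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, learning_rate=0.1, n_iters=2000, l2=0.0):
    """Batch gradient descent on the log loss, with an optional L2 penalty."""
    m, n = X.shape
    theta = np.zeros(n)
    bias = 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ theta + bias)   # predicted probabilities
        error = p - y                   # gradient of the log loss w.r.t. z
        grad_theta = X.T @ error / m + l2 * theta  # penalty shrinks the weights
        grad_bias = error.mean()        # the bias is left unpenalized
        theta -= learning_rate * grad_theta
        bias -= learning_rate * grad_bias
    return theta, bias

# Synthetic data: the class depends on the sign of x0 + x1
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

theta_plain, b_plain = fit_logistic(X, y)
theta_l2, b_l2 = fit_logistic(X, y, l2=0.1)

print("Without regularization:", theta_plain)
print("With L2 regularization:", theta_l2)
```

The regularized coefficients come out smaller in magnitude, which is the effect the penalty term is meant to have.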

Practical Example: Model Training with Logistic Regression

Let’s dive into a practical example by using the popular Iris dataset. Although the Iris dataset includes three classes, we’ll simplify our problem to a binary classification task by predicting whether an Iris flower is of the species Setosa or not.

Preparing the Dataset

The Iris dataset contains 150 records of Iris flowers, each with four features: sepal length, sepal width, petal length, and petal width. For our binary classification, we’ll label the Iris Setosa species as “1” and the other species as “0”.

Implementing Logistic Regression

We’ll use Python’s scikit-learn library for implementing logistic regression. Here’s a step-by-step code snippet:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
import numpy as np

# Load the Iris dataset
X, y = load_iris(return_X_y=True)

# Convert to a binary classification problem
y = np.where(y == 0, 1, 0)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the logistic regression model
model = LogisticRegression(solver='liblinear')
model.fit(X_train, y_train)

# Evaluate the model
score = model.score(X_test, y_test)
print(f"Model accuracy: {score*100:.2f}%")

This code segment loads the Iris dataset, converts it into a binary classification task, splits the dataset into training and testing sets, trains a logistic regression model, and evaluates its accuracy.

Conclusion

Logistic regression is a powerful yet simple technique for binary classification problems. Its ease of implementation, interpretability, and efficiency make it a valuable tool for data scientists and machine learning practitioners. While more complex models might offer higher accuracy, logistic regression serves as an excellent starting point and benchmark.

Frequently Asked Questions (FAQs)

Q. When should I use logistic regression?

Use logistic regression for binary classification tasks, especially when you need a fast and interpretable model.

Q. Can logistic regression be used for multi-class classification?

Yes, through an extension known as multinomial logistic regression or the one-vs-rest (OvR) strategy, logistic regression can be adapted for multi-class classification problems.
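As a rough sketch, both approaches can be tried on the full three-class Iris dataset with scikit-learn. In recent scikit-learn versions, `LogisticRegression` with its default solver fits a multinomial model when the target has more than two classes; this behavior has changed across versions, so treat it as an assumption to verify against the version you have installed. `OneVsRestClassifier` wraps a binary model to apply the OvR strategy explicitly.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Keep all three Iris species as the target
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Multinomial model (assumed default behavior in recent versions)
softmax_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# One-vs-rest: one binary logistic regression per class
ovr_model = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_train, y_train)

# One probability per class for the first test sample
probs = softmax_model.predict_proba(X_test[:1])
print("Class probabilities:", probs.round(3))
print("Multinomial accuracy:", softmax_model.score(X_test, y_test))
print("One-vs-rest accuracy:", ovr_model.score(X_test, y_test))
```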

Q. How do I deal with overfitting in logistic regression?

Regularization techniques like L1 and L2 regularization can help prevent overfitting by penalizing large coefficients.
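In scikit-learn, the strength of the penalty is controlled by the `C` parameter of `LogisticRegression`, which is the inverse of the regularization strength: smaller values mean a stronger penalty. The sketch below uses the default L2 penalty on the binary Setosa task from this guide and compares coefficient sizes; the two `C` values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Same binary task as in the guide: Setosa (1) versus the rest (0)
X, y = load_iris(return_X_y=True)
y = np.where(y == 0, 1, 0)

# Smaller C means stronger regularization
strong = LogisticRegression(C=0.01, max_iter=1000).fit(X, y)
weak = LogisticRegression(C=100.0, max_iter=1000).fit(X, y)

strong_norm = np.linalg.norm(strong.coef_)
weak_norm = np.linalg.norm(weak.coef_)

print(f"Coefficient norm with C=0.01: {strong_norm:.3f}")
print(f"Coefficient norm with C=100:  {weak_norm:.3f}")
```

In practice, `C` is usually chosen by cross-validation rather than fixed by hand.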

Q. Is logistic regression sensitive to feature scaling?

Yes, logistic regression can benefit from feature scaling. Iterative solvers such as gradient descent tend to converge faster when features are on a similar scale, and regularization penalties treat the coefficients more evenly when the features share a scale.
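One way to apply scaling is to chain a `StandardScaler` and the model in a scikit-learn pipeline, so the scaling statistics are learned from the training split only. This is a minimal sketch on the same binary Setosa task used earlier in the guide.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
y = np.where(y == 0, 1, 0)  # Setosa (1) versus the rest (0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The scaler is fitted on the training data, then applied to any data
# passed to the pipeline at prediction time
scaled_model = make_pipeline(StandardScaler(), LogisticRegression())
scaled_model.fit(X_train, y_train)

accuracy = scaled_model.score(X_test, y_test)
print(f"Accuracy with scaling: {accuracy * 100:.2f}%")
```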

Q. How can I interpret logistic regression coefficients?

The coefficients in logistic regression represent the change in the log odds of the outcome for a one-unit change in the predictor variable, holding all other variables constant. Positive coefficients increase the log odds of the outcome (and thus increase the probability), while negative coefficients decrease it.
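Exponentiating a coefficient turns the change in log odds into an odds ratio, which is often easier to read. The sketch below fits the binary Setosa model from this guide on the full dataset and prints both quantities. Note that scikit-learn applies L2 regularization by default, so these coefficients are shrunk and should be read as illustrative rather than as unpenalized estimates.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

data = load_iris()
X = data.data
y = np.where(data.target == 0, 1, 0)  # Setosa (1) versus the rest (0)

model = LogisticRegression(max_iter=1000).fit(X, y)

coefficients = model.coef_[0]        # change in log odds per one-unit increase
odds_ratios = np.exp(coefficients)   # multiplicative change in the odds

for name, coef, ratio in zip(data.feature_names, coefficients, odds_ratios):
    print(f"{name}: coefficient = {coef:+.3f}, odds ratio = {ratio:.3f}")
```

An odds ratio above 1 corresponds to a positive coefficient, and an odds ratio below 1 to a negative one.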