Understanding the various aspects of deep learning and machine learning can often feel like stepping into uncharted territory with no clue where to go. As you start exploring algorithms and data, you realize that success depends on more than just building a raw model; it is about fine-tuning that model to perfection. And when we talk about fine-tuning a model, hyperparameter tuning comes into the picture as a crucial practice. In this article, we explain hyperparameter tuning in detail, covering the strategies, techniques, and tools that let you get the most out of your machine learning models.
What is Hyperparameter Tuning?
Hyperparameter tuning is the process of optimizing the hyperparameters of a machine learning or deep learning model to enhance its performance. Hyperparameters are external configurations that data scientists and engineers use to control how a model is trained. Most of the time, we set hyperparameters before a model's training even begins.
What is the difference between Hyperparameters and Parameters?
Before jumping into the intricacies of hyperparameter tuning, you need to understand the difference between model parameters and model hyperparameters. Throughout the training process, both play an important role in the development of the model, with each serving a unique purpose and being handled differently. Now, let's delve into the differences between model parameters and model hyperparameters and learn their significance in machine learning algorithms:
Model Parameters vs Model Hyperparameters
(i) Model Parameters:
Model parameters are the variables that the model learns during the training process. Optimization algorithms such as gradient descent iteratively alter model parameters, directly impacting the predictions the model makes. Weights and biases in neural networks are examples of model parameters.
(ii) Model Hyperparameters:
In contrast, model hyperparameters are settings or configurations chosen before the training process starts. The model cannot learn hyperparameters from the data we give it for training. Hyperparameters influence the behavior of the learning algorithm and, in turn, the model's performance. Common hyperparameters include the learning rate, the number of hidden layers, the activation functions, and the regularization strength.
| Hyperparameters | Parameters |
| --- | --- |
| Required for estimating the model parameters. | Directly impact the accuracy of the model's predictions. |
| They are estimated by hyperparameter tuning. | They are estimated by optimization algorithms (Gradient Descent, Adam, Adagrad). |
| They are set manually. | They are not set manually; they are learned from the data. |
| The hyperparameters you choose affect how efficient training is; in gradient descent, the learning rate decides how efficiently and accurately the optimization process estimates the parameters. | The final parameters found after training decide how the model performs on unseen, unfamiliar data. |
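To make the distinction concrete, here is a minimal sketch using scikit-learn's LogisticRegression (the specific estimator and values are illustrative choices, not taken from the examples later in this article): the regularization strength C and max_iter are hyperparameters we set before training, while the weights and intercepts are parameters the model learns from the data.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Hyperparameters: set by us before training starts
model = LogisticRegression(C=1.0, max_iter=200)

# Parameters: learned from the data during training
model.fit(X, y)
print("Learned weights (parameters):", model.coef_)
print("Learned intercepts (parameters):", model.intercept_)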
Why is Hyperparameter Tuning Important?
Hyperparameter tuning is essential for the performance of your model. Hyperparameters directly influence the model's structure, function, and performance. While the architecture of the model itself is crucial, hyperparameters are equally vital because they determine how efficiently the model learns from the provided data. They are the external configurations (such as the learning rate, the number of hidden layers, and the regularization strength) that guide the model's learning process. The selection of appropriate hyperparameters can significantly impact the model's performance, including its accuracy, generalization ability, and computational efficiency. However, determining the optimal values for these hyperparameters is often non-trivial and requires careful experimentation and tuning.
What are the various Hyperparameter Tuning Techniques?
In machine learning, every tweak and adjustment to the model holds the promise of uncovering hidden potential. To unlock the full capabilities of a model, engineers rely on several hyperparameter tuning techniques. Each offers a unique approach to fine-tuning the model and improving its overall efficiency. So, let's take a stroll through these hyperparameter tuning techniques:
1. Bayesian Optimization for Hyperparameter Tuning
Bayesian Optimization is a popular tuning technique grounded in statistical modeling and probability theory. If you are a mathematics geek, you are probably at least familiar with Bayes' theorem, on which this technique is based. It offers a principled approach to hyperparameter optimization by constructing a probabilistic model over a set of hyperparameters and iteratively improving that model to reach the ideal configuration.
In contrast to brute-force techniques like grid search, Bayesian Optimization employs a more sophisticated approach by using previous evaluations to direct its search. The technique iteratively updates a probabilistic surrogate model of the objective function, striking a balance between exploration (searching in regions of uncertainty) and exploitation (focusing on promising regions). This allows it to identify the optimal hyperparameter configuration efficiently, leading to improved model performance with fewer evaluations than traditional approaches.
Example of Bayesian Optimization
Let’s check out an example to understand it more practically:
In this example, we apply Bayesian optimization to a Support Vector Classifier (SVC) model trained on the Iris dataset using the scikit-optimize library. To identify the optimal configuration, we establish the hyperparameter search space and let the optimization algorithm explore and fine-tune the hyperparameters.
from skopt import BayesSearchCV
from sklearn.datasets import load_iris
from sklearn.svm import SVC
# Loading dataset
iris = load_iris()
X, y = iris.data, iris.target
# Define the model and its hyperparameter search space
model = SVC()
param_space = {
    'C': (1e-6, 1e+6, 'log-uniform'),
    'gamma': (1e-6, 1e+1, 'log-uniform'),
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid']
}
# Perform Bayesian optimization for hyperparameter tuning
opt = BayesSearchCV(
    model,
    param_space,
    n_iter=50,        # Number of iterations
    cv=5,             # Number of cross-validation folds
    n_jobs=-1,        # Parallelize the search process
    verbose=0,
    random_state=42
)
# Fit the optimization algorithm to the data
opt.fit(X, y)
# Display the best hyperparameters found
print("Best hyperparameters:", opt.best_params_)
The steps for hyperparameter tuning with Bayesian Optimization are outlined below:
- In the above code, first, we have imported the libraries including ‘load_iris’ and ‘SVC’ from scikit-learn and ‘BayesSearchCV’ from scikit-optimize. Please note that all these libraries are necessary to perform Bayesian optimization and work with the SVC model and Iris dataset.
- Next, we load the Iris dataset using the ‘load_iris’ function and assign the features (‘X’) and target labels (‘y’) to variables.
- Once that is done, it’s time to define the model and the hyperparameter search space: we create an SVC model instance and specify the search space in the ‘param_space’ dictionary. It includes ranges for the ‘C’ regularization parameter and the ‘gamma’ kernel coefficient, along with the options for the ‘kernel’ type.
- In the next step, we initialize the BayesSearchCV object (opt) with the model, the hyperparameter search space, and other parameters including the number of iterations (n_iter), the number of cross-validation folds (cv), and parallelization settings (n_jobs). The optimization then explores and refines the hyperparameters iteratively.
- Now, we fit the optimization algorithm to the dataset (‘X’, ‘y’) using the ‘fit()’ method. This triggers the Bayesian optimization process.
- After optimization is complete, we get the best hyperparameters (best_params_) discovered by the algorithm and display them to the terminal. These hyperparameters represent the configuration that yielded the best performance for the SVC model on the Iris dataset.
2. Grid Search for Hyperparameter Tuning
In Grid Search, we create a grid of candidate values for each setting that requires tuning. We then train the model with each combination to determine which delivers the best performance. It is like trying every possible combination of settings to see which one works best for the model. Of course, it’s a thorough process, but it also takes time and requires substantial computing power, especially when there are numerous settings to test.
Example of Grid Search
Now let us demonstrate. Suppose you have three hyperparameters: learning rate, batch size, and the number of hidden layers. You define a range of values for each, such as [0.001, 0.01, 0.1] for the learning rate, [1, 2, 3] for the number of hidden layers, and [32, 64, 128] for the batch size. Grid Search then creates a grid with all possible combinations of these values, such as (0.001, 1, 32), (0.001, 1, 64), (0.001, 1, 128), and so on.
For each combination in the grid, you train your model using cross-validation and assess its performance. Cross-validation measures how well each set of hyperparameters performs: it splits the data into several folds, trains the model on each fold, and assesses it on the validation set, so Grid Search can identify which set of hyperparameters produces the best results across different subsets of the data. After testing all the combinations, you select the combination that delivers the best performance metric, such as accuracy or loss.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import load_iris
# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
# Define the hyperparameter grid
param_grid = {
    'C': [0.001, 0.01, 0.1, 1, 10, 100],      # Regularization parameter
    'gamma': [0.001, 0.01, 0.1, 1, 10, 100],  # Kernel coefficient
    'kernel': ['linear', 'rbf', 'poly', 'sigmoid']  # Kernel type
}
# Initialize the SVM classifier
svm = SVC()
# Perform Grid Search with cross-validation
grid_search = GridSearchCV(estimator=svm, param_grid=param_grid, cv=5)
# Fit the Grid Search to find the best hyperparameters
grid_search.fit(X, y)
# Get the best hyperparameters and their corresponding score
best_params = grid_search.best_params_
best_score = grid_search.best_score_
print("Best Hyperparameters:", best_params)
print("Best Score:", best_score)
The steps for hyperparameter tuning with Grid Search are outlined below:
- Above, we’ve imported the necessary libraries: SVC from sklearn.svm for the Support Vector Classifier, load_iris from sklearn.datasets to load the Iris dataset, and GridSearchCV from sklearn.model_selection to perform the grid search.
- Then, we loaded the Iris dataset using load_iris() and allotted the features to X and the target labels to y.
- After this, we specify the hyperparameter grid in the dictionary param_grid. Note that the ‘C’ parameter (regularization parameter), the ‘gamma’ parameter (kernel coefficient), and the ‘kernel’ parameter (kernel type) of the SVM classifier each have several candidate values.
- Now we initialize the SVM classifier (svm) by using the default settings.
- Now, in order to perform Grid Search with cross-validation, we create a GridSearchCV object (grid_search) by passing the SVM classifier (estimator), the hyperparameter grid (param_grid), and the number of cross-validation folds (cv=5). The GridSearchCV object assesses the performance of every combination of hyperparameters using 5-fold cross-validation.
- Next, we fit the Grid Search to find the best hyperparameters. For this, we call the ‘fit’ method on the ‘grid_search’ object, passing the dataset features (‘X’) and labels (‘y’). This trains the SVM classifier with each hyperparameter combination and chooses the best fit based on cross-validation scores.
- We then retrieve the best hyperparameters found during the grid search using the best_params_ attribute, and their corresponding score using the best_score_ attribute of the grid_search object.
- Lastly, for evaluation and analysis, we print the optimal hyperparameters along with their corresponding score to the console.
3. Random Search for Hyperparameter Tuning
Here comes the wildcard of hyperparameter tuning: the Random Search technique. Random Search randomly chooses hyperparameter combinations from pre-defined distributions and assesses their performance. It balances exploration and exploitation through repeated random selections and evaluations, gradually improving model performance. Because of its adaptability and capacity for handling big search spaces, it is a popular option for machine learning hyperparameter optimization.
Instead of systematically exploring all combinations like Grid Search, Random Search randomly samples the hyperparameter space. This makes it far less computationally intensive, especially in high-dimensional spaces. Despite its random nature, the technique often arrives at near-optimal hyperparameter configurations because the random samples cover diverse regions of the search space.
Example Of Random Search
In this example, we use Random Search on the Iris dataset to tune the hyperparameters of a Support Vector Machine (SVM) classifier.
from sklearn.model_selection import RandomizedSearchCV
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from scipy.stats import uniform
# Load the Iris dataset and initialize the SVM classifier
iris = load_iris()
X, y = iris.data, iris.target
svm = SVC()
# Define the hyperparameter distributions
param_dist = {
    'C': uniform(0.001, 1000),    # Uniform distribution for C
    'gamma': uniform(0.001, 10),  # Uniform distribution for gamma
    'kernel': ['linear', 'rbf', 'poly', 'sigmoid']  # Kernel type
}
# Initialize the Randomized Search with cross-validation
random_search = RandomizedSearchCV(estimator=svm, param_distributions=param_dist, n_iter=100, cv=5)
# Fit the Randomized Search to find the best hyperparameters
random_search.fit(X, y)
# Get the best hyperparameters and their corresponding score
best_params = random_search.best_params_
best_score = random_search.best_score_
print("Best Hyperparameters:", best_params)
print("Best Score:", best_score)
The steps for hyperparameter tuning with Random Search are outlined below:
- In the above code, we first imported RandomizedSearchCV from scikit-learn, uniform from scipy.stats for defining hyperparameter distributions, and SVC and load_iris from scikit-learn, then loaded the Iris dataset and initialized the SVM classifier (svm).
- Next, we used the dictionary param_dist to define the hyperparameter distributions, specifying uniform distributions for C and gamma and a list of kernel types. Afterward, we initialized the Randomized Search with 100 iterations and 5-fold cross-validation, passing the initialized SVM classifier and the hyperparameter distributions.
- Once it’s done, we fit the Randomized Search to find the best hyperparameters using the dataset features X and labels y.
- Now it’s time to retrieve the best hyperparameters and their corresponding score. At last, we printed the best hyperparameters and their score.
Hyperparameter Optimization Tools and Frameworks
There are various tools and libraries used for hyperparameter tuning. These tools and frameworks facilitate the tuning process by offering efficient optimization techniques, algorithms, and intuitive interfaces. To guide you, we have compiled a list of the most popular hyperparameter tuning tools and frameworks:
1. scikit-learn (sklearn)
sklearn has an extensive set of tools for machine learning in Python, including built-in hyperparameter optimization utilities such as Grid Search and Randomized Search (with successive-halving variants of both); Bayesian Optimization is available through the separate scikit-optimize package described below.
- Random search: the search is conducted over a number of random parameter combinations using RandomizedSearchCV.
- Grid search: GridSearchCV searches over every parameter combination in the grid.
2. scikit-optimize (skopt)
Scikit-optimize uses a sequential model-based optimization algorithm to find good solutions for hyperparameter search problems quickly. Beyond hyperparameter optimization, scikit-optimize offers features such as storing and loading optimization results, comparing surrogate models, and plotting convergence. Furthermore, scikit-optimize integrates seamlessly with scikit-learn, offering efficient optimization algorithms (including Bayesian Optimization).
3. Hyperopt
Another popular hyperparameter tuning package is Hyperopt. It offers a customizable and reliable framework for hyperparameter optimization. With Hyperopt, users specify a search space in which they believe the best results will be found, which lets the algorithm search more precisely. Three algorithms are implemented in Hyperopt: Random Search, Tree of Parzen Estimators (TPE), and Adaptive TPE.
To use Hyperopt, you first have to describe the following (a minimal sketch follows the list):
- The objective function to minimize
- The space over which to search
- The database (a Trials object) in which to store all the point evaluations of the search
- The search algorithm to use
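As a hedged illustration of those four pieces (assuming Hyperopt and scikit-learn are installed; the ranges are illustrative choices), the sketch below tunes the same SVC-on-Iris setup used earlier in this article. The objective returns the negative cross-validated accuracy because Hyperopt minimizes, the space is defined with hp expressions, the Trials object stores the evaluations, and TPE is the search algorithm.

from hyperopt import fmin, tpe, hp, Trials, STATUS_OK
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

# 1. Search space: log-uniform ranges for C and gamma, plus a choice of kernels
space = {
    'C': hp.loguniform('C', -5, 5),
    'gamma': hp.loguniform('gamma', -5, 2),
    'kernel': hp.choice('kernel', ['linear', 'rbf', 'poly', 'sigmoid'])
}

# 2. Objective to minimize: negative cross-validated accuracy
def objective(params):
    score = cross_val_score(SVC(**params), X, y, cv=5).mean()
    return {'loss': -score, 'status': STATUS_OK}

# 3. Database of point evaluations
trials = Trials()

# 4. Search algorithm: Tree of Parzen Estimators
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50, trials=trials)
print("Best hyperparameters:", best)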
4. Optuna
Another commonly used hyperparameter optimization framework is Optuna, which automates the optimization process using various algorithms, including CMA-ES (Covariance Matrix Adaptation Evolution Strategy) and TPE (Tree-structured Parzen Estimator). To identify the most promising regions to search, Optuna uses a historical record of trial details. This allows it to locate the ideal hyperparameters in a short amount of time.
It contains a pruning mechanism that automatically ends less promising trials in the early phases of training. The main features offered by Optuna include a versatile, lightweight, and platform-agnostic architecture, efficient optimization algorithms, Pythonic search spaces, quick visualization, and easy parallelization.
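To give a feel for Optuna's define-by-run style, here is a minimal sketch (assuming Optuna and scikit-learn are installed; the parameter names and ranges are illustrative, not from the original article) that tunes the same SVC-on-Iris example with the default TPE sampler.

import optuna
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

def objective(trial):
    # Each hyperparameter is sampled inside the objective (define-by-run search space)
    params = {
        'C': trial.suggest_float('C', 1e-3, 1e3, log=True),
        'gamma': trial.suggest_float('gamma', 1e-4, 1e1, log=True),
        'kernel': trial.suggest_categorical('kernel', ['linear', 'rbf', 'poly', 'sigmoid'])
    }
    return cross_val_score(SVC(**params), X, y, cv=5).mean()

study = optuna.create_study(direction="maximize")  # TPE sampler is used by default
study.optimize(objective, n_trials=50)
print("Best hyperparameters:", study.best_params)
print("Best score:", study.best_value)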
5. TensorFlow and Keras
If you build your deep learning models on TensorFlow, you can use the Keras Tuner for hyperparameter tuning. It provides an easy-to-use API for defining search spaces and running optimization experiments.
Keras Tuner lets you select the ideal set of hyperparameters for your TensorFlow program. When building a model for hyperparameter tuning, you define the search space in addition to the model architecture. You can define the hypermodel in two ways: by writing a model builder function or by subclassing the HyperModel class of the Keras Tuner API.
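As a rough sketch of the model-builder-function approach (assuming TensorFlow and the keras_tuner package are installed; the layer sizes are illustrative, and x_train/y_train are placeholders for your own data), tuning a small dense network might look like this:

import keras_tuner as kt
from tensorflow import keras

def build_model(hp):
    # The search space is declared inline while building the model
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(hp.Int('units', min_value=32, max_value=512, step=32), activation='relu'),
        keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(hp.Choice('learning_rate', [1e-2, 1e-3, 1e-4])),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    return model

tuner = kt.RandomSearch(build_model, objective='val_accuracy', max_trials=10)
# x_train and y_train are placeholders for your own dataset
# tuner.search(x_train, y_train, epochs=5, validation_split=0.2)
# best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]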
6. Ray Tune
Another popular choice for hyperparameter tuning at any scale is Ray Tune. It leverages distributed computing to make hyperparameter optimization faster, and it ships implementations of various optimization algorithms at scale, making it suitable for large-scale hyperparameter optimization tasks. Here are the main features of Ray Tune:
- Distributed asynchronous optimization using Ray, right out of the box
- Seamlessly scalable
- State-of-the-art (SOTA) algorithms available, including Population Based Training, BOHB, and ASHA
- Supports MLflow and TensorBoard
- Supports diverse frameworks, including PyTorch, TensorFlow, XGBoost, and scikit-learn
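Here is a heavily simplified sketch of the classic tune.run API (newer Ray releases favor tune.Tuner and train.report, so treat this as an assumption-laden illustration rather than definitive usage); the toy trainable just reports a dummy score computed from the sampled learning rate.

from ray import tune

def trainable(config):
    # In practice you would train a model here; this toy score only depends on the sampled lr
    score = (config["lr"] - 0.01) ** 2
    tune.report(score=score)

analysis = tune.run(
    trainable,
    config={"lr": tune.loguniform(1e-4, 1e-1)},  # search space
    num_samples=20,                              # number of sampled configurations
    metric="score",
    mode="min",
)
print("Best config:", analysis.best_config)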
Hyperparameter Tuning Algorithms
Several algorithms are designed specifically for hyperparameter tuning. These algorithms aim to explore the hyperparameter space efficiently and find the ideal configurations. Here are the most efficient and popular hyperparameter tuning algorithms:
1. Hyperband
Hyperband is a bandit-based algorithm for hyperparameter optimization that combines random search with successive halving. It can be described as a variation of random search; however, it uses explore-exploit reasoning to decide how much of the budget to allocate to each candidate configuration. It explores the hyperparameter space effectively by allocating more resources to promising hyperparameter configurations and eliminating unpromising ones early.
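Hyperband itself ships with libraries such as Keras Tuner and Ray Tune; as a closely related, hedged illustration, the sketch below uses scikit-learn's HalvingRandomSearchCV, which implements the successive-halving idea at the heart of Hyperband: many configurations start with a small budget, and only the best-performing fraction survives each round.

from sklearn.experimental import enable_halving_search_cv  # noqa: F401 (enables the halving estimators)
from sklearn.model_selection import HalvingRandomSearchCV
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from scipy.stats import loguniform

iris = load_iris()
X, y = iris.data, iris.target

param_dist = {
    'C': loguniform(1e-3, 1e3),
    'gamma': loguniform(1e-4, 1e1)
}

# Successive halving: each round keeps roughly the top 1/factor of candidates and triples their budget
halving_search = HalvingRandomSearchCV(SVC(), param_dist, factor=3, cv=5, random_state=42)
halving_search.fit(X, y)
print("Best hyperparameters:", halving_search.best_params_)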
2. Population-Based Training (PBT)
Population-Based Training is an evolutionary approach to hyperparameter optimization. It maintains a population of models, each with its own hyperparameter configuration, and iteratively evolves that population toward better solutions by replacing poorly performing members with copies of stronger ones and perturbing their hyperparameters.
This technique is an amalgamation of the two most popular search techniques: random search and the manual tuning traditionally applied to neural network models.
We start PBT by training multiple neural networks simultaneously with random hyperparameters. These networks, however, are not entirely independent of one another.
The hyperparameters are refined over time, and the next values to try are chosen using information from the rest of the population.
3. Bayesian Optimization With Hyperband (BOHB)
BOHB combines the efficiency of Hyperband with the effectiveness of Bayesian Optimization. It uses Hyperband to dynamically allocate resources to different hyperparameter configurations and Bayesian optimization to model the objective function and direct the search.
4. Genetic Algorithms for Hyperparameter Optimization
The principles of natural selection and evolution serve as the basis for genetic algorithms (GAs). They keep track of a population of potential solutions, or hyperparameter configurations, and repeatedly use genetic operators like crossover, selection, and mutation to push the population toward better answers.
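As a hedged, from-scratch illustration (a toy written for this article, not taken from any particular GA library), the sketch below evolves a small population of SVC hyperparameter configurations on the Iris dataset using exactly those three operators: selection keeps the fittest individuals, crossover mixes genes from two parents, and mutation occasionally re-samples a gene.

import random
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target
random.seed(42)

def fitness(ind):
    # Cross-validated accuracy of the SVC defined by this individual's hyperparameters
    return cross_val_score(SVC(C=ind['C'], gamma=ind['gamma']), X, y, cv=3).mean()

def random_individual():
    return {'C': 10 ** random.uniform(-3, 3), 'gamma': 10 ** random.uniform(-4, 1)}

def crossover(a, b):
    # Each gene (hyperparameter) is inherited from one parent at random
    return {k: random.choice([a[k], b[k]]) for k in a}

def mutate(ind, rate=0.3):
    # Occasionally re-sample a gene to keep exploring
    for k, low, high in [('C', -3, 3), ('gamma', -4, 1)]:
        if random.random() < rate:
            ind[k] = 10 ** random.uniform(low, high)
    return ind

population = [random_individual() for _ in range(10)]
for _ in range(5):
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:4]  # selection: keep the fittest individuals
    children = [mutate(crossover(*random.sample(parents, 2))) for _ in range(6)]
    population = parents + children
best = max(population, key=fitness)
print("Best hyperparameters:", best)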
5. Tree-structured Parzen Estimator (TPE)
The Tree-structured Parzen Estimator (TPE) is also a Bayesian optimization algorithm for hyperparameter tuning. It uses probability density functions (PDFs) to model the distributions of good and bad hyperparameters. TPE focuses the search on promising areas of the hyperparameter space by iteratively updating these PDFs based on the performance of the configurations evaluated so far. By carefully balancing exploration and exploitation, TPE efficiently finds strong hyperparameter configurations.
6. Particle Swarm Optimization (PSO)
PSO is a population-based optimization algorithm inspired by models of social behavior in bird flocks and fish schools. It maintains a swarm of potential solutions, or particles, that move around the hyperparameter space in search of the best configuration.
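To make the swarm idea concrete, here is a small from-scratch sketch (an illustrative toy, not a production PSO library): each particle is a candidate (C, gamma) pair in log space, and the velocity update blends inertia with pulls toward the particle's personal best and the swarm's global best.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target
rng = np.random.default_rng(42)

def score(pos):
    # A particle's position is (log10 C, log10 gamma); its fitness is cross-validated accuracy
    C, gamma = 10.0 ** pos
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

n_particles, n_iters = 8, 10
low, high = np.array([-3.0, -4.0]), np.array([3.0, 1.0])  # search bounds in log space
pos = rng.uniform(low, high, size=(n_particles, 2))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([score(p) for p in pos])
gbest = pbest[pbest_val.argmax()]

for _ in range(n_iters):
    r1, r2 = rng.random((n_particles, 1)), rng.random((n_particles, 1))
    # Velocity update: inertia + pull toward personal best + pull toward global best
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, low, high)
    vals = np.array([score(p) for p in pos])
    improved = vals > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmax()]

print("Best C and gamma:", 10.0 ** gbest)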
Final Words
To sum up, maximizing the performance of machine learning and deep learning models requires a solid understanding of hyperparameter tuning in Python. A foundational knowledge of the differences between model parameters and hyperparameters is necessary to implement successful tuning techniques. Methods such as Bayesian Optimization, Grid Search, and Random Search provide different ways to fine-tune models effectively. Frameworks and tools like scikit-learn, scikit-optimize, and Optuna streamline the tuning process. Furthermore, specialized algorithms such as Hyperband, Population-Based Training, and Bayesian Optimization with Hyperband offer sophisticated strategies for exploring the hyperparameter space efficiently. By utilizing these methods and resources, you can fully realize the potential of your machine learning models. Have fun with the adjustments!