This repository implements a Transformer model that supports both full-precision floating-point and N-bit fixed-point arithmetic for activations and weights. This flexibility allows experimentation with reduced precision arithmetic, enabling trade-offs between computational efficiency and model accuracy. The model is specifically applied to the task of stock price prediction.
*Banner image generated using DALL·E.*
This repository contains a Jupyter Notebook implementation of a quantized Transformer model for stock price prediction. The notebook is organized into clearly defined sections to facilitate reproducibility and ease of understanding. Each section corresponds to a specific stage in the workflow, from data preparation to model analysis. Below is a summary of the key sections:
- Setup: Importing all necessary libraries and packages required for the implementation.
- Data Fetch and Pre-processing: Fetching stock price data and preparing it for training and evaluation.
- Quantized Transformer Model: Defining the architecture of the Transformer model with support for quantized computations.
- Training and Inference Function: Functions for model training and generating predictions using the trained model.
- Visualization Function: Functions to visualize stock price predictions and evaluate the model's performance.
- Model Analysis Function: Tools to analyze and compare model performance across different quantization levels.
- Execution: Bringing all components together to train, evaluate, and analyze the model.
- FP16 Quantization Using the `torch.amp` Library: Demonstrating model quantization to FP16 precision using PyTorch's `torch.amp` library.
Before running the notebook, make sure the following Python packages are available:

- `torch` (including the `torch.nn`, `torch.nn.functional`, `torch.utils.data`, and `torch.amp` submodules)
- `numpy`
- `pandas`
- `yfinance`
- `matplotlib`
- `ta`
- `sklearn` (scikit-learn)
- `os` and `pickle` (Python standard library, no installation required)
You can install these packages using pip as follows:

```
pip install torch numpy pandas yfinance matplotlib ta scikit-learn
```
This code was developed and tested on a MacBook Pro using the MPS (Metal Performance Shaders) backend for GPU acceleration. If you are running the notebook on Google Colab or another system, you may need to select the appropriate device for your setup.
To do this, locate the following line in the code:

```python
device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')
```

Replace `'mps'` with the device available on your system, such as `'cuda'` for NVIDIA GPUs or `'cpu'` for standard CPU execution. If no compatible device is available, the code will default to running on the CPU.
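For example, a device-selection snippet that prefers CUDA, then MPS, then CPU could look like the following (a sketch; the notebook itself uses the single line shown above):

```python
import torch

# Prefer CUDA if available, then Apple's MPS backend, otherwise fall back to the CPU.
if torch.cuda.is_available():
    device = torch.device('cuda')
elif torch.backends.mps.is_available():
    device = torch.device('mps')
else:
    device = torch.device('cpu')

print(f"Using device: {device}")
```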
This section focuses on fetching historical stock market data, enriching it with technical indicators, and preparing it for training and evaluation.
- `store_stock_data(ticker, start_date, end_date, file_path)`

  This function fetches historical stock market data for a given ticker symbol and time period using the `yfinance` library, and saves the data in CSV format at the specified file path.
  - Columns included: `Date`, `High`, `Low`, `Open`, `Adj Close`, `Close`, `Volume`.
- `load_data_with_indicators(file_path, column='Close', time_step=100, train_fraction=0.8)`

  This function loads the stored stock data and performs the following:
  - Ensures the `Date` column is in the correct format and adds `DayOfWeek` and `Month` as additional features.
  - Adds technical indicators such as RSI, MACD, and Bollinger Bands using the `ta` library.
  - Includes lagged `Close` values as features for improved model performance.
  - Normalizes features and target using `MinMaxScaler`.
  - Splits the data into training and testing sets based on the `train_fraction`.
  - Creates time-series sequences of a specified length (`time_step`).
  - Converts the processed data into PyTorch tensors.

  The function returns the following:
  - `train_X`, `train_Y`: Training features and labels as PyTorch tensors.
  - `test_X`, `test_Y`: Testing features and labels as PyTorch tensors.
  - `feature_scaler`, `target_scaler`: Scalers used for normalizing features and targets.
Ensure the `yfinance` and `ta` libraries are installed. You can install them using pip:

```
pip install yfinance ta scikit-learn
```
Notes:
- The data fetching and processing code assumes the input dataset contains a Date column. Ensure your dataset meets this requirement.
- Missing or invalid dates in the dataset will be dropped during processing.
- This implementation normalizes data to the range [0, 1] and splits it into train and test sets, making it ready for input to the Transformer model.
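As a quick illustration, a minimal usage sketch of the two functions above might look like this. The ticker, dates, and path are the ones used later in the Execution section; the return order of `load_data_with_indicators` is assumed here:

```python
# Illustrative values; adjust the ticker, dates, and path for your own run.
ticker = 'AAPL'
file_path = 'stocks_data/AAPL.csv'

# Download and cache the raw OHLCV data as a CSV file.
store_stock_data(ticker, start_date='2016-01-01', end_date='2024-01-01', file_path=file_path)

# Build indicator-enriched, normalized, windowed tensors for the Transformer.
train_X, train_Y, test_X, test_Y, feature_scaler, target_scaler = load_data_with_indicators(
    file_path, column='Close', time_step=100, train_fraction=0.8
)
print(train_X.shape, test_X.shape)
```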
This section defines a custom transformer model with quantization applied to both the activations and weights of each layer. The model includes custom multi-head attention and feed-forward layers that support fixed-point quantization.
- `quantize_tensor(tensor, bits)`: Quantizes the input tensor to a fixed-point representation with a specified number of bits. The range is determined by the minimum and maximum values of the tensor.
- `quantize_tensor_weights(tensor, bits)`: Quantizes the weight tensor to a fixed-point representation with a specified number of bits. Special handling is applied when `bits` is 1 (binarization).
- `quantize_layer_weights(layer, bits)`: Quantizes the weights of a given layer. This function iterates over all parameters in the layer and applies `quantize_tensor_weights` to the weights.
- `QuantizedCustomMultiheadAttention`: A custom multi-head attention layer that supports quantization for both activations and weights. The layer includes methods to quantize weights and activations during the forward pass.
- `QuantizedTransformerStockPredictor`: A transformer model designed for stock prediction that supports quantization for both activations and weights. It includes positional encoding, multiple self-attention layers, feed-forward layers, and a fully connected output layer. Quantization is applied at each layer during both training and inference.
- Quantization: Both activations and weights are quantized to fixed-point representations, with the number of bits specified by the user. This reduces model size and computation during inference.
- Custom Multi-Head Attention: Implements a quantized version of multi-head attention that supports the same operations as the standard transformer, but with reduced precision for both weights and activations.
- Flexible Configuration: Users can adjust the number of bits for both activations and weights, allowing a trade-off between accuracy and performance.
The `QuantizedTransformerStockPredictor` class includes an implementation of the forward pass where quantization is applied to the inputs, self-attention layers, and feed-forward layers, followed by the output layer. Quantization is applied to each layer's activations and weights to enable more efficient inference while retaining the functionality of the original transformer architecture.
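As a rough illustration of the min/max-range fixed-point scheme described above, a "fake quantization" of a tensor could look like the following sketch. This is not the repository's exact implementation, only the idea:

```python
import torch

def quantize_tensor_sketch(tensor: torch.Tensor, bits: int) -> torch.Tensor:
    """Quantize a tensor to a fixed-point grid spanning its min/max range,
    then dequantize back to float (illustrative only)."""
    if bits is None:
        return tensor  # full precision, no quantization
    qmax = 2 ** bits - 1
    t_min, t_max = tensor.min(), tensor.max()
    scale = (t_max - t_min).clamp(min=1e-8) / qmax   # step size of the fixed-point grid
    q = torch.round((tensor - t_min) / scale).clamp(0, qmax)
    return q * scale + t_min
```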
The `training` function trains the model using Mean Squared Error (MSE) loss and the Adam optimizer. The training process includes both training and validation steps. The function accepts the following parameters:

- `model`: The neural network model to train.
- `train_X`, `train_Y`: The training features and labels.
- `test_X`, `test_Y`: The validation features and labels.
- `epochs`: The number of epochs to train the model (default is 50).
- `batch_size`: The batch size used for training (default is 32).
- `lr`: The learning rate for the optimizer (default is 0.001).
- `device`: The device (CPU or GPU) to run the model on (default is CPU).
- Model and Data Initialization:
  - The model and data are moved to the specified device (either CPU or GPU).
- Loss Calculation (Epoch 0):
  - The model is set to evaluation mode (`model.eval()`) to calculate the initial training and validation loss without updating the model parameters.
  - A batch-wise calculation of the training loss is done using the MSE loss function.
  - Similarly, the validation loss is computed on the validation dataset.
- Training Loop:
  - The model is set to training mode (`model.train()`).
  - A random permutation of the training dataset is generated, and batches are processed.
  - For each batch:
    - The optimizer gradients are cleared using `optimizer.zero_grad()`.
    - The model is forward-passed to obtain predictions.
    - The loss is computed, and backpropagation (`loss.backward()`) is performed.
    - Gradients are clipped to prevent exploding gradients using `torch.nn.utils.clip_grad_norm_()`.
    - The optimizer updates the model weights with `optimizer.step()`.
  - The training loss is averaged over the entire epoch.
- Validation Loop:
  - The model is evaluated on the validation dataset after each training epoch.
  - No gradients are computed during validation (`with torch.no_grad()`).
  - The validation loss is computed similarly to the training loss.
- Learning Rate Scheduling:
  - A learning rate scheduler (`StepLR`) is used to reduce the learning rate by a factor of 0.1 every 10 epochs.
- Return:
  - After all epochs, the model is moved back to the CPU, and a dictionary containing the training and validation losses for each epoch is returned.
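A condensed sketch of one training epoch under the assumptions listed above (MSE loss, gradient clipping, Adam plus a `StepLR` schedule); the clipping `max_norm` value and function name are illustrative:

```python
import torch
import torch.nn as nn

def train_one_epoch(model, train_X, train_Y, optimizer, batch_size=32, device='cpu'):
    """One pass over the training data, roughly as described above (sketch only)."""
    criterion = nn.MSELoss()
    model.train()
    perm = torch.randperm(train_X.size(0))  # random permutation of the training sequences
    epoch_loss, n_batches = 0.0, 0
    for i in range(0, train_X.size(0), batch_size):
        idx = perm[i:i + batch_size]
        x, y = train_X[idx].to(device), train_Y[idx].to(device)

        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        # Clip gradients to prevent exploding gradients (max_norm is illustrative).
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        epoch_loss += loss.item()
        n_batches += 1
    return epoch_loss / max(n_batches, 1)

# Typical setup described above (illustrative):
# optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
# ... call scheduler.step() once per epoch so the LR drops by 10x every 10 epochs.
```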
The `inference` function is used to make predictions on a dataset using the trained model. It accepts the following parameters:

- `model`: The trained PyTorch model.
- `data_loader`: A DataLoader containing the dataset (training, validation, or test data).
- Model Evaluation:
  - The model is set to evaluation mode (`model.eval()`) to disable dropout and batch normalization, ensuring consistent predictions.
- Batch-wise Prediction:
  - For each batch of data in the `data_loader`:
    - The model performs a forward pass to obtain predictions.
    - Predictions and true labels (ground truth) are stored in the `predictions` and `true_values` lists, respectively.
- Return:
  - The function returns the predictions and true values as NumPy arrays.
This function allows the model to generate predictions for a given dataset and is typically used after training to evaluate the model's performance on a test set or to make predictions on new data.
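A sketch of what such an inference helper typically looks like, assuming the `DataLoader` yields `(features, labels)` pairs:

```python
import numpy as np
import torch

def inference_sketch(model, data_loader, device='cpu'):
    """Run the model over a DataLoader and return predictions and ground truth as NumPy arrays."""
    model.eval()                      # disable dropout etc. for deterministic predictions
    predictions, true_values = [], []
    with torch.no_grad():             # no gradients needed at inference time
        for x, y in data_loader:
            preds = model(x.to(device))
            predictions.append(preds.cpu().numpy())
            true_values.append(y.numpy())
    return np.concatenate(predictions), np.concatenate(true_values)
```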
The `plot_stock_prediction` function visualizes the stock price predictions made by the trained model, along with the actual prices for both the training and test datasets. It accepts the following parameters:

- `model`: The trained model used for making predictions.
- `train_X`, `train_Y`: The feature and target data for the training set.
- `test_X`, `test_Y`: The feature and target data for the test set.
- `batch_size`: The batch size used for data loading.
- `target_scaler`: The scaler used to invert the scaling of the target variable (e.g., stock prices).
- `ticker`: The stock ticker symbol (used in the plot title).

The function performs the following steps:
- Data Preparation:
  - `DataLoader` instances are created for both the training and test datasets, using the provided `train_X`, `train_Y`, `test_X`, and `test_Y` tensors. The batches are not shuffled, since time-series data needs to maintain its order.
- Prediction:
  - The `inference` function is called on both the training and test DataLoaders to obtain predictions (`train_preds`, `test_preds`) and actual values (`train_actuals`, `test_actuals`).
- Inverse Scaling:
  - The predicted and actual values are inverse-transformed using the `target_scaler` to restore the original scale of stock prices. The predictions and actuals are reshaped and flattened for plotting.
- Data Combination (see the sketch after this list):
  - The training actual values (`train_actuals_unscaled`) and the test actual values (`test_actuals_unscaled`) are combined into a single array (`full_actuals`).
  - The predictions for the training set are combined with `NaN` values for the test set (`full_preds`).
  - Similarly, the test predictions are combined with `NaN` values for the training set (`test_preds_combined`).
- Plotting:
  - The plot is generated with the following lines:
    - Blue line: the actual stock prices (`full_actuals`).
    - Green dashed line: the predicted prices for the training set (`full_preds`).
    - Red dashed line: the predicted prices for the test set (`test_preds_combined`).
  - The plot is customized with a title, axis labels, and a grid for better readability.
- Displaying the Plot:
  - The plot is displayed using `plt.show()` to visualize the results.
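The NaN-padding trick described under "Data Combination" can be sketched as follows, assuming the unscaled 1-D arrays from the steps above (the `*_unscaled` prediction names are illustrative). Matplotlib skips `NaN` values, so each curve only appears over its own date range:

```python
import numpy as np

# 1-D arrays after inverse scaling (names are illustrative).
full_actuals = np.concatenate([train_actuals_unscaled, test_actuals_unscaled])

# Training predictions, padded with NaNs over the test region.
full_preds = np.concatenate(
    [train_preds_unscaled, np.full_like(test_actuals_unscaled, np.nan)]
)

# Test predictions, padded with NaNs over the training region.
test_preds_combined = np.concatenate(
    [np.full_like(train_actuals_unscaled, np.nan), test_preds_unscaled]
)
```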
The `plot_loss` function visualizes the training and validation loss over the course of the model's training. It accepts the following parameters:

- `training_loss`: A list or array containing the training loss at each epoch.
- `validation_loss`: A list or array containing the validation loss at each epoch.
- `epochs`: The total number of epochs the model was trained for.
- `model_name`: The name of the model, which is included in the plot title.

The function performs the following steps:
- Plotting the Losses:
  - The training loss (`training_loss`) is plotted over the epochs.
  - The validation loss (`validation_loss`) is plotted over the epochs.
- Customization:
  - The plot includes:
    - A title indicating the model name (`{model_name} - Training vs Validation Loss`).
    - An x-axis labeled "Epochs".
    - A y-axis labeled "Loss".
    - A legend to differentiate between training and validation losses.
    - A grid for better readability.
- Displaying the Plot:
  - The plot is displayed using `plt.show()` to visualize the loss trends during training and validation.
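A minimal sketch of such a loss-curve plot (the notebook's actual function may differ in styling):

```python
import matplotlib.pyplot as plt

def plot_loss_sketch(training_loss, validation_loss, epochs, model_name):
    """Plot training vs. validation loss curves (illustrative)."""
    plt.figure(figsize=(10, 5))
    plt.plot(training_loss, label='Training Loss')
    plt.plot(validation_loss, label='Validation Loss')
    plt.xlim(0, epochs)
    plt.title(f'{model_name} - Training vs Validation Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid(True)
    plt.show()
```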
These two functions help visualize the performance of the model during training, both in terms of loss reduction and prediction accuracy on the stock price prediction task.
The `model_analysis` function trains or loads a pretrained model, performs stock price prediction analysis, and visualizes the results. It also tracks the model's loss convergence during training. It accepts the following parameters:

- `activation_bits`: The number of bits used for activation precision (e.g., 32, 16, 8, 4). Default is `None`, meaning full precision (FP32).
- `weight_bits`: The number of bits used for weight precision (e.g., 32, 16, 8, 4). Default is `None`, meaning full precision (FP32).
- `load_pretrained_model`: A boolean flag indicating whether to load a pretrained model (`True`) or train a new one (`False`).
- `epochs`: The number of epochs for training. Default is 50.
- `batch_size`: The batch size for training. Default is 64.
- `lr`: The learning rate for training. Default is 0.001.

The function executes the following steps:
- Precision Selection:
  - Based on the provided `activation_bits` and `weight_bits`, the precision label for the model is determined (see the sketch after this section):
    - If both `activation_bits` and `weight_bits` are `None`, the precision is set to 'FP32' (32-bit floating point).
    - If `activation_bits` and `weight_bits` are the same (e.g., 32, 16, 8, 4), the corresponding fixed-point precision is used (e.g., 'FX32', 'FX16').
    - If `weight_bits` is set to 1, the model uses binary weights ('BinarizeWeights').
- Model and Loss Tracker Paths:
  - Paths for saving and loading the model and loss tracker are generated based on the ticker symbol, precision, and number of epochs.
  - These paths are used to save and load the model's state and the training loss history.
- Loading and Preprocessing Data:
  - The function loads the data using the `load_data_with_indicators` function. The data is scaled using `feature_scaler` and `target_scaler`, and split into training and test datasets based on the provided `train_fraction`.
- Model Creation:
  - A `QuantizedTransformerStockPredictor` model is instantiated with the specified hyperparameters such as `input_dim`, `d_model`, `nhead`, `num_encoder_layers`, `dim_feedforward`, `drop_out`, and `activation_bits`/`weight_bits`.
- Training the Model:
  - If `load_pretrained_model` is set to `False`, the model is trained on the loaded data using the `training` function, which tracks the loss over the epochs. The model is trained on the available device (`mps` or `cpu`), its state is saved to `model_path`, and the loss tracker is saved to `loss_tracker_path`.
- Loading a Pretrained Model:
  - If `load_pretrained_model` is set to `True`, the model loads its state from `model_path` and the loss tracker from `loss_tracker_path`. This allows the user to resume from a previously saved model.
- Stock Price Prediction Visualization:
  - The `plot_stock_prediction` function is called to visualize the actual vs. predicted stock prices for both the training and test datasets. This helps in evaluating the model's prediction performance.
- Loss Convergence Visualization:
  - The `plot_loss` function is called to visualize the training and validation loss curves over the epochs. This helps in evaluating the model's convergence and whether it is overfitting or underfitting.
In summary, this function provides a comprehensive approach to training, evaluating, and analyzing the performance of a stock price prediction model. It handles both training from scratch and loading pretrained models, while also providing insightful visualizations of the model's performance and loss convergence.
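The precision-label logic described in the "Precision Selection" step can be sketched roughly as follows. This is illustrative; the notebook's exact conditions and labels may differ, and the mixed-bits fallback is hypothetical:

```python
def precision_label(activation_bits, weight_bits):
    """Map (activation_bits, weight_bits) to a precision tag used in file names (sketch)."""
    if activation_bits is None and weight_bits is None:
        return 'FP32'                      # full-precision floating point
    if weight_bits == 1:
        return 'BinarizeWeights'           # binary weights, full-precision activations
    if activation_bits == weight_bits:
        return f'FX{activation_bits}'      # e.g. FX32, FX16, FX8, FX4 fixed point
    return f'A{activation_bits}W{weight_bits}'  # hypothetical fallback for mixed settings
```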
In this section, the code configures the parameters for the data, model, and training, and then initiates the training process for different quantization configurations. The goal is to train a stock price prediction model using various bit-widths for activations and weights, including full precision (FP32) and various fixed-point precisions (FX32, FX16, FX8, FX4), as well as binarized weights.
- Data Configuration:
  - The `ticker` variable specifies the stock symbol to be used for data collection (e.g., 'AAPL' for Apple).
  - `start_date` and `end_date` define the time period for which stock data is retrieved (from January 1, 2016, to January 1, 2024).
  - `time_step` is set to 5, which determines the sequence length for the model.
  - `train_fraction` is set to 0.8, meaning 80% of the data is used for training and the remaining 20% for testing.
  - `data_path` defines the location of the stock data file (`stocks_data/AAPL.csv`).
  - The `store_stock_data` function is called with the above parameters to download and save the stock data to the specified `data_path`.
- Model Configuration:
  - `drop_out`: A dropout rate of 0.3 is used to prevent overfitting.
  - `dim_feedforward`: The size of the feed-forward layer is set to 128.
  - `num_encoder_layers`: The model uses 4 encoder layers.
  - `nhead`: The number of attention heads is set to 8.
  - `dmodel`: The model dimension is set to 64.
- Training Configuration:
  - The `load_pretrained_model` flag is set to `False`, indicating that the model will be trained from scratch.
  - `epochs` is set to 50, meaning the model will be trained for 50 epochs.
  - `learning_rate`: The learning rate for training is set to 0.001.
  - `batch_size`: The batch size for training is set to 64.
- Launching Training for Various Quantization Configurations (see the sketch after this section):
  - A loop is used to train the model with different quantization configurations. The following combinations of activation and weight bit-widths are tested:
    - FP32: Full precision (32-bit floating point) for both activations and weights.
    - FX32: 32-bit fixed-point precision for both activations and weights.
    - FX16: 16-bit fixed-point precision for both activations and weights.
    - FX8: 8-bit fixed-point precision for both activations and weights.
    - FX4: 4-bit fixed-point precision for both activations and weights.
    - BinarizeWeights: Binary weights (1-bit) with full-precision activations.
  - For each combination of `activation_bits` and `weight_bits`, the `model_analysis` function is called, which trains the model or loads a pretrained one and visualizes the stock price predictions and loss convergence.
This section automates the training process for different quantization schemes and provides a comprehensive analysis of how varying precision levels for activations and weights affect the performance of the stock price prediction model.
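The sweep over quantization configurations could be written along these lines (a sketch; the exact keyword arguments accepted by `model_analysis` in the notebook may differ):

```python
# (activation_bits, weight_bits) pairs described above:
# FP32, FX32, FX16, FX8, FX4, and binarized weights with full-precision activations.
configs = [
    (None, None),   # FP32
    (32, 32),       # FX32
    (16, 16),       # FX16
    (8, 8),         # FX8
    (4, 4),         # FX4
    (None, 1),      # BinarizeWeights
]

for activation_bits, weight_bits in configs:
    model_analysis(
        activation_bits=activation_bits,
        weight_bits=weight_bits,
        load_pretrained_model=False,
        epochs=50,
        batch_size=64,
        lr=0.001,
    )
```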
The model used in this section is the same as the one described earlier, but without any quantization configuration capabilities. It is implemented to support training with FP16 precision using the `torch.amp` library, which enables mixed-precision training.
In this section, the training and inference functions are modified to support FP16 mixed-precision. The key differences from the earlier training and inference functions are as follows:
- Mixed-Precision Training: The `torch.amp` library is used for automatic mixed-precision (AMP) training, where `autocast` performs operations in FP16 precision and `GradScaler` scales the gradients during backpropagation to avoid underflow.
- Loss Scaling: The loss is scaled using `scaler.scale(loss)` to ensure stable gradient updates during training, which is particularly important for FP16 precision.
- Mixed-Precision Inference: Similar to the training function, `autocast` is used during the inference phase to run predictions in FP16 precision.
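The core of an FP16 mixed-precision training step with `torch.amp` typically looks like the following sketch (not the notebook's exact `training_fp16` code). It assumes a CUDA device and that `model`, `optimizer`, `criterion`, and `train_loader` are already defined; `autocast`/`GradScaler` support varies by backend:

```python
import torch

scaler = torch.amp.GradScaler('cuda')           # scales the loss to avoid FP16 underflow

for x, y in train_loader:
    optimizer.zero_grad()
    with torch.amp.autocast(device_type='cuda', dtype=torch.float16):
        preds = model(x)                        # forward pass runs in FP16 where safe
        loss = criterion(preds, y)
    scaler.scale(loss).backward()               # backprop on the scaled loss
    scaler.step(optimizer)                      # unscales gradients, then optimizer step
    scaler.update()                             # adjusts the scale factor for the next step
```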
The execution in this section follows a similar approach to the previous one. The data and model configurations remain the same, and training is launched for FP16 precision using the modified `training_fp16` function. The model is then evaluated with the `inference_fp16` function for prediction and performance evaluation in FP16 precision.