[FEATURE] Incorporate Time-Series Cross-Validation Support #141
Labels: enhancement, gssoc-ext, hacktoberfest, hacktoberfest-accepted, level2
Is this a unique feature?
Is your feature request related to a problem/unavailable functionality? Please describe.
The stock price prediction model currently uses "train_test_split" to split the data randomly, which is not well suited to time-series data. A random split ignores the sequential nature of stock prices: samples from the future can end up in the training set, which can leak information and make the model evaluation inaccurate.
Proposed Solution
To let the model split the data sequentially, I propose adding "TimeSeriesSplit" from scikit-learn. It preserves temporal order: in every split the training samples come before the test samples, so the model is always trained on past data and evaluated on future data.
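A minimal sketch of how the splits look, assuming a toy series of 12 ordered samples in place of the real price data (the array `prices` and the choice of `n_splits=3` are illustrative only):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical stand-in for 12 chronologically ordered closing prices.
prices = np.arange(12)

tscv = TimeSeriesSplit(n_splits=3)

# Collect the index arrays of each fold as plain lists for easy inspection.
splits = [
    (train_idx.tolist(), test_idx.tolist())
    for train_idx, test_idx in tscv.split(prices)
]

for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold}: train={train_idx} test={test_idx}")
```

Each fold trains on an expanding window of earlier samples and tests on the block that follows it, so no test index ever precedes a training index.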
Screenshots
No response
Do you want to work on this issue?
Yes
If "yes" to above, please explain how you would technically implement this (issue will not be assigned if this is skipped)
I'll change the dataset splitting procedure to use 'TimeSeriesSplit' and adjust the model training to handle multiple splits. To demonstrate the effect of 'TimeSeriesSplit' on model performance on stock price data, I will also report comparison metrics (such as RMSE and MAE) before and after the change.
Steps:
1. Modify the data splitting logic to use TimeSeriesSplit.
2. Train the model on each split and calculate evaluation metrics.
3. Compare the results with the current random data split method.
4. Provide detailed documentation on how this feature improves the accuracy of predictions on time-series data.
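The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the prices are a synthetic random walk, the features are three lagged prices, and `LinearRegression` stands in for the project's actual model. The helper `evaluate` is hypothetical and not part of the existing code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import TimeSeriesSplit, train_test_split

# Synthetic random-walk "prices"; a placeholder for the real stock data.
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 1, 300))

# Features: the previous n_lags prices; target: the next price.
n_lags = 3
X = np.column_stack(
    [prices[i:len(prices) - n_lags + i] for i in range(n_lags)]
)
y = prices[n_lags:]


def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit the model and return (RMSE, MAE) on the test set."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    mae = float(mean_absolute_error(y_test, pred))
    return rmse, mae


# Baseline (step 3): the current random split.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42
)
random_rmse, random_mae = evaluate(LinearRegression(), X_tr, y_tr, X_te, y_te)

# Steps 1 and 2: sequential splits, one model fit and one score per fold.
fold_scores = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    fold_scores.append(
        evaluate(
            LinearRegression(),
            X[train_idx], y[train_idx],
            X[test_idx], y[test_idx],
        )
    )
ts_rmse, ts_mae = np.mean(fold_scores, axis=0)

print(f"random split:    RMSE={random_rmse:.3f} MAE={random_mae:.3f}")
print(f"TimeSeriesSplit: RMSE={ts_rmse:.3f} MAE={ts_mae:.3f}")
```

The numbers printed for synthetic data say nothing about the real model; the point is only the shape of the comparison, with per-fold scores averaged for the sequential splits.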