Create support for multi-target GA-XGB ensemble #1

Merged
merged 3 commits into from
Dec 27, 2024
4 changes: 3 additions & 1 deletion .gitignore
@@ -172,10 +172,12 @@ log/
.ruff_cache/

# model files
**/model_base/*
*.pkl

# reports
reports/report_*
reports/scores/scores_*.csv
reports/portfolios/portfolio_*.xlsx

# references
references
4 changes: 2 additions & 2 deletions README.md
@@ -5,7 +5,7 @@
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)


This project implements an intelligent dynamic stock selection system using an **Adaptive Genetic Algorithm-optimized XGBoost** (GA-XGBoost) classifier to identify stocks with potential market outperformance. The model analyzes quarterly financial statements, market data, insider trading patterns and other external data to rank and select stocks that will outperform the S&P 500 index over a one-year horizon. The project includes a **Streamlit-based analytics dashboard** that provides comprehensive stock analysis tools, including technical indicators, financial metrics visualization, and model-driven insights.
This project implements an intelligent dynamic stock selection system using **Adaptive Genetic Algorithm-optimized XGBoost** (GA-XGBoost) ensemble models to identify stocks with potential market outperformance in the medium to long term. The model analyzes quarterly financial statements, market data, insider trading patterns and other external data to rank and select stocks that will outperform the S&P 500 index over a one-year horizon. The project includes a **Streamlit-based analytics dashboard** that provides comprehensive stock analysis tools, including technical indicators, financial metrics visualization, and model-driven insights.


## Table of Contents
@@ -28,7 +28,7 @@ This project implements an intelligent stock selection system that identifies po
The core engine combines three key components:

1. **Data Pipeline**
- Automated collection of S&P 500 constituent data
- Automated collection of S&P500 constituent data
- Integration of multiple data sources:
- Quarterly financial statements and earnings reports
- Daily market data and technical indicators
1,770 changes: 1,770 additions & 0 deletions notebooks/classification.ipynb

Large diffs are not rendered by default.

857 changes: 0 additions & 857 deletions notebooks/db_analysis.ipynb

This file was deleted.

40,345 changes: 38,187 additions & 2,158 deletions notebooks/eda.ipynb

Large diffs are not rendered by default.

365 changes: 365 additions & 0 deletions notebooks/experiments_class.ipynb

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions notebooks/mock_data.ipynb
@@ -2,14 +2,14 @@
"cells": [
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[32m2024-11-29 11:44:04.892\u001b[0m | \u001b[32m\u001b[1mSUCCESS \u001b[0m | \u001b[36mstocksense.database_handler.schema\u001b[0m:\u001b[36mcreate_tables\u001b[0m:\u001b[36m121\u001b[0m - \u001b[32m\u001b[1mTables created successfully\u001b[0m\n"
"\u001b[32m2024-12-23 10:14:10.240\u001b[0m | \u001b[32m\u001b[1mSUCCESS \u001b[0m | \u001b[36mstocksense.database.schema\u001b[0m:\u001b[36mcreate_tables\u001b[0m:\u001b[36m121\u001b[0m - \u001b[32m\u001b[1mTables created successfully\u001b[0m\n"
]
}
],
@@ -18,7 +18,7 @@
"\n",
"import polars as pl\n",
"\n",
"from stocksense.database_handler import DatabaseHandler\n",
"from stocksense.database import DatabaseHandler\n",
"\n",
"FIXTURE_PATH = Path(\"../tests/fixtures\")\n",
"\n",
@@ -27,7 +27,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 2,
"metadata": {},
"outputs": [
{
@@ -63,7 +63,7 @@
"└──────┴────────────┴────────────┴──────────┴───┴────────┴────────┴──────────┴──────────────┘"
]
},
"execution_count": 4,
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
@@ -75,7 +75,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
1,726 changes: 0 additions & 1,726 deletions notebooks/modeling.ipynb

This file was deleted.

1,856 changes: 1,856 additions & 0 deletions notebooks/regression.ipynb

Large diffs are not rendered by default.

131 changes: 76 additions & 55 deletions notebooks/report_analysis.ipynb

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion pyproject.toml
@@ -31,7 +31,8 @@ dependencies = [
"ipykernel",
"shap",
"pre-commit",
"pydantic"
"pydantic",
"openpyxl"
]

[project.optional-dependencies]
9 changes: 9 additions & 0 deletions stocksense/__init__.py
@@ -0,0 +1,9 @@
"""Stocksense package for stock selection"""

try:
    from importlib.metadata import version

    __version__ = version("stocksense")
except ImportError:
    # Package is not installed
    __version__ = "1.0.0"
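A note on the fallback in `stocksense/__init__.py`: `importlib.metadata.version` signals a missing distribution with `PackageNotFoundError`, which is a subclass of `ImportError`, so the `except ImportError` branch does cover the not-installed case. A minimal sketch of the same pattern with the narrower exception follows; `resolve_version` and the distribution name in the usage line are hypothetical and not part of this PR.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(dist_name: str, fallback: str = "1.0.0") -> str:
    """Return the installed version of `dist_name`, or `fallback` if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Raised when no installed distribution matches dist_name.
        return fallback


# PackageNotFoundError derives from ModuleNotFoundError, itself an ImportError subclass.
assert issubclass(PackageNotFoundError, ImportError)
print(resolve_version("stocksense-not-installed-example"))
```

Catching the narrower exception would keep an unrelated import failure from being silently reported as the fallback version; whether that matters here is a judgment call for the maintainers.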
13 changes: 4 additions & 9 deletions stocksense/app/pages/analytics.py
@@ -4,7 +4,7 @@
import streamlit as st
from plotly.subplots import make_subplots

from stocksense.database_handler import DatabaseHandler
from stocksense.database import DatabaseHandler
from stocksense.pipeline import clean, engineer_features

pd.options.mode.chained_assignment = None # default='warn'
@@ -16,12 +16,7 @@
def list_stocks():
db = DatabaseHandler()
stocks = db.fetch_stock().to_pandas()
return sorted(
stocks.loc[
stocks.spx_status == 1, # noqa: E712
"tic",
].values.tolist()
)
return sorted(stocks.loc[stocks.date_removed.isnull()]["tic"].values.tolist())


def date_breaks(df, date_col="date"):
@@ -234,7 +229,7 @@ def plot_processed_data(df):

def main():
"""
Main app script.
Main analytics script.
"""

st.set_page_config(layout="wide", page_title="Stock Data Analytics", page_icon="📈")
@@ -280,7 +275,7 @@ def main():
st.session_state.page_subheader = f"{name} ({ticker})"

st.subheader(st.session_state.page_subheader)
st.markdown(f"**Last update**: {stock.loc[0, 'last_update']}")
st.markdown(f"**Last update**: {max_date}")

tab1, tab2, tab3, tab4, tab5 = st.tabs(
["Status", "Market", "Financials", "Insider Trading", "Feature Analysis"]
114 changes: 114 additions & 0 deletions stocksense/app/pages/insights.py
@@ -0,0 +1,114 @@
import datetime as dt
from pathlib import Path

import pandas as pd
import plotly.express as px
import streamlit as st

from stocksense.database import DatabaseHandler

REPORTS_DIR = Path(__file__).parents[3] / "reports"
SCORES_DIR = REPORTS_DIR / "scores"
PORTFOLIOS_DIR = REPORTS_DIR / "portfolios"


@st.cache_data(show_spinner="Loading stock data...", max_entries=10)
def load_stock_data():
    db = DatabaseHandler()
    return db.fetch_stock().to_pandas()


def get_available_dates():
    """
    Get all available trade dates from score files.
    """
    score_files = list(SCORES_DIR.glob("scores_*.csv"))
    dates = [dt.datetime.strptime(f.stem.split("_")[1], "%Y-%m-%d").date() for f in score_files]
    return sorted(dates, reverse=True)


def load_scores(trade_date):
    """
    Load scores for a specific trade date.
    """
    score_file = SCORES_DIR / f"scores_{trade_date}.csv"
    if not score_file.exists():
        st.error(f"No scores found for trade date {trade_date}")
        return None
    return pd.read_csv(score_file)


def plot_sector_distribution(portfolio_data):
    """
    Plot sector distribution of selected stocks.
    """
    sector_dist = portfolio_data.groupby("sector")["weight"].sum().reset_index()
    fig = px.pie(
        sector_dist,
        values="weight",
        names="sector",
        title="Sector Distribution",
        template="plotly_dark",
    )
    fig.update_traces(textposition="inside", textinfo="percent+label")
    st.plotly_chart(fig, use_container_width=True)


def display_portfolio_metrics(portfolio_data):
    """
    Display key portfolio metrics.
    """
    total_stocks = len(portfolio_data)
    avg_score = portfolio_data["pred"].mean()
    avg_price = portfolio_data["adj_close"].mean()

    col1, col2, col3 = st.columns(3)
    with col1:
        st.metric("Number of Stocks", total_stocks)
    with col2:
        st.metric("Average Model Score", f"{avg_score:.3f}")
    with col3:
        st.metric("Average Stock Price", f"${avg_price:.2f}")


def main():
    """Insights main script."""

    st.set_page_config(layout="wide", page_title="Stock Picks", page_icon="🔮")
    st.sidebar.title("Stocksense App")
    st.sidebar.success("Select page")

    st.sidebar.page_link("home.py", label="Home", icon="🏠")
    st.sidebar.page_link("pages/overview.py", label="Market Overview", icon="🌎")
    st.sidebar.page_link("pages/analytics.py", label="Stock Analytics", icon="📈")
    st.sidebar.page_link("pages/insights.py", label="Stock Picks", icon="🔮")
    st.sidebar.divider()

    st.title("Stock Picks Insights")

    stock_data = load_stock_data()
    available_dates = get_available_dates()
    trade_date = st.selectbox("Select Trade Date", available_dates)
    scores = load_scores(trade_date)

    if scores is not None:
        # join(on=...) matches against the other frame's index, so index stocks by ticker first
        scores = scores.join(stock_data.set_index("tic"), on="tic", rsuffix="_stock")
        display_portfolio_metrics(scores)
        plot_sector_distribution(scores)

        st.subheader("Top 30 Selected Stocks")
        columns_to_display = ["symbol", "company_name", "sector", "pred", "adj_close", "weight"]
        formatted_scores = scores[columns_to_display].head(30).copy()

        formatted_scores["pred"] = formatted_scores["pred"].round(3)
        formatted_scores["adj_close"] = formatted_scores["adj_close"].round(2)
        formatted_scores["weight"] = (formatted_scores["weight"] * 100).round(2).astype(str) + "%"
        formatted_scores.columns = ["Symbol", "Company", "Sector", "Score", "Price ($)", "Weight"]

        st.dataframe(formatted_scores, use_container_width=True)
    else:
        st.warning("No data available for the selected date.")


if __name__ == "__main__":
    main()
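A note on `get_available_dates` in the new page: it recovers each trade date from the file name by splitting the stem on `_`, which relies on the `scores_YYYY-MM-DD.csv` pattern that `load_scores` builds its path from. A hedged sketch of that parsing in isolation, standard library only, is below. `parse_trade_date` and `available_dates` are hypothetical helper names, and skipping malformed names (instead of raising, as the page code would) is an assumption made for illustration.

```python
import datetime as dt
from pathlib import Path
from typing import Iterable, List, Optional


def parse_trade_date(path: Path) -> Optional[dt.date]:
    """Extract the date from a name like scores_2024-12-23.csv, or None if it does not match."""
    prefix = "scores_"
    if not path.stem.startswith(prefix):
        return None
    try:
        return dt.datetime.strptime(path.stem[len(prefix):], "%Y-%m-%d").date()
    except ValueError:
        # Stem has the prefix but the remainder is not a valid ISO date.
        return None


def available_dates(paths: Iterable[Path]) -> List[dt.date]:
    """Return parsed trade dates, newest first, ignoring files that do not match."""
    dates = [d for d in map(parse_trade_date, paths) if d is not None]
    return sorted(dates, reverse=True)


files = [Path("scores_2024-11-29.csv"), Path("scores_2024-12-23.csv"), Path("scores_latest.csv")]
print(available_dates(files))
```

Because `str()` of a `datetime.date` is its ISO form, a date returned here formats back into the same `scores_{trade_date}.csv` name that `load_scores` looks up.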
8 changes: 4 additions & 4 deletions stocksense/app/pages/overview.py
@@ -3,7 +3,7 @@
import plotly.express as px
import streamlit as st

from stocksense.database_handler import DatabaseHandler
from stocksense.database import DatabaseHandler

pd.set_option("future.no_silent_downcasting", True)

@@ -20,7 +20,7 @@ def load_sp500_data():
"""
db = DatabaseHandler()
stocks = db.fetch_stock().to_pandas()
stocks = stocks.loc[stocks.spx_status == 1]
stocks = stocks.loc[stocks.date_removed.isnull()]

info = db.fetch_info().to_pandas()
stock_df = stocks.merge(info, how="left", on="tic")
@@ -60,7 +60,7 @@ def show_recent_earnings(data):
data : pd.DataFrame
Processed S&P 500 data.
"""
df = data.sort_values("rdq", ascending=False).head(10)
df = data.sort_values("rdq", ascending=False).head(15)
df = df[["tic", "rdq", "sector", "curr_price", "saleq", "surprise_pct"]]
st.dataframe(
df,
@@ -108,7 +108,7 @@ def show_market_summary(data):
with col2:
st.metric("Total Market Cap", summary["Total Market Cap"])
with col3:
st.metric("Average P/E", summary["Average P/E"])
st.metric("Average Trailing P/E", summary["Average P/E"])
with col4:
st.metric("Avg Target Upside", summary["Avg Target Upside"])
