15 fred sofr datasets #16
Conversation
# Conflicts: # audit_trail.csv
Walkthrough
This pull request introduces significant modifications to the project's configuration and data processing workflow. The changes focus on enhancing data retrieval and management, particularly for financial data.
Sequence Diagram
sequenceDiagram
participant Main as main.py
participant DataOps as data_operations.py
participant FRED as FRED API
participant GCP as Google Cloud Platform
Main->>DataOps: process_non_sofr_data()
DataOps->>FRED: Fetch financial data
FRED-->>DataOps: Return data
DataOps->>DataOps: Process and format data
DataOps->>GCP: Optional upload
Main->>DataOps: process_sofr_data()
DataOps->>FRED: Fetch SOFR data
FRED-->>DataOps: Return SOFR data
DataOps->>DataOps: Merge and process SOFR data
DataOps->>GCP: Optional upload
Actionable comments posted: 12
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (23)
- audit_trail.csv is excluded by !**/*.csv
- data/cpi_aus/cpi_aus.pkl is excluded by !**/*.pkl
- data/cpi_can/cpi_can.pkl is excluded by !**/*.pkl
- data/cpi_chf/cpi_chf.pkl is excluded by !**/*.pkl
- data/cpi_deu/cpi_deu.pkl is excluded by !**/*.pkl
- data/cpi_eur/cpi_eur.pkl is excluded by !**/*.pkl
- data/cpi_gbr/cpi_gbr.pkl is excluded by !**/*.pkl
- data/cpi_jpn/cpi_jpn.pkl is excluded by !**/*.pkl
- data/cpi_usa/cpi_usa.pkl is excluded by !**/*.pkl
- data/dff/dff.pkl is excluded by !**/*.pkl
- data/frb_kc_lmci_monthly/frb_kc_lmci_monthly.pkl is excluded by !**/*.pkl
- data/michigan_csi_monthly/michigan_csi_monthly.pkl is excluded by !**/*.pkl
- data/rate_aus_3m_bank/rate_aus_3m_bank.pkl is excluded by !**/*.pkl
- data/rate_chf_3m_bank/rate_chf_3m_bank.pkl is excluded by !**/*.pkl
- data/rate_cnd_3m_bank/rate_cnd_3m_bank.pkl is excluded by !**/*.pkl
- data/rate_deu_3m_bank/rate_deu_3m_bank.pkl is excluded by !**/*.pkl
- data/rate_deu_lt_gov/rate_deu_lt_gov.pkl is excluded by !**/*.pkl
- data/rate_eur_3m_bank/rate_eur_3m_bank.pkl is excluded by !**/*.pkl
- data/rate_eur_lt_gov/rate_eur_lt_gov.pkl is excluded by !**/*.pkl
- data/rate_gbp_3m_bank/rate_gbp_3m_bank.pkl is excluded by !**/*.pkl
- data/rate_jpn_3m_bank/rate_jpn_3m_bank.pkl is excluded by !**/*.pkl
- data/sofr/combined_sofr_data.csv is excluded by !**/*.csv
- data/sofr/combined_sofr_data.pkl is excluded by !**/*.pkl
📒 Files selected for processing (5)
- .dockerignore (1 hunks)
- .gitignore (0 hunks)
- config/settings.yml (1 hunks)
- main.py (1 hunks)
- src/data_operations.py (1 hunks)
💤 Files with no reviewable changes (1)
- .gitignore
🧰 Additional context used
🪛 Ruff (0.8.2)
main.py
41-41: Use f-string instead of format call. Convert to f-string (UP032)
52-52: Missing return type annotation for public function main. Add return type annotation: None (ANN201)
70-70: Missing return type annotation for public function push_to_github. Add return type annotation: None (ANN201)
78-78: datetime.datetime.today() used (DTZ002)
src/data_operations.py
6-6: fredapi.Fred imported but unused. Remove unused import: fredapi.Fred (F401)
8-8: Missing return type annotation for public function process_non_sofr_data (ANN201)
8-8: Missing type annotations for function arguments data_map_dict, fred, col_date, dataPath, SAVE_AS_PICKLE, PUSH_TO_GCP, bucket (ANN001)
22-22: Missing return type annotation for public function process_sofr_data (ANN201)
22-22: Missing type annotations for function arguments sofr_series, fred, col_date, dataPath, SAVE_AS_PICKLE, PUSH_TO_GCP, bucket (ANN001)
38-38: Missing return type annotation for public function fetch_and_process_data (ANN201)
38-38: Missing type annotations for function arguments fred, data_info, col_date (ANN001)
46-46: Missing return type annotation for public function save_data. Add return type annotation: None (ANN201)
46-46: Missing type annotations for function arguments data_df, data_type, dataPath, SAVE_AS_PICKLE (ANN001)
54-54: Missing return type annotation for public function collect_audit_info (ANN201)
54-54: Missing type annotations for function arguments data_df, series_name, data_ref (ANN001)
61-61: datetime.datetime.now() called without a tz argument (DTZ005)
61-61: Trailing comma missing. Add trailing comma (COM812)
64-64: Missing return type annotation for public function upload_to_gcp. Add return type annotation: None (ANN201)
64-64: Missing type annotations for function arguments data_df, data_type, bucket, SAVE_AS_PICKLE (ANN001)
76-76: Missing return type annotation for public function update_sofr_data (ANN201)
76-76: Missing type annotations for function arguments sofr_data, new_data, col_date, series (ANN001)
76-76: Unused function argument: series (ARG001)
79-79: Unnecessary else after return statement. Remove unnecessary else (RET505)
82-82: Missing return type annotation for public function save_combined_data. Add return type annotation: None (ANN201)
82-82: Missing type annotations for function arguments data, dataPath, filename, SAVE_AS_PICKLE (ANN001)
90-90: Missing return type annotation for public function upload_combined_to_gcp. Add return type annotation: None (ANN201)
90-90: Missing type annotations for function arguments data, filename, bucket, SAVE_AS_PICKLE (ANN001)
🔇 Additional comments (6)
src/data_operations.py (2)
38-45: Ensure API usage is properly validated.
fetch_and_process_data fetches time series data from the FRED API using data_info['data_ref']. Consider adding checks for invalid or missing fields in data_info, and optionally handle exceptions if the series is unavailable. Types for data_info and the return type can further prevent confusion.
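As a sketch of that validation (the helper name is illustrative, not from the PR; only `fred.get_series` is assumed from fredapi):

```python
from typing import Any

def fetch_series_checked(fred: Any, data_info: dict) -> Any:
    """Guard data_info before calling FRED; illustrative helper, not PR code."""
    data_ref = data_info.get("data_ref")
    if not data_ref:
        # Fail loudly on malformed config entries instead of inside fredapi
        raise ValueError(f"data_info is missing a 'data_ref' field: {data_info!r}")
    try:
        return fred.get_series(data_ref)  # fredapi.Fred.get_series
    except Exception as exc:
        raise RuntimeError(f"FRED series '{data_ref}' could not be fetched") from exc
```

The same guard can sit at the top of fetch_and_process_data itself.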
64-75: Add optional error handling for the GCP upload.
upload_to_gcp performs multiple upload steps (CSV and optional pickle). Consider wrapping the upload steps in a try/except block or validating the bucket for null references, especially in environments where GCP credentials or bucket connectivity might fail.
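A minimal sketch of that guard, assuming the bucket API from google-cloud-storage (`blob` / `upload_from_string`); the blob path below is illustrative, not the PR's actual layout:

```python
from io import StringIO

def upload_to_gcp_safe(data_df, data_type, bucket) -> None:
    """Skip gracefully when no bucket is configured, and report upload failures."""
    if bucket is None:
        print(f"# {data_type}: no GCP bucket configured, skipping upload")
        return
    try:
        csv_buffer = StringIO()
        data_df.to_csv(csv_buffer, index=False)
        # Illustrative blob path, not necessarily the repo's convention
        blob = bucket.blob(f"data/{data_type}/{data_type}.csv")
        blob.upload_from_string(csv_buffer.getvalue(), content_type="text/csv")
    except Exception as exc:
        print(f"# {data_type}: GCP upload failed ({exc})")
```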
.dockerignore (1)
2-3: Check the correctness of directory paths.
Ignoring archive and the entire venv/ directory helps prevent large or sensitive files from being included in Docker images. Verify that these directory names match your actual directory structure, especially if you plan to rename or reorganize them in the future.
config/settings.yml (1)
96-110: Great addition of SOFR data mappings.
These new entries for SOFR data references appear consistent with the FRED data references. Ensure that each series can be fetched successfully from FRED by verifying the correct series IDs (e.g., "SOFR", "SOFR30DAYAVG", etc.).
main.py (2)
19-21: Foster consistent environment variable usage.
You're already using dotenv.load_dotenv(".env"). Double-check that all needed environment variables (like FRED_API_KEY, GIT_TOKEN) are present in .env or in your deployment environment.
46-49: 🧹 Nitpick (assertive)
Use an f-string for readability.
When printing or logging your bucket name, consider an f-string for greater readability (per static analysis hint).
-print('Retrieved GCP bucket: {}'.format(bucket))
+print(f"Retrieved GCP bucket: {bucket}")
Likely invalid or redundant comment.
def process_non_sofr_data(data_map_dict, fred, col_date, dataPath, SAVE_AS_PICKLE, PUSH_TO_GCP, bucket):
    audit_data = []
    for data_type, data_info in data_map_dict.items():
        if not data_type.startswith('SOFR'):
            data_df = fetch_and_process_data(fred, data_info, col_date)
            save_data(data_df, data_type, dataPath, SAVE_AS_PICKLE)
            audit_data.append(collect_audit_info(data_df, data_type, data_info['data_ref']))
            if PUSH_TO_GCP:
                upload_to_gcp(data_df, data_type, bucket, SAVE_AS_PICKLE)
            print(f"# {data_type}: Updated")
            time.sleep(1)
    return audit_data
🧹 Nitpick (assertive)
Recommend adding type annotations for function arguments and return types.
process_non_sofr_data is a public function; declaring the types of parameters like data_map_dict, fred, col_date, etc. would improve readability and maintainability. Additionally, clarify the return type (likely List[Dict[str, Any]] or similar).
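One possible fully annotated signature might look like the sketch below; the loose Any types for fred and bucket are an assumption made here so that fredapi and google-cloud-storage need not be imported just for typing:

```python
from pathlib import Path
from typing import Any

def process_non_sofr_data(
    data_map_dict: dict[str, dict[str, Any]],  # series name -> {'data_ref': ...}
    fred: Any,                                 # fredapi.Fred instance, typed loosely
    col_date: str,
    dataPath: Path,
    SAVE_AS_PICKLE: bool,
    PUSH_TO_GCP: bool,
    bucket: Any,                               # google.cloud.storage Bucket, typed loosely
) -> list[dict[str, Any]]:
    """Fetch, save, and audit all non-SOFR series; returns the audit rows."""
    ...  # body unchanged from the PR
```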
def process_sofr_data(sofr_series, fred, col_date, dataPath, SAVE_AS_PICKLE, PUSH_TO_GCP, bucket):
    sofr_data = pd.DataFrame()
    audit_data = []
    for series in sofr_series:
        data_df = fetch_and_process_data(fred, {'data_ref': series}, col_date)
        sofr_data = update_sofr_data(sofr_data, data_df, col_date, series)
        audit_data.append(collect_audit_info(data_df, series, series))
        print(f"# {series}: Updated")
        time.sleep(1)

    sofr_data = sofr_data.sort_values(col_date).ffill()
    save_combined_data(sofr_data, dataPath, "sofr/combined_sofr_data", SAVE_AS_PICKLE)
    if PUSH_TO_GCP:
        upload_combined_to_gcp(sofr_data, "sofr/sofr_data", bucket, SAVE_AS_PICKLE)
    return audit_data
🧹 Nitpick (assertive)
Use consistent naming and type annotations for public APIs.
As with process_non_sofr_data, adding type annotations for process_sofr_data parameters (e.g., sofr_series: List[str]) and specifying a return type is recommended. This practice ensures better clarity in large codebases and fosters easier collaboration.
def collect_audit_info(data_df, series_name, data_ref):
    last_date = data_df[data_df.columns[0]].max()
    last_value = data_df.loc[data_df[data_df.columns[0]] == last_date, data_ref].values[0]
    return {
        "Series Name": series_name,
        "Last Date": last_date,
        "Last Value": last_value,
        "Last Request Datetime": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    }
🧹 Nitpick (assertive)
Specify a timezone or clarify local time usage.
When calling datetime.now(), consider a timezone-aware approach (e.g., datetime.now(timezone.utc)) to avoid confusion in distributed systems. Additionally, a trailing comma after 'Last Request Datetime': ... can help comply with style guidelines.
def save_combined_data(data, dataPath, filename, SAVE_AS_PICKLE):
    file_path = dataPath / filename
    file_path.parent.mkdir(parents=True, exist_ok=True)
    data.to_csv(f"{file_path}.csv", index=False)
    if SAVE_AS_PICKLE:
        with open(f"{file_path}.pkl", 'wb') as f:
            pickle.dump(data, f)
🧹 Nitpick (assertive)
Add a docstring for save_combined_data to clarify usage.
This function saves a combined dataset to CSV or pickle formats. A short docstring describing the parameters (filename, SAVE_AS_PICKLE) and clarifying the default return of None can make the function's intent clearer.
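One way the docstring might read, keeping the PR's function body as-is (the wording is a suggestion, not the author's):

```python
import pickle
from pathlib import Path

def save_combined_data(data, dataPath: Path, filename: str, SAVE_AS_PICKLE: bool) -> None:
    """Save a combined dataset under dataPath as CSV, optionally also as pickle.

    data: the combined dataset (a pandas DataFrame in this project).
    dataPath: root data directory.
    filename: relative path without extension, e.g. "sofr/combined_sofr_data".
    SAVE_AS_PICKLE: when True, also write a .pkl copy next to the .csv.
    Returns None.
    """
    file_path = dataPath / filename
    file_path.parent.mkdir(parents=True, exist_ok=True)
    data.to_csv(f"{file_path}.csv", index=False)
    if SAVE_AS_PICKLE:
        with open(f"{file_path}.pkl", "wb") as f:
            pickle.dump(data, f)
```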
def upload_combined_to_gcp(data, filename, bucket, SAVE_AS_PICKLE):
    csv_buffer = StringIO()
    data.to_csv(csv_buffer, index=False)
    blob_name = f'{filename}.csv'
    blob = bucket.blob(blob_name)
    blob.upload_from_string(csv_buffer.getvalue(), content_type='text/csv')
    if SAVE_AS_PICKLE:
        pickle_buffer = pickle.dumps(data)
        pickle_blob_name = f'{filename}.pkl'
        pickle_blob = bucket.blob(pickle_blob_name)
        pickle_blob.upload_from_string(pickle_buffer, content_type='application/octet-stream')
🧹 Nitpick (assertive)
Add a docstring and return type annotation.
As with the other I/O functions, upload_combined_to_gcp does not return a value. Indicating a None return type and documenting the function helps maintain a consistent style.
def update_sofr_data(sofr_data, new_data, col_date, series):
    if sofr_data.empty:
        return new_data
    else:
        return pd.merge(sofr_data, new_data, on=col_date, how='outer')
🧹 Nitpick (assertive)
Remove unnecessary else
block and consider using series
parameter or removing it.
- The
else
block at line 79 is not strictly needed afterreturn new_data
. - The
series
argument is currently unused. Remove it or integrate it (e.g., renaming columns or storing metadata).
def update_sofr_data(sofr_data, new_data, col_date, series):
if sofr_data.empty:
return new_data
- else:
- return pd.merge(sofr_data, new_data, on=col_date, how='outer')
+ return pd.merge(sofr_data, new_data, on=col_date, how='outer')
Committable suggestion skipped: line range outside the PR's diff.
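Applying both points of the suggestion at once could look like the sketch below; note that dropping the unused series argument would also change the call site in process_sofr_data, so this is a possible refactor, not the PR's final code:

```python
import pandas as pd

def update_sofr_data(sofr_data: pd.DataFrame, new_data: pd.DataFrame, col_date: str) -> pd.DataFrame:
    """Merge a newly fetched SOFR series into the running combined frame."""
    if sofr_data.empty:
        return new_data
    # No `else` needed after the early return (Ruff RET505)
    return pd.merge(sofr_data, new_data, on=col_date, how="outer")
```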
from pathlib import Path
from github import Github
from git import Repo
from fredapi import Fred
import dotenv

from src.utils import load_config, get_gcp_bucket
🧹 Nitpick (assertive)
Consolidate imports if desired.
You’re importing from multiple libraries and modules. Consider grouping or sorting them if it helps maintain clarity. This is optional, as the current approach is still readable.
# Load configuration and setup
config = load_config(path=configPath / "settings.yml")
fred_api_key = os.environ.get("FRED_API_KEY")
fred = Fred(api_key=fred_api_key)

PUSH_TO_GITHUB = os.environ.get("PUSH_TO_GITHUB") != 'False'
PUSH_TO_GCP = os.environ.get("PUSH_TO_GCP") != 'False'
SAVE_AS_PICKLE = os.environ.get("SAVE_AS_PICKLE") != 'False'
🧹 Nitpick (assertive)
Handle missing or invalid environment variables gracefully.
If FRED_API_KEY or others are not set, the code may fail at runtime. Consider adding a short check that logs a warning or raises a descriptive error when critical variables are missing.
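A small fail-fast helper along those lines (the helper name is hypothetical, not from the PR):

```python
import os

def require_env(*names: str) -> dict:
    """Raise a descriptive error when any critical environment variable is unset."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise EnvironmentError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    return {name: os.environ[name] for name in names}
```

main.py could then call require_env("FRED_API_KEY", "GIT_TOKEN") before constructing Fred.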
def main():
    audit_data = []

    # Process non-SOFR data
    audit_data.extend(process_non_sofr_data(data_map_dict, fred, col_date, dataPath, SAVE_AS_PICKLE, PUSH_TO_GCP, bucket))

    # Process SOFR data
    sofr_series = ['SOFR', 'SOFR30DAYAVG', 'SOFR90DAYAVG', 'SOFR180DAYAVG', 'SOFRINDEX']
    audit_data.extend(process_sofr_data(sofr_series, fred, col_date, dataPath, SAVE_AS_PICKLE, PUSH_TO_GCP, bucket))

    # Create audit CSV
    audit_df = pd.DataFrame(audit_data)
    audit_df.to_csv("audit_trail.csv", index=False)
🧹 Nitpick (assertive)
Add a return type annotation to main().
This function does not return anything. Clarifying a None return type or including a docstring can help. Additionally, this block clearly orchestrates the data processing tasks, so a docstring describing those tasks is beneficial.
def push_to_github():
    repo_object = Repo('.')
    remote_url = f"https://{os.getenv('GIT_TOKEN')}@github.com/deerfieldgreen/FRED_data.git"
    # Make sure the remote URL is set correctly
    for remote in repo_object.remotes:
        remote.set_url(remote_url)

    git = repo_object.git
    git.add('--all')
    git.commit('-m', f"Updated Files for {datetime.today()}")
    git.pull('-s', 'ours')
    git.push()
    print("All changes pushed to GitHub in a single commit.")
🧹 Nitpick (assertive)
Add a return type annotation and potential error handling in push_to_github().
- The function appears to return None; clarifying that explicitly or in a docstring is beneficial.
- Consider catching exceptions from the git push process, as merge conflicts or credential issues can arise.
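A generic sketch of that error handling, shown with subprocess so it stays self-contained; the PR itself uses GitPython, whose commands raise git.exc.GitCommandError and can be wrapped the same way:

```python
import subprocess

def safe_git(*args: str) -> bool:
    """Run one git subcommand, reporting failure instead of letting it propagate."""
    try:
        subprocess.run(["git", *args], check=True, capture_output=True, text=True)
        return True
    except (subprocess.CalledProcessError, FileNotFoundError) as exc:
        # Covers bad subcommands, auth/merge failures, and a missing git binary
        print(f"git {' '.join(args)} failed: {exc}")
        return False
```

push_to_github could check the boolean result and skip the final success message on failure.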
Added SOFR datasets
Summary by CodeRabbit
- New Features
- Bug Fixes
- Refactor