14 adding monetary supply us + uk #17
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (9)
audit_trail.csv is excluded by !**/*.csv
data/M1SL/M1SL.pkl is excluded by !**/*.pkl
data/M1SL/data.csv is excluded by !**/*.csv
data/M2SL/M2SL.pkl is excluded by !**/*.pkl
data/M2SL/data.csv is excluded by !**/*.csv
data/dff/data.csv is excluded by !**/*.csv
data/dff/dff.pkl is excluded by !**/*.pkl
data/sofr/combined_sofr_data.csv is excluded by !**/*.csv
data/sofr/combined_sofr_data.pkl is excluded by !**/*.pkl
📒 Files selected for processing (2)
config/settings.yml (1 hunks)
src/data_operations.py (2 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
config/settings.yml
[error] 111-111: trailing spaces
(trailing-spaces)
🪛 Ruff (0.8.2)
src/data_operations.py
46-46: Missing return type annotation for public function process_m1m2_data
(ANN201)
46-46: Missing type annotation for function argument fred
(ANN001)
46-46: Missing type annotation for function argument col_date
(ANN001)
46-46: Missing type annotation for function argument dataPath
(ANN001)
46-46: Missing type annotation for function argument SAVE_AS_PICKLE
(ANN001)
46-46: Missing type annotation for function argument PUSH_TO_GCP
(ANN001)
46-46: Missing type annotation for function argument bucket
(ANN001)
72-72: datetime.datetime.now() called without a tz argument
(DTZ005)
72-72: Trailing comma missing
Add trailing comma
(COM812)
🔇 Additional comments (1)
src/data_operations.py (1)
11-19: Conditional check for enabled data series looks good
This change correctly utilizes the enabled key for conditional data processing.
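A minimal sketch of the behaviour being approved here, assuming each entry of the settings map is a plain dict. Because data_info.get('enabled', True) defaults to True, an entry without the key is still processed and only an explicit enabled: false skips it. The helper name enabled_series and the sample entries are hypothetical, not part of the PR.

```python
from typing import Any


def enabled_series(data_map: dict[str, dict[str, Any]]) -> list[str]:
    """Return the names of entries that are not explicitly disabled."""
    # A missing 'enabled' key defaults to True, matching data_info.get('enabled', True).
    return [name for name, info in data_map.items() if info.get("enabled", True)]


# Hypothetical settings, shaped like the entries in config/settings.yml.
sample_map = {
    "M1SL": {"data_ref": "M1SL", "data_source": "FRED"},
    "M2SL": {"data_ref": "M2SL", "data_source": "FRED", "enabled": False},
    "DFF": {"data_ref": "DFF", "data_source": "FRED", "enabled": True},
}

print(enabled_series(sample_map))
```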
src/data_operations.py
Outdated
# Handle single data reference
data_df = fetch_and_process_data(fred, data_info, col_date)
save_data(data_df, data_type, dataPath, SAVE_AS_PICKLE)
audit_data.append(collect_audit_info(data_df, data_type, data_info['data_ref']))
if PUSH_TO_GCP:
    upload_to_gcp(data_df, data_type, bucket, SAVE_AS_PICKLE)
🧹 Nitpick (assertive)
Consider error handling for single data references
When fetching and saving data, consider handling exceptions (e.g., connection failures, empty data) to enhance resilience.
src/data_operations.py
Outdated
def process_m1m2_data(fred, col_date, dataPath, SAVE_AS_PICKLE, PUSH_TO_GCP, bucket):
    series = ['M1SL', 'M2SL']
    combined_data = pd.DataFrame()

    for series_id in series:
        data = fred.get_series(series_id)
        df = pd.DataFrame(data, columns=[series_id])
        df.index.name = col_date
        df.reset_index(inplace=True)

        if combined_data.empty:
            combined_data = df
        else:
            combined_data = pd.merge(combined_data, df, on=col_date, how='outer')

    # Forward fill NA values
    combined_data = combined_data.ffill()

    save_combined_data(combined_data, dataPath, "m1m2/combined_m1m2_data", SAVE_AS_PICKLE)
    if PUSH_TO_GCP:
        upload_combined_to_gcp(combined_data, "m1m2/m1m2_data", bucket, SAVE_AS_PICKLE)

    return [{
        "Series Name": "M1M2_Combined",
        "Last Date": combined_data[col_date].max(),
        "Last Value": f"M1: {combined_data['M1SL'].iloc[-1]:.2f}, M2: {combined_data['M2SL'].iloc[-1]:.2f}",
        "Last Request Datetime": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    }]
🛠️ Refactor suggestion
Add type annotations and use a timezone-aware datetime
- Adding type hints for the function signature improves code clarity and maintainability.
- For reliable timestamping, use a timezone-aware datetime (e.g., datetime.now(timezone.utc)).
- A trailing comma at the end of the dictionary item on line 72 can improve readability.
 def process_m1m2_data(
-    fred, col_date, dataPath, SAVE_AS_PICKLE, PUSH_TO_GCP, bucket
+    fred: Fred,
+    col_date: str,
+    dataPath: Path,
+    SAVE_AS_PICKLE: bool,
+    PUSH_TO_GCP: bool,
+    bucket,  # add concrete type if available
 ) -> List[Dict[str, Any]]:
     series = ['M1SL', 'M2SL']
     ...
     return [{
         "Series Name": "M1M2_Combined",
         "Last Date": combined_data[col_date].max(),
         "Last Value": f"M1: {combined_data['M1SL'].iloc[-1]:.2f}, M2: {combined_data['M2SL'].iloc[-1]:.2f}",
-        "Last Request Datetime": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
+        "Last Request Datetime": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S"),
     }]
Committable suggestion skipped: line range outside the PR's diff.
why would the comma at the end improve readability?
@AmazingAbhi, a trailing comma in a function call, dictionary, or list can make the code more readable by clearly separating items on different lines. If you later add new fields or arguments, the diff tends to be cleaner because only the new line changes rather than also having to add a comma to the previously last line. This style can help keep version control diffs minimal and improve clarity when visually parsing multi-line data structures or function parameters.
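The diff argument above can be checked with a small sketch. It uses the standard library difflib module to count added and removed lines when a key is appended to a multi-line dict literal, with and without a trailing comma. The helper changed_lines and the toy snippets are illustrative assumptions, not code from this repository.

```python
import difflib


def changed_lines(before: list[str], after: list[str]) -> int:
    """Count the added or removed lines in a line diff of two snippets."""
    diff = difflib.ndiff(before, after)
    # ndiff also emits '? ' hint lines; only '+ ' and '- ' lines are real changes.
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))


# Without a trailing comma, appending a key also edits the previous last line.
no_comma_before = ["record = {", '    "a": 1,', '    "b": 2', "}"]
no_comma_after = ["record = {", '    "a": 1,', '    "b": 2,', '    "c": 3', "}"]

# With a trailing comma, appending a key touches only the new line.
comma_before = ["record = {", '    "a": 1,', '    "b": 2,', "}"]
comma_after = ["record = {", '    "a": 1,', '    "b": 2,', '    "c": 3,', "}"]

print(changed_lines(no_comma_before, no_comma_after))
print(changed_lines(comma_before, comma_after))
```

The first case reports one removed and two added lines, while the second reports a single added line, which is the smaller diff the reply describes.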
config/settings.yml
Outdated
M1SL:
  data_ref: "M1SL"
  data_source: "FRED"
M2SL:
  data_ref: "M2SL"
  data_source: "FRED"
🧹 Nitpick (assertive)
Remove trailing spaces for YAML compliance
YAML parsers can be sensitive to trailing spaces. Remove them to avoid lint errors.
-
+
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
M1SL:
  data_ref: "M1SL"
  data_source: "FRED"
M2SL:
  data_ref: "M2SL"
  data_source: "FRED"
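Since the suggested change differs from the original only in invisible whitespace, a small script can apply it more reliably than a manual edit. The following is a minimal sketch, assuming the file is read as one string; the function name strip_trailing_whitespace and the sample fragment are hypothetical.

```python
def strip_trailing_whitespace(text: str) -> str:
    """Remove trailing spaces and tabs from every line, keeping the line breaks."""
    cleaned = [line.rstrip(" \t") for line in text.split("\n")]
    return "\n".join(cleaned)


# Hypothetical fragment with trailing spaces after the first key and on a blank line.
raw = 'M1SL:  \n  data_ref: "M1SL"\n  \nM2SL:\n  data_ref: "M2SL"\n'
clean = strip_trailing_whitespace(raw)

print(repr(clean))
```

Leading indentation is left untouched, so the YAML structure is unchanged and only the characters flagged by the trailing-spaces rule are dropped.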
Actionable comments posted: 6
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (6)
audit_trail.csv is excluded by !**/*.csv
data/M1M2_combined/M1M2_combined.pkl is excluded by !**/*.pkl
data/M1M2_combined/data.csv is excluded by !**/*.csv
data/SOFR_combined/SOFR_combined.pkl is excluded by !**/*.pkl
data/SOFR_combined/data.csv is excluded by !**/*.csv
data/sofr/combined_sofr_data.pkl is excluded by !**/*.pkl
📒 Files selected for processing (4)
config/settings.yml (1 hunks)
main.py (3 hunks)
src/data_operations.py (2 hunks)
src/utils.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
src/utils.py
12-12: Missing return type annotation for public function setup_logging
(ANN201)
main.py
21-21: Missing return type annotation for public function main
Add return type annotation: None
(ANN201)
44-44: Use f-string instead of format call
Convert to f-string
(UP032)
44-44: Logging statement uses str.format
(G001)
src/data_operations.py
8-8: Missing return type annotation for public function process_data
(ANN201)
8-8: Missing type annotation for function argument data_map_dict
(ANN001)
8-8: Missing type annotation for function argument fred
(ANN001)
8-8: Missing type annotation for function argument col_date
(ANN001)
8-8: Missing type annotation for function argument dataPath
(ANN001)
8-8: Missing type annotation for function argument SAVE_AS_PICKLE
(ANN001)
8-8: Missing type annotation for function argument PUSH_TO_GCP
(ANN001)
8-8: Missing type annotation for function argument bucket
(ANN001)
20-20: Missing return type annotation for public function process_single_dataset
(ANN201)
20-20: Missing type annotation for function argument data_type
(ANN001)
20-20: Missing type annotation for function argument data_info
(ANN001)
20-20: Missing type annotation for function argument fred
(ANN001)
20-20: Missing type annotation for function argument col_date
(ANN001)
20-20: Missing type annotation for function argument dataPath
(ANN001)
20-20: Missing type annotation for function argument SAVE_AS_PICKLE
(ANN001)
20-20: Missing type annotation for function argument PUSH_TO_GCP
(ANN001)
20-20: Missing type annotation for function argument bucket
(ANN001)
27-27: Missing return type annotation for public function process_combined_dataset
(ANN201)
27-27: Missing type annotation for function argument data_type
(ANN001)
27-27: Missing type annotation for function argument data_info
(ANN001)
27-27: Missing type annotation for function argument fred
(ANN001)
27-27: Missing type annotation for function argument col_date
(ANN001)
27-27: Missing type annotation for function argument dataPath
(ANN001)
27-27: Missing type annotation for function argument SAVE_AS_PICKLE
(ANN001)
27-27: Missing type annotation for function argument PUSH_TO_GCP
(ANN001)
27-27: Missing type annotation for function argument bucket
(ANN001)
49-49: datetime.datetime.now() called without a tz argument
(DTZ005)
49-49: Trailing comma missing
Add trailing comma
(COM812)
🪛 yamllint (1.35.1)
config/settings.yml
[error] 10-10: trailing spaces
(trailing-spaces)
[error] 15-15: trailing spaces
(trailing-spaces)
[error] 21-21: trailing spaces
(trailing-spaces)
[error] 27-27: trailing spaces
(trailing-spaces)
[error] 32-32: trailing spaces
(trailing-spaces)
[error] 38-38: trailing spaces
(trailing-spaces)
[error] 43-43: trailing spaces
(trailing-spaces)
[error] 48-48: trailing spaces
(trailing-spaces)
[error] 49-49: trailing spaces
(trailing-spaces)
[error] 54-54: trailing spaces
(trailing-spaces)
[error] 59-59: trailing spaces
(trailing-spaces)
[error] 60-60: trailing spaces
(trailing-spaces)
[error] 65-65: trailing spaces
(trailing-spaces)
[error] 70-70: trailing spaces
(trailing-spaces)
[error] 76-76: trailing spaces
(trailing-spaces)
[error] 82-82: trailing spaces
(trailing-spaces)
[error] 87-87: trailing spaces
(trailing-spaces)
[error] 104-104: trailing spaces
(trailing-spaces)
[error] 125-125: trailing spaces
(trailing-spaces)
🔇 Additional comments (4)
src/data_operations.py (4)
20-25: Duplicate Comment: Consider error handling for single data references.
When fetching and saving data, handling exceptions (e.g., connection failures, empty data) can enhance resilience.
20-20: 🧹 Nitpick (assertive)
Add type annotations for process_single_dataset.
Similar to process_data, consider adding type hints for parameters and return types.
-def process_single_dataset(data_type, data_info, fred, col_date, dataPath, SAVE_AS_PICKLE, PUSH_TO_GCP, bucket):
+from typing import Dict, Any, List
+def process_single_dataset(
+    data_type: str,
+    data_info: Dict[str, Any],
+    fred: Fred,
+    col_date: str,
+    dataPath: Path,
+    SAVE_AS_PICKLE: bool,
+    PUSH_TO_GCP: bool,
+    bucket
+) -> List[Dict[str, Any]]:
Likely invalid or redundant comment.
8-8: 🧹 Nitpick (assertive)
Add type annotations to improve clarity and maintainability.
The function process_data lacks return type and parameter type hints. Consider adding them for better self-documentation:
-def process_data(data_map_dict, fred, col_date, dataPath, SAVE_AS_PICKLE, PUSH_TO_GCP, bucket):
+from typing import Dict, Any, List
+def process_data(
+    data_map_dict: Dict[str, Any],
+    fred: Fred,
+    col_date: str,
+    dataPath: Path,
+    SAVE_AS_PICKLE: bool,
+    PUSH_TO_GCP: bool,
+    bucket
+) -> List[Dict[str, Any]]:
Likely invalid or redundant comment.
49-49: 🧹 Nitpick (assertive)
Use a timezone-aware datetime and add a trailing comma.
To ensure consistent timestamps, especially in distributed systems, it’s best to use a timezone, e.g. UTC. Also, add a trailing comma to comply with linting recommendations.
- "Last Request Datetime": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
+ from datetime import datetime, timezone
+ "Last Request Datetime": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S"),
Likely invalid or redundant comment.
dataPath = projectPath / 'data'
configPath = projectPath / 'config'
credsPath = projectPath / 'credentials'
def main():
🧹 Nitpick (assertive)
Add return type annotation for main.
It is recommended to explicitly declare the return type for public functions. For example:
- def main():
+ def main() -> None:
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Actionable comments posted: 5
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
src/data_operations.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
src/data_operations.py
8-8: Missing return type annotation for public function process_data
(ANN201)
8-8: Missing type annotation for function argument data_map_dict
(ANN001)
8-8: Missing type annotation for function argument fred
(ANN001)
8-8: Missing type annotation for function argument col_date
(ANN001)
8-8: Missing type annotation for function argument dataPath
(ANN001)
8-8: Missing type annotation for function argument SAVE_AS_PICKLE
(ANN001)
8-8: Missing type annotation for function argument PUSH_TO_GCP
(ANN001)
8-8: Missing type annotation for function argument bucket
(ANN001)
20-20: Missing return type annotation for public function process_single_dataset
(ANN201)
20-20: Missing type annotation for function argument data_type
(ANN001)
20-20: Missing type annotation for function argument data_info
(ANN001)
20-20: Missing type annotation for function argument fred
(ANN001)
20-20: Missing type annotation for function argument col_date
(ANN001)
20-20: Missing type annotation for function argument dataPath
(ANN001)
20-20: Missing type annotation for function argument SAVE_AS_PICKLE
(ANN001)
20-20: Missing type annotation for function argument PUSH_TO_GCP
(ANN001)
20-20: Missing type annotation for function argument bucket
(ANN001)
27-27: Module level import not at top of file
(E402)
27-27: typing.Dict is deprecated, use dict instead
(UP035)
27-27: typing.List is deprecated, use list instead
(UP035)
30-30: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
33-33: Undefined name Path
(F821)
34-34: Boolean-typed positional argument in function definition
(FBT001)
35-35: Boolean-typed positional argument in function definition
(FBT001)
36-36: Missing type annotation for function argument bucket
(ANN001)
36-36: Trailing comma missing
Add trailing comma
(COM812)
37-37: Use list instead of List for type annotation
Replace with list
(UP006)
37-37: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
59-59: datetime.datetime.now() called without a tz argument
(DTZ005)
59-59: Trailing comma missing
Add trailing comma
(COM812)
🔇 Additional comments (1)
src/data_operations.py (1)
98-98: Consider adding error handling for GCP upload.
Network or permission failures during blob upload can cause runtime exceptions. A try-except or a robust retry approach will enhance reliability.
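A minimal sketch of the retry approach mentioned here, written with the standard library only so it runs without cloud credentials. The helper call_with_retry and its parameters are assumptions for illustration; it is not the google-cloud-storage API or a retry library, and in practice the action passed in would wrap the real upload call.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action, retrying on any exception with exponentially growing delays."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("attempts must be at least 1")
```

Passing the sleep function as a parameter is a design choice that keeps the helper testable: a test can record the requested delays instead of actually waiting.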
 def process_data(data_map_dict, fred, col_date, dataPath, SAVE_AS_PICKLE, PUSH_TO_GCP, bucket):
     audit_data = []
     for data_type, data_info in data_map_dict.items():
-        if not data_type.startswith('SOFR'):
-            data_df = fetch_and_process_data(fred, data_info, col_date)
-            save_data(data_df, data_type, dataPath, SAVE_AS_PICKLE)
-            audit_data.append(collect_audit_info(data_df, data_type, data_info['data_ref']))
-            if PUSH_TO_GCP:
-                upload_to_gcp(data_df, data_type, bucket, SAVE_AS_PICKLE)
+        if data_info.get('enabled', True):
+            if isinstance(data_info['data_ref'], list):
+                audit_data.extend(process_combined_dataset(data_type, data_info, fred, col_date, dataPath, SAVE_AS_PICKLE, PUSH_TO_GCP, bucket))
+            else:
+                audit_data.extend(process_single_dataset(data_type, data_info, fred, col_date, dataPath, SAVE_AS_PICKLE, PUSH_TO_GCP, bucket))
🧹 Nitpick (assertive)
Add missing type annotations and handle potential data_ref key errors.
The function process_data lacks type hints for parameters (data_map_dict, fred, col_date, dataPath, SAVE_AS_PICKLE, PUSH_TO_GCP, bucket) and a return type annotation. Consider adding them for clarity. Also, referencing data_info['data_ref'] directly may raise a KeyError if data_ref is absent in data_info; you may want to validate the key’s existence beforehand.
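One way to validate the key up front is to check the whole settings map before any data is fetched. The sketch below assumes the map is a dict of dicts in which data_ref is either a string or a list of series IDs; the function name validate_data_map and the message wording are hypothetical.

```python
from typing import Any


def validate_data_map(data_map: dict[str, dict[str, Any]]) -> list[str]:
    """Return human-readable problems; an empty list means the map is usable."""
    problems = []
    for name, info in data_map.items():
        ref = info.get("data_ref")
        if ref is None:
            problems.append(f"{name}: missing 'data_ref'")
        elif not isinstance(ref, (str, list)):
            problems.append(f"{name}: 'data_ref' must be a string or a list")
        elif isinstance(ref, list) and not ref:
            problems.append(f"{name}: 'data_ref' list is empty")
    return problems
```

Collecting all problems rather than raising on the first one lets a single run report every misconfigured entry.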
def process_single_dataset(data_type, data_info, fred, col_date, dataPath, SAVE_AS_PICKLE, PUSH_TO_GCP, bucket):
    data_df = fetch_and_process_data(fred, data_info, col_date)
    save_data(data_df, data_type, dataPath, SAVE_AS_PICKLE)
    if PUSH_TO_GCP:
        upload_to_gcp(data_df, data_type, bucket, SAVE_AS_PICKLE)
    return [collect_audit_info(data_df, data_type, data_info['data_ref'])]
🧹 Nitpick (assertive)
Consider exception handling in process_single_dataset.
Currently, if fetch_and_process_data raises an exception (e.g., due to network, empty data, or FRED-specific issues), it might bubble up unhandled. Add try-except blocks or other resilience measures to ensure the process continues gracefully in error scenarios.
Additionally, add missing type annotations to align with best practices and maintain consistency throughout the codebase.
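A minimal sketch of what continuing gracefully could look like at the loop level, assuming a per-dataset callable that returns a list of audit rows. The names process_all and process_one are hypothetical stand-ins and do not reproduce the functions in src/data_operations.py.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def process_all(
    data_map: dict[str, dict[str, Any]],
    process_one: Callable[[str, dict[str, Any]], list[dict[str, Any]]],
) -> tuple[list[dict[str, Any]], list[str]]:
    """Run process_one for every entry, collecting audit rows and failed names."""
    audit_rows: list[dict[str, Any]] = []
    failed: list[str] = []
    for name, info in data_map.items():
        try:
            audit_rows.extend(process_one(name, info))
        except Exception:
            # logger.exception records the traceback along with the message.
            logger.exception("Failed to process dataset %s", name)
            failed.append(name)
    return audit_rows, failed
```

Returning the failed names alongside the audit rows keeps failures visible to the caller, so one bad series does not silently disappear from the audit trail.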
src/data_operations.py
Outdated
sofr_data = sofr_data.sort_values(col_date).ffill()
save_combined_data(sofr_data, dataPath, "sofr/combined_sofr_data", SAVE_AS_PICKLE)
from typing import Dict, Any, List
🧹 Nitpick (assertive)
Re-locate and update import statements.
Placing from typing import Dict, Any, List mid-file violates the recommended top-level import structure and references deprecated generics. If you are using Python 3.9+, consider the built-in dict and list generics instead of Dict and List; Any still has to be imported from typing.
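A short sketch of the style this comment points to, assuming Python 3.9 or later: imports sit at the top of the module, dict and list are used directly as generics, and only Any comes from typing. The function summarize and its body are placeholders for illustration, not code from the PR.

```python
from pathlib import Path
from typing import Any


def summarize(
    data_info: dict[str, Any],
    data_path: Path,
    save_as_pickle: bool,
) -> list[dict[str, Any]]:
    """Build one audit-style row; the body is a placeholder for illustration."""
    suffix = ".pkl" if save_as_pickle else ".csv"
    target = data_path / f"{data_info['data_ref']}{suffix}"
    return [{"Series Name": data_info["data_ref"], "Target": str(target)}]


rows = summarize({"data_ref": "M1SL"}, Path("data"), False)
print(rows)
```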
"Series Name": f"{data_type}_Combined",
"Last Date": combined_data[col_date].max(),
"Last Value": ", ".join([f"{series}: {combined_data[series].iloc[-1]:.2f}" for series in data_info['data_ref']]),
"Last Request Datetime": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
🧹 Nitpick (assertive)
Use timezone-aware datetimes.
datetime.now() produces a naive datetime object. Consider adding a timezone argument (datetime.now(timezone.utc)) to promote consistency when storing and comparing timestamps across different environments.
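The difference is easy to see in a few lines. This sketch uses only the standard library; appending %z to the audit record's format string is an optional assumption that makes the offset explicit in the stored text.

```python
from datetime import datetime, timezone

naive = datetime.now()              # tzinfo is None; meaning depends on the host's local zone
aware = datetime.now(timezone.utc)  # tzinfo is UTC on every host

# Same format string as the audit record, with an explicit offset appended.
stamp = aware.strftime("%Y-%m-%d %H:%M:%S%z")

print(naive.tzinfo, aware.tzinfo, stamp)
```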
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
…m/deerfieldgreen/FRED_data into 14-adding-monetary-supply-us-+-uk
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
config/settings.yml (1 hunks)
src/data_operations.py (2 hunks)
src/utils.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
src/data_operations.py
10-10: typing.Dict is deprecated, use dict instead
(UP035)
10-10: typing.List is deprecated, use list instead
(UP035)
13-13: Missing return type annotation for public function process_data
(ANN201)
13-13: Missing type annotation for function argument data_map_dict
(ANN001)
13-13: Missing type annotation for function argument fred
(ANN001)
13-13: Missing type annotation for function argument col_date
(ANN001)
13-13: Missing type annotation for function argument dataPath
(ANN001)
13-13: Missing type annotation for function argument SAVE_AS_PICKLE
(ANN001)
13-13: Missing type annotation for function argument PUSH_TO_GCP
(ANN001)
13-13: Missing type annotation for function argument bucket
(ANN001)
25-25: Missing return type annotation for public function process_single_dataset
(ANN201)
25-25: Missing type annotation for function argument data_type
(ANN001)
25-25: Missing type annotation for function argument data_info
(ANN001)
25-25: Missing type annotation for function argument fred
(ANN001)
25-25: Missing type annotation for function argument col_date
(ANN001)
25-25: Missing type annotation for function argument dataPath
(ANN001)
25-25: Missing type annotation for function argument SAVE_AS_PICKLE
(ANN001)
25-25: Missing type annotation for function argument PUSH_TO_GCP
(ANN001)
25-25: Missing type annotation for function argument bucket
(ANN001)
34-34: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
35-35: Undefined name Fred
(F821)
37-37: Undefined name Path
(F821)
38-38: Boolean-typed positional argument in function definition
(FBT001)
39-39: Boolean-typed positional argument in function definition
(FBT001)
40-40: Missing type annotation for function argument bucket
(ANN001)
40-40: Trailing comma missing
Add trailing comma
(COM812)
41-41: Use list instead of List for type annotation
Replace with list
(UP006)
41-41: Use dict instead of Dict for type annotation
Replace with dict
(UP006)
63-63: datetime.datetime.now() called without a tz argument
(DTZ005)
63-63: Trailing comma missing
Add trailing comma
(COM812)
94-94: Missing return type annotation for public function upload_to_gcp
Add return type annotation: None
(ANN201)
94-94: Missing type annotation for function argument data_df
(ANN001)
94-94: Missing type annotation for function argument data_type
(ANN001)
94-94: Missing type annotation for function argument bucket
(ANN001)
94-94: Missing type annotation for function argument SAVE_AS_PICKLE
(ANN001)
104-104: Logging statement uses f-string
(G004)
112-112: Logging statement uses f-string
(G004)
115-115: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
115-115: Logging statement uses f-string
(G004)
115-115: Use explicit conversion flag
Replace with conversion flag
(RUF010)
119-119: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
119-119: Logging statement uses f-string
(G004)
119-119: Use explicit conversion flag
Replace with conversion flag
(RUF010)
🔇 Additional comments (2)
config/settings.yml (1)
4-119: LGTM! Well-structured configuration changes.
The changes improve the configuration by:
- Adding control flags for selective processing
- Consolidating related data series (SOFR, M1M2)
- Maintaining consistent structure across entries
src/data_operations.py (1)
25-30: Add error handling and type hints.
The function needs error handling and type annotations:
+from typing import Any
+from fredapi import Fred
+from pathlib import Path
-def process_single_dataset(data_type, data_info, fred, col_date, dataPath, SAVE_AS_PICKLE, PUSH_TO_GCP, bucket):
+def process_single_dataset(
+    data_type: str,
+    data_info: dict[str, Any],
+    fred: Fred,
+    col_date: str,
+    dataPath: Path,
+    SAVE_AS_PICKLE: bool,
+    PUSH_TO_GCP: bool,
+    bucket: Any
+) -> list[dict[str, Any]]:
+    try:
         data_df = fetch_and_process_data(fred, data_info, col_date)
         save_data(data_df, data_type, dataPath, SAVE_AS_PICKLE)
         if PUSH_TO_GCP:
             upload_to_gcp(data_df, data_type, bucket, SAVE_AS_PICKLE)
         return [collect_audit_info(data_df, data_type, data_info['data_ref'])]
+    except Exception as e:
+        logging.error(f"Failed to process dataset {data_type}: {str(e)}")
+        return []
Likely invalid or redundant comment.
def setup_logging() -> logging.Logger:
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s - %(levelname)s - %(message)s',
                        datefmt='%Y-%m-%d %H:%M:%S')
    return logging.getLogger(__name__)
🧹 Nitpick (assertive)
Consider enhancing logging configuration robustness.
The logging setup looks good, but consider these improvements:
- Add force=True to basicConfig to ensure the configuration is applied even if logging was previously configured
- Consider adding a file handler for persistent logging
 def setup_logging() -> logging.Logger:
-    logging.basicConfig(level=logging.INFO,
-                        format='%(asctime)s - %(levelname)s - %(message)s',
-                        datefmt='%Y-%m-%d %H:%M:%S')
+    logging.basicConfig(
+        level=logging.INFO,
+        format='%(asctime)s - %(levelname)s - %(message)s',
+        datefmt='%Y-%m-%d %H:%M:%S',
+        force=True
+    )
     return logging.getLogger(__name__)
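The file-handler idea mentioned in this comment is not shown in the diff. A minimal sketch of how it might look, assuming a hypothetical optional `log_file` parameter that is not part of the PR (`force=True` requires Python 3.8 or later):

```python
import logging
from pathlib import Path
from typing import Optional


def setup_logging(log_file: Optional[Path] = None) -> logging.Logger:
    """Configure root logging; `log_file` is a hypothetical, optional addition."""
    handlers = [logging.StreamHandler()]  # keep console output
    if log_file is not None:
        # Create the log directory if needed, then also write records to disk.
        log_file.parent.mkdir(parents=True, exist_ok=True)
        handlers.append(logging.FileHandler(log_file, encoding="utf-8"))
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s - %(levelname)s - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
        handlers=handlers,
        force=True,  # replace any handlers installed by an earlier basicConfig call
    )
    return logging.getLogger(__name__)
```

Calling `setup_logging(Path("logs/run.log"))` would then write each record to both the console and the file; calling it with no argument keeps the console-only behavior.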
@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
def upload_to_gcp(data_df, data_type, bucket, SAVE_AS_PICKLE):
    csv_buffer = StringIO()
    data_df.to_csv(csv_buffer, index=False)
    blob_name = f'{data_type.lower()}.csv'
    blob = bucket.blob(blob_name)
    blob.upload_from_string(csv_buffer.getvalue(), content_type='text/csv')
    if SAVE_AS_PICKLE:
        pickle_buffer = pickle.dumps(data_df)
        pickle_blob_name = f'{data_type.lower()}.pkl'
        pickle_blob = bucket.blob(pickle_blob_name)
        pickle_blob.upload_from_string(pickle_buffer, content_type='application/octet-stream')
    logger = logging.getLogger(__name__)

def update_sofr_data(sofr_data, new_data, col_date, series):
    if sofr_data.empty:
        return new_data
    else:
        return pd.merge(sofr_data, new_data, on=col_date, how='outer')
    try:
        # Upload CSV
        csv_buffer = StringIO()
        data_df.to_csv(csv_buffer, index=False)
        blob_name = f'{data_type.lower()}.csv'
        blob = bucket.blob(blob_name)
        blob.upload_from_string(csv_buffer.getvalue(), content_type='text/csv')
        logger.info(f"Successfully uploaded {blob_name} to GCP")

def save_combined_data(data, dataPath, filename, SAVE_AS_PICKLE):
    file_path = dataPath / filename
    file_path.parent.mkdir(parents=True, exist_ok=True)
    data.to_csv(f"{file_path}.csv", index=False)
    if SAVE_AS_PICKLE:
        with open(f"{file_path}.pkl", 'wb') as f:
            pickle.dump(data, f)
        # Upload pickle if enabled
        if SAVE_AS_PICKLE:
            pickle_buffer = pickle.dumps(data_df)
            pickle_blob_name = f'{data_type.lower()}.pkl'
            pickle_blob = bucket.blob(pickle_blob_name)
            pickle_blob.upload_from_string(pickle_buffer, content_type='application/octet-stream')
            logger.info(f"Successfully uploaded {pickle_blob_name} to GCP")

def upload_combined_to_gcp(data, filename, bucket, SAVE_AS_PICKLE):
    csv_buffer = StringIO()
    data.to_csv(csv_buffer, index=False)
    blob_name = f'{filename}.csv'
    blob = bucket.blob(blob_name)
    blob.upload_from_string(csv_buffer.getvalue(), content_type='text/csv')
    if SAVE_AS_PICKLE:
        pickle_buffer = pickle.dumps(data)
        pickle_blob_name = f'{filename}.pkl'
        pickle_blob = bucket.blob(pickle_blob_name)
        pickle_blob.upload_from_string(pickle_buffer, content_type='application/octet-stream')
    except exceptions.GoogleCloudError as e:
        logger.error(f"GCP upload failed for {data_type}: {str(e)}")
        raise  # Re-raise the exception to trigger a retry

    except Exception as e:
        logger.error(f"Unexpected error during GCP upload for {data_type}: {str(e)}")
        raise  # Re-raise the exception to trigger a retry
🧹 Nitpick (assertive)
Enhance logging messages and add type hints.
The retry mechanism and error handling look good. Consider these improvements:
- Add type hints
- Use logging.exception for better stack traces
- Add more context to success messages
+from pandas import DataFrame
+from google.cloud.storage import Bucket
 @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
-def upload_to_gcp(data_df, data_type, bucket, SAVE_AS_PICKLE):
+def upload_to_gcp(data_df: DataFrame, data_type: str, bucket: Bucket, SAVE_AS_PICKLE: bool) -> None:
     logger = logging.getLogger(__name__)
     try:
         # Upload CSV
         csv_buffer = StringIO()
         data_df.to_csv(csv_buffer, index=False)
         blob_name = f'{data_type.lower()}.csv'
         blob = bucket.blob(blob_name)
         blob.upload_from_string(csv_buffer.getvalue(), content_type='text/csv')
-        logger.info(f"Successfully uploaded {blob_name} to GCP")
+        logger.info("Successfully uploaded %s to GCP (size: %d bytes)", blob_name, len(csv_buffer.getvalue()))
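The `logging.exception` point from the list above is not covered by the diff. A minimal sketch of the pattern, where `risky_upload` and `upload_with_logging` are hypothetical stand-ins rather than functions from the PR:

```python
import logging

logger = logging.getLogger("upload_example")


def risky_upload(name: str) -> None:
    """Hypothetical stand-in for an upload step; it always fails here."""
    raise ConnectionError(f"could not reach bucket for {name}")


def upload_with_logging(name: str) -> bool:
    try:
        risky_upload(name)
    except ConnectionError:
        # logger.exception logs at ERROR level and appends the active traceback,
        # so str(e) no longer needs to be interpolated by hand. The %s argument
        # is formatted lazily, which also addresses Ruff's G004 finding.
        logger.exception("GCP upload failed for %s", name)
        return False
    return True
```

In the retry-decorated function from the PR, the `except` block would presumably still end with `raise` so that a retry is triggered; the `return False` here only keeps the sketch self-contained.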
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.8.2)
94-94: Missing return type annotation for public function upload_to_gcp. Add return type annotation: None (ANN201)
94-94: Missing type annotation for function argument data_df (ANN001)
94-94: Missing type annotation for function argument data_type (ANN001)
94-94: Missing type annotation for function argument bucket (ANN001)
94-94: Missing type annotation for function argument SAVE_AS_PICKLE (ANN001)
104-104: Logging statement uses f-string (G004)
112-112: Logging statement uses f-string (G004)
115-115: Use logging.exception instead of logging.error. Replace with exception (TRY400)
115-115: Logging statement uses f-string (G004)
115-115: Use explicit conversion flag. Replace with conversion flag (RUF010)
119-119: Use logging.exception instead of logging.error. Replace with exception (TRY400)
119-119: Logging statement uses f-string (G004)
119-119: Use explicit conversion flag. Replace with conversion flag (RUF010)
def process_data(data_map_dict, fred, col_date, dataPath, SAVE_AS_PICKLE, PUSH_TO_GCP, bucket):
    audit_data = []
    for data_type, data_info in data_map_dict.items():
        if not data_type.startswith('SOFR'):
            data_df = fetch_and_process_data(fred, data_info, col_date)
            save_data(data_df, data_type, dataPath, SAVE_AS_PICKLE)
            audit_data.append(collect_audit_info(data_df, data_type, data_info['data_ref']))
            if PUSH_TO_GCP:
                upload_to_gcp(data_df, data_type, bucket, SAVE_AS_PICKLE)
        if data_info.get('enabled', True):
            if isinstance(data_info['data_ref'], list):
                audit_data.extend(process_combined_dataset(data_type, data_info, fred, col_date, dataPath, SAVE_AS_PICKLE, PUSH_TO_GCP, bucket))
            else:
                audit_data.extend(process_single_dataset(data_type, data_info, fred, col_date, dataPath, SAVE_AS_PICKLE, PUSH_TO_GCP, bucket))
            print(f"# {data_type}: Updated")
            time.sleep(1)
    return audit_data
🛠️ Refactor suggestion
Add type hints and consider configuration improvements.
The function logic looks good but needs some enhancements:
+from typing import Any
+from fredapi import Fred
+from pathlib import Path
-def process_data(data_map_dict, fred, col_date, dataPath, SAVE_AS_PICKLE, PUSH_TO_GCP, bucket):
+def process_data(
+    data_map_dict: dict[str, dict[str, Any]],
+    fred: Fred,
+    col_date: str,
+    dataPath: Path,
+    SAVE_AS_PICKLE: bool,
+    PUSH_TO_GCP: bool,
+    bucket: Any
+) -> list[dict[str, Any]]:
Also consider:
- Making the sleep duration configurable
- Defaulting enabled to False for safer behavior
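Both considerations can be sketched together. The following is only an illustration under stated assumptions: `process_enabled`, `handler`, and `pause_seconds` are hypothetical names, and the real `process_data` takes more arguments than shown here.

```python
import time
from typing import Any, Callable


def process_enabled(
    data_map: dict[str, dict[str, Any]],
    handler: Callable[[str, dict[str, Any]], None],
    pause_seconds: float = 1.0,
) -> list[str]:
    """Run `handler` on opted-in entries, pausing between requests."""
    processed = []
    for name, info in data_map.items():
        # Opt-in default: an entry without an explicit "enabled": True is skipped.
        if not info.get("enabled", False):
            continue
        handler(name, info)
        processed.append(name)
        if pause_seconds > 0:
            time.sleep(pause_seconds)  # configurable replacement for time.sleep(1)
    return processed
```

With an opt-in default, a newly added entry that lacks the flag is not fetched until someone enables it; the trade-off is that every existing entry in the data map needs `enabled` set explicitly before this change lands.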
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.8.2)
13-13: Missing return type annotation for public function process_data (ANN201)
13-13: Missing type annotation for function argument data_map_dict (ANN001)
13-13: Missing type annotation for function argument fred (ANN001)
13-13: Missing type annotation for function argument col_date (ANN001)
13-13: Missing type annotation for function argument dataPath (ANN001)
13-13: Missing type annotation for function argument SAVE_AS_PICKLE (ANN001)
13-13: Missing type annotation for function argument PUSH_TO_GCP (ANN001)
13-13: Missing type annotation for function argument bucket (ANN001)
def process_combined_dataset(
    data_type: str,
    data_info: Dict[str, Any],
    fred: Fred,
    col_date: str,
    dataPath: Path,
    SAVE_AS_PICKLE: bool,
    PUSH_TO_GCP: bool,
    bucket
) -> List[Dict[str, Any]]:
    combined_data = pd.DataFrame()
    for series_id in data_info['data_ref']:
        data = fred.get_series(series_id)
        df = pd.DataFrame(data, columns=[series_id])
        df.index.name = col_date
        df.reset_index(inplace=True)

        if combined_data.empty:
            combined_data = df
        else:
            combined_data = pd.merge(combined_data, df, on=col_date, how='outer')

    combined_data = combined_data.ffill()
    save_data(combined_data, f"{data_type}_combined", dataPath, SAVE_AS_PICKLE)
    if PUSH_TO_GCP:
        upload_combined_to_gcp(sofr_data, "sofr/sofr_data", bucket, SAVE_AS_PICKLE)
    return audit_data
        upload_to_gcp(combined_data, f"{data_type}_combined", bucket, SAVE_AS_PICKLE)

    return [{
        "Series Name": f"{data_type}_Combined",
        "Last Date": combined_data[col_date].max(),
        "Last Value": ", ".join([f"{series}: {combined_data[series].iloc[-1]:.2f}" for series in data_info['data_ref']]),
        "Last Request Datetime": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    }]
Add data validation and error handling.
The function needs several improvements:
- Validate data_ref before processing
- Add error handling for FRED API calls
- Consider warning when ffill creates artificial data
- Update type hints to use built-in types
 def process_combined_dataset(
     data_type: str,
-    data_info: Dict[str, Any],
+    data_info: dict[str, Any],
     fred: Fred,
     col_date: str,
     dataPath: Path,
     SAVE_AS_PICKLE: bool,
     PUSH_TO_GCP: bool,
-    bucket
-) -> List[Dict[str, Any]]:
+    bucket: Any,
+) -> list[dict[str, Any]]:
+    if not data_info.get('data_ref') or not isinstance(data_info['data_ref'], list):
+        logging.error(f"Invalid data_ref for {data_type}")
+        return []
+
     combined_data = pd.DataFrame()
+    missing_data = False
     for series_id in data_info['data_ref']:
-        data = fred.get_series(series_id)
+        try:
+            data = fred.get_series(series_id)
+            if data.empty:
+                logging.warning(f"No data returned for {series_id}")
+                continue
+        except Exception as e:
+            logging.error(f"Failed to fetch {series_id}: {str(e)}")
+            continue
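The ffill warning from the list above does not appear in the diff. A rough sketch of the idea, using plain lists as a stand-in for a DataFrame column so that it runs without pandas; `forward_fill` is a hypothetical helper, not code from the PR:

```python
import logging
from typing import Optional

logger = logging.getLogger("ffill_example")


def forward_fill(values: list[Optional[float]], name: str) -> list[Optional[float]]:
    """Forward-fill None gaps and warn about how many values were synthesized."""
    filled: list[Optional[float]] = []
    last: Optional[float] = None
    n_filled = 0
    for value in values:
        if value is None and last is not None:
            filled.append(last)  # carry the previous observation forward
            n_filled += 1
        else:
            filled.append(value)  # real observation, or a leading gap left as None
            if value is not None:
                last = value
    if n_filled:
        logger.warning("%s: forward-filled %d missing value(s)", name, n_filled)
    return filled
```

In the pandas version, a comparable check could count `combined_data[series_id].isna().sum()` for each series before calling `combined_data.ffill()` and log a warning when the count is non-zero. That count would also include leading gaps, which `ffill` leaves unfilled.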
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.8.2)
34-34: Use dict instead of Dict for type annotation. Replace with dict (UP006)
35-35: Undefined name Fred (F821)
37-37: Undefined name Path (F821)
38-38: Boolean-typed positional argument in function definition (FBT001)
39-39: Boolean-typed positional argument in function definition (FBT001)
40-40: Missing type annotation for function argument bucket (ANN001)
40-40: Trailing comma missing. Add trailing comma (COM812)
41-41: Use list instead of List for type annotation. Replace with list (UP006)
41-41: Use dict instead of Dict for type annotation. Replace with dict (UP006)
63-63: datetime.datetime.now() called without a tz argument (DTZ005)
63-63: Trailing comma missing. Add trailing comma (COM812)
Summary by CodeRabbit
New Features
Improvements
enabled field for various data entries.