-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Document360Reader. Contributed by the PLACE team. #16305
base: main
Are you sure you want to change the base?
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
def __init__( | ||
self, | ||
api_key: str, | ||
should_process_project_version=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add types to these arguments? I have no idea what I should be passing for these items
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
@@ -0,0 +1,135 @@ | |||
class Article: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its probably best to make objects like these a pydantic model
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And then with that, no need for all the getters and setters
class Article(BaseModel):
id: str
title: str
...
If things can be None
, then we should also use Optional[str]
typing for example
And lastly, we can transform this into a dict easily with .model_dump()
(useful for attaching as metadata to document objects)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the fields in the entities are marked as Optional. This is because the actual API responses from Document360 sometimes do not match the expected schema mentioned in the API documentation.
(also mentioned it in the README file)
|
||
[tool.poetry.dependencies] | ||
python = ">=3.8.1,<4.0" | ||
llama-index-core = "^0.10.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should be ^0.11.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
Thanks for the suggestions @logan-markewich! Updated the MR and tested locally. Everything works fine. |
Description
The
Document360Reader
class is a custom reader developed by the PLACE team that interacts with the Document360 API to fetch articles.It processes articles recursively and allows further handling via custom callback functions, while also handling rate limiting and errors.
Fixes # (issue)
New Package?
Did I fill in the
tool.llamahub
section in thepyproject.toml
and provide a detailed README.md for my new integration or package?Version Bump?
Did I bump the version in the
pyproject.toml
file of the package I am updating? (Except for thellama-index-core
package)(It's a new package)
Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration
Suggested Checklist:
make format; make lint
to appease the lint gods