Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define architecture #38

Closed
nahu opened this issue May 8, 2023 · 6 comments
Closed

Define architecture #38

nahu opened this issue May 8, 2023 · 6 comments
Assignees
Labels
topic: documentation Improvements or additions to documentation

Comments

@nahu
Copy link
Contributor

nahu commented May 8, 2023

Frontend

Framework UI https://react.dev/
Development build https://vitejs.dev/
Components library definition https://storybook.js.org/
UI Components Options:

Backend

FastAPI https://fastapi.tiangolo.com/lo/
ORM https://sqlmodel.tiangolo.com/
Database: Postgres
Cognito SDK https://docs.aws.amazon.com/code-library/latest/ug/python_3_cognito-identity-provider_code_examples.html
Scheduled processes https://typer.tiangolo.com/
AWS SES (emails) https://docs.aws.amazon.com/code-library/latest/ug/python_3_ses_code_examples.html
Test Library https://www.starlette.io/testclient/

@nahu nahu converted this from a draft issue May 8, 2023
@nahu nahu added the topic: documentation Improvements or additions to documentation label May 8, 2023
@nahu nahu moved this from 🏗 In progress to 👀 In review in Credere optimization backlog May 8, 2023
@nahu nahu self-assigned this May 9, 2023
@jpmckinney
Copy link
Member

jpmckinney commented May 9, 2023

@yolile and I reviewed, and prepared the following questions/comments:

Just to confirm:

  • Regardless of which component libraries are used, the UI will match the designs visually, correct?
  • SQLModel (whether directly, or indirectly via SQLAlchemy) can be used for easy database migrations, correct?

Questions/comments:

  • https://pypi.org/project/fastapi-scheduler/ seems like a very new and still unpopular library. Why not https://fastapi-utils.davidmontague.xyz/user-guide/repeated-tasks/, which is authored by a main contributor to pydantic and fastapi, and is more popular?
  • Vite: OCP currently uses Webpack. Vite is faster, but our applications are so small, it does not make much of a difference. If we want speed, we can consider Turbopack. What is your preference / opinion?
  • As I understand, the Starlette test client needs to be combined with a test runner. (Also, not all of the code will be in the context of starlette requests, e.g. model logic.) Will you be using pytest?
  • A few things aren’t mentioned, but I assume we are using: Sentry (error logging – we can give you the URL), and FastAPI’s Background Tasks (for some things like upload from disk to S3, send email to FI, etc.). For code style, I assume we are using (but you can suggest alternatives): Black, ESLint, Standard (for JS), Prettier (for HTML, CSS), pip-tools (for dependency management).

@jpmckinney
Copy link
Member

jpmckinney commented May 12, 2023

I see there's a PR to document how migrations work in SQLModel: fastapi/sqlmodel#512

Current version of docs can be previewed here (linked from PR) (note: the docs aren't rendering correctly): https://639e7f62ce461c1a81789517--sqlmodel.netlify.app/advanced/migrations/

@jpmckinney
Copy link
Member

We have been considering #10 and the financial documents, which are currently proposed to be uploaded to S3.

FYI, we are considering whether it would be simpler to just store these files in PostgreSQL as bytea columns. PostgreSQL already handles binary files fairly well, and based on our projections, the total file size will never be that large.

A bonus is that it makes access control and data deletion much simpler (one system, instead of two).

In the app, to have the best performance, SQLModel would need to select the bytea column only when necessary. There is a simple way to do that here: fastapi/sqlmodel#499

We are confirming with our server maintenance consultants if we've missed any concerns with this approach.

@jpmckinney
Copy link
Member

jpmckinney commented May 15, 2023

I heard back from our server maintenance consultants. For completeness, my message to them was:

I would like to get your opinion on storing files in the database for Credere.

We are considering storing financial documents (in PDF, Word, etc. formats) in the database. This is to simplify access control, data deletion, etc. As I understand, PostgreSQL handles this transparently by using TOAST (if we e.g. use the bytea column type) and generally avoids some of the historical concerns around storing files in DBs. Also, as I understand, if we only SELECT the bytea column when we actually need that data, we won’t see any difference in read speed.

That said, I’ve read that there could be issues like slower backups, higher memory, and INSERT/UPDATE performance.

In our case, the rows storing files will be regularly deleted (in summary, small businesses upload documents when applying for credit, and those documents can be deleted once a decision has been made), and the files are expected to be small (less than 5MB). The maximum size (assuming 100% market adoption) is on the order of 10s of GBs. Realistically, we will be operating at 1-10% of that.

In other words, the total size of the files at a given time is not expected to be very large (much less than the JSON text we store for Kingfisher - but individual files will be larger than individual JSON texts).

All that said, let me know what you think. Our alternative is to upload files to S3, with corresponding complexity in keeping the two systems in sync.

Their response is, in short, that adding 10s of GBs to a database will indeed slow down the backup/restore process (but OCP already backs up a DB of 100s of GBs, so this is not an issue). Otherwise, the factors are as above.

So, I would propose to simply (1) store the files as a bytea column on the borrower_documents table and (2) avoid SELECTing that column except when needed (link in my previous comment).

@nahu
Copy link
Contributor Author

nahu commented May 16, 2023

  • Regardless of which component libraries are used, the UI will match the designs visually, correct?

Correct.

  • SQLModel (whether directly, or indirectly via SQLAlchemy) can be used for easy database migrations, correct?

Yes, we will be using Alembic

As I could see in the documentation, repeated-tasks does not allow to establish specific times as a scheduler, it only repeats every X time from an event such as the start of the application, so it will not allow us to set, for example, that the update from the API is done in a specific time like the beginning of the day. I found other options to try like

  • Vite: OCP currently uses Webpack. Vite is faster, but our applications are so small, it does not make much of a difference. If we want speed, we can consider Turbopack. What is your preference / opinion?

Turbopack is not yet ready for production use, and it can only be used with Next.js at the moment. As the project is small, it is an excellent opportunity to test Vite, which has become very popular as an alternative to create-react-app.

  • As I understand, the Starlette test client needs to be combined with a test runner. (Also, not all of the code will be in the context of starlette requests, e.g. model logic.) Will you be using pytest?

According to the documentation, it's used under the hood

  • A few things aren’t mentioned, but I assume we are using: Sentry (error logging – we can give you the URL), and FastAPI’s Background Tasks (for some things like upload from disk to S3, send email to FI, etc.). For code style, I assume we are using (but you can suggest alternatives): Black, ESLint, Standard (for JS), Prettier (for HTML, CSS), pip-tools (for dependency management).

I created the ticket for the Sentry integration in #47. You can provide the URL there.

@jpmckinney
Copy link
Member

I am okay with using Vite, but is there any reason not to use Webpack?

For scheduled tasks, could we not just use cron? e.g. in other projects we run, for example:

cd /data/deploy/project; /usr/local/bin/docker-compose run --rm cron python manage.py manageprocess 2> /dev/null

In this project, one option is is to write a simple CLI using typer (Click-based, same author as fastapi). We can share the SQLModel models across the typer CLI and fastapi API.

@nahu nahu moved this from 👀 In review to ✅ Done in Credere optimization backlog Jun 20, 2023
@yolile yolile closed this as completed Jun 21, 2023
jpmckinney added a commit that referenced this issue Apr 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: documentation Improvements or additions to documentation
Projects
No open projects
Archived in project
Development

No branches or pull requests

3 participants