Instantiate dev DB #12

Merged
merged 26 commits into from
Nov 13, 2023
26 commits
1e0b56b [change] oops (duynguyen158, Nov 8, 2023)
3cde8ec [change] bump ver (duynguyen158, Nov 8, 2023)
98ffd47 [add] embedding model (duynguyen158, Nov 9, 2023)
d68b2e4 [add] extension enabler (duynguyen158, Nov 9, 2023)
246b307 [add] embedding creation test (duynguyen158, Nov 9, 2023)
280a370 [add] execution model (duynguyen158, Nov 9, 2023)
40d98d0 [change] update tests (duynguyen158, Nov 9, 2023)
bfc36e0 [add] cosine similarity test (duynguyen158, Nov 9, 2023)
bafa8b2 [add] new models step (duynguyen158, Nov 10, 2023)
2e8d5d1 [add] first dev stage config (duynguyen158, Nov 10, 2023)
de201d0 [add] training job roles and users (duynguyen158, Nov 10, 2023)
a910aad [add] pgvector extension (duynguyen158, Nov 10, 2023)
a9293da Merge branch 'orm-create-recs-embeds-models' into orm-install-terraform (duynguyen158, Nov 10, 2023)
0a2cdc6 [remove] db init tests (duynguyen158, Nov 10, 2023)
2bf2333 [add] terraform task (duynguyen158, Nov 10, 2023)
88e7a39 [change] separate db to its own module (duynguyen158, Nov 10, 2023)
6dc4728 [add] alembic script to create all tables (duynguyen158, Nov 10, 2023)
f4be6fa [remove] old db declare python stuff (duynguyen158, Nov 10, 2023)
4e51c30 [add] refer to db (duynguyen158, Nov 10, 2023)
a525517 [change] use a specific set of special chars (duynguyen158, Nov 10, 2023)
08e0007 [remove] ampersand (duynguyen158, Nov 10, 2023)
6f80819 [remove] all special chars (duynguyen158, Nov 10, 2023)
9072e5d [change] okay this should work (duynguyen158, Nov 10, 2023)
b383e59 [add] CICD language (duynguyen158, Nov 11, 2023)
a18f446 [change] gh workflow branches (duynguyen158, Nov 13, 2023)
e4df6b8 [change] use correct image (duynguyen158, Nov 13, 2023)
3 changes: 1 addition & 2 deletions .github/workflows/on-merge-to-main.yaml
@@ -2,8 +2,7 @@ name: publish
 on:
   push:
     branches:
-      - main
-      - orm
+      - dev # dev acts as the default branch for package version updates, for now
     paths:
       - "pyproject.toml"
 jobs:
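After this change, the publish workflow runs only for pushes to `dev` that touch `pyproject.toml`, and the run-checks workflow only for pull requests targeting `dev` or `prod` that touch a `.py` file. GitHub runs a workflow only when both the `branches` and `paths` filters are satisfied (for `pull_request`, `branches` is matched against the base branch). The Python sketch below models that decision so the new branch lists can be sanity-checked. `workflow_triggers` and `glob_to_regex` are hypothetical helpers, and the glob translation deliberately covers only `*` (anything except `/`) and `**` (anything), a subset of GitHub's filter syntax.

```python
from __future__ import annotations

import re


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a small subset of GitHub's filter globs into a regex.

    Assumption: only '**' (any characters, including '/') and '*' (any
    characters except '/') are special; every other character is literal.
    """
    parts: list[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def workflow_triggers(branch: str, changed_paths: list[str],
                      branches: list[str], paths: list[str]) -> bool:
    """Return True only if both the branch filter and the paths filter match."""
    # The branch lists in these two workflows are literal names, so plain
    # membership suffices here (GitHub also accepts globs in `branches`).
    if branch not in branches:
        return False
    regexes = [glob_to_regex(p) for p in paths]
    return any(r.fullmatch(path) for path in changed_paths for r in regexes)


# Filters as they read after this PR.
PUBLISH = {"branches": ["dev"], "paths": ["pyproject.toml"]}
CHECKS = {"branches": ["dev", "prod"], "paths": ["**.py"]}

print(workflow_triggers("dev", ["pyproject.toml"], **PUBLISH))         # True
print(workflow_triggers("main", ["pyproject.toml"], **PUBLISH))        # False
print(workflow_triggers("prod", ["tests/test_example.py"], **CHECKS))  # True
```

A change that only touches `README.md` matches neither path filter, so it triggers neither of these two workflows.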
8 changes: 4 additions & 4 deletions .github/workflows/on-pull-request.yaml
@@ -2,8 +2,8 @@ name: run checks
 on:
   pull_request:
     branches:
-      - main
-      - orm
+      - dev
+      - prod
     paths:
       - "**.py"
 
@@ -14,7 +14,7 @@ jobs:
       # Label used to access the service container
       postgres:
         # Docker Hub image
-        image: postgres
+        image: ankane/pgvector:v0.4.1
         # Provide the password for postgres
         env:
           POSTGRES_PASSWORD: postgres
@@ -44,4 +44,4 @@ jobs:
       - name: run mypy
         run: mypy
       - name: run tests
-        run: DB_NAME=postgres pytest tests
+        run: pytest tests
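The CI Postgres service now uses `ankane/pgvector:v0.4.1` instead of the stock `postgres` image, so the pgvector extension this PR enables ("[add] pgvector extension") is available when the tests run. pgvector's `<=>` operator returns cosine distance, which is 1 minus cosine similarity, so a test that reasons about similarity scores has to convert between the two. A minimal pure-Python reference sketch for cross-checking such values; `cosine_similarity` and `cosine_distance` are hypothetical helpers, not functions from `article_rec_db` or pgvector.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Undefined for zero vectors; this sketch raises rather than pick a value.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: the quantity pgvector's `<=>` operator reports."""
    return 1.0 - cosine_similarity(a, b)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))    # 1.0
```

When comparing against a distance returned by the database, use `math.isclose` rather than `==`, since floating-point results rarely match exactly.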
5 changes: 5 additions & 0 deletions .gitignore
@@ -166,3 +166,8 @@ cython_debug/
 
 # VSCode
 /.vscode
+
+# Terraform
+.terraform/
+*.tfvars.json
+*.tfvars
45 changes: 11 additions & 34 deletions README.md
@@ -27,49 +27,26 @@ that correspond to a row in the table.
 To install the package from PyPi, run: `pip install article-rec-db`. Check existing versions
 [here](https://pypi.org/project/article-rec-db/).
 
-### Initialize a new cluster
+### Database management
 
-If you want to initialize a fresh database cluster, pass in the env vars to connect to the cluster and run `init_db`.
-If the target cluster has IP restrictions, make sure your IP address is a valid access point.
+We use [Terraform](https://developer.hashicorp.com/terraform) to manage cluster entities such as databases, roles and extensions. The code is in the `terraform` directory. Stages (dev and prod) are represented as different databases. To make changes to an existing database,
 
-An example run with fake credentials (from the root dir of this project with the virtual env
-activated):
-`HOST=fakehost USER=fakeuser PASSWORD=fakepw DB_NAME=postgres python db_init_steps/_0_init_db.py`
+1. Make changes inside `terraform/modules`.
+2. Run `poe terraform [stage] plan` to see the changes that will be applied to the corresponding database.
+3. At this point, if you're happy, you can run `poe terraform [stage] apply` yourself, but we prefer a CI/CD approach. Merging a PR to the `dev` branch will trigger a plan to be applied to the `dev` database, and the same for the `prod` branch. _We always merge to `dev` first, then do another merge from `dev` to `prod`._
 
-(If you run into a `ModuleNotFoundError`, try including `PYTHONPATH=$(pwd)` before the `python` command. This applies to any other commands that uses `python`.)
-
-This should run the most up-to-date SQLModel definitions of the tables, which means you are
-safe to then run any additional changes in role, access, and policy changes. So you can
-run the rest of the steps in `db_init_steps`, one after the other in ascending numerical order.
-
-No `PORT` is passed because the default port is 5432, the standard for Postgres.
-
-### Migrations
+### Table and column migrations
 
 So you made some changes to what tables there are, what columns there are, indices, etc. and you'd like to
-update the databases. This is what alembic is for!
+update the databases. This is what alembic is for! (And notice the difference between Terraform and alembic: Terraform manages database entities that are not specific to a database, like roles and extensions, while alembic manages database entities that are specific to a database, like tables and columns.)
 
 To generate a new revision after you've updated the models:
 
 1. Run this from the root of the project: `DB_CONNECTION_STRING='postgresql://user:password@host:port/db_name' alembic revision --autogenerate -m "message"`. (There's a Poe task for this: run `poe rmtdiff -d db_name -m "message"`)
 2. Check the `/alembic/versions/` directory for the new revision and verify that it does what you want it to
-3. Run this from the root of the project: `DB_CONNECTION_STRING='postgresql://user:password@host:port/db_name' alembic upgrade head`
-4. Note that you only need to generate the revision file (step 1) _once_ because we want the same content in each environment's database, but you do need to run the `upgrade head` command once _for each_ database (change the DB_NAME to the desired target). (There's a Poe task for this: run `poe rmtupgrade -d db_name`)
-
-If you decide to do Step 1 or 4 with Poe, make sure to include the `DB_CREDENTIALS_SSM_PARAM` env var set to the name of the AWS SSM parameter that stores the credentials for the database, either inline or in a top-level `.env` file. Make sure the AWS CLI and `jq` command-line package are installed.
-
-To make new users, grant privileges, etc., follow the patterns used in db_init_stages along with the
-helpers under article_rec_db.
+3. Run this from the root of the project: `DB_CONNECTION_STRING='postgresql://user:password@host:port/db_name' alembic upgrade head`. Note that you only need to generate the revision file (step 1) _once_ because we want the same content in each environment's database, but you do need to run the `upgrade head` command once _for each_ database (change the DB_NAME to the desired target). (There's a Poe task for this: run `poe rmtupgrade -d db_name`)
 
-1. Create a new file under db*init_stages that does what you want and is prefixed with `\_X*`, where `X` is the next number (it has no function, it's just nice to keep track of the step order).
-2. Run the file. You can run it like so: `HOST=fakehost USER=fakeuser PASSWORD=fakepw DB_NAME=postgres python article_rec_db/db_init_stages/_X_fake_file.py`
-3. I'd recommend that you then connect to the cluster and verify your changes took place.
-
-Note that you must provide valid host, user, password, and database name environment variables for it to work. The `PORT`
-env var has a default value of 5432, so it is omitted here. The only other env var you might need
-(if you are creating new roles/users that have credentials) is the `ENABLE_SSM` env var. By default
-it is `FALSE` but if you set it to `TRUE` then it will make sure to upload any new credentials to the
-SSM parameter store.
+Similar to database management, we let our CI/CD handle Step 3.
 
 ## Development
 
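The alembic commands in the new migration steps take the full connection URL through `DB_CONNECTION_STRING`. If you assemble that URL yourself, characters that are special in URLs (`@`, `:`, `/`, `?`, `#`, `%`) must be percent-encoded in the user name and password, or the host and database name are parsed incorrectly. A minimal standard-library sketch; `build_connection_string` is a hypothetical helper, not part of `article_rec_db`, and its 5432 default is the standard Postgres port mentioned in the removed README text.

```python
from urllib.parse import quote, unquote, urlsplit


def build_connection_string(user: str, password: str, host: str,
                            db_name: str, port: int = 5432) -> str:
    """Assemble a postgresql:// URL suitable for DB_CONNECTION_STRING.

    User, password and database name are percent-encoded with safe='' so that
    characters such as '@', ':', '/', '?', '#' and '%' cannot be mistaken for
    URL delimiters.
    """
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(db_name, safe='')}"
    )


dsn = build_connection_string("fakeuser", "p@ss:w/rd", "fakehost", "postgres")
print(dsn)  # postgresql://fakeuser:p%40ss%3Aw%2Frd@fakehost:5432/postgres

# Round trip: the parsed host, port and decoded password match the inputs.
parts = urlsplit(dsn)
print(parts.hostname, parts.port, unquote(parts.password))  # fakehost 5432 p@ss:w/rd

# Without encoding, the parser ends the host section at the first '/'.
naive = "postgresql://fakeuser:p@ss:w/rd@fakehost:5432/postgres"
print(urlsplit(naive).hostname)  # ss
```

How the project's `alembic/env.py` consumes the variable is not visible in this diff. If it goes through Alembic's `Config.set_main_option`, note that this method uses ConfigParser interpolation, so a literal `%` in a percent-encoded URL has to be written as `%%` there.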
@@ -115,10 +92,10 @@ is to run a Docker container, then run the tests while it is active.
 3. Run `DB_NAME=postgres pytest tests` from the root directory of the project. Explore the `pytest` docs (linked above)
 to see more options.
 
+Steps 2 and 3 can be combined into one Poe task: `poe test`, which also stops the container after the tests are done, even if tests fail. In addition, you can also run `poe lclstart` to just start the container, and `poe lclstop` to stop it whenever you're done. `poe lclconnect` will connect you to the container via `psql` so you can poke around.
+
 Note that if you decide to run the Postgres container with different credentials (a different password, port, etc.) or
 via a different method, you will likely need to update the test file to point to the correct Postgres instance.
 
 Additionally, if you want to re-run the tests, you want to make sure you start over from a fresh Postgres
 instance. If you run Postgres via Docker, you can simply `ctrl-C` to stop the image and start a new one.
-
-Steps 2 and 3 can be combined into one Poe task: `poe test`, which also stops the container after the tests are done, even if tests fail. In addition, you can also run `poe lclstart` to just start the container, and `poe lclstop` to stop it whenever you're done. `poe lclconnect` will connect you to the container via `psql` so you can poke around.
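The README says `poe test` stops the container after the tests are done, even if tests fail. That guarantee is the start, run, always-clean-up pattern, which maps to `try`/`finally` in Python. A minimal sketch, assuming a container named `article-rec-db-test` that uses the same image and password as the CI service above; the commands are placeholders, not the project's actual Poe task definitions. The runner is injectable so the control flow can be checked without Docker.

```python
import subprocess
from typing import Callable

Runner = Callable[..., subprocess.CompletedProcess]

# Placeholder commands (assumptions; not copied from the project's Poe tasks).
# A real task may also need to wait until Postgres accepts connections.
START = ["docker", "run", "-d", "--rm", "--name", "article-rec-db-test",
         "-e", "POSTGRES_PASSWORD=postgres", "-p", "5432:5432",
         "ankane/pgvector:v0.4.1"]
TESTS = ["pytest", "tests"]
STOP = ["docker", "stop", "article-rec-db-test"]


def run_tests_with_container(run: Runner = subprocess.run) -> int:
    """Start the database container, run the tests, always stop the container."""
    run(START, check=True)  # if the container cannot start, fail before testing
    try:
        return run(TESTS).returncode  # pytest's exit code; 0 means all passed
    finally:
        run(STOP)  # runs whether pytest passed, failed, or raised


calls = []  # every command the fake runner receives, in order


def fake_run(cmd, check=False):
    """Stand-in for subprocess.run: records the command and fails the tests."""
    calls.append(cmd)
    return subprocess.CompletedProcess(cmd, returncode=1 if cmd == TESTS else 0)


code = run_tests_with_container(fake_run)
print(code, calls == [START, TESTS, STOP])  # 1 True
```

Because the container is started with `--rm`, stopping it also removes it, so each run begins from a fresh Postgres instance, as the README recommends for re-running tests.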

This file was deleted.

54 changes: 0 additions & 54 deletions article_rec_db/db/controller.py

This file was deleted.

26 changes: 0 additions & 26 deletions article_rec_db/db/database.py

This file was deleted.

87 changes: 0 additions & 87 deletions article_rec_db/db/helpers.py

This file was deleted.
