Remove API key requirement (#24)
* modify api_key_exists to return True

* remove unused imports pylint

* include env var usage option, handle published url, clean up comments

* readme updates

* revert server_url change, pylint updates after removing disabled checks

* Code cleanup

* Use the standard port 8080

Personal/development environments should be configured by overriding
the server port with the `FIDESLOG__SERVER_PORT` environment variable.

* Use the previously configured ENV variable names

Also updates the expected GH secret names

* Remove `ETHYCA_` prefix from secret names

Secret names cannot be updated in the GH UI; they must be replaced. This
is easier.

* Ignore reused validators in pylint checks

* Remove requirement for the API Key completely, update readme, improve route testing

* remove api key authorization from the SDK

* Include Snowflake secrets in pytest GH action

* Minor README changes

* Globally disable line-too-long

* Organize imports

* Log event creation errors to server logs

* Code cleanup

* Remove last remaining API key reference

* Restore no-self-use pylint exception

* add opt_out_copy to sdk utils

Co-authored-by: Phil Salant <PSalant@gmail.com>
SteveDMurphy and PSalant726 authored Mar 10, 2022
1 parent a47b24f commit 1fb830a
Showing 24 changed files with 300 additions and 311 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/pr_checks.yml
@@ -118,6 +118,10 @@ jobs:
  Pytest:
    needs: Build
    runs-on: ubuntu-latest
    env:
      SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
      SNOWFLAKE_DB_PASSWORD: ${{ secrets.SNOWFLAKE_DB_PASSWORD }}
      SNOWFLAKE_DB_USER: ${{ secrets.SNOWFLAKE_DB_USER }}
    steps:
      - name: Download fidescls container
        uses: actions/download-artifact@v2
135 changes: 122 additions & 13 deletions README.md
@@ -1,40 +1,46 @@
# fideslog
Privacy respecting usage analytics collection.

## Development

## Sample Working Implementation
_(Working from a local installation)_
The following environment variables must be set in order to successfully run the API server locally:

```
export SNOWFLAKE_ACCOUNT=[see 1Password]
export SNOWFLAKE_DB_USER=[see 1Password]
export SNOWFLAKE_DB_PASSWORD=[see 1Password]
```

## Sample Working Implementation
_(Working from a local installation)_

First, start up a local api server
```
$ make api
```

Next create a virtual environment, install fideslog, and open up a python env
```
```bash
$ python3 -m venv env && source env/bin/activate
$ pip install -e .
$ python
```

The following below should work as is (provided you have a populated `fideslog.toml`)
```
The following below should work as is (provided you have a populated `fideslog.toml`/ environment variables)
```python
import platform
from importlib.metadata import version
import asyncio
from datetime import datetime, timezone
from fideslog.sdk.python import event, client

API_KEY = "12345"
product_name = "fideslog"

fideslog_client = client.AnalyticsClient(
api_key=API_KEY,
client_id="test_client_id",
os="Darwin",
product_name="fideslog",
production_version="1.2.3",
os=platform.system(),
product_name=product_name,
production_version=version(product_name),
)

fideslog_event = event.AnalyticsEvent(
@@ -46,7 +52,7 @@ asyncio.run(fideslog_client.send(event=fideslog_event))
```

Example structure of a minimum working payload
```
```json
{
"client_id": "test_client_id",
"event": "test_event",
@@ -56,3 +62,106 @@ Example structure of a minimum working payload
"production_version": "1.2.3"
}
```


## Example of an opt-out routine

All opt-out functionality will reside in the fides family tool implementing `fideslog`

The current copy to be used is as follows:
> Fides needs your permission to send Ethyca a limited set of anonymous usage statistics.
> Ethyca will only use this anonymous usage data to improve the product experience, and will never collect sensitive or personal data.
>
> ***
> Don't believe us? Check out the open-source code here:
> https://github.com/ethyca/fideslog
> ***
>
> To opt-out of all telemetry, press "n". To continue with telemetry, press any other key.

### Sample storing of values for fideslog
All values are currently stored in the fides tool configuration toml file, as below:
```toml
[cli]
analytics_id = "some_generated_anonymous_unique_id"

[user]
analytics_opt_out = false
```


### Generating a unique id

Using Pydantic defaults will create a unique ID if one doesn't exist:

```python
from fideslog.sdk.python.utils import generate_client_id, FIDESCTL_CLI

class CLISettings(FidesSettings):
    """Class used to store values from the 'cli' section of the config."""

    local_mode: bool = False
    server_url: str = "http://localhost:8080"
    analytics_id: str = generate_client_id(FIDESCTL_CLI)

    class Config:
        env_prefix = "FIDESCTL__CLI__"
```
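
The "generate only if missing" behaviour can be illustrated without Pydantic. The sketch below is a standard-library analogue, not the fideslog or fidesctl implementation: `CLISettingsSketch` and `generate_id_sketch` are hypothetical names, and the use of a random UUID is an assumption, since the internals of `generate_client_id` are not shown here.

```python
import uuid
from dataclasses import dataclass, field


def generate_id_sketch() -> str:
    """Hypothetical stand-in for generate_client_id; returns a random UUID string."""
    return str(uuid.uuid4())


@dataclass
class CLISettingsSketch:
    """Stdlib analogue of the settings class above (not the real FidesSettings)."""

    local_mode: bool = False
    server_url: str = "http://localhost:8080"
    # The factory is only called when the caller does not supply an analytics_id,
    # so an ID already stored in the config file is left untouched.
    analytics_id: str = field(default_factory=generate_id_sketch)


# A stored value wins; otherwise a fresh ID is generated.
stored = CLISettingsSketch(analytics_id="some_generated_anonymous_unique_id")
fresh = CLISettingsSketch()
```

In this sketch the generated value would still need to be written back to the configuration file for the ID to stay stable across runs.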


### Gaining consent in a conspicuous way


A user should be asked only once if they would like to provide anonymous analytics to Ethyca.

Integrating with an initial workflow (e.g. `fidesctl init`) is a great way to capture and generate the required values up front.

Additionally, having a catch in the top-most level click command can provide an alternative method to ask only once for permission.
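
One way to keep the prompt to a single occurrence is to ask only when no decision has been stored yet. The following is a minimal sketch under stated assumptions: `ensure_consent_recorded` and `PROMPT` are hypothetical names, the config is assumed to be an already-parsed dictionary shaped like the TOML sample above, and writing the result back to disk is left out.

```python
from typing import Callable, Dict

# Hypothetical, shortened prompt text; the full copy is quoted earlier in this README.
PROMPT = 'To opt-out of all telemetry, press "n". To continue, press any other key: '


def ensure_consent_recorded(
    config: Dict[str, Dict[str, object]],
    ask: Callable[[str], str] = input,
) -> Dict[str, Dict[str, object]]:
    """Prompt for consent only when no decision has been stored yet."""
    user_section = config.setdefault("user", {})
    if "analytics_opt_out" not in user_section:
        # First run: ask once and remember the answer in the config mapping.
        answer = ask(PROMPT)
        user_section["analytics_opt_out"] = answer.strip().lower() == "n"
    return config
```

Passing the prompt function as a parameter is a design choice for this sketch only; it lets the logic be exercised in tests without waiting on real keyboard input.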


### Implementing the sdk

There are two items required for successfully sending an event to `fideslog`: `AnalyticsClient` & `AnalyticsEvent`

`AnalyticsClient` establishes some (relatively) constant properties that are required upon instantiation.

`AnalyticsEvent` is much more variable in makeup and can contain a number of extra properties for tracking purposes. Some of these will depend on the fides ecosystem being tracked (for instance, `endpoint` should align with `api` events); only the `event` and `event_created_at` properties are required to send an event.

Example function for prompting the user to opt out:
```python
def opt_out_anonymous_usage(
    analytics_values: Optional[Dict] = None, config_path: str = ""
) -> bool:
    """
    This function handles the verbiage and response of opting
    in or out of anonymous usage analytic tracking.
    If opting out, return True to set the opt out config.
    """
    opt_in = input(OPT_OUT_COPY)
    if analytics_values:
        analytics_values["user"]["analytics_opt_out"] = bool(opt_in.lower() == "n")
        update_config_file(analytics_values)
    return bool(opt_in.lower() == "n")
```


Click allows for embedding a call to a function at higher levels of grouped commands. This allows event data to be captured consistently without touching every implemented command. For nested groups, however, you will likely need function calls at the lower-tier group level as well (e.g. `fidesctl export organization` requires an event on the `export` function to return the invoked subcommand).

Example at top-level cli group:
```python
if not ctx.obj["CONFIG"].user.analytics_opt_out:
    send_anonymous_event(
        command=ctx.invoked_subcommand, client_id=ctx.obj["CONFIG"].cli.analytics_id
    )
```

Example at nested cli group:
```python
if not ctx.obj["CONFIG"].user.analytics_opt_out:
    command = " ".join(filter(None, [ctx.info_name, ctx.invoked_subcommand]))
    send_anonymous_event(
        command=command, client_id=ctx.obj["CONFIG"].cli.analytics_id
    )
```
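
The command string in the nested example is built with `filter(None, ...)`, which drops a missing group name or subcommand before joining. The sketch below isolates that expression so it can be run without Click; `build_command` is a hypothetical helper name, and the plain string arguments stand in for `ctx.info_name` and `ctx.invoked_subcommand`.

```python
from typing import Optional


def build_command(info_name: Optional[str], invoked_subcommand: Optional[str]) -> str:
    """Join the group name and subcommand, skipping whichever is missing."""
    # filter(None, ...) removes None and empty strings before joining.
    return " ".join(filter(None, [info_name, invoked_subcommand]))


print(build_command("export", "organization"))  # prints "export organization"
print(build_command("export", None))            # prints "export"
```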
8 changes: 4 additions & 4 deletions docker-compose.yml
@@ -1,18 +1,18 @@
services:
fideslog:
image: ethyca/fideslog:local
command: uvicorn --host 0.0.0.0 --port 8888 --reload fideslog.api.main:app
command: uvicorn --host 0.0.0.0 --port 8080 --reload fideslog.api.main:app
healthcheck:
test: [ "CMD", "curl", "-f", "http://0.0.0.0:8888/health" ]
test: [ "CMD", "curl", "-f", "http://0.0.0.0:8080/health" ]
interval: 5s
timeout: 5s
retries: 5
env_file:
- fideslog.env
expose:
- 8888
- 8080
ports:
- "8888:8888"
- "8080:8080"
volumes:
- type: bind
source: .
6 changes: 3 additions & 3 deletions fideslog.env
@@ -1,3 +1,3 @@
export SNOWFLAKE_DB_USER=$SNOWFLAKE_DB_USER
export SNOWFLAKE_DB_PASSWORD=$SNOWFLAKE_DB_PASSWORD

export FIDESLOG__DATABASE_ACCOUNT=$SNOWFLAKE_ACCOUNT
export FIDESLOG__DATABASE_PASSWORD=$SNOWFLAKE_DB_PASSWORD
export FIDESLOG__DATABASE_USER=$SNOWFLAKE_DB_USER
11 changes: 4 additions & 7 deletions fideslog.toml
@@ -1,11 +1,8 @@
[database]
account = "redacted"
database = "redacted"
password = "redacted"
role = "redacted"
db_schema = "redacted"
warehouse = "redacted"
user = "redacted"
database = "raw"
role = "event_writer"
db_schema = "fides"
warehouse = "fides_log"

[server]
host = "localhost"
12 changes: 6 additions & 6 deletions fideslog/api/config.py
@@ -46,14 +46,14 @@ class DatabaseSettings(Settings):
"""Configuration options for Snowflake."""

account: str = Field(..., exclude=True)
database: str = Field(..., exclude=True)
db_schema: str = Field("fides", exclude=True)
database: str = "raw"
db_schema: str = "fides"
password: str = Field(..., exclude=True)
role: str = Field("event_writer", exclude=True)
warehouse: str = Field("fides_log", exclude=True)
role: str = "event_writer"
warehouse: str = "fides_log"
user: str = Field(..., exclude=True)

db_connection_uri: Optional[str] = None
db_connection_uri: Optional[str] = Field(None, exclude=True)

@validator("db_connection_uri", pre=True, always=True)
def assemble_db_connection_uri(
@@ -90,7 +90,7 @@ class ServerSettings(Settings):

host: str = "0.0.0.0"
hot_reload: bool = False
port: int = 8888
port: int = 8080

class Config:
"""Modifies pydantic behavior."""
70 changes: 28 additions & 42 deletions fideslog/api/database/data_access.py
@@ -1,58 +1,44 @@
import logging
import json
from json import dumps
from logging import getLogger

from sqlalchemy import select
from sqlalchemy.orm import Session
from sqlalchemy.exc import DBAPIError
from sqlalchemy.orm import Session

# from fidesapi.database.session import async_session ## future to do after working sync

from fideslog.api.database.models import AnalyticsEvent as AnalyticsEventORM
from fideslog.api.models.analytics_event import AnalyticsEvent

from fideslog.api.database.models import AnalyticsEvent as AnalyticsEventORM
from fideslog.api.database.models import APIKey
log = getLogger(__name__)

log = logging.getLogger(__name__)

# TODO: Finish this
def create_event(database: Session, event: AnalyticsEvent) -> None:
"""Create a new analytics event."""

try:
log.debug("Creating resource")
event_record = AnalyticsEventORM(
client_id=event.client_id,
product_name=event.product_name,
production_version=event.production_version,
os=event.os,
docker=event.docker,
resource_counts=json.dumps(event.resource_counts.dict())
if event.resource_counts
else None,
event=event.event,
command=event.command,
flags=", ".join(event.flags) if event.flags else None,
endpoint=event.endpoint,
status_code=event.status_code,
error=event.error,
local_host=event.local_host,
extra_data=json.dumps(event.extra_data) if event.extra_data else None,
event_created_at=event.event_created_at,
extra_data = dumps(event.extra_data) if event.extra_data else None
flags = ", ".join(event.flags) if event.flags else None
resource_counts = (
dumps(event.resource_counts.dict()) if event.resource_counts else None
)
database.add(
AnalyticsEventORM(
client_id=event.client_id,
command=event.command,
docker=event.docker,
endpoint=event.endpoint,
error=event.error,
event=event.event,
event_created_at=event.event_created_at,
extra_data=extra_data,
flags=flags,
local_host=event.local_host,
os=event.os,
product_name=event.product_name,
production_version=event.production_version,
resource_counts=resource_counts,
status_code=event.status_code,
)
)
database.add(event_record)
database.commit()
except DBAPIError:
log.error("Insert Failed")


def api_key_exists(database: Session, token: str) -> bool:
"""
Return whether the provided token exists in the database.
"""

return (
database.execute(
select(APIKey).where(APIKey.api_key == token).limit(1),
).first()
is not None
)
2 changes: 1 addition & 1 deletion fideslog/api/database/database.py
@@ -1,6 +1,6 @@
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, Session
from sqlalchemy.orm import Session, sessionmaker

from fideslog.api.config import config
