Merge pull request #668 from uc-cdis/feat/requester-pays
Feat/requester pays
Avantol13 authored Jul 31, 2019
2 parents 7342645 + ab1f61e commit 66e0123
Showing 11 changed files with 419 additions and 21 deletions.
77 changes: 76 additions & 1 deletion docs/google_architecture.md
@@ -68,6 +68,81 @@ The scopes that allow access to these methods are also available for individual

> NOTE: When these docs talk about `clients` accessing data, it's reasonable to assume that this means both `clients` and `users` themselves. Many of the algorithms behave similarly if users request access themselves without going through an outside application.

### Handling Requester Pays Storage Buckets in Google (Optional)

Google supports a bucket-level configuration called ["Requester Pays"](https://cloud.google.com/storage/docs/requester-pays) which effectively pushes the cost of data access in a Google Storage bucket to the entity accessing the data.

> Without requester pays enabled, the Google project that contains the bucket is billed.

The Data Access methods [Signed URLs](#signed-urls) and [Temporary Service Account Credentials](#temporary-service-account-credentials) support accessing requester pays buckets with some additional configuration and considerations (notably, how it affects end-users).

To fully understand the options for requester pays support, it's important to first understand the technical steps required for any of these Google Data Access methods to work, as detailed in [cirrus](https://github.com/uc-cdis/cirrus), the library Fence uses for Google API interactions. It would also be useful to read through the access method details for [Signed URLs](#signed-urls) and [Temporary Service Account Credentials](#temporary-service-account-credentials).

#### Options for Billing Project(s)

The easiest option for supporting requester pays is to simply bill a Google Project you already own for all access to the bucket instead of requiring end-users to supply a project to bill. This essentially makes the requester pays bucket a non-requester pays bucket, since you'll be paying for all the access. This may be a necessary solution in cases where:

1) you want to serve data from a bucket you don't fully control (in other words, can't just turn "requester pays" off)
2) you don't want end-users to have to do manual configuration in Google Cloud Platform to enable billing their project
3) you/end-users don't want to have to give your application IAM permissions in a project the end-user owns to automatically enable billing

**NOTE:** If you do _not_ want to bill yourself for access, it is possible to require end-users to provide the project to bill OR configure a default billing project other than one you own. _However_, this will require more work for end-users that you need to consider.

To bill a project you do _not_ own, users either need to do `2` from above (manually give the service account(s) that Fence uses permission to bill the project they specify) OR agree to `3` (let Fence automatically assign the necessary billing permission in the project they specify). Option `3` requires that the Fence admin service account have the necessary roles in the users' project(s); more details on that are given below.

#### Billing Your Own Project

To bill a project you own, Fence offers an optional "default billing project" setting. Set it to a project that you manage and in which you can grant the Fence admin service account the necessary permissions.

> NOTE: At the time of writing, the configuration variable for the "default billing project" for signed URLs is `BILLING_PROJECT_FOR_SIGNED_URLS`. The "default billing project" for temporary service account credentials is `BILLING_PROJECT_FOR_SA_CREDS`. Automatic billing permission is toggled separately by `ENABLE_AUTOMATIC_BILLING_PERMISSION_SIGNED_URLS` and `ENABLE_AUTOMATIC_BILLING_PERMISSION_SA_CREDS`. The configuration for the Fence admin service account is `CIRRUS_CFG/GOOGLE_ADMIN_EMAIL` and is available through the API.

For the [Temporary Service Account Credentials](#temporary-service-account-credentials) access method, clients need to know the "default billing project" so they can include it in their direct requests to Google. The configured "default billing project" is exposed through an API endpoint [detailed here](http://petstore.swagger.io/?url=https://raw.githubusercontent.com/uc-cdis/fence/master/openapis/swagger.yaml#/google/getGoogleBillingProjects).

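A client-side sketch of reading that endpoint, assuming the Google blueprint is served under `/google` (so the route is `<fence-base-url>/google/billing_projects`) and that the response has the shape built by `GoogleBillingAccount.get()` in this commit. The base URL and helper names are hypothetical; this is not part of Fence.

```python
import json
import urllib.request
from typing import Optional


def fetch_billing_projects(fence_base_url: str) -> dict:
    """GET the configured default billing projects from Fence.

    Assumes the route is <fence_base_url>/google/billing_projects; adjust
    for your deployment.
    """
    url = fence_base_url.rstrip("/") + "/google/billing_projects"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def default_billing_project(response: dict, access_method: str) -> Optional[str]:
    """Return the project ID configured for one access method, or None.

    access_method is "signed_urls" or "temporary_service_account_credentials",
    the two keys returned by GoogleBillingAccount.get().
    """
    return (response.get(access_method) or {}).get("project_id")


if __name__ == "__main__":
    # Example payload in the shape returned by GoogleBillingAccount.get()
    example = {
        "signed_urls": {"project_id": "my-billing-project"},
        "temporary_service_account_credentials": {"project_id": None},
    }
    print(default_billing_project(example, "signed_urls"))
```

A client holding temporary service account credentials would then pass the `temporary_service_account_credentials` project as the billing project in its own requests to Google.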
#### End-users Specifying a Billing Project

API requests to create a signed URL and to get temporary service account credentials also let the end-user specify a project to bill, via a `userProject` query parameter. See the [API Documentation](https://github.com/uc-cdis/fence/tree/master#API-documentation) for more details.

If you do **not** want to bill a project you own and instead require end-users to pay for access to requester pays buckets, this requires manual configuration by the end-users. The configuration necessary for billing a project is the same whether you or an end-user has to enable it, as detailed below.

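A sketch of how a client might pass `userProject` on both request types. The routes `/data/download/<file_id>` (signed URLs, a GET) and `/credentials/google` (temporary service account credentials, a POST) are assumptions about how Fence mounts these blueprints, and the helper names are hypothetical; only the `userProject` and `protocol` query parameter names come from this commit.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def signed_url_request(
    fence_base_url: str,
    file_id: str,
    user_project: Optional[str] = None,
    protocol: Optional[str] = None,
) -> str:
    """URL for a GET asking Fence for a signed download URL (assumed route)."""
    params = {}
    if protocol:
        params["protocol"] = protocol
    if user_project:
        # Google project to bill for requester pays access
        params["userProject"] = user_project
    url = f"{fence_base_url.rstrip('/')}/data/download/{quote(file_id)}"
    return url + ("?" + urlencode(params) if params else "")


def sa_credentials_request(
    fence_base_url: str, user_project: Optional[str] = None
) -> str:
    """URL for a POST asking Fence for temporary SA credentials (assumed route)."""
    url = f"{fence_base_url.rstrip('/')}/credentials/google"
    if user_project:
        url += "?" + urlencode({"userProject": user_project})
    return url
```

Omitting `user_project` leaves the choice to Fence, which falls back to its configured "default billing project" if one is set.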
#### Required Google Cloud Platform (GCP) Configuration for Billing Project

Whether you bill your own project or require end-users to specify a billing project, the required configuration in GCP is the same: the service account used to sign the URL and/or the service account used for Temporary Service Account Credentials needs the GCP permission `serviceusage.services.use` in the Google Project being billed.

> "All actions that include a billing project in the request require serviceusage.services.use permission for the project that's specified" [according to Google's docs](https://cloud.google.com/storage/docs/access-control/iam-console).

You have 2 options to achieve the above:

1) assume end-users will provide the necessary permission for billing
2) configure Fence to automatically attempt to provide the necessary permission for billing

If you want Fence to automatically attempt to grant the necessary permission to the relevant service accounts for data access, the Fence admin service account needs two predefined Google roles (granted through Cloud IAM) on whichever project is used for billing (whether provided in a request to Fence or configured as the "default billing project"):

* `Project IAM Admin`: to update the project's policy to give the necessary service account(s) billing permission
* `Role Administrator`: for creating a custom role that only provides billing permission to the project

> NOTE: The custom role that Fence creates contains the single permission in Google `serviceusage.services.use`.

#### Requester Pays Signed URLs and Temporary Service Account Credentials

1) For [Signed URLs](#signed-urls): a `userProject=<google-project-to-bill>` query parameter will be appended to the signed URL
   * it will only be appended if a valid `userProject` is provided in the request **or** Fence is configured with a "default billing project" for signed URLs
   * if Fence is configured to automatically enable billing permission, it will grant that permission to the service account used to sign the URL
2) For [Temporary Service Account Credentials](#temporary-service-account-credentials): if Fence is configured to automatically enable billing permission, the service account key provided will have the necessary permission on the `userProject` (from the request, or the configured "default billing project"), so subsequent requests to Google using these credentials can specify that `userProject` to bill
   * depending on how the creds are used, this may involve adding query params or arguments to Google SDKs/services to provide the `userProject`

Example using Google's Cloud Storage command-line tool `gsutil`:

```bash
# activate the temporary service account credentials received from Fence
# this assumes the creds are saved in a file named `creds.json`
gcloud auth activate-service-account --key-file ./creds.json

# copy a file from the requester pays bucket locally
gsutil -u google-project-to-bill cp gs://some-requester-pays-bucket/file.txt .
```

In the above script, `google-project-to-bill` is either the `userProject` provided in the request to Fence, or Fence's "default billing project". `some-requester-pays-bucket` is a Google Storage Bucket with requester pays enabled.

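The same download can be sketched in Python with the `google-cloud-storage` client library, which accepts the billing project through the `user_project` argument of `Client.bucket()`. This assumes the package is installed and the credentials from Fence are saved as `creds.json`; the helper names are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_gs_url(gs_url: str) -> Tuple[str, str]:
    """Split "gs://bucket/path/to/object" into (bucket, object_name)."""
    parsed = urlparse(gs_url)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URL: {gs_url}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_requester_pays(
    gs_url: str, creds_file: str, user_project: str, destination: str
) -> None:
    # Imported here so parse_gs_url works without the package installed
    from google.cloud import storage

    bucket_name, object_name = parse_gs_url(gs_url)
    client = storage.Client.from_service_account_json(creds_file)
    # user_project tells Google which project to bill for this access
    bucket = client.bucket(bucket_name, user_project=user_project)
    bucket.blob(object_name).download_to_filename(destination)
```

Usage mirroring the `gsutil` command: `download_requester_pays("gs://some-requester-pays-bucket/file.txt", "./creds.json", "google-project-to-bill", "file.txt")`.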
## Data Access Methods

### Signed URLs
@@ -179,7 +254,7 @@ Projects are always validated against the following checks:
* Google Project has valid service accounts
    * Key: `service_accounts`
    * Checks if the Service Account members on the project pass the Service Account validity checks detailed below.

Service Accounts on the project, as well as the Service Account being registered, are validated against some combination of the following checks (which checks occur ultimately depends on the type of Service Account and whether or not it is currently being registered).

* Service Account is owned by Google Project identified in the request
58 changes: 49 additions & 9 deletions fence/blueprints/data/indexd.py
@@ -4,6 +4,7 @@
 
 from cached_property import cached_property
 import cirrus
+from cirrus import GoogleCloudManager
 from cdislogging import get_logger
 from cdispyutils.config import get_value
 from cdispyutils.hmac4 import generate_aws_presigned_url
@@ -32,6 +33,7 @@
     create_primary_service_account_key,
     get_or_create_proxy_group_id,
     get_google_app_creds,
+    give_service_account_billing_access_if_necessary,
 )
 from fence.utils import get_valid_expiration_from_request
 from . import multipart_upload
@@ -52,6 +54,7 @@
 
 def get_signed_url_for_file(action, file_id):
     requested_protocol = flask.request.args.get("protocol", None)
+    r_pays_project = flask.request.args.get("userProject", None)
 
     # default to signing the url even if it's a public object
     # this will work so long as we're provided a user token
@@ -66,7 +69,11 @@ def get_signed_url_for_file(action, file_id):
         expires_in = min(requested_expires_in, expires_in)
 
     signed_url = indexed_file.get_signed_url(
-        requested_protocol, action, expires_in, force_signed_url=force_signed_url
+        requested_protocol,
+        action,
+        expires_in,
+        force_signed_url=force_signed_url,
+        r_pays_project=r_pays_project,
     )
     return {"url": signed_url}
 
@@ -295,7 +302,9 @@ def indexed_file_locations(self):
         urls = self.index_document.get("urls", [])
         return list(map(IndexedFileLocation.from_url, urls))
 
-    def get_signed_url(self, protocol, action, expires_in, force_signed_url=True):
+    def get_signed_url(
+        self, protocol, action, expires_in, force_signed_url=True, r_pays_project=None
+    ):
         if self.public and action == "upload":
             raise Unauthorized("Cannot upload on public files")
         # don't check the authorization if the file is public
@@ -306,9 +315,13 @@ def get_signed_url(self, protocol, action, expires_in, force_signed_url=True):
             )
         if action is not None and action not in SUPPORTED_ACTIONS:
             raise NotSupported("action {} is not supported".format(action))
-        return self._get_signed_url(protocol, action, expires_in, force_signed_url)
+        return self._get_signed_url(
+            protocol, action, expires_in, force_signed_url, r_pays_project
+        )
 
-    def _get_signed_url(self, protocol, action, expires_in, force_signed_url):
+    def _get_signed_url(
+        self, protocol, action, expires_in, force_signed_url, r_pays_project
+    ):
         if not protocol:
             # no protocol specified, return first location as signed url
             try:
@@ -317,6 +330,7 @@ def _get_signed_url(self, protocol, action, expires_in, force_signed_url):
                     expires_in,
                     public_data=self.public,
                     force_signed_url=force_signed_url,
+                    r_pays_project=r_pays_project,
                 )
             except IndexError:
                 raise NotFound("Can't find any file locations.")
@@ -331,6 +345,7 @@ def _get_signed_url(self, protocol, action, expires_in, force_signed_url):
                     expires_in,
                     public_data=self.public,
                     force_signed_url=force_signed_url,
+                    r_pays_project=r_pays_project,
                 )
 
         raise NotFound(
@@ -469,7 +484,7 @@ def from_url(url):
         return IndexedFileLocation(url)
 
     def get_signed_url(
-        self, action, expires_in, public_data=False, force_signed_url=True
+        self, action, expires_in, public_data=False, force_signed_url=True, **kwargs
     ):
         return self.url
 
@@ -579,7 +594,7 @@ def get_bucket_region(self):
             return bucket_cred["region"]
 
     def get_signed_url(
-        self, action, expires_in, public_data=False, force_signed_url=True
+        self, action, expires_in, public_data=False, force_signed_url=True, **kwargs
     ):
         aws_creds = get_value(
             config, "AWS_CREDENTIALS", InternalError("credentials not configured")
@@ -714,7 +729,12 @@ class GoogleStorageIndexedFileLocation(IndexedFileLocation):
     """
 
     def get_signed_url(
-        self, action, expires_in, public_data=False, force_signed_url=True
+        self,
+        action,
+        expires_in,
+        public_data=False,
+        force_signed_url=True,
+        r_pays_project=None,
     ):
         resource_path = (
             self.parsed_url.netloc.strip("/") + "/" + self.parsed_url.path.strip("/")
@@ -737,12 +757,13 @@ def get_signed_url(
             expiration_time,
             user_info.get("user_id"),
             user_info.get("username"),
+            r_pays_project=r_pays_project,
         )
 
         return url
 
     def _generate_anonymous_google_storage_signed_url(
-        self, http_verb, resource_path, expiration_time
+        self, http_verb, resource_path, expiration_time, r_pays_project=None
     ):
         # we will use the main fence SA service account to sign anonymous requests
         private_key = get_google_app_creds()
@@ -754,11 +775,18 @@ def _generate_anonymous_google_storage_signed_url(
             content_type="",
             md5_value="",
             service_account_creds=private_key,
+            requester_pays_user_project=r_pays_project,
         )
         return final_url
 
     def _generate_google_storage_signed_url(
-        self, http_verb, resource_path, expiration_time, user_id, username
+        self,
+        http_verb,
+        resource_path,
+        expiration_time,
+        user_id,
+        username,
+        r_pays_project=None,
     ):
         proxy_group_id = get_or_create_proxy_group_id()
 
@@ -781,6 +809,17 @@ def _generate_google_storage_signed_url(
             user_id=user_id, username=username, proxy_group_id=proxy_group_id
         )
 
+        if config["ENABLE_AUTOMATIC_BILLING_PERMISSION_SIGNED_URLS"]:
+            give_service_account_billing_access_if_necessary(
+                private_key,
+                r_pays_project,
+                default_billing_project=config["BILLING_PROJECT_FOR_SIGNED_URLS"],
+            )
+
+        # use configured project if it exists and no user project was given
+        if config["BILLING_PROJECT_FOR_SIGNED_URLS"] and not r_pays_project:
+            r_pays_project = config["BILLING_PROJECT_FOR_SIGNED_URLS"]
+
         final_url = cirrus.google_cloud.utils.get_signed_url(
             resource_path,
             http_verb,
@@ -789,6 +828,7 @@ def _generate_google_storage_signed_url(
             content_type="",
             md5_value="",
             service_account_creds=private_key,
+            requester_pays_user_project=r_pays_project,
         )
         return final_url
 
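In `_generate_google_storage_signed_url`, the signed URL changes above boil down to two steps: pick the billing project (the request's `userProject`, falling back to `BILLING_PROJECT_FOR_SIGNED_URLS`) and carry it on the signed URL as a `userProject` query parameter. A standalone sketch of that logic follows; the helper names are hypothetical, and in Fence the URL itself is built inside cirrus's `get_signed_url` via `requester_pays_user_project`, whose internals are not shown in this diff.

```python
from typing import Optional
from urllib.parse import urlencode, urlparse


def resolve_billing_project(
    requested: Optional[str], default: Optional[str]
) -> Optional[str]:
    """Prefer the request's userProject; otherwise use the configured default.

    Mirrors the "use configured project if it exists and no user project
    was given" check above.
    """
    return requested or default


def add_user_project(signed_url: str, project: Optional[str]) -> str:
    """Append userProject=<project> to an already-built URL (hypothetical)."""
    if not project:
        return signed_url
    separator = "&" if urlparse(signed_url).query else "?"
    return signed_url + separator + urlencode({"userProject": project})


if __name__ == "__main__":
    project = resolve_billing_project(None, "default-billing-project")
    print(add_user_project("https://storage.googleapis.com/bucket/file.txt?Expires=1", project))
```

Appending rather than re-encoding the whole query string leaves any existing signature parameters byte-for-byte unchanged.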
21 changes: 19 additions & 2 deletions fence/blueprints/google.py
@@ -13,6 +13,7 @@
 
 from fence.auth import current_token, require_auth_header
 from fence.restful import RestfulApi
+from fence.config import config
 from fence.errors import UserError, NotFound, Unauthorized, Forbidden
 from fence.resources.google.validity import GoogleProjectValidity
 from fence.resources.google.access_utils import (
@@ -60,6 +61,10 @@ def make_google_blueprint():
         GoogleServiceAccountRoot, "/service_accounts", strict_slashes=False
     )
 
+    blueprint_api.add_resource(
+        GoogleBillingAccount, "/billing_projects", strict_slashes=False
+    )
+
     blueprint_api.add_resource(
         GoogleServiceAccountDryRun,
         "/service_accounts/_dry_run/<id_>",
@@ -100,6 +105,19 @@ def __init__(self, email, project_access, google_project_id, user_id=None):
         self.user_id = user_id
 
 
+class GoogleBillingAccount(Resource):
+    def get(self):
+        """
+        Get the configured default Google billing projects if it exists.
+        """
+        return {
+            "signed_urls": {"project_id": config["BILLING_PROJECT_FOR_SIGNED_URLS"]},
+            "temporary_service_account_credentials": {
+                "project_id": config["BILLING_PROJECT_FOR_SA_CREDS"]
+            },
+        }
+
+
 class GoogleServiceAccountRoot(Resource):
     @require_auth_header({"google_service_account"})
     def post(self):
@@ -455,7 +473,6 @@ def _update_service_account_permissions(self, sa):
             patch_user_service_account(
                 sa.google_project_id, sa.email, sa.project_access
             )
-
         except CirrusNotFound as exc:
             return (
                 "Can not update the service accout {}. Detail {}".format(sa.email, exc),
@@ -467,7 +484,7 @@ def _update_service_account_permissions(self, sa):
                 400,
             )
         except Exception:
-            return (" Can not delete the service account {}".format(sa.email), 500)
+            return ("Can not update the service account {}".format(sa.email), 500)
 
         return ("Successfully update service account {}".format(sa.email), 200)
 
11 changes: 11 additions & 0 deletions fence/blueprints/storage_creds/google.py
@@ -9,6 +9,7 @@
 from cirrus import GoogleCloudManager
 from cirrus.config import config as cirrus_config
 
+from fence.config import config
 from fence.auth import require_auth_header
 from fence.auth import current_token
 from fence.errors import UserError
@@ -19,6 +20,7 @@
     get_service_account,
     get_or_create_service_account,
     get_or_create_proxy_group_id,
+    give_service_account_billing_access_if_necessary,
 )
 from fence.utils import get_valid_expiration_from_request
 
@@ -149,10 +151,19 @@ def post(self):
         proxy_group_id = get_or_create_proxy_group_id()
         username = current_token.get("context", {}).get("user", {}).get("name")
 
+        r_pays_project = flask.request.args.get("userProject", None)
+
         key, service_account = create_google_access_key(
             client_id, user_id, username, proxy_group_id
         )
 
+        if config["ENABLE_AUTOMATIC_BILLING_PERMISSION_SA_CREDS"]:
+            give_service_account_billing_access_if_necessary(
+                key,
+                r_pays_project,
+                default_billing_project=config["BILLING_PROJECT_FOR_SA_CREDS"],
+            )
+
         if client_id is None:
             self.handle_user_service_account_creds(key, service_account)
 
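When `ENABLE_AUTOMATIC_BILLING_PERMISSION_SA_CREDS` (or its signed URL counterpart) is on, `give_service_account_billing_access_if_necessary` grants billing permission to the service account. Per the docs in this commit, that means a custom role containing only `serviceusage.services.use`, bound to the service account in the billing project. The sketch below shows the shape of that IAM change as plain dictionaries in the JSON format used by Google's `getIamPolicy`/`setIamPolicy`; it is not Fence's implementation (which is not part of this diff), and the role ID used is hypothetical.

```python
import copy
from typing import Any, Dict, List

# The single permission Google requires on the project being billed
BILLING_PERMISSION = "serviceusage.services.use"


def billing_role_definition(title: str) -> Dict[str, Any]:
    """Body for a project-level custom role that can only bill the project."""
    return {
        "title": title,
        "includedPermissions": [BILLING_PERMISSION],
        "stage": "GA",
    }


def custom_role_name(project_id: str, role_id: str) -> str:
    # Project-level custom roles are addressed as projects/<project>/roles/<id>
    return f"projects/{project_id}/roles/{role_id}"


def with_billing_binding(
    policy: Dict[str, Any], role: str, sa_email: str
) -> Dict[str, Any]:
    """Return a copy of an IAM policy that also grants `role` to the SA."""
    new_policy = copy.deepcopy(policy)
    member = f"serviceAccount:{sa_email}"
    bindings: List[Dict[str, Any]] = new_policy.setdefault("bindings", [])
    for binding in bindings:
        if binding.get("role") == role:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return new_policy
    bindings.append({"role": role, "members": [member]})
    return new_policy
```

The updated policy (which keeps the original `etag`, since the sketch copies the whole policy) would then be written back with `setIamPolicy`; the `Project IAM Admin` and `Role Administrator` roles on the Fence admin service account are what allow the policy update and the custom role creation.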