diff --git a/.gitignore b/.gitignore index 62652182a..8d141b634 100644 --- a/.gitignore +++ b/.gitignore @@ -15,8 +15,7 @@ dist/ downloads/ eggs/ .eggs/ -# need to comment out otherwise the swagger js libs don't get copied in the gh-pages deploy -#lib/ +lib/ lib64/ parts/ sdist/ diff --git a/README.md b/README.md index 3773c7d6a..2edbb1b0c 100644 --- a/README.md +++ b/README.md @@ -2,12 +2,14 @@ # Data Repository Service (DRS) API -`master` branch status: [![Build Status](https://travis-ci.org/ga4gh/data-repository-service-schemas.svg?branch=master)](https://travis-ci.org/ga4gh/data-repository-service-schemas?branch=master) -Swagger Validator +`develop` branch status: [![Build Status](https://travis-ci.org/ga4gh/data-repository-service-schemas.svg?branch=develop)](https://travis-ci.org/ga4gh/data-repository-service-schemas?branch=develop) +Swagger Validator [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1405753.svg)](https://doi.org/10.5281/zenodo.1405753) + +![PyPI - Python Version](https://img.shields.io/pypi/pyversions/ga4gh-drs-schemas.svg)--> + + The [Global Alliance for Genomics and Health](http://genomicsandhealth.org/) (GA4GH) is an international coalition, formed to enable the sharing of genomic and clinical data. # About the GA4GH Cloud Work Stream @@ -19,7 +21,7 @@ We work with platform development partners and industry leaders to develop stand # What is DRS? -The Data Repository Service (DRS) API provides a generic interface to data repositories so data consumers, including workflow systems, can access data in a single, standard way regardless of where it’s stored and how it’s managed. +The Data Repository Service (DRS) API provides a generic interface to data repositories so data consumers, including workflow systems, can access data in a single, standardized way regardless of where it’s stored or how it’s managed. The primary functionality of DRS is to map a logical ID to a means for physically retrieving the data represented by the ID. 
For more information see our HTML documentation links in the table below. @@ -30,8 +32,9 @@ For more information see our HTML documentation links in the table below. | --- | --- | --- | | **master**: the current release | [HTML](https://ga4gh.github.io/data-repository-service-schemas/docs/) | [Swagger](https://ga4gh.github.io/data-repository-service-schemas/swagger-ui/#/DataRepositoryService/) | | **develop**: the stable development branch, into which feature branches are merged | [HTML](https://ga4gh.github.io/data-repository-service-schemas/preview/develop/docs/) | [Swagger](https://ga4gh.github.io/data-repository-service-schemas/preview/develop/swagger-ui/#/DataRepositoryService/) | -| **release 0.0.1**: the initial DRS after the rename from DOS | [HTML](https://ga4gh.github.io/data-repository-service-schemas/preview/release/0.0.1/docs/) | [Swagger](https://ga4gh.github.io/data-repository-service-schemas/preview/release/0.0.1/swagger-ui/#/DataRepositoryService/) | +| **release 1.0.0**: the 1.0.0 release of DRS that we are submitting to GA4GH for standards approval | [HTML](https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-1.0.0/docs/) | [Swagger](https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-1.0.0/swagger-ui/#/DataRepositoryService/) | | **release 0.1**: simplifying DRS to core functionality | [HTML](https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-0.1.0/docs/) | [Swagger](https://ga4gh.github.io/data-repository-service-schemas/preview/release/drs-0.1.0/swagger-ui/#/DataRepositoryService/) | +| **release 0.0.1**: the initial DRS after the rename from DOS | [HTML](https://ga4gh.github.io/data-repository-service-schemas/preview/release/0.0.1/docs/) | [Swagger](https://ga4gh.github.io/data-repository-service-schemas/preview/release/0.0.1/swagger-ui/#/DataRepositoryService/) | To monitor development work on various branches, add 'preview/\<branch\>' to the master URLs above (e.g., 
'https://ga4gh.github.io/data-repository-service-schemas/preview/\<branch\>/docs'). diff --git a/docs/asciidoc/front_matter.adoc b/docs/asciidoc/front_matter.adoc index d4d0739b9..5c4dd2101 100644 --- a/docs/asciidoc/front_matter.adoc +++ b/docs/asciidoc/front_matter.adoc @@ -10,26 +10,29 @@ The primary functionality of DRS is to map a logical ID to a means for physicall Each implementation of DRS can choose its own id scheme, as long as it follows these guidelines: -* DRS IDs are URL-safe text strings made up of alphanumeric characters and any of [.-_/] +* DRS IDs are strings made up of uppercase and lowercase letters, decimal digits, hyphen, period, underscore and tilde [A-Za-z0-9.-_~]. See https://tools.ietf.org/html/rfc3986#section-2.3[RFC 3986 § 2.3]. +* Note to server implementors: internal IDs can contain other characters, but they MUST be encoded into valid DRS IDs whenever exposed by the API. * One DRS ID MUST always return the same object data (or, in the case of a collection, the same set of objects). This constraint aids with reproducibility. * DRS v1 does NOT support semantics around multiple versions of an object. (For example, there’s no notion of “get latest version” or “list all versions”.) Individual implementation MAY choose an ID scheme that includes version hints. * DRS implementations MAY have more than one ID that maps to the same object. +=== DRS URIs + +For convenience, including when passing content references to a WES server, we define a URI syntax for DRS-accessible content. Strings of the form `drs://<server>/<id>` mean _“you can fetch the content with DRS id `<id>` from the DRS server at `<server>`”_. + +For example, if a WES server was asked to process `drs://drs.example.org/314159`, it would know that it could issue a GET request to `https://drs.example.org/ga4gh/drs/v1/objects/314159` to learn how to fetch that object. 
+ === DRS Datatypes DRS v1 supports two types of content: -* an `Object` is like a file -- it's a single blob of bytes -* a `Bundle` is like a folder -- it's a collection of other DRS content (either objects or bundles) +* a _blob_ is like a file -- it's a single blob of bytes, represented by a `DrsObject` without a `contents` array +* a _bundle_ is like a folder -- it's a collection of other DRS content (either blobs or bundles), represented by a `DrsObject` with a `contents` array === Read-only DRS v1 is a read-only API. We expect that each implementation will define its own mechanisms and interfaces (graphical and/or programmatic) for adding and updating data. -=== URI convention - -For convenience, including when passing content references to a WES server, we intend to define a recommended URI syntax. The syntax will probably use URI strings beginning with `drs://` -- details are being discussed in https://github.com/ga4gh/data-repository-service-schemas/issues/252[DRS#252]. - === Standards The DRS API specification is written in OpenAPI and embodies a RESTful service philosophy. It uses JSON in requests and responses and standard HTTPS for information transport. @@ -38,7 +41,7 @@ The DRS API specification is written in OpenAPI and embodies a RESTful service p === Making DRS Requests -The DRS implementation is responsible for defining and enforcing an authorization policy that determines which users are allowed to make which requests. GA4GH recommends that DRS implementations use an OAuth 2.0 https://oauth.net/2/bearer-tokens/[bearer token], although they can choose other mechanisms if appropriate. The `service-info` endpoint should provide sufficient information for a user to figure out how to authenticate with a DRS implementation. +The DRS implementation is responsible for defining and enforcing an authorization policy that determines which users are allowed to make which requests. 
GA4GH recommends that DRS implementations use an OAuth 2.0 https://oauth.net/2/bearer-tokens/[bearer token], although they can choose other mechanisms if appropriate. === Fetching DRS Objects diff --git a/docs/figure1.png b/docs/figure1.png new file mode 100644 index 000000000..0f03c1041 Binary files /dev/null and b/docs/figure1.png differ diff --git a/docs/figure2.png b/docs/figure2.png new file mode 100644 index 000000000..1b82112e2 Binary files /dev/null and b/docs/figure2.png differ diff --git a/docs/figure3.png b/docs/figure3.png new file mode 100644 index 000000000..062f0b74b Binary files /dev/null and b/docs/figure3.png differ diff --git a/openapi/data_repository_service.swagger.yaml b/openapi/data_repository_service.swagger.yaml index be7fbf1d2..ad646e57c 100644 --- a/openapi/data_repository_service.swagger.yaml +++ b/openapi/data_repository_service.swagger.yaml @@ -2,7 +2,7 @@ swagger: '2.0' basePath: '/ga4gh/drs/v1' info: title: Data Repository Service - version: 0.1.0 + version: 1.0.0 description: 'https://github.com/ga4gh/data-repository-service-schemas' termsOfService: 'https://www.ga4gh.org/terms-and-conditions/' contact: @@ -21,70 +21,32 @@ security: - {} - authToken: [] paths: - '/service-info': - get: - summary: Get information about this implementation. - description: >- - May return service version and other information. - operationId: GetServiceInfo - responses: - '200': - description: Service information returned successfully - schema: - $ref: '#/definitions/ServiceInfo' - tags: - - DataRepositoryService - x-swagger-router-controller: ga4gh.drs.server - '/bundles/{bundle_id}': - get: - summary: Get info about a Data Bundle. - description: >- - Returns bundle metadata, and a list of ids that can be used to fetch bundle contents. - operationId: GetBundle - responses: - '200': - description: The Data Bundle was found successfully. - schema: - $ref: '#/definitions/Bundle' - '400': - description: The request is malformed. 
- schema: - $ref: '#/definitions/Error' - '401': - description: The request is unauthorized. - schema: - $ref: '#/definitions/Error' - '403': - description: The requester is not authorized to perform this action. - schema: - $ref: '#/definitions/Error' - '404': - description: The requested Data Bundle wasn't found. - schema: - $ref: '#/definitions/Error' - '500': - description: An unexpected error occurred. - schema: - $ref: '#/definitions/Error' - parameters: - - name: bundle_id - in: path - required: true - type: string - tags: - - DataRepositoryService - x-swagger-router-controller: ga4gh.drs.server '/objects/{object_id}': get: - summary: Get info about a Data Object. + summary: Get info about a `DrsObject`. description: >- Returns object metadata, and a list of access methods that can be used to fetch object bytes. operationId: GetObject responses: '200': - description: The Data Object was found successfully. + description: The `DrsObject` was found successfully. schema: - $ref: '#/definitions/Object' + $ref: '#/definitions/DrsObject' + '202': + description: > + The operation is delayed and will continue asynchronously. + The client should retry this same request after the delay specified by Retry-After header. + headers: + Retry-After: + description: > + Delay in seconds. The client should retry this same request after waiting for this duration. + To simplify client response processing, this must be an integral relative time in seconds. + This value SHOULD represent the minimum duration the client should wait before attempting + the operation again with a reasonable expectation of success. When it is not feasible + for the server to determine the actual expected delay, the server may return a + brief, fixed value instead. + type: integer + format: int64 '400': description: The request is malformed. 
schema: @@ -98,10 +60,9 @@ paths: schema: $ref: '#/definitions/Error' '404': - description: The requested Data Object wasn't found + description: The requested `DrsObject` wasn't found schema: $ref: '#/definitions/Error' - '500': description: An unexpected error occurred. schema: @@ -111,6 +72,22 @@ paths: in: path required: true type: string + - in: query + name: expand + type: boolean + default: false + description: >- + If false and the object_id refers to a bundle, then the ContentsObject array + contains only those objects directly contained in the bundle. That is, if the + bundle contains other bundles, those other bundles are not recursively + included in the result. + + If true and the object_id refers to a bundle, then the entire set of objects + in the bundle is expanded. That is, if the bundle contains other bundles, + then those other bundles are recursively expanded and included in the result. + Recursion continues through the entire sub-tree of the bundle. + + If the object_id refers to a blob, then the query parameter is ignored. tags: - DataRepositoryService x-swagger-router-controller: ga4gh.drs.server @@ -118,7 +95,7 @@ paths: get: summary: Get a URL for fetching bytes. description: >- - Returns a URL that can be used to fetch the object bytes. + Returns a URL that can be used to fetch the bytes of a `DrsObject`. This method only needs to be called when using an `AccessMethod` that contains an `access_id` @@ -129,6 +106,21 @@ paths: description: The access URL was found successfully. schema: $ref: '#/definitions/AccessURL' + '202': + description: > + The operation is delayed and will continue asynchronously. + The client should retry this same request after the delay specified by Retry-After header. + headers: + Retry-After: + description: > + Delay in seconds. The client should retry this same request after waiting for this duration. + To simplify client response processing, this must be an integral relative time in seconds. 
+ This value SHOULD represent the minimum duration the client should wait before attempting + the operation again with a reasonable expectation of success. When it is not feasible + for the server to determine the actual expected delay, the server may return a + brief, fixed value instead. + type: integer + format: int64 '400': description: The request is malformed. schema: @@ -154,12 +146,12 @@ paths: in: path required: true type: string - description: An `id` of a Data Object + description: An `id` of a `DrsObject` - name: access_id in: path required: true type: string - description: An `access_id` from the `access_methods` list of a Data Object + description: An `access_id` from the `access_methods` list of a `DrsObject` tags: - DataRepositoryService x-swagger-router-controller: ga4gh.drs.server @@ -175,160 +167,140 @@ securityDefinitions: definitions: Checksum: type: object - required: - - checksum + required: ['checksum', 'type'] properties: checksum: type: string description: 'The hex-string encoded checksum for the data' type: type: string - description: |- - The digest method used to create the checksum. If left unspecified md5 - will be assumed. - - possible values: - md5 # most blob stores provide a checksum using this - etag # multipart uploads to blob stores - sha256 - sha512 - Bundle: - type: object - required: ['id', 'size', 'created', 'checksums', 'contents'] - properties: - id: - type: string - description: >- - An identifier unique to this Data Bundle. - name: - type: string - description: >- - A string that can be used to name a Data Bundle. - size: - type: string - format: int64 - description: >- - The cumulative size, in bytes, of all Data Objects and Bundles listed in - the `contents` field. - created: - type: string - format: date-time - description: >- - Timestamp of Bundle creation in RFC3339. 
- updated: - type: string - format: date-time description: >- - Timestamp of Bundle update in RFC3339, identical to create timestamp in - systems that do not support updates. - version: - type: string - description: >- - A string representing a version. (Some systems may use checksum, a - RFC3339 timestamp, or an incrementing version number.) - checksums: - type: array - description: |- - The checksum of the Data Bundle. At least one checksum must be provided. + The digest method used to create the checksum. - The Data Bundle checksum is computed over a sorted concatenation of all - the checksums (names not included) within the top-level 'contents' of the - Bundle (not recursive). The list of Data Object or Bundle checksums are - sorted alphabetically (hex-code) before concatenation and a further checksum - is performed on the concatenated checksum value. - Example below: - Data Ojects: - md5(DO1) = 72794b6d30bc86d92e40a1aa65c880b8 - md5(DO2) = 5e089d29a18954e68a78ee6a3c6edabd - Data Bundle: - DB1 = md5( concat( sort( md5(DO1), md5(DO2) ) ) ) - = md5( concat( sort( 72794b6d30bc86d92e40a1aa65c880b8, 5e089d29a18954e68a78ee6a3c6edabd ) ) ) - = md5( concat( 5e089d29a18954e68a78ee6a3c6edabd, 72794b6d30bc86d92e40a1aa65c880b8 ) ) - = md5( 5e089d29a18954e68a78ee6a3c6edabd72794b6d30bc86d92e40a1aa65c880b8 ) - = f7a29a0422e7d870b10839ad6c985079 - items: - $ref: '#/definitions/Checksum' - description: - type: string - description: A human readable description of the Data Bundle. - aliases: - type: array - description: >- - A list of strings that can be used to find other metadata - about this Data Bundle from external metadata sources. These - aliases can be used to represent the Data Bundle's secondary - accession numbers or external GUIDs. - items: - type: string - contents: - type: array - description: >- - The list of Data Objects and Data Bundles contained by this Data Bundle. - items: - $ref: '#/definitions/BundleObject' - Object: + + The value (e.g. 
`sha-256`) SHOULD be listed as `Hash Name String` in the https://www.iana.org/assignments/named-information/named-information.xhtml#hash-alg[IANA Named Information Hash Algorithm Registry]. + Other values MAY be used, as long as implementors are aware of the issues discussed in https://tools.ietf.org/html/rfc6920#section-9.4[RFC6920]. + + + GA4GH may provide more explicit guidance for use of non-IANA-registered algorithms in the future. + Until then, if implementors do choose such an algorithm (e.g. because it's implemented by their storage provider), they SHOULD use an existing + standard `type` value such as `md5`, `etag`, `crc32c`, `trunc512`, or `sha1`. + example: + sha-256 + DrsObject: type: object - required: ['id', 'size', 'created', 'checksums', 'access_methods'] + required: ['id', 'self_uri', 'size', 'created_time', 'checksums'] properties: id: type: string description: |- - An identifier unique to this Data Object. + An identifier unique to this `DrsObject`. name: type: string description: |- - A string that can be used to name a Data Object. + A string that can be used to name a `DrsObject`. + This string is made up of uppercase and lowercase letters, decimal digits, hyphen, period, and underscore [A-Za-z0-9.-_]. See http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_282[portable filenames]. + self_uri: + type: string + description: |- + A drs:// URI, as defined in the DRS documentation, that tells clients how to access this object. + The intent of this field is to make DRS objects self-contained, and therefore easier for clients to store and pass around. + example: + drs://drs.example.org/314159 size: type: integer format: int64 description: |- - The object size in bytes. - created: + For blobs, the blob size in bytes. + For bundles, the cumulative size, in bytes, of items in the `contents` field. + created_time: type: string format: date-time description: |- - Timestamp of object creation in RFC3339. 
- updated: + Timestamp of content creation in RFC3339. + (This is the creation time of the underlying content, not of the JSON object.) + updated_time: type: string format: date-time description: >- - Timestamp of Object update in RFC3339, identical to create timestamp in systems + Timestamp of content update in RFC3339, identical to `created_time` in systems that do not support updates. + (This is the update time of the underlying content, not of the JSON object.) version: type: string - description: |- + description: >- A string representing a version. + + (Some systems may use checksum, a RFC3339 timestamp, or an incrementing version number.) mime_type: type: string description: |- - A string providing the mime-type of the Data Object. + A string providing the mime-type of the `DrsObject`. example: application/json checksums: type: array + minItems: 1 items: $ref: '#/definitions/Checksum' - description: |- - The checksum of the Data Object. At least one checksum must be provided. + description: >- + The checksum of the `DrsObject`. At least one checksum must be provided. + + For blobs, the checksum is computed over the bytes in the blob. + + + For bundles, the checksum is computed over a sorted concatenation of the + checksums of its top-level contained objects (not recursive, names not included). + The list of checksums is sorted alphabetically (hex-code) before concatenation + and a further checksum is performed on the concatenated checksum value. 
+ + + For example, if a bundle contains blobs with the following checksums: + + md5(blob1) = 72794b6d + + md5(blob2) = 5e089d29 + + + Then the checksum of the bundle is: + + md5( concat( sort( md5(blob1), md5(blob2) ) ) ) + + = md5( concat( sort( 72794b6d, 5e089d29 ) ) ) + + = md5( concat( 5e089d29, 72794b6d ) ) + + = md5( 5e089d2972794b6d ) + + = f7a29a04 access_methods: type: array minItems: 1 items: $ref: '#/definitions/AccessMethod' description: |- - The list of access methods that can be used to fetch the Data Object. + The list of access methods that can be used to fetch the `DrsObject`. + Required for single blobs; optional for bundles. + contents: + type: array + description: >- + If not set, this `DrsObject` is a single blob. + + If set, this `DrsObject` is a bundle containing the listed `ContentsObject` s (some of which may be further nested). + items: + $ref: '#/definitions/ContentsObject' description: type: string description: |- - A human readable description of the Data Object. + A human readable description of the `DrsObject`. aliases: type: array items: type: string description: >- A list of strings that can be used to find other metadata - about this Data Object from external metadata sources. These - aliases can be used to represent the Data Object's secondary + about this `DrsObject` from external metadata sources. These + aliases can be used to represent secondary accession numbers or external GUIDs. AccessURL: type: object @@ -373,7 +345,7 @@ definitions: type: string description: >- An arbitrary string to be passed to the `/access` method to get an `AccessURL`. - This string must be unique per object. + This string must be unique within the scope of a single object. Note that at least one of `access_url` and `access_id` must be provided. region: type: string @@ -392,63 +364,43 @@ definitions: status_code: type: integer description: The integer representing the HTTP status code (e.g. 200, 404). 
- ServiceInfo: - type: object - required: - - version - description: >- - Useful information about the running service. - properties: - version: - type: string - description: Service version - title: - type: string - description: Service name - description: - type: string - description: Service description - contact: - type: object - description: Maintainer contact info - license: - type: object - description: License information for the exposed API - BundleObject: + ContentsObject: type: object properties: name: type: string description: >- - A name declared by the Bundle author that must be - used when materialising the associated data object, + A name declared by the bundle author that must be + used when materialising this object, overriding any name directly associated with the object itself. - This string MUST NOT contain any slashes. + The name must be unique within the containing bundle. + This string is made up of uppercase and lowercase letters, decimal digits, hyphen, period, and underscore [A-Za-z0-9.-_]. See http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_282[portable filenames]. id: type: string description: >- - A DRS identifier of a Data Object or a nested Data Bundle. + A DRS identifier of a `DrsObject` (either a single blob or a nested bundle). + If this ContentsObject is an object within a nested bundle, then the id is + optional. Otherwise, the id is required. drs_uri: type: array description: >- A list of full DRS identifier URI paths - that may be used obtain the Data Object or Data Bundle. + that may be used to obtain the object. These URIs may be external to this DRS instance. example: - drs://example.com/ga4gh/drs/v1/objects/{object_id} + drs://drs.example.org/314159 items: type: string - type: - type: string - enum: - - object - - bundle + contents: + type: array description: >- - The type of content being referenced. - BundleObject of type bundle will need to be recursed further. 
+ If this ContentsObject describes a nested bundle and the caller specified + "?expand=true" on the request, then this contents array must be present and + describe the objects within the nested bundle. + items: + $ref: '#/definitions/ContentsObject' + required: - name - - id - - type tags: - name: DataRepositoryService diff --git a/scripts/stagepages.sh b/scripts/stagepages.sh index b5f0a806e..72f495458 100644 --- a/scripts/stagepages.sh +++ b/scripts/stagepages.sh @@ -1,7 +1,5 @@ #!/usr/bin/env bash - -set -e -set -v +set -ev if [ "$TRAVIS_BRANCH" != "gh-pages" ]; then if [ "$TRAVIS_BRANCH" == "master" ]; then @@ -20,6 +18,6 @@ if [ "$TRAVIS_BRANCH" != "gh-pages" ]; then fi # do some cleanup, these cause the gh-pages deploy to break -rm -rf node_modules -rm -rf web_deploy -rm -rf spec +# rm -rf node_modules +# rm -rf web_deploy +# rm -rf spec
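
Reviewer note: two rules introduced by this change are easier to check with a concrete sketch. The Python below is a minimal illustration, not part of the patch and not a reference implementation. `drs_uri_to_url` and `bundle_checksum` are hypothetical helper names. The first maps a `drs://` URI to the `GetObject` URL using the `basePath` from the Swagger file. The second follows the bundle checksum recipe in the `checksums` description, under the assumption that the digest is taken over the ASCII text of the concatenated hex strings; the spec text does not say whether hex text or decoded bytes are hashed.

```python
import hashlib
from urllib.parse import urlsplit

# basePath taken from openapi/data_repository_service.swagger.yaml
DRS_BASE_PATH = "/ga4gh/drs/v1"


def drs_uri_to_url(drs_uri: str) -> str:
    """Map a drs:// URI to the HTTPS URL of its GetObject endpoint (hypothetical helper)."""
    parts = urlsplit(drs_uri)
    if parts.scheme != "drs" or not parts.netloc:
        raise ValueError(f"not a DRS URI: {drs_uri!r}")
    object_id = parts.path.lstrip("/")
    if not object_id:
        raise ValueError(f"DRS URI has no object id: {drs_uri!r}")
    return f"https://{parts.netloc}{DRS_BASE_PATH}/objects/{object_id}"


def bundle_checksum(child_checksums, algorithm="md5"):
    """Checksum of a bundle from the checksums of its top-level contents.

    Sorts the hex strings alphabetically, concatenates them, and hashes the result.
    Assumption: the hash input is the ASCII encoding of the concatenated hex text.
    """
    joined = "".join(sorted(child_checksums))
    return hashlib.new(algorithm, joined.encode("ascii")).hexdigest()


if __name__ == "__main__":
    print(drs_uri_to_url("drs://drs.example.org/314159"))
    # Truncated example values from the checksums description in the spec.
    print(bundle_checksum(["72794b6d", "5e089d29"]))
```

Because the child checksums are sorted before concatenation, the result does not depend on the order of the `contents` array, which appears to be the point of the sort step in the description.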