feat: issue2551068 - Provide way to retrieve file/msg data via rest endpoint.

Use Accept header to change format of /binary_content endpoint.  If
Accept header for endpoint is not application/json, it will be matched
against the mime type for the file. */*, text/* are supported and will
return the native mime type if present.

Changes:

  Move */* mime type out of static dict of supported types. It was
     hardcoded to return json only. Now it can return a matching
     non-json mime type for the /binary_content endpoint.

  Edited some errors to explicitly add */* mime type.

  Cleanups to use ', ' separation in lists of valid mime types rather
    than just space separated.

  Remove ETag header when sending raw content. See issue 2551375 for
     background.

  Doc added to rest.txt.

  Small format fix up (add dash) in CHANGES.txt.

  Make passing an unset/None/False accept_mime_type to
    format_dispatch_output a 500 error. This used to be the fallback
    to produce a 406 error after all processing had happened. It
    should no longer be possible to take that code path as all 406
    errors (with valid accept_mime_types) are generated before
    processing takes place.

  Make format_dispatch_output handle output other than json/xml so it
       can send back binary_content data.

  Removed a spurious client.response_code = 400 that seems to not be
    used.

  Tests added for all code paths.

  Database setup for tests now creates a msg and a file entry. This required a file
    upload test to change so it doesn't look for file1 as the link
    returned by the upload. Download the link and verify the data
    rather than verifying the link.

  Multiple formatting changes to error messages to make all lists of
    valid mime types ', ' and not just space separated.
rouilj committed Dec 8, 2024
1 parent a44e4c8 commit ba0e68a
Showing 4 changed files with 492 additions and 20 deletions.
7 changes: 6 additions & 1 deletion CHANGES.txt
@@ -47,7 +47,7 @@ Features:
- issue2551315 - Document use of
RestfulInstance.max_response_row_size to limit data returned
from rest request.
- issue2551330 Add an optional 'filter' function to the Permission
- issue2551330 - Add an optional 'filter' function to the Permission
objects and the addPermission method. This is used to optimize search
performance by not checking items returned from a database query
one-by-one (using the check function) but instead offload the
@@ -60,6 +60,11 @@ Features:
address. This logs the actual client address when
roundup-server is run behind a reverse proxy. It also appends a
+ sign to the logged address/name. (John Rouillard)
- issue2551068 - Provide way to retrieve file/msg data via rest
endpoint. Raw file/msg data can be retrieved using the
/binary_content attribute and an Accept header to select the mime
type for the data (e.g. image/png for a png file). The existing html
interface method still works and is supported, but is legacy.

2024-07-13 2.4.0

84 changes: 83 additions & 1 deletion doc/rest.txt
@@ -368,7 +368,7 @@ extension ``.json`` or ``.xml`` to the path component of the url. This
will force json or xml (if supported) output. If you use an extension
it takes priority over any accept headers. Note the extension does not
work for the ``/rest`` or ``/rest/data`` paths. In these cases it
returs a 404 error. Adding the header ``Accept: application/xml``
returns a 404 error. Adding the header ``Accept: application/xml``
allows these paths to return xml data.

The rest interface returns status 406 if you use an unrecognized
@@ -378,6 +378,8 @@ the accept header are available or if the accept header is invalid.
Note: ``dicttoxml2.py`` is an updated version of ``dicttoxml.py``. If
you are still using Python 2.7 or 3.6, you can use ``dicttoxml.py``.

Also the ``/binary_content`` attribute endpoint can be used to
retrieve raw file data in many formats.

General Guidelines
------------------
@@ -906,6 +908,86 @@ You can retreive a message with a url like
}
}

With Roundup 2.5 you can retrieve the data directly from the rest
interface using the ``Accept`` header value to select a structured (json
or optional xml) representation (as above) or a stream with just the
content data.

Using the wildcard type ``*/*`` in the ``Accept`` header with the url
``.../binary_content`` will return the raw data and the recorded mime
type of the data as the ``Content-Type``. Using ``*/*`` with
another end point will return ``json`` data. An ``Accept`` value of
``application/octet-stream`` matches any mime type and retrieves the
raw data as ``Content-Type: application/octet-stream``.

To access the contents of a PNG image file (in file23), you use the
following link:
``https://.../demo/rest/data/file/23/binary_content``. To find out the
mime type, you can check this URL:
``https://.../demo/rest/data/file/23/type``.
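
As a rough sketch, a request for the raw content could be prepared with
Python's standard library. ``BASE_URL`` and the helper name
``binary_content_request`` are hypothetical and not part of Roundup, and
authentication is left out. The request is only built here, not sent.

```python
from urllib.request import Request

# Hypothetical tracker base URL; substitute your own tracker's address.
BASE_URL = "https://tracker.example.com/demo"


def binary_content_request(classname, item_id, accept="*/*"):
    """Build (but do not send) a GET request for an item's raw content."""
    url = "%s/rest/data/%s/%s/binary_content" % (
        BASE_URL, classname, item_id)
    # The Accept header selects the representation the server returns.
    return Request(url, headers={"Accept": accept})


req = binary_content_request("file", "23", accept="image/png")
print(req.full_url)
print(req.get_header("Accept"))
```

Passing the request to ``urllib.request.urlopen`` would send it. A real
tracker will usually also require credentials.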

By setting the header to ``Accept: application/octet-stream; q=1.0,
application/json; q=0.5``, you will receive the binary PNG file with
the header ``Content-Type: application/octet-stream``. If you switch
the ``q`` values, you will receive the encoded JSON version::

{
"data": {
"id": "23",
"type": "<class 'bytes'>",
"link": "https://.../demo/rest/data/file/23/binary_content",
"data": "b'\\x89PNG\\r\\n\\x1a\\n\\x00[...]0\\x00\\x00\\x00IEND\\xaeB`\\x82'",
"@etag": "\"db6adc1b09d95b0388d79c7905bc7982\""
}
}

with ``Content-Type: application/json`` and a (4x larger) json encoded
representation of the binary data.
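
If a client does receive the json form, the bytes can still be
recovered. The sketch below assumes, based on the example output above,
that the inner ``data`` string is the Python literal form of a bytes
object. The helper name ``decode_binary_field`` and the shortened
response body are illustrative only.

```python
import ast
import json

# Shortened stand-in for a response body; a real response carries the
# whole file in the inner "data" string.
body = r'''{"data": {"id": "23", "type": "<class 'bytes'>",
  "data": "b'\\x89PNG\\r\\n\\x1a\\n'"}}'''


def decode_binary_field(response_text):
    """Return the bytes encoded in a json binary_content response."""
    payload = json.loads(response_text)["data"]
    if payload["type"] != "<class 'bytes'>":
        raise ValueError("unexpected type: %r" % payload["type"])
    # literal_eval parses the bytes literal without running any code.
    return ast.literal_eval(payload["data"])


raw = decode_binary_field(body)
print(len(raw))
```

Requesting the raw stream avoids both the size overhead and this extra
decoding step.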

If you want it returned with a ``Content-Type: image/png`` header,
you can use ``image/png`` or ``*/*`` in the Accept header.

For message files, you can use
``https://.../demo/rest/data/msg/23/binary_content`` with ``Accept:
application/octet-stream; q=0.5, application/json; q=0.4, image/png;
q=0.495, text/*``. It will return the plain text of the message.
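
The plain text is returned because an entry without a ``q`` parameter
has the default quality of 1.0, so ``text/*`` outranks the other
entries. The simplified ranking below illustrates this. It is not
Roundup's Accept header parser, and ``rank_accept`` is a hypothetical
name.

```python
def rank_accept(header):
    """Return (mime_type, q) pairs, highest q first.

    Simplified: only the q parameter is read; a missing q means 1.0.
    """
    ranked = []
    for item in header.split(","):
        fields = [field.strip() for field in item.split(";")]
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        ranked.append((fields[0], q))
    # sorted() is stable, so entries with equal q keep header order.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


header = ("application/octet-stream; q=0.5, application/json; q=0.4, "
          "image/png; q=0.495, text/*")
for mime_type, q in rank_accept(header):
    print(mime_type, q)
```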

Most message files are not stored with a mime type. Getting
``https://.../demo/rest/data/msg/23/type`` returns::

{
"data": {
"id": "23",
"type": "<class 'NoneType'>",
"link": "https://.../demo/rest/data/msg/23/type",
"data": null,
"@etag": "\"ba98927a8bb4c56f6cfc31a36f94ad16\""
}
}

The data attribute will usually be null/empty. As a result, mime type
matching for an item without a mime type is forgiving.

Messages are meant to be human readable, so the mime type ``text/*``
can be used to access any text style mime type (``text/plain``,
``text/x-rst``, ``text/markdown``, ``text/html``, ...) or an empty
mime type. If the item's type is not empty, it will be used as the
Content-Type (similar to ``*/*``). Otherwise ``text/*`` will be the
Content-Type. If your tracker supports markup languages
(e.g. markdown), you should set the mime type (e.g. ``text/markdown``)
when storing your message.
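
The matching rules described in this section can be summarized with a
small sketch. It is a simplified model, not the code in ``rest.py``:
``pick_content_type`` is a hypothetical name, the Accept entries are
assumed to be already sorted by preference, and the json/xml handling is
left out.

```python
def pick_content_type(accept_types, item_type):
    """Return the Content-Type for binary_content, or None (a 406)."""
    candidates = []
    if item_type:
        # The stored mime type is the preferred answer.
        candidates.append(item_type)
    if not item_type or item_type.startswith("text/"):
        # Items without a type, or with a text type, match text/*.
        candidates.append("text/*")
    # Any content can be sent as an octet stream.
    candidates.append("application/octet-stream")

    for requested in accept_types:
        if requested == "*/*":
            return candidates[0]
        if requested == "text/*" and "text/*" in candidates:
            return candidates[0]
        if requested in candidates:
            return requested
    return None


print(pick_content_type(["text/*"], "text/markdown"))
print(pick_content_type(["text/*"], None))
print(pick_content_type(["text/*"], "image/png"))
```

The three calls show a stored text type being returned for ``text/*``,
the ``text/*`` fallback for an item without a type, and a failed match
for a non-text item.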

Note that the header ``X-Content-Type-Options: nosniff`` is returned
with a non javascript or xml binary_content response to prevent the
browser from trying to interpret the returned data.

Legacy Method (HTML interface)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

With the addition of file binary content streaming in the rest
interface to Roundup 2.5.0, this method (using the html interface) is
considered legacy but still works.

To retreive the content, you can use the content link property:
``https://.../demo/msg11/``. The trailing / is required. Without the
/, you get a web page that includes metadata about the message. With
95 changes: 83 additions & 12 deletions roundup/rest.py
@@ -433,7 +433,6 @@ class RestfulInstance(object):
__default_patch_op = "replace" # default operator for PATCH method
__accepted_content_type = {
"application/json": "json",
"*/*": "json",
}
__default_accept_type = "json"

@@ -2232,6 +2231,8 @@ def determine_output_format(self, uri):
3) if empty or missing Accept header
return self.__default_accept_type
4) match and return best Accept header/version
this includes matching mime types for file downloads
using the binary_content property
if version error found in matching type return 406 error
5) if no requested format is supported return 406
error
@@ -2281,21 +2282,55 @@ def determine_output_format(self, uri):
self.client.response_code = 406
return (None, uri, self.error_obj(
400, _("Unable to parse Accept Header. %(error)s. "
"Acceptable types: %(acceptable_types)s") % {
"Acceptable types: */*, %(acceptable_types)s") % {
'error': e.args[0],
'acceptable_types': " ".join(sorted(
'acceptable_types': ", ".join(sorted(
self.__accepted_content_type.keys()))}))

if not accept_header:
# we are using the default
return (self.__default_accept_type, uri, None)

accept_type = ""
valid_binary_content_types = []
if uri.endswith("/binary_content"):
request_path = uri
request_class, request_id = request_path.split('/')[-3:-1]
try:
designator_type = self.db.getclass(
request_class).get(request_id, "type")
except (KeyError, IndexError):
# class (KeyError) or
# id (IndexError) does not exist
# Return unknown mime type and no error.
# The 400/404 error will be thrown by other code.
return (None, uri, None)

if designator_type:
# put this first as we usually require exact mime
# type match and this will be matched most often.
# Also for text/* Accept header it will be returned.
valid_binary_content_types.append(designator_type)

if not designator_type or designator_type.startswith('text/'):
# allow text/* as msg items can have empty type field
# also match text/* for text/plain, text/x-rst,
# text/markdown, text/html etc.
valid_binary_content_types.append("text/*")

# Octet-stream should be allowed for any content.
# client.py sets 'X-Content-Type-Options: nosniff'
# for file downloads (sendfile) via the html interface,
# so we should be able to set it in this code as well.
valid_binary_content_types.append("application/octet-stream")

for part in accept_header:
if accept_type:
# we accepted the best match, stop searching for
# lower quality matches.
break

# check for structured rest return types (json xml)
if part[0] in self.__accepted_content_type:
accept_type = self.__accepted_content_type[part[0]]
# Version order:
@@ -2311,6 +2346,8 @@ def determine_output_format(self, uri):
# use default if version = None
try:
self.api_version = int(part[1]['version'])
if self.api_version not in self.__supported_api_versions:
raise ValueError
except KeyError:
self.api_version = None
except (ValueError, TypeError):
@@ -2323,17 +2360,45 @@ def determine_output_format(self, uri):
return (None, uri,
self.error_obj(406, msg))

if part[0] == "*/*":
if valid_binary_content_types:
self.client.setHeader("X-Content-Type-Options", "nosniff")
accept_type = valid_binary_content_types[0]
else:
accept_type = "json"

# check type of binary_content
if part[0] in valid_binary_content_types:
self.client.setHeader("X-Content-Type-Options", "nosniff")
accept_type = part[0]
# handle text wildcard
if ((part[0] in 'text/*') and
"text/*" in valid_binary_content_types):
self.client.setHeader("X-Content-Type-Options", "nosniff")
# use best choice of mime type, try not to use
# text/* if there is a real text mime type/subtype.
accept_type = valid_binary_content_types[0]

# accept_type will be empty only if there is an Accept header
# with invalid values.
if accept_type:
return (accept_type, uri, None)

self.client.response_code = 400
if valid_binary_content_types:
return (None, uri,
self.error_obj(
406,
_("Requested content type(s) '%s' not available.\n"
"Acceptable mime types are: */*, %s") %
(self.client.request.headers.get('Accept'),
", ".join(sorted(
valid_binary_content_types)))))

return (None, uri,
self.error_obj(
406,
_("Requested content type(s) '%s' not available.\n"
"Acceptable mime types are: %s") %
"Acceptable mime types are: */*, %s") %
(self.client.request.headers.get('Accept'),
", ".join(sorted(
self.__accepted_content_type.keys())))))
@@ -2597,14 +2662,20 @@ def format_dispatch_output(self, accept_mime_type, output,

output = '<?xml version="1.0" encoding="UTF-8" ?>\n' + \
b2s(dicttoxml(output, root=False))
elif accept_mime_type:
self.client.setHeader("Content-Type", accept_mime_type)
# do not send etag when getting binary_content. The ETag
# is for the item not the content of the item. So the ETag
# can change even though the content is the same. Since
# content is immutable by default, the client shouldn't
# need the etag for writing.
self.client.setHeader("ETag", None)
return output['data']['data']
else:
# FIXME?? consider moving this earlier. We should
# error out before doing any work if we can't
# display acceptable output.
self.client.response_code = 406
output = ("Requested content type '%s' is not available.\n"
"Acceptable types: %s" % (accept_mime_type,
", ".join(sorted(self.__accepted_content_type.keys()))))
self.client.response_code = 500
output = _("Internal error while formatting response.\n"
"accept_mime_type is not defined. This should\n"
"never happen\n")

# Make output json end in a newline to
# separate from following text in logs etc..