Add a start landmark (to the PDF Profile) #88
This adds a simple PDF profile to cover Manifests that consist only of PDF files (media type of `application/pdf`). Beyond this the profile carries no other requirements at this time, though several extensions could be added as properties of link objects. These possibilities are:
- `tagged` A boolean property that would indicate if a PDF is semantically tagged to denote its structure. This can have implications for navigation and accessibility within PDF clients/renderers.
- `version` A controlled string property that would define the specific version of the PDF (e.g. `1.3`, `1.5`, `2.0`, ...)
- `archival` A boolean property that would work in conjunction with `version` to indicate if a PDF is a [PDF-A](https://www.loc.gov/preservation/digital/formats/fdd/fdd000318.shtml) file.
This adds a new section that describes the link parameters that can be provided to `href` strings in profiles that conform. At present this is solely the `start=n` parameter, but further parameters can be added.
| Key | Semantics | Type | Values |
| ----- | --------- | -------- | --------- |
| [start](#start) | Specifies the initial page of the PDF to display when displaying this resource | Integer | 1 to (page count of current resource) |
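As a rough illustration of the table above, a client might read `start` from an `href` and check it against the page count. This is only a sketch: the function name `start_page` is hypothetical, the PR text does not say whether the parameter travels in the fragment or the query string (both are accepted here), and falling back to page 1 for missing or out-of-range values is an assumption, not something the profile specifies.

```python
from urllib.parse import parse_qs, urlsplit


def start_page(href: str, page_count: int) -> int:
    """Return the 1-based page to open first for `href` (hypothetical helper)."""
    parts = urlsplit(href)
    # Assumption: accept the parameter in either the fragment or the query.
    for component in (parts.fragment, parts.query):
        values = parse_qs(component).get("start")
        if not values:
            continue
        try:
            page = int(values[0])
        except ValueError:
            return 1  # assumption: ignore non-integer values
        # The table restricts values to 1..page count of the resource.
        return page if 1 <= page <= page_count else 1
    return 1
```

For example, `start_page("chapter1.pdf#start=3", page_count=10)` yields page 3, while an `href` without the parameter yields page 1.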
Allowing fragments in the reading order might introduce a lot of complexities in the toolkits. We'll need to talk about this, would you be able to come to one of our weekly Zoom meetings (next one)?
That being said, you could use the fragment identifier `page` instead of `start`, as it is widely supported for PDF:
The list of PDF-open parameters and the action they imply is:
page=<pagenum>
Open the specified (physical) page.
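To make the `page` suggestion concrete, here is a minimal sketch of separating a PDF `href` from its `page` open parameter. The helper name `split_pdf_href` is hypothetical, and returning `None` when no usable page is present is an assumption made for the example.

```python
from urllib.parse import parse_qs, urldefrag


def split_pdf_href(href: str):
    """Split an href into (resource, page); page is None if absent (hypothetical helper)."""
    resource, fragment = urldefrag(href)
    # Open parameters in the fragment are key=value pairs joined by "&".
    values = parse_qs(fragment).get("page")
    if values and values[0].isdigit():
        return resource, int(values[0])
    return resource, None
```

A caller could then load `resource` as usual and pass the page number to whatever PDF engine it uses.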
Agree with @mickael-menu, we want to avoid fragments in the reading order.
This use case isn't restricted to PDF either and would be better addressed in the main spec IMO:
- if this matches a resource, specifying `start` as a `rel` in the `readingOrder` is enough
- if this matches a fragment of a resource, then a new Link Object in `links` would be the better option
Thanks for the feedback, would be happy to chat about this! I may not be able to make next week's, but will try to attend one soon. I'm curious about the fragment discussion and what we can do to support features like this without introducing too much complexity.
@HadrienGardeur can you explain your second point regarding representing fragments as a new object in `links`? I'm not sure I follow how this addresses the functionality we're looking for.
Let's say that we have the following RWPM:
{
"metadata": {
"title": "Publication containing multiple PDF files",
"conformsTo": "http://readium.org/webpub-manifest/profiles/pdf"
},
"readingOrder": [
{
"href": "introduction.pdf",
"type": "application/pdf"
},
{
"href": "chapter1.pdf",
"type": "application/pdf"
},
{
"href": "chapter2.pdf",
"type": "application/pdf"
}
]
}
If you want to only include a subset of each PDF in the full publication, that's not something that we support with RWPM.
But if the first time that you open that publication, you'd like to jump straight to the first chapter, this could be supported using `start` in `readingOrder`:
{
"metadata": {
"title": "Publication containing multiple PDF files",
"conformsTo": "http://readium.org/webpub-manifest/profiles/pdf"
},
"readingOrder": [
{
"href": "introduction.pdf",
"type": "application/pdf"
},
{
"rel": "start",
"href": "chapter1.pdf",
"type": "application/pdf"
},
{
"href": "chapter2.pdf",
"type": "application/pdf"
}
]
}
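A reading app could pick up this hint with a few lines of code. The following is a sketch, not toolkit code: `start_index` is a hypothetical helper, and defaulting to the first item when no link carries `start` is an assumption. It accepts `rel` as either a single string or a list of strings.

```python
def start_index(manifest: dict) -> int:
    """Return the readingOrder index to open first (hypothetical helper)."""
    for index, link in enumerate(manifest.get("readingOrder", [])):
        rel = link.get("rel", [])
        # Normalise a single relation string into a list.
        rels = [rel] if isinstance(rel, str) else rel
        if "start" in rels:
            return index
    return 0  # assumption: open the first resource when nothing is marked
```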
If you need to point to a fragment instead of a resource, this could be handled using `links`:
{
"metadata": {
"title": "Publication containing multiple PDF files",
"conformsTo": "http://readium.org/webpub-manifest/profiles/pdf"
},
"links": [
{
"rel": "start",
"href": "introduction.pdf?page=5"
}
],
"readingOrder": [
{
"href": "introduction.pdf",
"type": "application/pdf"
},
{
"href": "chapter1.pdf",
"type": "application/pdf"
},
{
"href": "chapter2.pdf",
"type": "application/pdf"
}
]
}
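Under this approach, a client would look in `links` first and only then fall back to the reading order. A minimal sketch, assuming a hypothetical helper `start_href` and assuming that the first `readingOrder` item is the fallback when no `start` link exists:

```python
def start_href(manifest: dict):
    """Return the href to open first, preferring a `start` link in `links` (hypothetical helper)."""
    for link in manifest.get("links", []):
        rel = link.get("rel", [])
        # `rel` may be a single string or a list of strings.
        rels = [rel] if isinstance(rel, str) else rel
        if "start" in rels:
            return link.get("href")
    # Assumption: with no explicit start link, open the first resource.
    reading_order = manifest.get("readingOrder", [])
    return reading_order[0].get("href") if reading_order else None
```

Because the `readingOrder` entries stay free of fragments, only code that cares about the start position has to interpret the page reference in the returned `href`.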
We talked about this on today's call. Nothing set in stone yet, but what came up:
Thank you for the feedback! I was unable to join the call, and I actually don't think I have the meeting link. Is there a way I could get that? As for the specifics here:
We paused the weekly calls for the summer but we'll be back end of August, I think. You can send a mail to contact@readium.org to request access to the Readium Slack workspace (mention this PR in the mail). The link and time is shared on the
Only the
{
"href":"chapter1.pdf#page=32",
"type": "application/pdf"
}
Sure, that would be very interesting, thanks. Which PDF engine(s) are you using?
Hey y'all, just checking in here on the progress of this. I have two pieces of input:
RWPM doesn't support fragments in
To get the best compatibility, and since you have the info, you could process the PDFs by removing the blank pages before packaging them.
Fragment identifiers are not directly specified in the Link object, but we have a convention of using them in some cases as
However they are mentioned in the specification of the Locator object. As they are specific to each media type,
It also identifies which fragment identifier specs are recognized:
In practice though, it really depends on what is implemented in each Navigator. For example
I created a simpler PR to get something released soon, #97.
Note that I would be in favour of a generic
Note also that this "start" is in fact a landmark. There is a mechanism defined in Web Publications for landmarks, based on the EPUB solution. It is like a TOC, and therefore can handle fragments.
This adds a simple PDF profile to cover Manifests that are comprised only of PDF files (a requirement that all resources have a media type of `application/pdf`).
This accounts for a structure being used in some projects that represents collections of PDF files as a single resource that can be read by users. This enables such things as representing a resource as a set of PDF files with one per section/chapter, while still allowing for a unified reading experience.
This does not make any alterations to the Manifest; it simply requires that conforming manifests meet the standards laid out in it. The profile also specifies that `start` parameters may be specified in link `href` strings, so that a manifest can name a start page and a reader can skip white space at the start of files.