Multiple documents in YAML #46

Closed
VladimirAlexiev opened this issue Jul 4, 2022 · 7 comments · Fixed by #34
Labels
UCR Issue on Use Case/Recommendation

@VladimirAlexiev
Contributor

VladimirAlexiev commented Jul 4, 2022

Should YAML-LD allow or prohibit multiple documents in YAML?

  • Which YAML parsers support multiple documents?
  • What are useful examples of using multiple documents?
  • If we decide to use them in YAML-LD, how should they be represented? As RDF graphs?
  • Below I formulate a positive use case, but I'm not quite certain we want this because of its complexity

PLEASE VOTE with 👍 or 👎 , thanks!


Eg1: multiple identical keys are forbidden by YAML linters.
But they are ok if they are in different documents.
Example by @ioggstream from #42 (comment):

---
a: 1
...
---
a: 2
...
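
A minimal sketch of Eg1, assuming PyYAML is installed. `yaml.safe_load_all` yields one Python object per document, so the repeated key `a` never collides. The variable names are illustrative only.

```python
import yaml

# Eg1 from above: two documents, each with its own key "a".
stream = """\
---
a: 1
...
---
a: 2
...
"""

# safe_load_all returns a generator with one item per YAML document.
documents = list(yaml.safe_load_all(stream))
print(documents)
```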

Eg2: YAML metadata followed by a markdown textual body is widely used in some blog/content management systems:

---
created: 2022-07-03
published: 2022-07-04
title: Frobnification
author: A. U. Thor
...
Frobnification was invented in prehistoric times.
It's a useful meta-process wherein...

As an information architect,
I want to be able to use multiple documents in YAML-LD,
so that I can transmit several closely related documents (graphs) together.

@VladimirAlexiev VladimirAlexiev added the UCR Issue on Use Case/Recommendation label Jul 4, 2022
@ioggstream
Contributor

Some notes:

a. Theoretically speaking

  1. a YAML stream includes one or more documents
  2. a stream can be transmitted on the net or archived in a file

In Python, parsing a stream that contains multiple documents
requires yaml.safe_load_all instead of yaml.safe_load.

b. I am not sure that Eg2 above is valid YAML.
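
To illustrate point a, here is a small sketch (assuming PyYAML) of what happens when the single-document loader meets a multi-document stream. The flag name `single_document_ok` is made up for this example.

```python
import yaml

stream = "---\na: 1\n---\na: 2\n"

try:
    yaml.safe_load(stream)
    single_document_ok = True
except yaml.YAMLError as error:
    # safe_load expects exactly one document and rejects the stream.
    single_document_ok = False
    print(type(error).__name__)

# safe_load_all accepts the same stream and yields each document in turn.
documents = list(yaml.safe_load_all(stream))
print(documents)
```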

@ioggstream
Contributor

Which YAML parsers support multiple documents?

In Python, parsing a stream that contains multiple documents
requires yaml.safe_load_all instead of yaml.safe_load.

What are useful examples of using multiple documents?

In Kubernetes, multiple YAML documents are bundled together
to describe related deployment units.

Another example is bundling related datasets that should be imported
together in a single file, e.g. (metadata, data)
or (ontology, dataset).
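
A rough sketch of the (metadata, data) bundling idea, assuming PyYAML. The two dictionaries are invented placeholders, not a proposed YAML-LD layout.

```python
import yaml

# Hypothetical bundle: a metadata record followed by the data it describes.
metadata = {"title": "Frobnification survey", "created": "2022-07-03"}
data = {"rows": [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]}

# explicit_start=True writes a "---" marker before every document.
bundle = yaml.safe_dump_all([metadata, data], explicit_start=True)
print(bundle)

# Reading the stream back gives the two documents in their original order.
restored = list(yaml.safe_load_all(bundle))
```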

If we decide to use them in YAML-LD, how should they be represented?

As distinct JSON-LD documents that are related to one another.

As RDF graphs?

Aren't they always RDF graphs?

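import json
import yaml
from rdflib import Graph

g = Graph()
with open("docs.yamlld") as stream:
    for document in yaml.safe_load_all(stream):
        # rdflib bundles a JSON-LD parser, so each YAML document is passed on as JSON.
        g.parse(data=json.dumps(document), format="json-ld")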

Below I formulate a positive use case, but I'm not quite certain we want this because of its complexity

I see it more as a bundling method. The complexity lies inside each document.

WDYT?

@VladimirAlexiev
Contributor Author

@ioggstream

not sure the eg2 provided above is valid yaml

Does it look better now?

As different JSON-LD documents related between them

But how can we relate documents?

  • JSON and YAML have no notion of a "URL", of a "document at a URL", or of setting a "base"
  • JSON-LD has @base, but it sets the base IRI for resolving relative IRIs inside the doc, not "the semantic URL" of the doc itself

Aren't they always RDF graphs?

I agree they should be graphs. Then we need:

  • some way to denote or auto-generate graph IDs for each of the multiple docs (eg #1, #2...)
  • to figure out how it relates to @graph: by default they all go to the default graph (triples not quads)
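
One possible way to auto-generate graph IDs, sketched under loud assumptions: the `#1`, `#2` numbering, the function name `wrap_as_named_graphs`, and the choice to hoist `@context` are all hypothetical, and each document is assumed to be a single node object. Wrapping a document in an object that has both `@id` and `@graph` turns it into a named graph (quads) instead of adding its triples to the default graph.

```python
def wrap_as_named_graphs(documents, base):
    """Wrap each document as a named graph with an ID like <base>#1, <base>#2."""
    wrapped = []
    for index, document in enumerate(documents, start=1):
        body = dict(document)  # shallow copy, the input is left untouched
        context = body.pop("@context", None)
        named = {"@id": f"{base}#{index}", "@graph": [body]}
        if context is not None:
            # Keep the document's own context at the top level of the wrapper.
            named["@context"] = context
        wrapped.append(named)
    return wrapped


docs = [
    {"@context": {"@vocab": "http://example.org/"}, "@id": "#bart", "spouse": "#marge"},
    {"@id": "#lisa"},
]
graphs = wrap_as_named_graphs(docs, "http://example.org/bundle")
print([graph["@id"] for graph in graphs])
```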

Eg this

{"@context": {"@base": "http://example.org", "@vocab":"http://example.org/",
              "spouse":{"@type":"@id"},"statedIn":{"@type":"@id"}},
 "@id": "#bart", "spouse": "#marge", "statedIn": ""}

results in these triples (not quads)

<http://example.org#bart> <http://example.org/spouse> <http://example.org#marge> .
<http://example.org#bart> <http://example.org/statedIn> <http://example.org> .
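
The IRIs in these triples can be reproduced with the standard library, as a rough illustration only: `urljoin` stands in for the relative IRI resolution against @base, and plain string concatenation stands in for @vocab expansion. This is not a JSON-LD processor.

```python
from urllib.parse import urljoin

base = "http://example.org"    # value of @base in the example
vocab = "http://example.org/"  # value of @vocab in the example

# Relative IRIs ("#bart", "#marge", "") are resolved against @base.
subject = urljoin(base, "#bart")
spouse = urljoin(base, "#marge")
stated_in = urljoin(base, "")  # the empty reference resolves to the base itself

# Property terms are expanded by prefixing them with @vocab.
predicate = vocab + "spouse"

print(f"<{subject}> <{predicate}> <{spouse}> .")
print(f"<{subject}> <{vocab}statedIn> <{stated_in}> .")
```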

@anatoly-scherbakov
Contributor

My two cents about eg2. This form of writing is often known as front matter, originally proposed by Jekyll. Syntax:

---
title: My Cat
tags:
    - article
    - pets
---

My cat is the most handsome cat in the whole world.

A few examples of software that supports YAML front matter for Markdown documents:

I am using this format to source YAML-LD from the front matter.

However, this is not valid YAML and thus I do not believe it applies to the question at hand. Does it?
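
For reference, a minimal sketch of how front matter is commonly separated from the body without treating the whole file as a YAML stream. It assumes PyYAML, and `split_front_matter` is a hypothetical helper, not the API of any particular tool.

```python
import yaml


def split_front_matter(text):
    """Return (metadata, body) for a text that starts with a '---' block."""
    parts = text.split("---\n", 2)
    if not text.startswith("---\n") or len(parts) < 3:
        # No front matter: everything is body.
        return {}, text
    _, header, body = parts
    # Only the header is parsed as YAML; the body stays an opaque string.
    return yaml.safe_load(header) or {}, body.lstrip("\n")


text = """\
---
title: My Cat
tags:
    - article
    - pets
---

My cat is the most handsome cat in the whole world.
"""

metadata, body = split_front_matter(text)
print(metadata)
print(body)
```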

@gkellogg
Member

gkellogg commented Jul 4, 2022

The JSON-LD API has options and descriptions for processing multiple script elements within an HTML document using extractAllScripts, which seem relevant here.
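
By analogy with extractAllScripts, a YAML-LD loader could take a flag that chooses between the first document and all of them. The sketch below assumes PyYAML; the function `load_documents` and the flag `extract_all_documents` are hypothetical names, not part of any specification.

```python
import yaml


def load_documents(stream, extract_all_documents=False):
    """Return the first document, or a list of all documents when the flag is set."""
    documents = yaml.safe_load_all(stream)
    if extract_all_documents:
        return list(documents)
    # Default: only the first document is used, None for an empty stream.
    return next(documents, None)


stream = "---\na: 1\n---\na: 2\n"
first = load_documents(stream)
everything = load_documents(stream, extract_all_documents=True)
print(first, everything)
```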

@VladimirAlexiev
Contributor Author

@anatoly-scherbakov
This is also used by pandoc.

I thought the second doc consists of one long string? But that would require some quoting or escaping, else colons and dashes at BOL will throw it off.
Agreed, strike eg2

@ioggstream
Contributor

@VladimirAlexiev @gkellogg this will be mainly addressed in ietf-wg-httpapi/mediatypes#55

Thanks for this issue: without this the YAML media type would have missed this piece.

@anatoly-scherbakov regarding the document in your example: it is valid YAML, as @VladimirAlexiev said.

import yaml

s = """---
title: My Cat
tags:
    - article
    - pets
---

My cat is the most handsome cat in the whole world.
"""
# The stream parses as two documents: a mapping, then a plain string.
for d in yaml.safe_load_all(s):
    print(d)
