add article about creating a dataset
marcolarosa committed Oct 25, 2024
1 parent c40f020 commit b355aac
Showing 45 changed files with 233 additions and 3 deletions.
2 changes: 2 additions & 0 deletions .vitepress/theme/index.js
@@ -55,6 +55,7 @@ import MyLayout from "./MyLayout.vue";

 // register components globally
 import { FontAwesomeIcon } from "@fortawesome/vue-fontawesome";
+import ButtonComponent from "../../src/vue-components/Button.vue";
 import CardComponent from "../../src/vue-components/Card.vue";
 import Disqus from "../../src/vue-components/Disqus.vue";
 import FeatureComponent from "../../src/vue-components/Feature.vue";
@@ -78,6 +79,7 @@ export default {
 ...DefaultTheme,
 Layout: MyLayout,
 enhanceApp({ app }) {
+app.component("Button", ButtonComponent);
 app.component("CardComponent", CardComponent);
 app.component("Disqus", Disqus);
 app.component("FeatureComponent", FeatureComponent);
217 changes: 217 additions & 0 deletions src/docs/articles/creating-a-dataset.md
@@ -0,0 +1,217 @@
---
title: Defining a vocabulary and creating a dataset
outline: [2, 3]
---

# Defining a vocabulary and using it to create a dataset

Describo is built around RO-Crate, which primarily uses schema.org. But what do you do if you need to
describe things that are not defined by schema.org? And how can you stand on the shoulders of the
many Ontologies that came before?

This article will show you, step by step, how to:

- create a Vocabulary for describing TKLabels;
- use the Vocabulary to create the dataset of TKLabel definitions;
- use the AI Assistant to verify the data you created against the RO-Crate spec;
- and finally, load it into your personal datastore in Describo for use in other projects.

::: tip info

- You will need access to the Vocabulary tool in Describo, so be sure to
[register](/docs/guide/register.html), which will give you two months of free access to the tool.

- In order to perform the verification steps you will need to
[purchase credits to use the Assistant.](/docs/guide/purchase-credits.html)

- [Detailed documentation for the Vocabulary tool is available here.](/docs/guide/vocabulary.html)

:::

::: warning info

This article does not go into detail about every little feature. It is a step-by-step how-to, meant
to be read in conjunction with
[the detailed documentation for the Vocabulary tool.](/docs/guide/vocabulary.html)

:::

## Definitions

- A `class` is a definition for a type of `entity`. For example, defining a class `TKLabel` will
enable you to create entities whose type is `TKLabel`.
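
To make the distinction concrete, here is a minimal sketch of what an entity of class `TKLabel` might look like in RO-Crate metadata. The `@id` is a placeholder URL and the helper `hasType` is a hypothetical name used only for illustration; neither is part of Describo.

```javascript
// Hypothetical entity, as it might appear in the @graph of an RO-Crate.
// The @id is a placeholder URL, not a real TK Label identifier.
const tkAttribution = {
  "@id": "https://example.org/labels/tk-attribution",
  "@type": "TKLabel", // the class defined in the vocabulary
  name: "TK Attribution",
};

// An entity is "of" a class when its @type names that class.
// In JSON-LD, @type may be a single string or an array of strings.
function hasType(entity, className) {
  return [].concat(entity["@type"]).includes(className);
}

console.log(hasType(tkAttribution, "TKLabel"));
```

The class is the definition; `tkAttribution` is one entity created from it.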

## Licensing your work

Too often, licensing is an afterthought. With Describo, licensing your data and metadata is trivial.

Start Describo and select a folder to work in. We'll start by creating the data and metadata
licenses. Look in the navigation bar for a button named&nbsp;<Button>Set Licence</Button>. The
following two images show the license creation dialog. In the first we're defining the data license
(CC-BY-SA 4.0) and in the second, the metadata license (CC0 Public Domain).

<ImageComponent src="/images/articles/creating-a-dataset/dataset1.webp"></ImageComponent>
<ImageComponent src="/images/articles/creating-a-dataset/dataset3.webp"></ImageComponent>

After we've created the licenses we see that the root dataset has the CC-BY-SA license.

<ImageComponent src="/images/articles/creating-a-dataset/dataset2.webp"></ImageComponent>

The root descriptor, meanwhile, has the CC0 license.

<ImageComponent src="/images/articles/creating-a-dataset/dataset4.webp"></ImageComponent>
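
As a rough sketch of where the two licenses end up, the object below mimics the relevant part of an RO-Crate metadata file: the root descriptor (`ro-crate-metadata.json`) carries the metadata license and the root dataset (`./`) carries the data license. The dataset name and the helper `findLicenses` are assumptions made for illustration, and the exact JSON Describo writes may differ.

```javascript
// Sketch only: property placement follows the article, but the exact
// JSON written by Describo is an assumption.
const licensedCrate = {
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
      "@id": "ro-crate-metadata.json", // root descriptor
      "@type": "CreativeWork",
      conformsTo: { "@id": "https://w3id.org/ro/crate/1.1" },
      about: { "@id": "./" },
      // metadata license: CC0 Public Domain
      license: { "@id": "https://creativecommons.org/publicdomain/zero/1.0/" },
    },
    {
      "@id": "./", // root dataset
      "@type": "Dataset",
      name: "TK Labels", // hypothetical name
      // data license: CC-BY-SA 4.0
      license: { "@id": "https://creativecommons.org/licenses/by-sa/4.0/" },
    },
  ],
};

// Follow the descriptor's `about` link to find the root dataset,
// then report both license identifiers.
function findLicenses(crate) {
  const graph = crate["@graph"];
  const descriptor = graph.find((e) => e["@id"] === "ro-crate-metadata.json");
  const root = graph.find((e) => e["@id"] === descriptor.about["@id"]);
  return { metadata: descriptor.license["@id"], data: root.license["@id"] };
}

console.log(findLicenses(licensedCrate));
```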

## Creating the Vocabulary

### Setup

Load the vocabulary tool. On this page you can define the Vocabulary metadata like name, description
and version.

<ImageComponent src="/images/articles/creating-a-dataset/dataset5.webp"></ImageComponent>

### Classes

Navigate to the classes tab.

To create a new class, type the name into the input next to `Add a class to your vocabulary`. As you
type, Describo will look up the Ontologies known to it. If you find a definition that matches what
you want to describe, select it. Otherwise, type in the class name you want and press enter to
select it.

In this case, create the class `TKLabel` (remember to press enter). Note that it's capitalised and
singular. Also note that we've set the definition to `override`. This means that the class
definition itself defines the properties, so Describo will not traverse the inheritance hierarchy
to determine which properties to show.

Also note that the class hierarchy can be changed here. By default, any class added (from either an
Ontology or invented by you) will be subclassed to Thing. You can change that here. In addition,
your class can have multiple parent classes.

<ImageComponent src="/images/articles/creating-a-dataset/dataset6.webp"></ImageComponent>

### Properties - TKLabel class

Navigate to the properties tab and select the `TKLabel` class.

Upon selection you will see the properties from the hierarchy (inherited from the default parent
Thing). This happens when the class definition is first created even though you set the definition
to override.

<ImageComponent src="/images/articles/creating-a-dataset/dataset7.webp"></ImageComponent>

Go ahead and delete all of those properties. In this example, we only need the `@id` and `name`
properties, which Describo always makes available.

<ImageComponent src="/images/articles/creating-a-dataset/dataset8.webp"></ImageComponent>

### Properties - Dataset

To use the TKLabel class, we need to say _where_ it can be used. In this case, we'll edit the
`hasPart` property on Dataset to specify that we can add TKLabel entities at that point. In the
following image we've loaded the Dataset class on the properties tab and expanded the hasPart
property.

Notice the select where we can see TKLabel as an option? Describo allows you to associate anything
you've defined at any point in the hierarchy.

<ImageComponent src="/images/articles/creating-a-dataset/dataset9.webp"></ImageComponent>

By selecting TKLabel in the select box we see (in the following image) that the hasPart property on
Dataset will now allow TKLabel entities to be created and associated.

<ImageComponent src="/images/articles/creating-a-dataset/dataset10.webp"></ImageComponent>
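
Conceptually, this step adds `TKLabel` to the list of types that `hasPart` on Dataset accepts. The sketch below models that idea with a plain lookup table. The `allowedTypes` structure and the `canAttach` helper are hypothetical and are not Describo's internal profile format; the type list is the one shown for `hasPart` later in this article.

```javascript
// Hypothetical model: for each class, which entity types each property accepts.
const allowedTypes = {
  Dataset: {
    hasPart: ["CreativeWork", "URL", "File", "Dataset", "TKLabel"],
  },
};

// Can an entity of `childType` be attached to `property` on `parentType`?
function canAttach(parentType, property, childType) {
  const accepted = allowedTypes[parentType]?.[property] ?? [];
  return accepted.includes(childType);
}

console.log(canAttach("Dataset", "hasPart", "TKLabel"));
console.log(canAttach("Dataset", "hasPart", "Person"));
```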

Before we leave, let's just check the Vocabulary. Press the button labelled <Button>Open the
Vocabulary definition in your browser</Button>. When you do, a new window will open with the Vocab
presented as HTML.

<ImageComponent src="/images/articles/creating-a-dataset/dataset11.webp"></ImageComponent>

And finally, be sure to press the button named <Button>Save Vocabulary</Button> to not only save it,
but also load it into Describo for use.

## Using your Vocabulary

Navigate from the Vocabulary section back to the Describe section.

The Describo metadata editor will show in the middle pane. Go to the `Content` tab.

In the following image we can see the `hasPart` property and the ability to add entities of type
CreativeWork, URL, File, Dataset and our new class `TKLabel`. By pressing the TKLabel button we can
start creating the TKLabel definitions.

<ImageComponent src="/images/articles/creating-a-dataset/dataset12.webp"></ImageComponent>

However, since there are 20 of them, let's use the Bulk Add control. In the next image we see a
dropdown where we've set the @type to TKLabel and defined 6 of the labels using the URL as the value
of the @id property and the name as the value of the name property.

<ImageComponent src="/images/articles/creating-a-dataset/dataset13.webp"></ImageComponent>

When we <Button>Create these entities</Button> we get:

<ImageComponent src="/images/articles/creating-a-dataset/dataset14.webp"></ImageComponent>
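
In data terms, the bulk add can be thought of as turning each URL and name pair into an entity and linking it from `hasPart` on the dataset. The following is a minimal sketch of that idea, not Describo's implementation. The function name `bulkAdd` is hypothetical and the URLs are placeholders rather than real TK Label identifiers.

```javascript
// Each row pairs a URL (used as @id) with a name. Placeholder URLs.
const labelRows = [
  ["https://example.org/labels/tk-attribution", "TK Attribution"],
  ["https://example.org/labels/tk-non-commercial", "TK Non-Commercial"],
];

// Build one entity per row and reference each from the dataset's hasPart.
// Returns a new dataset object; the input is not mutated.
function bulkAdd(dataset, type, rows) {
  const entities = rows.map(([id, name]) => ({ "@id": id, "@type": type, name }));
  const references = entities.map((e) => ({ "@id": e["@id"] }));
  const hasPart = [...(dataset.hasPart ?? []), ...references];
  return { dataset: { ...dataset, hasPart }, entities };
}

const added = bulkAdd({ "@id": "./", "@type": "Dataset" }, "TKLabel", labelRows);
console.log(JSON.stringify(added, null, 2));
```

The dataset only stores references (`@id`); the full entities live alongside it in the metadata graph.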

And so we can continue to create all of the other TK Labels.

Don't forget to add the other metadata like a description and define yourself as an author!

## Verifying your work

::: tip info

You will need credits to use the Assistant for this part.

:::

An important part of any project is verifying your work. On the Describe tab, in the left panel
where the file browser is, there is another tab named `Verify`. Select it and press the button
&nbsp;<Button>Start the assistant</Button>.

<ImageComponent src="/images/articles/creating-a-dataset/dataset15.webp"></ImageComponent>

When you do, the assistant will read the RO-Crate spec and your metadata. When it's ready, you can
query your data for correctness against the spec.

<ImageComponent src="/images/articles/creating-a-dataset/dataset16.webp"></ImageComponent>

In the following image we pressed the button <Button>Check root dataset</Button>. In the message
view we see the question (in the blue box) and the assistant's response (in green).

<ImageComponent src="/images/articles/creating-a-dataset/dataset17.webp"></ImageComponent>

In this way, you can direct the assistant to check your work.

## Using your dataset

You've now created a dataset that you can use in other work. The question is: how do we make it
available for lookups in other projects? Firstly, save your work by pressing <Button>Save
Metadata</Button> and then go back to the dashboard.

Notice the highlighted section in the following screenshot. That's where we manage our saved
templates (which are like our very own personal knowledge base). In the screenshot you can see that
there are two templates: one describing the Describo organisation and one describing me as a Person.

<ImageComponent src="/images/articles/creating-a-dataset/dataset18.webp"></ImageComponent>

Press the button <Button>Load templates from another dataset</Button>. A sidebar will open where you
can select a folder containing an RO-Crate. Select the folder where you just created the TKLabels
dataset. Describo will load the RO-Crate metadata and show you the entity types it contains. In this
case we want to ingest all of the TKLabel entities (20 of them) so select that and
press&nbsp;<Button>Load</Button>.

<ImageComponent src="/images/articles/creating-a-dataset/dataset19.webp"></ImageComponent>

When it's done, those entities will now be loaded into your personal datastore.

<ImageComponent src="/images/articles/creating-a-dataset/dataset20.webp"></ImageComponent>
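
The list of entity types shown in the sidebar can be pictured as a count of entities per `@type` in the crate's graph. The sketch below shows one way to compute that; `countByType` is a hypothetical helper and the sample graph uses placeholder URLs, so treat it as an illustration rather than how Describo does it.

```javascript
// Count how many entities of each @type a graph contains.
// @type may be a string, an array of strings, or missing.
function countByType(graph) {
  const counts = {};
  for (const entity of graph) {
    for (const type of [].concat(entity["@type"] ?? [])) {
      counts[type] = (counts[type] ?? 0) + 1;
    }
  }
  return counts;
}

// Small sample graph with placeholder identifiers.
const sampleGraph = [
  { "@id": "ro-crate-metadata.json", "@type": "CreativeWork" },
  { "@id": "./", "@type": "Dataset" },
  { "@id": "https://example.org/labels/tk-attribution", "@type": "TKLabel" },
  { "@id": "https://example.org/labels/tk-non-commercial", "@type": "TKLabel" },
];

console.log(countByType(sampleGraph));
```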

## Next steps

If you've made it this far, you've created your own TKLabels dataset and made it available to
Describo for future use.

In your next project, you can create a Vocabulary where you might define a property on File entities
called `hasTkLabel` and specify that it should have a TKLabel associated. Then, Describo will offer
these as lookups for you to use when describing your files!
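
As a sketch of what that could look like in the resulting metadata, the File entity below points at a TKLabel through `hasTkLabel`. The file name, the placeholder URL and the helper `resolveRef` are assumptions made for illustration.

```javascript
// Hypothetical graph: one TKLabel and one File that references it.
const fileGraph = [
  {
    "@id": "https://example.org/labels/tk-attribution", // placeholder URL
    "@type": "TKLabel",
    name: "TK Attribution",
  },
  {
    "@id": "recording.wav", // hypothetical file
    "@type": "File",
    hasTkLabel: { "@id": "https://example.org/labels/tk-attribution" },
  },
];

// Resolve an { "@id": ... } reference to the full entity in the graph.
function resolveRef(graph, ref) {
  return graph.find((entity) => entity["@id"] === ref["@id"]);
}

const describedFile = fileGraph.find((e) => e["@type"] === "File");
console.log(resolveRef(fileGraph, describedFile.hasTkLabel).name);
```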
Binary file not shown.
3 changes: 3 additions & 0 deletions src/vue-components/Button.vue
@@ -0,0 +1,3 @@
<template>
<div class="inline text-sm bg-blue-500 text-white py-1 px-2 rounded"><slot></slot></div>
</template>
4 changes: 3 additions & 1 deletion src/vue-components/Feature.vue
@@ -1,6 +1,7 @@
 <template>
 <div
-class="flex flex-row space-x-3 md:place-content-center items-center py-4 px-4 lg:p-4 rounded-lg"
+class="flex flex-row space-x-3 items-center py-4 px-4 lg:p-4 rounded-lg"
+:class="{ 'place-content-center': props.center }"
 >
 <div class="text-slate-700" v-if="props.icon">
 <font-awesome-icon :icon="props.icon" :size="props.size" class="text-indigo-600" />
@@ -23,5 +24,6 @@ const props = defineProps({
 target: { type: String, default: "" },
 icon: { type: Object },
 size: { type: String, default: "2x" },
+center: { type: Boolean, default: true },
 });
 </script>
10 changes: 8 additions & 2 deletions src/vue-components/FeatureArticles.vue
@@ -13,9 +13,9 @@
 </el-radio-group>
 </div>

-<div class="flex flex-row flex-wrap">
+<div class="grid grid-flow-row-dense grid-cols-1 md:grid-cols-2 gap-1">
 <div v-for="article of displayedArticles">
-<Feature :link="article.link" class="border border-slate-300 rounded-sm m-1">
+<Feature :link="article.link" class="bg-blue-100 rounded-sm" :center="false">
 <template #title>{{ article.title }}</template>
 <template #content>
 <div class="flex flex-col">
@@ -57,6 +57,12 @@ const articles = [
 text: "See how the e-discovery tools in Describo can help you.",
 keywords: ["e-discovery", "assistant"],
 },
+{
+title: "Defining a vocabulary and using it to create a dataset",
+link: "/docs/articles/creating-a-dataset",
+text: "See how to define a vocabulary and then use it to define a domain",
+keywords: ["vocabulary", "dataset"],
+},
 ];
 const keywords = uniq(flattenDeep(articles.map((a) => a.keywords)).sort());
