diff --git a/src/docs/articles/why-use-it.md b/src/docs/articles/why-use-it.md
new file mode 100644
index 0000000..528713d
--- /dev/null
+++ b/src/docs/articles/why-use-it.md
@@ -0,0 +1,240 @@
+---
+title: Why Describo? Where does it fit?
+aside: false
+---
+
+# Why Describo? Where does it fit?
+
+Author: Marco La Rosa, 25/7/2024
+
+::: tip Summary
+
+Describo is a flexible tool designed for researchers, librarians, and archivists working with
+text-based content in the early stages of the research data lifecycle. It provides capabilities for
+data manipulation, AI-assisted analysis, metadata creation, and standardized output, bridging the
+gap between messy workspaces and structured repositories. While complementary to national data
+initiatives, Describo focuses on supporting the research process itself, allowing users to work with
+their data in customized ways while still producing well-described, standards-compliant research
+objects.
+
+> Attribution: I asked the Describo Assistant to summarise this page for me.
+
+:::
+
+I was recently asked whether I had reached out to the
+[ARDC HASS and Indigenous Research Data Commons](https://ardc.edu.au/hass-and-indigenous-research-data-commons/)
+to see whether there was potential for Describo to become part of a national data platform. I
+started writing a response but then realised that despite all the content on this website, I hadn't
+articulated "Why" anyone would actually use Describo and how it relates to broader national
+initiatives.
+
+In this article I will attempt to answer those questions. Although I'll refer to Australian national
+initiatives, I believe this applies to all national initiatives around the world.
+
+The [Strategy page on the ARDC website](https://ardc.edu.au/about-us/our-strategy/) describes their
+mission as _"accelerate research and innovation by driving excellence in the creation, analysis and
+retention of high-quality data assets"_.
+To support this mission there are a number of program areas
+and services to both manipulate data and make it accessible. For example, in the HASS and Indigenous
+space, this means services like the [Language Data Commons](https://www.ldaca.edu.au/) and the
+[Indigenous Data Network](https://idnau.org/).
+
+If a research project exists on a continuum spanning from idea to outcome, then these services live
+on the right-hand side of it. That is, they are there to collect and make accessible work that has
+come from the research process; meaning that whilst the work is happening, it's likely not ready to
+live in those services.
+
+In reality, the research process and its associated lifecycle are better thought of as a set of
+interrelated stages.
+
+
+
+## Workspaces
+
+In the image we see `Workspaces` on the left in that lovely shade of pink. This is where the work is
+done and it is circular because as one gets to know the problem better, they can then further
+collect, refine and develop the work; which leads to a better understanding which leads back to
+collection and on it goes.
+
+Workspaces encompass the tools and services that the user needs to perform their work. Workspaces
+are messy - just like research is - because they can include anything given that workflows differ by
+domain and often by user. Some users might have workspaces based on Python Notebooks whilst others
+just need Microsoft Word. There is no wrong answer on this side of the diagram.
+
+### Online / Shared Workspaces
+
+In many domains, shared online workspaces are an important part of the complement of tools available
+to users, provided that the user can upload their data to them given access conditions, ethics
+requirements etc.
+
+Further on I give an example of how the Nyingarn Project had to navigate issues around a shared,
+online workspace. Issues that Describo mitigates.
+
+### Describo lives here
+
+Describo is for people working with text-based content in various formats.
+It provides tools for
+them to manipulate their data and transform it; mine it for information using AI tools and cloud
+services; describe what they're finding as linked data entity relationships; and ultimately, publish
+their work.
+
+Describo produces data objects in a standardised format: the
+[Research Object Crate (RO-Crate)](https://www.researchobject.org/ro-crate/). So, as the work is
+happening, the user can be sure that they will have a sensible data object as and when it's
+required.
+
+But Describo doesn't limit what the user can do. That is up to them, and it's designed to be flexible
+enough to adapt to many different use cases, as I describe later.
+
+## Repositories
+
+On the right of the image we have the `Repositories`. This is where the outputs of the research
+process go to live when **it makes sense to do so**. I'm specifically highlighting that last
+statement because it's a key point to understand. The point at which the process in the middle
+(Reusable, Interoperable data objects) is triggered depends on the project and the work being done.
+One size does not fit all. Furthermore, the repositories typically have very detailed requirements
+that must be met for data to be accepted.
+
+Incomplete is ok on this side but messy is not. My colleague Dr Mike Jones recently wrote an
+excellent article titled
+[Rewilding humanities data](https://medium.com/@huni.humanities/rewilding-humanities-data-42d9ece249a2)
+that brilliantly parallels data standardisation with the loss of diversity and value in tree
+plantations. I quote:
+
+> But, like carefully aligned plantations of trees, there is a danger that the fertility of the
+> system will be shortlived. Stripping away complexity means stripping away much of the meaning,
+> while the wish to remain in control is too often predicated on centralised models of surveillance
+> and the ceding of control to others.
+
+This is especially true on this side of the diagram.
+Typically, these `commons` services need to
+enforce particular requirements in order for them to accept data. Using LDACA as an example, that
+means your data must be an RO-Crate (GOOD); the metadata must meet certain minimum requirements
+(FINE); and it should conform to a custom
+[Ontology](https://github.com/Language-Research-Technology/ldac-profile/blob/master/profile/profile.md)
+(AAAARRRGGGGHHHHH!!!!!!).
+
+There is a note at the very top explaining who the audience is but the point I want to make is that
+this is not atypical of repositories. Specifically, a constrained set of requirements for data
+acceptance with a high barrier to entry regardless.
+
+### ARDC (largely) lives here
+
+Let it be known that **I'm not advocating against the work of the ARDC or the funded projects**. The
+work that is being done is _A Good Thing&trade;_ but that doesn't mean that we shouldn't be aware of
+the compromises required to make that work.
+
+## So how does Describo relate to the national initiatives?
+
+In short: it's complementary.
+
+Describo's target audience is the librarian, archivist or historian who is working to make sense of
+text-based content. They want to understand it; describe it; reason about it; and finally, make the
+results of their efforts - their scholarship - available to a wider audience. For this user Describo
+offers tools to help them in their workflow as described in the next section. And its flexibility
+allows them to do the messy work of research in the way that suits them.
+
+Describo is complementary to the national programs because in the end, the user is left with a
+research data object that is well described, in standard metadata supported by those initiatives.
+
+Whilst being complementary to the national initiatives, at this time, Describo is not a part of
+them. My hope is that in time this will change.
+
+## Describo Personas
+
+If the discussion above is correct, we now know where Describo fits into the landscape, so let's
+consider why anyone would want to use it.
+
+### What's in the box?
+
+
+
+
+
+### What's in the book?
+
+
+
+
+
+### I don't know what I don't know
+
+
+
+
+
+Hopefully this article has made clear the position that Describo aims to take. If you have any
+questions or comments, please start a conversation below!
+
+
diff --git a/src/public/images/articles/why/archive-box.jpg b/src/public/images/articles/why/archive-box.jpg
new file mode 100644
index 0000000..a12a072
Binary files /dev/null and b/src/public/images/articles/why/archive-box.jpg differ
diff --git a/src/public/images/articles/why/archive-box.webp b/src/public/images/articles/why/archive-box.webp
new file mode 100644
index 0000000..1a9bfb2
Binary files /dev/null and b/src/public/images/articles/why/archive-box.webp differ
diff --git a/src/public/images/articles/why/diary.webp b/src/public/images/articles/why/diary.webp
new file mode 100644
index 0000000..3e67613
Binary files /dev/null and b/src/public/images/articles/why/diary.webp differ
diff --git a/src/public/images/articles/why/policy.jpg b/src/public/images/articles/why/policy.jpg
new file mode 100644
index 0000000..7addcb6
Binary files /dev/null and b/src/public/images/articles/why/policy.jpg differ
diff --git a/src/public/images/articles/why/policy.webp b/src/public/images/articles/why/policy.webp
new file mode 100644
index 0000000..9dacb82
Binary files /dev/null and b/src/public/images/articles/why/policy.webp differ
diff --git a/src/public/images/articles/why/research-data-lifecycle.png b/src/public/images/articles/why/research-data-lifecycle.png
new file mode 100644
index 0000000..66556d0
Binary files /dev/null and b/src/public/images/articles/why/research-data-lifecycle.png differ
diff --git a/src/public/images/articles/why/research-data-lifecycle.webp b/src/public/images/articles/why/research-data-lifecycle.webp
new file mode 100644
index 0000000..870cadc
Binary files /dev/null and b/src/public/images/articles/why/research-data-lifecycle.webp differ