GitHub

Notion: https://elianna.notion.site/elianna/Automating-Metadata-a8acd0a54e05497dad0faf3ada5f1708

The goal of this project is to ignore the current metadata system and start over. We want to know:

What metadata is necessary for science?
- Resources
  - elicit.org - This search engine already extracts metadata with GPT-3 and lets the user choose which data to extract. Sometimes it doesn’t return results.
    - I’m not sure whether they use GPT for the base metadata (authors, etc.) or Semantic Scholar’s API.
  - https://www.researchobject.org/ro-crate/1.1/ -> a metadata standard.
    - https://research.manchester.ac.uk/en/publications/ro-crate-metadata-specification-111 - the documentation for the project above.
  - Schema.org
  - FAIR data standards
  - Data standards:
    1. Data origin: experimental, observational, raw or derived, physical collections, models, images, etc.
    2. Data type: integer, Boolean, character, floating point, etc.
    3. Instrument(s) used
    4. Data acquisition details: sensor deployment methods, experimental design, sensor calibration methods, etc.
    5. File type: CSV, mat, xlsx, tiff, HDF, NetCDF, etc.
    6. Data processing methods, software used
    7. Data processing scripts or codes
    8. Dataset parameter list, including
      - Variable names
      - Description of each variable
      - Units
- Elaboration
  - We should explore this at a granular level: the research object, not just the research node. What is the metadata for code, PDFs, etc.?
  - We’re in the exploratory phase, so we want to see what data is in these papers. What can we get out of this?
  - Error in GPT isn’t a major problem at the moment.
    - We’re trying to find ways to make things better!
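
To make the data-standards list under Resources easier to discuss, here is a minimal sketch of how items 1 to 8 could be held in one record. The class and field names (`DatasetMetadata`, `Variable`, `missing_fields`) are hypothetical and not part of this repo or of any standard; the sketch only shows one possible shape, assuming Python 3.9+.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class Variable:
    """One entry in the dataset parameter list (item 8)."""
    name: str
    description: str
    units: str


@dataclass
class DatasetMetadata:
    """Hypothetical record covering items 1-8 of the data-standards list."""
    origin: str                  # 1. experimental, observational, derived, ...
    data_types: list[str]        # 2. integer, Boolean, floating point, ...
    instruments: list[str]       # 3. instrument(s) used
    acquisition_details: str     # 4. deployment, design, calibration, ...
    file_type: str               # 5. CSV, HDF, NetCDF, ...
    processing_methods: str      # 6. methods and software used
    processing_scripts: list[str] = field(default_factory=list)  # 7.
    variables: list[Variable] = field(default_factory=list)      # 8.

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are still empty."""
        return [key for key, value in asdict(self).items() if not value]


# Example record with made-up values, for illustration only.
record = DatasetMetadata(
    origin="observational",
    data_types=["floating point"],
    instruments=["temperature sensor"],
    acquisition_details="",
    file_type="CSV",
    processing_methods="",
    variables=[Variable("temp", "Water temperature", "degC")],
)

print(record.missing_fields())
print(json.dumps(asdict(record), indent=2))
```

Listing the empty fields makes it visible which parts of the standard a paper (or an automated extractor) did not supply.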
How can we eliminate the need for scientists to enter that metadata?
- Hypothesis: GPT!
- Other?
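
As a rough illustration of the GPT hypothesis, the sketch below builds an extraction prompt and parses a JSON reply. It does not call any model and does not describe how `metadata_bot.py` works; `FIELDS`, `build_prompt`, and `parse_reply` are hypothetical names, and the reply string is made up for the example.

```python
import json

# Hypothetical list of metadata fields to request from the model.
FIELDS = ["title", "authors", "instruments", "file_types"]


def build_prompt(paper_text: str, fields: list[str]) -> str:
    """Build a plain-text prompt asking for the given fields as JSON."""
    field_list = ", ".join(fields)
    return (
        "Extract the following metadata from the paper text below. "
        f"Reply with a single JSON object with exactly these keys: {field_list}. "
        "Use null for anything the text does not state.\n\n"
        f"Paper text:\n{paper_text}"
    )


def parse_reply(reply: str, fields: list[str]) -> dict:
    """Parse a model reply, keeping only the requested keys.

    Missing keys, and replies that are not a JSON object, yield None values.
    """
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {name: data.get(name) for name in fields}


# Made-up reply, standing in for whatever a model would return.
example_reply = '{"title": "A study", "authors": ["A. Author"], "extra": 1}'
parsed = parse_reply(example_reply, FIELDS)
print(parsed)
```

Keeping the parsing step tolerant of malformed replies fits the exploratory phase, where errors from GPT are expected and not yet a major concern.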
What else?

What we have

A section for metadata on the node platform.
- Build out/explore what the Nodes team has discovered through scraping PDFs
- Later on, this could be integrated into search, but we shouldn't think of this as integrating into the Nodes platform.
Emerging problem statements (to be edited!)
- There is no shared understanding of which metadata is prevalent across all research, and no standard for it.
  - Connected to ontology.
- Longer-term problem statement: researchers don't have the knowledge or the processes/workflows to correctly enter metadata in a standardized format.
  - Metadata is unstandardized and difficult to read and use.
  - It is difficult to seamlessly tie one piece of research to another (what are the relations between research?), so the scientific record is not unified.
  - How might we unify the scientific record with metadata standards that different platforms/gateways can access, so that they can use the research in any way they want?


Plikt/automatingMetadata
