KType/KItem export/import #44

Open
MBueschelberger opened this issue Dec 19, 2024 · 0 comments
MBueschelberger commented Dec 19, 2024

In the future, we want to dump the entire content of a KItem to various formats, e.g. JSON, YAML, AASX, HDF5.

As an initial step, we would need to dump the model with all of its fields to a dict, which can then be serialized to JSON or YAML.

Let us consider the following dict:

data = {
    "name": "MD_data_epoxy_resin",
    "id": "971c0409-fb46-4e22-822e-108f5300efe8",
    "slug": "md_data_epoxy_resin-971c0409",
    "ktype_id": "dataset",
    "created_at": "2024-12-19T05:52:05.915662",
    "updated_at": "2024-12-19T05:52:05.915662",
    "summary": None,
    "avatar_exists": False,
    "annotations": [
        {
            "iri": "https://w3id.org/dimat/BatchUploaded",
            "namespace": "https://w3id.org/dimat",
            "label": "BatchUploaded",
            "id": "971c0409-fb46-4e22-822e-108f5300efe8"
        },
        {
            "iri": "https://w3id.org/dimat/PropertyData",
            "namespace": "https://w3id.org/dimat",
            "label": "PropertyData",
            "id": "971c0409-fb46-4e22-822e-108f5300efe8"
        }
    ],
    "linked_kitems": [],
    "external_links": [],
    "contacts": [],
    "authors": [
        {
            "id": "971c0409-fb46-4e22-822e-108f5300efe8",
            "user_id": "ce234115-f053-4884-b9ca-ae7761eb1131"
        }
    ],
    "affiliations": [],
    "attachments": [
        {
            "id": "971c0409-fb46-4e22-822e-108f5300efe8",
            "name": "MD_data_epoxy_resin.xlsx"
        },
        {
            "id": "971c0409-fb46-4e22-822e-108f5300efe8",
            "name": "subgraph.ttl"
        }
    ],
    "kitem_apps": [
        {
            "id": "971c0409-fb46-4e22-822e-108f5300efe8",
            "title": "pilot-2_upload_carbon_epoxy",
            "kitem_app_id": 207,
            "executable": "pilot-2_upload_carbon_epoxy",
            "description": None,
            "tags": None,
            "additional_properties": {
                "triggerUponUpload": True,
                "triggerUponUploadFileExtensions": [".xlsx"]
            }
        }
    ],
    "custom_properties": {
        "id": "971c0409-fb46-4e22-822e-108f5300efe8",
        "content": {
            "sections": [
                {
                    "id": "id17345875421574uce0p",
                    "name": "General",
                    "entries": [
                        {"id": "id1734587542157e3d8ip", "type": "Text", "label": "Name", "value": "Expoxy Resin"},
                        {"id": "id1734587542157xggcen", "type": "Text", "label": "Version", "value": "v01"},
                        {"id": "id1734587542157p18gqp", "type": "Text", "label": "Description", "value": "Resin for aeronautical application"},
                        {"id": "id1734587542157nmuw93", "type": "Text", "label": "Keywords", "value": "resin, epoxy"},
                        {"id": "id1734587542157xk8ytt", "type": "Number", "label": "MassDensity", "value": 1200},
                        {"id": "id1734587542157etvvgt", "type": "Text", "label": "Anisotropy", "value": "Isostropic"}
                    ]
                }
            ]
        }
    },
    "rdf_exists": True,
    "access_url": "https://cmdb.materials-data.space/knowledge/dataset/md_data_epoxy_resin-971c0409"
}
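
As a rough illustration of this first step, a dict of this shape can be passed directly to the standard-library `json` module. The sketch below uses a shortened excerpt of the dict above (the variable name `excerpt` is only for illustration). YAML output should work analogously with a third-party package such as PyYAML, which is not shown here.

```python
import json

# Shortened excerpt of the KItem dict shown above
excerpt = {
    "name": "MD_data_epoxy_resin",
    "ktype_id": "dataset",
    "summary": None,
    "avatar_exists": False,
    "linked_kitems": [],
}

# Python None/False are written as JSON null/false
serialized = json.dumps(excerpt, indent=2)
print(serialized)

# Loading the string again yields an equal dict
restored = json.loads(serialized)
```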

NOTE: The content of the attachments is not downloaded, the dataframe and the subgraph are not present, and the avatar (an image represented as binary data) is missing. These need to be added by a potential export function.
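
One possible way to carry binary content, such as the avatar or an attachment, inside a JSON or YAML export is to base64-encode the bytes. This is only a sketch: the helper names `encode_binary` and `decode_binary`, the `content` key, and the file name `avatar.png` are assumptions for illustration and are not part of the SDK.

```python
import base64


def encode_binary(raw: bytes) -> str:
    """Turn raw bytes into an ASCII string that JSON/YAML can hold."""
    return base64.b64encode(raw).decode("ascii")


def decode_binary(text: str) -> bytes:
    """Inverse of encode_binary."""
    return base64.b64decode(text)


# Placeholder bytes standing in for downloaded binary content
raw = b"\x89PNG\r\n\x1a\n"

# Hypothetical export entry: metadata plus the encoded content
attachment = {"name": "avatar.png", "content": encode_binary(raw)}
```

Base64 increases the payload size by roughly a third, so for large attachments a binary-capable format such as HDF5 may be the better fit.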

The export to HDF5 and AASX requires additional effort, since we would need a wrapper for each format.

The easiest case to start with is HDF5.

A potential hdf5-kitem-wrapper can look like this (not tested, just AI-prompted):

import h5py


def write_value(group, key, value):
    """Write a single value as a dataset, handling cases h5py rejects."""
    if value is None:
        # None cannot be stored directly; use an empty dataset as a placeholder
        group.create_dataset(key, data=h5py.Empty("f"))
    elif isinstance(value, list) and all(isinstance(v, str) for v in value):
        # Lists of str need an explicit variable-length string dtype
        group.create_dataset(key, data=value, dtype=h5py.string_dtype())
    else:
        group.create_dataset(key, data=value)


with h5py.File('data.h5', 'w') as hdf:
    # Store top-level attributes ('summary' is None in the example dict)
    for key in ['name', 'id', 'slug', 'ktype_id', 'created_at', 'updated_at', 'summary', 'avatar_exists', 'rdf_exists', 'access_url']:
        write_value(hdf, key, data[key])

    # Store annotations, one subgroup per annotation
    annotations_group = hdf.create_group('annotations')
    for i, annotation in enumerate(data['annotations']):
        annotation_group = annotations_group.create_group(f'annotation_{i}')
        for key, value in annotation.items():
            write_value(annotation_group, key, value)

    # Store linked_kitems if any
    linked_kitems_group = hdf.create_group('linked_kitems')
    for i, linked_kitem in enumerate(data['linked_kitems']):
        linked_kitem_group = linked_kitems_group.create_group(f'linked_kitem_{i}')
        for key, value in linked_kitem.items():
            write_value(linked_kitem_group, key, value)

    # Store attachments (metadata only, not the file content)
    attachments_group = hdf.create_group('attachments')
    for i, attachment in enumerate(data['attachments']):
        attachment_group = attachments_group.create_group(f'attachment_{i}')
        for key, value in attachment.items():
            write_value(attachment_group, key, value)

    # Store kitem_apps ('description' and 'tags' may be None)
    kitem_apps_group = hdf.create_group('kitem_apps')
    for i, app in enumerate(data['kitem_apps']):
        app_group = kitem_apps_group.create_group(f'app_{i}')
        for key, value in app.items():
            if key == 'additional_properties':
                # The path creates the intermediate group automatically
                for prop_key, prop_value in value.items():
                    write_value(app_group, f'additional_properties/{prop_key}', prop_value)
            else:
                write_value(app_group, key, value)

    # Store custom_properties, one group per section and one dataset per entry
    custom_properties_group = hdf.create_group('custom_properties')
    for key, value in data['custom_properties'].items():
        if key == 'content':
            content_group = custom_properties_group.create_group('content')
            for section in value['sections']:
                section_group = content_group.create_group(section['id'])
                for entry in section['entries']:
                    write_value(section_group, entry['label'], entry['value'])
        else:
            write_value(custom_properties_group, key, value)
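
To make the resulting file layout easier to reason about, the nested dict can be flattened into slash-separated paths, which is how HDF5 addresses groups and datasets. The sketch below is plain Python without h5py. The function name `flatten` and the index-based naming of list elements are assumptions, and they differ slightly from the names used in the wrapper above (e.g. `annotation_0`).

```python
from typing import Any, Iterator, Tuple


def flatten(value: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, leaf) pairs for a nested dict/list structure."""
    if isinstance(value, dict):
        for key, item in value.items():
            yield from flatten(item, f"{prefix}/{key}" if prefix else key)
    elif isinstance(value, list) and any(isinstance(v, (dict, list)) for v in value):
        # Lists of containers become numbered sub-paths
        for i, item in enumerate(value):
            yield from flatten(item, f"{prefix}/{i}" if prefix else str(i))
    else:
        # Scalars, None, and flat lists are treated as leaves
        yield prefix, value


# Shortened excerpt of the KItem dict
excerpt = {
    "name": "MD_data_epoxy_resin",
    "attachments": [
        {"name": "MD_data_epoxy_resin.xlsx"},
        {"name": "subgraph.ttl"},
    ],
    "linked_kitems": [],
}

paths = dict(flatten(excerpt))
for path, leaf in paths.items():
    print(path, "=", leaf)
```

A generic traversal like this could also replace the per-field loops in the wrapper, at the cost of less control over group names.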

The KItem hence needs to be extended with an export function. This export function can receive an enum covering all supported export formats:

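from enum import Enum
from typing import Any

from pydantic import BaseModel  # assumed source of BaseModel


class Format(str, Enum):  # str must come before Enum in the bases

    JSON = "json"
    YAML = "yaml"
    HDF5 = "hdf5"


class KItem(BaseModel):

    [...]

    def export(self, format: Format) -> Any:
        [...]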

The same would need to be implemented for the KTypes.
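
A minimal sketch of how such an `export` method might dispatch on the format, using a plain dataclass as a stand-in for the real model. Everything here is hypothetical: the class `KItemSketch`, its three fields, and the choice to raise `NotImplementedError` for formats that need extra dependencies.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any, Optional


class Format(str, Enum):
    JSON = "json"
    YAML = "yaml"
    HDF5 = "hdf5"


@dataclass
class KItemSketch:
    """Hypothetical stand-in for the real KItem model."""

    name: str
    ktype_id: str
    summary: Optional[str] = None

    def export(self, format: Format) -> Any:
        # First step for every format: dump all fields to a dict
        data = asdict(self)
        if format == Format.JSON:
            return json.dumps(data)
        # YAML and HDF5 would need PyYAML / h5py and are left open here
        raise NotImplementedError(f"Export to {format.value} is not sketched here")


item = KItemSketch(name="MD_data_epoxy_resin", ktype_id="dataset")
exported = item.export(Format.JSON)
```

Because `Format` also derives from `str`, a plain string such as `"json"` can be converted with `Format("json")`, which may be convenient for CLI or config input.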

Later, a corresponding import function needs to be implemented:

from dsms import import_kitem, import_ktype, Format, DSMS

dsms = DSMS(env=".env")

kitem = import_kitem("path/to/file", format=Format.HDF5)
ktype = import_ktype("path/to/file", format=Format.HDF5)

dsms.commit()
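
The import side could start as the inverse of the JSON export: read the file and hand the resulting dict to the model. The function `load_kitem_dict` below is a hypothetical helper, not the proposed `import_kitem`. It stops at the dict, because constructing and committing a real KItem depends on the SDK.

```python
import json
import tempfile
from pathlib import Path


def load_kitem_dict(path: str) -> dict:
    """Read a JSON export back into a plain dict (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


original = {"name": "MD_data_epoxy_resin", "ktype_id": "dataset"}

# Write a small export to a temporary file, then read it back
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "kitem.json"
    target.write_text(json.dumps(original), encoding="utf-8")
    loaded = load_kitem_dict(str(target))
```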

@MBueschelberger MBueschelberger added the 📈 enhancement New feature or request label Dec 19, 2024
@MBueschelberger MBueschelberger changed the title KType/KItem serialization KType/KItem export/import Dec 19, 2024
@MBueschelberger MBueschelberger transferred this issue from MI-FraunhoferIWM/data2rdf Dec 19, 2024