cognitive-systems-lab/pyNEEM

pyneem

Python interface for NEEMhub and OpenEASE.

Integration Diagram

Neem-Structure

ToDo PyNeem

ToDo NEEM-Convertor

  • Get access to MongoDB
  • Get a hidden test dataset in OpenEASE
  • Which backend for build-hook execution?

Query procedure

  1. Query OpenEASE -> event IDs + time intervals
  2. Get the trial IDs (top-level NEEMs) for the event IDs
  3. Filter files by data properties and metadata
  4. Download the needed files
  5. Create the dataset: cut out the dataset snippets
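The steps above can be sketched as a small pipeline. This is a minimal sketch: the OpenEASE client and the file listing are stubbed, and every function and field name here is an assumption, not the real API.

```python
# Sketch of the query procedure above; query_openease is stubbed and all
# function/field names are assumptions, not the real pyNEEM API.

def query_openease(query):
    """Step 1: would send `query` to OpenEASE; stubbed with fixed results."""
    return [
        {"event_id": "ev1", "interval": (2.0, 5.5)},
        {"event_id": "ev2", "interval": (10.0, 12.0)},
    ]

def trial_ids_for(events):
    """Step 2: map event IDs to trial IDs (top-level NEEMs); stubbed."""
    return {e["event_id"]: "trial-" + e["event_id"] for e in events}

def filter_files(files, required_property):
    """Step 3: keep files whose metadata carries the required data property."""
    return [f for f in files if required_property in f["properties"]]

def cut_snippet(samples, interval):
    """Step 5: cut (timestamp, value) samples down to one event's interval."""
    start, end = interval
    return [(t, v) for t, v in samples if start <= t <= end]

events = query_openease("holds(Event, ...)")          # placeholder query
trials = trial_ids_for(events)
files = [{"name": "tf.json", "properties": ["tf"]},
         {"name": "notes.txt", "properties": []}]
needed = filter_files(files, "tf")                    # step 4 would download these
samples = [(1.0, "a"), (3.0, "b"), (11.0, "c")]
snippets = {e["event_id"]: cut_snippet(samples, e["interval"]) for e in events}
print(needed, snippets)
```

The snippet-cutting at the end is the only step that touches actual data; steps 1-4 only narrow down which files have to be pulled.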

DVC installation instructions

  1. Install Hadoop: https://linuxconfig.org/ubuntu-20-04-hadoop
  2. Make sure you added the exports for the Hadoop path to your local user's .bashrc (otherwise only the hadoop user can pull NEEMs with dvc)
  3. pip3 install dvc[hdfs]
  4. git clone https://neemgit.informatik.uni-bremen.de/neems/ease-2020-pr2-setting-up-table
  5. Go to the new folder and run dvc pull
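Steps 2-5 above collected into one script; the Hadoop install itself follows the linked guide and is not repeated here, and the HADOOP_HOME path is an assumption that depends on where the guide placed Hadoop.

```shell
# Hadoop binaries must be on PATH for every user that pulls NEEMs,
# not just the hadoop user -- hence the exports in ~/.bashrc:
echo 'export HADOOP_HOME=/opt/hadoop' >> ~/.bashrc   # install path is an assumption
echo 'export PATH=$PATH:$HADOOP_HOME/bin' >> ~/.bashrc
source ~/.bashrc

pip3 install 'dvc[hdfs]'

git clone https://neemgit.informatik.uni-bremen.de/neems/ease-2020-pr2-setting-up-table
cd ease-2020-pr2-setting-up-table
dvc pull
```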

NeemHub synchronize

Synchronization encapsulates the git and dvc operations. It either clones the remote repository or fetches and merges it, as sketched below:

if remote does not exist:
    FAIL

if local exists:
    if local is a git repo:
        fetch remote data from neemgit
    else:
        clone remote data
    if conflicts: FAIL
    else: merge
    if local changes:
        if not a dvc repo:
            dvc init
        dvc add all
        dvc push all
        git commit -am
        git push
else:
    clone neemgit
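The decision tree above can be written as a pure planning function: given the state of the remote and the local working copy, it returns the list of operations to run (or a failure). Names are illustrative; the real implementation would shell out to git and dvc instead of returning strings.

```python
# The NeemHub synchronize decision tree as a pure function, for testing the
# control flow separately from the git/dvc calls. All names are illustrative.

def plan_sync(remote_exists, local_exists, is_git_repo,
              has_conflicts, has_local_changes, is_dvc_repo):
    if not remote_exists:
        return ["FAIL: remote does not exist"]
    if not local_exists:
        return ["git clone neemgit"]
    # local checkout present: fetch if it is already a git repo, else clone
    ops = ["git fetch neemgit" if is_git_repo else "clone remote data"]
    if has_conflicts:
        return ops + ["FAIL: merge conflicts"]
    ops.append("merge")
    if has_local_changes:
        if not is_dvc_repo:
            ops.append("dvc init")
        ops += ["dvc add all", "dvc push all", "git commit -am", "git push"]
    return ops

# Example: existing git+dvc checkout, local changes, no conflicts.
print(plan_sync(True, True, True, False, True, True))
```

Keeping the branching in a side-effect-free function like this makes the FAIL paths easy to unit-test without a repository on disk.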

NeemHub Download

  1. Create the query to OpenEASE + specify the datasets
  2. Transform the query into a file-requesting query
  3. For each dataset: submit the query to OpenEASE
  4. For each dataset in the reply: check out the dvc files
  5. For each file URI in the query: pull the file from NEEM-Hub
  6. Transform the query result into a proper format (possibly csv or pd.DataFrame)
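Step 6 can be sketched with the standard library alone: turning a query reply (a list of dicts) into CSV text. A pd.DataFrame would be built the same way via pd.DataFrame(rows); the field names below are made up for the example.

```python
# Step 6 of the download procedure: serialize a query reply to CSV.
# The row fields (event_id, file_uri) are assumptions for the example.
import csv
import io

def result_to_csv(rows):
    """Serialize query-result rows (a non-empty list of dicts) to a CSV string."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

rows = [
    {"event_id": "ev1", "file_uri": "hdfs://neemhub/trial-a/tf.json"},
    {"event_id": "ev2", "file_uri": "hdfs://neemhub/trial-b/tf.json"},
]
print(result_to_csv(rows))
```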

NeemHub Upload

  1. Check which files have been changed since the last file pull
  2. For the changed files: is there a newer version online? If yes, cancel the upload; the conflict needs to be resolved manually
  3. Commit the changed data into the local HDFS
  4. Create new dvc files
  5. Push the upload to the remote NeemHub HDFS
  6. Push into NeemGit
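The guard in steps 1-2 above can be sketched as a pure check: compare the locally changed files against the remote versions and cancel if the remote has moved on. Plain version numbers stand in for whatever dvc/git actually compares, and all names are assumptions.

```python
# Upload guard from steps 1-2: cancel when any locally changed file has a
# newer version on the remote. Version integers are a stand-in for the real
# dvc/git comparison; function and op names are illustrative.

def plan_upload(changed_files, local_versions, remote_versions):
    """Return the upload ops, or a cancel message when a remote file is newer."""
    stale = [f for f in changed_files
             if remote_versions.get(f, 0) > local_versions[f]]
    if stale:
        return ["CANCEL: resolve manually: " + ", ".join(stale)]
    return ["commit changed data into local HDFS",
            "create new dvc files",
            "push upload to remote NeemHub HDFS",
            "push into NeemGit"]

print(plan_upload(["tf.json"], {"tf.json": 3}, {"tf.json": 3}))  # clean upload
print(plan_upload(["tf.json"], {"tf.json": 3}, {"tf.json": 4}))  # conflict
```

Failing early here is what keeps steps 3-6 from ever producing a diverged remote that would need a manual merge afterwards.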

Query result for dataset-builder

Table: filename

Semantic mapping between ELAN-Annotations and SOMA

  • The associated research field seems to be ontology alignment / semantic integration
  • The term ontology mapping also exists; it is not clearly defined, but seems to fit better in this context, since we are not mapping ontology-to-ontology but (annotation schema)-to-ontology
  • There are three kinds of properties in an ELAN file: fully mapped properties, ignored properties (not mapped), and undefined properties
  • By converting ELAN to a DataFrame, we obtain a relational representation
  • The most promising candidates at the moment are:
    • rdflib.csv2rdf → not documented and therefore obscure
    • rdfpandas → not powerful enough
    • pyTARQL, where TARQL is SPARQL for tables
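The three property kinds above can be sketched as a hand-written mapping table from ELAN annotation-schema properties to SOMA terms, plus an explicit ignore list; everything else counts as undefined. The SOMA IRIs and the property names are illustrative, not the project's actual mapping.

```python
# Sketch of the three ELAN property kinds: mapped, ignored, undefined.
# The SOMA IRI, class names, and ELAN property names are illustrative only.

SOMA = "http://www.ease-crc.org/ont/SOMA.owl#"

MAPPED = {                      # fully mapped properties -> SOMA terms
    "grasp": SOMA + "Grasping",
    "pour": SOMA + "Pouring",
}
IGNORED = {"comment"}           # properties deliberately not mapped

def classify(prop):
    """Classify an ELAN property as mapped (with its SOMA term), ignored, or undefined."""
    if prop in MAPPED:
        return ("mapped", MAPPED[prop])
    if prop in IGNORED:
        return ("ignored", None)
    return ("undefined", None)

for p in ["grasp", "comment", "wiggle"]:
    print(p, classify(p))
```

Reporting the "undefined" bucket explicitly is what makes gaps in the mapping visible instead of silently dropping annotations.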
