Python interface for NEEMhub and OpenEASE.
- tutorial / shell script for installing dependencies on Debian-based systems
- Which steps are necessary for Automatic Data Download?
- Which steps are necessary for Automatic Data Upload?
- How to use the DVC Python API? See https://dvc.org/doc/api-reference
- Look at: https://gitpython.readthedocs.io/en/stable/
- Define an interface for the dataset creator!
- Add file meta-data to the ontology / triple store
- Get access to MongoDB
- Get a hidden test dataset in OpenEASE
- Which backend for build-hook execution?
- query to OpenEASE -> event-ids + time-intervals
- get trial-ids (top-level NEEM) for the event-ids
- filter files by data properties and meta-data
- download the needed files
- create dataset: cut dataset snippets (a skeleton of this pipeline is sketched below)
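The pipeline above could hang together roughly as follows. This is a minimal skeleton only: every class and function in it is a hypothetical placeholder for the interface the dataset creator still has to define, not an existing NEEMhub or OpenEASE API.

```python
"""Hypothetical skeleton of the dataset-creation pipeline; all names are
placeholders for the interface that still needs to be defined."""
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EventSelection:
    trial_id: str                  # top-level NEEM the event belongs to
    event_id: str
    interval: Tuple[float, float]  # (start, end) of the event


def query_openease(query: str) -> List[EventSelection]:
    """Query OpenEASE for event-ids + time-intervals and resolve their
    trial-ids (stub)."""
    raise NotImplementedError


def filter_files(selections: List[EventSelection]) -> List[str]:
    """Filter candidate files by data properties and meta-data (stub)."""
    raise NotImplementedError


def download_files(file_uris: List[str]) -> List[str]:
    """Download only the needed files, return local paths (stub)."""
    raise NotImplementedError


def create_dataset(query: str) -> List[str]:
    """query -> trial-ids -> filter -> download -> cut snippets."""
    selections = query_openease(query)
    local_paths = download_files(filter_files(selections))
    # cutting the snippets out of the downloaded files is format-specific
    # and left open here
    return local_paths
```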
- Install hadoop: https://linuxconfig.org/ubuntu-20-04-hadoop
- make sure you added the exports for the Hadoop path to your local user's .bashrc (otherwise only the hadoop user can pull NEEMs with dvc)
- pip3 install dvc[hdfs]
- git clone https://neemgit.informatik.uni-bremen.de/neems/ease-2020-pr2-setting-up-table
- go to the new folder and run dvc pull
Synchronization encapsulates the git and dvc operations. It either clones the remote repository or synchronizes the existing local copy with it:
    if remote not exists:
        FAIL
    if local exists:
        if is git-repo:
            fetch remote data from neemgit
        else:
            clone remote data
        if conflicts: FAIL
        else: merge
        if local changes:
            if not dvc repo:
                dvc init
            dvc add all
            dvc push all
            git commit -am
            git push
    else:
        clone neemgit
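A minimal sketch of that logic, assuming GitPython (https://gitpython.readthedocs.io/en/stable/) for the git side and the dvc CLI via subprocess; the repository URL, local path, branch name master and the list of data files are placeholder assumptions:

```python
"""Sketch of the synchronization pseudocode above; URL, path, branch and
data-file list are placeholders."""
import os
import subprocess

import git


def dvc(path, *args):
    """Run a dvc command inside the local repository."""
    subprocess.run(["dvc", *args], cwd=path, check=True)


def synchronize(remote_url, local_path, data_files):
    if not os.path.exists(local_path):
        git.Repo.clone_from(remote_url, local_path)  # else: clone neemgit
        return

    try:
        repo = git.Repo(local_path)                  # if is git-repo
    except git.exc.InvalidGitRepositoryError:
        # local folder exists but is no git repo -> clone remote data
        # (in practice the old folder has to be moved aside first)
        repo = git.Repo.clone_from(remote_url, local_path)

    repo.remotes.origin.fetch()
    try:
        repo.git.merge("origin/master")              # if conflicts: FAIL
    except git.exc.GitCommandError as err:
        raise RuntimeError("merge conflict, resolve manually") from err

    if repo.is_dirty(untracked_files=True):          # if local changes
        if not os.path.isdir(os.path.join(local_path, ".dvc")):
            dvc(local_path, "init")                  # dvc init
        dvc(local_path, "add", *data_files)          # dvc add all
        dvc(local_path, "push")                      # dvc push all
        repo.git.add(A=True)
        repo.index.commit("synchronize NEEM data")   # git commit -am
        repo.remotes.origin.push()                   # git push
```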
- Create a query to OpenEASE + specify the datasets
- Transform the query into a file-requesting query
- For each dataset: submit the query to OpenEASE
- For each dataset in the reply: checkout the dvc-files
- For each file-URI in the query: pull the file from NEEM-Hub
- Transform the query result into a proper format (possibly CSV or pd.DataFrame); see the sketch below
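For the checkout/pull steps, the DVC Python API documented at https://dvc.org/doc/api-reference can fetch single dvc-tracked files without a full checkout. A minimal sketch, assuming a placeholder repository URL and that the requested files are CSV:

```python
"""Sketch of pulling single files with the DVC Python API (dvc.api);
the repository URL is a placeholder."""
import dvc.api
import pandas as pd

NEEM_REPO = "https://neemgit.informatik.uni-bremen.de/neems/<neem>"  # placeholder


def pull_file(file_uri, rev=None):
    """Read one dvc-tracked file from the NEEM-Hub remote, at an optional
    git revision, without cloning the whole repository."""
    return dvc.api.read(file_uri, repo=NEEM_REPO, rev=rev, mode="rb")


def reply_to_dataframes(file_uris):
    """Transform the query reply into pd.DataFrames (assuming CSV files)."""
    frames = {}
    for uri in file_uris:
        with dvc.api.open(uri, repo=NEEM_REPO) as f:
            frames[uri] = pd.read_csv(f)
    return frames
```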
- Check which files have been changed since the last file pull
- For these files: is there a newer version online? If yes, cancel the upload; the conflict needs to be solved manually
- Commit the changed data into the local HDFS
- Create new dvc-files
- Push the upload to the remote NEEM-Hub HDFS
- Push into NeemGit (sketched below)
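A sketch of the upload guard and push sequence, again with GitPython plus the dvc CLI; the branch name master and the commit message are assumptions:

```python
"""Sketch of the upload workflow: abort if the remote moved ahead,
otherwise refresh the dvc-files, push the data to the NEEM-Hub HDFS
remote, and push the metadata into NeemGit."""
import subprocess

import git


def upload(local_path):
    repo = git.Repo(local_path)
    repo.remotes.origin.fetch()

    # newer version online? -> cancel; conflict must be solved manually
    if list(repo.iter_commits("HEAD..origin/master")):
        raise RuntimeError("remote has newer commits; resolve manually")

    # files changed since the last pull (working tree vs. index)
    changed = [d.a_path for d in repo.index.diff(None)]
    if not changed:
        return

    # refresh the dvc-files for the changed data, then push to HDFS
    subprocess.run(["dvc", "commit"], cwd=local_path, check=True)
    subprocess.run(["dvc", "push"], cwd=local_path, check=True)

    # push the updated dvc-files into NeemGit
    repo.git.add(A=True)
    repo.index.commit("update NEEM data")
    repo.remotes.origin.push()
```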
Table: filename
- The associated research field seems to be ontology alignment / semantic integration
- The term "ontology mapping" also exists; it is not clearly defined, but seems to fit better in this context, since we are not mapping onto-to-onto but (annotation-schema)-to-onto
- there are three kinds of properties in an ELAN file: fully mapped properties, ignored properties (not mapped), and undefined properties
- by converting ELAN to a DataFrame, we get a relational representation
- the most promising candidates at the moment are (see the sketch after this list for the mapping itself):
  - rdflib.csv2rdf → not documented and therefore obscure
  - rdfpandas → not powerful enough
  - pyTARQL, where TARQL is SPARQL for tables
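Whatever tool is chosen, the core of the (annotation-schema)-to-onto step on the relational representation is small. A minimal sketch with plain rdflib and pandas, where the namespace, the column names, and the column-to-property mapping are made-up examples:

```python
"""Sketch of mapping an ELAN-derived DataFrame to RDF triples with plain
rdflib; namespace, columns and the mapping are illustrative assumptions."""
import pandas as pd
from rdflib import RDF, Graph, Literal, Namespace

EX = Namespace("http://example.org/neem#")  # placeholder namespace

# fully mapped properties: annotation-schema column -> ontology property;
# ignored properties are simply absent from this mapping
COLUMN_TO_PROPERTY = {
    "tier": EX.tierName,
    "start": EX.startTime,
    "end": EX.endTime,
    "value": EX.annotationValue,
}


def dataframe_to_graph(df: pd.DataFrame) -> Graph:
    g = Graph()
    g.bind("ex", EX)
    for i, row in df.iterrows():
        subject = EX[f"annotation_{i}"]
        g.add((subject, RDF.type, EX.Annotation))
        for column, prop in COLUMN_TO_PROPERTY.items():
            if column in df.columns:  # undefined columns stay unmapped
                g.add((subject, prop, Literal(row[column])))
    return g


if __name__ == "__main__":
    df = pd.DataFrame({"tier": ["speech"], "start": [0.0],
                       "end": [1.2], "value": ["hello"]})
    print(dataframe_to_graph(df).serialize(format="turtle"))
```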