A repository of raw datasets of the Enggano flora and fauna lexicon as part of the AHRC-funded research titled Lexical Resources for Enggano, A Threatened Language of Indonesia (AH/W007290/1).

engganolang/flora-fauna-lexicon

Enggano Flora and Fauna Lexicon

Dendi Wijaya, Gede Primahadi Wijaya Rajeg, Engga Zakaria Sangian

University of Oxford; Faculty of Linguistics, Philology and Phonetics, University of Oxford; Arts and Humanities Research Council (AHRC)
This work is part of the AHRC-funded project on the lexical resources for Enggano, led by the Faculty of Linguistics, Philology and Phonetics at the University of Oxford, UK. Visit the central webpage of the Enggano project.

Enggano Flora and Fauna Lexicon by Dendi Wijaya, Gede Primahadi W. Rajeg, and Engga Zakaria Sangian is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Overview

If you use the data from this repository (Wijaya et al., 2024), please cite as follows:

Wijaya, D., Rajeg, G. P. W., & Sangian, E. Z. (2024). Enggano Flora and Fauna Lexicon (Version 1). University of Oxford. Dataset. https://doi.org/10.25446/oxford.28091270.v1

This repository holds the annotated databases for the Enggano Flora and Fauna Lexicon. The databases were originally stored as Google Sheets spreadsheets for collaboration. The R code in this repository accesses and processes them with several R packages (Bryan, 2023; Cysouw, 2018; D’Agostino McGowan & Bryan, 2023; Moran & Cysouw, 2018; Ooms, 2023; Wickham et al., 2019; Wickham & Bryan, 2023).

The processing includes creating an orthography profile, tokenising (segmenting) the phonemic transcription, and, most importantly, linking the Enggano forms to their corresponding pictures by ID for use in the Contemporary Enggano Dictionary, which is also processed using R.
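The two processing steps described above can be illustrated with a minimal sketch. It is not the repository's R code (which relies on the packages cited above); it is a language-neutral illustration written in Python. The function names `tokenize` and `link_photos`, the toy grapheme list, the placeholder symbol for unmatched characters, and the ID and filename patterns are all assumptions made for illustration, and the sample string is invented rather than an Enggano form.

```python
def tokenize(form, graphemes, missing="\u2047"):
    """Segment `form` by greedy longest-match against a list of graphemes.

    `graphemes` stands in for an orthography profile: the inventory of
    (possibly multi-character) units that a transcription may contain.
    Characters not covered by the profile are replaced by `missing`.
    """
    # Try longer graphemes first so that "aː" wins over plain "a".
    ordered = sorted(graphemes, key=len, reverse=True)
    segments = []
    i = 0
    while i < len(form):
        for grapheme in ordered:
            if form.startswith(grapheme, i):
                segments.append(grapheme)
                i += len(grapheme)
                break
        else:
            # No grapheme matched at this position: flag it and move on.
            segments.append(missing)
            i += 1
    return " ".join(segments)


def link_photos(form_ids, photo_filenames):
    """Map each form ID to the photo whose filename (minus extension) equals it.

    Forms without a matching photo map to None. If several files share
    the same stem, the first one listed is kept.
    """
    by_stem = {}
    for name in photo_filenames:
        stem = name.rsplit(".", 1)[0]
        by_stem.setdefault(stem, name)
    return {form_id: by_stem.get(form_id) for form_id in form_ids}


# Invented toy profile and string (hypothetical, not Enggano data).
toy_profile = ["a", "b", "k", "ʔ", "aː"]
print(tokenize("kaːʔab", toy_profile))  # k aː ʔ a b

# Hypothetical IDs and filenames.
print(link_photos(["ff_001", "ff_002"], ["ff_001.jpg", "notes.txt"]))
# {'ff_001': 'ff_001.jpg', 'ff_002': None}
```

Flagging unmatched characters rather than silently dropping them makes transcription errors visible in the segmented output, which is useful when the segmentation is also used to find and fix errors.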

The databases consist of four different file types: .rds (R data file), .csv, .tsv, and .xlsx.
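The two plain-text formats can be read with almost any tool. As a hedged sketch (again in Python, not the repository's own R code), the helper below picks the delimiter from the file extension. The function name `read_table` and the example filename are hypothetical, and UTF-8 encoding is an assumption.

```python
import csv
from pathlib import Path

# Delimiter for each plain-text format listed above.
DELIMITERS = {".csv": ",", ".tsv": "\t"}


def read_table(path):
    """Read a .csv or .tsv file into a list of dicts keyed by column name."""
    path = Path(path)
    try:
        delimiter = DELIMITERS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"unsupported file type: {path.suffix}") from None
    # newline="" lets the csv module handle line endings itself;
    # UTF-8 is assumed so that IPA characters are read correctly.
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))
```

For example, `read_table("lexicon.tsv")` (a hypothetical filename) would return one dict per row. The .rds and .xlsx files need format-specific readers and are not covered by this sketch.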


Dendi Wijaya gathered the primary data in October 2023 and November 2024; transcribed the forms; translated them into Indonesian and English; provided the IPA transcription; and renamed the photos according to the ID of the forms.

Gede Primahadi W. Rajeg (GPWR) checked which items were already in the contemporary Enggano FLEx databases and which ones to exclude from the main dictionary databases (e.g., due to duplication), in consultation with Engga Zakaria Sangian. GPWR also manually annotated the main entry variable of the forms so that complex forms can be subsumed under, and linked to, their main/root entry in the dictionary; annotated the crossref. column; performed the segmentation of the IPA transcription (and fixed errors); linked the form IDs with the photos by filename; and managed this GitHub repository for archiving.

Engga Zakaria Sangian was consulted in a number of meetings for the verification of orthography and inclusion of the forms.

References

Bryan, J. (2023). googlesheets4: Access Google Sheets using the Sheets API V4 (Version 1.1.1) [Computer software]. https://CRAN.R-project.org/package=googlesheets4

Cysouw, M. (2018). qlcData: Processing data for quantitative language comparison (Version 0.2.1) [Computer software]. https://cran.r-project.org/web/packages/qlcData/index.html

D’Agostino McGowan, L., & Bryan, J. (2023). googledrive: An interface to Google Drive (Version 2.1.1) [Computer software]. https://CRAN.R-project.org/package=googledrive

Moran, S., & Cysouw, M. (2018). The Unicode cookbook for linguists: Managing writing systems using orthography profiles. Language Science Press. https://doi.org/10.5281/zenodo.1296780

Ooms, J. (2023). writexl: Export data frames to Excel ’xlsx’ format (Version 1.4.2) [Computer software]. https://CRAN.R-project.org/package=writexl

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T., Miller, E., Bache, S., Müller, K., Ooms, J., Robinson, D., Seidel, D., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H., & Bryan, J. (2023). readxl: Read Excel files (Version 1.4.3) [Computer software]. https://CRAN.R-project.org/package=readxl

Wijaya, D., Rajeg, G. P. W., & Sangian, E. Z. (2024). Enggano Flora and Fauna Lexicon (Version 1) [Dataset]. University of Oxford. https://doi.org/10.25446/oxford.28091270.v1
