While the Glottolog CLDF dataset does not contain all of Glottolog's reference data (to keep it of reasonable size), we can still investigate all grammars which are considered MED (i.e. Most Extensive Description) for a language.
Since this kind of query requires joining information for three different entities (namely languages, references and parameters), it is best done using SQLite.
To piece together our query, we first take a peek at the relevant tables:
The Glottolog CLDF dataset provides data for the following parameters:
sqlite> .headers on
sqlite> select * from ParameterTable;
cldf_id|cldf_name|cldf_description|type
level|Level||categorical
category|Category||categorical
classification|Classification||
subclassification|Subclassification||
aes|Agglomerated Endangerment Status||sequential
med|Most Extensive Description||sequential
What we are interested in is the "Most Extensive Description" for each language. Using the cldf_id
as foreign key value we can lookup the codes that
are defined for this parameter:
sqlite> select * from CodeTable where cldf_parameterreference = 'med';
cldf_id|cldf_parameterReference|cldf_name|cldf_description|numerical_value
med-long_grammar|med|long grammar|Grammar with more than 300 pages|0
med-grammar|med|grammar|Grammar with less than 300 pages|1
med-grammar_sketch|med|grammar sketch|Grammar sketch|2
med-phonology_or_text|med|phonology/text|New Testament, Text, Phonology, (typological) Study Of A Specific Feature or Dictionary|3
med-wordlist_or_less|med|Wordlist or less|Wordlist or less|4
sqlite> .schema LanguageTable
CREATE TABLE `LanguageTable` (
`cldf_id` TEXT NOT NULL,
`cldf_name` TEXT,
`cldf_macroarea` TEXT,
`cldf_latitude` REAL CHECK (`cldf_latitude` >= -90 AND `cldf_latitude` <= 90),
`cldf_longitude` REAL CHECK (`cldf_longitude` >= -180 AND `cldf_longitude` <= 180),
`cldf_glottocode` TEXT,
`cldf_iso639P3code` TEXT,
`Countries` TEXT,
`Family_ID` TEXT,
`cldf_languageReference` TEXT,
PRIMARY KEY(`cldf_id`),
FOREIGN KEY(`Family_ID`) REFERENCES `LanguageTable`(`cldf_id`) ON DELETE CASCADE,
FOREIGN KEY(`cldf_languageReference`) REFERENCES `LanguageTable`(`cldf_id`) ON DELETE CASCADE
);
sqlite> .schema SourceTable
CREATE TABLE `SourceTable` (
`id` TEXT,
...
`author` TEXT,
...
`title` TEXT,
...
`year` TEXT,
...
PRIMARY KEY(`id`)
);
These three entities are tied together in a CLDF StructureDataset through
the ValueTable
:
sqlite> .schema valuetable
CREATE TABLE `ValueTable` (
`cldf_id` TEXT NOT NULL,
`cldf_languageReference` TEXT NOT NULL,
`cldf_parameterReference` TEXT NOT NULL,
`cldf_value` TEXT,
`cldf_codeReference` TEXT,
`cldf_comment` TEXT,
`codeReference` TEXT,
PRIMARY KEY(`cldf_id`),
FOREIGN KEY(`cldf_parameterReference`) REFERENCES `ParameterTable`(`cldf_id`) ON DELETE CASCADE,
FOREIGN KEY(`cldf_codeReference`) REFERENCES `CodeTable`(`cldf_id`) ON DELETE CASCADE,
FOREIGN KEY(`cldf_languageReference`) REFERENCES `LanguageTable`(`cldf_id`) ON DELETE CASCADE
);
and a many-to-many relation between values and sources:
sqlite> .schema valuetable_sourcetable
CREATE TABLE `ValueTable_SourceTable` (
`ValueTable_cldf_id` TEXT,
`SourceTable_id` TEXT,
`context` TEXT,
FOREIGN KEY(`ValueTable_cldf_id`) REFERENCES `ValueTable`(`cldf_id`) ON DELETE CASCADE,
FOREIGN KEY(`SourceTable_id`) REFERENCES `SourceTable`(`id`) ON DELETE CASCADE
);
Putting it all together, we can query
select
l.cldf_name, l.cldf_glottocode, l.cldf_iso639P3code, l.cldf_latitude, l.cldf_longitude,
v.cldf_value,
s.author, s.title, cast(s.year as int) as year, s.id
from
valuetable as v,
valuetable_sourcetable as vs,
sourcetable as s,
languagetable as l
where
v.cldf_id = vs.valuetable_cldf_id and
vs.sourcetable_id = s.id and
v.cldf_parameterreference = 'med' and
v.cldf_languagereference = l.cldf_id and
(v.cldf_value = 'long grammar' or v.cldf_value = 'grammar') and
cast(s.year as int) > 2015
order by year desc;
and get results looking like this
cldf_name | cldf_glottocode | cldf_iso639P3code | cldf_latitude | cldf_longitude | cldf_value | author | title | year | id |
---|---|---|---|---|---|---|---|---|---|
Kuy | kuyy1240 | kdt | 14.6698 | 104.911 | long grammar | Chareanraat, Chirat | The Kuay language of Suphanburi | 2525 | hh:g:Chareanraat:Kuay |
Patwin | patw1250 | pwi | 39.2143 | -122.0094 | long grammar | Lawyer, Lewis C. | A Grammar of Patwin | 2021 | hh:g:Lawyer:Patwin |
Min Nan Chinese | minn1241 | nan | 24.5 | 118.17 | long grammar | Chen, Weirong | A Grammar of Southern Min: The Hui’an Dialect | 2020 | hh:g:Chen:Huian:2020 |
Seeku | seek1238 | sos | 11.1285 | -4.58203 | long grammar | McPherson, Laura | A grammar of Seenku | 2020 | hh:g:McPherson:Seenku |
You can use the reference ID (from the last column) to lookup details on the web at https://glottolog.org/resource/reference/id/<ID>
, e.g. https://glottolog.org/resource/reference/id/hh:g:Chareanraat:Kuay
and as always with SQLite, you can output the results to a CSV file via
sqlite> .headers on
sqlite> .mode csv
sqlite> .output recent_grammars.csv
sqlite> select
...> l.cldf_name, l.cldf_glottocode, l.cldf_iso639P3code, l.cldf_latitude, l.cldf_longitude,
...> v.cldf_value,
...> s.author, s.title, cast(s.year as int) as year, s.id
...> from
...> valuetable as v,
...> valuetable_sourcetable as vs,
...> sourcetable as s,
...> languagetable as l
...> where
...> v.cldf_id = vs.valuetable_cldf_id and
...> vs.sourcetable_id = s.id and
...> v.cldf_parameterreference = 'med' and
...> v.cldf_languagereference = l.cldf_id and
...> (v.cldf_value = 'long grammar' or v.cldf_value = 'grammar') and
...> cast(s.year as int) > 2015
...> order by year desc;