Releases · poseidon-framework/poseidon-hs

13 Jul 15:22

v1.2.1.0

2025835

Release v1.2.1.0

This release does not include changes for trident end users.

It adds two new subcommands for (public) archive management, but they are only relevant from a developer's perspective: chronicle creates or updates a dedicated .yml file to document version iterations of Poseidon packages in a Git-managed archive, and timetravel recovers package iterations based on this file to (re)construct said archive from the source repository.

Just like serve, both new subcommands are omitted from the command line help.

08 Jun 15:15

github-actions

v1.2.0.0

e1b14a5

Release v1.2.0.0

This release comes with a massive rewrite of the server-client infrastructure, that is, the code behind the API to download and list packages from our public data archives.

The server is now implemented as a (hidden) subcommand of trident: serve. It returns helpful error messages if an incompatible version of trident tries to connect to it. It is also now capable of serving multiple versions of one package (not just the latest, but also older ones), which is an important step towards computational reproducibility of Poseidon-based pipelines.

All server APIs except for zip_file now return a complex JSON datatype with server messages and a payload. The server messages include standard messages like a greeting and, in the future, perhaps also deprecation warnings. Some APIs also provide information or warnings in the server messages.

All APIs except for zip_file also accept an additional parameter ?client_version=X.X.X, so that the server may in the future respond to old clients that an update is needed in order to understand the API. Our trident list --remote functionality already makes use of this.

Here are the individual APIs:

https://server.poseidon-adna.org/packages: returns a list of all packages.
https://server.poseidon-adna.org/groups: returns a list of all groups.
https://server.poseidon-adna.org/individuals: returns a list of all individuals.
https://server.poseidon-adna.org/zip_file/<package_name>?package_version=1.0.1: returns a zip file of the package with the given name and the given version. If no version is given, it returns the latest.
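
As a rough illustration of how a client could address these endpoints, the following Python sketch only builds the request URLs and sends nothing over the network. The helper names list_url and zip_file_url are made up for this example and are not part of trident or the server.

```python
from urllib.parse import quote, urlencode

# Base URL as given in these release notes.
BASE_URL = "https://server.poseidon-adna.org"


def list_url(entity, client_version=None):
    """Build the URL for one of the listing APIs (hypothetical helper)."""
    if entity not in ("packages", "groups", "individuals"):
        raise ValueError(f"unknown listing API: {entity}")
    url = f"{BASE_URL}/{entity}"
    if client_version is not None:
        # Optional parameter accepted by all APIs except zip_file.
        url += "?" + urlencode({"client_version": client_version})
    return url


def zip_file_url(package_name, package_version=None):
    """Build the zip_file URL; without a version the server returns the latest."""
    url = f"{BASE_URL}/zip_file/{quote(package_name, safe='')}"
    if package_version is not None:
        url += "?" + urlencode({"package_version": package_version})
    return url


print(list_url("packages", client_version="1.2.0.0"))
print(zip_file_url("2010_RasmussenNature", package_version="2.0.1"))
```

The structure of the returned JSON (server messages plus payload) is not spelled out here, so the sketch deliberately stops at URL construction.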

The client subcommands fetch and list cannot yet make full use of this new API in this release, because they lack an interface to request specific package versions. This will be added in a future release. But the output of both subcommands already differs from the previous implementation:

fetch now appends the package version to the directory name when downloading a package. Previously trident fetch -d . -f "*2010_RasmussenNature*" yielded a directory named 2010_RasmussenNature with the package data, but now it creates 2010_RasmussenNature-2.0.1 (or whatever is the latest version of this package in the archive).
fetch no longer has an option --upgrade, since the new behaviour allows different versions to live side by side in different directories. If users wish to remove old versions, they should do so manually.
list now lists data (packages, groups, individuals) not just for the latest version of a package, but for all versions. The package version is now an explicit output column.
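
Since old versions now have to be removed manually, it can help to see which versions of a package sit side by side in a download directory. The sketch below assumes only the <name>-<version> directory naming described above. The function names and the version 1.0.0 in the usage example are hypothetical.

```python
import re
from collections import defaultdict

# Directory names of the form <package name>-<version>, e.g. 2010_RasmussenNature-2.0.1
VERSIONED_DIR = re.compile(r"^(?P<name>.+)-(?P<version>[0-9]+(?:\.[0-9]+)*)$")


def versioned_dir_name(package_name, package_version):
    """Directory name in the style fetch now produces."""
    return f"{package_name}-{package_version}"


def group_by_package(dir_names):
    """Map each package name to its versions, sorted from oldest to newest."""
    found = defaultdict(list)
    for dir_name in dir_names:
        match = VERSIONED_DIR.match(dir_name)
        if match:  # names without a version suffix are skipped
            found[match["name"]].append(match["version"])

    def version_key(version):
        return tuple(int(part) for part in version.split("."))

    return {name: sorted(versions, key=version_key) for name, versions in found.items()}


dirs = [versioned_dir_name("2010_RasmussenNature", "2.0.1"), "2010_RasmussenNature-1.0.0", "notes"]
print(group_by_package(dirs))
```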

As before, forge and the other subcommands keep ignoring multiple package versions for now, and only read the latest.

The new server is available at a new URL (https://server.poseidon-adna.org), but the old version at https://c107-224.cloud.gwdg.de will also keep running for now. New releases of trident (v1.2.0.0+) will by default use the new server, while older versions must connect to the old one.

09 May 16:40

github-actions

v1.1.12.0

43050cd

Release v1.1.12.0

This release implements the changes necessary for the Poseidon schema v2.7.1. That mostly means that the constraints on several .ssf file columns previously considered mandatory and unique were lifted.

Beyond that, a number of type constraints already specified in Poseidon v2.7.0 for the .ssf file were finally implemented in poseidon-hs. A broken file will thus actually be flagged upon reading if it violates the following requirements:

.ssf columns that include Accession_IDs have to feature the correct and valid Accession_IDs according to INSDC specification.
.ssf columns with dates have to be valid dates of the form YYYY-MM-DD.
.ssf columns featuring MD5 hashes require entries with exactly 32 hex-digits.
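
The date and MD5 requirements can be approximated with a few lines of Python. This is a sketch of the constraints as worded above, not the poseidon-hs implementation. Whether trident accepts upper-case hex digits is an assumption here, and the INSDC accession patterns are left out because they are not listed in these notes.

```python
import re
from datetime import date

MD5_PATTERN = re.compile(r"[0-9a-fA-F]{32}")  # exactly 32 hex digits (case handling assumed)
DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")  # YYYY-MM-DD


def is_valid_md5(value):
    """True if the entry consists of exactly 32 hexadecimal digits."""
    return MD5_PATTERN.fullmatch(value) is not None


def is_valid_date(value):
    """True if the entry has the form YYYY-MM-DD and names a real calendar day."""
    if DATE_PATTERN.fullmatch(value) is None:
        return False
    year, month, day = (int(part) for part in value.split("-"))
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


print(is_valid_date("2023-05-09"), is_valid_date("2023-02-30"))
print(is_valid_md5("d41d8cd98f00b204e9800998ecf8427e"))
```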

For both the .janno and the .ssf file we elevated the log level of the common broken lines error from debug to error. This makes these errors more prominent and easier to resolve.

03 May 13:17

github-actions

v1.1.11.4

5ccae44

Release v1.1.11.4

This release fixes a core issue in the implementation of Poseidon v2.7.0, where multiple columns of the .ssf file were not defined correctly as list columns. Poseidon v2.7.0 is in itself deprecated, though, and will be replaced as soon as possible with an updated version. This trident release thus exists mainly to provide a working implementation of 2.7.0 for future reference.

Beyond this change in functionality, this release also includes heavy refactoring in the survey subcommand, the golden test infrastructure and the version of Haskell that poseidon-hs and trident are built with. These changes should not have any user-facing consequences.

20 Mar 15:26

github-actions

v1.1.11.0

5037e27

Release v1.1.11.0

This release implements the changes necessary to make trident capable of handling packages specified for the new Poseidon standard v2.7.0:

A Poseidon package can now include a .ssf file ("sequencing source file") as specified. trident considers it in validate, update, survey and, most importantly, forge, where .ssf files are compiled for new packages just like .janno files.
trident now understands and validates the new .janno columns Country_ISO and Library_Names.
trident now knows the possible value mixed for the .janno column Library_Built.

The behaviour of trident for older package schema versions (v2.5.0 and v2.6.0) should be mostly unchanged. forge and init now return Poseidon v2.7.0 packages, though.

22 Feb 13:02

github-actions

v1.1.10.2

8751db3

Release v1.1.10.2

This release bundles a number of minor changes, new minor command line options and some internal refactoring without immediate consequences for trident.

Changes in command line options

By default validate only tests genotype data by parsing the first 100 SNPs. This limitation is necessary for performance reasons, but can hide issues outside of this tiny subset. We have now added an option --fullGeno to validate, which forces parsing of the entire .bed/.geno file.
The .fam file of Plink-formatted genotype data is used inconsistently across different popular aDNA software tools to store group/population name information. See more on this issue in our discussion here. We have now added the (global) options --inPlinkPopName and --outPlinkPopName with the arguments asFamily (default), asPhenotype and asBoth to control the reading and writing of the population name from and to Plink .fam files.
The --no-extract option for faster, package-wise data selection in forge was not working properly. We fixed it, renamed it to --packagewise and improved its command line help text.

Bugfixes

As described here, our implementation of .janno file parsing struggled with some encodings of the No-Break Space Unicode character. We have now decided to delete these characters upon reading, following the assumption that they are generally not desired in a .janno file anyway. In this process we also decided to trim all whitespace around .janno file fields.
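
The cleaning step described above can be sketched as follows. This is an illustration of the stated behaviour (delete No-Break Spaces, then trim each field), not the parser used in poseidon-hs. The function names are hypothetical.

```python
NO_BREAK_SPACE = "\u00a0"


def clean_field(field):
    """Delete all No-Break Space characters, then trim surrounding whitespace."""
    return field.replace(NO_BREAK_SPACE, "").strip()


def clean_row(line):
    """Split one tab-separated .janno line into cleaned fields."""
    return [clean_field(field) for field in line.rstrip("\n").split("\t")]


print(clean_row("XXX011\u00a0\t POP1 \tM\n"))
```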

Other changes

The -j option of list, which includes additional .janno columns in the output when used with the --individuals flag, now allows access to arbitrary additional variables.
update writes messages to the CHANGELOG file now with a prefix -, to make it proper markdown.
The verbose debug-level (with --logMode VerboseLog) warnings about missing standard columns in the .janno file were turned off.
The important "schema version mismatch" error message was made more verbose and clear.
trident now fails gracefully if one or all -d/--baseDirs do not exist.
The important "broken lines" error message in the .janno reading process now reminds users to turn on --logMode VerboseLog to get more information.

13 Jan 19:21

github-actions

v1.1.7.0

809afc1

Release v1.1.7.0

This release clarifies a long-standing uncertainty about how trident treats individual ID duplicates. It adds a new feature to the forge language to specify individuals more precisely and thus resolve duplication conflicts.

trident does not allow individuals with identical identifiers, that is, Poseidon_IDs, within one package. We generally also discourage such duplicates across packages in package collections. But there is no reason to enforce this unnecessarily for subcommands where it does not matter. Here are the rules we have now defined:

Generally, that is, in the subcommands init, fetch, genoconvert, update, list, summarise, and survey, trident logs a warning if it observes duplicates in a package collection found in the base dirs. It then proceeds normally.
Deviating from this, the special subcommand validate stops with an error if it observes duplicates. This behaviour can be changed with the new flag --ignoreDuplicates.
The forge subcommand, finally, also ignores duplicates in the base dirs, unless (!) the conflict exists within the entities in the --forgeString. In this case it stops with an informative error:

[Error]   There are duplicated individuals, but forge does not allow that
[Error]   Please specify in your --forgeString or --forgeFile:
[Error]   <Inuk.SG> -> <2010_RasmussenNature:Greenland_Saqqaq.SG:Inuk.SG>
[Error]   <Inuk.SG> -> <2011_RasmussenNature:Greenland_Saqqaq.SG:Inuk.SG>
[Error]   Error in the forge selection: Unresolved duplicated individuals

This already shows that the -f/--forgeString selection language of forge (and incidentally also fetch) includes a new syntactic element since this release: Individuals can now be described not just with <individual>, but also more specifically with <package:group:individual>. Individuals defined this way take precedence over differently defined ones (so: directly with <individual> or as a subset of *package* or group). This makes it possible to resolve duplication issues precisely, at least in cases where the duplicated individuals differ in source package or primary group.
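
To make the new syntax concrete, here is a small Python sketch that classifies forge entities and uses the specific <package:group:individual> form to pick one of several duplicated individuals. It is not the parser in poseidon-hs: the Entity type and all function names are hypothetical, the comma as entity separator is an assumption, and other elements of the selection language are ignored.

```python
from typing import NamedTuple, Optional


class Entity(NamedTuple):
    kind: str  # "package", "group", "individual" or "specific_individual"
    package: Optional[str]
    group: Optional[str]
    individual: Optional[str]


def parse_entity(token):
    """Classify a single entity of a forge selection string."""
    token = token.strip()
    if token.startswith("<") and token.endswith(">"):
        parts = token[1:-1].split(":")
        if len(parts) == 1:
            return Entity("individual", None, None, parts[0])
        if len(parts) == 3:
            return Entity("specific_individual", parts[0], parts[1], parts[2])
        raise ValueError(f"cannot parse individual: {token}")
    if len(token) > 2 and token.startswith("*") and token.endswith("*"):
        return Entity("package", token[1:-1], None, None)
    return Entity("group", None, token, None)


def parse_forge_string(forge_string):
    """Assumption: entities are separated by commas."""
    return [parse_entity(t) for t in forge_string.split(",") if t.strip()]


def resolve_duplicates(candidates, entities):
    """Keep the (package, group, individual) candidates named explicitly."""
    specific = {(e.package, e.group, e.individual)
                for e in entities if e.kind == "specific_individual"}
    return [c for c in candidates if c in specific]


candidates = [
    ("2010_RasmussenNature", "Greenland_Saqqaq.SG", "Inuk.SG"),
    ("2011_RasmussenNature", "Greenland_Saqqaq.SG", "Inuk.SG"),
]
selection = parse_forge_string("<2010_RasmussenNature:Greenland_Saqqaq.SG:Inuk.SG>")
print(resolve_duplicates(candidates, selection))
```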

08 Jan 19:32

github-actions

v1.1.6.0

76c0231

Release v1.1.6.0

Additional columns in .janno files (V 1.1.5.0)

This release changes the way additional columns in .janno files are treated.

So far trident fully ignored additional variables, which had the consequence that trident forge dropped them without warning. With this new release, additional variables are loaded and carried along in forge. For merging different .janno files A and B the following rules apply regarding additional columns:

If A has an additional column which is not in B, then the corresponding cells in the rows imported from B are filled with n/a.
If A and B share additional columns with identical column name, then they are treated as semantically identical units and merged accordingly.
In the resulting .janno file, all additional columns from both A and B are sorted alphabetically and appended after the normal, specified variables.

The following example illustrates the described behaviour:

A.janno

Poseidon_ID	Group_Name	Genetic_Sex	AdditionalColumn1	AdditionalColumn2
XXX011	POP1	M	A	D
XXX012	POP2	F	B	E
XXX013	POP1	M	C	F

B.janno

Poseidon_ID	Group_Name	Genetic_Sex	AdditionalColumn3	AdditionalColumn2
YYY022	POP5	F	G	J
YYY023	POP5	F	H	K
YYY024	POP5	M	I	L

A.janno + B.janno

Poseidon_ID	Group_Name	Genetic_Sex	AdditionalColumn1	AdditionalColumn2	AdditionalColumn3
XXX011	POP1	M	A	D	n/a
XXX012	POP2	F	B	E	n/a
XXX013	POP1	M	C	F	n/a
YYY022	POP5	F	n/a	J	G
YYY023	POP5	F	n/a	K	H
YYY024	POP5	M	n/a	L	I
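
The merge rules can also be written down as a short Python sketch. It reproduces the example above under simplifying assumptions: rows are plain dictionaries, only three standard columns are considered (a real .janno file has many more), and merge_janno is a hypothetical name, not a function in poseidon-hs.

```python
# Simplification: only these three columns count as "normal, specified variables".
STANDARD_COLUMNS = ["Poseidon_ID", "Group_Name", "Genetic_Sex"]


def merge_janno(rows_a, rows_b, standard_columns=STANDARD_COLUMNS):
    """Merge two lists of row dictionaries following the rules above."""
    rows = rows_a + rows_b
    # Additional columns from both inputs, sorted alphabetically.
    additional = sorted({col for row in rows for col in row if col not in standard_columns})
    header = list(standard_columns) + additional
    # Cells for columns a row does not have are filled with n/a.
    merged = [[row.get(col, "n/a") for col in header] for row in rows]
    return header, merged


a = [
    {"Poseidon_ID": "XXX011", "Group_Name": "POP1", "Genetic_Sex": "M",
     "AdditionalColumn1": "A", "AdditionalColumn2": "D"},
]
b = [
    {"Poseidon_ID": "YYY022", "Group_Name": "POP5", "Genetic_Sex": "F",
     "AdditionalColumn3": "G", "AdditionalColumn2": "J"},
]
header, merged = merge_janno(a, b)
print("\t".join(header))
for row in merged:
    print("\t".join(row))
```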

Minor changes (V 1.1.6.0)

--verbose in trident validate was deprecated. The respective output is now logged on the DEBUG level and can be accessed with --logMode VerboseLog.
Trailing slashes in --outPath for init, genoconvert and forge are now automatically removed. This prevents a common, confusing error, where a trailing slash would cause trident to assume the name of the resulting package is empty.
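
Why a trailing slash leads to an empty name is easy to reproduce. The sketch below assumes that the package name is taken from the last path component, which matches the behaviour described above but is not confirmed in detail here. Both function names are hypothetical.

```python
import posixpath


def package_name_from_out_path(out_path):
    """Last path component, which is empty when the path ends in a slash."""
    return posixpath.basename(out_path)


def normalise_out_path(out_path):
    """Remove trailing slashes, but leave a bare root path untouched."""
    stripped = out_path.rstrip("/")
    return stripped if stripped else out_path


print(repr(package_name_from_out_path("out/my_package/")))
print(repr(package_name_from_out_path(normalise_out_path("out/my_package/"))))
```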

03 Dec 08:47

github-actions

v1.1.4.2

822c82b

Release v1.1.4.2

With this release trident can handle the changes introduced for Poseidon v2.6.0.

The contributor field in the POSEIDON.yml file is optional now and can be left blank.
The contributor field now also can hold an ORCID in a subfield orcid. trident checks the structural correctness of this identifier.
trident now recognizes the new available entries for the Capture_Type variable in the .janno file.
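
A structural check of an ORCID could look like the following sketch. It tests the layout of four hyphen-separated blocks and the final check character according to the ISO 7064 MOD 11-2 scheme that ORCID uses. Whether trident verifies the check character or only the layout is not stated in these notes, so treat the checksum part as an assumption.

```python
import re

# Four blocks of four characters; the very last character may be X.
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """True if the layout is correct and the check character matches."""
    if ORCID_PATTERN.fullmatch(orcid) is None:
        return False
    characters = orcid.replace("-", "")
    return orcid_check_character(characters[:15]) == characters[15]


print(is_valid_orcid("0000-0002-1825-0097"))
```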

Beyond that:

Already V 1.1.3.1 closed a loophole in .bib file validation, where .janno files could have arbitrary references if the .bib file was not correctly referenced in the POSEIDON.yml file.
V 1.1.4.1 added a small validation check for the .janno columns Date_BC_AD_Start, Date_BC_AD_Median and Date_BC_AD_Stop: Values greater than 2022 now trigger an error, because they are factually impossible and indicate that somebody accidentally entered a BP age.
V 1.1.4.2 added parsing for Accession IDs. Wrong IDs are ignored (for now), so this is a non-breaking change.
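
The date check introduced in V 1.1.4.1 amounts to a simple upper bound. A minimal sketch, assuming rows are dictionaries of strings with n/a for missing values. The function name and the example values are hypothetical.

```python
MAX_PLAUSIBLE_YEAR = 2022  # threshold named in the release notes
DATE_COLUMNS = ("Date_BC_AD_Start", "Date_BC_AD_Median", "Date_BC_AD_Stop")


def check_date_bc_ad(row):
    """Return one error message per date column with an impossible value."""
    errors = []
    for column in DATE_COLUMNS:
        value = row.get(column)
        if value is None or value == "n/a":
            continue  # missing values are not checked in this sketch
        if int(value) > MAX_PLAUSIBLE_YEAR:
            errors.append(f"{column}: {value} is later than {MAX_PLAUSIBLE_YEAR}, possibly a BP age")
    return errors


print(check_date_bc_ad({"Date_BC_AD_Start": "-2000", "Date_BC_AD_Median": "3500", "Date_BC_AD_Stop": "n/a"}))
```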

15 Aug 08:53

github-actions

v1.1.3.0

d7d13b8

Release v1.1.3.0

This release introduces a major change to the progress indicators in package downloading, reading, forging and converting. It also includes some minor code changes in the poseidon-hs library and the poseidon server executable.

Trident

From a trident user perspective only the change in the progress indicators is relevant. So far we used updating (self-overwriting) counters, which were great for interactive use of trident in modern terminal emulators. They are not suitable for use in scripts, though, because the command line output does not yield well structured log files. We therefore decided to integrate the progress indicators with our general logging infrastructure.

Loading packages (so the Initializing packages... phase) now stays silent by default. With --logMode VerboseLog you can list the packages that are currently loading:

[Debug]   [10:56:05] Package 20: ./2015_LlorenteScience/POSEIDON.yml
[Debug]   [10:56:05] Package 21: ./2017_KennettNatureCommunications/POSEIDON.yml
[Debug]   [10:56:06] Package 22: ./2016_MartinianoNatureCommunications/POSEIDON.yml
[Debug]   [10:56:06] Package 23: ./2016_BroushakiScience/POSEIDON.yml
[Debug]   [10:56:06] Package 24: ./2017_LindoPNAS/POSEIDON.yml
[Debug]   [10:56:06] Package 25: ./2021_Zegarac_SoutheasternEurope/POSEIDON.yml

forge and genoconvert now print a log message every 10k SNPs:

[Info]    SNPs:    220000    5s
[Info]    SNPs:    230000    5s
[Info]    SNPs:    240000    5s
[Info]    SNPs:    250000    5s
[Info]    SNPs:    260000    6s
[Info]    SNPs:    270000    6s

fetch now prints a log message whenever a +5% threshold is reached:

[Info]    Package size: 15.3MB
[Info]    MB:      0.8      5.2%
[Info]    MB:      1.6     10.5%
[Info]    MB:      2.4     15.7%
[Info]    MB:      3.2     20.9%
[Info]    MB:      4.0     26.1%
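
One way to obtain output like this is to emit a message whenever progress has advanced by at least five percentage points since the last message. The Python sketch below follows that reading. It is an assumption about the rule, not the trident implementation, and the chunk size of 100,000 bytes and the decimal megabyte are made up for the example.

```python
def progress_messages(chunk_sizes, total_bytes, step_percent=5.0):
    """Return a message each time progress grew by step_percent since the last one."""
    messages = []
    received = 0
    last_logged = 0.0
    for size in chunk_sizes:
        received += size
        percent = 100.0 * received / total_bytes
        if percent >= last_logged + step_percent:
            # Assumption: 1 MB = 1,000,000 bytes.
            messages.append(f"MB: {received / 1_000_000:.1f} {percent:.1f}%")
            last_logged = percent
    return messages


# A 15.3 MB download arriving in 153 chunks of 100,000 bytes each.
for message in progress_messages([100_000] * 153, 15_300_000)[:5]:
    print(message)
```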

Server

The server has been updated in the following ways:

It now uses Co-Log for logging
A new option -c now makes it ignore checksums, which is useful for a fast start of the server if need be
Zip files are now stored in a separate folder, to keep the (git-backed) repository itself clean
There is a new API named /compatibility/<version> which accepts a client version (from trident) and returns a JSON tuple of Haskell-type (Bool, Maybe String). The first element is simply a Boolean saying if the client version is compatible with the server or not, the second is an optional Warning message the server can return to the client. This will become important in the future.
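
On the client side, such a response could be decoded as in the sketch below. It assumes that the Haskell tuple is serialised as a two-element JSON array, which is the usual encoding for tuples but is not confirmed in these notes. The function name and the warning text are hypothetical.

```python
import json


def parse_compatibility(body):
    """Decode a (Bool, Maybe String) tuple sent as a two-element JSON array."""
    compatible, warning = json.loads(body)
    if not isinstance(compatible, bool):
        raise ValueError("first element must be a boolean")
    if warning is not None and not isinstance(warning, str):
        raise ValueError("second element must be a string or null")
    return compatible, warning


print(parse_compatibility('[true, null]'))
print(parse_compatibility('[false, "Please update trident"]'))
```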
