Changelog

Unreleased

[v1.1.8] - 2025-02-06 Thu

  • Add: update modules and dictionaries.

[v1.1.7] - 2025-01-06 Mon

  • Fix: GNF_VERIFIER_URL now works for web GUI.

[v1.1.6] - 2024-06-11 Tue

  • Add: GitHub icon for the web GUI
  • Fix: discrepancy between shown version and real version.

[v1.1.5] - 2024-05-23 Thu

  • Add: update dictionaries
  • Add: update modules

v1.1.4 - 2024-02-02 Fri

  • Add: use slices package for sorting.
  • Add #140: support nomenclatural annotations without spaces, such as sp.nov.
  • Fix #152: fix timeout for verification of large lists of names.

v1.1.3 - 2023-07-01 Sat

  • Add #139: nom. nov. as nomenclatural annotation.
  • Fix #138: ignore last dash in a token, if it is not after a letter.

v1.1.2 - 2023-03-24 Fri

  • Add: update dictionaries according to gnverifier data from 2023-03.

v1.1.1 - 2023-03-06 Mon

  • Add: move everything from internal to pkg to make gnfinder useful as a library.

v1.1.0 - 2023-02-22 Wed

  • Add #107: GET version for find in API.
  • Add #136: refactor code to more standard file layout.
  • Add #135: emphasize API/Web interfaces in README.

v1.0.4 - 2022-11-03 Thu

  • Fix #132: return words surrounding a name without preprocessing.

v1.0.3 - 2022-10-13 Thu

  • Fix #131: recognize no-break and wide spaces as spaces during tokenization.

v1.0.2 - 2022-10-13 Thu

  • Fix: update gndoc to v0.3.2 to increase the threshold for accessing remote URLs.

v1.0.1 - 2022-09-30 Fri

  • Add: all modules update.
  • Add: bayes module output for OddsDetails.
  • Fix: verifier test, update gnverifier dependency to v1.0.0.

v1.0.0 - 2022-08-24 Wed

  • Add #127: prepare v1.0.0.
  • Add: Nix build and shell files.
  • Add: rename dictionaries.

v0.19.5 - 2022-05-10 Tue

  • Add: update gnverifier to v1.0.0-RC1
  • Add: MCSbase to web UI.

v0.19.4 - 2022-05-03 Tue

  • Add: species group, cardinality score for verification.

v0.19.3 - 2022-04-10 Sun

  • Add: update output to use MainTaxon, templates.

v0.19.2 - 2022-04-09 Sat

  • Add: update gnlib to v0.13.0.

v0.19.1 - 2022-04-09 Sat

  • Fix: output field misspelling in JSON.

v0.19.0 - 2022-04-09 Sat

  • Add: add IRMNG to web UI
  • Add: update to gnlib v0.12.0, use its stats module.

[v0.18.3] - 2022-03-22 Tue

  • Add: update Go to v1.18, modules.
  • Fix #119: Taxon, Morphological should not be recognized as uninomials.

[v0.18.2] - 2022-03-03 Thu

  • Add: fix typo in home.html

v0.18.1 - 2022-03-01 Tue

  • Add #114: add an option to show ambiguous uninomials.
  • Add #113: show ambiguous genera, if there are species names with them.

v0.18.0 - 2022-02-28 Mon

  • Add #117: bring verification in sync with gnames v0.8.0
  • Add #116: add --all-matches flag to show all verification results.
  • Add: update input and output objects and REST API, introducing some backward incompatibility. See https://apidoc.globalnames.org/gnfinder-beta

v0.17.0 - 2022-01-06

  • Add #111: update bayes calculations.
  • Add #110: update verification process using most recent code. Add stats for the distribution of kingdoms and for the main clade that contains most of the names in the text. Verification JSON is not fully backward compatible.
  • Add #109: add classification path to CSV and TSV outputs.

v0.16.3 - 2021-10-31

  • Add: update dictionaries with Algaebase and fixes

v0.16.2 - 2021-10-28

  • Fix #108: remove confusing red 'x' from web-UI results.

v0.16.1 - 2021-10-17

  • Add #106: Add API documentation.

v0.16.0 - 2021-06-23

  • Add #94: Add web-based user interface.
  • Add #105: Support for URL name-finding in REST API.
  • Add #104: merge detectLanguage into language. This simplifies the logic for language settings and changes the API signature for parameters. The "language" parameter now recognizes:

          * "": empty string that goes to default "eng" setting for language
          * "detect": finds language by an algorithm
          * "eng": sets language to English
          * "deu": sets language to German
    
          All other settings default to "eng" (English)
    
  • Fix #103: remove conflict between language and detectlanguage parameters.
  • Add: update Echo web framework to v4.5.0
  • Fix #102: 'language' parameter for REST API.
  • Fix #101: BOM interferes with offsets when -U flag is used.
  • Add #99: add TSV format and make output format an option for REST API.

  • Add: update modules

  • Add: update Go to 1.17

  • Add #98: an option to return names positions in bytes from the text start instead of UTF-8 characters.

  • Fix #100: fix csv/tsv fields number for verification

  • Add #96: Zenodo DOI for citing GNfinder.
  • Add #92: return UTF8-encoded text only.
  • Add #91: convert/extract plain texts locally.
  • Add #89: configuration file and environment variables.
  • Add #87: support PDFs, MS Word, Excel, RTF, HTML, UTF16 etc via Apache Tika.
  • Add #86: an option to return unique found names.
  • Add #85: originalInput field with UTF8 text used for name-finding.
  • Add #84: metadata about file and name-finding duration.
  • Add: gnf.Find now takes string instead of []byte.
  • Fix: verification for REST interface.

[v.0.12.0]

  • Add #81: represent new lines in verbatim output as "\n".
  • Add #80: use CSV, JSON, JSON pretty for output.
  • Add #79: adjust prior odds using the density of found names in a text.
  • Add #78: fix Odds value for names with 'grey' genus and species.
  • Add #77: add RESTful interface.
  • Add #76: remove subcommands from CLI.
  • Add #75: update tests, remove ginkgo dependency for tests.
  • Add #73: benchmark and optimize tokenizer.
  • Add #71: use embed introduced in Go v1.16.
  • Add #70: migrate code to use gner tokenizer.
  • Add #69: Output Odds as a log10.
  • Add #68: Refactor the code with interfaces to be consistent with other projects.
  • Add #64: Remove common words from species.
  • Add #63: Remove geo-names as uninomials.
  • Add #62: Remove human names as uninomials.
  • Add: Update dictionaries.
  • Fix #51: Remove 'Piper' from black list, add new words to dictionaries.
  • Add #49: Clean up protobuf and JSON outputs. Introduces backward-incompatible changes in the output: standardizing CLI JSON to camelCase, introducing cardinality instead of a string for a name type, adding simple canonical and full canonical forms for matched and current names, and removing the current name unless it is a synonym.
  • Add #46: gRPC serves nomenclatural annotation and words surrounding name-strings.
  • Add #44: save nomenclatural annotation for new species, combinations, subspecies etc.
  • Add #45: return desired number of words before and after a name-candidate.
  • Add #39: Export to C shared library.
  • Add: better handling of the version.
  • Fix #42: No null pointers in verifier results.
  • Fix #41: More words in black list.
  • Add #37: add to git protob and version files.
  • Add #36: Refactor GNfinder options.
  • Add #35: Add version info to gRPC server.
  • Add #34: Better language detection.
  • Add #33: Make it possible to force Bayes not only "on" but also "off".
  • Add #32: Add benchmarks to gnfinder_test.go.
  • Add #31: Speed up name-finding for large numbers of small texts. Solved only partially, by preloading Bayes training data; other optimizations will follow later.
  • Fix #30: Tokenizer breaks if a text ends on a dash followed by space.
  • Add #29: Enhance verification results. Now preferred data sources have the same fields as the best result. Classification has IDs and ranks.
  • Add: Update dictionaries setting latin common names to grey dictionary.
  • Add: Dictionaries update.
  • Add #28: Generic names from ICN (botanical) code might have authors in parentheses that look the same as subgenus part of ICZN names. As a result parsing such names creates fake uninomials. We removed such fake uninomials from uninomial white dictionary.
  • Add #27: Refactor code to make it more maintainable
  • Add #26: Command line app tests
  • Fix #25: Make CLI app work again (cobra-based CLI does not allow a root command with input without flags, so gnfinder text.txt was broken).
  • Fix #24: Canonical form for matched names
  • Fix #23: ExactMatch results have editDistance > 0 sometimes
  • Add: more tests for gnindex.
  • Add #21: support updated gnindex API
  • Add #22: Go module support for more stable builds
  • Add #19: bring gRPC output close to cli output. Breaks backward compatibility of gRPC.
  • Add #20: update API interaction with gnindex.
  • Add #17: return offsets for the start and the end of name-strings.
  • Fix #18: gRPC works with diacritics in text input.
  • Add #16: docker support. Command make docker creates docker image.
  • Add #15: enable gRPC to set data-source IDs for verification.
  • Add #14: setting for name verification data-sources as well as command line flag. Currently tests for gRPC are located in the Ruby gem gnfinder project.
  • Add #12: gRPC-based HTTP API to access gnfinder from other languages.
  • Add: StemEditDistance for fuzzy matching by stem.
  • Add #11: Quality Summary and Preferred data sources in verification.
  • Add #9: Additional information how to install in README.md.
  • Add #8: Retry verification if any error happens in the process.
  • Add #7: Add EditDistance field to verification output.
  • Add #6: Add 'NoMatch' value to verification 'MatchType'.
  • Fix #5: Hide verification "data" if it is empty.
  • Remove #6: Remove Verified field, as it repeats 'NoMatch' information.
  • Add #4: Name resolution attempts several times in case of timeout

  • Fix #3: Name verification breaks on large documents with thousands of words

  • Add: Tokenizer for breaking a text into tokens.
  • Add: Heuristic rules for scientific name finding.
  • Add: Bayes rules for scientific name finding.
  • Add: White, Black, and Grey dictionaries, common European words dictionary.
  • Add: Bayes training script to create reference data for Bayes algorithms.
  • Add: Command line application gnfinder is created using cobra framework.
  • Add: Name-verification via gnindex.
  • Add: Makefile to simplify compilation of the command line tool.

Footnotes

This document follows changelog guidelines