Releases: dkpro/dkpro-core

DKPro Core 1.10.0

10 Sep 18:42

We are pleased to announce the release of

DKPro Core 1.10.0

a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.

https://dkpro.github.io/dkpro-core

This is a feature release.

Notable changes since DKPro Core 1.9.3

  • Added support for Arabic to CoreNlpSegmenter (thanks @Jibun)
  • Added support for Token "form" to CoNLL writers (thanks @Jibun)
  • Added ability to provide extra non-standard parameters to CoreNlpSegmenter (thanks @Jibun)
  • Added ArkTweet POS tagger trainer (thanks @schrieveslaach)
  • Added WebAnno TSV3 reader/writer
  • Added reader for Leipzig Corpora Collection
  • Upgraded to CoreNLP 3.9.1 (stanfordnlp and corenlp modules)
  • Upgraded to OpenNLP 1.9.0
  • Upgraded to PDFBox 2.0.9 (io-pdf module)
  • Upgraded to LanguageTool 4.2
  • Upgraded to CogComp 4.0.7 (lbj module)
  • Upgraded to Tika 1.18 (io-tika module)
  • Improved handling of multi-line annotations in brat module (thanks @parisni)
  • Fixed brat reader crash on discontinuous annotations by reading only the first fragment
  • Added dataset description for GUM 4.1.0 dataset
  • Removed PARAM_INTERN_TAGS
  • Improved component metadata
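
The brat reader fix listed above falls back to the first fragment of a discontinuous annotation. The following is a minimal sketch of that idea, assuming the brat standoff convention of separating fragments with semicolons (for example "0 5;10 15"). The class and method names are hypothetical and this is not the actual BratReader code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that parses the offset field of a brat text-bound annotation. */
final class BratFragments {

    /** Splits an offset field such as "0 5;10 15" into {begin, end} pairs. */
    static List<int[]> parseFragments(String offsets) {
        List<int[]> fragments = new ArrayList<>();
        for (String part : offsets.trim().split(";")) {
            String[] bounds = part.trim().split("\\s+");
            fragments.add(new int[] {
                Integer.parseInt(bounds[0]),
                Integer.parseInt(bounds[1])
            });
        }
        return fragments;
    }

    /** Fallback strategy: keep only the first fragment and ignore the rest. */
    static int[] firstFragment(String offsets) {
        return parseFragments(offsets).get(0);
    }

    public static void main(String[] args) {
        int[] span = firstFragment("0 5;10 15");
        System.out.println(span[0] + "-" + span[1]); // prints "0-5"
    }
}
```

Keeping only the first fragment loses the later parts of the annotation, but it lets the rest of the document be read instead of failing outright.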

A more detailed overview of the changes in this release can be found here.

Thanks for contributions go to: @Jibun, @parisni, @schrieveslaach, @jgrivolla

When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.
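
One way to catch accidentally mixed versions is to scan the resolved dependency coordinates of a project. The sketch below assumes coordinates are available as "groupId:artifactId:version" strings and that DKPro Core 1.x artifacts use a group ID starting with de.tudarmstadt.ukp.dkpro.core. The class name and the input format are illustrative assumptions, not part of DKPro Core.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical check that all DKPro Core artifacts share one version. */
final class VersionConsistency {

    // Assumed group ID prefix of DKPro Core 1.x artifacts.
    private static final String GROUP_PREFIX = "de.tudarmstadt.ukp.dkpro.core";

    /** Collects the distinct versions of all DKPro Core coordinates. */
    static Set<String> dkproVersions(List<String> coordinates) {
        Set<String> versions = new TreeSet<>();
        for (String coordinate : coordinates) {
            String[] parts = coordinate.split(":");
            if (parts.length == 3 && parts[0].startsWith(GROUP_PREFIX)) {
                versions.add(parts[2]);
            }
        }
        return versions;
    }

    public static void main(String[] args) {
        List<String> deps = Arrays.asList(
            "de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core.opennlp-asl:1.10.0",
            "de.tudarmstadt.ukp.dkpro.core:de.tudarmstadt.ukp.dkpro.core.io.text-asl:1.9.3",
            "org.apache.uima:uimafit-core:2.4.0");
        Set<String> versions = dkproVersions(deps);
        if (versions.size() > 1) {
            System.out.println("Mixed DKPro Core versions: " + versions);
        }
    }
}
```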

DKPro Core 1.9.3

28 Jul 08:56

We are pleased to announce the release of

DKPro Core 1.9.3

a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.

This is a bug-fix and minor feature release.

Notable changes since DKPro Core 1.9.2

  • Added ability to restore Backmapper alignment data after a CAS restore
  • Added ability to specify a cluster resource name for the ArkTweet POS-tagger trainer
  • Added PARAM_MODEL_ENCODING to TreeTaggerChunker
  • Fixed issue that DictionaryAnnotator did not match at the sentence end
  • Ensured that all parameters have a description

A more detailed overview of the changes in this release can be found here.

Thanks for contributions go to: @nilsreiter, @mjunsilo, @schrieveslaach, @jkirsch

When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.

DKPro Core 1.9.2

28 Jul 08:53

We are pleased to announce the release of

DKPro Core 1.9.2

a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.

This is a bug-fix and minor feature release.

Notable changes since DKPro Core 1.9.1

  • Allow explicitly specifying a model artifact when running a model-based component
  • Fixed auto-loading of models in CoreNLP module
  • Fixed issue causing PdfReader to create annotations with leading/trailing whitespace
  • Added more OMTD-SHARE metadata and UIMA capabilities
  • Avoid failing when encountering a discontinuous segment in brat files
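
The PdfReader fix above concerns annotation spans that include leading or trailing whitespace. A minimal sketch of how such a span can be tightened is shown below. It is a generic technique with hypothetical names, not the code used in the io-pdf module.

```java
/** Hypothetical helper that shrinks a span so it excludes surrounding whitespace. */
final class SpanTrimmer {

    /**
     * Returns {begin, end} adjusted so that the covered text has no
     * leading or trailing whitespace. An all-whitespace span collapses
     * to an empty span.
     */
    static int[] trim(CharSequence text, int begin, int end) {
        int b = begin;
        int e = end;
        while (b < e && Character.isWhitespace(text.charAt(b))) {
            b++;
        }
        while (e > b && Character.isWhitespace(text.charAt(e - 1))) {
            e--;
        }
        return new int[] { b, e };
    }

    public static void main(String[] args) {
        String text = " Hello world \n";
        int[] span = trim(text, 0, text.length());
        System.out.println(text.substring(span[0], span[1])); // prints "Hello world"
    }
}
```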

A more detailed overview of the changes in this release can be found here.

Thanks for contributions go to: @nilsreiter, @mjunsilo

When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.

DKPro Core 1.9.1

05 Apr 16:24

We are pleased to announce the release of

DKPro Core 1.9.1

a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.

This is a bug-fix and minor feature release.

Notable changes since DKPro Core 1.9.0

  • Included OMTD-SHARE metadata
  • Improved mapping capabilities and robustness of the BratReader
  • Added option to mark split tokens in CamelCaseTokenSegmenter
  • Fixed hash for CC-BY 4.0 license in dataset API
  • Fixed NPE in CoNLL 2012 reader
  • Upgrade to LanguageTool 4.1
  • Upgrade to ICU4J 61.1
  • Upgrade to JTok 2.1.18
  • Upgrade to OpenNLP 1.8.4

A more detailed overview of the changes in this release can be found here.

Thanks for contributions go to: @nilsreiter, @mjunsilo

When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.

DKPro Core 1.9.0

03 Jan 12:39

We are pleased to announce the release of

DKPro Core, version 1.9.0

a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.

Major news

  • New dataset API and a range of datasets supported out-of-the-box
  • Added various components to train models, e.g. in OpenNLP, Stanford NLP or LingPipe modules
  • Added documentation on how to integrate new modules
  • Improved component metadata and reference documentation
  • Upgraded to UIMA 2.10.2, uimaFIT 2.4.0
  • Fixed build on Windows
  • Introduced "form" feature on Token annotations
  • Introduced "syntacticFunction" feature on Token annotations
  • Introduced "coarseValue" feature on POS annotations
  • Introduced "flavor" feature on Dependency annotations
  • Switched to using UD categories for POS types
  • Note the section on incompatible changes below!

Analysis components

  • New "CoreNLP" module provides improved and modernized wrappers for Stanford CoreNLP
  • Integrated UDPipe 1.1.0
  • Integrated CISStem German stemmer
  • Integrated segmenter based on ICU 60.1
  • Integrated NER models from FREME project
  • Integrated POS tagger and lemmatizer from IXA-Pipe
  • Integrated OpenNLP parser models from IXA
  • Integrated NLP4J 1.1.3
  • Integrated LingPipe
  • Upgraded to OpenNLP 1.8.3
  • Upgraded to MaltParser 1.9.1
  • Upgraded to LanguageTool 3.9
  • Upgraded to Stanford CoreNLP 3.8.0
  • Upgraded to GATE 8.2

Data formats

  • Support for AnCora corpus format
  • Support for NYT corpus format
  • Support for LXF format
  • Support for LIF format
  • Support for CoNLL-U format
  • Support for CoNLL 2003 and 2008 formats
  • Support for GermEval in CoNLL 2002 reader
  • Support for TüBa D/Z chunk format
  • Improved performance of CoNLL 2000 and 2002 writers

Incompatible changes

  • The Constituent Java type PRN has been deprecated because PRN is a reserved name on Windows systems and prevented building DKPro Core there. The type definition still exists, but no JCas class is generated for it. As a replacement, the type PARN has been introduced. (#414)
  • All POS types have been prefixed with POS_ to avoid clashes with reserved names on Windows systems. The type definitions of the old types still exist, but no JCas classes are generated for them anymore.
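
To illustrate why these names are a problem, the sketch below checks a class name against the commonly documented set of Windows reserved device names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9). Windows treats these as reserved regardless of case or file extension, so a source file such as PRN.java cannot be created. The helper is hypothetical and not part of DKPro Core.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical check for class names that clash with Windows device names. */
final class WindowsReservedNames {

    // Commonly documented reserved device names on Windows.
    private static final Set<String> RESERVED = new HashSet<>(Arrays.asList(
        "CON", "PRN", "AUX", "NUL",
        "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
        "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"));

    /** True if a file named simpleName + ".java" would be rejected on Windows. */
    static boolean isReserved(String simpleName) {
        return RESERVED.contains(simpleName.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isReserved("PRN"));     // prints "true"
        System.out.println(isReserved("PARN"));    // prints "false"
        System.out.println(isReserved("AUX"));     // prints "true"
        System.out.println(isReserved("POS_AUX")); // prints "false"
    }
}
```

The AUX case shows why a common prefix helps: a POS type named after the UD category AUX would hit the same restriction, while POS_AUX does not.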

A more detailed overview of the changes in this release can be found here.

When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.

DKPro Core 1.8.0

24 Jun 14:13

We are pleased to announce the release of

DKPro Core, version 1.8.0

a collection of interoperable software components for natural language processing (NLP) based on the Apache UIMA framework.

Changed minimal system requirements

  • Requires Java 8 (Issue #369)
  • Upgrade Apache UIMA to version 2.8.1 (Issue #662)
  • Upgrade uimaFIT to version 2.2.0 (Issue #664)
  • Upgrade Spring Framework to version 3.2.16 (Issue #815)

Major improvements

  • Extensive automatically generated reference documentation (e.g. Issues #753, #635, #589)
  • New framework for text normalization and transformation (e.g. Issue #537)
  • New validation framework, mainly for improved bug detection in unit tests (Issue #728)
  • Writer components write to console if no target is specified (Issue #700)
  • Renamed some components for a more uniform naming scheme (e.g. Issue #717)
  • Writers per default refuse to overwrite files (Issue #669, #564)
  • Dependency parsers and readers consistently create a self-looped ROOT node (Issue #628)
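
Two of the writer changes above (console output when no target is given, and refusing to overwrite by default) can be pictured with plain java.nio calls. This is a sketch under the assumption of a hypothetical open(target, overwrite) helper. It does not reproduce the DKPro Core writer classes or their parameter names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper that mirrors the writer behavior described above. */
final class WriterTarget {

    /**
     * Opens the output for a writer.
     * - No target: fall back to the console (callers should not close it).
     * - Existing file and overwrite == false: fail instead of replacing it.
     */
    static OutputStream open(Path target, boolean overwrite) throws IOException {
        if (target == null) {
            return System.out;
        }
        if (overwrite) {
            return Files.newOutputStream(target,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE);
        }
        // CREATE_NEW throws FileAlreadyExistsException if the file exists.
        return Files.newOutputStream(target,
            StandardOpenOption.CREATE_NEW,
            StandardOpenOption.WRITE);
    }

    public static void main(String[] args) throws IOException {
        OutputStream out = open(null, false);
        out.write("written to console\n".getBytes("UTF-8"));
        out.flush();
    }
}
```

Failing by default protects earlier results from being silently replaced when a pipeline is re-run against the same output location.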

Analysis components

  • Added JTok component, Java-based configurable tokenizer and sentence splitter (Issue #695)
  • Added RFTagger component, a tool for the annotation of text with fine-grained part-of-speech tags (Issue #684)
  • Added RegexTokenizer and WhitespaceTokenizer components - simple whitespace tokenizers (Issue #552)
  • Added MateTools SRL component (Issue #483)
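
As an illustration of what a simple whitespace tokenizer does, the sketch below returns character offsets for each run of non-whitespace characters. It uses only java.util.regex and hypothetical names. The actual WhitespaceTokenizer and RegexTokenizer components create UIMA annotations and may be configured differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical whitespace tokenizer that reports token offsets. */
final class SimpleWhitespaceTokenizer {

    private static final Pattern NON_WHITESPACE = Pattern.compile("\\S+");

    /** Returns the {begin, end} offsets of each whitespace-delimited token. */
    static List<int[]> tokenize(CharSequence text) {
        List<int[]> tokens = new ArrayList<>();
        Matcher matcher = NON_WHITESPACE.matcher(text);
        while (matcher.find()) {
            tokens.add(new int[] { matcher.start(), matcher.end() });
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The  quick fox.";
        for (int[] token : tokenize(text)) {
            System.out.println(token[0] + "-" + token[1] + " "
                + text.substring(token[0], token[1]));
        }
    }
}
```

Note that punctuation stays attached to the neighboring word ("fox."), which is the main limitation of splitting on whitespace alone.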

Data formats

  • Added Brat format (Issue #656)
  • Added Mallet LDA (Issue #602)
  • Added Reuters-21578 Text Classification (Issue #691)
  • Added RTF (Issue #588)
  • Added Solr (Issue #576)
  • Added UIMA Json (Issue #455)
  • Added Writer for one sentence per line (Issue #673)
  • Improved TEI to support (Issue #594, #596)

Types

  • Added MorphologicalFeatures type to support morphological analysis (e.g. Issue #244)
  • Added Div type for generic document structure (e.g. Issue #598)
  • Added id feature on Token and Sentence (e.g. Issue #609)
  • Added MetadataStringField type (Issue #672)

Further highlights in this release include:

  • Upgrade Spring framework to version 3.2.16 (Issue #815)
  • Upgrade GATE to version 8.0 (Issue #387)
  • Upgrade Stanford CoreNLP to version 3.6.0 (Issue #706)
  • Upgrade OpenNLP to version 1.6.0 (Issue #634)
  • Upgrade LanguageTool to version 3.3 (Issue #819)
  • Upgrade MaltParser to version 1.8.1 (Issue #734)
  • Added Bintray as a repository to DKPro Core (#694)

A more detailed overview of the changes and bug corrections in this release can be found here.

When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.

DKPro Core 1.7.0 (ASL)

18 Jul 11:53

We are pleased to announce the release of

DKPro Core, version 1.7.0 (ASL & GPL)

a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.

Analysis components

  • hunpos - wrapper for hunpos, an HMM POS tagger including models for many languages;
  • langdetect - wrapper for language-detection, a language detection tool for Java;
  • mallet - wrapper for topic modelling using MALLET;
  • textnormalizer - original components for text normalization, e.g. spelling correction, umlaut normalization, expressive lengthening normalization.

Data formats

  • io.conll - support for CoNLL 2000, 2002, 2009 and 2012 formats;
  • io.ditop - support for DiTop topic model visualization format;
  • io.penntree - support for combined and chunked formats;
  • io.tueppdz - support for TüPP-D/Z format.

Further highlights in this release include:

A more detailed overview of the changes in this release can be found here.

When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.

DKPro Core 1.7.0 (GPL)

18 Jul 11:55

We are pleased to announce the release of

DKPro Core, version 1.7.0 (ASL & GPL)

a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.

Analysis components

  • hunpos - wrapper for hunpos, an HMM POS tagger including models for many languages;
  • langdetect - wrapper for language-detection, a language detection tool for Java;
  • mallet - wrapper for topic modelling using MALLET;
  • textnormalizer - original components for text normalization, e.g. spelling correction, umlaut normalization, expressive lengthening normalization.

Data formats

  • io.conll - support for CoNLL 2000, 2002, 2009 and 2012 formats;
  • io.ditop - support for DiTop topic model visualization format;
  • io.penntree - support for combined and chunked formats;
  • io.tueppdz - support for TüPP-D/Z format.

Further highlights in this release include:

A more detailed overview of the changes in this release can be found here.

When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.

DKPro Core 1.6.2 (GPL)

18 Jul 11:57

We are pleased to announce the release of

DKPro Core, version 1.6.2 (ASL & GPL)

a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.

Bug fixes:

  • io.conll - Conll2006Reader does not support POS-tag mapping
  • io.tcf - Dependencies leak between sentences
  • opennlp - Unable to set model of OpenNlpSegmenter

A more detailed overview of the changes in this release can be found here.

When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.

DKPro Core 1.6.2 (ASL)

18 Jul 11:57

We are pleased to announce the release of

DKPro Core, version 1.6.2 (ASL & GPL)

a collection of interoperable software components for natural language
processing (NLP) based on the Apache UIMA framework.

Bug fixes:

  • io.conll - Conll2006Reader does not support POS-tag mapping
  • io.tcf - Dependencies leak between sentences
  • opennlp - Unable to set model of OpenNlpSegmenter

A more detailed overview of the changes in this release can be found here.

When upgrading, please mind that you should not mix different versions of DKPro Core components in your projects - they may not be compatible with each other.