LogonProcessing_BatchGeneration

ErikVelldal edited this page Jun 12, 2008 · 14 revisions

Overview

This document describes how to perform batch generation within LOGON. By extension, it also describes how to produce a generation treebank on the basis of an existing ERG treebank. We first give a step-by-step description of how to do this using only menu choices from the podium. We then show how the same steps can be carried out from the command line using the generate script provided in the LOGON source tree (i.e. $LOGONROOT/generate).

Using the menu options

For the podium approach, there are two main steps: 1) generate and 2) update. In the first step we exhaustively generate all "paraphrases" for the input MRS. In the update step we identify and label the references among these alternative realizations by matching them against the references in the original parse treebank.

1) Generation

  • Load the grammar: (from the LKB top panel) Load | Complete grammar (and choose for example logon/lingo/erg/lkb/script to load the latest ERG grammar)

  • Initialize the generator: (from the LKB top panel) Options | Expand menu, and then Generate | Index

  • Select appropriate skeletons (e.g. english): (from the [incr tsdb()] podium) Options | Skeleton Root

  • Select the skeleton you want to use and create the target profile: File | Create

  • Select the corresponding gold profile (assumed to be thinned, i.e. containing the MRSs for the preferred parses).

  • Select generation as batch processing mode: Process | Switches | Generation

  • Optionally set the maximum number of edges (e.g. 50000): Process | Variables | Chart size limit

  • Generate: Process | All Items

2) Update


  • In this step we identify and label the references among the newly generated sets of paraphrases. First, set the switches that control how the realizations are matched against the references of the original parse treebank:

    • Trees | Switches | Update Exact Match

    • Trees | Switches | Preterminal Yield

  • Then perform the labeling step: Trees | Update

(PS: Some [incr tsdb()] Lisp variables that are relevant for the matching and labeling of references include *redwoods-update-exact-p*, *derivations-comparison-level*, and *derivations-yield-skews*.)

Using the command-line script

The procedure described above can also be performed by using the $LOGONROOT/generate script.

generate [ skeleton ]

We document the available command-line options below, as well as some related Lisp variables.

--source

  • Compile the LOGON system from source.

--count n

  • Parallelize processing and start up n full instantiations of the parser client.

--limit n

  • Sets tsdb::*tsdb-maximal-number-of-results*

--best n

--is

  • Suppress MRS specification about information structure in generation. Controlled by way of the variable mrs::*ignored-sem-features*.

--suffix string

  • Append string to the name for the newly created profile (e.g. when more than one run per day needs to be recorded).

--jacy
--gg

  • Change the grammar from the default ERG and set the skeleton language accordingly.

--gold string

  • Specify the gold profile (the default being gold/${grammar}/${skeleton})

--update

  • Do not perform the update step (i.e. automatically identifying and labeling references).

--cache

  • In the same pass, create a feature cache for the newly created generation treebank using default feature settings (this can take a while) and then train a MaxEnt realization ranker (again using only default estimation settings).
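As a hedged illustration of how the options above combine, the sketch below only assembles and prints a command line (a dry run); it does not invoke the script. The skeleton name mrs, the suffix b, the client count 2, and the fallback install path $HOME/logon are placeholders chosen for illustration, and placing the options before the skeleton argument is an assumption based on the usage line.

```shell
#!/bin/sh
# Dry-run sketch: build a $LOGONROOT/generate command line and print it.
# All concrete values below are illustrative placeholders, not recommendations.

# Assumed fallback location; use your actual LOGON checkout.
LOGONROOT="${LOGONROOT:-$HOME/logon}"

generate_cmd() {
  # $1 = skeleton name (hypothetical example: mrs)
  # $2 = number of parallel clients, passed to --count
  # $3 = suffix appended to the new profile name, passed to --suffix
  printf '%s/generate --count %s --suffix %s %s\n' "$LOGONROOT" "$2" "$3" "$1"
}

# Print the command instead of running it, so it can be inspected first.
generate_cmd mrs 2 b
```

Once the printed line looks right for your setup, it can be pasted into a shell to start the actual run.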

To control the maximum number of edges allowed in the chart during generation, look for tsdb::*tsdb-maximal-number-of-edges* in $LOGONROOT/generate and adjust its value there (it currently defaults to 100,000).
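One way to find where that variable is set is a plain text search over the script. The helper name show_edge_limit below is hypothetical, and the sketch makes no assumption about how the script actually assigns the value; it only lists the matching lines with their line numbers.

```shell
#!/bin/sh
# Sketch: list the lines of a file that mention the edge limit variable.

show_edge_limit() {
  # $1 = path to the script to search
  # -n prints line numbers; -F matches the pattern as a fixed string,
  # which avoids treating '*' in Lisp variable names as a regex operator.
  grep -n -F 'tsdb-maximal-number-of-edges' "$1"
}

# Only search when the script is present at the expected location.
if [ -f "${LOGONROOT:-}/generate" ]; then
  show_edge_limit "$LOGONROOT/generate"
fi
```

The reported line numbers tell you where to edit the limit before starting a run.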
