This repository has been archived by the owner on Dec 19, 2018. It is now read-only.
Markus M. Geipel edited this page Jun 12, 2013
·
1 revision
A Metamorph definition is used to declare what is to be counted: from each literal emitted, a key is created by concatenating its name and value. The data to be analyzed may be available either on HDFS or in the form of an HTable.
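The counting rule can be illustrated locally with a minimal Python sketch. It is not the Hadoop job itself: the function name `count_literals`, the sample literals, and the empty default separator are assumptions made for illustration, since the text above does not say whether a separator is placed between name and value.

```python
from collections import Counter


def count_literals(literals, separator=""):
    """Count keys built by concatenating each literal's name and value.

    `literals` is an iterable of (name, value) pairs. The separator is a
    hypothetical parameter; the default of "" is an assumption.
    """
    counts = Counter()
    for name, value in literals:
        key = f"{name}{separator}{value}"
        counts[key] += 1
    return counts


# Hypothetical literals as a Metamorph definition might emit them.
literals = [
    ("lang", "ger"),
    ("lang", "eng"),
    ("lang", "ger"),
    ("type", "book"),
]

counts = count_literals(literals)
for key, n in sorted(counts.items()):
    print(key, n)
```

Each distinct name/value combination becomes one key, so the result is a frequency table over the literals the definition emits.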
Further examples can be found in src/main/resources/statistics.
Data on HDFS
Use job_countInFile.sh INPUT_PATH FORMAT MORPH_DEF. The following restrictions apply to the input data: Records must be separated by the newline character (MARCXML is thus not admissible). Data may be uncompressed or gzipped. If gzipped, the data should be split into files of >64MB; otherwise the ingest cannot be distributed on the cluster.
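Before uploading, it can help to check locally that a file really holds one record per line, whether it is plain or gzipped. The following is a rough sketch for that purpose only; the helper name `iter_records`, the detection of gzip by the `.gz` file extension, and the UTF-8 encoding are assumptions and are not part of the job script.

```python
import gzip


def iter_records(path):
    """Yield one record per line from a plain or gzipped text file.

    Assumption: gzipped files are recognized by a ".gz" extension and
    the data is UTF-8 encoded. Empty lines are skipped.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            record = line.rstrip("\n")
            if record:
                yield record


def count_records(path):
    """Return the number of newline-separated records in the file."""
    return sum(1 for _ in iter_records(path))
```

If `count_records` returns far fewer records than expected, the format probably spreads single records over several lines and does not meet the newline restriction.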