raman162/POSTagging
Project 2: Part of speech tagger

The assignment is in Java.
http://java.sun.com/

The JDK is needed to compile and run this assignment. Please state which version of Java you are using with your submission.

    ~> java -version

will tell you which version you have. It should be something like "1.5.0_12".

To build, you should have ant installed.
http://ant.apache.org/

It might also be convenient to have make.

The file src/cs481/postag/POSTag.java contains a Majority Tag POS tagger. This should be the only file that is modified to create a bigram HMM POS tagger. Training and testing data are provided in dat.

To compile after making your changes:

    ~> ant

Or, if you also have make:

    ~> make

(Your IDE may take care of that step automatically.)

To run:

    ~> cd bin
    ~/bin> java -Xmx512m cs481.postag.POSTag ../dat/train.xml ../dat/test_1.xml out.xml

Note that -Xmx512m sets the maximum Java heap size (here 512 MB). Set it to an appropriate value for your machine.

To find the accuracy of your tagging:

    ~/bin> java -Xmx512m cs481.postag.POSDiff out.xml ../dat/test_1.xml > log

In the default configuration, that should return:

    *** Similarity = 27308 / 28432 = 96.04670793472144 %

Grading:

To grade this assignment, I have a separate testing set, test_2.xml. I will run your program on test_2.xml and see what accuracy it gets according to POSDiff as above. This will be the accuracy component of your grade.
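For orientation, a majority-tag baseline of the kind mentioned above can be sketched as follows. This is an illustrative sketch only, not the contents of src/cs481/postag/POSTag.java: the class name MajorityTagSketch and its methods are hypothetical, and XML reading is left out. The idea is to give each word the tag it received most often in training, and to fall back to the most frequent tag overall for words never seen in training.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical majority-tag baseline; names are illustrative, not from the assignment code. */
class MajorityTagSketch {
    // word -> (tag -> number of times the word carried that tag in training)
    private final Map<String, Map<String, Integer>> wordTagCounts = new HashMap<>();
    // tag -> number of times the tag occurred in training (used for unknown words)
    private final Map<String, Integer> tagCounts = new HashMap<>();

    /** Record one (word, tag) pair from the training data. */
    void observe(String word, String tag) {
        wordTagCounts.computeIfAbsent(word, k -> new HashMap<>()).merge(tag, 1, Integer::sum);
        tagCounts.merge(tag, 1, Integer::sum);
    }

    /** Return the most frequent tag for the word, or the most frequent tag overall if unseen. */
    String tag(String word) {
        Map<String, Integer> counts = wordTagCounts.get(word);
        return argMax(counts != null ? counts : tagCounts);
    }

    /** Key with the largest count; returns null for an empty map. Ties are broken arbitrarily. */
    private static String argMax(Map<String, Integer> counts) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                bestCount = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        MajorityTagSketch tagger = new MajorityTagSketch();
        tagger.observe("run", "VB");
        tagger.observe("run", "VB");
        tagger.observe("run", "NN");
        tagger.observe("dog", "NN");
        tagger.observe("dog", "NN");
        System.out.println(tagger.tag("run"));   // seen word: its majority tag
        System.out.println(tagger.tag("zebra")); // unseen word: overall majority tag
    }
}
```

A baseline like this ignores context entirely, which is the gap a bigram HMM tagger is meant to close.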
About
Part-of-speech tagger. Implements the Viterbi algorithm along with smoothing.
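As a rough illustration of that approach, here is a minimal sketch of Viterbi decoding for a bigram HMM with add-one (Laplace) smoothing. It is not the code in this repository: the class BigramHmmSketch, the "<s>" start pseudo-tag, and the choice of add-one smoothing are all assumptions made for the example, and the repository may use a different smoothing scheme. Probabilities are kept in log space to avoid underflow on long sentences.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical bigram HMM tagger sketch; assumes add-one smoothing and a "<s>" start tag. */
class BigramHmmSketch {
    static final String START = "<s>"; // assumed sentence-start pseudo-tag

    final Map<String, Map<String, Integer>> transCounts = new HashMap<>(); // prevTag -> tag -> count
    final Map<String, Map<String, Integer>> emitCounts = new HashMap<>();  // tag -> word -> count
    final Set<String> vocab = new HashSet<>();
    final List<String> tags = new ArrayList<>();

    /** Add one tagged training sentence to the counts. */
    void addSentence(String[] words, String[] tagSeq) {
        String prev = START;
        for (int i = 0; i < words.length; i++) {
            String t = tagSeq[i];
            if (!tags.contains(t)) tags.add(t);
            transCounts.computeIfAbsent(prev, k -> new HashMap<>()).merge(t, 1, Integer::sum);
            emitCounts.computeIfAbsent(t, k -> new HashMap<>()).merge(words[i], 1, Integer::sum);
            vocab.add(words[i]);
            prev = t;
        }
    }

    /** Add-one smoothed log P(outcome | ctx), with numOutcomes possible outcomes. */
    static double logProb(Map<String, Map<String, Integer>> counts,
                          String ctx, String outcome, int numOutcomes) {
        Map<String, Integer> row = counts.getOrDefault(ctx, Collections.emptyMap());
        int total = 0;
        for (int c : row.values()) total += c;
        return Math.log((row.getOrDefault(outcome, 0) + 1.0) / (total + numOutcomes));
    }

    /** Viterbi decoding: most probable tag sequence. Requires at least one training sentence. */
    String[] tag(String[] words) {
        int n = words.length, k = tags.size();
        if (n == 0) return new String[0];
        int v = vocab.size() + 1;            // one extra slot reserved for unknown words
        double[][] score = new double[n][k]; // best log probability of a path ending in tag j at i
        int[][] back = new int[n][k];        // previous tag index on that best path
        for (int j = 0; j < k; j++) {
            score[0][j] = logProb(transCounts, START, tags.get(j), k)
                        + logProb(emitCounts, tags.get(j), words[0], v);
        }
        for (int i = 1; i < n; i++) {
            for (int j = 0; j < k; j++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < k; p++) {
                    double s = score[i - 1][p] + logProb(transCounts, tags.get(p), tags.get(j), k);
                    if (s > best) { best = s; arg = p; }
                }
                score[i][j] = best + logProb(emitCounts, tags.get(j), words[i], v);
                back[i][j] = arg;
            }
        }
        int last = 0;
        for (int j = 1; j < k; j++) if (score[n - 1][j] > score[n - 1][last]) last = j;
        String[] out = new String[n];
        for (int i = n - 1; i >= 0; i--) { // follow back-pointers from the best final tag
            out[i] = tags.get(last);
            last = back[i][last];
        }
        return out;
    }

    public static void main(String[] args) {
        BigramHmmSketch hmm = new BigramHmmSketch();
        hmm.addSentence(new String[]{"the", "dog", "barks"}, new String[]{"DT", "NN", "VBZ"});
        hmm.addSentence(new String[]{"a", "cat", "sleeps"}, new String[]{"DT", "NN", "VBZ"});
        System.out.println(String.join(" ", hmm.tag(new String[]{"the", "cat", "barks"})));
    }
}
```

The sketch recomputes smoothed probabilities on every lookup to stay short; for the full training set, the totals and log probabilities would normally be precomputed once after training.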