Feature Unification Example
For natural language parsing and translation, features greatly simplify the grammar.
Suppose that we want to check some grammatical rules of English:
- Subject-Verb Agreement:
Subject(i) Verb(am watching) ok
Subject(they) Verb(is watching) nok
Subject(the men) Verb(is watching) nok
- Determiner-Noun Agreement
Det(these) Noun(men) ok
Det(two) Noun(man) nok
- Case Enforcement: Subject should have nominative case, Object should have accusative case:
i am watching them ok
me am watching them nok
the men are watching her ok
the men are watching he nok
To implement the agreement and enforcement rules above, we define a three-dimensional feature vector (or list): Case (nominative, accusative), Person (1st, 2nd, 3rd), and Number (singular, plural). It can be represented in the grammar as:
[case=nom|acc, pers=1|2|3, numb=sing|plur]
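As a rough illustration outside the grammar notation, such a feature vector can be modelled as a plain Python dict. The names `ALLOWED` and `is_valid` below are hypothetical and are not part of GLRParser; this is only a sketch of the idea that each feature takes one of a fixed set of values.

```python
# Hypothetical encoding of the feature vector as a dict; not the GLRParser API.
ALLOWED = {
    "case": {"nom", "acc"},
    "pers": {1, 2, 3},
    "numb": {"sing", "plur"},
}

def is_valid(feats):
    """Return True if every feature name and value is one listed in ALLOWED."""
    return all(name in ALLOWED and value in ALLOWED[name]
               for name, value in feats.items())

i_pron = {"case": "nom", "numb": "sing", "pers": 1}

print(is_valid(i_pron))            # True
print(is_valid({"numb": "dual"}))  # False: "dual" is not an allowed number
print(is_valid({}))                # True: an empty feature list is allowed
```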
Some pronouns can have all three features:
i [case=nom,numb=sing,pers=1]
them [case=acc,numb=plur,pers=3]
But some pronouns are case-neutral, i.e. they have the same form for both cases. For these we simply omit the case feature:
it [numb=sing,pers=3]
you [numb=plur,pers=2]
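Omitting a feature means it is compatible with any value. A minimal sketch of unification over dicts shows this; the `unify` function here is an assumption made for illustration, not GLRParser's implementation.

```python
def unify(a, b):
    """Merge two feature dicts; return None if a shared feature disagrees."""
    merged = dict(a)
    for name, value in b.items():
        if name in merged and merged[name] != value:
            return None  # the two sides clash on this feature
        merged[name] = value
    return merged

you = {"numb": "plur", "pers": 2}               # case omitted: case-neutral
me = {"case": "acc", "numb": "sing", "pers": 1}

print(unify(you, {"case": "nom"}))  # {'numb': 'plur', 'pers': 2, 'case': 'nom'}
print(unify(you, {"case": "acc"}))  # {'numb': 'plur', 'pers': 2, 'case': 'acc'}
print(unify(me, {"case": "nom"}))   # None
```

Because "you" carries no case feature, it unifies with both a nominative and an accusative requirement, while "me" fails against nominative.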
Nouns have no case and are always 3rd person, but can be singular or plural:
man [numb=sing,pers=3]
men [numb=plur,pers=3]
Determiners can also have number:
a [numb=sing]
two [numb=plur]
this [numb=sing]
these [numb=plur]
Some determiners can be number-neutral:
the []
Verbs or verb phrases can also have number and person:
am watching [numb=sing,pers=1]
is watching [numb=sing,pers=3]
are watching [numb=plur]
watch [numb=sing,pers=1] or [numb=plur]
watches [numb=sing,pers=3]
watched []
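A form such as "watch" has two alternative feature lists, so agreement succeeds if the subject is compatible with at least one of them. The sketch below assumes a dict-based encoding with hypothetical names (`VERBS`, `compatible`, `agrees`); it is not how GLRParser stores verb entries.

```python
def compatible(a, b):
    """True if no feature present in both dicts has different values."""
    return all(b[name] == value for name, value in a.items() if name in b)

# Each verb form maps to a list of alternative feature dicts (illustrative only).
VERBS = {
    "watch": [{"numb": "sing", "pers": 1}, {"numb": "plur"}],
    "watches": [{"numb": "sing", "pers": 3}],
    "watched": [{}],  # past tense: agrees with any subject
}

def agrees(subject, verb):
    """True if the subject is compatible with any alternative of the verb."""
    return any(compatible(subject, alt) for alt in VERBS[verb])

i_pron = {"case": "nom", "numb": "sing", "pers": 1}
he = {"case": "nom", "numb": "sing", "pers": 3}
they = {"case": "nom", "numb": "plur", "pers": 3}

print(agrees(i_pron, "watch"))  # True  (first alternative)
print(agrees(they, "watch"))    # True  (second alternative)
print(agrees(he, "watch"))      # False (both alternatives clash)
print(agrees(he, "watched"))    # True  (empty feature list)
```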
The following grammar demonstrates feature unification (see feature.grm):
S -> NP(case=nom,numb,pers) VP NP(case=acc)
NP -> i [case=nom,numb=sing,pers=1]
NP -> he [case=nom,numb=sing,pers=3]
NP -> she [case=nom,numb=sing,pers=3]
NP -> it [numb=sing,pers=3]
NP -> we [case=nom,numb=plur,pers=1]
NP -> you [numb=plur,pers=2]
NP -> they [case=nom,numb=plur,pers=3]
NP -> me [case=acc,numb=sing,pers=1]
NP -> him [case=acc,numb=sing,pers=3]
NP -> her [case=acc,numb=sing,pers=3]
NP -> us [case=acc,numb=plur,pers=1]
NP -> them [case=acc,numb=plur,pers=3]
NP -> Det Noun [pers=3]
Det -> this [numb=sing]
Det -> these [numb=plur]
Det -> a [numb=sing]
Det -> two [numb=plur]
Det -> the
Det ->
Noun -> man [numb=sing]
Noun -> men [numb=plur]
VP -> am Ving [numb=sing,pers=1]
VP -> is Ving [numb=sing,pers=3]
VP -> are Ving [numb=plur]
VP -> was Ving [numb=sing]
VP -> were Ving [numb=plur]
VP -> Ved
VP -> V [numb=sing,pers=1]
VP -> Vs [numb=sing,pers=3]
VP -> V [numb=plur]
V -> watch
Vs -> watches
Ving -> watching
Ved -> watched
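One way to read the S rule: the subject must unify with case=nom, its numb and pers are passed up to S, the VP features must unify with those, and the object must unify with case=acc. The following is a sketch of that reading only, using plain dicts and hypothetical names (`Clash`, `unify`, `check_sentence`); GLRParser's actual unification may work differently.

```python
class Clash(Exception):
    """Raised when two feature dicts disagree on a feature (illustrative)."""

def unify(a, b):
    """Merge two feature dicts, raising Clash on any disagreement."""
    merged = dict(a)
    for name, value in b.items():
        if merged.get(name, value) != value:
            raise Clash(f"feature {name!r}: {merged[name]!r} vs {value!r}")
        merged[name] = value
    return merged

def check_sentence(subj, vp, obj):
    """Apply the assumed reading of S -> NP(case=nom,numb,pers) VP NP(case=acc)."""
    unify(subj, {"case": "nom"})                  # subject must be nominative
    s = {k: v for k, v in subj.items() if k in ("numb", "pers")}
    s = unify(s, vp)                              # subject-verb agreement
    unify(obj, {"case": "acc"})                   # object must be accusative
    return s

I = {"case": "nom", "numb": "sing", "pers": 1}
ME = {"case": "acc", "numb": "sing", "pers": 1}
THEM = {"case": "acc", "numb": "plur", "pers": 3}
HER = {"case": "acc", "numb": "sing", "pers": 3}
THE_MEN = {"numb": "plur", "pers": 3}
THE_MAN = {"numb": "sing", "pers": 3}
AM_WATCHING = {"numb": "sing", "pers": 1}
ARE_WATCHING = {"numb": "plur"}

print(check_sentence(I, AM_WATCHING, THEM))      # {'numb': 'sing', 'pers': 1}
print(check_sentence(THE_MEN, ARE_WATCHING, HER))  # {'numb': 'plur', 'pers': 3}

for subj, vp, obj in [(ME, AM_WATCHING, THEM), (THE_MAN, ARE_WATCHING, HER)]:
    try:
        check_sentence(subj, vp, obj)
    except Clash as err:
        print("rejected:", err)
```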
You can run the following code to see the parse trees of some sentences:
from GLRParser import Parser, ParseError, UnifyError, GrammarError

# Load and compile the grammar shown above
parser = Parser()
parser.parse_grammar("feature.grm")
parser.compile()

sents = ["i am watching them", "me am watching them",
         "the men are watching her", "the man are watching her"]

for sent in sents:
    print(sent)
    try:
        parser.parse(sent)
        tree = parser.make_tree()
        tree2 = parser.unify_tree(tree)  # raises UnifyError on a feature clash
        print(tree2.pformat_ext())
    except UnifyError as ue:
        print(ue, "\n")
    except ParseError as pe:
        print(pe, "\n")
It gives the following output:
i am watching them :
S(
    #1[numb=sing,pers=1]
    NP(
        #2[case=nom,numb=sing,pers=1]
        i
    )
    VP(
        #23[numb=sing,pers=1]
        am
        Ving(
            #34[]
            watching
        )
    )
    NP(
        #13[case=acc,numb=plur,pers=3]
        them
    )
)
me am watching them :
Unify error feat=case src=acc param=nom super=S#1 sub=NP#9
the men are watching her :
S(
    #1[numb=plur,pers=3]
    NP(
        #14[numb=plur,pers=3]
        Det(
            #19[]
            the
        )
        Noun(
            #22[numb=plur]
            men
        )
    )
    VP(
        #25[numb=plur]
        are
        Ving(
            #34[]
            watching
        )
    )
    NP(
        #11[case=acc,numb=sing,pers=3]
        her
    )
)
the man are watching her :
Unify error feat=numb src=plur param=sing super=S#1 sub=VP#25