output.txt
Answer to Question No.1
The number of unique words in the training corpus is 41739
Answer to Question No.2
The number of tokens in the training corpus is 2568210
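A minimal sketch of how these two counts could be computed (the file name train.txt and whitespace tokenization are assumptions, not taken from this output):

from collections import Counter

def read_tokens(path):
    # Assumed input format: one whitespace-tokenized sentence per line.
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield from line.split()

train_counts = Counter(read_tokens("train.txt"))  # hypothetical file name
print("unique words:", len(train_counts))         # types  -> 41739
print("tokens:", sum(train_counts.values()))      # tokens -> 2568210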
Answer to Question No.3
The percentage of test word types unseen in training is 3.6028823058446755
The percentage of test word tokens unseen in training is 1.603346113628442
Answer to Question No.4
The percentage of unique bigrams in the test corpus not seen in training is 25.327695560253698
The percentage of bigram tokens in the test corpus not seen in training is 21.704586493318885
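Questions 3 and 4 follow the same pattern: count the test types (or tokens) absent from the training counts and divide by the test total. A sketch covering both, reusing Counter objects like the one above (padding sentences with <s> and </s> before extracting bigrams is an assumption based on the tokenized sentence shown under Question No.5):

def unseen_percentages(train_counts, test_counts):
    # Types are distinct entries; tokens weight each entry by its count.
    unseen = [item for item in test_counts if item not in train_counts]
    unseen_tokens = sum(test_counts[item] for item in unseen)
    type_pct = 100 * len(unseen) / len(test_counts)
    token_pct = 100 * unseen_tokens / sum(test_counts.values())
    return type_pct, token_pct

# Question 3: pass unigram Counters.
# Question 4: pass bigram Counters, where the bigrams of a token list s
# are Counter(zip(s, s[1:])).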
Answer to Question No.5
['<s>', 'i', 'look', 'forward', 'to', 'hearing', 'your', 'reply', '.', '</s>']
1. Unigram Log Probability -90.25609082365422
1. Unigram Average Log Probability -10.028454535961579
2. Bigram Model Evaluation -0.0
The unsmoothed bigram model assigns this sentence probability 0, because at least one of its bigrams never occurs in training; the log probability is therefore undefined, and so are the average log probability and the perplexity.
3. Bigram Add One Log Probability -97.13956016607362
3. Bigram Add One Average Log Probability -10.793284462897068
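These scores are consistent with base-2 logarithms: 2 raised to 10.0284... reproduces the unigram perplexity of 1044.4 reported under Question No.6. A sketch of the three evaluations, assuming Counter-based counts, N training tokens, and vocabulary size V; <s> is skipped in the unigram sum, which matches the averages above being taken over 9 of the 10 tokens:

from math import log2

def unigram_logprob(sent, unigram, N):
    # <s> is not scored; each remaining token contributes log2 of c(w)/N.
    return sum(log2(unigram[w] / N) for w in sent if w != "<s>")

def bigram_logprob(sent, unigram, bigram):
    # Unsmoothed: a zero bigram count makes log2 raise ValueError,
    # i.e. the log probability is undefined, as reported for this sentence.
    return sum(log2(bigram[(a, b)] / unigram[a])
               for a, b in zip(sent, sent[1:]))

def addone_bigram_logprob(sent, unigram, bigram, V):
    # Add-one smoothing: (c(a, b) + 1) / (c(a) + V).
    return sum(log2((bigram[(a, b)] + 1) / (unigram[a] + V))
               for a, b in zip(sent, sent[1:]))

# The averages above divide each sum by len(sent) - 1 (here 9).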
Answer to Question No.6
Perplexity of the sentence under the unigram model is 1044.3970236213079
Perplexity of the sentence under the add-one bigram model is 1774.607755085189
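Both values follow directly from the average log probabilities under Question No.5, via perplexity = 2 ** (-average log2 probability per token):

print(2 ** 10.028454535961579)  # 1044.397...  (unigram)
print(2 ** 10.793284462897068)  # 1774.607...  (add-one bigram)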
Answer to Question No.7
The unigram log probability for the test corpus is -28124.445292806824
The unigram average log probability for the test corpus is -10.15689609707722
Perplexity of the test corpus under the unigram model is 1141.6431838536823
Bigram Model Evaluation on test corpus -0.0
The unsmoothed bigram model assigns the test corpus probability 0, since the test corpus contains bigrams unseen in training (see the answer to Question No.4); the log probability is therefore undefined, and so are the average log probability and the perplexity.
The add-one smoothed bigram log probability for the test corpus is -31072.441669718435
The add-one smoothed bigram average log probability for the test corpus is -11.221539064542592
The add-one smoothed bigram perplexity for the test corpus is 2387.920456133782
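The corpus-level numbers extend the per-sentence computation over the whole test set: sum the sentence log probabilities, divide by the total number of scored tokens for the average, and exponentiate for perplexity. (Dividing either total log probability above by its average gives 2769, so the test corpus contributes 2769 scored tokens.) A sketch, reusing the per-sentence scorers from Question No.5:

def corpus_eval(sentences, logprob_fn):
    # sentences: list of token lists; logprob_fn: any per-sentence scorer above.
    total_lp = sum(logprob_fn(s) for s in sentences)
    # Assumed token convention, matching Question No.5: each sentence
    # contributes len(s) - 1 scored tokens (<s> is not scored).
    total_n = sum(len(s) - 1 for s in sentences)
    avg = total_lp / total_n
    return total_lp, avg, 2 ** (-avg)  # log prob, average, perplexity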