Language Models trained with LSTM using the Lord of The Rings trilogy corpus.

Two methods were explored:

- `sparse_categorical_crossentropy` model, where the input sentences were vectorized using `TextVectorization` from Keras, yielding an input shape of (`MAX_SEQ_LEN`,), where `MAX_SEQ_LEN` is the maximum sequence length of the input sentences (Tx). Each position in the `MAX_SEQ_LEN` sequence holds the integer index of the corresponding character, drawn from a vocabulary of size `N_UNIQUE_CHARS` (where `N_UNIQUE_CHARS` is the number of unique characters found in the corpus).
- `categorical_crossentropy` model, or One-Hot Encoding model, where the input sentences were vectorized into one-hot encoded arrays, yielding an input shape of (`MAX_SEQ_LEN`, `N_UNIQUE_CHARS`), where `MAX_SEQ_LEN` is the maximum sequence length of the input sentences (Tx). For each position in the `MAX_SEQ_LEN` sequence there is a corresponding one-hot encoded vector of length `N_UNIQUE_CHARS`, where `N_UNIQUE_CHARS` is the number of unique characters found in the corpus.
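The first method's input representation can be sketched in plain Python. The toy corpus, the padding length, and the index-0 padding convention below are assumptions for illustration; in the project itself, Keras's `TextVectorization` layer performs the equivalent character-to-integer lookup:

```python
# Sketch of character-level integer vectorization (toy corpus assumed).
corpus = "the ring"
chars = sorted(set(corpus))            # unique characters in the corpus
N_UNIQUE_CHARS = len(chars)
# reserve index 0 for padding, mirroring TextVectorization's convention
char_to_id = {c: i + 1 for i, c in enumerate(chars)}

MAX_SEQ_LEN = 12                       # Tx, the longest input sentence

def vectorize(text):
    """Map a sentence to a padded integer sequence of shape (MAX_SEQ_LEN,)."""
    ids = [char_to_id[c] for c in text]
    return ids + [0] * (MAX_SEQ_LEN - len(ids))

x = vectorize("the ring")              # 8 character ids followed by 4 pads
```

Each sentence thus becomes a flat vector of `MAX_SEQ_LEN` integers, which is why the matching loss is `sparse_categorical_crossentropy`: the targets stay as integer class ids rather than one-hot vectors.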
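The second method can be sketched the same way, again with a toy corpus as a stand-in: every character index is expanded into a one-hot row, so each sentence becomes a (`MAX_SEQ_LEN`, `N_UNIQUE_CHARS`) array, matching the `categorical_crossentropy` loss:

```python
# Sketch of the one-hot encoded representation (toy corpus assumed).
corpus = "the ring"
chars = sorted(set(corpus))            # unique characters in the corpus
N_UNIQUE_CHARS = len(chars)
char_to_id = {c: i for i, c in enumerate(chars)}
MAX_SEQ_LEN = 12                       # Tx, the longest input sentence

def one_hot_encode(text):
    """Return MAX_SEQ_LEN rows, each a one-hot vector of length N_UNIQUE_CHARS."""
    rows = []
    for c in text:
        row = [0] * N_UNIQUE_CHARS
        row[char_to_id[c]] = 1         # single 1 at the character's index
        rows.append(row)
    # pad with all-zero rows up to MAX_SEQ_LEN
    rows += [[0] * N_UNIQUE_CHARS for _ in range(MAX_SEQ_LEN - len(rows))]
    return rows

X = one_hot_encode("the ring")         # shape (MAX_SEQ_LEN, N_UNIQUE_CHARS)
```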