Training an NER model to detect aliases #10963
Replies: 1 comment 4 replies
-
Hi @BreezySMX , Interesting project :) You might need to consider what the model is learning. Since NER is heavily dependent on context, your model may be learning the boundary words such as 'a.k.a.', 'also known as,' etc. There might even be an argument for using business rules here (since As a learning experiment, I think the flow is good. However, it's unclear if you're using actual dev data. You shouldn't need to use extra data to check if it works at all if you have dev data. Lastly, the |
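To make the business-rules idea concrete, here is a minimal sketch of a rule-based baseline that keys on the same boundary phrases the NER model may be picking up. The cue list, the capitalized-token heuristic for names, and the function name `find_aliases` are all assumptions for illustration; nothing here comes from the original post's data.

```python
import re

# Hypothetical cue phrases; extend these for your own data.
CUES = r"(?:a\.k\.a\.|aka|also known as|better known as)"
# Assumption: a name is a run of capitalized tokens (e.g. "Peter Parker").
NAME = r"[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*"
PATTERN = re.compile(rf"(?P<name>{NAME}),?\s+{CUES},?\s+(?P<alias>{NAME})")


def find_aliases(text):
    """Return (name, alias) pairs joined by one of the cue phrases."""
    return [(m.group("name"), m.group("alias")) for m in PATTERN.finditer(text)]
```

If a baseline like this already finds most pairs in your held-out text, that suggests the trained model may be relying on the cue phrases rather than on anything about the names themselves. Note that the capitalization heuristic will also pick up a capitalized word at the start of a sentence.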
-
Hello,
This is my first time working with spaCy, or any kind of ML, on a project, and I just wanted a quick sanity check that what I think I understand, and what I'm doing, is correct.
I'm trying to train spaCy to better recognize ALIAS entities in data. I figured there was no better training data than superheroes, and used the following formats (please ignore any inaccuracies in the data as far as Marvel trivia goes; I used a list of ending sentences and random.choice to pick one of the endings):
With 173 heroes and 6 variations of how aliases appear in text (there are probably more, but these are all I could think of), I ended up with 1038 training examples.
I used the following to create my training data:
I then ran the following from the command line:
spacy train config.cfg --paths.train="train.spacy" --paths.dev="dev" --gpu-id 0 --output="output_folder"
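For anyone following along, here is a minimal sketch of one way to build this kind of template-based training data and write it in spaCy's binary `.spacy` format. It is not the code from this post: the two templates, the `NAME` label, and all function names are assumptions for illustration. Only `spacy.blank`, `Doc.char_span`, and `DocBin` are real spaCy APIs. The sketch splits by hero before generating sentences, so no name or alias appears in both train and dev.

```python
import random
import re

# Hypothetical templates; {name} and {alias} mark where the entities go.
TEMPLATES = [
    "{name}, also known as {alias}, was seen downtown.",
    "{name} a.k.a. {alias} was seen downtown.",
]


def make_example(template, name, alias):
    """Fill one template and return (text, [(start, end, label), ...])."""
    values = {"{name}": (name, "NAME"), "{alias}": (alias, "ALIAS")}
    text, spans = "", []
    # re.split with a capturing group keeps the placeholders in the result.
    for piece in re.split(r"(\{name\}|\{alias\})", template):
        if piece in values:
            value, label = values[piece]
            spans.append((len(text), len(text) + len(value), label))
            text += value
        else:
            text += piece
    return text, spans


def split_heroes(heroes, dev_fraction=0.2, seed=0):
    """Split (name, alias) pairs so train and dev share no hero."""
    shuffled = list(heroes)
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


def build_examples(heroes):
    """Generate one example per hero per template."""
    return [make_example(t, name, alias) for name, alias in heroes for t in TEMPLATES]


def write_docbin(examples, path):
    """Convert character-offset examples to a .spacy file."""
    # Imported here so the steps above can run without spaCy installed.
    import spacy
    from spacy.tokens import DocBin

    nlp = spacy.blank("en")  # tokenizer only, no trained components
    doc_bin = DocBin()
    for text, spans in examples:
        doc = nlp.make_doc(text)
        # char_span returns None if offsets do not line up with token boundaries.
        ents = [doc.char_span(s, e, label=label) for s, e, label in spans]
        doc.ents = [ent for ent in ents if ent is not None]
        doc_bin.add(doc)
    doc_bin.to_disk(path)
```

Usage would look something like `train, dev = split_heroes(heroes)` followed by `write_docbin(build_examples(train), "train.spacy")` and the same for the dev split (the dev filename is up to you). Holding out whole heroes gives a more honest dev score than holding out random sentences.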
Once training finished, I ran the trained model against another document that didn't contain any of the names or aliases in my training data, with the following:
And it worked, it successfully identified names and aliases in the data.
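A check like this can be sketched roughly as follows. This is a sketch under assumptions, not the exact script from this post: the helper names are made up, and the default path assumes the `model-best` directory that `spacy train` writes inside the output folder.

```python
def load_trained(model_dir="output_folder/model-best"):
    """Load a pipeline saved by `spacy train` (path is an assumption)."""
    # Imported here so tag_text can be exercised without spaCy installed.
    import spacy

    return spacy.load(model_dir)


def tag_text(nlp, text):
    """Run the pipeline and return (text, label, start_char, end_char) per entity."""
    doc = nlp(text)
    return [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]
```

Calling `tag_text(load_trained(), some_unseen_text)` and eyeballing the result is a fine smoke test, but a proper dev set gives you precision and recall numbers rather than a handful of examples.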
My next step is to use the tutorial found here and extract the relationships between the name and alias entities. Unfortunately, the data used for the training comes from Prodigy, which I don't have the means to purchase, so I'm left trying to turn my data into the same format as theirs, or finding an alternate way to do the relationship extraction.
Am I on the right track with this, or did I just get lucky?
Thanks for the feedback :).