IrenaeusChan/SpamFilter

Latest commit: 22a65f8 · Apr 16, 2015

SpamFilter

Bayesian Spam Filter to organize Spam and Ham

spam.py

This script builds two dictionaries from the two provided training datasets, “learning_ham” and “learning_spam”. It then writes two files, “outputHam.txt” and “outputSpam.txt”. Each file contains the total number of words matched across all datasets, followed by a list of every word with its frequency, P(word|spam) or P(word|ham), and P(spam|word) or P(ham|word).
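The word statistics described above can be sketched roughly as follows. This is a minimal illustration and not the code in spam.py: the function names `word_counts` and `word_stats`, the whitespace tokenization, and the equal prior `prior_spam=0.5` are all assumptions made for the example. It computes P(word|class) as the word's share of all words seen in that class and P(spam|word) with Bayes' rule.

```python
from collections import Counter


def word_counts(messages):
    """Count lowercase, whitespace-separated tokens (assumed tokenization)."""
    counts = Counter()
    for text in messages:
        counts.update(text.lower().split())
    return counts


def word_stats(spam_counts, ham_counts, prior_spam=0.5):
    """Return {word: (P(word|spam), P(word|ham), P(spam|word))}.

    Assumes both datasets contain at least one word.
    prior_spam is a hypothetical parameter, not taken from spam.py.
    """
    total_spam = sum(spam_counts.values())
    total_ham = sum(ham_counts.values())
    stats = {}
    for word in set(spam_counts) | set(ham_counts):
        p_w_spam = spam_counts[word] / total_spam
        p_w_ham = ham_counts[word] / total_ham
        # Bayes' rule: P(spam|w) = P(w|spam)P(spam) / P(w)
        evidence = p_w_spam * prior_spam + p_w_ham * (1 - prior_spam)
        stats[word] = (p_w_spam, p_w_ham, p_w_spam * prior_spam / evidence)
    return stats
```

P(ham|word) is simply 1 - P(spam|word) under these assumptions, so it is not stored separately.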

To use this script, make sure the learning datasets are in the current directory, then run the following on the command line: python spam.py

This produces the two output files described above.

In addition to learning from the two provided datasets, spam.py can classify a folder of email messages as spam or ham. Open the script and set the variable fileTest to the path of the test folder; the script then lists each file in that folder as SPAM or HAM, depending on the configured confidence level.
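One way such a folder classification could work is sketched below. It is an illustration under stated assumptions, not the logic in spam.py: the names `spam_probability` and `classify_folder`, the `confidence=0.9` default, and the clamping of per-word probabilities to [0.01, 0.99] are all hypothetical. The per-word values of P(spam|word) are combined with the product rule popularized by Paul Graham's spam filter, computed in log space to avoid underflow; spam.py may combine them differently.

```python
import math
import os


def spam_probability(text, p_spam_given_word):
    """Combine per-word P(spam|word) values into one score for a message."""
    log_spam = 0.0
    log_ham = 0.0
    for word in set(text.lower().split()):
        if word not in p_spam_given_word:
            continue  # words never seen in training carry no evidence
        # Clamp so a single word cannot force the score to exactly 0 or 1.
        p = min(max(p_spam_given_word[word], 0.01), 0.99)
        log_spam += math.log(p)
        log_ham += math.log(1 - p)
    # prod(p) / (prod(p) + prod(1 - p)), rewritten to avoid overflow.
    diff = max(min(log_ham - log_spam, 700.0), -700.0)
    return 1.0 / (1.0 + math.exp(diff))


def classify_folder(folder, p_spam_given_word, confidence=0.9):
    """Label every regular file in folder as "SPAM" or "HAM"."""
    labels = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        with open(path, errors="ignore") as handle:
            score = spam_probability(handle.read(), p_spam_given_word)
        labels[name] = "SPAM" if score >= confidence else "HAM"
    return labels
```

Here `folder` plays the role of fileTest, and `p_spam_given_word` is a plain dictionary mapping each word to its learned P(spam|word). A message with no known words scores 0.5 and is therefore labeled HAM at any confidence level above 0.5.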
