A simple resume parser for extracting information from resumes.
- Extract name
- Extract email
- Extract mobile numbers
- Extract skills
- Extract total experience
- Extract college name
- Extract degree
- Extract designation
- Extract company names
- Install the package with pip:
pip install pyresparser
- NLP operations rely on spaCy and NLTK. Download the required model and corpus with the commands below:
# spaCy
python -m spacy download en_core_web_sm
# nltk
python -m nltk.downloader words
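After running the install steps above, a quick sanity check can confirm that the packages are importable. The following is a minimal sketch using only the standard library; `REQUIRED_MODULES` and `missing_modules` are hypothetical names, and the check assumes the spaCy model is importable under the module name `en_core_web_sm`. It does not verify that the NLTK `words` corpus was downloaded.

```python
from importlib.util import find_spec

# Module names implied by the install steps above (an assumption of this sketch).
REQUIRED_MODULES = ("pyresparser", "spacy", "nltk", "en_core_web_sm")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]
```

Calling `missing_modules()` returns an empty list when everything is in place, or the names that still need installing.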
Official documentation is available at: https://www.omkarpathak.in/pyresparser/
- PDF and DOCX files are supported on all operating systems
- To extract DOC files, install textract for your OS (Linux, macOS)
- Note: Installing textract is the only extra step; DOC files will then be parsed without further setup
- Import it in your Python project
from pyresparser import ResumeParser
data = ResumeParser('/path/to/resume/file').get_extracted_data()
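The same call can be applied to a whole folder of resumes. The sketch below is one possible way to do that, not part of the package: `parse_directory`, `default_parser`, and `SUPPORTED_SUFFIXES` are hypothetical names, and only the `ResumeParser(...).get_extracted_data()` call shown above is taken from the package. The parser is passed in as a function so that it can be swapped out or stubbed.

```python
from pathlib import Path

# Formats this sketch picks up; DOC is left out because it needs textract.
SUPPORTED_SUFFIXES = {".pdf", ".docx"}


def default_parser(path):
    # Imported lazily so the helper can be loaded without pyresparser installed.
    from pyresparser import ResumeParser
    return ResumeParser(path).get_extracted_data()


def parse_directory(directory, parse=default_parser):
    """Parse every PDF/DOCX file in `directory` and return {filename: data}."""
    results = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES:
            results[path.name] = parse(str(path))
    return results
```

For example, `parse_directory('/path/to/resumes')` would return one entry per PDF or DOCX file found in that folder.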
You can also run the resume extractor from the command line:
usage: pyresparser [-h] [-f FILE] [-d DIRECTORY] [-r REMOTEFILE]
optional arguments:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  resume file to be extracted
  -d DIRECTORY          directory containing all the resumes to be extracted
  -r REMOTEFILE         remote path for resume file to be extracted
  -re CUSTOM_REGEX, --custom-regex CUSTOM_REGEX
                        custom regex for parsing mobile numbers
                        custom skills CSV file against which skills are
                        searched for
                        the information export format (json)
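Before passing a custom regex for mobile numbers to the CLI, it can help to try the pattern on sample text first. The following is a sketch using Python's `re` module only. `MOBILE_PATTERN` and `find_mobile_numbers` are hypothetical names, and the pattern itself (an optional `+` country code followed by ten digits) is an illustrative assumption, not the package's default.

```python
import re

# Hypothetical pattern: optional "+<country code>" followed by a 10-digit number.
MOBILE_PATTERN = r"(?:\+\d{1,3}[\s-]?)?\d{10}"


def find_mobile_numbers(text, pattern=MOBILE_PATTERN):
    """Return every substring of `text` matched by `pattern`."""
    return re.findall(pattern, text)


print(find_mobile_numbers("Call 8087996634 or +91 8087996634"))
# prints ['8087996634', '+91 8087996634']
```

Once the pattern behaves as expected, it could be supplied through the `-re` option listed above.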
- If you are running the app on Windows, you can only extract .docx and .pdf files
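The platform rules in these notes can be written down as a small pre-check. This is only a sketch of the rules as stated here (PDF and DOCX everywhere, DOC only with textract on Linux or macOS); `is_supported` and its parameters are hypothetical and not part of the package.

```python
import sys
from pathlib import Path


def is_supported(path, platform=sys.platform, has_textract=False):
    """Return True if, per the notes above, `path` should be parseable."""
    suffix = Path(path).suffix.lower()
    if suffix in {".pdf", ".docx"}:
        return True  # supported on all operating systems
    if suffix == ".doc":
        # DOC needs textract, which the notes list for Linux and macOS only.
        return has_textract and not platform.startswith("win")
    return False
```

Passing `platform` and `has_textract` explicitly keeps the rule easy to check without depending on the machine it runs on.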
The module returns a list of dictionaries. A sample result looks like this:
[
  {
    'college_name': ['Marathwada Mitra Mandal’s College of Engineering'],
    'company_names': None,
    'degree': ['B.E. IN COMPUTER ENGINEERING'],
    'designation': ['Manager',
                    'TECHNICAL CONTENT WRITER',
                    'DATA ENGINEER'],
    'email': 'omkarpathak27@gmail.com',
    'mobile_number': '8087996634',
    'name': 'Omkar Pathak',
    'no_of_pages': 3,
    'skills': ['Operating systems',
               'Linux',
               'Github',
               'Testing',
               'Content',
               'Automation',
               'Python',
               'Css',
               'Website',
               'Django',
               'Opencv',
               'Programming',
               'C',
               ...],
    'total_experience': 1.83
  }
]
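Because each result is a plain dictionary, it can be filtered with ordinary Python. The sketch below screens records by required skills; `has_skills` is a hypothetical helper, and `records` is a trimmed copy of the sample output above. Fields such as `company_names` may be `None`, so the helper treats a missing or `None` skills list as empty.

```python
def has_skills(record, required):
    """True if every skill in `required` is in record['skills'] (case-insensitive)."""
    found = {skill.lower() for skill in (record.get("skills") or [])}
    return all(skill.lower() in found for skill in required)


# Trimmed copy of the sample output above.
records = [
    {
        "name": "Omkar Pathak",
        "company_names": None,
        "skills": ["Linux", "Python", "Django", "Opencv"],
        "total_experience": 1.83,
    }
]

matches = [r["name"] for r in records if has_skills(r, ["python", "django"])]
print(matches)  # prints ['Omkar Pathak']
```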
- Special thanks to dataturks for their annotated dataset
