Curated libraries for a faster workflow
- Image annotation: makesense.ai
- Text annotation: doccano, prodigy, dataturks, brat
- Audio annotation: audio-annotator
- Words: curse-words, badwords, LDNOOBW, english-words (a text file containing over 466k English words), 10K most common words
- Text Corpus: project gutenberg, oscar (big multilingual corpus), nlp-datasets, 1 trillion n-grams, The Big Bad NLP Database, litbank
- Summarization Data: curation-corpus
- Conversational data: conversational-datasets, cornell-movie-dialog-corpus
- Image datasets: 1 million fake faces, flickr-faces, CIFAR-10, The Street View House Numbers (SVHN), STL-10, imagenette, objectnet, Yahoo Flickr Creative Commons 100 Million (YFCC100m)
- Dataset search engine: datasetlist, UCI Machine Learning Datasets, Google Dataset Search, fastai-datasets, Data For Everyone
- Audio: pydub
- Video: pytube (download YouTube videos), moviepy
- Image: py-image-dataset-generator (automatically fetch images from the web for a given search query)
- News: news-please
- PDF: camelot, tabula-py, Parsr, pdftotext
- Excel: openpyxl
- Remote file: smart_open
- Crawling: pyppeteer (chrome automation), MechanicalSoup, libextract
- Google sheets: gspread
- Google drive: gdown, pydrive
- Python API for datasets: pydataset
- Google maps location data: geo-heatmap
- Text to Speech: gtts
- Databases: blaze (pandas and numpy interface to databases)
- Text augmentation: nlpaug, noisemix
- Image augmentation: imgaug, albumentations, augmentor, solt
- Audio augmentation: audiomentations, muda
- OCR data: TextRecognitionDataGenerator
- Automatic augmentation: deepaugment (image)
- Missing values: missingno
- Split images into train/validation/test: split-folders
- Class Imbalance: imblearn
- Categorical encoding: category_encoders
- Numerical data: numerizer (convert natural language numerics into ints and floats)
- Data Validation: pandera (validation for pandas)
- Data Cleaning: pyjanitor (janitor ported to python)
- Parsing: pyparsing, parse
- Natural date parser: dateparser
- Unicode: text-unidecode
- Emoji: emoji
- Weak Supervision: snorkel
- View Jupyter notebooks through CLI: nbdime
- Parametrize notebooks: papermill
- Access notebooks programmatically: nbformat
- Convert notebooks to other formats: nbconvert
- Extra utilities not present in frameworks: mlxtend
- Maps in notebooks: ipyleaflet
- Data Exploration: bamboolib (a GUI for pandas)
- Automatic feature engineering: featuretools, autopandas, tsfresh (automatic feature engineering for time series)
- Custom distance metric learning: metric-learn, pytorch-metric-learning
- Time series: python-holidays, skits
- DAG based dataset generation: DFFML
- Brute-force search over scikit-learn models and parameters: auto-sklearn, tpot
- Curations: bert-related-papers
- Autogenerate ML code: automl-gs, mindsdb, autocat (auto-generate text classification models in spacy)
- ML from command line (or Python or HTTP): DFFML
- Pretrained models: modeldepot, pytorch-hub, papers-with-code, pretrained-models.pytorch
- Find SOTA models: sotawhat
- Gradient Boosting: catboost, lightgbm (GPU-capable), thundergbm (GPU-capable)
- Hidden Markov Models: hmmlearn
- Genetic Programming: gplearn
- Active Learning: modAL
- Support Vector Machines: thundersvm (GPU-capable)
- Rule based classifier: sklearn-expertsys
- Probabilistic modeling: pomegranate
- Graph Embedding and Community Detection: karateclub
- Anomaly detection: adtk
- Spiking Neural Network: norse
- Fuzzy Learning: fylearn, scikit-fuzzy
- Dimensionality reduction: fbpca
- Noisy Label Learning: cleanlab
- Few Shot Learning: keras-fewshotlearning
- NLP libraries: spacy, nltk, corenlp, deeppavlov, kashgari, camphr (spacy plugin for transformers, elmo, udify), transformers, simpletransformers, ernie, stanza, scispacy (spacy for medical documents)
- Preprocessing: textacy
- Text Extraction: textract (Image, Audio, PDF)
- Text Generation: gpt2client, textgenrnn, gpt-2-simple
- Summarization: textrank, pytldr, bert-extractive-summarizer
- Spelling Correction: JamSpell, pyhunspell, pyspellchecker, cython_hunspell, hunspell-dictionaries, autocorrect (can add more languages), symspellpy
- Contraction Mapping: contractions
- Keyword extraction: rake, pke, phrasemachine
- Multiple Choice Question Answering: mcQA
- Sequence to sequence models: headliner
- Transfer learning: finetune
- Translation: googletrans, word2word, translate-python
- Embeddings: pymagnitude (manage vector embeddings easily), chakin (download pre-trained word vectors), sentence-transformers, InferSent, bert-as-service, sent2vec, sense2vec, zeugma (pretrained-word embeddings as scikit-learn transformers), BM25Transformer, laserembeddings, glove-python
- Multilingual support: polyglot, inltk (indic languages), indic_nlp
- NLU: snips-nlu
- Semantic parsing: quepy
- Inflections: inflect
- Contractions: pycontractions
- Coreference Resolution: neuralcoref
- Readability: homer
- Grammar checking: language-check
- Topic Modeling: guidedlda, enstop, top2vec
- Clustering: spherecluster (kmeans with cosine distance), kneed (automatically find number of clusters from elbow curve), kmodes
- Metrics: seqeval (NER, POS tagging)
- String match: jellyfish (perform string and phonetic comparison), flashtext (superfast extract and replace keywords), pythonverbalexpressions (verbally describe regex), commonregex (ready-made regex for email, phone, etc.)
- Sentiment: vaderSentiment (rule based)
- Text distances: textdistance, editdistance, word-mover-distance, wmd-relax (word mover distance for spacy)
- PII removal: scrubadub
- Profanity detection: profanity-check
- Visualization: stylecloud (wordclouds), scattertext
- Fuzzy Search: fuzzywuzzy
- Named Entity Recognition (NER): spaCy, Stanford NER, sklearn-crfsuite, med7 (spacy NER for medical records)
- Fill blanks: fitbert
- Dictionary: vocabulary
- Nearest neighbor: faiss
- Sentence Segmentation: nnsplit
- Knowledge Distillation: textbrewer
- Library: speech_recognition, pyannotate
- Diarization: resemblyzer
- Factorization machines (FM), and field-aware factorization machines (FFM): xlearn, DeepCTR
- Scikit-learn like API: surprise
- Recommendation System in Pytorch: CaseRecommender
- Apriori algorithm: apyori
- Image processing: scikit-image, imutils
- Segmentation Models in Keras: segmentation_models
- Face recognition: face_recognition, face-alignment (find facial landmarks)
- GANs: mimicry
- Face swapping: faceit, faceit-live
- Video summarization: videodigest
- Semantic search over videos: scoper
- OCR: keras-ocr, pytesseract
- Object detection: luminoth
- Image hashing: ImageHash
- Pytorch: Keras-like summary for pytorch, skorch (wrap pytorch in scikit-learn compatible API), catalyst
- Einstein notation: einops
- Scikit-learn: scikit-lego, iterstrat (cross-validation for multi-label data)
- Keras: keras-radam, larq (binarized neural networks), ktrain (fastai-like interface for keras), tavolo (useful techniques from kaggle as utilities), tensorboardcolab (make tensorboard work in colab), tf-sha-rnn
- Tensorflow: tensorflow-addons
- Learning curve: lrcurve (plot realtime learning curve in Keras), livelossplot
- Notifications: knockknock (get notified by slack/email), jupyter-notify (notify when task is completed in jupyter)
- Progress bar: fastprogress
- Visualize keras models: keras-vis
- Interpret models: eli5, lime, shap, alibi, tf-explain, treeinterpreter, pybreakdown, xai, lofo-importance
- Interpret BERT: exbert
- Interpret word2vec: word2viz
- Hyperparameter tuning for Keras: keras-tuner
- Hyperparameter tuning for scikit-learn: sklearn-deap (evolutionary algorithm for hyperparameter search), hyperopt-sklearn
- General hyperparameter tuning: hyperopt, optuna, evol, talos
- Draw CNN figures: nn-svg
- Visualization for scikit-learn: yellowbrick, scikit-plot
- XKCD-like charts: chart.xkcd
- Convert matplotlib charts to D3 charts: mpld3
- Generate graphs using markdown: mermaid
- Visualize topic models: pyldavis
- High dimensional visualization: umap
- Visualization libraries: pygal, plotly, plotnine
- Interactive charts: bokeh
- Visualize architectures: netron
- Activation maps for keras: keract
- Create interactive charts online: flourish-studio
- Color Schemes: open-color, mplcyberpunk (cyberpunk style for matplotlib)
- Transpiling: sklearn-porter (transpile sklearn model to C, Java, JavaScript and others), m2cgen
- Pickling extended: cloudpickle, jsonpickle
- Parallelize Pandas: pandarallel, swifter, modin
- Parallelize numpy operations: numba
- Configuration Management: config, python-decouple
- Data Validation: schema, jsonschema, cerberus, pydantic, marshmallow, validators
- Enable CORS in Flask: flask-cors
- Caching: cachetools, cachew (cache to local sqlite)
- Authentication: pyjwt (JWT)
- Task Queue: rq, schedule
- Database: flask-sqlalchemy, tinydb
- Logging: loguru
- Generate frontend with python: streamlit
- Generate images to fool model: foolbox
- Generate phrases to fool NLP models: triggers
- General adversarial attacks and defenses: cleverhans
- Datetime compatible API for Bikram Sambat: nepali-date
- Bloom filter: python-bloomfilter
- Run Python applications in isolated environments: pipx
- Pretty print tables in CLI: tabulate
- Leaflet maps from python: folium
- Debugging: PySnooper
- Date and Time: pendulum
- Create interactive prompts: prompt-toolkit
- Concurrent database: pickleshare
- Async: tomorrow
- Testing: crosshair (find failure cases for functions)
- CLI tools: gitjk (undo what you just did in git)
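
The Bloom filter entry in the list above names a library but not the idea behind it. The following is a minimal standard-library sketch of that idea, not the python-bloomfilter API: the `BloomFilter` class, its `size_bits` and `num_hashes` parameters, and the salted SHA-256 hashing scheme are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter (hypothetical class, not python-bloomfilter)."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into a bytearray.
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


seen = BloomFilter()
for word in ["spacy", "nltk", "stanza"]:
    seen.add(word)

print("nltk" in seen)    # True: added items are always reported as present
print("gensim" in seen)  # usually False, but false positives are possible
```

The trade-off is fixed memory in exchange for occasional false positives. A production library would typically size the bit array and hash count from a target error rate rather than hard-coding them as done here.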