From 6539c7f5b0c798901c9546be5de33f3badcc8b30 Mon Sep 17 00:00:00 2001
From: Mastan Sayyad
Date: Mon, 17 Jun 2024 19:22:24 +0530
Subject: [PATCH] Added resources for ML

---
 README.md | 510 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 509 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 1588b1dab..52d34f306 100644
--- a/README.md
+++ b/README.md
@@ -41,6 +41,13 @@ If later found out the points will be deducted. you cant be earning more than 60
   - [Model optimizing](#model-optimization)
   - [Model deploying](#model-deployment)
   - [Machine learning algorithms](#machine-learning-algorithms)
+  - [Machine Learning Python](#machine-learning-python)
+    - [Python General-Purpose Machine Learning](#python-general-purpose-machine-learning)
+    - [Data Manipulation | Data Analysis | Data Visualization](#data-manipulation--data-analysis--data-visualization)
+  - [Machine Learning R](#machine-learning-r)
+    - [R General-Purpose Machine Learning](#r-general-purpose-machine-learning)
+    - [Data Manipulation | Data Analysis | Data Visualization](#data-manipulation--data-analysis--data-visualization-1)
+- [Kaggle Competition Source Code](#kaggle-competition-source-code)
 - [Books](#books)
 - [Datasets](#datasets)
 - [GitHub Repositories](#github-repositories)
@@ -429,6 +436,485 @@ If later found out the points will be deducted. you cant be earning more than 60
+### Machine Learning Python
+> Machine learning using Python, that you can learn.
+
+#### Python General-Purpose Machine Learning
+
+| Resource Name | Description |
+| --- | --- |
+| XAD | Fast and easy-to-use backpropagation tool. |
+| Aim | An easy-to-use & supercharged open-source AI metadata tracker. |
+| RexMex | A general-purpose recommender metrics library for fair evaluation. |
+| ChemicalX | A PyTorch based deep learning library for drug pair scoring. |
+| Microsoft ML for Apache Spark | A distributed machine learning framework for Apache Spark. |
+| Shapley | A data-driven framework to quantify the value of classifiers in a machine learning ensemble. |
+| igel | A delightful machine learning tool that allows you to train/fit, test and use models without writing code. |
+| ML Model building | A repository containing Classification, Clustering, Regression, and Recommender Notebooks with illustrations. |
+| ML/DL project template | A template for deep learning projects using PyTorch Lightning. |
+| PyTorch Frame | A Modular Framework for Multi-Modal Tabular Learning. |
+| PyTorch Geometric | Graph Neural Network Library for PyTorch. |
+| PyTorch Geometric Temporal | A temporal extension of PyTorch Geometric for dynamic graph representation learning. |
+| Little Ball of Fur | A graph sampling extension library for NetworkX with a Scikit-Learn like API. |
+| Karate Club | An unsupervised machine learning extension library for NetworkX with a Scikit-Learn like API. |
+| Auto_ViML | Automatically Build Variant Interpretable ML models fast! Comprehensive Python AutoML toolkit. |
+| PyOD | Python Outlier Detection toolkit for detecting outlying objects in multivariate data. |
+| steppy | Lightweight Python library for fast and reproducible machine learning experimentation. |
+| steppy-toolkit | Curated collection of neural networks, transformers, and models for efficient machine learning. |
+| CNTK | Microsoft Cognitive Toolkit (CNTK), an open-source deep-learning toolkit. |
+| Couler | Unified interface for constructing and managing machine learning workflows on different engines. |
+| auto_ml | Automated machine learning for production and analytics. |
+| dtaidistance | High performance library for time series distances (DTW) and clustering. |
+| einops | Deep learning operations reinvented for pytorch, tensorflow, jax, and others. |
+| machine learning | Automated build consisting of a web-interface and programmatic-interface API for support vector machines. |
+| XGBoost | Python bindings for eXtreme Gradient Boosting (Tree) Library. |
+| ChefBoost | A lightweight decision tree framework for Python with categorical feature support and advanced techniques. |
+| Apache SINGA | An Apache Incubating project for developing an open source machine learning library. |
+
+#### Data Manipulation | Data Analysis | Data Visualization
+
+| Resource Name | Description |
+| --- | --- |
+| DataComPy | A library to compare Pandas, Polars, and Spark data frames with stats and match accuracy adjustment. |
+| DataVisualization | A GitHub repository to learn data visualization basics to intermediate levels. |
+| Cartopy | A Python package for geospatial data processing and map production. |
+| SciPy | A Python-based ecosystem for mathematics, science, and engineering. |
+| NumPy | A fundamental package for scientific computing with Python. |
+| AutoViz | Automatic visualization of any dataset with a single line of Python code. |
+| Numba | Python JIT (just in time) compiler to LLVM aimed at scientific Python. |
+| Mars | A tensor-based framework for large-scale data computation. |
+| NetworkX | A high-productivity software for complex networks. |
+| igraph | Binding to igraph library - General purpose graph library. |
+| Pandas | High-performance, easy-to-use data structures and data analysis tools for Python. |
+| ParaMonte | Python library for Bayesian data analysis and visualization via Monte Carlo and MCMC simulations. |
+| Vaex | High performance Python library for lazy Out-of-Core DataFrames, suitable for big tabular datasets. |
+| PyTables (tables) | Manage hierarchical datasets and designed to efficiently and easily cope with extremely large amounts of data. |
+| PyTorch Geometric | Library for deep learning on irregular input data such as graphs, point clouds, and manifolds. |
+| bqplot | An API for plotting in Jupyter (IPython). |
+| bokeh | Interactive Web Plotting for Python. |
+| plotly | Collaborative web plotting for Python and matplotlib. |
+| altair | A Python to Vega translator for visualization. |
+| d3py | A plotting library for Python based on D3.js. |
+| PyDexter | Simple plotting for Python; wrapper for D3xterjs to render charts in-browser. |
+| ggplot | Same API as ggplot2 for R (Deprecated). |
+| ggfortify | Unified interface to ggplot2 popular R packages. |
+| Kartograph.py | Rendering beautiful SVG maps in Python. |
+| pygal | A Python SVG Charts Creator. |
+| PyQtGraph | A pure-python graphics and GUI library built on PyQt4 / PySide and NumPy. |
+
+### Machine Learning R
+> Machine learning using R.
+
+#### R General-Purpose Machine Learning
+
+| Resource Name | Description |
+| --- | --- |
+| Clever Algorithms For Machine Learning | Collection of machine learning algorithms implemented in various languages, including R. |
+| CORElearn | Package for classification, regression, feature evaluation, and ordinal evaluation. |
+| Cubist | Rule- and instance-based regression modeling. |
+| e1071 | Miscellaneous functions of the Department of Statistics (e1071), TU Wien. |
+| earth | Multivariate adaptive regression spline models. |
+| elasticnet | Elastic-net for sparse estimation and sparse PCA. |
+| ElemStatLearn | Data sets, functions, and examples from "The Elements of Statistical Learning". |
+| evtree | Evolutionary learning of globally optimal trees. |
+| forecast | Time series forecasting using various models including ARIMA, ETS, TBATS. |
+| forecastHybrid | Automatic ensemble and cross validation of time series models. |
+| fpc | Flexible procedures for clustering. |
+| frbs | Fuzzy rule-based systems for classification and regression tasks. [Deprecated] |
+| GAMBoost | Generalized linear and additive models by likelihood-based boosting. [Deprecated] |
+| gamboostLSS | Boosting methods for generalized additive models for location, scale, and shape. |
+| gbm | Generalized boosted regression models. |
+| glmnet | Lasso and elastic-net regularized generalized linear models. |
+| glmpath | L1 regularization path for generalized linear models and Cox proportional hazards model. |
+| GMMBoost | Likelihood-based boosting for generalized mixed models. [Deprecated] |
+| grplasso | Fitting user-specified models with group Lasso penalty. |
+| grpreg | Regularization paths for regression models with grouped covariates. |
+| h2o | Framework for fast, parallel, and distributed machine learning algorithms at scale. |
+| hda | Heteroscedastic discriminant analysis. [Deprecated] |
+| Introduction to Statistical Learning | Book covering statistical learning methods, useful for practical applications. |
+| ipred | Improved predictors for classification and regression tasks. |
+| kernlab | Kernel-based machine learning lab for support vector machines and kernel methods. |
+| klaR | Classification and visualization techniques. |
+| L0Learn | Fast algorithms for best subset selection in regression models. |
+
+#### Data Manipulation | Data Analysis | Data Visualization
+
+| Resource Name | Description |
+| --- | --- |
+| dplyr | A data manipulation package that helps solve common data manipulation problems. |
+| ggplot2 | A data visualization package based on the grammar of graphics. |
+| tmap and leaflet | tmap for visualizing geospatial data with static maps and leaflet for interactive maps. |
+| tm and quanteda | Main packages for managing, analyzing, and visualizing textual data. |
+| shiny | Basis for interactive displays and dashboards in R. |
+| htmlwidgets, including plotly, dygraphs, highcharter, etc. | Brings JavaScript libraries for interactive visualizations to R. |
+
+### Kaggle Competition Source Code
+> Kaggle Source code and experiments results.
+
+| Repository | Description |
+| --- | --- |
+| open-solution-home-credit | Source code and experiments results for Home Credit Default Risk competition. |
+| open-solution-googleai-object-detection | Source code and experiments results for Google AI Open Images - Object Detection Track competition. |
+| open-solution-salt-identification | Source code and experiments results for TGS Salt Identification Challenge. |
+| open-solution-ship-detection | Source code and experiments results for Airbus Ship Detection Challenge. |
+| open-solution-data-science-bowl-2018 | Source code and experiments results for 2018 Data Science Bowl. |
+| open-solution-value-prediction | Source code and experiments results for Santander Value Prediction Challenge. |
+| open-solution-toxic-comments | Source code for Toxic Comment Classification Challenge. |
+| wiki challenge | Implementation of Dell Zhang's solution to Wikipedia's Participation Challenge. |
+| kaggle insults | Kaggle Submission for "Detecting Insults in Social Commentary". |
+| kaggle_acquire-valued-shoppers-challenge | Code for the Kaggle acquire valued shoppers challenge. |
+| kaggle-cifar | Code for the CIFAR-10 competition at Kaggle using cuda-convnet. |
+| kaggle-blackbox | Deep learning made easy for Kaggle competitions. |
+| kaggle-accelerometer | Code for Accelerometer Biometric Competition at Kaggle. |
+| kaggle-advertised-salaries | Predicting job salaries from ads - a Kaggle competition. |
+| kaggle-amazon | Amazon access control challenge at Kaggle. |
+| kaggle-bestbuy_big | Code for the Best Buy competition at Kaggle. |
+| kaggle-bestbuy_small | Code for the Best Buy competition at Kaggle (small version). |
+| Kaggle Dogs vs. Cats | Code for Kaggle Dogs vs. Cats competition. |
+| Kaggle Galaxy Challenge | Winning solution for the Galaxy Challenge on Kaggle. |
+| Kaggle Gender | A Kaggle competition: discriminate gender based on handwriting. |
+| Kaggle Merck | Merck challenge at Kaggle. |
+| Kaggle Stackoverflow | Predicting closed questions on Stack Overflow. |
 ### Books
@@ -455,7 +941,29 @@ If later found out the points will be deducted. you cant be earning more than 60
 "Data Mining: Practical Machine Learning Tools and Techniques" provides a comprehensive overview of the field of data mining and machine learning. Authored by Ian H. Witten, Eibe Frank, and Mark A. Hall, this book is widely regarded as an essential resource for students, researchers, and practitioners in the field. free
-
+
+
+Distributed Machine Learning Patterns
+This book teaches you how to take machine learning models from your personal laptop to large distributed clusters. You’ll explore key concepts and patterns behind successful distributed machine learning systems, and learn technologies like TensorFlow, Kubernetes, Kubeflow, and Argo Workflows directly from a key maintainer and contributor, with real-world scenarios and hands-on projects. Paid
+
+
+
+Grokking Machine Learning
+Grokking Machine Learning teaches you how to apply ML to your projects using only standard Python code and high school-level math.
+Paid
+
+
+
+Machine Learning Bookcamp
+Learn the essentials of machine learning by completing a carefully designed set of real-world projects.
+Paid
+
+
+Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
+Through a recent series of breakthroughs, deep learning has boosted the entire field of machine learning. Now, even programmers who know close to nothing about this technology can use simple, efficient tools to implement programs capable of learning from data. This bestselling book uses concrete examples, minimal theory, and production-ready Python frameworks (Scikit-Learn, Keras, and TensorFlow) to help you gain an intuitive understanding of the concepts and tools for building intelligent systems.
+Paid
+
+
 ### Datasets