Classical Machine Learning
A Builder's Guide to Mastering Traditional Algorithms with scikit-learn
Contents
Preface
Part I – Foundations
Chapter 1: What Is Machine Learning?
1.1 Supervised vs Unsupervised Learning
1.2 Types of models (classification, regression, clustering)
1.3 Typical ML pipeline
1.4 Role of scikit-learn
Chapter 2: Anatomy of scikit-learn
2.1 How fit, predict, transform, and score work
2.2 Pipelines and cross-validation
2.3 Hyperparameters vs parameters
2.4 API consistency
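A minimal sketch of the estimator API that Chapter 2 describes, using Iris as a stand-in dataset (the chapter's own examples may differ):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Transformers: fit() learns statistics, transform() applies them.
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# Predictors: fit() learns parameters, predict() applies the model,
# and score() reports a default metric (accuracy for classifiers).
clf = LogisticRegression(max_iter=1000).fit(X_scaled, y)
print(clf.predict(X_scaled[:3]), clf.score(X_scaled, y))
```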
Part II – Core Algorithms (Supervised Learning)
Chapter 3: Dummy Classifiers – The Baseline
3.1 Math Intuition: No math; random or majority voting.
3.2 Code Walkthrough: Implement on Iris dataset; compare strategies.
3.3 Parameter Explanations: Strategy options (most_frequent, stratified).
3.4 Source Code Dissection of DummyClassifier.
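A taste of the Chapter 3 walkthrough: the two strategies named above, compared on Iris (a sketch; the chapter's code may differ):

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "most_frequent" always predicts the majority class; "stratified"
# samples labels according to the training-set class frequencies.
for strategy in ("most_frequent", "stratified"):
    baseline = DummyClassifier(strategy=strategy, random_state=0)
    baseline.fit(X_train, y_train)
    print(strategy, baseline.score(X_test, y_test))
```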
Chapter 4: Logistic & Linear Regression
4.1 Math Intuition + Geometry: Sigmoid function, log-odds, decision boundary.
4.2 Code Walkthrough: Binary/multi-class on Wine dataset.
4.3 Parameter Explanations: C (regularization), solvers, multi_class.
4.4 Model Tuning + Diagnostics: Grid search C; check coefficients for interpretability.
4.5 Source Code Dissection of LogisticRegression.
4.6 Math Intuition + Geometry: Least squares, hyperplanes; Ridge/Lasso penalties.
4.7 Code Walkthrough: Predict Boston Housing prices; compare OLS vs Ridge.
4.8 Parameter Explanations: Alpha for regularization, degree for polynomial.
4.9 Model Tuning + Diagnostics: Cross-validate alpha; plot residuals.
4.10 Source Code Dissection of LinearRegression.
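A compressed preview of both halves of Chapter 4. One caveat: the Boston Housing dataset was removed from scikit-learn in version 1.2, so this sketch substitutes the built-in California Housing data; C and alpha are the regularization knobs listed above:

```python
from sklearn.datasets import fetch_california_housing, load_wine
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Multi-class logistic regression on Wine; C is the inverse regularization strength.
X, y = load_wine(return_X_y=True)
logit = make_pipeline(StandardScaler(), LogisticRegression(C=1.0, max_iter=1000))
print(cross_val_score(logit, X, y, cv=5).mean())

# Ridge regression; alpha is the weight of the L2 penalty.
Xh, yh = fetch_california_housing(return_X_y=True)
ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
print(cross_val_score(ridge, Xh, yh, cv=5).mean())
```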
Chapter 5: K-Nearest Neighbors (KNN)
5.1 Math Intuition + Geometry: Distance metrics (Euclidean), voting in feature space.
5.2 Code Walkthrough: Classify on Iris dataset with varying k.
5.3 Parameter Explanations: n_neighbors, weights, metric.
5.4 Model Tuning + Diagnostics: Elbow plot for k; curse of dimensionality.
5.5 Source Code Dissection of KNeighborsClassifier.
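The heart of the Chapter 5 walkthrough in a few lines: sweep k on Iris and watch cross-validated accuracy (a sketch, not the chapter's exact code):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Sweep n_neighbors; the "elbow" in this curve guides the choice of k.
for k in (1, 3, 5, 11, 21):
    knn = KNeighborsClassifier(n_neighbors=k, weights="uniform", metric="euclidean")
    print(k, cross_val_score(knn, X, y, cv=5).mean())
```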
Chapter 6: Decision Trees
6.1 Math Intuition + Geometry: Entropy/Gini, recursive splitting.
6.2 Code Walkthrough: Build on HAR dataset; visualize tree.
6.3 Parameter Explanations: max_depth, min_samples_split, criterion.
6.4 Model Tuning + Diagnostics: Prune with CV; feature importance.
6.5 Source Code Dissection of DecisionTreeClassifier.
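A sketch of the Chapter 6 material. The HAR (Human Activity Recognition) dataset is not bundled with scikit-learn, so Wine stands in here; the parameters are the ones listed above:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_depth limits tree growth to curb overfitting; criterion selects
# the impurity measure (Gini vs. entropy) used for splits.
tree = DecisionTreeClassifier(max_depth=3, criterion="gini", random_state=0)
tree.fit(X_train, y_train)
print(tree.score(X_test, y_test))
print(export_text(tree, feature_names=load_wine().feature_names))
```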
Chapter 7: Support Vector Machines (SVM)
7.1 Math Intuition + Geometry: Margins, kernels, Lagrange multipliers.
7.2 Code Walkthrough: RBF SVM on HAR dataset with PCA.
7.3 Parameter Explanations: C, gamma, kernel types.
7.4 Model Tuning + Diagnostics: Grid search; plot decision boundaries.
7.5 Deep Dive: Advanced kernel math.
7.6 Source Code Dissection of SVC.
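The Chapter 7 pipeline in miniature, again with Wine standing in for HAR (which scikit-learn does not ship); C and gamma are tuned by grid search as in 7.4:

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Scale, project onto a few principal components, then fit an RBF SVM.
pipe = make_pipeline(StandardScaler(), PCA(n_components=5), SVC(kernel="rbf"))
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.1]}
grid = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
print(grid.best_params_, grid.best_score_)
```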
Chapter 8: Naive Bayes Classifiers
8.1 Math Intuition + Geometry: Bayes' theorem, conditional independence.
8.2 Code Walkthrough: Text classification on a simple dataset.
8.3 Parameter Explanations: Alpha (smoothing), priors.
8.4 Model Tuning + Diagnostics: Handle zero probabilities; compare variants.
8.5 Source Code Dissection of GaussianNB.
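Chapter 8's walkthrough in outline. The chapter dissects GaussianNB, but for word counts the multinomial variant is the usual choice; the corpus below is invented purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# A tiny invented corpus: 1 = spam, 0 = ham.
texts = ["free money now", "meeting at noon", "win a free prize", "lunch with the team"]
labels = [1, 0, 1, 0]

# alpha is the additive (Laplace/Lidstone) smoothing that prevents
# unseen words from producing zero probabilities.
model = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
model.fit(texts, labels)
print(model.predict(["free lunch prize"]))
```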
Chapter 9: Random Forests and Bagging
9.1 Math Intuition + Geometry: Bootstrap aggregating, ensemble voting.
9.2 Code Walkthrough: Random Forest on Wine dataset.
9.3 Parameter Explanations: n_estimators, max_features, bootstrap.
9.4 Model Tuning + Diagnostics: OOB score; feature importance.
9.5 Source Code Dissection of RandomForestClassifier.
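A sketch of the Chapter 9 walkthrough on Wine, including the out-of-bag score from 9.4:

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

X, y = load_wine(return_X_y=True)

# With bootstrap=True, each tree sees a resampled training set;
# oob_score=True reuses the left-out rows as a built-in validation set.
rf = RandomForestClassifier(n_estimators=300, max_features="sqrt",
                            bootstrap=True, oob_score=True, random_state=0)
rf.fit(X, y)
print("OOB accuracy:", rf.oob_score_)
print("most important feature index:", rf.feature_importances_.argmax())
```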
Chapter 10: Gradient Boosting (HistGradientBoostingClassifier)
10.1 Math Intuition + Geometry: Gradient descent on residuals, additive trees.
10.2 Code Walkthrough: Boost on HAR dataset.
10.3 Parameter Explanations: learning_rate, max_depth, early_stopping.
10.4 Model Tuning + Diagnostics: Monitor loss; avoid overfitting.
10.5 Deep Dive: XGBoost comparison.
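A Chapter 10 preview, with Wine standing in for HAR (not bundled with scikit-learn); the parameters are those listed in 10.3:

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# early_stopping=True holds out part of the training data and stops
# adding trees once the validation loss plateaus, guarding against overfitting.
hgb = HistGradientBoostingClassifier(learning_rate=0.1, early_stopping=True,
                                     random_state=0)
hgb.fit(X_train, y_train)
print(hgb.score(X_test, y_test), "boosting iterations:", hgb.n_iter_)
```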
Part III – Core Algorithms (Unsupervised Learning)
Chapter 11: K-Means Clustering
11.1 Math Intuition + Geometry: Centroids, within-cluster sum of squares.
11.2 Code Walkthrough: Cluster Iris dataset; elbow method for k.
11.3 Parameter Explanations: n_clusters, init, n_init.
11.4 Model Tuning + Diagnostics: Silhouette scores; visualize clusters.
11.5 Source Code Dissection of KMeans.
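Chapter 11's elbow-and-silhouette loop in brief (a sketch on Iris, matching 11.2 and 11.4):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

X, _ = load_iris(return_X_y=True)

# inertia_ (within-cluster sum of squares) drives the elbow method;
# the silhouette score measures how well-separated the clusters are.
for k in (2, 3, 4, 5):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0)
    labels = km.fit_predict(X)
    print(k, round(km.inertia_, 1), round(silhouette_score(X, labels), 3))
```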
Chapter 12: Hierarchical Clustering
12.1 Math Intuition + Geometry: Dendrograms, linkage methods.
12.2 Code Walkthrough: Agglomerative clustering on Wine dataset.
12.3 Parameter Explanations: linkage, affinity, n_clusters.
12.4 Model Tuning + Diagnostics: Cut dendrogram; compare linkages.
12.5 Source Code Dissection of AgglomerativeClustering.
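A Chapter 12 sketch. scikit-learn produces the flat clustering; the dendrogram itself usually comes from SciPy, which is assumed available here along with matplotlib:

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

X, _ = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Flat clusters from scikit-learn with Ward linkage...
labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)
print(labels[:10])

# ...and the full dendrogram via SciPy, for the "where to cut" discussion.
dendrogram(linkage(X, method="ward"))
plt.show()
```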
Chapter 13: DBSCAN and Density-Based Clustering
13.1 Math Intuition + Geometry: Core points, density reachability.
13.2 Code Walkthrough: Detect clusters in noisy data.
13.3 Parameter Explanations: eps, min_samples.
13.4 Model Tuning + Diagnostics: Handle noise; parameter sensitivity.
13.5 Source Code Dissection of DBSCAN.
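Chapter 13 in miniature, on synthetic noisy data (make_moons stands in for whatever noisy dataset the chapter uses):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Noisy two-moon data, a shape where density-based clustering shines.
X, _ = make_moons(n_samples=300, noise=0.08, random_state=0)

# eps is the neighborhood radius; min_samples is the density threshold
# a point needs to count as a core point. Label -1 marks noise.
db = DBSCAN(eps=0.2, min_samples=5).fit(X)
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
print("clusters:", n_clusters, "noise points:", np.sum(db.labels_ == -1))
```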
Part IV – Model Evaluation & Tuning
Chapter 14: Model Evaluation Metrics
14.1 Accuracy, precision, recall, F1
14.2 Confusion Matrix, ROC, PR Curves
14.3 When metrics disagree
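A sketch of the Chapter 14 theme: on imbalanced data, accuracy can look healthy while recall collapses (synthetic data for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
y_pred = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict(X_test)

# The four headline metrics, plus the confusion matrix behind them.
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("F1       :", f1_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```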
Chapter 15: Cross-Validation & StratifiedKFold
15.1 Why we need CV
15.2 KFold vs Stratified
15.3 cross_validate, GridSearchCV, and RandomizedSearchCV
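The tools from 15.2 and 15.3 in one compact sketch; Iris and LogisticRegression are stand-ins:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate

X, y = load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# cross_validate returns per-fold scores (and fit/score times) as a dict.
scores = cross_validate(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(scores["test_score"].mean())

# GridSearchCV wraps the same CV loop around an exhaustive parameter search.
grid = GridSearchCV(LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 1, 10]}, cv=cv)
grid.fit(X, y)
print(grid.best_params_)
```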
Chapter 16: Hyperparameter Tuning
16.1 Grid search vs random search
16.2 Search space design
16.3 Practical examples with SVM and RF
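Grid vs. random search side by side, as in 16.1 (SVC on Wine is an assumed example; loguniform comes from SciPy):

```python
from scipy.stats import loguniform
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
pipe = make_pipeline(StandardScaler(), SVC())

# Grid search enumerates every combination; random search samples from
# distributions, which scales better as the search space grows.
grid = GridSearchCV(pipe, {"svc__C": [0.1, 1, 10]}, cv=5).fit(X, y)
rand = RandomizedSearchCV(pipe, {"svc__C": loguniform(1e-2, 1e2)},
                          n_iter=10, cv=5, random_state=0).fit(X, y)
print(grid.best_params_, rand.best_params_)
```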
Chapter 17: Probability Calibration
17.1 Why predicted probabilities can lie
17.2 Platt scaling (sigmoid), isotonic regression
17.3 CalibratedClassifierCV explained
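A minimal calibration sketch for Chapter 17: LinearSVC has no predict_proba, and CalibratedClassifierCV adds one via Platt scaling (synthetic data for illustration):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# method="sigmoid" is Platt scaling; method="isotonic" is the
# non-parametric alternative covered in 17.2.
calibrated = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=5)
calibrated.fit(X_train, y_train)
print(calibrated.predict_proba(X_test[:3]))
```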
Chapter 18: Choosing Decision Thresholds
18.1 Predicting probabilities vs predicting classes
18.2 Optimizing for F1, cost-sensitive thresholds
18.3 Manual threshold tuning with plots
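The manual threshold sweep of 18.3, sketched on synthetic imbalanced data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
proba = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)[:, 1]

# Sweep thresholds instead of accepting the default 0.5 cut-off.
for t in np.arange(0.1, 0.9, 0.1):
    y_pred = (proba >= t).astype(int)
    print(round(t, 1), round(f1_score(y_test, y_pred), 3))
```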
Part V – Data Engineering & Preprocessing
Chapter 19: Feature Scaling and Transformation
19.1 StandardScaler, MinMaxScaler
19.2 When to scale and why
19.3 Scaling inside pipelines
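Why 19.3 matters, in a few lines: a scaler inside the pipeline is re-fit on each CV training fold, so no statistics leak from the validation fold (Wine and SVC are stand-ins):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# The scaler is part of the pipeline, so cross_val_score re-fits it on
# every training fold rather than on the full dataset up front.
pipe = make_pipeline(StandardScaler(), SVC())
print(cross_val_score(pipe, X, y, cv=5).mean())
```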
Chapter 20: Dimensionality Reduction
20.1 PCA: Math and scikit-learn usage
20.2 Using PCA with pipelines
20.3 Visualization
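A PCA-for-visualization sketch covering 20.1 and 20.3 (Wine as a stand-in dataset):

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_wine(return_X_y=True)

# Scale first: PCA directions are driven by variance, so unscaled
# features with large ranges would dominate the components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(StandardScaler().fit_transform(X))
print(X_2d.shape, pca.explained_variance_ratio_)  # ready for a 2-D scatter plot
```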
Chapter 21: Dealing with Imbalanced Datasets
21.1 What is imbalance?
21.2 SMOTE and oversampling
21.3 Class weights vs resampling
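One caveat for Chapter 21: SMOTE lives in the separate imbalanced-learn package, not in scikit-learn itself. The class_weight alternative from 21.3 sketches like this (synthetic data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" reweights errors inversely to class frequency,
# an alternative to resampling the training data.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```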
Part VI – Advanced Topics
Chapter 22: Pipelines and Workflows
22.1 Building maintainable ML pipelines
22.2 Pipeline, ColumnTransformer, and custom steps
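The two building blocks of 22.2 combined, on a toy mixed-type frame (column names invented for illustration):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# A toy mixed-type frame: one numeric column, one categorical column.
df = pd.DataFrame({"age": [25, 32, 47, 51], "city": ["NY", "SF", "NY", "LA"]})
y = [0, 1, 0, 1]

# ColumnTransformer routes each column to the right preprocessing step.
pre = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
model = Pipeline([("pre", pre), ("clf", LogisticRegression())])
model.fit(df, y)
print(model.predict(df))
```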
Chapter 23: Under the Hood of scikit-learn
23.1 How fit is structured
23.2 Estimator base classes
23.3 Digging into the source
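A toy custom estimator in the style Chapter 23 examines; the conventions it follows (learned attributes take a trailing underscore, fit() returns self) are scikit-learn's, while the class itself is invented for illustration:

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

class MajorityClassifier(BaseEstimator, ClassifierMixin):
    """Always predicts the most frequent training label."""

    def fit(self, X, y):
        values, counts = np.unique(y, return_counts=True)
        self.majority_ = values[np.argmax(counts)]  # trailing underscore: learned
        return self  # fit() returns self, enabling method chaining

    def predict(self, X):
        return np.full(len(X), self.majority_)

clf = MajorityClassifier().fit([[0], [1], [2]], [0, 1, 1])
print(clf.predict([[5], [6]]))  # [1 1]
```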
Appendices & Templates
A. Glossary of ML terms
B. scikit-learn cheat sheet
C. Tips for debugging models
D. Further reading and learning roadmap