Mind Map Community: Machine Learning Course Summary
Mainly theory, covering optimization, generalization, supervised learning, unsupervised learning, and related topics.
Edited 2022-01-05 19:58:19
Machine Learning
Supervised Learning
Framework 1, 2
Training set, test set, validation set, loss function
Empirical loss, population loss
Optimization vs generalization
Memorization function
Cross validation
Overfit vs underfit, classical view vs modern view
Unsupervised learning
Semi-supervised learning
notice manifold assumption
Linear Method 6
Perceptron algorithm
Limitations and theoretical guarantees
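A minimal sketch of the perceptron update (assuming labels in {-1, +1}; on linearly separable data the mistake bound is roughly (R/γ)^2):

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Classic perceptron: update whenever a point is misclassified.
    X: (n, d) features, y: labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified (or on the boundary)
                w += yi * xi             # rotate the hyperplane toward the point
                b += yi
                mistakes += 1
        if mistakes == 0:                # found a separating hyperplane
            break
    return w, b
```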
Logistic regression
Cross entropy for probabilities
Motivation and intuitions
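For reference, the cross-entropy (negative log-likelihood) loss that logistic regression minimizes, with σ the sigmoid (standard formula):

```latex
\hat{p}_i = \sigma(w^{\top} x_i + b) = \frac{1}{1 + e^{-(w^{\top} x_i + b)}},
\qquad
L(w, b) = -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i \log \hat{p}_i + (1 - y_i)\log(1 - \hat{p}_i)\Big],
\quad y_i \in \{0, 1\}.
```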
Ridge Lasso SVM 9 10
Ridge regression: definition, intuition
Lasso regression: definition, intuition
The ℓ_1 'ball' is a diamond, so intersections with the loss contours tend to occur at its corners, i.e., sparse solutions
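A small sketch contrasting the two penalties (scikit-learn is used for convenience; the synthetic data are purely illustrative). The ℓ_1 penalty typically zeroes out coefficients, matching the diamond intuition above:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
true_w = np.zeros(20)
true_w[:3] = [3.0, -2.0, 1.5]                  # only 3 informative features
y = X @ true_w + 0.1 * rng.normal(size=100)

ridge = Ridge(alpha=1.0).fit(X, y)             # L2 penalty: shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)             # L1 penalty: many coefficients exactly 0

print("ridge non-zeros:", np.sum(np.abs(ridge.coef_) > 1e-6))
print("lasso non-zeros:", np.sum(np.abs(lasso.coef_) > 1e-6))
```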
Compressed sensing 7/8/9
Compare with Lasso
RIP condition
Theory analysis for compressed sensing
Compressed sensing in non-linear setting
understand the result
SVM 9/10
Hard margin
Soft margin
Understand how to use kernels, and why we use them
What is kernel trick?
What does Mercer’s theorem mean?
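A sketch of the kernel trick: never build the feature map φ explicitly, only the Gram matrix K[i, j] = ⟨φ(x_i), φ(x_j)⟩. Here an RBF kernel is fed to a soft-margin SVM via a precomputed kernel (scikit-learn is a convenience assumption):

```python
import numpy as np
from sklearn.svm import SVC

def rbf_kernel(A, B, gamma=0.5):
    """K[i, j] = exp(-gamma * ||A_i - B_j||^2): an inner product in an
    implicit (infinite-dimensional) feature space -- the kernel trick."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = np.where(np.linalg.norm(X, axis=1) > 1.0, 1, -1)   # not linearly separable

clf = SVC(kernel="precomputed", C=1.0).fit(rbf_kernel(X, X), y)
print("train accuracy:", (clf.predict(rbf_kernel(X, X)) == y).mean())
```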
Decision Tree, Random Forest, Boosting 17/18/19/20
Decision tree and Boolean function analysis
Understand why a decision tree can be converted into a sparse, low-degree polynomial
KM, LMN, Compressed sensing
Gini index, and how to use it to fit a decision tree
Random forest algorithm
Adaboost algorithm
boosting framework
Training error proof
generalization proof
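A minimal AdaBoost sketch with decision stumps as weak learners, following the standard reweighting rule (the scikit-learn stumps are a convenience assumption, not the course's code):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, T=50):
    """AdaBoost; y must be in {-1, +1}. Returns a prediction function."""
    n = len(y)
    D = np.full(n, 1.0 / n)                     # example weights
    stumps, alphas = [], []
    for _ in range(T):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = np.clip(D[pred != y].sum(), 1e-10, 1 - 1e-10)  # weighted error
        alpha = 0.5 * np.log((1 - eps) / eps)   # weak learner's vote
        D *= np.exp(-alpha * y * pred)          # up-weight the mistakes
        D /= D.sum()
        stumps.append(h)
        alphas.append(alpha)
    return lambda Xq: np.sign(sum(a * h.predict(Xq) for a, h in zip(alphas, stumps)))
```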
Gradient boosting interpretation
XGBoost
Unsupervised Learning
Framework 2
PCA 15
Definitions
computing the important directions of the data set
Different interpretations of PCA
notice we need to center the data points
Power method
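A sketch of the power method for the top principal direction (note the centering step, echoing the point above); it assumes a spectral gap so the iteration converges:

```python
import numpy as np

def top_principal_direction(X, n_iter=200, seed=0):
    """Power iteration on the sample covariance of X (rows are data points)."""
    Xc = X - X.mean(axis=0)                 # center the data first
    C = Xc.T @ Xc / len(Xc)                 # d x d sample covariance
    rng = np.random.default_rng(seed)
    v = rng.normal(size=C.shape[1])
    for _ in range(n_iter):
        v = C @ v                           # apply C ...
        v /= np.linalg.norm(v)              # ... and renormalize
    return v                                # ~ top eigenvector of C
```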
Nearest Neighbor 16/17
Problem formulation
NN -- RNN -- Randomized
Locality sensitive hashing
Understand the definition, proof and intuition
How to construct an LSH family for ℓ_2
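A sketch of one standard ℓ_2 LSH construction (a random Gaussian projection quantized into width-w buckets); the parameter values are illustrative assumptions, and in practice k such functions are concatenated and several independent tables are used:

```python
import numpy as np

class L2LSH:
    """h(x) = floor((a . x + b) / w) with a ~ N(0, I), b ~ Uniform[0, w).
    Nearby points (in l2) collide with higher probability than far points."""
    def __init__(self, dim, w=4.0, k=8, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.normal(size=(k, dim))      # k random directions
        self.b = rng.uniform(0.0, w, size=k)    # random offsets
        self.w = w

    def hash(self, x):
        return tuple(np.floor((self.A @ x + self.b) / self.w).astype(int))
```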
Metric learning
NCA
LMNN
K-means/Spectral graph cluster 24/25
K-means
How to construct a graph, and how it generalizes standard clustering
Graph Laplacian
Theorem: the number of connected components equals the multiplicity of eigenvalue 0 of L
Spectral graph clustering algorithm
notice power method
The relationship between the algorithm and ratio cut problem
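A compact sketch of unnormalized spectral clustering: build a similarity graph, form L = D - W, embed the points with the bottom eigenvectors (eigenvalue 0 has multiplicity equal to the number of connected components), then run k-means. The k-NN graph and scikit-learn k-means are convenience assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import kneighbors_graph

def spectral_clustering(X, k, n_neighbors=10):
    W = kneighbors_graph(X, n_neighbors, mode="connectivity").toarray()
    W = np.maximum(W, W.T)                      # symmetric similarity graph
    L = np.diag(W.sum(axis=1)) - W              # unnormalized Laplacian L = D - W
    _, eigvecs = np.linalg.eigh(L)
    embedding = eigvecs[:, :k]                  # bottom-k eigenvectors as coordinates
    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)
```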
t-SNE 26
SNE algorithm
Crowding problem
keep the relative ordering instead of exact distances, and allow points to shift to longer distances in the embedding
Why pick the Student's t-distribution?
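For reference, the similarity definitions behind SNE/t-SNE (standard formulas): Gaussian kernels in the original space, a heavy-tailed Student t with one degree of freedom in the embedding, which is exactly what relieves the crowding problem:

```latex
p_{j\mid i} = \frac{\exp\!\big(-\|x_i - x_j\|^2 / 2\sigma_i^2\big)}
                   {\sum_{k \ne i}\exp\!\big(-\|x_i - x_k\|^2 / 2\sigma_i^2\big)},
\qquad
p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2n},
\qquad
q_{ij} = \frac{\big(1 + \|y_i - y_j\|^2\big)^{-1}}
              {\sum_{k \ne l}\big(1 + \|y_k - y_l\|^2\big)^{-1}},
```

and the embedding coordinates y_i are chosen to minimize KL(P || Q).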
Optimization Theory
GD, SGD 3,4
Zeroth order, first order, second order methods
Smoothness, convexity, strong convexity
Hessian matrix, eigenvalue
Convergence analysis of GD on convex and smooth functions
Telescoping step
Limitation of GD, why use SGD
Batch size, convergence analysis of SGD
Optimization for training vs. generalization: is reaching the optimum necessary?
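A sketch of the telescoping argument for GD with step size η = 1/L on an L-smooth convex f (the standard O(1/T) rate, reproduced from memory):

```latex
f(x_{t+1}) \le f(x_t) - \tfrac{1}{2L}\,\|\nabla f(x_t)\|^2
\quad\text{(descent lemma)},
\qquad
f(x_{t+1}) - f(x^*) \le \tfrac{L}{2}\Big(\|x_t - x^*\|^2 - \|x_{t+1} - x^*\|^2\Big)
\quad\text{(convexity + update rule)}.
```

Summing the second inequality over t = 0, ..., T-1 telescopes, and since the function values are non-increasing,

```latex
f(x_T) - f(x^*) \le \frac{1}{T}\sum_{t=0}^{T-1}\big(f(x_{t+1}) - f(x^*)\big)
               \le \frac{L\,\|x_0 - x^*\|^2}{2T}.
```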
SVRG 5
SVRG analysis
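For reference, the SVRG gradient estimator (a sketch of the standard update): keep a snapshot per epoch together with its full gradient, and use it to correct each stochastic gradient:

```latex
\tilde{\mu} = \nabla F(\tilde{w}) = \frac{1}{n}\sum_{i=1}^{n}\nabla f_i(\tilde{w}),
\qquad
v_t = \nabla f_{i_t}(w_t) - \nabla f_{i_t}(\tilde{w}) + \tilde{\mu},
\qquad
w_{t+1} = w_t - \eta\, v_t .
```

The estimator v_t is unbiased, and its variance shrinks as w_t and the snapshot approach the optimum, which is what removes the need for a decaying step size.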
Two layers NN 14/15
the relationship between the optimization proof for neural networks and the classical kernel method
Definition of Z matrix and H matrix
What’s the intuition
notice whether a_i is considered
Updating rule of f(x)-y
The implication and limitation of this type of analysis
Why H(t)≈H(0)
Convergence theorem
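One common form of the H matrix, for a two-layer ReLU network f(x) = (1/√m) Σ_r a_r σ(w_r^⊤ x) with the a_r fixed to ±1 (a sketch from memory; the course's exact notation may differ):

```latex
H_{ij}(t) = \frac{1}{m}\, x_i^{\top} x_j \sum_{r=1}^{m}
  \mathbb{1}\{w_r(t)^{\top} x_i \ge 0\}\,\mathbb{1}\{w_r(t)^{\top} x_j \ge 0\},
\qquad
\frac{d}{dt}\big(f(x_i) - y_i\big) = -\sum_{j} H_{ij}(t)\,\big(f(x_j) - y_j\big).
```

When the width m is large, H(t) stays close to H(0), so the residual f(x) - y decays roughly like a linear (kernel) system.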
Matrix completion, non-convex optimization 23 24
Non-convex analysis for GD
Matrix completion
Problem formulation
collaborative filtering
Low rank
Known entries are uniformly distributed
Otherwise, we may find adversarial examples, e.g., some columns/rows are empty
Incoherence
How to solve?
minimizing rank(A) is hard -- it amounts to minimizing the number of non-zero singular values
Alternating minimization
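A sketch of alternating minimization: factor M ≈ U Vᵀ and alternately solve ridge-regularized least squares for U and for V over the observed entries (dense and purely illustrative):

```python
import numpy as np

def altmin_complete(M, mask, rank, n_rounds=20, reg=1e-3):
    """mask[i, j] = True where M[i, j] is observed; returns a low-rank estimate."""
    n, m = M.shape
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(n, rank))
    V = rng.normal(scale=0.1, size=(m, rank))
    for _ in range(n_rounds):
        for i in range(n):                          # fix V, solve for each row of U
            A = V[mask[i]]
            U[i] = np.linalg.solve(A.T @ A + reg * np.eye(rank), A.T @ M[i, mask[i]])
        for j in range(m):                          # fix U, solve for each row of V
            A = U[mask[:, j]]
            V[j] = np.linalg.solve(A.T @ A + reg * np.eye(rank), A.T @ M[mask[:, j], j])
    return U @ V.T
```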
Escape saddle points
Main theorem, basic assumptions
assume all local min are equally good
Not required: how to prove the theorem
Escape local min: not required
Generalization Theory
No Free Lunch 10
Understand the related proofs
PAC Learning 11
ERM algorithm
Bayes optimal
Agnostic PAC learning
VC Dimension 11,12
Only need to understand the definition, and how to use it
Rademacher Complexity 12 13 14
definitions
Understand the proofs, and simple example
Use basic lemmas to upper bound Rademacher complexity
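The empirical Rademacher complexity of a class F on a sample S = (x_1, ..., x_n), for reference (standard definition; a typical bound then says that, with high probability, population loss ≤ empirical loss + 2 × Rademacher complexity + a O(√(log(1/δ)/n)) term):

```latex
\widehat{\mathfrak{R}}_S(\mathcal{F})
  = \mathbb{E}_{\sigma}\left[\sup_{f \in \mathcal{F}}
      \frac{1}{n}\sum_{i=1}^{n}\sigma_i f(x_i)\right],
\qquad \sigma_1,\dots,\sigma_n \ \text{i.i.d. uniform on } \{-1,+1\}.
```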
Other Topics
History of AI 1
Turing test
AI winters and AI booms
Reasons and lessons
Dataset Construction 2
ReCAPTCHA
Robust ML 20 21
Adv attack
Optimization formulation
FGSM
PGD algorithm
Adversarial training
Danskin’s theorem, what’s the implication?
Pick the maximizer of the inner loop for every iteration
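The formulations referenced above, written out as a sketch under an ℓ_∞ threat model:

```latex
\min_{\theta}\ \mathbb{E}_{(x,y)}\Big[\max_{\|\delta\|_\infty \le \epsilon}
    L(\theta, x + \delta, y)\Big]
\quad\text{(robust optimization / adversarial training)},
\qquad
x' = x + \epsilon \cdot \operatorname{sign}\big(\nabla_x L(\theta, x, y)\big)
\quad\text{(FGSM)},
\qquad
x^{(t+1)} = \Pi_{\|x' - x\|_\infty \le \epsilon}\Big(x^{(t)}
    + \alpha \cdot \operatorname{sign}\big(\nabla_x L(\theta, x^{(t)}, y)\big)\Big)
\quad\text{(PGD)}.
```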
Robust feature vs non-robust feature
Robust feature dataset
Cracking “robust” models
Commonly used techniques for creating falsely robust models
standard techniques for cracking them
Randomized smoothing technique and its proof (for ℓ_2, Gaussian)
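For reference, a sketch of the randomized smoothing certificate for ℓ_2 with Gaussian noise (stated from memory): the smoothed classifier takes a noisy majority vote, and the vote margin converts into a certified radius via the Gaussian CDF Φ:

```latex
g(x) = \arg\max_{c}\ \Pr_{\delta \sim \mathcal{N}(0,\sigma^2 I)}\big[f(x+\delta) = c\big];
\qquad
\text{if } \Pr[f(x+\delta)=A] \ge p_A \ \ge\ p_B \ge \max_{c \ne A}\Pr[f(x+\delta)=c],
```

then g returns A on the whole ℓ_2 ball of radius R = (σ/2)(Φ^{-1}(p_A) - Φ^{-1}(p_B)) around x.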
Hyperparameters 21 22 23
Bayesian optimization (understand the procedure)
Gradient descent
How to store a long chain of backpropagation
Using SGD with momentum term
Random search
Best arm identification analysis
Guarantee for SH
apply SH (successive halving) in hyperparameter tuning
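A sketch of successive halving for hyperparameter tuning: run all configurations on a small budget, keep the better half, double the per-configuration budget, and repeat; `train_and_eval` is a placeholder for whatever training/validation routine is available:

```python
def successive_halving(configs, train_and_eval, min_budget=1):
    """configs: list of hyperparameter settings (arms).
    train_and_eval(config, budget) -> validation loss (lower is better)."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: train_and_eval(c, budget))
        survivors = ranked[: max(1, len(survivors) // 2)]   # keep the better half
        budget *= 2                                          # more budget per survivor
    return survivors[0]
```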
Main idea of ProxylessNAS
How to automatically tune the depth?
What was the major challenge that this algorithm solved?
DP 26/27
Definition
Why is randomness essential?
If not: assume we have a non-trivial deterministic algorithm, and suppose there are two databases A and B on which the algorithm gives different outputs for some query. We can change one row at a time to get from A to B, so at some point there is a pair of databases differing in only a single row whose outputs differ. An adversary who knows the database is one of these two almost-identical databases then learns the value of that row.
Immunity to post-processing
Promise of DP mechanisms
Utility does not get much worse
Laplace mechanism, theoretical guarantees.
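A sketch of the Laplace mechanism: to release f(D) with ε-differential privacy, add Laplace noise with scale Δf/ε, where Δf is the ℓ_1 sensitivity of f; the counting query below is just an illustration:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value + Lap(sensitivity / epsilon); this is epsilon-DP for a
    query whose l1 sensitivity (max change from one differing row) is `sensitivity`."""
    rng = rng or np.random.default_rng()
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# A counting query changes by at most 1 when a single row changes.
rows = [1, 0, 1, 1, 0, 1]
noisy_count = laplace_mechanism(sum(rows), sensitivity=1.0, epsilon=0.5)
```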
Interpretability 27
LIME algorithm
Limitations
Why do we need a baseline?
Fundamental axioms for attribution
IntegratedGrads algorithm
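The Integrated Gradients attribution for coordinate i relative to a baseline x' (standard formula; the completeness identity below is one of the axioms it satisfies):

```latex
\mathrm{IG}_i(x) = (x_i - x'_i)\int_{0}^{1}
    \frac{\partial F\big(x' + \alpha (x - x')\big)}{\partial x_i}\, d\alpha,
\qquad
\sum_i \mathrm{IG}_i(x) = F(x) - F(x').
```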
ML-assisted algorithm design 27
Learning augmented algorithm
When shall we use learning augmented algorithms?
Three types of learning augmented algorithms
Exponential search, why is it better than binary search?
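A sketch of exponential (doubling) search around a predicted index in a sorted array: with a good prediction it costs about O(log |error|) comparisons instead of O(log n); all names here are illustrative:

```python
import bisect

def search_with_prediction(arr, target, predicted_idx):
    """Grow a window geometrically around predicted_idx until it brackets the
    target, then binary search inside the window."""
    n = len(arr)
    lo = hi = max(0, min(predicted_idx, n - 1))
    step = 1
    while lo > 0 and arr[lo] > target:       # expand left
        lo = max(0, lo - step)
        step *= 2
    step = 1
    while hi < n - 1 and arr[hi] < target:   # expand right
        hi = min(n - 1, hi + step)
        step *= 2
    i = bisect.bisect_left(arr, target, lo, hi + 1)
    return i if i < n and arr[i] == target else -1
```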
Neural architectures
Stochastic depth
Resnet vs densenet