overview on deep-learning based few-shot learning
deep learning based few-shot learning
metric learning
advantages: spares costly computations; good performance
disadvantages: parameter updates only occur within the long time horizon of the outer training loop, preventing these methods from performing adaptation at test time (not adaptive)
Siamese network
$L^1$ distance
softmax loss
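A minimal sketch of the Siamese verification head, not the paper's exact architecture: the encoder, embedding size, and the pairwise binary loss shown here are assumptions; the key idea is a shared encoder and a learned weighting of the component-wise $L^1$ distance.

```python
import torch
import torch.nn as nn

class SiameseHead(nn.Module):
    def __init__(self, encoder, emb_dim=64):
        super().__init__()
        self.encoder = encoder              # shared feature extractor for both inputs
        self.score = nn.Linear(emb_dim, 1)  # weights each dimension of the L1 distance

    def forward(self, a, b):
        d = torch.abs(self.encoder(a) - self.encoder(b))  # component-wise L1 distance
        return self.score(d).squeeze(-1)                  # same/different logit

encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU())  # toy encoder (assumption)
model = SiameseHead(encoder)
a, b = torch.randn(4, 1, 28, 28), torch.randn(4, 1, 28, 28)
loss = nn.functional.binary_cross_entropy_with_logits(model(a, b), torch.ones(4))
```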
Matching network
feature extractor with memory (LSTM)
cosine distance
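A sketch of the Matching Networks attention classifier, assuming the embeddings already come out of the (possibly LSTM-contextualized) feature extractor; the full-context embedding itself is not shown. The query's label distribution is a cosine-similarity softmax over the support labels.

```python
import torch
import torch.nn.functional as F

def matching_predict(query, support, support_labels, n_way):
    # query: (q, d) and support: (s, d) embeddings
    sims = F.normalize(query, dim=-1) @ F.normalize(support, dim=-1).t()  # cosine similarities
    attn = torch.softmax(sims, dim=-1)                                    # attention over the support set
    one_hot = F.one_hot(support_labels, n_way).float()
    return attn @ one_hot                                                 # predicted label distribution

probs = matching_predict(torch.randn(3, 64), torch.randn(10, 64),
                         torch.arange(5).repeat_interleave(2), n_way=5)
```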
Prototypical network
prototype: mean of the samples of the same class
$L^2$ distance
softmax loss
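A minimal sketch of one Prototypical Network episode, assuming precomputed embeddings and toy shapes: prototypes are per-class means, logits are negative squared $L^2$ distances, and the loss is a softmax cross-entropy over the episode's classes.

```python
import torch
import torch.nn.functional as F

def prototypical_loss(support, support_labels, query, query_labels, n_way):
    # support: (n_way * n_shot, d) embeddings; query: (n_query, d) embeddings
    prototypes = torch.stack(
        [support[support_labels == c].mean(dim=0) for c in range(n_way)]
    )                                              # class prototypes = per-class mean embeddings
    logits = -torch.cdist(query, prototypes) ** 2  # negative squared L2 distance
    return F.cross_entropy(logits, query_labels)   # softmax loss over the n_way classes

support = torch.randn(10, 64)                          # 5-way 2-shot support embeddings (toy)
support_labels = torch.arange(5).repeat_interleave(2)
loss = prototypical_loss(support, support_labels,
                         torch.randn(15, 64), torch.randint(0, 5, (15,)), n_way=5)
```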
relation network
learn a metric space
GNN
learn a metric space
transductive inference
TADAM
task dependent: weight prediction
adaptive metric: distance scaling
Embedded Class Models and Shot-Free Meta Training
embedded class model: learnable prototype
shot-free: any number of ways and any number of shots
meta learning
advantages: domain independent
disadvantages: explicitly encode a particular strategy for the meta-learner to follow (namely, adaptation via gradient descent at test time); in a particular domain there may exist better strategies that exploit the structure of the task, but gradient-based methods will be unable to discover them
initialization
advantages: adaptive
disadvantages: backpropagation through gradient descent steps is costly in terms of memory, so the total number of steps must be kept small
MAML
inner and outer optimization
prone to overfitting
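A toy MAML-style sketch on a linear-regression task (the task, loss, and single inner step are assumptions for brevity): the inner loop adapts the initialization on the support set, and the outer loop backpropagates the query loss through that inner gradient step.

```python
import torch

def task_loss(params, batch):
    # toy regression loss; params is a list holding a single weight matrix (assumption)
    x, y = batch
    return ((x @ params[0] - y) ** 2).mean()

def inner_adapt(params, batch, lr_inner=0.01):
    grads = torch.autograd.grad(task_loss(params, batch), params, create_graph=True)
    return [p - lr_inner * g for p, g in zip(params, grads)]  # fast weights; graph kept for the outer step

w = torch.zeros(3, 1, requires_grad=True)          # meta-initialization
meta_opt = torch.optim.SGD([w], lr=0.1)

for _ in range(10):                                # outer loop over sampled tasks
    support = (torch.randn(8, 3), torch.randn(8, 1))
    query = (torch.randn(8, 3), torch.randn(8, 1))
    fast = inner_adapt([w], support)               # inner loop: adapt to the support set
    meta_opt.zero_grad()
    task_loss(fast, query).backward()              # backpropagate through the inner gradient step
    meta_opt.step()
```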
reptile
repeatedly samples a task, trains on it, and moves the initialization towards the trained weights on that task
can be interpreted as incorporating an L2 loss that updates the meta-model parameters towards the instance-specific adapted models
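A minimal Reptile sketch with an assumed toy regression task: train a copy of the meta-model on the sampled task for a few SGD steps, then interpolate the initialization a fraction of the way towards the adapted weights.

```python
import copy
import torch
import torch.nn as nn

meta_model = nn.Linear(3, 1)                       # the shared initialization
epsilon = 0.1                                      # meta step size (assumption)

for _ in range(100):                               # meta-iterations
    task_model = copy.deepcopy(meta_model)         # start the task from the current initialization
    opt = torch.optim.SGD(task_model.parameters(), lr=0.01)
    x, y = torch.randn(16, 3), torch.randn(16, 1)  # one sampled (toy) task
    for _ in range(5):                             # a few inner SGD steps on that task
        opt.zero_grad()
        nn.functional.mse_loss(task_model(x), y).backward()
        opt.step()
    with torch.no_grad():                          # move the initialization towards the adapted weights
        for p_meta, p_task in zip(meta_model.parameters(), task_model.parameters()):
            p_meta.add_(epsilon * (p_task - p_meta))
```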
ridge regression
optimization
LSTM-based meta-learner
fuses the SGD update into an LSTM cell-state update to produce the weights
needs fine-tuning on the target problem
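A simplified sketch of the LSTM-style weight update: the learner's weights play the role of the cell state, the candidate is the negative gradient, and the gates (here random placeholders rather than the meta-learner's outputs) act as a learned forget term and an adaptive learning rate.

```python
import torch

def lstm_style_update(theta, grad, f_gate, i_gate):
    # mirrors the LSTM cell-state update c_t = f_t * c_{t-1} + i_t * candidate,
    # with c_{t-1} = theta_{t-1} and candidate = -grad
    return f_gate * theta - i_gate * grad

theta = torch.randn(10)
grad = torch.randn(10)
f_gate = torch.sigmoid(torch.randn(10))   # forget gate (placeholder for the meta-learner's output)
i_gate = torch.sigmoid(torch.randn(10))   # input gate, i.e. an adaptive learning rate
theta_next = lstm_style_update(theta, grad, f_gate, i_gate)
```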
weight prediction
learnet
predict factorized weights of conv layers
meta network
aim: rapid learning and generalization
fast weights: predict part of the weights of the feature extractor and the base learner
slow weights: learn the weights of the meta learner and the remaining weights of the base learner
dynamic few-shot learning without forgetting
dynamic: does not forget previous classes
attention-based weight generator $G: (X, W_\text{base}, \phi) \mapsto w$
$X$: input features
$W_\text{base}$: base class weights
$\phi$: learnable weights
$w = \phi_\text{avg}\odot w_\text{avg} + \phi_\text{att}\odot w_\text{att}$
cosine distance: normalized fully-connected layer
softmax loss
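A sketch of the weight-generation step under assumed shapes; the query transform, the softmax attention form, and the scale factor are simplifications, not the paper's exact implementation. It illustrates the mixing $w = \phi_\text{avg}\odot w_\text{avg} + \phi_\text{att}\odot w_\text{att}$ and the cosine (normalized fully-connected) classifier.

```python
import torch
import torch.nn.functional as F

def generate_novel_weight(X, W_base, phi_avg, phi_att, phi_q):
    # X: (n_shot, d) features of the novel class; W_base: (n_base, d) base-class weights
    z = F.normalize(X, dim=-1)
    w_avg = z.mean(dim=0)                                                     # feature-averaging term
    att = torch.softmax(phi_q(z) @ F.normalize(W_base, dim=-1).t(), dim=-1)   # (n_shot, n_base)
    w_att = (att @ W_base).mean(dim=0)                                        # attention over base weights
    return phi_avg * w_avg + phi_att * w_att                                  # mixed novel-class weight

def cosine_logits(features, W, scale=10.0):
    # normalized fully-connected layer: scaled cosine similarity fed to the softmax loss
    return scale * F.normalize(features, dim=-1) @ F.normalize(W, dim=-1).t()

d, n_base = 64, 50
phi_avg, phi_att = torch.rand(d), torch.rand(d)          # learnable mixing vectors
phi_q = torch.nn.Linear(d, d, bias=False)                # learnable query transform (assumption)
w_novel = generate_novel_weight(torch.randn(5, d), torch.randn(n_base, d), phi_avg, phi_att, phi_q)
logits = cosine_logits(torch.randn(4, d), torch.cat([torch.randn(n_base, d), w_novel[None]]))
```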
Predicting Parameters from Activations
category-agnostic parameter predictor $\phi: \bar a_y \mapsto w_y$, a linear transformation
$\bar a_y$: mean of the activations of class $y$
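A small sketch of the predictor with assumed dimensions: a single shared linear map turns the mean activation of a class into that class's classifier weights.

```python
import torch
import torch.nn as nn

d_act = 512                                   # activation dimensionality (assumption)
phi = nn.Linear(d_act, d_act, bias=False)     # category-agnostic linear transformation phi

acts_y = torch.randn(5, d_act)                # activations of the few samples of a novel class y
a_bar_y = acts_y.mean(dim=0)                  # \bar a_y: mean activation of class y
w_y = phi(a_bar_y)                            # predicted classifier weights for class y
logit_y = torch.randn(d_act) @ w_y            # score a test activation against the predicted weights
```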
Latent Embedding Optimization
generate latent embeddings of the model parameters via variational inference and adapt them with gradient steps in the low-dimensional latent space
memory based
an RNN iterates over the examples of a given problem and accumulates the knowledge required to solve that problem in its hidden activations or an external memory
such methods face issues in ensuring that they reliably store all the potentially long-term historical information of relevance without forgetting
Memory-Augmented Neural Networks
the NTM stores the samples it has seen, takes $(x_t, y_{t-1})$ as input, and predicts $y_t$
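A sketch of the episode input format only (toy tensors, the NTM controller and memory are not shown): the label is offset by one time step, so the model must hold the (sample, label) binding in memory to predict $y_t$ when a class reappears.

```python
import torch
import torch.nn.functional as F

def offset_labels(x, y, n_classes):
    # x: (T, d) episode inputs, y: (T,) episode labels
    y_prev = torch.cat([torch.zeros(1, n_classes),            # no label is presented at the first step
                        F.one_hot(y[:-1], n_classes).float()])
    return torch.cat([x, y_prev], dim=-1)                      # input at step t is [x_t, y_{t-1}]

inputs = offset_labels(torch.randn(10, 20), torch.randint(0, 5, (10,)), n_classes=5)
```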
Learning to Remember Rare Events
key-value pairs as (activation, groundtruth)
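A simplified sketch of the key-value memory lookup (the update rule and learned embedding are omitted, and the nearest-neighbour read is an assumption kept to its simplest form): keys are normalized activations, values are ground-truth labels, and a query returns the label of its most similar key.

```python
import torch
import torch.nn.functional as F

keys = F.normalize(torch.randn(100, 64), dim=-1)   # stored keys: normalized activations
values = torch.randint(0, 10, (100,))              # stored values: ground-truth labels

def memory_read(query):
    sims = F.normalize(query, dim=-1) @ keys.t()   # cosine similarity to every stored key
    return values[sims.argmax(dim=-1)]             # return the label of the nearest key

pred = memory_read(torch.randn(4, 64))
```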
Memory Matching Networks
the Memory Matching Network writes the features of a set of labelled images (the support set) into memory and reads from the memory when performing inference, to holistically leverage the knowledge in the whole set
a Contextual Learner employs the memory slots in a sequential manner to predict the parameters of the CNNs for unlabelled images
transductive learning
TPN
probabilistic methods
hallucination
AGA
imaginary data
DAGAN
DADA
deep learning
large scale labelled dataset
training is time-consuming
laborious to collect data
insufficient extensibility
retrain to adapt to novel classes