overview on deep-learning based few-shot learning
deep learning based few-shot learning
metric learning
advantages: spares costly computations; good performance
disadvantages: parameter updates only occur within the long time horizon of the outer training loop, preventing these methods from performing adaptation at test time (not adaptive)
Siamese network
$L^1$ distance
softmax loss
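A minimal sketch of the Siamese verification head, not the paper's exact architecture: the encoder, embedding size, and the pairwise binary loss shown here are assumptions; the key idea is a shared encoder and a learned weighting of the component-wise $L^1$ distance.

```python
import torch
import torch.nn as nn

class SiameseHead(nn.Module):
    def __init__(self, encoder, emb_dim=64):
        super().__init__()
        self.encoder = encoder              # shared feature extractor for both inputs
        self.score = nn.Linear(emb_dim, 1)  # weights each dimension of the L1 distance

    def forward(self, a, b):
        d = torch.abs(self.encoder(a) - self.encoder(b))  # component-wise L1 distance
        return self.score(d).squeeze(-1)                  # same/different logit

encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU())  # toy encoder (assumption)
model = SiameseHead(encoder)
a, b = torch.randn(4, 1, 28, 28), torch.randn(4, 1, 28, 28)
loss = nn.functional.binary_cross_entropy_with_logits(model(a, b), torch.ones(4))
```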
Matching network
feature extractor with memory (LSTM)
cosine distance
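A sketch of the Matching Networks attention classifier, assuming the embeddings already come out of the (possibly LSTM-contextualized) feature extractor; the full-context embedding itself is not shown. The query's label distribution is a cosine-similarity softmax over the support labels.

```python
import torch
import torch.nn.functional as F

def matching_predict(query, support, support_labels, n_way):
    # query: (q, d) and support: (s, d) embeddings
    sims = F.normalize(query, dim=-1) @ F.normalize(support, dim=-1).t()  # cosine similarities
    attn = torch.softmax(sims, dim=-1)                                    # attention over the support set
    one_hot = F.one_hot(support_labels, n_way).float()
    return attn @ one_hot                                                 # predicted label distribution

probs = matching_predict(torch.randn(3, 64), torch.randn(10, 64),
                         torch.arange(5).repeat_interleave(2), n_way=5)
```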
Prototypical network
prototype: mean of the samples of the same class
$L^2$ distance
softmax loss
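A minimal sketch of one Prototypical Network episode, assuming precomputed embeddings and toy shapes: prototypes are per-class means, logits are negative squared $L^2$ distances, and the loss is a softmax cross-entropy over the episode's classes.

```python
import torch
import torch.nn.functional as F

def prototypical_loss(support, support_labels, query, query_labels, n_way):
    # support: (n_way * n_shot, d) embeddings; query: (n_query, d) embeddings
    prototypes = torch.stack(
        [support[support_labels == c].mean(dim=0) for c in range(n_way)]
    )                                              # class prototypes = per-class mean embeddings
    logits = -torch.cdist(query, prototypes) ** 2  # negative squared L2 distance
    return F.cross_entropy(logits, query_labels)   # softmax loss over the n_way classes

support = torch.randn(10, 64)                          # 5-way 2-shot support embeddings (toy)
support_labels = torch.arange(5).repeat_interleave(2)
loss = prototypical_loss(support, support_labels,
                         torch.randn(15, 64), torch.randint(0, 5, (15,)), n_way=5)
```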
relation network
learn a metric space
GNN
learn a metric space
transductive inference
TADAM
task dependent: weight prediction
adaptive metric: distance scaling
Embedded Class Models and Shot-Free Meta Training
embedded class model: learnable prototype
shot-free: any number of ways and any number of shots
meta learning
advantages: domain independent
disadvantages: explicitly encode a particular strategy for the meta-learner to follow (namely, adaptation via gradient descent at test time); in a particular domain there may exist better strategies that exploit the structure of the task, but gradient-based methods will be unable to discover them
initialization
advantages: adaptive
disadvantages: backpropagation through gradient descent steps is costly in terms of memory, so the total number of steps must be kept small
MAML
inner and outer optimization
prone to overfitting
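A toy MAML-style sketch on a linear-regression task (the task, loss, and single inner step are assumptions for brevity): the inner loop adapts the initialization on the support set, and the outer loop backpropagates the query loss through that inner gradient step.

```python
import torch

def task_loss(params, batch):
    # toy regression loss; params is a list holding a single weight matrix (assumption)
    x, y = batch
    return ((x @ params[0] - y) ** 2).mean()

def inner_adapt(params, batch, lr_inner=0.01):
    grads = torch.autograd.grad(task_loss(params, batch), params, create_graph=True)
    return [p - lr_inner * g for p, g in zip(params, grads)]  # fast weights; graph kept for the outer step

w = torch.zeros(3, 1, requires_grad=True)          # meta-initialization
meta_opt = torch.optim.SGD([w], lr=0.1)

for _ in range(10):                                # outer loop over sampled tasks
    support = (torch.randn(8, 3), torch.randn(8, 1))
    query = (torch.randn(8, 3), torch.randn(8, 1))
    fast = inner_adapt([w], support)               # inner loop: adapt to the support set
    meta_opt.zero_grad()
    task_loss(fast, query).backward()              # backpropagate through the inner gradient step
    meta_opt.step()
```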
reptile
repeatedly samples a task, trains on it, and moves the initialization towards the trained weights on that task
can be interpreted as incorporating an L2 loss that updates the meta-model parameters towards the instance-specific adapted models
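A minimal Reptile sketch with an assumed toy regression task: train a copy of the meta-model on the sampled task for a few SGD steps, then interpolate the initialization a fraction of the way towards the adapted weights.

```python
import copy
import torch
import torch.nn as nn

meta_model = nn.Linear(3, 1)                       # the shared initialization
epsilon = 0.1                                      # meta step size (assumption)

for _ in range(100):                               # meta-iterations
    task_model = copy.deepcopy(meta_model)         # start the task from the current initialization
    opt = torch.optim.SGD(task_model.parameters(), lr=0.01)
    x, y = torch.randn(16, 3), torch.randn(16, 1)  # one sampled (toy) task
    for _ in range(5):                             # a few inner SGD steps on that task
        opt.zero_grad()
        nn.functional.mse_loss(task_model(x), y).backward()
        opt.step()
    with torch.no_grad():                          # move the initialization towards the adapted weights
        for p_meta, p_task in zip(meta_model.parameters(), task_model.parameters()):
            p_meta.add_(epsilon * (p_task - p_meta))
```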
ridge regression
optimization
LSTM-based meta-learner
fuses the SGD update into an LSTM cell-state update to produce the weights
needs fine-tuning on the target problem
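A simplified sketch of the LSTM-style weight update: the learner's weights play the role of the cell state, the candidate is the negative gradient, and the gates (here random placeholders rather than the meta-learner's outputs) act as a learned forget term and an adaptive learning rate.

```python
import torch

def lstm_style_update(theta, grad, f_gate, i_gate):
    # mirrors the LSTM cell-state update c_t = f_t * c_{t-1} + i_t * candidate,
    # with c_{t-1} = theta_{t-1} and candidate = -grad
    return f_gate * theta - i_gate * grad

theta = torch.randn(10)
grad = torch.randn(10)
f_gate = torch.sigmoid(torch.randn(10))   # forget gate (placeholder for the meta-learner's output)
i_gate = torch.sigmoid(torch.randn(10))   # input gate, i.e. an adaptive learning rate
theta_next = lstm_style_update(theta, grad, f_gate, i_gate)
```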
weight prediction
learnet
predict factorized weights of conv layers
meta network
aim: rapid learning and generalization
fast weights: predict part of the weights of the feature extractor and the base learner
slow weights: learn the weights of the meta learner and the remaining weights of the base learner
dynamic few-shot learning without forgetting
dynamic: does not forget previous classes
attention-based weight generator $G: (X, W_\text{base}, \phi) \mapsto w$
$X$: input features
$W_\text{base}$: base class weights
$\phi$: learnable weights
$w = \phi_\text{avg}\odot w_\text{avg} + \phi_\text{att}\odot w_\text{att}$
cosine distance: normalized fully-connected layer
softmax loss
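A sketch of the weight-generation step under assumed shapes; the query transform, the softmax attention form, and the scale factor are simplifications, not the paper's exact implementation. It illustrates the mixing $w = \phi_\text{avg}\odot w_\text{avg} + \phi_\text{att}\odot w_\text{att}$ and the cosine (normalized fully-connected) classifier.

```python
import torch
import torch.nn.functional as F

def generate_novel_weight(X, W_base, phi_avg, phi_att, phi_q):
    # X: (n_shot, d) features of the novel class; W_base: (n_base, d) base-class weights
    z = F.normalize(X, dim=-1)
    w_avg = z.mean(dim=0)                                                     # feature-averaging term
    att = torch.softmax(phi_q(z) @ F.normalize(W_base, dim=-1).t(), dim=-1)   # (n_shot, n_base)
    w_att = (att @ W_base).mean(dim=0)                                        # attention over base weights
    return phi_avg * w_avg + phi_att * w_att                                  # mixed novel-class weight

def cosine_logits(features, W, scale=10.0):
    # normalized fully-connected layer: scaled cosine similarity fed to the softmax loss
    return scale * F.normalize(features, dim=-1) @ F.normalize(W, dim=-1).t()

d, n_base = 64, 50
phi_avg, phi_att = torch.rand(d), torch.rand(d)          # learnable mixing vectors
phi_q = torch.nn.Linear(d, d, bias=False)                # learnable query transform (assumption)
w_novel = generate_novel_weight(torch.randn(5, d), torch.randn(n_base, d), phi_avg, phi_att, phi_q)
logits = cosine_logits(torch.randn(4, d), torch.cat([torch.randn(n_base, d), w_novel[None]]))
```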
Predicting Parameters from Activations
category-agnostic parameter predictor $\phi: \bar a_y \mapsto w_y$, a linear transformation
$\bar a_y$: mean of the activations of class $y$
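A small sketch of the predictor with assumed dimensions: a single shared linear map turns the mean activation of a class into that class's classifier weights.

```python
import torch
import torch.nn as nn

d_act = 512                                   # activation dimensionality (assumption)
phi = nn.Linear(d_act, d_act, bias=False)     # category-agnostic linear transformation phi

acts_y = torch.randn(5, d_act)                # activations of the few samples of a novel class y
a_bar_y = acts_y.mean(dim=0)                  # \bar a_y: mean activation of class y
w_y = phi(a_bar_y)                            # predicted classifier weights for class y
logit_y = torch.randn(d_act) @ w_y            # score a test activation against the predicted weights
```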
Latent Embedding Optimization
generate latent embeddings of the model parameters via variational inference and adapt them with gradient steps in the low-dimensional latent space
memory based
an RNN iterates over the examples of a given problem and accumulates the knowledge required to solve that problem in its hidden activations or an external memory
such methods face issues in ensuring that they reliably store all the potentially long-term historical information of relevance without forgetting
Memory-Augmented Neural Networks
the NTM stores the samples it has seen, takes $(x_t, y_{t-1})$ as input, and predicts $y_t$
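A sketch of the episode input format only (toy tensors, the NTM controller and memory are not shown): the label is offset by one time step, so the model must hold the (sample, label) binding in memory to predict $y_t$ when a class reappears.

```python
import torch
import torch.nn.functional as F

def offset_labels(x, y, n_classes):
    # x: (T, d) episode inputs, y: (T,) episode labels
    y_prev = torch.cat([torch.zeros(1, n_classes),            # no label is presented at the first step
                        F.one_hot(y[:-1], n_classes).float()])
    return torch.cat([x, y_prev], dim=-1)                      # input at step t is [x_t, y_{t-1}]

inputs = offset_labels(torch.randn(10, 20), torch.randint(0, 5, (10,)), n_classes=5)
```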
Learning to Remember Rare Events
key-value pairs as (activation, groundtruth)
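A simplified sketch of the key-value memory lookup (the update rule and learned embedding are omitted, and the nearest-neighbour read is an assumption kept to its simplest form): keys are normalized activations, values are ground-truth labels, and a query returns the label of its most similar key.

```python
import torch
import torch.nn.functional as F

keys = F.normalize(torch.randn(100, 64), dim=-1)   # stored keys: normalized activations
values = torch.randint(0, 10, (100,))              # stored values: ground-truth labels

def memory_read(query):
    sims = F.normalize(query, dim=-1) @ keys.t()   # cosine similarity to every stored key
    return values[sims.argmax(dim=-1)]             # return the label of the nearest key

pred = memory_read(torch.randn(4, 64))
```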
Memory Matching Networks
the Memory Matching Network writes the features of a set of labelled images (the support set) into memory and reads from the memory when performing inference, to holistically leverage the knowledge in the whole set
a Contextual Learner employs the memory slots in a sequential manner to predict the parameters of the CNNs for unlabelled images
transductive learning
TPN
probabilistic methods
hallucination
AGA
imaginary data
DAGAN
DADA
deep learning
large scale labelled dataset
training is time-consuming
laborious to collect data
insufficient extensibility
retrain to adapt to novel classes