Reading List: 4
General Deep Learning
Why deep networks?
Uses an RBM model (shows that in some settings deep networks are not necessarily better, possibly because of overfitting): Hugo Larochelle and Yoshua Bengio. Classification using discriminative restricted Boltzmann machines. In Andrew McCallum and Sam Roweis, editors, Proceedings of the 25th Annual International Conference on Machine Learning (ICML 2008), pages 536–543. Omnipress, 2008.
Is the following the paper showing that deep networks learn complex data better? How is that analyzed? Hugo Larochelle, Dumitru Erhan, Aaron Courville, James Bergstra, and Yoshua Bengio. An empirical evaluation of deep architectures on problems with many factors of variation. In Zoubin Ghahramani, editor, Twenty-fourth International Conference on Machine Learning (ICML 2007), pages 473–480. Omnipress, 2007. URL http://www.machinelearning.org/proceedings/icml2007/papers/331.pdf
Training strategies [0/7]
- ☐ The curse of highly variable functions for local kernel machines. Yoshua Bengio, Olivier Delalleau, and Nicolas Le Roux.
- ☐ Scaling learning algorithms towards AI.
- ☐ Greedy Layer-Wise Training of Deep Networks (this one is from 2006)
- ☐ Exploring Strategies for Training Deep Neural Networks
A 2009 paper from Yoshua Bengio's group; can it be read as a survey of the training approaches of the time?
Hinton et al. recently proposed a greedy layer-wise unsupervised learning procedure relying on the training algorithm of restricted Boltzmann machines (RBM) to initialize the parameters of a deep belief network (DBN), a generative model with many layers of hidden causal variables.
Before this, training multi-layer networks was a hard problem, to the point that in practice only networks of about two layers could be trained; the motivation for going deeper, however, comes from the complexity theory of circuits.
(Salakhutdinov and Murray, 2008; Larochelle and Bengio, 2008) show that deep architectures are not necessarily better than shallow kernel models, but (Larochelle et al., 2007) shows that "there has been evidence of a benefit when the task is complex enough, and there is enough data to capture that complexity".
Here the authors spell out what they consider a good representation: Each layer in a multi-layer neural network can be seen as a representation of the input obtained through a learned transformation. What makes a good internal representation of the data? We believe that it should disentangle the factors of variation that inherently explain the structure of the distribution
- When such a representation is going to be used for unsupervised learning, we would like it to preserve information about the input while being easier to model than the input itself. ("Easier to model" means that, after the internal representation, the data has a simpler structure and clearer patterns; the complexity and noise of the raw input are reduced, so downstream models can more easily capture the essential features of the data.)
- When a representation is going to be used in a supervised prediction or classification task, we would like it to be such that there exists a “simple” (i.e., somehow easy to learn) mapping from the representation to a good prediction
(Fahlman and Lebiere, 1990; Lengellé and Denoeux, 1996) build this kind of internal representation with supervised learning. However, as we discuss here, the use of a supervised criterion at each stage may be too greedy and does not yield as good generalization as using an unsupervised criterion. Note the wording "too greedy": greedy here means committing too early to predicting a particular target and thereby throwing away part of the input's features:
Aspects of the input may be ignored in a representation tuned to be immediately useful (with a linear classifier) but these aspects might turn out to be important when more layers are available.
Combining unsupervised (e.g., learning about p(x)) and supervised components (e.g., learning about p(y|x)) can be helpful when both functions p(x) and p(y|x) share some structure.
The training method Hinton proposed in 2006 for RBM-based deep belief networks opened the door to training deeper networks. In such deep networks:
- Upper layers of a DBN are supposed to represent more “abstract” concepts that explain the input observation x,
- Lower layers extract “low-level features” from x.
In other words, this model first learns simple concepts, on which it builds more abstract concepts.
Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Andrew McCallum and Sam Roweis, editors, Proceedings of the 25th Annual International Conference on Machine Learning (ICML 2008), pages 1096–1103. Omnipress, 2008. URL http://icml2008.cs.helsinki.fi/papers/592.pdf. Prevents the autoencoder from learning a trivial copy (identity) mapping by masking the input.
Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18:1527–1554, 2006.
Geoffrey E. Hinton. To recognize shapes, first learn to generate images. Technical Report UTML TR 2006-003, University of Toronto, 2006.
- ☐ On the difficulty of training recurrent neural networks
- ☐ BinaryConnect: Training Deep Neural Networks with binary weights during propagations
- ☐ CVPR 2017 Open Access Repository
- ☐ Rethinking Pre-training and Self-training
- ☐ Model-Based Deep Learning: On the Intersection of Deep Learning and Optimization
- ☐ hands_on_Ml_with_Sklearn_and_TF/docs/11.训练深层神经网络.md at dev · iamseancheney/hands_on_Ml_with_Sklearn_and_TF
Efficient Training
- [2110.04366] Towards a Unified View of Parameter-Efficient Transfer Learning
  - Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig
  - ICLR2022; Ar5iv;
- [2302.01107] A Survey on Efficient Training of Transformers
  - Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen
  - IJCAI 2023 under review; Ar5iv;
- [2303.07910] Revisit Parameter-Efficient Transfer Learning: A Two-Stage Paradigm
  - Hengyuan Zhao, Hao Luo, Yuyang Zhao, Pichao Wang, Fan Wang, Mike Zheng Shou
  -
Write an elisp function (command) that checks whether the link at point starts with https://arxiv.org. If it does not, do nothing. If it does, for example [2302.01107] A Survey on Efficient Training of Transformers, extract the last path segment of the link, put it into the Ar5iv field, save it to the clipboard, and emit a message as a reminder.
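A minimal sketch of such a command, assuming the notes live in org-mode, that the arXiv id is the last path segment of the link URL (e.g. https://arxiv.org/abs/2302.01107), and that each entry has an "Ar5iv;" placeholder on a following line; the name my/arxiv-fill-ar5iv-field and the "fill the first Ar5iv; below point" rule are my own choices, not fixed by the note above:

```emacs-lisp
;; Sketch only, under the assumptions stated above.
(require 'org)

(defun my/arxiv-fill-ar5iv-field ()
  "If point is on a link starting with https://arxiv.org, extract the
last path segment of its URL (the arXiv id), append it after the nearest
following \"Ar5iv;\" marker in the current entry, push it onto the kill
ring (and thus, with the default interprogram settings, the clipboard),
and report it with `message'.  Otherwise do nothing."
  (interactive)
  (let* ((ctx (org-element-context))
         (url (and (eq (org-element-type ctx) 'link)
                   (org-element-property :raw-link ctx))))
    (when (and url (string-prefix-p "https://arxiv.org" url))
      (let ((id (car (last (split-string url "/" t)))))
        ;; Hypothetical placement rule: fill the first "Ar5iv;" below point.
        (save-excursion
          (when (re-search-forward "Ar5iv;" (org-entry-end-position) t)
            (insert " " id)))
        (kill-new id)
        (message "arXiv id %s copied to kill ring" id)))))
```

Call it with M-x my/arxiv-fill-ar5iv-field (or bind it to a key) while point is on one of the arXiv links above.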