Lecture I : Introduction of Deep Learning

• Introduction of Deep Learning
• Step1: define a set of functions
• Step2: goodness of function
• Step3: pick the best function
• Why Deep?
• "Hello World" for Deep Learning

Lecture II : Tips for Training Deep Neural Network

• Recipe of Deep Learning
• Choosing proper loss
• Mini-batch
• New activation function
• Momentum
• Early Stopping
• Weight Decay
• Regularization
• Dropout
• Network Structure

Lecture III : Variants of Neural Network

• Convolutional Neural Network (CNN)
• Recurrent Neural Network (RNN)

Lecture IV : Next Wave

• Supervised Learning
• Ultra Deep Network
• Attention Model
• Reinforcement Learning
• Unsupervised Learning
• Image: Realizing what the World Looks Like
• Text: Understanding the Meaning of Words
• Audio: Learning human language without supervision

# 1 Introduction of Deep Learning

## 1.1 Three Steps for Deep Learning

• Step1: define a set of functions (Neural Network)
• Step2: goodness of function
• Step3: pick the best function

## 1.2 Step1: Neural Network

### 1.2.2 Output Layer (Option)

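• Softmax (normalized exponential function): it "squashes" a k-dimensional vector $Z$ of arbitrary real numbers into another k-dimensional vector $\sigma(Z)$ such that every element lies in the range (0, 1) and all elements sum to 1.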
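As a minimal sketch of the definition above, softmax can be written in plain Python. The input vector is a hypothetical example, and subtracting the maximum is a common numerical-stability trick rather than part of the definition.

```python
import math

def softmax(z):
    """Map a list of real numbers to values in (0, 1) that sum to 1."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow in exp.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 3-dimensional input vector.
probs = softmax([3.0, 1.0, -3.0])
print(probs)  # roughly [0.88, 0.12, 0.002]
```

The largest input gets the largest share, but every output stays strictly between 0 and 1.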

### 1.2.3 Example Application

• Handwriting Digit Recognition

• Total Loss:

## 1.4 Step3: Pick the Best Function

• RBM (Restricted Boltzmann Machine): for this part, see https://zhuanlan.zhihu.com/p/22794772

• Then compute $\partial L / \partial w$: if it is negative, increase $w$; if it is positive, decrease $w$. The update is $w \leftarrow w - \eta \, \partial L / \partial w$

• $\eta$ is called “learning rate”

• Randomly pick a starting point

• Backpropagation: an efficient way to compute $\partial L / \partial w$
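The gradient descent procedure described in this section can be sketched on a toy problem. The quadratic loss, its derivative, the learning rate, and the number of steps are all assumptions chosen for illustration; a real network would obtain the derivative through backpropagation.

```python
import random

def loss(w):
    # Hypothetical one-parameter loss with its minimum at w = 3.
    return (w - 3.0) ** 2

def dloss_dw(w):
    # Derivative of the hypothetical loss above.
    return 2.0 * (w - 3.0)

random.seed(0)
w = random.uniform(-10.0, 10.0)  # randomly pick a starting point
eta = 0.1                        # learning rate (assumed value)

for _ in range(100):
    # Negative derivative: w increases. Positive derivative: w decreases.
    w = w - eta * dloss_dw(w)

print(w, loss(w))
```

Because the update always moves against the sign of the derivative, `w` ends up very close to the minimum at 3.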

## 1.5 Deep is Better

### 1.5.2 Thin + Tall is Better

• Neural network consists of neurons

• A network with a single hidden layer can represent any continuous function (given enough hidden neurons)

• Using multiple layers of neurons to represent some functions is much simpler

• Fewer parameters, less data needed

## 1.6 Toolkit

### 1.6.2 Example of Handwriting Digit Recognition

#### Testing

score = model.evaluate(x_test, y_test)
print('Total loss on Testing Set: ', score[0])
print('Accuracy of Testing Set: ', score[1])
result = model.predict(x_test)


### 1.6.3 Using GPU to Speed Up Training

• Way1

THEANO_FLAGS=device=gpu0 python YourCode.py

• Way2

import os
# Set this before importing Theano or Keras, because the flags are read at import time.
os.environ["THEANO_FLAGS"] = "device=gpu0"


# 2 Tips for Training Deep Neural Network

## 2.1 Good Results on Training Data

### 2.1.3 New Activation Function

#### ReLU

model.add(Activation('relu'))  # in place of Activation('sigmoid')


#### Learning Rates

• If learning rate is too large, total loss may not decrease after each update
• If learning rate is too small, training would be too slow
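The two failure modes above can be seen on a toy problem. This sketch assumes a hypothetical loss $(w - 3)^2$ and three arbitrarily chosen learning rates; the exact thresholds differ for real networks.

```python
def run(eta, steps=20, w=0.0):
    """Run plain gradient descent on the hypothetical loss (w - 3)**2."""
    losses = []
    for _ in range(steps):
        w = w - eta * 2.0 * (w - 3.0)   # derivative of (w - 3)**2 is 2 * (w - 3)
        losses.append((w - 3.0) ** 2)
    return losses

too_large = run(eta=1.1)      # overshoots the minimum more and more: loss grows
too_small = run(eta=0.001)    # moves in the right direction, but barely
reasonable = run(eta=0.1)     # loss shrinks quickly

print(too_large[-1], too_small[-1], reasonable[-1])
```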

Notes:

• The learning rate becomes smaller and smaller for every parameter as training proceeds
• Parameters with smaller derivatives get a larger learning rate, and vice versa
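Both notes match the behavior of Adagrad, so the sketch below assumes that is the method being described. It divides a base learning rate by the root of the accumulated squared derivatives; the base rate, the `eps` term, and the toy loss are illustrative choices.

```python
import math

def adagrad_step(w, grad, sum_sq, eta=1.0, eps=1e-8):
    """One Adagrad update for a single parameter (eta and eps are assumed values)."""
    sum_sq += grad ** 2                               # accumulate squared derivatives
    effective_lr = eta / (math.sqrt(sum_sq) + eps)    # shrinks as sum_sq grows
    return w - effective_lr * grad, sum_sq, effective_lr

def grad(w):
    # Derivative of the hypothetical loss (w - 3)**2.
    return 2.0 * (w - 3.0)

w, sum_sq = 0.0, 0.0
rates = []
for _ in range(50):
    w, sum_sq, lr = adagrad_step(w, grad(w), sum_sq)
    rates.append(lr)

# A parameter with a small derivative gets a larger effective learning rate.
_, _, lr_small_grad = adagrad_step(0.0, 0.1, 0.0)
_, _, lr_large_grad = adagrad_step(0.0, 10.0, 0.0)
print(rates[0], rates[-1], lr_small_grad, lr_large_grad)
```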

## 2.2 Good Results on Testing Data

### 2.2.1 Early Stopping

#### Why Overfitting

• Learning target is defined by the training data.
• The parameters achieving the learning target do not necessarily have good results on the testing data.

### 2.2.2 Weight Decay

Weight decay is one kind of regularization.

• Our brain prunes out the useless links between neurons.
• Doing the same thing to the machine's brain improves the performance.
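One common way to write weight decay, assuming it comes from an L2 penalty $\frac{\lambda}{2} w^2$ added to the loss, is $w \leftarrow (1 - \eta\lambda)\, w - \eta\, \partial L / \partial w$. The sketch below uses assumed values for $\eta$ and $\lambda$ and shows that a weight receiving no gradient from the data shrinks toward zero, which is the "pruning" effect.

```python
def weight_decay_step(w, grad, eta=0.1, lam=0.1):
    """Gradient step with L2 weight decay: shrink w, then apply the data gradient."""
    return (1.0 - eta * lam) * w - eta * grad

# A hypothetical "useless" weight: the data never produces a gradient for it.
w_useless = 1.0
for _ in range(500):
    w_useless = weight_decay_step(w_useless, grad=0.0)

print(w_useless)  # close to zero after repeated multiplication by 0.99
```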

### 2.2.3 Dropout

#### Training

• Each time before updating the parameters:

• Each neuron has a p% chance of being dropped
• The structure of the network changes
• The new, thinner network is used for training
• For each mini-batch, the dropped neurons are resampled
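The training-time procedure above can be sketched as sampling a fresh 0/1 mask per mini-batch. The activations are made-up numbers, `p` is expressed here as a fraction rather than a percentage, and the helper names are hypothetical. The sketch covers only the sampling step, not any rescaling used at test time.

```python
import random

def sample_dropout_mask(num_neurons, p, rng):
    """Return a 0/1 mask; each neuron is dropped (0) with probability p."""
    return [0 if rng.random() < p else 1 for _ in range(num_neurons)]

def apply_mask(activations, mask):
    """Zero out the activations of dropped neurons."""
    return [a * m for a, m in zip(activations, mask)]

rng = random.Random(0)
activations = [0.5, 1.2, 0.3, 0.9, 0.7, 1.1]   # hypothetical hidden-layer outputs

for batch in range(3):
    # Resample which neurons are dropped for every mini-batch.
    mask = sample_dropout_mask(len(activations), p=0.5, rng=rng)
    thinned = apply_mask(activations, mask)
    print(batch, mask, thinned)
```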

### 2.2.4 Network Structure

CNN is one good example of a specialized network structure.

# 3 Variants of Neural Network

## 3.1 Convolutional Neural Network (CNN)

### 3.1.1 Why CNN for Image

• When processing an image, the first layer of a fully connected network would be very large.
• Some patterns are much smaller than the whole image. A neuron does not have to see the whole image to discover the pattern.
• The same patterns appear in different regions.
• Subsampling the pixels will not change the object, so we can subsample the pixels to make image smaller.
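To make the first two points concrete, the following back-of-the-envelope sketch counts weights (biases ignored). The image size, number of neurons, filter size, and number of filters are hypothetical values picked for illustration.

```python
def fully_connected_weights(height, width, channels, num_neurons):
    """Every neuron in the first layer connects to every input value."""
    return height * width * channels * num_neurons

def conv_weights(filter_size, channels, num_filters):
    """Each filter only looks at a small patch and is reused across the image."""
    return filter_size * filter_size * channels * num_filters

# Hypothetical setting: 100 x 100 RGB image, 1000 first-layer neurons,
# versus 25 filters of size 3 x 3.
fc = fully_connected_weights(100, 100, 3, 1000)
conv = conv_weights(3, 3, 25)
print(fc, conv)  # 30000000 675
```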

### 3.1.2 Three Steps

#### Step1: Convolutional Neural Network

##### Max Pooling

• The output is smaller than the original image.
• The number of channels equals the number of filters.
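A minimal sketch of 2 x 2 max pooling on a single feature map follows, assuming stride 2 and even dimensions. The feature-map values are made up. With several filters, the same operation would be applied to each filter's map separately, giving one output channel per filter.

```python
def max_pool_2x2(feature_map):
    """2 x 2 max pooling with stride 2 on a 2-D list with even height and width."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]

# Hypothetical 4 x 4 output of one convolution filter.
feature_map = [
    [3, -1, -3, -1],
    [-3, 1, 0, 3],
    [-3, -3, 0, 1],
    [3, -2, -2, -1],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[3, 3], [3, 1]]
```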

# 4 Next Wave

## 4.2 Reinforcement Learning

### 4.2.3 Difficulties of Reinforcement Learning

• It may be better to sacrifice immediate reward to gain more long-term reward.
• Agent’s actions affect the subsequent data it receives.

## 4.3 Unsupervised Learning

### 4.3.2 Text: Understanding the Meaning of Words

• Machines learn the meaning of words from reading a lot of documents without supervision
• A word can be understood by its context

### 4.3.3 Audio: Learning Human Language Without Supervision

• An audio segment corresponding to an unknown word is mapped to a fixed-length vector
• The audio segments corresponding to words with similar pronunciations are close to each other.

Deep Learning
• Video recordings of last semester's "Machine Learning" course
• Deep generative model (Part 1):
• Deep generative model (Part 2):

[Getting Started with Machine Learning] Hung-yi Lee Machine Learning Notes 37 (Deep Reinforcement Learning: An Introduction)


# Deep Reinforcement Learning

## Scenario of Reinforcement Learning

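### Learning to play Go

AlphaGo's strategy is to first reach a decent level through supervised learning, then play a huge number of games with reinforcement learning.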

### Example: Playing Video Game

Play yourself: http://www.2600online.com/spaceinvaders.html

## Outline

The method used by AlphaGo is: policy-based + value-based + model-based

## Policy-based Approach

Learning an Actor

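The function is $\pi$: its input is the observation of the environment and its output is the action.

### Step 1: Neural network as the function

The advantage of a neural network is that it generalizes fairly well: even for a scene it has never seen, it may still give a reasonable result.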

### Step 2: goodness of function

Review: Supervised learning

### Step 3: pick the best function

The probability of the actions not sampled will decrease.
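One way to see why this happens, assuming the actor outputs its action probabilities through a softmax: the probabilities must sum to 1, so raising the score of the sampled action lowers the others. The sketch below is a simplified illustration with made-up scores and a made-up update size, not the full policy gradient update.

```python
import math

def softmax(logits):
    """Turn action scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three actions, e.g. left, right, fire.
logits = [0.0, 0.0, 0.0]
before = softmax(logits)

sampled = 2              # index of the action that happened to be sampled
logits[sampled] += 1.0   # an update that favors the sampled action

after = softmax(logits)
print(before, after)
```

Only the sampled action's score was changed, yet the probabilities of the two unsampled actions both drop.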

## Value-based Approach

Learning a Critic

End!