【中英】【吴恩达课后测验】Course 3 -结构化机器学习项目 - 第二周测验 - 自动驾驶（案例研究）

上一篇：【课程3 - 第一周测验】※※※※※ 【回到目录】※※※※※下一篇：【课程4 -第一周测验】

第2周测验 - 自动驾驶（案例研究）

为了帮助你练习机器学习策略，本周我们将介绍另一种场景并询问你会怎么做。我们认为这个在机器学习项目中工作的“模拟器”会让你体验到领导一个机器学习项目可能是什么样子！

你受雇于一家研发自动驾驶汽车的创业公司。你负责检测图片中的路标（停车标志、行人过路标志、前方施工标志）和交通信号灯（红灯和绿灯），目标是识别每张图片中出现了哪些对象。例如，上面的图片包含一个行人过路标志和红色交通信号灯。
你的100,000张带标签的图片是用你汽车的前置摄像头拍摄的，这也是你最关心的数据分布。你认为你可以从互联网上获得一个大得多的数据集，即使互联网数据的分布不相同，它也可能对训练有所帮助。你刚刚开始着手这个项目，你做的第一件事是什么？假设下面的每个步骤将花费大约相等的时间（大约几天）。

【 】 花几天时间去获取互联网的数据，这样你就能更好地了解哪些数据是可用的。
【 】 花几天的时间检查这些任务的人类表现，以便能够得到贝叶斯误差的准确估计。
【 】 花几天的时间使用汽车前置摄像头采集更多数据，以更好地了解每单位时间可收集多少数据。
【★】 花几天时间训练一个基本模型，看看它会犯什么错误。

As seen in the lecture multiple times, Machine Learning is a highly iterative process. We need to create, code, and experiment on a basic model, and then iterate in order to find out the model that works best for the given problem.

正如在视频中多次看到的，机器学习是一个高度迭代的过程。我们需要创建、编写并实验一个基本模型，然后不断迭代，以找出对给定问题最有效的模型。

你的目标是检测图片中的道路标志（停车标志、行人过路标志、前方施工标志）和交通信号灯（红灯和绿灯），并识别每张图片中出现了哪些对象。你计划在隐藏层中使用带有 ReLU 单元的深度神经网络。

对于输出层，使用Softmax激活将是输出层的一个比较好的选择，因为这是一个多任务学习问题，对吗？

【 】  True
【★】  False

Softmax would have been a good choice if one and only one of the possibilities (stop sign, speed bump, pedestrian crossing, green light and red light) was present in each image. Since it is not the case, softmax activation cannot be used.

如果每张图片中有且仅有一种可能（停车标志、减速带、人行横道、绿灯、红灯）出现，那么 Softmax 将是一个很好的选择。由于情况并非如此，所以不能使用 Softmax 激活函数。
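
下面用纯 Python 做一个极简示意（logits 的数值只是假设的例子）：多标签问题中每个类别独立使用 sigmoid，因此一张图片可以同时被判定包含多个对象；而 softmax 会强制所有类别的概率之和为 1，只适合“有且仅有一个”的情形。

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# 输出层 5 个节点，对应 [停车标志, 行人过路标志, 前方施工标志, 红灯, 绿灯]
# 多标签问题: 对每个节点独立使用 sigmoid，而不是对 5 个节点整体做 softmax
logits = [2.0, -1.5, -3.0, 1.0, -0.5]          # 假设的输出层线性值
probs = [sigmoid(z) for z in logits]

# 每个概率独立与 0.5 比较，因此可以同时预测出多个对象
predictions = [1 if p > 0.5 else 0 for p in probs]
# 这里同时检测到停车标志和红灯, 对应前文 y = [1, 0, 0, 1, 0] 的例子
```
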

你正在进行误差分析并统计算法所犯的错误。你认为应该手动逐张仔细检查以下哪个数据集？

【 】 随机选择的10,000张图片
【 】 随机选择的500张图片
【★】 算法分类错误的500张图片
【 】 算法分类错误的10,000张图片

It is of prime importance to look at those images on which the algorithm has made a mistake. Since it is not practical to look at every image the algorithm has made a mistake on, we need to randomly choose 500 such images and analyse the reason for such errors.

查看算法分类出错的那些图片是非常重要的。由于逐一查看算法出错的每一张图片并不现实，所以我们需要随机选择500张这样的图片，并分析出现这些错误的原因。
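
用一小段 Python 示意这个抽样流程（14.3% 的错误率与 2 万张的开发集规模取自后文的表格，每张图是否分错在这里用随机数模拟，仅作演示）：

```python
import random

random.seed(0)
# 模拟开发集上每张图片是否被算法分错 (真实项目中来自模型预测与标签的比较)
dev_set_size = 20_000
is_mistake = [random.random() < 0.143 for _ in range(dev_set_size)]

# 收集所有分类错误的图片编号
mistakes = [i for i, wrong in enumerate(is_mistake) if wrong]

# 从错误样本中随机抽取 500 张, 供人工逐一检查并归类错误原因
sample_to_inspect = random.sample(mistakes, min(500, len(mistakes)))
```
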

在处理了数据几周后，你的团队得到以下数据：

100,000 张使用汽车前摄像头拍摄的标记了的图片。
900,000 张从互联网下载的带标签的道路图片。

每张图片的标签都精确地表示图片中出现的特定路标和交通信号的组合。例如，$y^{(i)} = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$ 表示图片包含了停车标志和红色交通信号灯。
因为这是一个多任务学习问题，你需要让所有 $y^{(i)}$ 向量被完全标记。如果一个样本等于 $\begin{bmatrix} 0 \\ ? \\ 1 \\ 1 \\ ? \end{bmatrix}$，那么学习算法将无法使用该样本，这种说法正确吗？

【 】 正确
【★】 错误

In the lecture on multi-task learning, you have seen that you can compute the cost even if some entries haven’t been labeled. The algorithm won’t be influenced by the fact that some entries in the data weren’t labeled.

在多任务学习的视频中，你已经看到，即使某些条目没有被标记，你也可以计算成本。算法不会因为数据中某些条目未被标记而受到影响。
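
下面是一个纯 Python 的损失函数草图（y_hat 的数值为假设的例子），演示成本如何只对已标记的条目（0/1）求和、直接跳过 “?”（这里用 None 表示）：

```python
import math

def multitask_loss(y_hat, y):
    """逐项 logistic 损失, 只对有标签 (0/1) 的条目求和, '?' (None) 直接跳过。"""
    total, count = 0.0, 0
    for p, t in zip(y_hat, y):
        if t is None:          # 未标记的条目不参与成本计算
            continue
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
        count += 1
    return total / count

y_hat = [0.1, 0.6, 0.9, 0.8, 0.4]       # 假设的模型输出
y     = [0,   None, 1,   1,   None]     # 对应样本 [0, ?, 1, 1, ?]
loss = multitask_loss(y_hat, y)         # 只由 3 个已标记条目决定
```

带 “?” 的条目对成本没有任何影响，所以这样的样本仍然可以被学习算法使用。
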

你所关心的数据的分布包含了你汽车的前置摄像头的图片，这与你在网上找到并下载的图片不同。如何将数据集分割为训练/开发/测试集?

【 】 将10万张前摄像头的图片与在网上找到的90万张图片随机混合，使得所有数据都随机分布。 将有100万张图片的数据集分割为：有60万张图片的训练集、有20万张图片的开发集和有20万张图片的测试集。
【 】 将10万张前摄像头的图片与在网上找到的90万张图片随机混合，使得所有数据都随机分布。将有100万张图片的数据集分割为：有98万张图片的训练集、有1万张图片的开发集和有1万张图片的测试集。
【★】  选择从互联网上的90万张图片和汽车前置摄像头的8万张图片作为训练集，剩余的2万张图片在开发集和测试集中平均分配。
【 】 选择从互联网上的90万张图片和汽车前置摄像头的2万张图片作为训练集，剩余的8万张图片在开发集和测试集中平均分配。

As seen in lecture, it is important that your dev and test sets have a distribution that resembles the “real-life” data you care about. The test set should also contain an adequate amount of that “real-life” data.

正如课程中所讲，数据的划分方式非常重要：应让开发集和测试集的分布尽可能接近你真正关心的“现实”数据，并且测试集应包含足够数量的这类数据。
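
按照被选中的划分方案，可以这样构造数据集（纯 Python 示意，图片用 (来源, 编号) 元组代替）：

```python
import random

random.seed(0)
internet = [("internet", i) for i in range(900_000)]
camera   = [("camera", i) for i in range(100_000)]

random.shuffle(camera)
# 训练集: 全部 90 万张互联网图片 + 8 万张前置摄像头图片
train = internet + camera[:80_000]
# 开发/测试集: 只用剩余的 2 万张前置摄像头图片(即你真正关心的分布), 各 1 万张
dev, test = camera[80_000:90_000], camera[90_000:]
```

这样开发集和测试集完全来自你关心的分布，同时训练集仍然利用了全部互联网数据。
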

假设您最终选择了以下拆分数据集的方式:

数据集
图片数量
算法产生的错误

训练集
随机抽取94万张图片（从90万张互联网图片 + 6万张汽车前摄像头拍摄的图片中抽取）
8.8%

训练-开发集
随机抽取2万张图片（从90万张互联网图片 + 6万张汽车前摄像头拍摄的图片中抽取）
9.1%

开发集
2万张汽车前摄像头拍摄的图片
14.3%

测试集
2万张汽车前摄像头拍摄的图片
14.8%

您还知道道路标志和交通信号分类的人为错误率大约为0.5％。以下哪项是真的（检查所有选项）?

【 】 由于开发集和测试集的错误率非常接近，所以你过拟合了开发集。
【★】 你有一个很大的数据不匹配问题，因为你的模型在训练-开发集上比在开发集上做得好得多。
【★】 你有一个很大的可避免偏差问题，因为你的训练集上的错误率比人为错误率高很多。
【 】 你有很大的方差的问题，因为你的训练集上的错误率比人为错误率要高得多。
【 】 你有很大的方差的问题，因为你的模型不能很好地适应来自同一训练集上的分布的数据，即使是它从来没有见过的数据。
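
根据本题表格，可以把相邻误差之间的差值算出来，用来定位问题（数值单位为百分比，人类水平 0.5% 作为贝叶斯误差的近似）：

```python
human, train, train_dev, dev, test = 0.5, 8.8, 9.1, 14.3, 14.8  # 单位: %

avoidable_bias = train - human       # 8.3%: 可避免偏差问题很大
variance       = train_dev - train   # 0.3%: 方差问题很小
data_mismatch  = dev - train_dev     # 5.2%: 数据不匹配问题很大
overfit_dev    = test - dev          # 0.5%: 没有明显过拟合开发集
```

8.3% 的可避免偏差和 5.2% 的数据不匹配都很大，而方差和开发/测试差距都很小，这正好对应上面勾选的两个选项。
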

根据上一个问题的表格，一位朋友认为训练数据分布比开发/测试分布要容易得多。你怎么看？

【 】 你的朋友是对的。 （即训练数据分布的贝叶斯误差可能低于开发/测试分布）。
【 】 你的朋友错了。（即训练数据分布的贝叶斯误差可能比开发/测试分布更高）。
【★】 没有足够的信息来判断你的朋友是对还是错。
【 】 无论你的朋友是对还是错，这些信息都对你没有用。

To get an idea of this, we will have to measure human-level error separately on both distributions. The algorithm does better on the distribution of data it is trained on. But we do not know for certain whether that is because it was trained on that data or because that distribution really is easier than the dev/test distribution.

为了判断这一点，我们必须在两个分布上分别测量人类水平误差。算法在它所训练的数据分布上表现更好，但我们无法确定这是因为它在该分布上训练过，还是因为该分布确实比开发/测试分布更容易。

博主注：博主未能理解其意思，有能力的读者可以看一下英文吧。
您决定将重点放在开发集上, 并手动检查是什么原因导致的错误。下面是一个表, 总结了您的发现:

开发集总误差
14.3%

由于数据标记不正确而导致的错误
4.1%

由于雾天的图片引起的错误
8.0%

由于雨滴落在汽车前摄像头上造成的错误
2.2%

其他原因引起的错误
1.0%

在这个表格中，4.1%、8.0% 等比例是相对于整个开发集而言的（而不仅仅是相对于算法分错的样本），即大约 8.0/14.3 = 56% 的错误是由于雾天图片造成的。
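
按表中的数字做个简单换算（比例与上限都直接来自本题表格）：

```python
dev_error = 14.3   # 开发集总误差 (%)
categories = {"标记错误": 4.1, "雾天图片": 8.0, "雨滴": 2.2, "其他": 1.0}

# 每一类错误占总误差的比例, 例如雾天 ≈ 8.0 / 14.3 ≈ 56%
shares = {k: v / dev_error for k, v in categories.items()}

# 每一类的数字同时也是解决该问题所能带来的性能提升上限 (ceiling):
# 即使完全解决雨滴问题, 开发集误差最多也只能降低约 2.2%
raindrop_ceiling = categories["雨滴"]
```
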

这个分析的结果是否意味着团队的最高优先级应该是把更多雾天图片加入训练集，从而解决该类别中 8.0% 的错误？

【★】 错误，因为这取决于添加这些数据的难易程度，以及你的团队认为它会有多大帮助。
【 】 是的，因为它是错误占比最大的类别。正如视频中所讨论的，我们应该优先处理占比最大的错误类别，以避免浪费团队的时间。
【 】 是的，因为它比其他的错误类别错误率加在一起都大(8.0 > 4.1+2.2+1.0)。
【 】 错误，因为数据增强(通过清晰的图像+雾的效果合成雾天的图像)更有效。

你可以买一个专门设计的挡风玻璃雨刮，帮助擦掉前置摄像头上的一些雨滴。根据上一个问题的表格，你同意以下哪些陈述？

【★】 对于挡风玻璃雨刷可以改善模型的性能而言，2.2％是改善的最大值。
【 】 对于挡风玻璃雨刷可以改善模型的性能而言，2.2％是改善的最小值。
【 】 对于挡风玻璃雨刷可以改善模型的性能而言，改善的性能就是2.2％。
【 】 在最坏的情况下，2.2%将是一个合理的估计，因为挡风玻璃刮水器会损坏模型的性能。

You will probably not improve performance by more than 2.2% by solving the raindrops problem. If your dataset was infinitely big, 2.2% would be a perfect estimate of the improvement you can achieve by purchasing a specially designed windshield wiper that removes the raindrops.

解决雨滴问题带来的性能提升很可能不会超过 2.2%。如果你的数据集无限大，那么 2.2% 将是购买这种专门设计的、能去除雨滴的雨刮所能带来的性能提升的理想估计。

您决定使用数据增强来解决雾天的图像，您可以在互联网上找到1,000张雾的照片，然后拿清晰的图片和雾来合成雾天图片，如下所示：

你同意下列哪种说法？（检查所有选项）

【 】 只要你把这些雾图与数量大得多（远大于 1000）的清晰/无雾图像结合使用，过拟合这 1000 张雾图的风险就不大。
【 】 将看起来像真实雾天照片的合成图片添加到从你的汽车前置摄像头拍摄的图片数据集中，对改进模型不会有任何帮助，因为这会引入可避免偏差。
【★】 只要合成的雾对人眼来说足够逼真，你就可以确信合成数据准确地刻画了真实雾天图像的分布（或其子集），因为人类视觉对你正在解决的这个问题是非常准确的。

If the synthesized images look realistic, then the model will just see them as if you had added useful data to identify road signs and traffic signals in a foggy weather.

如果合成的图像看起来足够逼真，那么模型会把它们当作你新增的、用于在雾天识别道路标志和交通信号灯的有用数据，这很可能会有帮助。
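
“加雾”合成本质上是把雾图按一定透明度叠加到清晰图片上。下面用 2×2 的灰度“图片”做一个纯 Python 的逐像素混合示意（alpha 与像素值均为假设的例子，真实项目中会用图像库对整张图片做同样的操作）：

```python
# 按透明度 alpha 把雾图逐像素混合到清晰图片上
def add_fog(clean, fog, alpha=0.4):
    return [[(1 - alpha) * c + alpha * f
             for c, f in zip(row_c, row_f)]
            for row_c, row_f in zip(clean, fog)]

clean_img = [[0.0, 0.2], [0.8, 1.0]]   # 假想的清晰图片像素 (0~1 灰度)
fog_img   = [[0.9, 0.9], [0.9, 0.9]]   # 假想的雾图, 接近均匀的亮灰色

foggy = add_fog(clean_img, fog_img)    # 合成的"雾天"图片
```
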

在进一步处理问题之后，您已决定更正开发集上错误标记的数据。 您同意以下哪些陈述？ （检查所有选项）。

【★】 您不应更正训练集中的错误标记的数据, 以免现在的训练集与开发集更不同。

Deep learning algorithms are quite robust to having slightly different train and dev distributions.

深度学习算法对训练集和开发集分布之间的轻微差异是相当鲁棒的。（博主注：意思是这点小差异对算法影响不大）
【 】 您应该更正训练集中的错误标记数据, 以免您现在的训练集与开发集更不同。
【 】 您不应该更正测试集中错误标记的数据，以便开发和测试集来自同一分布。
【★】 您还应该更正测试集中错误标记的数据，以便开发和测试集来自同一分布。

Because you want to make sure that your dev and test data come from the same distribution, so that your team’s iterative development process remains efficient.

因为你想确保你的开发和测试数据来自相同的分布，以使你的团队的迭代开发过程高效。

到目前为止，你的算法仅能识别红色和绿色交通灯。该公司的一位同事开始着手识别黄色交通灯（一些国家称之为橙灯而不是黄灯，我们将采用美国的惯例称之为黄灯）。含有黄灯的图像非常罕见，她没有足够的数据来建立一个好的模型。她希望你能用迁移学习帮助她。

你告诉你的同事怎么做？

【★】 她应该尝试使用在你的数据集上预先训练过的权重，并用黄光数据集进行进一步的微调。
【 】 如果她有10,000个黄光图像，从您的数据集中随机抽取10,000张图像，并将您和她的数据放在一起，这可以防止您的数据集“淹没”她的黄灯数据集。
【 】 你没办法帮助她，因为你的数据分布与她的不同，而且缺乏黄灯标签的数据。
【 】 建议她尝试多任务学习，而不是使用所有数据进行迁移学习。

You have trained your model on a huge dataset, and she has a small dataset. Although your labels are different, the parameters of your model have been trained to recognize many characteristics of road and traffic images which will be useful for her problem. This is a perfect case for transfer learning, she can start with a model with the same architecture as yours, change what is after the last hidden layer and initialize it with your trained parameters.

你已经在一个庞大的数据集上训练了你的模型，而她只有一个小数据集。尽管你们的标签不同，但你的模型参数已经学会识别道路和交通图像的许多特征，这些特征对她的问题很有用。这是迁移学习的一个完美案例：她可以从一个与你的架构相同的模型开始，替换最后一个隐藏层之后的部分，并用你已训练好的参数来初始化其余部分。
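
迁移学习的做法可以用一个极简的纯 Python 草图来示意（权重数值与层的形状均为假设）：复用在大数据集上训练好的隐藏层参数，只替换并随机初始化输出层，然后在黄灯小数据集上微调。

```python
import random

random.seed(0)

# 用嵌套列表示意网络各层权重: 隐藏层在你的大数据集上已经训练好
pretrained = {
    "hidden1": [[0.3, -0.1], [0.2, 0.5]],   # 假设的已训练权重
    "hidden2": [[0.7, 0.1], [-0.2, 0.4]],
    "output":  [[0.9, -0.3]],               # 原输出层 (这里简化为示意)
}

# 迁移学习: 复用所有隐藏层, 只把输出层换成"黄灯"任务的新层并随机初始化
yellow_light_model = {k: v for k, v in pretrained.items() if k != "output"}
yellow_light_model["output"] = [[random.uniform(-0.1, 0.1) for _ in range(2)]]
# 之后同事用她的小规模黄灯数据集微调 (fine-tune), 必要时也可一并微调隐藏层
```
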

另一位同事想要使用放置在车外的麦克风来更好地听清你周围是否有其他车辆。例如，如果你身后有警车，你就能听到警笛声。但是，他们没有太多数据来训练这个音频系统，你能帮忙吗？

【 】 从视觉数据集迁移学习可以帮助您的同事加快步伐，多任务学习似乎不太有希望。
【 】 从您的视觉数据集中进行多任务学习可以帮助您的同事加快步伐，迁移学习似乎不太有希望。
【 】 迁移学习或多任务学习可以帮助我们的同事加快步伐。
【★】 迁移学习和多任务学习都不是很有希望。

The problem he is trying to solve is quite different from yours. The different dataset structures make it probably impossible to use transfer learning or multi-task learning.

他试图解决的问题与你的相当不同，数据集结构的差异使得迁移学习或多任务学习几乎不可能使用。

要识别红色和绿色的灯光，你一直在使用这种方法：

A：将图像 $x$ 输入到神经网络，让它直接学习一个映射，来预测图像中是否存在红灯和/或绿灯 $y$。

一个队友提出了另一种两步走的方法：

B：在这个两步法中，您首先要检测图像中的交通灯（如果有），然后确定交通信号灯中照明灯的颜色。

在这两者之间，方法B更多的是端到端的方法，因为它在输入端和输出端有不同的步骤，这种说法正确吗？

【 】 正确
【★】 错误

(A) is an end-to-end approach as it maps directly the input (x) to the output (y).

A是一种端到端的方法，因为它直接将输入（x）映射到输出（y）。

Approach A (in the question above) tends to be more promising than approach B if you have a ________ (fill in the blank).

如果你有一个________（填空），上面问题中的方法 A 往往比方法 B 更有希望。

【★】 大训练集
【 】 多任务学习的问题。
【 】 偏差比较大的问题。
【 】 高贝叶斯误差的问题。

In many fields, it has been observed that end-to-end learning works better in practice, but requires a large amount of data. Without a large amount of data, end-to-end deep learning tends to perform poorly.

在许多领域，人们观察到端到端学习在实践中效果更好，但需要大量数据。如果没有大量数据，端到端深度学习的效果往往较差。

Autonomous driving (case study)

1

To help you practice strategies for machine learning, in this week we’ll present another scenario and ask how you would act. We think this “simulator” of working in a machine learning project will give a taste of what leading a machine learning project could be like!

You are employed by a startup building self-driving cars. You are in charge of detecting road signs (stop sign, pedestrian crossing sign, construction ahead sign) and traffic signals (red and green lights) in images. The goal is to recognize which of these objects appear in each image. As an example, the above image contains a pedestrian crossing sign and red traffic lights

Your 100,000 labeled images are taken using the front-facing camera of your car. This is also the distribution of data you care most about doing well on. You think you might be able to get a much larger dataset off the internet, that could be helpful for training even if the distribution of internet data is not the same.

You are just getting started on this project. What is the first thing you do? Assume each of the steps below would take about an equal amount of time (a few days).

Spend a few days training a basic model and see what mistakes it makes.

Spend a few days checking what is human-level performance for these tasks so that you can get an accurate estimate of Bayes error.

Spend a few days getting the internet data, so that you understand better what data is available.

Spend a few days collecting more data using the front-facing camera of your car, to better understand how much data per unit time you can collect.

As discussed in lecture, applied ML is a highly iterative process. If you train a basic model and carry out error analysis (see what mistakes it makes) it will help point you in more promising directions.

2

Your goal is to detect road signs (stop sign, pedestrian crossing sign, construction ahead sign) and traffic signals (red and green lights) in images. The goal is to recognize which of these objects appear in each image. You plan to use a deep neural network with ReLU units in the hidden layers.

For the output layer, a softmax activation would be a good choice for the output layer because this is a multi-task learning problem. True/False?

True

False

Softmax would be a good choice if one and only one of the possibilities (stop sign, speed bump, pedestrian crossing, green light and red light) was present in each image.

3

You are carrying out error analysis and counting up what errors the algorithm makes. Which of these datasets do you think you should manually go through and carefully examine, one image at a time?

10,000 randomly chosen images

500 images on which the algorithm made a mistake

10,000 images on which the algorithm made a mistake

500 randomly chosen images

Focus on images that the algorithm got wrong. Also, 500 is enough to give you a good initial sense of the error statistics. There’s probably no need to look at 10,000, which will take a long time.

4

After working on the data for several weeks, your team ends up with the following data:

100,000 labeled images taken using the front-facing camera of your car.
Each image’s labels precisely indicate the presence of any specific road signs and traffic signals or combinations of them. For example, $y^{(i)} = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$ means the image contains a stop sign and a red traffic light.
Because this is a multi-task learning problem, you need to have all your $y^{(i)}$ vectors fully labeled. If one example is equal to $\begin{bmatrix} 0 \\ ? \\ 1 \\ 1 \\ ? \end{bmatrix}$ then the learning algorithm will not be able to use that example. True/False?

True

False

As seen in the lecture on multi-task learning, you can compute the cost such that it is not influenced by the fact that some entries haven’t been labeled.

5

The distribution of data you care about contains images from your car’s front-facing camera; which comes from a different distribution than the images you were able to find and download off the internet. How should you split the dataset into train/dev/test sets?

Choose the training set to be the 900,000 images from the internet along with 80,000 images from your car’s front-facing camera. The 20,000 remaining images will be split equally in dev and test sets.

Mix all the 100,000 images with the 900,000 images you found online. Shuffle everything. Split the 1,000,000 images dataset into 600,000 for the training set, 200,000 for the dev set and 200,000 for the test set.

Choose the training set to be the 900,000 images from the internet along with 20,000 images from your car’s front-facing camera. The 80,000 remaining images will be split equally in dev and test sets.

Mix all the 100,000 images with the 900,000 images you found online. Shuffle everything. Split the 1,000,000 images dataset into 980,000 for the training set, 10,000 for the dev set and 10,000 for the test set.

As seen in lecture, it is important that your dev and test set have the closest possible distribution to “real”-data. It is also important for the training set to contain enough “real”-data to avoid having a data-mismatch problem.

6

Assume you’ve finally chosen the following split between of the data:

Dataset:
Contains:
Error of the algorithm:

Training
940,000 images randomly picked from (900,000 internet images + 60,000 car’s front-facing camera images)
8.8%

Training-Dev
20,000 images randomly picked from (900,000 internet images + 60,000 car’s front-facing camera images)
9.1%

Dev
20,000 images from your car’s front-facing camera
14.3%

Test
20,000 images from the car’s front-facing camera
14.8%

You also know that human-level error on the road sign and traffic signals classification task is around 0.5%. Which of the following are True? (Check all that apply).

You have a large variance problem because your model is not generalizing well to data from the same training distribution but that it has never seen before.

You have a large variance problem because your training error is quite higher than the human-level error.

You have a large data-mismatch problem because your model does a lot better on the training-dev set than on the dev set

You have a large avoidable-bias problem because your training error is quite a bit higher than the human-level error.

Your algorithm overfits the dev set because the error of the dev and test sets are very close.

7

Based on table from the previous question, a friend thinks that the training data distribution is much easier than the dev/test distribution. What do you think?

Your friend is right. (I.e., Bayes error for the training data distribution is probably lower than for the dev/test distribution.)

Your friend is wrong. (I.e., Bayes error for the training data distribution is probably higher than for the dev/test distribution.)

There’s insufficient information to tell if your friend is right or wrong.

The algorithm does better on the distribution of data it trained on. But you don’t know whether that’s because it trained on that data or because that distribution really is easier. To get a better sense, measure human-level error separately on both distributions.

8

You decide to focus on the dev set and check by hand what are the errors due to. Here is a table summarizing your discoveries:

Overall dev set error
14.3%

Errors due to incorrectly labeled data
4.1%

Errors due to foggy pictures
8.0%

Errors due to rain drops stuck on your car’s front-facing camera
2.2%

Errors due to other causes
1.0%

In this table, 4.1%, 8.0%, etc. are a fraction of the total dev set (not just examples your algorithm mislabeled). I.e. about 8.0/14.3 = 56% of your errors are due to foggy pictures.

The results from this analysis implies that the team’s highest priority should be to bring more foggy pictures into the training set so as to address the 8.0% of errors in that category. True/False?

True because it is the largest category of errors. As discussed in lecture, we should prioritize the largest category of error to avoid wasting the team’s time.

True because it is greater than the other error categories added together (8.0 > 4.1+2.2+1.0).

False because this would depend on how easy it is to add this data and how much you think your team thinks it’ll help.

False because data augmentation (synthesizing foggy images by clean/non-foggy images) is more efficient.

9

You can buy a specially designed windshield wiper that help wipe off some of the raindrops on the front-facing camera. Based on the table from the previous question, which of the following statements do you agree with?

2.2% would be a reasonable estimate of the maximum amount this windshield wiper could improve performance.

2.2% would be a reasonable estimate of the minimum amount this windshield wiper could improve performance.

2.2% would be a reasonable estimate of how much this windshield wiper will improve performance.

2.2% would be a reasonable estimate of how much this windshield wiper could worsen performance in the worst case.

Yes. You will probably not improve performance by more than 2.2% by solving the raindrops problem. If your dataset was infinitely big, 2.2% would be a perfect estimate of the improvement you can achieve by purchasing a specially designed windshield wiper that removes the raindrops.

10

You decide to use data augmentation to address foggy images. You find 1,000 pictures of fog off the internet, and “add” them to clean images to synthesize foggy days, like this:

Which of the following statements do you agree with?

So long as the synthesized fog looks realistic to the human eye, you can be confident that the synthesized data is accurately capturing the distribution of real foggy images (or a subset of it), since human vision is very accurate for the problem you’re solving.

Adding synthesized images that look like real foggy pictures taken from the front-facing camera of your car to training dataset won’t help the model improve because it will introduce avoidable-bias.

There is little risk of overfitting to the 1,000 pictures of fog so long as you are combining it with a much larger (>>1,000) number of clean/non-foggy images.

Yes. If the synthesized images look realistic, then the model will just see them as if you had added useful data to identify road signs and traffic signals in a foggy weather. It will very likely help.

11

After working further on the problem, you’ve decided to correct the incorrectly labeled data on the dev set. Which of these statements do you agree with? (Check all that apply).

You should also correct the incorrectly labeled data in the test set, so that the dev and test sets continue to come from the same distribution

You should correct incorrectly labeled data in the training set as well so as to avoid your training set now being even more different from your dev set.

You should not correct the incorrectly labeled data in the test set, so that the dev and test sets continue to come from the same distribution

You should not correct incorrectly labeled data in the training set as well so as to avoid your training set now being even more different from your dev set.

12

So far your algorithm only recognizes red and green traffic lights. One of your colleagues in the startup is starting to work on recognizing a yellow traffic light. (Some countries call it an orange light rather than a yellow light; we’ll use the US convention of calling it yellow.) Images containing yellow lights are quite rare, and she doesn’t have enough data to build a good model. She hopes you can help her out using transfer learning.

What do you tell your colleague?

She should try using weights pre-trained on your dataset, and fine-tuning further with the yellow-light dataset.

If she has (say) 10,000 images of yellow lights, randomly sample 10,000 images from your dataset and put your and her data together. This prevents your dataset from “swamping” the yellow lights dataset.

You cannot help her because the distribution of data you have is different from hers, and is also lacking the yellow label.

Recommend that she try multi-task learning instead of transfer learning using all the data.

Yes. You have trained your model on a huge dataset, and she has a small dataset. Although your labels are different, the parameters of your model have been trained to recognize many characteristics of road and traffic images which will be useful for her problem. This is a perfect case for transfer learning, she can start with a model with the same architecture as yours, change what is after the last hidden layer and initialize it with your trained parameters.

13

Another colleague wants to use microphones placed outside the car to better hear if there’re other vehicles around you. For example, if there is a police vehicle behind you, you would be able to hear their siren. However, they don’t have much to train this audio system. How can you help?

Transfer learning from your vision dataset could help your colleague get going faster. Multi-task learning seems significantly less promising.

Multi-task learning from your vision dataset could help your colleague get going faster. Transfer learning seems significantly less promising.

Either transfer learning or multi-task learning could help our colleague get going faster.

Neither transfer learning nor multi-task learning seems promising.

Yes. The problem he is trying to solve is quite different from yours. The different dataset structures make it probably impossible to use transfer learning or multi-task learning.

14

To recognize red and green lights, you have been using this approach:

(A) Input an image (x) to a neural network and have it directly learn a mapping to make a prediction as to whether there’s a red light and/or green light (y).
A teammate proposes a different, two-step approach:

(B) In this two-step approach, you would first (i) detect the traffic light in the image (if any), then (ii) determine the color of the illuminated lamp in the traffic light.
Between these two, Approach B is more of an end-to-end approach because it has distinct steps for the input end and the output end. True/False?

True

False

Yes. (A) is an end-to-end approach as it maps directly the input (x) to the output (y).

15

Approach A (in the question above) tends to be more promising than approach B if you have a __ (fill in the blank).

Large training set

Multi-task learning problem.

Large bias problem.

Problem with a high Bayes error.

Yes. In many fields, it has been observed that end-to-end learning works better in practice, but requires a large amount of data.


【中英】【吴恩达课后测验】Course 3 - 结构化机器学习项目 - 第一周测验

上一篇：【课程2 - 第三周编程作业】※※※※※ 【回到目录】※※※※※下一篇：【课程3 - 第二周测验】

第一周测验 - 和平之城中的鸟类识别(案例研究)

1. 问题 1

问题陈述

这个例子来源于实际项目，但是为了保护机密性，我们会对细节进行保护。

现在你是和平之城的著名研究员，和平之城的人有一个共同的特点：他们害怕鸟类。为了保护他们，你必须设计一个算法，以检测任何飞越和平之城的鸟类并向民众发出警报。市议会为你提供了10,000,000张图片的数据集，这些都是从城市的安全摄像头拍摄到的。它们被标记为：

y = 0: 图片中没有鸟类
y = 1: 图片中有鸟类

你的目标是设计一个算法，能够对和平之城安全摄像头拍摄的新图像进行分类。

有很多决定要做：

评估指标是什么？
你如何将你的数据分割为训练/开发/测试集?

成功的指标

市议会告诉你，他们想要一个算法：

拥有较高的准确度
快速运行，只需要很短的时间来分类一个新的图像。
占用内存小，这样它就能运行在一个小型处理器上；城市会把这种处理器安装在许多不同的安全摄像头上。

请注意: 有三个评估指标使您很难在两种不同的算法之间进行快速选择，并且会降低您的团队迭代的速度，是真的吗？

【★】正确
【 】错误

2. 问题 2

经过进一步讨论，市议会缩小了它的标准：

“我们需要一种算法，能尽可能准确地告诉我们有一只鸟正飞过和平之城。”
“我们希望训练好的模型对一张新图像进行分类的时间不超过10秒。”
“我们希望模型能装入10MB的内存。”

如果你有以下三个模型，你会选择哪一个？

【 】A

测试准确度
运行时间
内存大小

97%
1 sec
3MB

【 】B

测试准确度
运行时间
内存大小

99%
13 sec
9MB

【 】C

测试准确度
运行时间
内存大小

97%
3 sec
2MB

【★】D

测试准确度
运行时间
内存大小

98%
9 sec
9MB

3. 问题 3

根据城市的要求，您认为以下哪一项是正确的？

【★】准确度是一个优化指标; 运行时间和内存大小是令人满意的指标。
【 】准确度是一个令人满意的指标; 运行时间和内存大小是一个优化指标。
【 】准确性、运行时间和内存大小都是优化指标，因为您希望在所有这三方面都做得很好。
【 】准确性、运行时间和内存大小都是令人满意的指标，因为您必须在三项方面做得足够好才能使系统可以被接受。
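
优化指标与满足指标的配合可以写成几行 Python（数据取自第 2 题的四个候选模型）：先用满足指标筛掉不达标的模型，再按优化指标在剩下的模型中取最优。

```python
# 每个模型: (测试准确度, 运行时间/秒, 内存/MB)
models = {
    "A": (0.97, 1, 3),
    "B": (0.99, 13, 9),
    "C": (0.97, 3, 2),
    "D": (0.98, 9, 9),
}

# 满足指标 (satisficing): 运行时间 <= 10 秒 且 内存 <= 10MB, 达标即可
feasible = {k: v for k, v in models.items() if v[1] <= 10 and v[2] <= 10}

# 优化指标 (optimizing): 在达标的模型中选准确度最高的
best = max(feasible, key=lambda k: feasible[k][0])
```

模型 B 虽然准确度最高，但运行时间超过 10 秒被满足指标淘汰，所以最终选择 D。
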

4. 问题 4

结构化你的数据

在实现你的算法之前，你需要将你的数据分割成训练/开发/测试集，你认为哪一个是最好的选择？

【 】A

训练集
开发集
测试集

3,333,334
3,333,333
3,333,333

【 】B

训练集
开发集
测试集

6,000,000
3,000,000
1,000,000

【★】C

训练集
开发集
测试集

9,500,000
250,000
250,000

【 】D

训练集
开发集
测试集

6,000,000
1,000,000
3,000,000

5. 问题 5

在设置了训练/开发/测试集之后，市议会再次给你了1,000,000张图片，称为“公民数据”。 显然，和平之城的公民非常害怕鸟类，他们自愿为天空拍照并贴上标签，从而为这些额外的1,000,000张图像贡献力量。 这些图像与市议会最初给您的图像分布不同，但您认为它可以帮助您的算法。

你不应该将公民数据添加到训练集中，因为这会导致训练/开发/测试集分布变得不同，从而损害开发集和测试集性能，是真的吗？

【 】True
【★】False

6. 问题 6

市议会的一名成员对机器学习知之甚少，他认为应该将1,000,000个公民的数据图像添加到测试集中，你反对的原因是：

【★】这会导致开发集和测试集分布变得不同。这是一个很糟糕的主意，因为这会达不到你想要的效果。
【 】公民的数据图像与其他数据没有一致的x- >y映射(类似于纽约/底特律的住房价格例子)。
【 】一个更大的测试集将减慢迭代速度，因为测试集上评估模型会有计算开销。
【★】测试集不再反映您最关心的数据(安全摄像头)的分布。（博主注：训练集是摄像头拍的，用他人拍的数据去测试摄像头拍的，势必会导致准确度下降，要添加也应该添加到整个数据集中，保证同一分布。）

7. 问题 7

你训练了一个系统，其错误率如下（错误率 = 100% − 准确率）：

训练集误差
4.0%

开发集误差
4.5%

这表明，提高性能的一个很好的途径是训练一个更大的网络，以降低4%的训练误差。你同意吗？

【 】是的，因为有4%的训练误差表明你有很高的偏差。
【 】是的，因为这表明你的模型的偏差高于方差。
【 】不同意，因为方差高于偏差。
【★】不同意，因为没有足够的信息，这什么也说明不了。（博主注：想一下贝叶斯最优误差，我们至少还要一个人们对图片的识别误差值，请看下面的题。）

8. 问题 8

你让一些人对数据集进行标记，以便了解人类水平的表现。你得到以下错误率：

鸟类专家1
错误率：0.3%

鸟类专家2
错误率：0.5%

普通人1 (不是专家)
错误率：1.0%

普通人2 (不是专家)
错误率：1.2%

如果您的目标是将“人类表现”作为贝叶斯错误的基准线（或估计），那么您如何定义“人类表现”？

【 】0.0% (因为不可能做得比这更好)
【★】0.3% (专家1的错误率)
【 】0.4% (0.3 到 0.5 之间)
【 】0.75% (以上所有四个数字的平均值)

9. 问题 9

您同意以下哪项陈述？

【★】学习算法的性能可以优于人类表现，但它永远不会优于贝叶斯错误的基准线。
【 】学习算法的性能不可能优于人类表现，但它可以优于贝叶斯错误的基准线。
【 】学习算法的性能不可能优于人类表现，也不可能优于贝叶斯错误的基准线。
【 】学习算法的性能可以优于人类表现，也可以优于贝叶斯错误的基准线。

10. 问题 10

你发现一组鸟类观察专家经过辩论和讨论后，能达到更好的0.1%的错误率，所以你将其定义为“人类表现”。在对算法进行深入研究之后，你最终得出以下结论：

人类表现
0.1%

训练集误差
2.0%

开发集误差
2.1%

根据你的资料，以下四个选项中哪两个尝试起来是最有希望的？（两个选项。）

【 】尝试增加正则化。
【 】获得更大的训练集以减少方差。
【★】尝试减少正则化。
【★】训练一个更大的模型，试图在训练集上做得更好。

11. 问题 11

你在测试集上评估你的模型，并找到以下内容：

人类表现
0.1%

训练集误差
2.0%

开发集误差
2.1%

测试集误差
7.0%

这意味着什么？（两个最佳选项。）

【 】你对开发集欠拟合了。
【★】你应该尝试获得更大的开发集。
【 】你应该得到一个更大的测试集。
【★】你对开发集过拟合了。

12. 问题 12

在一年后，你完成了这个项目，你终于实现了：

人类表现
0.10%

训练集误差
0.05%

开发集误差
0.05%

你能得出什么结论？ （检查所有选项。）

【★】现在很难衡量可避免偏差，因此今后的进展将会放缓。
【 】统计异常(统计噪声的结果)，因为它不可能超过人类表现。
【 】只有0.09％的进步空间，你应该很快就能够将剩余的差距缩小到0％
【★】如果测试集足够大，使得这0.05%的误差估计是准确的，这意味着贝叶斯误差小于等于0.05%。

13. 问题 13

事实证明，和平之城也雇佣了你的竞争对手来设计一个系统。你的系统和竞争对手的系统被给予相同的运行时间和内存限制，而你的系统准确率更高。然而，当你们两个系统接受实际测试时，和平之城实际上更喜欢竞争对手的系统：因为即使你的整体准确率更高，你的系统也有更多的假阴性结果（鸟在空中飞过时没有发出警报）。你该怎么办？

【 】查看开发过程中开发的所有模型，找出错误率最低的模型。
【 】要求你的团队在开发过程中同时考虑准确性和假阴性率。
【★】重新思考此任务的指标，并要求您的团队调整到新指标。
【 】选择假阴性率作为新指标，并使用这个新指标来进一步发展。

14. 问题 14

你轻易击败了你的竞争对手，你的系统现在被部署在和平之城中，并且保护公民免受鸟类攻击！ 但在过去几个月中，一种新的鸟类已经慢慢迁移到该地区，因此你的系统的性能会逐渐下降，因为您的系统正在测试一种新类型的数据。（博主注：以系统未训练过的鸟类图片来测试系统的性能）

你只有1000张新鸟类的图像，在未来的3个月里，城市希望你能更新为更好的系统。你应该先做哪一个？

【★】使用所拥有的数据来定义新的评估指标（使用新的开发/测试集），同时考虑到新物种，并以此来推动团队的进一步发展。
【 】把1000张图片放进训练集，以便让系统更好地对这些鸟类进行训练。
【 】尝试数据增强/数据合成，以获得更多的新鸟的图像。
【 】将1,000幅图像添加到您的数据集中，并重新组合成一个新的训练/开发/测试集

15. 问题 15

市议会认为在城市里养更多的猫会有助于吓跑鸟类。他们对你在鸟类探测器上的工作非常满意，也雇佣你来设计一个猫探测器。（哇~猫探测器是非常有用的，不是吗？）由于多年从事猫探测器的工作，你有一个巨大的数据集：100,000,000张猫的图像，在这些数据上训练一次大约需要两个星期。你同意哪些说法？（检查所有选项。）

【★】需要两周的时间来训练将会限制你迭代的速度。
【★】购买速度更快的计算机可以加速团队的迭代速度，从而提高团队的生产力。
【★】如果100,000,000个样本就足以建立一个足够好的猫探测器，你最好只用10,000,000个样本（十分之一）来训练，从而使你运行实验的速度提高约10倍，即使每个模型因为训练数据较少而表现略差。
【 】建立了一个效果比较好的鸟类检测器后，您应该能够采用相同的模型和超参数，并将其应用于猫数据集，因此无需迭代。

Bird recognition in the city of Peacetopia (case study)

1. Question 1

Problem Statement

This example is adapted from a real production application, but with details disguised to protect confidentiality.

You are a famous researcher in the City of Peacetopia. The people of Peacetopia have a common characteristic: they are afraid of birds. To save them, you have to build an algorithm that will detect any bird flying over Peacetopia and alert the population.

The City Council gives you a dataset of 10,000,000 images of the sky above Peacetopia, taken from the city’s security cameras. They are labelled:

y = 0: There is no bird on the image
y = 1: There is a bird on the image

Your goal is to build an algorithm able to classify new images taken by security cameras from Peacetopia.

There are a lot of decisions to make:

What is the evaluation metric?
How do you structure your data into train/dev/test sets?

Metric of success

The City Council tells you the following that they want an algorithm that

Has high accuracy
Runs quickly and takes only a short time to classify a new image.
Can fit in a small amount of memory, so that it can run in a small processor that the city will attach to many different security cameras.

Note: Having three evaluation metrics makes it harder for you to quickly choose between two different algorithms, and will slow down the speed with which your team can iterate. True/False?

[x] True
[ ] False

2. Question 2

After further discussions, the city narrows down its criteria to:

“We need an algorithm that can let us know a bird is flying over Peacetopia as accurately as possible.”
“We want the trained model to take no more than 10sec to classify a new image.”
“We want the model to fit in 10MB of memory.”

If you had the three following models, which one would you choose?

[ ] A

Test Accuracy
Runtime
Memory size

97%
1 sec
3MB

[ ] B

Test Accuracy
Runtime
Memory size

99%
13 sec
9MB

[ ] C

Test Accuracy
Runtime
Memory size

97%
3 sec
2MB

[x] D

Test Accuracy
Runtime
Memory size

98%
9 sec
9MB

3. Question 3

Based on the city’s requests, which of the following would you say is true?

[x] Accuracy is an optimizing metric; running time and memory size are a satisficing metrics.
[ ] Accuracy is a satisficing metric; running time and memory size are an optimizing metric.
[ ] Accuracy, running time and memory size are all optimizing metrics because you want to do well on all three.
[ ] Accuracy, running time and memory size are all satisficing metrics because you have to do sufficiently well on all three for your system to be acceptable.

4. Question 4

Before implementing your algorithm, you need to split your data into train/dev/test sets. Which of these do you think is the best choice?

[ ] A

Train
Dev
Test

3,333,334
3,333,333
3,333,333

[ ] B

Train
Dev
Test

6,000,000
3,000,000
1,000,000

[x] C

Train
Dev
Test

9,500,000
250,000
250,000

[ ] D

Train
Dev
Test

6,000,000
1,000,000
3,000,000

5. Question 5

After setting up your train/dev/test sets, the City Council comes across another 1,000,000 images, called the “citizens’ data”. Apparently the citizens of Peacetopia are so scared of birds that they volunteered to take pictures of the sky and label them, thus contributing these additional 1,000,000 images. These images are different from the distribution of images the City Council had originally given you, but you think it could help your algorithm.

You should not add the citizens’ data to the training set, because this will cause the training and dev/test set distributions to become different, thus hurting dev and test set performance. True/False?

[ ] True
[x] False

6. Question 6

One member of the City Council knows a little about machine learning, and thinks you should add the 1,000,000 citizens’ data images to the test set. You object because:

- [x] This would cause the dev and test set distributions to become different. This is a bad idea because you're not aiming where you want to hit.
- [ ] The 1,000,000 citizens' data images do not have a consistent x -> y mapping as the rest of the data (similar to the New York City/Detroit housing prices example from lecture).
- [ ] A bigger test set will slow down the speed of iterating because of the computational expense of evaluating models on the test set.
- [x] The test set no longer reflects the distribution of data (security cameras) you most care about.

7. Question 7

You train a system, and its errors are as follows (error = 100%-Accuracy):

| Metric | Error |
| --- | --- |
| Training set error | 4.0% |
| Dev set error | 4.5% |

This suggests that one good avenue for improving performance is to train a bigger network so as to drive down the 4.0% training error. Do you agree?

- [ ] Yes, because having 4.0% training error shows you have high bias.
- [ ] Yes, because this shows your bias is higher than your variance.
- [ ] No, because this shows your variance is higher than your bias.
- [x] No, because there is insufficient information to tell.
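
The point of "insufficient information": the 4.0% training error cannot be labeled high or low bias without an estimate of human-level (≈ Bayes) error. A hedged sketch of the decomposition, with two hypothetical human-level values plugged into the same 4.0%/4.5% numbers:

```python
def diagnose(human_err, train_err, dev_err):
    """Split error (in percentage points) into avoidable bias and variance."""
    avoidable_bias = train_err - human_err   # gap between training and human-level
    variance = dev_err - train_err           # gap between dev and training
    return avoidable_bias, variance

# If humans achieve ~0.5% error, bias dominates -> a bigger network helps:
print(diagnose(0.5, 4.0, 4.5))
# If humans also achieve ~4.0% error, there is no avoidable bias to remove:
print(diagnose(4.0, 4.0, 4.5))
```

The same training error leads to opposite conclusions depending on the human-level baseline, which is exactly the missing piece of information.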

8. Question 8

You ask a few people to label the dataset so as to find out what is human-level performance. You find the following levels of accuracy:

| Labeler | Error |
| --- | --- |
| Bird watching expert #1 | 0.3% |
| Bird watching expert #2 | 0.5% |
| Normal person #1 (not a bird watching expert) | 1.0% |
| Normal person #2 (not a bird watching expert) | 1.2% |

If your goal is to have “human-level performance” be a proxy (or estimate) for Bayes error, how would you define “human-level performance”?

- [ ] 0.0% (because it is impossible to do better than this)
- [x] 0.3% (accuracy of expert #1)
- [ ] 0.4% (average of 0.3 and 0.5)
- [ ] 0.75% (average of all four numbers above)
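
Because the goal is the closest possible proxy for Bayes error, "human-level performance" is the lowest error any human achieves, not an average across humans:

```python
# Error rates of the four labelers from the table above (in percent).
human_errors = {
    "expert_1": 0.3,   # bird watching expert #1
    "expert_2": 0.5,   # bird watching expert #2
    "person_1": 1.0,   # normal person #1
    "person_2": 1.2,   # normal person #2
}

# Best human performance is the tightest upper bound on Bayes error.
human_level = min(human_errors.values())
print(human_level)  # -> 0.3
```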

9. Question 9

Which of the following statements do you agree with?

- [x] A learning algorithm's performance can be better than human-level performance but it can never be better than Bayes error.
- [ ] A learning algorithm's performance can never be better than human-level performance but it can be better than Bayes error.
- [ ] A learning algorithm's performance can never be better than human-level performance nor better than Bayes error.
- [ ] A learning algorithm's performance can be better than human-level performance and better than Bayes error.

10. Question 10

You find that a team of ornithologists debating and discussing an image gets an even better 0.1% performance, so you define that as “human-level performance.” After working further on your algorithm, you end up with the following:

| Metric | Error |
| --- | --- |
| Human-level performance | 0.1% |
| Training set error | 2.0% |
| Dev set error | 2.1% |

Based on the evidence you have, which two of the following four options seem the most promising to try? (Check two options.)

- [ ] Try increasing regularization.
- [ ] Get a bigger training set to reduce variance.
- [x] Try decreasing regularization.
- [x] Train a bigger model to try to do better on the training set.

11. Question 11

You also evaluate your model on the test set, and find the following:

| Metric | Error |
| --- | --- |
| Human-level performance | 0.1% |
| Training set error | 2.0% |
| Dev set error | 2.1% |
| Test set error | 7.0% |

What does this mean? (Check the two best options.)

- [ ] You have underfit to the dev set.
- [x] You should try to get a bigger dev set.
- [ ] You should get a bigger test set.
- [x] You have overfit to the dev set.
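
The three gaps in the table tell the story: avoidable bias and variance are both modest, but the dev-to-test gap is huge, which is the signature of having overfit the dev set through repeated model selection. A sketch of the comparison:

```python
# Errors from the table above, in percent.
human, train, dev, test = 0.1, 2.0, 2.1, 7.0

avoidable_bias = train - human   # ~1.9: gap to human-level
variance       = dev - train     # ~0.1: train vs dev
dev_test_gap   = test - dev      # ~4.9: dev vs test -> overfit to the dev set

# The dominant gap points at the problem to fix first:
worst = max(("avoidable bias", avoidable_bias),
            ("variance", variance),
            ("dev/test gap", dev_test_gap), key=lambda kv: kv[1])
print(worst[0])  # -> dev/test gap
```

A larger dev set makes that gap harder to open up again, which is why it is the recommended remedy.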

12. Question 12

After working on this project for a year, you finally achieve:

| Metric | Error |
| --- | --- |
| Human-level performance | 0.10% |
| Training set error | 0.05% |
| Dev set error | 0.05% |

What can you conclude? (Check all that apply.)

- [x] It is now harder to measure avoidable bias, thus progress will be slower going forward.
- [ ] This is a statistical anomaly (or must be the result of statistical noise) since it should not be possible to surpass human-level performance.
- [ ] With only 0.09% further progress to make, you should quickly be able to close the remaining gap to 0%.
- [x] If the test set is big enough for the 0.05% error estimate to be accurate, this implies Bayes error is ≤ 0.05%.

13. Question 13

It turns out Peacetopia has hired one of your competitors to build a system as well. Your system and your competitor both deliver systems with about the same running time and memory size. However, your system has higher accuracy! However, when Peacetopia tries out your and your competitor’s systems, they conclude they actually like your competitor’s system better, because even though you have higher overall accuracy, you have more false negatives (failing to raise an alarm when a bird is in the air). What should you do?

- [ ] Look at all the models you've developed during the development process and find the one with the lowest false negative error rate.
- [ ] Ask your team to take into account both accuracy and false negative rate during development.
- [x] Rethink the appropriate metric for this task, and ask your team to tune to the new metric.
- [ ] Pick false negative rate as the new metric, and use this new metric to drive all further development.
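
"Rethinking the metric" typically means replacing plain accuracy with a cost-weighted error that reflects what Peacetopia actually cares about, missed birds. The weights below are illustrative assumptions, not from the quiz:

```python
def weighted_error(y_true, y_pred, fn_weight=10.0, fp_weight=1.0):
    """Error metric that penalizes false negatives (missed birds) more heavily."""
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            cost += fn_weight    # missed a bird: expensive
        elif t == 0 and p == 1:
            cost += fp_weight    # false alarm: cheap
    return cost / len(y_true)

y_true = [1, 1, 0, 0, 1]
a_pred = [0, 1, 0, 0, 1]   # your system: 1 false negative, 80% accuracy
b_pred = [1, 1, 1, 1, 1]   # competitor: 2 false positives, 60% accuracy
print(weighted_error(y_true, a_pred))  # -> 2.0
print(weighted_error(y_true, b_pred))  # -> 0.4 (better under the new metric)
```

Under this metric the competitor's system wins despite lower accuracy, which matches the city's preference; the team can now tune directly to it.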

14. Question 14

You’ve handily beaten your competitor, and your system is now deployed in Peacetopia and is protecting the citizens from birds! But over the last few months, a new species of bird has been slowly migrating into the area, so the performance of your system slowly degrades because your data is being tested on a new type of data.

You have only 1,000 images of the new species of bird. The city expects a better system from you within the next 3 months. Which of these should you do first?

- [x] Use the data you have to define a new evaluation metric (using a new dev/test set) taking into account the new species, and use that to drive further progress for your team.
- [ ] Put the 1,000 images into the training set so as to try to do better on these birds.
- [ ] Try data augmentation/data synthesis to get more images of the new type of bird.
- [ ] Add the 1,000 images into your dataset and reshuffle into a new train/dev/test split.

15. Question 15

The City Council thinks that having more Cats in the city would help scare off birds. They are so happy with your work on the Bird detector that they also hire you to build a Cat detector. (Wow Cat detectors are just incredibly useful aren’t they.) Because of years of working on Cat detectors, you have such a huge dataset of 100,000,000 cat images that training on this data takes about two weeks. Which of the statements do you agree with? (Check all that agree.)

- [x] Needing two weeks to train will limit the speed at which you can iterate.
- [x] Buying faster computers could speed up your team's iteration speed and thus your team's productivity.
- [x] If 100,000,000 examples is enough to build a good enough Cat detector, you might be better off training with just 10,000,000 examples to gain a ≈10x improvement in how quickly you can run experiments, even if each model performs a bit worse because it's trained on less data.
- [ ] Having built a good Bird detector, you should be able to take the same model and hyperparameters and just apply it to the Cat dataset, so there is no need to iterate.
