Introduction To Deep Learning

What Is Deep Learning And How Can I Study It?

Tyler Elliot Bettilyon, Jun 13, 2018

I took a Deep Learning course through The Bradfield School of Computer Science in June. This series is a journal about what I learned in class, and what I've learned since.

This is the first article in this series, and it is about the recommended preparation for the Deep Learning course and what we learned in the first class. Read the second article here, and the third here.

Although the "prework" would normally come before the introduction, I'm going to give the 30,000-foot view of the fields of artificial intelligence, machine learning, and deep learning first. I have found that this context helps explain why the prerequisites seem so broad, and helps us study just the essentials. Besides, the history and landscape of artificial intelligence are interesting, so let's dive in!

Artificial Intelligence, Machine Learning, and Deep Learning

Deep learning is a subset of machine learning, and machine learning is a subset of artificial intelligence. Said another way: all deep learning algorithms are machine learning algorithms, but many machine learning algorithms do not use deep learning. As a Venn diagram, the deep learning circle sits inside the machine learning circle, which in turn sits inside the artificial intelligence circle.

Deep learning refers specifically to a class of algorithm called a neural network, and technically only to "deep" neural networks (more on that in a second). The first neural networks date back to the 1940s, but back then they weren't very useful. In fact, from the 1970s to the 2010s, traditional forms of AI consistently outperformed neural-network-based models.

These non-learning types of AI include rule-based algorithms (imagine an extremely complex series of if/else blocks); heuristic-based AIs such as A* search; constraint satisfaction algorithms like Arc Consistency; tree search algorithms such as minimax (used by the famous Deep Blue chess AI); and more.
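None of these approaches learn from data. To make "tree search" concrete, here is a minimal minimax sketch over a tiny hand-built game tree (the tree and its scores are made up). It only illustrates the idea; real chess engines such as Deep Blue add refinements like alpha-beta pruning and an evaluation function for positions that are not the end of the game.

```python
# A minimal minimax sketch over a hand-built game tree (hypothetical example).
# Leaves are integers: the score of a final position from the maximizing
# player's point of view. Inner nodes are lists of child positions.


def minimax(node, maximizing):
    """Return the best score reachable from `node` when both players play perfectly."""
    if isinstance(node, int):  # leaf: the game is over, just report the score
        return node
    # Score every possible move, assuming the other player moves next.
    scores = [minimax(child, not maximizing) for child in node]
    # The maximizer picks the highest score; the minimizer picks the lowest.
    return max(scores) if maximizing else min(scores)


# Two moves for the maximizer; each leads to two replies for the minimizer.
tree = [[3, 5], [2, 9]]
print(minimax(tree, maximizing=True))  # → 3
```

The maximizer avoids the branch containing 9 because the minimizer would answer with 2; no "learning" is involved, just exhaustive search.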

Two things prevented machine learning, and especially deep learning, from being successful: a lack of large datasets and a lack of computational power. In 2018 we have exabytes of data, and anyone with an AWS account and a credit card has access to a distributed supercomputer. Because of this new availability of data and computing power, machine learning, and especially deep learning, has taken the AI world by storm.

You should know that there are other categories of machine learning, such as unsupervised learning and reinforcement learning, but for the rest of this article I will be talking about a subset of machine learning called supervised learning.

Supervised learning algorithms work by forcing the machine to make predictions over and over. Specifically, we ask it to make predictions about data for which we (the humans) already know the correct answer. This is called "labeled data": the label is whatever we want the machine to predict.

Here's an example: let's say we wanted to build an algorithm to predict whether someone will default on their mortgage. We would need a bunch of examples of people who did and did not default on their mortgages. We take the relevant data about these people; feed it into the machine learning algorithm; ask it to make a prediction about each person; and after it guesses, we tell the machine what the right answer actually was. Based on how right or wrong it was, the machine learning algorithm changes how it makes predictions.
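The predict / correct / adjust loop can be sketched with one of the oldest learning rules, the perceptron update, used here as a stand-in for the neural networks the course actually covers. The two borrower features and all of the data below are invented for illustration; a real default model would use far more data and features.

```python
# A minimal predict / correct / adjust loop using the classic perceptron
# learning rule. The "borrowers" below are made-up toy data, not real figures.

# Each example: (debt-to-income ratio, missed payments last year), label
# label 1 = defaulted, 0 = did not default (hypothetical labeled data)
examples = [
    ((0.2, 0), 0),
    ((0.3, 1), 0),
    ((0.8, 3), 1),
    ((0.9, 4), 1),
    ((0.4, 0), 0),
    ((0.7, 5), 1),
]


def predict(features, weights, bias):
    """Guess 1 (default) if the weighted sum is positive, otherwise 0."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0


weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.1

for epoch in range(20):  # repeat the whole process many times
    for features, label in examples:
        guess = predict(features, weights, bias)  # 1. make a prediction
        error = label - guess                     # 2. receive the correction (-1, 0 or +1)
        # 3. adjust: nudge each weight toward the right answer (no change if correct)
        weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
        bias += learning_rate * error

print([predict(f, weights, bias) for f, _ in examples])  # → [0, 0, 1, 1, 0, 1]
```

Weights only change when a guess is wrong, which is the "based on how right or wrong it was" step in its simplest form.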

We repeat this process many, many times, and through the miracle of mathematics, our machine's predictions get better. The predictions improve relatively slowly, though, which is why we need so much data to train these algorithms.

Machine learning algorithms such as linear regression, support vector machines, and decision trees all "learn" in different ways, but fundamentally they all apply this same process: make a prediction, receive a correction, and adjust the prediction mechanism based on the correction. At a high level, it's quite similar to how a human learns.
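As one concrete instance, here is linear regression fit by gradient descent, assuming a tiny invented dataset that follows y = 2x + 1. Gradient descent is only one way to fit linear regression (a closed-form least-squares solution also exists), but it shows the same loop: predict, measure the error, nudge the parameters. It also takes many passes to get there, which echoes the point about predictions improving slowly.

```python
# Hypothetical toy data generated from the rule y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]  # labeled data: the "right answers"


def mean_squared_error(slope, intercept, xs, ys):
    """Average squared gap between predictions and labels."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


m, b = 0.0, 0.0  # start from a bad guess
learning_rate = 0.05
initial_loss = mean_squared_error(m, b, xs, ys)

for step in range(2000):
    # 1. Make a prediction for every example and 2. measure how wrong it was.
    errors = [(m * x + b) - y for x, y in zip(xs, ys)]
    # 3. Adjust: step each parameter against the gradient of the mean squared error.
    grad_m = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = 2 * sum(errors) / len(xs)
    m -= learning_rate * grad_m
    b -= learning_rate * grad_b

final_loss = mean_squared_error(m, b, xs, ys)
print(round(m, 3), round(b, 3))  # m ≈ 2.0, b ≈ 1.0
print(initial_loss, final_loss)  # the error shrinks from 33.0 to nearly zero
```

The learning rate of 0.05 is an arbitrary choice; much larger values can make this particular loop diverge instead of settling.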

Recall that deep learning is a subset of machine learning which focuses on a specific category of machine learning algorithms called neural networks. Neural networks were originally inspired by the way human brains work: individual "neurons" receive "signals" from other neurons and in turn send "signals" to other neurons. Each neuron transforms the incoming signals in some way, and eventually an output signal is produced. If everything went well, that signal represents a correct prediction!
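In code, a single artificial "neuron" is usually just a weighted sum of its incoming signals passed through an activation function. This sketch assumes a sigmoid activation; that is one common choice among several (ReLU is another), and the numbers are arbitrary.

```python
import math


def neuron(inputs, weights, bias):
    """One artificial 'neuron': weight each incoming signal, add them up,
    then squash the total with a sigmoid so the output lies between 0 and 1."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))


# Three incoming "signals" from other neurons (made-up numbers).
# Weighted sum: 1.0*0.4 + 0.5*(-0.2) + (-1.0)*0.1 = 0.2, and sigmoid(0.2) ≈ 0.55
print(neuron([1.0, 0.5, -1.0], weights=[0.4, -0.2, 0.1], bias=0.0))
```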

This is a helpful mental model, but computers are not biological brains. They do not have neurons, synapses, or any of the other biological mechanisms that make brains work. Because the biological model breaks down, researchers and scientists instead use graph theory to model neural networks: rather than describing neural networks as "artificial brains", they describe them as complex graphs with powerful properties.

Viewed through the lens of graph theory, a neural network is a series of layers of connected nodes; each node represents a "neuron" and each connection represents a "synapse".

Different kinds of nets have different kinds of connections. The simplest form of deep learning is a deep neural network. A deep neural network is a graph with a series of fully connected layers. Every node in a particular layer has an edge to every node in the next layer, and each of these edges has its own weight. The whole series of layers is the "brain". It turns out that if the weights on all these edges are set just right, these graphs can do some incredible "thinking".
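A minimal sketch of a small deep neural network's forward pass, assuming NumPy is available. The layer sizes, the random (untrained) weights, and the ReLU activation are arbitrary choices for illustration; training would adjust those weights until the outputs became useful.

```python
import numpy as np

# A tiny fully connected network: 3 inputs -> 4 hidden -> 4 hidden -> 1 output.
# Each weight matrix holds one weight per edge between two adjacent layers:
# row i, column j is the edge from node i in one layer to node j in the next.
rng = np.random.default_rng(seed=0)
layer_sizes = [3, 4, 4, 1]
weights = [rng.normal(size=(n_in, n_out)) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]


def relu(z):
    """Keep positive signals, zero out negative ones."""
    return np.maximum(z, 0.0)


def forward(x):
    """Push an input vector through every layer in turn."""
    for i, (W, b) in enumerate(zip(weights, biases)):
        x = x @ W + b  # every node feeds every node in the next layer
        if i < len(weights) - 1:  # no activation on the final output here
            x = relu(x)
    return x


output = forward(np.array([0.5, -1.0, 2.0]))
print(output.shape)  # (1,)
```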

Ultimately, the Deep Learning course will be about how to construct different versions of these graphs; tune the connection weights until the system works; and try to make sure our machine does what we think it's doing. The mechanics that make deep learning work, such as gradient descent and backpropagation, combine ideas from several mathematical disciplines. To really understand neural networks, we need some math background.

Background Knowledge: A Little Bit Of Everything

Given how easy libraries like PyTorch and TensorFlow are to use, it's really tempting to say, "You don't need the math that much." But after doing the required reading for the two classes, I'm glad I have some previous math experience. A subset of topics from linear algebra, calculus, probability, statistics, and graph theory has already come up.

Getting this knowledge at university would entail taking roughly five courses: Calculus 1, 2, and 3; linear algebra; and Computer Science 101. Luckily, you don't need any of those fields in its entirety. Based on what I've seen so far, this is what I would recommend studying if you want to get into neural networks yourself:

From linear algebra, you need to know the dot product, matrix multiplication (especially the rules for multiplying matrices of different sizes), and transposes. You don't have to be able to do these things quickly by hand, but you should be comfortable enough to do small examples on a whiteboard or paper. You should also feel comfortable working with "multidimensional spaces": deep learning uses a lot of high-dimensional vectors.
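A small by-hand version of the three operations, in plain Python with no libraries, might look like this. The helper names are made up; in practice a library like NumPy does this for you, but writing it once makes the size rules stick.

```python
def dot(u, v):
    """Dot product: multiply matching entries and add them up."""
    assert len(u) == len(v), "vectors must have the same length"
    return sum(a * b for a, b in zip(u, v))


def transpose(A):
    """Swap rows and columns: an m x n matrix becomes n x m."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """(m x n) times (n x p) gives (m x p); the inner sizes must match."""
    assert len(A[0]) == len(B), "columns of A must equal rows of B"
    B_cols = transpose(B)
    # Entry (i, j) is the dot product of row i of A with column j of B.
    return [[dot(row, col) for col in B_cols] for row in A]


A = [[1, 2, 3],
     [4, 5, 6]]   # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]    # 3 x 2

print(dot([1, 2, 3], [4, 5, 6]))  # → 32
print(matmul(A, B))               # → [[58, 64], [139, 154]]  (2 x 2)
print(transpose(A))               # → [[1, 4], [2, 5], [3, 6]]
```

Note that `matmul(B, A)` is also valid but gives a 3 x 3 result, while `matmul(A, A)` fails the size check: order matters.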

I love 3Blue1Brown's Essence of Linear Algebra series as a refresher or an introduction to linear algebra. Additionally, compute a few dot products and matrix multiplications by hand (with small vector/matrix sizes). Although we use graph theory to model neural networks, these graphs are represented in the computer by matrices and vectors for efficiency reasons. You should be comfortable both thinking about and programming with vectors and matrices.
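To see why graphs end up stored as matrices, here is a hypothetical slice of a network with two input nodes and three hidden nodes. Summing over the weighted edges one node at a time gives the same numbers as a single matrix multiplication, which libraries can compute far more efficiently. NumPy is assumed to be installed; the node names and weights are invented.

```python
import numpy as np

# Graph view: each (source, destination) edge has its own weight.
edges = {
    ("x0", "h0"): 0.5, ("x0", "h1"): -1.0, ("x0", "h2"): 2.0,
    ("x1", "h0"): 1.5, ("x1", "h1"): 0.0, ("x1", "h2"): -0.5,
}
inputs = {"x0": 2.0, "x1": 4.0}
sources = ["x0", "x1"]
hidden = ["h0", "h1", "h2"]

# Each hidden node sums (input value * edge weight) over its incoming edges.
graph_result = [
    sum(inputs[src] * w for (src, dst), w in edges.items() if dst == h)
    for h in hidden
]

# Matrix view: the same weights as a 2 x 3 matrix (row = source, column = destination).
W = np.array([[edges[(src, h)] for h in hidden] for src in sources])
x = np.array([inputs[src] for src in sources])
matrix_result = x @ W

print(graph_result)            # → [7.0, -2.0, 2.0]
print(matrix_result.tolist())  # same numbers, from one matrix multiplication
```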

From calculus you need to know the derivative, and ideally you should know it pretty well. Neural networks involve simple derivatives, the chain rule, partial derivatives, and the gradient. Neural nets use the derivative to solve optimization problems, so you should understand how the derivative (more precisely, the gradient) can be used to find the "direction of greatest increase". A good intuition is probably enough, but if you solve a couple of simple optimization problems using the derivative, you'll be happy you did. 3Blue1Brown also has an Essence of Calculus series, which is lovely as a more holistic review of calculus.
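A quick numerical check of the "direction of greatest increase" idea, using an arbitrary bowl-shaped function f(x, y) = x² + 3y² chosen for illustration. The hand-derived partial derivatives are compared against a finite-difference estimate, and a small step along the gradient is compared with a step against it.

```python
def f(x, y):
    """A simple bowl-shaped function of two variables (illustrative choice)."""
    return x ** 2 + 3 * y ** 2


def gradient(x, y):
    """Partial derivatives worked out by hand: df/dx = 2x, df/dy = 6y."""
    return 2 * x, 6 * y


def numerical_gradient(x, y, h=1e-6):
    """Central-difference estimate of the same partial derivatives, as a sanity check."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dfdx, dfdy


x, y = 1.0, 2.0
gx, gy = gradient(x, y)
print((gx, gy), numerical_gradient(x, y))  # both close to (2.0, 12.0)

# A small step along the gradient increases f (the "direction of greatest increase");
# a small step against it decreases f, which is the idea behind gradient descent.
step = 0.01
uphill = f(x + step * gx, y + step * gy)
downhill = f(x - step * gx, y - step * gy)
print(downhill, f(x, y), uphill)  # downhill < 13.0 < uphill
```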

Gradient descent and backpropagation both make heavy use of derivatives to fine-tune the networks during training. You don't have to know how to solve big, complex derivatives with compounding chain and product rules, but having a feel for partial derivatives of simple equations helps a lot.
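Here is the chain rule doing the work backpropagation relies on, reduced to a single neuron with one weight. The sigmoid activation, squared-error loss, and all numbers are assumptions for the sake of the example. The hand-derived partial derivative is checked against a finite-difference estimate, then used for one gradient descent step.

```python
import math

# One neuron with one weight, a sigmoid activation and a squared-error loss:
#   z = w * x + b,  a = sigmoid(z),  loss = (a - target) ** 2
# The chain rule gives: dloss/dw = dloss/da * da/dz * dz/dw


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def loss(w, b, x, target):
    return (sigmoid(w * x + b) - target) ** 2


def grad_w(w, b, x, target):
    z = w * x + b
    a = sigmoid(z)
    dloss_da = 2 * (a - target)  # derivative of (a - target)^2 with respect to a
    da_dz = a * (1 - a)          # derivative of the sigmoid with respect to z
    dz_dw = x                    # derivative of w * x + b with respect to w
    return dloss_da * da_dz * dz_dw


w, b, x, target = 0.5, 0.1, 2.0, 1.0
h = 1e-6
numeric = (loss(w + h, b, x, target) - loss(w - h, b, x, target)) / (2 * h)
analytic = grad_w(w, b, x, target)
print(analytic, numeric)  # the two estimates should agree closely

# One gradient descent step: move w against the gradient to reduce the loss.
new_w = w - 0.5 * analytic
print(loss(w, b, x, target), loss(new_w, b, x, target))  # the loss goes down
```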

From probability and statistics, you should know about common distributions, the idea of metrics, accuracy vs precision, and hypothesis testing. By far the most common applications of neural networks are making predictions or judgments of some kind. Is this a picture of a dog? Will it rain tomorrow? Should I show Tyler this advertisement, or that one? Statistics and probability will help us assess the accuracy and usefulness of these systems.
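In classification work, "accuracy" and "precision" are also the names of two specific metrics, which is one common reading of this item. A minimal sketch on ten invented dog / not-dog predictions shows how they differ; recall is included too because it is usually reported alongside precision.

```python
# Hypothetical "is this a picture of a dog?" results: 1 = dog, 0 = not a dog.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]

pairs = list(zip(actual, predicted))
true_pos = sum(1 for a, p in pairs if a == 1 and p == 1)
false_pos = sum(1 for a, p in pairs if a == 0 and p == 1)
false_neg = sum(1 for a, p in pairs if a == 1 and p == 0)
correct = sum(1 for a, p in pairs if a == p)

accuracy = correct / len(actual)               # share of all guesses that were right
precision = true_pos / (true_pos + false_pos)  # of the "dog" guesses, how many were dogs
recall = true_pos / (true_pos + false_neg)     # of the real dogs, how many we found

print(accuracy, precision, recall)  # → 0.8 0.75 0.75
```

On a dataset with very few dogs, a model that always answers "not a dog" could score high accuracy while finding no dogs at all, which is why looking at more than one metric matters.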

It's worth noting that the statistics appear more on the applied side, while the graph theory, calculus, and linear algebra all appear on the implementation side. I think it's best to understand both, but if you're only going to be using a library like TensorFlow and are not interested in implementing these algorithms yourself, it might be wise to focus on the statistics more than the calculus and linear algebra.

Finally, the graph theory. Honestly, if you can define the terms "vertex", "edge", and "edge weight", you've probably got enough graph theory under your belt. Even this "Gentle Introduction" has more information than you need.

In the next article in this series I'll be examining Deep Neural Networks and how they are constructed. See you then!

Part 2: Deep Neural Networks as Computational Graphs

Part 3: Classifying MNIST Digits With Different Neural Network Architectures