Transfer learning, as the name suggests, means transferring the parameters of an already trained model to a new model to help it learn a new dataset. As other answerers have noted, most data and tasks are related, so through transfer learning we can share the parameters a model has already learned with a new model, speeding up and improving its training rather than learning from scratch.
A landmark example is DeepMind's progressive neural networks (https://arxiv.org/pdf/1606.04671v3.pdf). The paper takes three games (Pong, Labyrinth, Atari) and feeds the parameters learned on one of them into a new game through lateral connections. If you can get past the firewall, there is a talk on YouTube: https://www.youtube.com/watch?v=aWAP_CWEtSI. A Zhihu column also explains it clearly: 最前沿：从虚拟到现实，迁移深度增强学习让机器人革命成为可能！ - 智能单元 - 知乎专栏

As for where the field is headed, we can think about it as follows.
Deep learning may not be the only path to meta learning, but it is very likely a viable one. After discussing this question with friends in the field, I would break meta learning down into the following general directions:
1. Dynamic learning. The landmark work is AlphaGo; this is also known as reinforcement learning. Without brute force, it builds a global understanding of very complex and vaguely defined tasks and takes actions to maximize the reward it receives. I am not especially familiar with this area, so I will not go into detail and risk misleading anyone. Related paper: Human-level control through deep reinforcement learning, Nature, http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html
2. Transfer learning / progressive (continual) learning. As mentioned above, today's deep learning models all learn from scratch, unlike humans, who can quickly pick up similar games and never forget distinctive skills such as riding a bicycle. Some degree of parameter sharing is therefore essential: it not only speeds up training but also saves memory and avoids re-learning problems similar to those already solved.
3. One/zero-shot learning. In vision, essentially all recognition and classification tasks currently require large datasets, but that is not how humans come to know a new thing. For example, once a person has seen a picture of a dinosaur, no matter how strange later dinosaurs look, with different fur, colors, and features, they can still recognize them as dinosaurs fairly easily. Likewise, given a verbal description such as "a white, fluffy rabbit", we can roughly picture it in our heads from features we have already learned. So there is still plenty of room for improvement in recognition and classification tasks. Related papers: http://vision.stanford.edu/documents/Fei-FeiFergusPerona2006.pdf, https://arxiv.org/pdf/1605.06065v1.pdf
4. Generative learning, or colloquially, drawing inferences from a few examples. Existing work includes the variational autoencoder (VAE) and the generative adversarial network (GAN). By combining probabilistic graphical models and Bayesian statistics with deep learning, they treat the data as samples from a probability distribution whose parameters deep learning learns; sampling from that distribution then yields new data that resembles, but does not duplicate, the training set. DeepMind again: its recently released WaveNet learns human speech with a generative model; a demo is at https://deepmind.com/blog/wavenet-generative-model-raw-audio/
5. Hierarchical learning. (?) This is pure speculation on my part; no papers exist yet. The rough idea is for a model to work its way up, like a human, from 1+1=2 to calculus, and thereby achieve strong AI.
The first four directions all have papers and active research behind them, but none is fully developed, and they depend on one another. I also believe NLP and vision will converge, because areas like generative learning and zero-shot learning need each other's results to keep advancing.
The road ahead is long and winding, but I will keep searching high and low. Let's all work toward meta learning.

Transfer Learning - Machine Learning's Next Frontier

What is Transfer Learning?
Why Transfer Learning Now?
A Definition of Transfer Learning
Transfer Learning Scenarios
Applications of Transfer Learning
    Learning from simulations
    Adapting to new domains
    Transferring knowledge across languages
Transfer Learning Methods
    Using pre-trained CNN features
    Learning domain-invariant representations
    Making representations more similar
    Confusing domains
Related Research Areas
    Semi-supervised learning
    Using available data more effectively
    Improving models' ability to generalize
    Making models more robust
    Multi-task learning
    Continuous learning
    Zero-shot learning
Conclusion
In recent years, we have become increasingly good at training deep neural networks to learn a very accurate mapping from inputs to outputs, whether they are images, sentences, label predictions, etc. from large amounts of labeled data.
What our models still frightfully lack is the ability to generalize to conditions that are different from the ones encountered during training. When is this necessary? Every time you apply your model not to a carefully constructed dataset but to the real world. The real world is messy and contains an infinite number of novel scenarios, many of which your model has not encountered during training and for which it is in turn ill-prepared to make predictions. The ability to transfer knowledge to new conditions is generally known as transfer learning and is what we will discuss in the rest of this post.
Over the course of this blog post, I will first contrast transfer learning with machine learning's most pervasive and successful paradigm, supervised learning. I will then outline reasons why transfer learning warrants our attention. Subsequently, I will give a more technical definition and detail different transfer learning scenarios. I will then provide examples of applications of transfer learning before delving into practical methods that can be used to transfer knowledge. Finally, I will give an overview of related directions and provide an outlook into the future.
What is Transfer Learning?
In the classic supervised learning scenario of machine learning, if we intend to train a model for some task and domain A, we assume that we are provided with labeled data for the same task and domain. We can see this clearly in Figure 1, where the task and domain of the training and test data of our model A are the same. (We will later define in more detail what exactly a task and a domain are.) For the moment, let us assume that a task is the objective our model aims to perform, e.g. recognize objects in images, and a domain is where our data is coming from, e.g. images taken in San Francisco coffee shops.
Figure 1: The traditional supervised learning setup in ML
We can now train a model A on this dataset and expect it to perform well on unseen data of the same task and domain. On another occasion, when given data for some other task or domain B, we again require labeled data of the same task or domain to train a new model B that we can expect to perform well on this data.
The traditional supervised learning paradigm breaks down when we do not have sufficient labeled data for the task or domain we care about to train a reliable model. If we want to train a model to detect pedestrians on night-time images, we could apply a model that has been trained on a similar domain, e.g. on day-time images. In practice, however, we often experience a deterioration or collapse in performance as the model has inherited the bias of its training data and does not know how to generalize to the new domain. If we want to train a model to perform a new task, such as detecting bicyclists, we cannot even reuse an existing model, as the labels between the tasks differ.
Transfer learning allows us to deal with these scenarios by leveraging the already existing labeled data of some related task or domain. We try to store this knowledge gained in solving the source task in the source domain and apply it to our problem of interest as can be seen in Figure 2.
Figure 2: The transfer learning setup
In practice, we seek to transfer as much knowledge as we can from the source setting to our target task or domain. This knowledge can take on various forms depending on the data: it can pertain to how objects are composed to allow us to more easily identify novel objects; it can be with regard to the general words people use to express their opinions, etc.
Why Transfer Learning Now?
Andrew Ng, chief scientist at Baidu and professor at Stanford, said during his widely popular NIPS 2016 tutorial that transfer learning will be -- after supervised learning -- the next driver of ML commercial success.
Figure 3: Andrew Ng on transfer learning at NIPS 2016
In particular, he sketched out a chart on a whiteboard that I've sought to replicate as faithfully as possible in Figure 4 below (sorry about the unlabelled axes). According to Andrew Ng, transfer learning will become a key driver of Machine Learning success in industry.
Figure 4: Drivers of ML industrial success according to Andrew Ng
It is indisputable that ML use and success in industry has so far been mostly driven by supervised learning. Fuelled by advances in Deep Learning, more capable computing utilities, and large labeled datasets, supervised learning has been largely responsible for the wave of renewed interest in AI, funding rounds and acquisitions, and in particular the applications of machine learning that we have seen in recent years and that have become part of our daily lives. If we disregard naysayers and heralds of another AI winter and instead trust the prescience of Andrew Ng, this success will likely continue.
It is less clear, however, why transfer learning, which has been around for decades and is currently little utilized in industry, will see the explosive growth predicted by Ng. Even more so as transfer learning currently receives relatively little visibility compared to other areas of machine learning such as unsupervised learning and reinforcement learning, which have come to enjoy increasing popularity: Unsupervised learning -- the key ingredient on the quest to General AI according to Yann LeCun, as can be seen in Figure 5 -- has seen a resurgence of interest, driven in particular by Generative Adversarial Networks. Reinforcement learning, in turn, spearheaded by Google DeepMind, has led to advances in game-playing AI exemplified by the success of AlphaGo, and has already seen success in the real world, e.g. by reducing Google's data center cooling bill by 40%. Both of these areas, while promising, will likely only have a comparatively small commercial impact in the foreseeable future and will mostly remain within the confines of cutting-edge research papers, as they still face many challenges.
Figure 5: Transfer Learning is conspicuously absent as ingredient from Yann LeCun's cake
What makes transfer learning different? In the following, we will look at the factors that -- in our opinion -- motivate Ng's prognosis and outline the reasons why just now is the time to pay attention to transfer learning.
The current use of machine learning in industry is characterised by a dichotomy: On the one hand, over the course of the last years, we have obtained the ability to train more and more accurate models. We are now at the stage where, for many tasks, state-of-the-art models have reached a level where their performance is so good that it is no longer a hindrance for users. How good? The newest residual networks [1] on ImageNet achieve superhuman performance at recognising objects; Google's Smart Reply [2] automatically handles 10% of all mobile responses; speech recognition error rates have consistently dropped, and recognition is now more accurate than typing [3]; we can automatically identify skin cancer as well as dermatologists; Google's NMT system [4] is used in production for more than 10 language pairs; Baidu can generate realistic-sounding speech in real time; the list goes on and on. This level of maturity has allowed the large-scale deployment of these models to millions of users and has enabled widespread adoption.
On the other hand, these successful models are immensely data-hungry and rely on huge amounts of labeled data to achieve their performance. For some tasks and domains, this data is available as it has been painstakingly gathered over many years. In a few cases, it is public, e.g. ImageNet [5], but large amounts of labeled data are usually proprietary or expensive to obtain, as in the case of many speech or MT datasets, as they provide an edge over the competition.
At the same time, when applying a machine learning model in the wild, it is faced with a myriad of conditions which the model has never seen before and does not know how to deal with; each client and every user has their own preferences, possesses or generates data that is different than the data used for training; a model is asked to perform many tasks that are related to but not the same as the task it was trained for. In all of these situations, our current state-of-the-art models, despite exhibiting human-level or even super-human performance on the task and domain they were trained on, suffer a significant loss in performance or even break down completely.
Transfer learning can help us deal with these novel scenarios and is necessary for production-scale use of machine learning that goes beyond tasks and domains where labeled data is plentiful. So far, we have applied our models to the tasks and domains that -- while impactful -- are the low-hanging fruit in terms of data availability. To also serve the long tail of the distribution, we must learn to transfer the knowledge we have acquired to new tasks and domains.
To be able to do this, we need to understand the concepts that transfer learning involves. For this reason, we will give a more technical definition in the following section.
A Definition of Transfer Learning
For this definition, we will closely follow the excellent survey by Pan and Yang (2010) [6] with binary document classification as a running example. Transfer learning involves the concepts of a domain and a task. A domain D consists of a feature space 𝒳 and a marginal probability distribution P(X) over the feature space, where X = {x_1, ⋯, x_n} ∈ 𝒳. For document classification with a bag-of-words representation, 𝒳 is the space of all document representations, x_i is the i-th term vector corresponding to some document, and X is the sample of documents used for training.
Given a domain D = {𝒳, P(X)}, a task T consists of a label space 𝒴 and a conditional probability distribution P(Y|X) that is typically learned from the training data consisting of pairs x_i ∈ 𝒳 and y_i ∈ 𝒴. In our document classification example, 𝒴 is the set of all labels, i.e. {True, False}, and y_i is either True or False.
Given a source domain D_S, a corresponding source task T_S, as well as a target domain D_T and a target task T_T, the objective of transfer learning now is to enable us to learn the target conditional probability distribution P(Y_T|X_T) in D_T with the information gained from D_S and T_S, where D_S ≠ D_T or T_S ≠ T_T. In most cases, a limited number of labeled target examples, which is exponentially smaller than the number of labeled source examples, is assumed to be available.
As both the domain D and the task T are defined as tuples, these inequalities give rise to four transfer learning scenarios, which we will discuss below.
Transfer Learning Scenarios
Given source and target domains D_S and D_T, where D = {𝒳, P(X)}, and source and target tasks T_S and T_T, where T = {𝒴, P(Y|X)}, the source and target conditions can vary in four ways, which we will illustrate in the following, again using our document classification example:
1. 𝒳_S ≠ 𝒳_T. The feature spaces of the source and target domain are different, e.g. the documents are written in two different languages. In the context of natural language processing, this is generally referred to as cross-lingual adaptation.
2. P(X_S) ≠ P(X_T). The marginal probability distributions of source and target domain are different, e.g. the documents discuss different topics. This scenario is generally known as domain adaptation.
3. 𝒴_S ≠ 𝒴_T. The label spaces between the two tasks are different, e.g. documents need to be assigned different labels in the target task. In practice, this scenario usually occurs together with scenario 4, as it is extremely rare for two different tasks to have different label spaces but exactly the same conditional probability distributions.
4. P(Y_S|X_S) ≠ P(Y_T|X_T). The conditional probability distributions of the source and target tasks are different, e.g. source and target documents are unbalanced with regard to their classes. This scenario is quite common in practice, and approaches such as over-sampling, under-sampling, or SMOTE [7] are widely used.
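The over-sampling mentioned for the last scenario is simple to implement. As an illustrative sketch in plain NumPy (a dedicated library such as imbalanced-learn offers more refined variants), random over-sampling duplicates minority-class examples until the class distribution P(Y) is balanced:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy imbalanced dataset: 90 negative and 10 positive documents,
# each represented by a 5-dimensional feature vector.
X = rng.normal(size=(100, 5))
y = np.array([0] * 90 + [1] * 10)

# Random over-sampling: draw minority-class indices with replacement
# until both classes are equally represented.
minority = np.flatnonzero(y == 1)
extra = rng.choice(minority, size=90 - 10, replace=True)
X_balanced = np.vstack([X, X[extra]])
y_balanced = np.concatenate([y, y[extra]])
```

SMOTE [7] goes one step further and interpolates between minority-class neighbours instead of duplicating examples verbatim.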
Now that we are aware of the concepts relevant to transfer learning and the scenarios in which it is applied, we will look at different applications of transfer learning that illustrate some of its potential.
Applications of Transfer Learning
Learning from simulations
One particular application of transfer learning that I'm very excited about and that I assume we'll see more of in the future is learning from simulations. For many machine learning applications that rely on hardware for interaction, gathering data and training a model in the real world is either expensive, time-consuming, or simply too dangerous. It is thus advisable to gather data in some other, less risky way.
Simulation is the preferred tool for this and is used towards enabling many advanced ML systems in the real world. Learning from a simulation and applying the acquired knowledge to the real world is an instance of transfer learning scenario 2, as the feature spaces between source and target domain are the same (both generally rely on pixels), but the marginal probability distributions between simulation and reality are different, i.e. objects in the simulation and in the real world look different, although this difference diminishes as simulations get more realistic. At the same time, the conditional probability distributions between simulation and real world might be different, as the simulation is not able to fully replicate all reactions in the real world, e.g. a physics engine cannot completely mimic the complex interactions of real-world objects.
Figure 6: A Google self-driving car
Learning from simulations has the benefit of making data gathering easy, as objects can be easily bounded and analyzed, while simultaneously enabling fast training, as learning can be parallelized across multiple instances. Consequently, it is a prerequisite for large-scale machine learning projects that need to interact with the real world, such as self-driving cars (Figure 6). According to Zhaoyin Jia, Google's self-driving car tech lead, "Simulation is essential if you really want to do a self-driving car". Udacity has open-sourced the simulator it uses for teaching its self-driving car engineer nanodegree, which can be seen in Figure 7, and OpenAI's Universe will potentially allow training a self-driving car using GTA 5 or other video games.
Figure 7: Udacity's self-driving car simulator (source: TechCrunch)
Another area where learning from simulations is key is robotics: training models on a real robot is too slow, and robots are expensive to train. Learning from a simulation and transferring the knowledge to a real-world robot alleviates these problems and has recently been garnering additional interest [8]. An example of a manipulation task in the real world and in a simulation can be seen in Figure 8.
Figure 8: Robot and simulation images (Rusu et al., 2016)
Finally, another direction where simulation will be an integral part is the path towards general AI. Training an agent to achieve general artificial intelligence directly in the real world is too costly, and unnecessary complexity initially hinders learning. Rather, learning may be more successful if it is based on a simulated environment such as CommAI-env [9], which can be seen in Figure 9.
Figure 9: Facebook AI Research's CommAI-env (Mikolov et al., 2015)
While learning from simulations is a particular instance of domain adaptation, it is worth outlining some other examples of domain adaptation.
Adapting to new domains
Domain adaptation is a common requirement in vision, as the data where labeled information is easily accessible and the data that we actually care about are often different, whether this pertains to identifying bikes as in Figure 10 or other objects in the wild. Even if the training and the test data look the same, the training data may still contain a bias that is imperceptible to humans but which the model will exploit to overfit on the training data [10].
Figure 10: Different visual domains (Sun et al., 2016)
Another common domain adaptation scenario pertains to adapting to different text types: Standard NLP tools such as part-of-speech taggers or parsers are typically trained on news data such as the Wall Street Journal, which has historically been used to evaluate these models. Models trained on news data, however, have difficulty coping with more novel text forms such as social media messages and the challenges they present.
Figure 11: Different text types / genres
Even within one domain such as product reviews, people employ different words and phrases to express the same opinion. A model trained on one type of review should thus be able to disentangle the general and domain-specific opinion words that people use in order not to be confused by the shift in domain.
Figure 12: Different topics
Finally, while the above challenges deal with general text or image types, problems are amplified if we look at domains that pertain to individual users or groups of users: consider the case of automatic speech recognition (ASR). Speech is poised to become the next big platform, with 50% of all our searches predicted to be performed by voice by 2020. Most ASR systems are traditionally evaluated on the Switchboard dataset, which comprises 500 speakers. Speakers with a standard accent are thus fortunate, while immigrants, people with non-standard accents, people with a speech impediment, and children have trouble being understood. Now more than ever do we need systems that are able to adapt to individual users and minorities to ensure that everyone's voice is heard.
Figure 13: Different accents
Transferring knowledge across languages
Finally, learning from one language and applying our knowledge to another language is -- in my opinion -- another killer application of transfer learning, which I have written about before here in the context of cross-lingual embedding models. Reliable cross-lingual adaptation methods would allow us to leverage the vast amounts of labeled data we have in English and apply them to any language, particularly underserved and truly low-resource languages. Given the current state-of-the-art, this still seems utopian, but recent advances such as zero-shot translation [11] promise rapid progress in this area.
While we have so far considered particular applications of transfer learning, we will now look at practical methods and directions in the literature that are used to solve some of the presented challenges.
Transfer Learning Methods
Transfer learning has a long history of research and techniques exist to tackle each of the four transfer learning scenarios described above. The advent of Deep Learning has led to a range of new transfer learning approaches, some of which we will review in the following. For a survey of earlier methods, refer to [6].
Using pre-trained CNN features
To motivate the most common way in which transfer learning is currently applied, we must understand what accounts for the outstanding success of large convolutional neural networks on ImageNet [12].
Understanding convolutional neural networks
While many details of how these models work still remain a mystery, we are by now aware that lower convolutional layers capture low-level image features, e.g. edges (see Figure 14), while higher convolutional layers capture more and more complex details, such as body parts, faces, and other compositional features.
Figure 14: Example filters learned by AlexNet (Krizhevsky et al., 2012).
The final fully-connected layers are generally assumed to capture information that is relevant for solving the respective task, e.g. AlexNet's fully-connected layers would indicate which features are relevant to classify an image into one of 1000 object categories.
However, while knowing that a cat has whiskers, paws, fur, etc. is necessary for identifying an animal as a cat (for an example, see Figure 15), it does not help us with identifying new objects or with solving other common vision tasks such as scene recognition, fine-grained recognition, attribute detection, and image retrieval.
Figure 15: This post's token cat
What can help us, however, are representations that capture general information of how an image is composed and what combinations of edges and shapes it contains. This information is contained in one of the final convolutional layers or early fully-connected layers in large convolutional neural networks trained on ImageNet as we have described above.
For a new task, we can thus simply use the off-the-shelf features of a state-of-the-art CNN pre-trained on ImageNet and train a new model on these extracted features. In practice, we either keep the pre-trained parameters fixed or tune them with a small learning rate in order to ensure that we do not unlearn the previously acquired knowledge. This simple approach has been shown to achieve astounding results on an array of vision tasks [13] as well as tasks that rely on visual input such as image captioning. A model trained on ImageNet seems to capture details about the way animals and objects are structured and composed that is generally relevant when dealing with images. As such, the ImageNet task seems to be a good proxy for general computer vision problems, as the same knowledge that is required to excel in it is also relevant for many other tasks.
Learning the underlying structure of images
A similar assumption is used to motivate generative models: When training generative models, we assume that the ability to generate realistic images requires an understanding of the underlying structure of images, which in turn can be applied to many other tasks. This assumption itself relies on the premise that all images lie on a low-dimensional manifold, i.e. that there is some underlying structure to images that can be extracted by a model. Recent advances in generating photorealistic images with Generative Adversarial Networks [14] indicate that such a structure might indeed exist, as evidenced by the model's ability to show realistic transitions between points in the bedroom image space in Figure 16.
Figure 16: Walking along the bedroom image manifold
Are pre-trained features useful beyond vision?
Off-the-shelf CNN features have seen unparalleled results in vision, but the question remains if this success can be replicated in other disciplines using other types of data, such as languages. Currently, there are no off-the-shelf features that achieve results for natural language processing that are as astounding as their vision equivalent. Why is that? Do such features exist at all or -- if not -- why is vision more conducive to this form of transfer than language?
The output of lower-level tasks such as part-of-speech tagging or chunking can be likened to off-the-shelf features, but these do not capture more fine-grained rules of language use beyond syntax and are not helpful for all tasks. As we have seen, the existence of generalizable off-the-shelf features seems to be intertwined with the existence of a task that can be seen as a prototype for many tasks in the field. In vision, object recognition occupies such a role. In language, the closest analogue might be language modelling: in order to predict the next word or sentence given a sequence of words, a model needs to possess knowledge of how language is structured, needs to understand which words are likely to be related to and follow each other, needs to model long-term dependencies, etc.
While state-of-the-art language models increasingly approach human levels [15], their features are only of limited use. At the same time, advances in language modelling have led to positive results for other tasks: pre-training a model with a language model objective improves performance [16]. In addition, word embeddings pre-trained on a large unlabelled corpus with an approximated language modelling objective have become pervasive [17]. While they are not as effective as off-the-shelf features in vision, they still provide sizeable gains [18] and can be seen as a simple form of transfer of general domain knowledge derived from a large unlabelled corpus.
While a general proxy task seems currently out of reach in natural language processing, auxiliary tasks can take the form of local proxies. Whether through multi-task objectives [19] or synthetic task objectives [20, 21], they can be used to inject additional relevant knowledge into the model.
Using pre-trained features is currently the most straightforward and most commonly used way to perform transfer learning. However, it is by far not the only one.
Learning domain-invariant representations
Pre-trained features are in practice mostly used for adaptation scenario 3 where we want to adapt to a new task. For the other scenarios, another way to transfer knowledge enabled by Deep Learning is to learn representations that do not change based on our domain. This approach is conceptually very similar to the way we have been thinking about using pre-trained CNN features: Both encode general knowledge about our domain. However, creating representations that do not change based on the domain is a lot less expensive and more feasible for non-vision tasks than generating representations that are useful for all tasks. ImageNet has taken years and thousands of hours to create, while we typically only need unlabelled data of each domain for creating domain-invariant representations. These representations are generally learned using stacked denoising autoencoders and have seen success in natural language processing [22, 23] as well as in vision [24].
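As a sketch of the idea (a single-layer denoising autoencoder in PyTorch; the dimensions and noise level are illustrative), the encoder is trained to reconstruct clean inputs from corrupted ones on unlabelled data from both domains, and its hidden activations then serve as the domain-invariant features:

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, in_dim=784, hidden_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Linear(hidden_dim, in_dim)

    def forward(self, x, noise_std=0.3):
        # Corrupt the input, then try to reconstruct the clean version;
        # the hidden code must capture robust structure to succeed.
        corrupted = x + noise_std * torch.randn_like(x)
        hidden = self.encoder(corrupted)
        return self.decoder(hidden), hidden

model = DenoisingAutoencoder()
x = torch.rand(16, 784)              # unlabelled batch from either domain
reconstruction, features = model(x)  # `features` feeds the downstream task
loss = nn.functional.mse_loss(reconstruction, x)
```

Because the reconstruction objective needs no labels, the same autoencoder can be trained on pooled source- and target-domain data before a task-specific classifier is fit on the hidden representations.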
Making representations more similar
In order to improve the transferability of the learned representations from the source to the target domain, we would like the representations of the two domains to be as similar as possible, so that the model does not take into account domain-specific characteristics that may hinder transfer, but rather the commonalities between the domains.
Rather than just letting our autoencoder learn some representation, we can thus actively encourage the representations of both domains to be more similar to each other. We can apply this as a pre-processing step directly to the representations of our data [25, 26] and can then use the new representations for training. We can also encourage the representations of the domains in our model to be more similar to each other [27, 28].
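A minimal version of this idea (purely illustrative; published moment-matching methods such as MMD or correlation alignment are more elaborate) penalises the distance between the mean representations of the two domains and adds that term to the training objective:

```python
import torch

def domain_mean_penalty(h_source, h_target):
    # Squared Euclidean distance between the mean source and target
    # representations; adding this to the task loss nudges the encoder
    # towards producing similar representations for both domains.
    return ((h_source.mean(dim=0) - h_target.mean(dim=0)) ** 2).sum()

h_s = torch.randn(32, 64)          # source-batch representations
h_t = torch.randn(32, 64) + 1.0    # target batch, shifted by a domain offset
penalty = domain_mean_penalty(h_s, h_t)
# total_loss = task_loss + lambda_sim * penalty   (lambda_sim is a hyperparameter)
```

Matching only the means is the crudest choice; higher-order statistics or kernel embeddings capture the domain discrepancy more faithfully.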
Confusing domains
Another way to ensure similarity between the representations of both domains that has recently become more popular is to add another objective to an existing model that encourages it to confuse the two domains [29, 30]. This domain confusion loss is a regular classification loss where the model tries to predict the domain of the input example. The difference from a regular loss, however, is that the gradients that flow from the loss to the rest of the network are reversed, as can be seen in Figure 17.
Figure 17: Confusing domains with a gradient reversal layer (Ganin and Lempitsky, 2015)
Instead of learning to minimize the error of the domain classification loss, the gradient reversal layer causes the model to maximize the error. In practice, this means that the model learns representations that allow it to minimize its original objective, while not allowing it to differentiate between the two domains, which is beneficial for knowledge transfer. While a model trained only with the regular objective is shown in Figure 18 to be clearly able to separate domains based on its learned representation, a model whose objective has been augmented with the domain confusion term is unable to do so.
Figure 18: Domain classifier score of a regular and a domain confusion model (Tzeng et al, 2015)
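The gradient reversal layer itself is tiny: it is the identity in the forward pass and flips (and optionally scales) the gradient in the backward pass. A sketch in PyTorch (`lambd` plays the role of the scaling hyperparameter used in these papers):

```python
import torch
from torch.autograd import Function

class GradientReversal(Function):
    # Identity on the forward pass; reverses (and scales) gradients on
    # the backward pass, so the feature extractor below this layer is
    # trained to *maximize* the domain classification loss above it.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradientReversal.apply(x, lambd)

# Gradients flowing back through this layer come out negated:
x = torch.ones(3, requires_grad=True)
grad_reverse(x, 1.0).sum().backward()
```

In a domain-adversarial model, the shared features would pass through `grad_reverse` before the domain classifier, while feeding the task classifier directly.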
Related Research Areas
While this post is about transfer learning, transfer learning is by far not the only area of machine learning that seeks to leverage limited amounts of data, use learned knowledge for new endeavours, and enable models to generalize better to new settings. In the following, we will thus introduce other directions that are related or complementary to the goals of transfer learning.
Semi-supervised learning
Transfer learning seeks to leverage unlabelled data in the target task or domain to the greatest effect. This is also the maxim of semi-supervised learning, which follows the classical machine learning setup but assumes only a limited amount of labeled samples for training. In this sense, semi-supervised domain adaptation is essentially semi-supervised learning under domain shift. Many lessons and insights from semi-supervised learning are thus equally applicable and relevant for transfer learning. Refer to [31] for a great survey on semi-supervised learning.
Using available data more effectively
Another direction that is related to transfer learning and semi-supervised learning is to enable models to work better with limited amounts of data.
This can be done in several ways: One can leverage unsupervised or semi-supervised learning to extract information from unlabelled data thereby reducing the reliance on labelled samples; one can give the model access to other features inherent in the data while reducing its tendency to overfit via regularization; finally, one can leverage data that so far remains neglected or rests in non-obvious places.
Such fortuitous data [32] may be created as a side effect of user-generated content, such as hyperlinks that can be used to improve named entity and part-of-speech taggers; it may come as a by-product of annotation, e.g. annotator disagreement that may improve tagging or parsing; or it may be derived from user behaviour such as eye tracking or keystroke dynamics, which can inform NLP tasks. While such data has only been exploited in limited ways, such examples encourage us to look for data in unexpected places and to investigate new ways of retrieving data.
Improving models' ability to generalize
Related to this is also the direction of making models generalize better. In order to achieve this, we must first better understand the behaviour and intricacies of large neural networks and investigate why and how they generalize. Recent work has taken promising steps towards this end [33], but many questions are still left unanswered.
Making models more robust
While improving our models' generalization ability goes a long way, we might generalize well to similar instances but still fail catastrophically on unexpected or atypical inputs. Therefore, a key complementary objective is to make our models more robust. This direction has seen increasing interest recently, fuelled by advances in adversarial learning, and recent approaches have investigated many ways of making models more robust to worst-case or adversarial examples in different settings [34, 35].
For a more thorough overview of multi-task learning, particularly as applied to deep neural networks, have a look at my other blog post here.
Continuous learning
While multi-task learning allows us to retain the knowledge across many tasks without suffering a performance penalty on our source tasks, this is only possible if all tasks are present at training time. For each new task, we would generally need to retrain our model on all tasks again.
In the real world, however, we would like an agent to be able to deal with tasks that gradually become more complex by leveraging its past experience. To this end, we need to enable a model to learn continuously without forgetting. This area of machine learning is known as learning to learn [36], meta-learning, life-long learning, or continuous learning. It has seen some recent developments in the context of RL [37, 38, 39], most notably by Google DeepMind on its quest towards general learning agents, and is also being applied to sequence-to-sequence models [40].
Zero-shot learning
Finally, if we take transfer learning to the extreme and aim to learn from only a few, one or even zero instances of a class, we arrive at few-shot, one-shot, and zero-shot learning respectively. Enabling models to perform one-shot and zero-shot learning is admittedly among the hardest problems in machine learning. At the same time, it is something that comes naturally to us humans: Toddlers only need to be told once what a dog is in order to be able to identify any other dog, while adults can understand the essence of an object just by reading about it in context, without ever having encountered it before.
Recent advances in one-shot learning have leveraged the insight that models need to be trained explicitly to perform one-shot learning in order to achieve good performance at test time [41, 42], while the more realistic generalized zero-shot learning setting, in which training classes are also present at test time, has garnered attention lately [43].
Conclusion
In summary, transfer learning offers many exciting research directions and, in particular, many applications that need models able to transfer knowledge to new tasks and adapt to new domains. I hope this blog post has provided you with an overview of transfer learning and piqued your interest.
Some of the statements in this blog post are deliberately phrased in a slightly controversial way. Let me know your thoughts on any contentious issues, as well as any errors that I undoubtedly made in writing this post, in the comments below.
Note: Title image is credited to [44].

weixin_37773766 2018-06-21 15:56:37
• ## A Detailed Read of Completely Heterogeneous Transfer Learning with Attention - What And What Not To Transfer



This paper is titled Completely Heterogeneous Transfer Learning with Attention - What And What Not To Transfer; its authors are from Carnegie Mellon University, and it was published at IJCAI 2017. It improves on an earlier paper of theirs, Proactive Transfer Learning for Heterogeneous Feature and Label Spaces, which interested readers can look up. That earlier paper proposed a model called CHTL (Completely Heterogeneous Transfer Learning), which we will go through in detail below; the paper covered today adds an attention mechanism to that model. As you may have noticed, attention is used a lot these days, and it genuinely does tend to improve results.
First, what does Completely Heterogeneous Transfer Learning (CHTL) mean? Most transfer learning you have likely encountered looks like this: I learn to distinguish cats from dogs, then transfer that knowledge to the task of recognizing tigers. Even though tigers never appeared during training, their feature distribution is still similar to that of cats and dogs. In other words, the source and target label spaces do not overlap, but their feature spaces do overlap, or at least have some explicit connection even if they do not. In completely heterogeneous transfer, neither the label spaces nor the feature spaces of source and target overlap. For example, knowledge I learned on English text is transferred not to other unseen English text, but to related tasks on French text.
Now that the task is clear, let's look at the model. The task in the paper is natural-language text classification; here is the overall framework of the model:

In the bottom left, the source dataset contains English texts about politics or drugs; in the bottom right, the target dataset contains texts about government or sports, so their label spaces do not overlap. Note also that the target texts are not English but French, Italian, and so on, so their feature spaces do not overlap either. The source dataset is projected through a mapping g, and the target dataset through a mapping h, into a common joint subspace; concretely, this is done with word2vec. Word2vec is a standard step in this field for processing text data, and of its common variants this paper uses skip-gram. Neural networks cannot process raw text well directly, so the better choice is to convert words into vectors, subject to the rule that two words close in meaning should map to vectors close in distance. Here word2vec is not just convenient preprocessing: more importantly, it reduces the feature gap between the two datasets (one being English, the other not). A shared mapping f then projects both into the label space, completing the text classification.
If this looks simple so far, that's because it is, but an optimization part follows. Before introducing it, let's formalize the simple model. Its objective function is:

The loss function has two parts: a loss on the source dataset and a loss on the target dataset. Looking closer, you may object: if the target dataset is used in training, where is the transfer? The answer is that the source dataset is fully labelled, while the target dataset has a very low labelling rate. In the later experiments, the target labelling rate is 0.1, and it is this small labelled fraction that is used during training. If 0.1 sounds low, one-shot and zero-shot problems go even lower, though such extreme settings generally need prior knowledge as an aid. It is precisely because labelled data costs so much human effort to collect that transfer learning arose as a field. (Having done the drudge work of labelling image data myself, I can confirm it is tedious and mind-numbing, and mislabelling is the most crushing part, so be grateful when you use someone else's dataset.) Back to the formula: f, g, and h were introduced above, and the W's are the corresponding weight matrices. $X_s$ is the source data, $X_t$ the target data, $Y_s$ the source labels, $Y_t$ the target labels, and the superscript $i$ indexes the $i$-th instance. $\tilde{y}$ denotes any label in the dataset other than $Y^{(i)}$, i.e. every label except that of the instance currently being processed, as the second summation makes clear. A hinge rank loss is used here, and it is worth explaining why; expanding the formula into the following form may make it easier to understand:

The formula above is the original CHTL objective, taken directly from the authors' earlier paper; the symbols correspond to those just described. To see why a hinge rank loss is used, look at the following formula:

$X_{s}^{i}W_{s}W_{f}$ is the label predicted by the model, so $X_{s}^{i}W_{s}W_{f}Y_{s}^{(i)T}$ is the dot product between the prediction and the ground truth: the more similar they are, the larger its value, and since its sign in the loss is negative, greater similarity means a smaller loss, which is exactly what training wants. The term $X_{s}^{i}W_{s}W_{f}\tilde{y}^{T}$ is analogous, but its sign is positive, so the less similar the prediction is to those labels, the better. Why set things up this way? Because this is a classification problem: the prediction should be as close as possible to the target class and as unlike the distractor classes as possible. $R(W)$ is a regularization term, and $\lambda$ is a tunable parameter.
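As a rough numpy sketch of a hinge rank loss of this shape (the margin value and the exact form here are my assumptions for illustration; consult the paper's formula for the precise version): the true label's score enters with a negative sign, and every other label is penalized when its score comes within a margin of the true one.

```python
import numpy as np

def hinge_rank_loss(scores, true_idx, margin=0.1):
    # scores[j]: similarity X W_s W_f . y_j between the projected instance
    # and label embedding y_j; true_idx indexes the ground-truth label
    true_score = scores[true_idx]
    loss = 0.0
    for j, s in enumerate(scores):
        if j == true_idx:
            continue  # the tilde-y summation skips the ground-truth label
        loss += max(0.0, margin - true_score + s)
    return loss

scores = np.array([2.0, 0.5, 0.1])          # true label scores highest
print(hinge_rank_loss(scores, true_idx=0))  # prints 0.0: distractors well separated
```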
That was the original CHTL model; the paper then makes some improvements. The starting point is this: even though in CHTL we assume no obvious connection between source and target, we have reason to believe that some instances are suitable for transfer while others are not, and some may even have a negative effect. For example, suppose you first learned basketball and now want to learn football. Some basketball knowledge helps: there are two goals, you score by putting the ball into the opponent's, and you should pass to a well-positioned teammate, all of which applies to football too. Other knowledge hurts: in basketball, a ball heading out of bounds can still be saved before it touches the ground, whereas in football the ball is out the moment it crosses the line, whether or not it has landed. Transferring that kind of knowledge does not help and may even harm. The paper's solution is to introduce an attention mechanism, i.e. to weight the instances in the source dataset. For computational efficiency, rather than giving each instance its own weight, the instances are first grouped into k clusters and a weight is assigned per cluster. Below is the objective function with attention added:

Compared with the earlier model, the change is a weight coefficient added to the left-hand term: a is a learnable parameter that determines each cluster's weight. μ is a hyperparameter applied only to the source task, penalizing a and f to optimize the source side; although a concerns only the source task, it affects f, and through f it influences the target task. For the clustering of the source dataset, the paper uses K-means. Note that this is only clustering, not classification: texts that land in the same cluster may or may not share the same label. K-means is simple, using Euclidean distance as the similarity measure, but it has one tunable parameter, the number of clusters k, and the best value of k can only be determined experimentally.
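A small sketch of the clustering-plus-weights idea (the weight values below are made up; in the paper, the per-cluster weights a are learned): source instances are clustered with K-means, and each instance inherits the weight of its cluster.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    # Plain Lloyd's algorithm with Euclidean distance
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = kmeans(X, k=2)
a = np.array([0.9, 0.1])       # hypothetical learned cluster weights
instance_weights = a[labels]   # each source instance's loss is scaled by this
```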
When training CHTL with attention, the paper uses a stepwise scheme: first fix the other parameters and train $W_g$, $a$, $W_f$, i.e. first train the source-side classifier, then fix the rest and train $W_h$, $W_f$, alternating in this way to optimize the parameters.
And that is still not all: the authors also bring in a denoising autoencoder to improve the model, so here is a quick refresher on autoencoders. An autoencoder rests on the following fact: the original input (call it x) is weighted (W, b) and mapped (sigmoid) to y, and y is then weighted and mapped back to z. Training both sets of weights (W, b) by repeated iteration to minimize the error function makes z approximate x as closely as possible, i.e. reconstruct x well. If the reconstruction is that good, the first (forward) set of weights (W, b) must have learned the key features of the input, or the reconstruction could not be so faithful. The structure is shown below:

An example:

So what is a denoising autoencoder? It adds a corruption (denoising) step on top of the plain autoencoder:
Corruption: entries of the original input matrix are erased according to some probability distribution (usually binomial), i.e. each value is randomly set to 0, so that some features of part of the data appear lost. The corrupted input x' is used to compute y and then z, and z is compared against the original x in the error iteration, so the network learns from this corrupted data (the original paper's term is "corrupted").
This corrupted data is useful for two reasons:
First, compared with weights trained on clean data, the weights trained on corrupted data carry less noise, which is where the name "denoising" comes from. The reason is easy to see: the erasure happens to wipe out some of the input noise along the way.
Second, corrupted data narrows the gap between training and test data to some extent. Since part of the data has been erased, the corrupted input is somewhat closer to the test data. (Training and test data always have similarities and differences, and of course we want the common part and discard the rest.)
The weights trained this way are therefore more robust. Illustrated below:
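The corruption step can be sketched in a few lines of numpy (the corruption probability 0.3 is an arbitrary illustration): each entry is zeroed out independently, and the reconstruction error is still measured against the original, uncorrupted input.

```python
import numpy as np

def corrupt(x, p=0.3, seed=0):
    # Zero out each entry independently with probability p (a binomial mask)
    rng = np.random.default_rng(seed)
    mask = (rng.random(x.shape) >= p).astype(x.dtype)
    return x * mask

def reconstruction_loss(z, x):
    # Compare the network's output z with the ORIGINAL x, not the corrupted one
    return ((z - x) ** 2).mean()

x = np.ones((4, 8))
x_tilde = corrupt(x)   # the network sees this corrupted version as input
```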

With all that covered, here is the final model:

The objective function accordingly gains the corresponding denoising-autoencoder loss term:

What follows is the experimental section; there is not much to add, so let's look at the results. The paper reports two experiments. In the first, the authors built their own synthetic random-number dataset:

This dataset is used to compare the strengths and weaknesses of the various methods. The authors generated many source/target dataset pairs from Gaussians, with a different degree of difference for each pair, so comparing across the pairs shows how the models perform at different task difficulties. A controllable hyperparameter δ_label governs the heterogeneity between the source and target datasets: the larger δ_label, the greater the difference between them. δ_diff defines the classification boundary; the paper sets δ_diff to 0.5, meaning that generated data whose means differ by less than 0.5 fall into the same class, and $P_{s,m}$ denotes the mean of the m-th class in the source data. M = 4: the source and target datasets each contain four classes of instances. This experiment is heterogeneous only in the label space; the feature spaces have the same dimensionality, 20, for both. Since all the data come from Gaussians, the feature spaces naturally coincide. The purpose of this simulated experiment is simply to check the model's effectiveness on a problem that is heterogeneous only in label space, i.e. a traditional transfer problem.

As expected, the higher the labelling rate, the better the results. Here att stands for attention, AE for autoencoder, and ZSL for zero-shot learning.
The paper's other experiment is text transfer across different languages and different categories:

The experiment uses several datasets. The first three source datasets, rcv1, 20news, and R8, are in English: the first rows transfer text-classification knowledge learned on English text to French, Spanish, German, and Italian, and the later rows transfer from other languages to English. The paper gives neither detailed experimental steps nor code, only an analysis of the results. The results show that the proposed chtl:att,ae model performs best. Adding att alone improves performance somewhat, and adding AE improves it further; merely adding two fully connected layers does not help, but adding two fully connected layers together with att and AE beats the variant without the extra layers. ZSL in this experiment adds a word-embedding step relative to the MLP, which improves its results considerably, and giving ZSL an attention mechanism also improves it, though not by as much as it improves CHTL.

When introducing the attention mechanism we clustered the source dataset, but the best number of clusters is determined experimentally; the paper reports that for cross-language text classification with chtl:att, k = 40 works best.
A central point of this paper is the attention mechanism, so how does it manifest itself? The paper gives a visualization:

Blue points are source data, red points are target data, and black marks the five source clusters that were assigned the highest weights. The source clusters closest to the target data receive high weights, so the most suitable knowledge gets transferred over. One thing I do not quite understand: the farthest cluster is also assigned a fairly high weight, which the paper says avoids negative transfer, but I am still not sure why.
That is all for this post; I hope the content is helpful. Questions are welcome in the comments, and if anyone spots errors or omissions in what I have written, please point them out so we can learn together.
PS: please credit the source when reposting.
qq_35916487 2018-07-24 14:54:29
Reposted from:
http://blog.sina.com.cn/s/blog_6b98772b010105fd.html
Original links:
1.

http://simple-is-better.com/news/553

2.

http://blog.qiusuo.dotcloud.com/2011/07/10/transfer-encoding的作用/#comment-187

Today I was using the WSGI server inside swift to write a file upload/download application, with an httplib client. I ran into a problem with Transfer-Encoding, and finally found the answer in the links above.

What Transfer-Encoding does
By admin on July 10, 2011

If a Transfer-Encoding field with a value of chunked is specified in an HTTP message (either a request sent by a client or the response from the server), the body of the message consists of an unspecified number of chunks, a terminating last-chunk, an optional trailer of entity header fields, and a final CRLF sequence.

Each chunk starts with the number of octets of the data it embeds expressed in hexadecimal followed by optional parameters (chunk extension) and a terminating CRLF (carriage return and line feed) sequence, followed by the chunk data. The chunk is terminated by CRLF. If chunk extensions are provided, the chunk size is terminated by a semicolon followed with the extension name and an optional equal sign and value.

The last chunk is a zero-length chunk, with the chunk size coded as 0, but without any chunk data section. The final chunk may be followed by an optional trailer of additional entity header fields that are normally delivered in the HTTP header to allow the delivery of data that can only be computed after all chunk data has been generated. The sender may indicate in a Trailer header field which additional fields it will send in the trailer after the chunks.
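The wire format described above is easy to sketch as a minimal helper (ignoring chunk extensions and trailers): each chunk is the hexadecimal size, CRLF, the data, CRLF, and the body ends with a zero-length last-chunk.

```python
def chunk_encode(payloads):
    # Each chunk: hex size, CRLF, data, CRLF; the body is terminated by a
    # zero-length last-chunk followed by a final CRLF (no trailer here)
    out = b""
    for data in payloads:
        out += b"%x\r\n" % len(data) + data + b"\r\n"
    return out + b"0\r\n\r\n"

wire = chunk_encode([b"Wikipedia", b" in chunks."])
# wire == b"9\r\nWikipedia\r\nb\r\n in chunks.\r\n0\r\n\r\n"
```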

Any web server that supports HTTP 1.1 should support the chunked Transfer-Encoding, and Apache of course does. Let's write a quick program to verify it.
On the server side, a CGI script (mirror.cgi) simply copies standard input to standard output; that is, it returns the message body received from the client back to the client as the response body. This lets us verify whether the client's chunked transfer achieves what we expect.


#!/usr/bin/env python

import sys

BUFFER_SIZE = 1024

sys.stdout.write("Content-type: text/html\n\n")
while True:
    buffer = sys.stdin.read(BUFFER_SIZE)
    sys.stdout.write(buffer)
    if len(buffer) != BUFFER_SIZE:
        break

On the client side, data is sent in the chunked Transfer-Encoding format. Say we want to send the contents of a file named file as the message body. If the file is large, sending it all at once would strain memory, so we can send it in pieces.


#!/usr/bin/env python

import httplib

conn = httplib.HTTPConnection("127.0.0.1")
conn.putrequest("PUT", "/cgi-bin/mirror.cgi")
conn.putheader("Transfer-Encoding", "chunked")
conn.endheaders()

with open("file") as fp:
    for line in fp.readlines():
        conn.send("%x" % len(line) + "\r\n" + line + "\r\n")

conn.send("0\r\n\r\n")

response = conn.getresponse()
print response.read()

References & Resources:
Chunked transfer encoding
RFC2616 Transfer-Encoding
RFC2616 Transfer-Codings
RFC2616 Content-Length


zhangxueyang1 2017-01-08 09:37:56
• ## Understanding StyleBank: An Explicit Representation for Neural Image Style Transfer

Understanding the paper StyleBank: An Explicit Representation for Neural Image Style Transfer
Differences from existing neural style-transfer networks:
(1) It gives styles an explicit representation. After training, the network can completely separate style from content.
(2) It enables region-based style transfer.
(3) It can not only train multiple styles that share one auto-encoder simultaneously, but also incrementally learn a new style without changing the auto-encoder.

Network structure
The network separates content from style. It builds a feed-forward network based on an auto-encoder: an encoder sub-network first transforms the input image into a feature space; a StyleBank explicitly represents the input styles, with each filter bank representing one style. Convolving a filter bank with the content features produced by the auto-encoder yields a different stylized result for the corresponding content.

The figure above shows the concrete network structure, with three main modules: the image encoder, the StyleBank layer, and the image decoder. These form two branches: an auto-encoder branch (image encoder -> image decoder) and a stylizing branch (image encoder -> StyleBank layer -> image decoder); the two branches share the Encoder and Decoder modules.
Auto-encoder branch (Encoder -> Decoder): trained so that the generated image is as close as possible to the input image.
Stylizing branch (Encoder -> StyleBank Layer -> Decoder): a StyleBank layer is inserted between the Encoder and the Decoder. In this layer, the StyleBank filters are convolved with the feature maps obtained by encoding the input image, producing stylized features, which are then fed to the Decoder to obtain the stylized result.

In this architecture, content information is encoded as much as possible into the encoder and decoder, while style information is encoded into the StyleBank; content and style are therefore separated as far as possible by the network.

Encoder and Decoder
The Encoder consists of one stride-1 convolutional layer and two stride-2 convolutional layers; the Decoder mirrors the Encoder, with two fractionally strided (stride 1/2) convolutions and one stride-1 convolutional layer. All convolutional layers except the final output layer are followed by instance normalization and ReLU; the first and last layers use 9x9 kernels, while all other convolutional layers use 3x3 kernels. Because only these layers are used, the proposed network further reduces both model size and computational cost.

StyleBank Layer
The network supports training multiple styles simultaneously: for n styles, n convolutional filter banks are trained. During training, when an input of style i arrives, the corresponding filter bank must be used for the forward computation and the gradient backpropagation. Convolving with the corresponding filter bank transforms the input image's features into the stylized features. To train a new style, only a new filter bank layer needs to be trained.
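As a toy sketch of the idea (simplified to 1x1 filter banks acting as per-channel mixing; the real layer uses spatial convolutions, and all sizes here are illustrative), each style owns its own bank, applied between the shared encoder and decoder:

```python
import numpy as np

rng = np.random.default_rng(0)
C = 8                                    # encoder feature channels (illustrative)
style_banks = {i: rng.standard_normal((C, C)) for i in range(3)}  # one bank per style

def stylize(features, style_id):
    # features: (C, H, W) encoder output; apply style_id's bank as a 1x1 conv
    K = style_banks[style_id]
    return np.einsum('dc,chw->dhw', K, features)

features = rng.standard_normal((C, 4, 4))
out = stylize(features, style_id=1)      # feed `out` to the shared decoder
```

Adding a new style amounts to adding a new bank to the dictionary; the shared encoder and decoder stay untouched, which is what makes incremental training cheap.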

Loss functions
The network has two branches that are trained alternately, so two loss functions are defined.
Auto-encoder branch: MSE (mean squared error) between the pixels of the input and output images.
Stylizing branch: a perceptual loss composed of a content loss, a style loss, and a total-variation regularization term.

Training strategy
To balance the two branches (auto-encoder and stylizing), a (T+1)-step alternating training strategy is adopted: in every T+1 iterations, the stylizing branch is trained for T iterations, and then the auto-encoder branch for one.
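The alternation can be sketched as a simple schedule (the branch names are mine):

```python
def training_schedule(total_iters, T=2):
    # Every T+1 iterations: T stylizing-branch steps, then one auto-encoder step
    for i in range(total_iters):
        yield "stylize" if i % (T + 1) < T else "autoencode"

steps = list(training_schedule(6, T=2))
# steps: stylize, stylize, autoencode, stylize, stylize, autoencode
```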

Understanding the StyleBank and the auto-encoder
1) How does the StyleBank represent style?
Two types of patches (stroke patches and texture patches) are studied. Concretely, for a selected patch, the corresponding entries of the feature maps are kept and everything else is zeroed. The result in (g) recovers the style element, and it is close to both the selected original style patch (i) and the stylized result patch (j).

Style element size: related to the filter kernel size.
If the style elements are small, the kernel size has little visible effect; for larger style elements, larger kernels yield larger recovered style elements.
The figure below shows stylized patches using 3x3 and 7x7 filter kernels. The 7x7 kernel clearly learns larger style-element information; for example, in the bottom row, the large waves appear in the 7x7 kernel's stylized result. The network thus supports controlling the size of style elements, i.e. the size of the resulting texture patterns, by adjusting the kernel-size parameter.

2) Content image encoding
a. The features of the encoded content image can be clustered spatially (by colour, edges, texture), e.g. with the unsupervised K-means algorithm, giving the left-hand result in the figure below. This amounts to an image segmentation, so the auto-encoder here enables region-specific style transfer.
b. These features are sparsely distributed across channels: meaningful responses always live in certain specific channels, quite possibly the channels that correspond to specific style elements in region-specific transfer.
c. Despite the channel sparsity just mentioned, fewer channels is not better: a 128-channel network converges best and produces the best results.

3) Separation of content and style
The figure below shows the effect of the two branches: on the left is the input image; in the middle, the image reconstructed by a model trained without the auto-encoder branch; on the right, the image reconstructed directly from the auto-encoder branch of a model trained with both branches. Without the auto-encoder branch, the input image cannot be reconstructed, whereas with it the reconstruction closely matches the input. The content information is therefore clearly encoded into the auto-encoder, independently of style, so the network achieves a separation of content and style.

4) How the content image controls the style transfer
a) When the inputs differ only in colour, with no texture, only the colour is transferred, as in (b) and (f).
b) When the inputs have the same colour but different textures, the result transfers the colour plus a different texture depending on the input texture.
c) When the inputs have different colours but the same texture, the results share the transferred texture but differ in colour.

Capabilities of the network
1. It supports incremental training.
2. Style fusion:
Linear style fusion: multiple styles blended linearly.
Region-specific style fusion: different image regions can be rendered in different styles, by clustering the features of the auto-encoded image to partition the regions.

Experimental results:
The paper's network performs better at region-based style transfer, e.g. on portraits.

References:
http://home.ustc.edu.cn/%7Ecd722522/pubs/StyleBank_supplmentary.pdf
http://home.ustc.edu.cn/%7Ecd722522/videos/building.mp4
http://www.tuicool.com/articles/UbyemyJ
http://www.bubuko.com/infodetail-2065280.html
http://www.msra.cn/zh-cn/news/blogs/2017/05/style-transfer-20170524.aspx
Paper:
https://arxiv.org/pdf/1703.09210.pdf

Code:

wyl1987527 2017-07-19 22:12:37