2019-01-01 20:42:14 qq_32925781 · 383 views

A Survey of Deep Learning

1. Introduction

1.1 Machine Learning

Machine learning is not a technology far removed from the general public; on the contrary, it has permeated every aspect of modern life: music recommendation in streaming apps, product recommendation on shopping sites, content filtering and feeds on social media, face unlock on smartphones, and intelligent object recognition and enhancement in cameras. Machine learning has delivered satisfying results in image recognition, speech recognition, recommender systems, search engines, and other application areas. In recent years, industry has increasingly adopted a technique called deep learning.

1.2 Deep Learning

Traditional machine learning techniques are limited in their ability to process natural data in its raw form. In the past, building a good machine learning system required considerable domain expertise and engineering effort to design a feature extractor that transformed raw data into a suitable feature vector from which a classifier could detect patterns in the input.

Before introducing deep learning, it helps to first introduce representation learning. Strictly speaking, representation learning is a family of methods that allow a machine to automatically discover, from raw data, the features needed for classification or recognition.

Deep learning is representation learning with multiple levels. A deep learning system is composed of many simple but non-linear modules; at each layer, a module transforms the representation it receives into a more abstract, higher-level representation. By composing enough such transformations, a deep learning system can learn very complex features.

Take image recognition as an example. An image enters the system as a matrix of pixel values. The features learned in the first layer typically indicate the presence or absence of edges at particular orientations and locations in the image. The second layer typically detects motifs as particular arrangements of edges, while ignoring small variations in edge positions. The third layer assembles these motifs into larger combinations that usually correspond to parts of objects, and the final layer recognizes the object itself, completing the recognition.

It is worth noting that in such a deep learning system, no feature layer is designed by hand: all feature layers are learned from raw data by a general-purpose learning procedure. This is the greatest advantage of deep learning over traditional machine learning.

Unlike traditional machine learning, deep learning does not demand enormous feature-engineering effort and is comparatively easy to get started with. Thanks to recent advances in computer hardware and big data, it also enjoys abundant data and sufficient computing power. As industry and academia continue to invest in the field, new algorithms and architectures will keep emerging, which will greatly advance deep learning.

1.3 Two Paradigms

Deep learning as discussed today is usually divided into two paradigms: supervised learning and unsupervised learning.

In machine learning, "deep" or not, the most common form of learning is supervised learning. Imagine building an image classification system that must distinguish people, cars, dogs, and phones. First we collect a large number of images of each category, with every image labeled with its class. For each input image, the system produces an output vector, and the position of the largest value in that vector corresponds to the predicted class. Training aims to make the position of the largest output value match the true class of the input image. This requires computing a loss function that measures the error between the system's output and the "correct answer", and then using some algorithm to repeatedly update the system's parameters so that the loss keeps decreasing until the system performs satisfactorily.
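
The loop just described, compute a loss against the correct answer and then nudge the parameters to reduce it, can be sketched in a few lines. This is a minimal illustration with a one-parameter linear model and a squared-error loss, not any particular framework's API:

```python
# Minimal supervised learning loop: fit y = w * x to labeled data
# by repeatedly reducing a squared-error loss with gradient descent.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, correct answer); true w = 2

w = 0.0    # model parameter, initially wrong
lr = 0.05  # learning rate

for step in range(200):
    # gradient of the mean squared error between prediction and label
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad  # update the parameter so the loss decreases

loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
print(round(w, 3), round(loss, 6))  # prints: 2.0 0.0
```

Real systems differ only in scale: millions of parameters instead of one, and backpropagation to compute the gradients, but the loss-then-update cycle is the same.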

In the example above, the system's ultimate goal is to make its output as close as possible to the "correct answer". A method that has access to "correct answers" throughout is called supervised learning. Correspondingly, unsupervised learning is learning without correct answers: the model is required to discover regularities or groupings in the raw data on its own.

2. Supervised Learning

Supervised learning is a machine learning approach that learns or builds a model from training data and uses that model to make predictions on new instances. The training data consist of input vectors and their expected outputs (labels). The model's output can be a continuous value (regression) or a class label (classification).

Supervised learning in deep learning is mainly concentrated on deep neural networks, including DNNs, RNNs, and CNNs.

2.1 Deep Neural Networks (DNN)

Most of what is called deep learning today is shorthand for deep neural networks. Conventionally, a neural network with one or two hidden layers is called a multi-layer (or shallow) neural network; as the number of hidden layers grows, a network with more than about five layers is called a deep neural network (DNN).

"Depth" is therefore more of a marketing notion, and the exact layer count is not important: in practice, any network with more than about five hidden layers can be called deep.

Deep neural networks include plain fully connected networks (DNN), convolutional neural networks (CNN), recurrent neural networks (RNN), and recursive neural networks (also abbreviated RNN). Because CNNs and RNNs are typically deep by construction, their "depth" is rarely stated explicitly, whereas for some other networks it must be.

2.2 Convolutional Neural Networks (CNN)

The convolutional neural network is probably the best-known neural network. It draws, to some extent, on biological neural networks and bears a strong resemblance to the human visual system, which is why it shines in image recognition.

The most distinctive feature of a convolutional network is that it shares parameters across spatial locations, achieved through a series of convolutions and pooling operations. Convolution enables parameter sharing and sparse interactions: instead of a separate parameter for every input value, as in an ordinary DNN, it extracts shared patterns from a local region (in image recognition, typically a small matrix of pixels). Together with the CNN's characteristic pooling layers, this greatly reduces the dimensionality of the input.
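
To make parameter sharing concrete, here is a toy sketch in plain Python (no framework): a single 3x3 kernel, nine shared weights in total, slides over the whole image, and a 2x2 max-pool then halves each spatial dimension.

```python
def conv2d(image, kernel):
    """Slide one shared kernel over every position of the image (valid padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            # the SAME kernel weights are reused at every (i, j): parameter sharing
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

def maxpool2x2(fmap):
    """Keep the max of each 2x2 block, halving the feature map's dimensions."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# a 6x6 "image" with a vertical edge down the middle
img = [[1, 1, 1, 0, 0, 0] for _ in range(6)]
edge_kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]  # responds to vertical edges

fmap = conv2d(img, edge_kernel)  # 4x4 feature map: strong response at the edge
pooled = maxpool2x2(fmap)        # 2x2 after pooling: dimensionality reduced
```

The nine kernel weights detect the same vertical edge wherever it occurs, which is exactly what "sharing parameters across spatial locations" means; the pooling step then shrinks the 4x4 response map to 2x2 while keeping the strongest activations.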

The figure below shows a visualization of a CNN recognizing handwritten digits (http://scs.ryerson.ca/~aharley/vis/conv/). As the figure shows, the CNN's sequence of convolutions and pooling compresses the image's dimensions while still preserving, to a degree, the relationships between different regions of the data.

[Figure: visualization of a CNN classifying handwritten digits]

Weight sharing, local connections, pooling, and a multi-layer structure are the four hallmark features of convolutional networks, and they make CNNs especially suited to structured, grid-like data (such as matrices). People usually associate CNNs with image recognition, but they apply to any domain with such structured data.

2.3 Recursive and Recurrent Neural Networks (RNN)

2.3.1 Plain RNNs

Although recursive neural networks and recurrent neural networks share the abbreviation RNN, they differ in many ways. Both, however, are suited to sequential problems such as speech analysis, text analysis, and time-series analysis. For predicting stock prices, for instance, an RNN is a natural choice, because today's price is related to yesterday's, last week's, last month's, and last year's.

When a sequence is fed into an RNN, the network processes one element of the sequence at each time step. Each element updates what is called the "state vector" held in the RNN's hidden units, which implicitly contains the history of all past elements of the sequence. The figure below (http://colah.github.io/posts/2015-08-Understanding-LSTMs/) gives an intuitive view of an unrolled RNN.

[Figure: an unrolled recurrent neural network]

This chain-like nature naturally evokes sequences and linked lists, and these are exactly the domains where RNNs are applied: speech recognition, language modeling, translation, and image captioning. Strictly speaking, the most widely used and most successful variant is a special kind of RNN, the LSTM. Plain RNNs cannot "remember" information over long spans (Hochreiter (1991) and Bengio et al. (1994) explored the reasons for this deficiency) and relate it to the current task, whereas LSTMs solve this problem of long-term, long-range dependencies.

2.3.2 LSTMs

LSTMs (Long Short-Term Memory networks) were designed from the outset to handle and predict important events with very long intervals and delays in a time series. Like all RNNs, LSTMs have a chain structure; the only difference is that each module has a more complex internal structure. For example, in the figure below (http://colah.github.io/posts/2015-08-Understanding-LSTMs/), each LSTM unit contains four neural network layers, whereas a plain RNN unit has just one.

[Figure: the four-layer internal structure of an LSTM cell]
LSTMs are built for long-term memory of their inputs. Each unit acts like an accumulator or a gated neuron: it connects to itself at the next time step, copying its own current state while accumulating external signals, but this self-connection is controlled by a multiplicative gate that decides whether to keep or clear the information held in "memory".
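
The multiplicative gating just described can be written out directly. Below is one LSTM time step for scalar values, a pedagogical sketch rather than a production implementation; the weights are arbitrary illustrative numbers, not trained values:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM time step (scalar toy version with per-gate weights and biases)."""
    # forget gate: the multiplicative control over what stays in the cell state
    f = sigmoid(W['wf'] * x + W['uf'] * h_prev + W['bf'])
    # input gate and candidate: what new information gets accumulated
    i = sigmoid(W['wi'] * x + W['ui'] * h_prev + W['bi'])
    g = math.tanh(W['wg'] * x + W['ug'] * h_prev + W['bg'])
    # output gate: how much of the (squashed) cell state is exposed
    o = sigmoid(W['wo'] * x + W['uo'] * h_prev + W['bo'])
    c = f * c_prev + i * g  # the "accumulator": gated copy of itself plus new input
    h = o * math.tanh(c)
    return h, c

# illustrative (untrained) weights: same value for each gate for simplicity
W = dict(wf=1.0, uf=0.5, bf=0.0, wi=1.0, ui=0.5, bi=0.0,
         wg=1.0, ug=0.5, bg=0.0, wo=1.0, uo=0.5, bo=0.0)

h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.5]:  # a short input sequence
    h, c = lstm_step(x, h, c, W)
```

The line `c = f * c_prev + i * g` is the self-connection from the text: the forget gate `f` multiplicatively decides how much of the previous state is copied forward, and the input gate `i` decides how much of the new signal is accumulated.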

LSTMs have many variants, and the LSTMs in different papers often differ in small ways. Some variants perform better on certain specific tasks, but on most tasks all of them achieve excellent results.

3. Unsupervised Learning

As is well known, machine learning has had a turbulent history, with several waves of enthusiasm followed by quiet periods. The recent resurgence was not driven entirely by the deep neural networks introduced above; unsupervised learning also acted as a catalyst. Human and animal learning is largely unsupervised: we discover the world by observing and interacting with it, not by training our brains on piles of correctly labeled data (although formal education may take that form). For this reason, unsupervised learning is widely seen as the future direction of deep learning, and perhaps even a path toward true artificial intelligence.

Compared with supervised learning, which trains on correctly labeled data, an unsupervised model must learn the relationships among elements of a dataset on its own and classify the raw data without "help". Unsupervised learning encompasses different algorithms for learning these relationships, but all such models share the ability to discover hidden structure, patterns, or features in the data, which is also how the human brain thinks.

Common unsupervised algorithms include cluster analysis, anomaly detection, and neural networks. Since this article focuses on deep learning, it mainly introduces the unsupervised models and algorithms related to deep learning.
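
As an example of "no-help" structure discovery, here is a minimal k-means clustering sketch in plain Python: it is handed unlabeled 1-D points and partitions them into two groups entirely on its own.

```python
def kmeans_1d(points, k=2, iters=20):
    """Tiny k-means for 1-D data: alternate assignment and centroid update."""
    centroids = points[:k]  # naive initialization: first k points
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assign each point to its nearest centroid
            nearest = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        # move each centroid to the mean of the points assigned to it
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# two obvious groups, but no labels are given to the algorithm
data = [1.0, 1.2, 0.8, 9.9, 10.1, 10.0]
centroids, clusters = kmeans_1d(data)
```

After a few iterations the centroids settle near 1.0 and 10.0: the hidden two-group structure has been recovered without any "correct answers".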

3.1 Deep Belief Networks (DBN)

A deep belief network (DBN) is a generative graphical model. It is also a deep neural network, composed of multiple layers of hidden units with connections between layers but none between units within a layer.

When trained on a set of examples without supervision, a DBN learns to reconstruct its inputs as well as possible, after which its layers act as feature detectors. Once this step is complete, a conventional neural network can be used for further learning.

A DBN can be viewed as a stack of simple unsupervised networks, such as restricted Boltzmann machines (RBMs) or autoencoders, in which each sub-network's hidden layer serves as the visible layer of the next. An RBM is an undirected, energy-based generative model with a "visible" input layer, a hidden layer, and connections between the layers.
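
The "energy-based" part can be made concrete. An RBM assigns each joint configuration of visible units v and hidden units h an energy E(v, h) = -(a·v + b·h + v'Wh), and lower energy means higher probability. A small sketch with made-up parameters:

```python
def rbm_energy(v, h, a, b, W):
    """Energy of a (visible, hidden) configuration: lower energy = more probable."""
    visible_term = sum(ai * vi for ai, vi in zip(a, v))
    hidden_term = sum(bj * hj for bj, hj in zip(b, h))
    interaction = sum(v[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    return -(visible_term + hidden_term + interaction)

# toy RBM: 3 visible units, 2 hidden units, illustrative made-up parameters
a = [0.1, 0.1, 0.1]                          # visible biases
b = [0.2, 0.2]                               # hidden biases
W = [[1.0, -1.0], [1.0, -1.0], [1.0, -1.0]]  # connections between the two layers

low = rbm_energy([1, 1, 1], [1, 0], a, b, W)   # configuration the weights favor
high = rbm_energy([1, 1, 1], [0, 1], a, b, W)  # configuration they penalize
```

Training an RBM amounts to adjusting a, b, and W so that configurations resembling the training data get low energy; here the first hidden unit "agrees" with the all-ones visible pattern and therefore yields the lower energy.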

Overall, the DBN is more a vehicle for understanding how deep learning works and "thinks"; in practice CNNs and RNNs dominate. The related deep Boltzmann machine has similar properties but sees little industrial use.

3.2 Autoencoders

An autoencoder is an artificial neural network that learns efficient encodings of data in an unsupervised way. Its purpose is to learn a representation (encoding) of a dataset, typically for dimensionality reduction. More recently, the autoencoder concept has been widely used for learning generative models of data.

[Figure: an autoencoder reconstructing a handwritten digit "2"]
As the figure above illustrates, an autoencoder consists of two parts: an encoder and a decoder. Taking images as an example, when a picture of the digit "2" enters from the left, it passes through the encoder and decoder and emerges as a slightly different new "2". What the autoencoder really learns is the middle part (marked with a red box in the figure): a compressed, low-dimensional representation of the digit. Autoencoders are evaluated by reconstruction error, the difference between the output "2" and the input "2"; the smaller, the better.

Like principal component analysis (PCA), an autoencoder can be used for data compression (equivalently, dimensionality reduction), extracting the more important features from the raw data. The slight difference between the output and the input in the figure above is the normal information loss incurred by dimensionality reduction.
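
A minimal numeric sketch of that idea: a linear autoencoder that compresses 2-D points lying on a line down to a single number (the "red box" code) and then reconstructs them, trained by reducing the reconstruction error. This is toy gradient descent in plain Python, not a library API:

```python
# Toy linear autoencoder: 2-D points on the line x2 = 2*x1 are encoded into
# one number z (the bottleneck) and decoded back; training lowers the
# reconstruction error, much as PCA would find the line's direction.
data = [(1.0, 2.0), (2.0, 4.0), (-1.0, -2.0), (0.5, 1.0)]

e1, e2 = 0.5, 0.5   # encoder weights: z = e1*x1 + e2*x2
d1, d2 = 0.5, 0.5   # decoder weights: reconstruction = (d1*z, d2*z)
lr = 0.005

def mse():
    """Mean squared reconstruction error over the dataset."""
    total = 0.0
    for x1, x2 in data:
        z = e1 * x1 + e2 * x2
        total += (d1 * z - x1) ** 2 + (d2 * z - x2) ** 2
    return total / len(data)

before = mse()
for _ in range(10000):
    ge1 = ge2 = gd1 = gd2 = 0.0
    for x1, x2 in data:
        z = e1 * x1 + e2 * x2     # encode: 2-D -> 1-D bottleneck
        r1, r2 = d1 * z, d2 * z   # decode: 1-D -> 2-D reconstruction
        gd1 += 2 * (r1 - x1) * z
        gd2 += 2 * (r2 - x2) * z
        gz = 2 * (r1 - x1) * d1 + 2 * (r2 - x2) * d2
        ge1 += gz * x1
        ge2 += gz * x2
    e1 -= lr * ge1; e2 -= lr * ge2
    d1 -= lr * gd1; d2 -= lr * gd2
after = mse()
```

Because the data are genuinely one-dimensional, the single-number bottleneck loses almost nothing here; on real data the residual error is exactly the "normal information loss" mentioned above.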

3.3 Generative Adversarial Networks (GAN)

Generative adversarial networks (GANs) are a class of algorithms for unsupervised learning whose core idea comes from game theory. A GAN is implemented as two neural networks competing with each other in a zero-sum game. The idea can be illustrated by the figure below (https://sthalles.github.io/assets/dcgan/GANs.png).

[Figure: the generator/discriminator structure of a GAN]
As the figure shows, a GAN trains two neural networks. The one on the left, the generator, takes noise as input and produces images resembling the training data; the one on the right, the discriminator, distinguishes whether its input is real data or the generator's fakes. The two networks learn through this "adversarial" process, continually adjusting their parameters, with the ultimate goal that the discriminator can no longer tell whether the generator's output is real.

The GAN's adversarial formulation may sound like a complicated and inefficient model, but in practice it has exceeded expectations: a GAN can learn the data distribution more effectively while requiring fewer parameters than a typical neural network.
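
One classic result makes the adversarial equilibrium precise: for a fixed generator, the discriminator's best possible response is D*(x) = p_data(x) / (p_data(x) + p_g(x)), which equals 1/2 wherever the two distributions agree. The check below uses two Gaussian densities as stand-ins for the real and generated distributions:

```python
import math

def gaussian_pdf(x, mu, sigma=1.0):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def optimal_discriminator(x, mu_data, mu_gen):
    """D*(x) = p_data(x) / (p_data(x) + p_g(x)) for a fixed generator."""
    p_data = gaussian_pdf(x, mu_data)
    p_gen = gaussian_pdf(x, mu_gen)
    return p_data / (p_data + p_gen)

mu_data, mu_gen = 0.0, 4.0
# near the real data the optimal discriminator says "real" (close to 1) ...
near_real = optimal_discriminator(0.0, mu_data, mu_gen)
# ... near the generator's samples it says "fake" (close to 0) ...
near_fake = optimal_discriminator(4.0, mu_data, mu_gen)
# ... and where both densities are equal it is undecided: exactly 0.5
midpoint = optimal_discriminator(2.0, mu_data, mu_gen)
```

As the generator improves and p_g approaches p_data, D* is pushed toward 0.5 everywhere, which is exactly the "can no longer tell" endpoint of training described above.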

4. Outlook

The future of deep learning:

  • Models closer to general-purpose computer programs, built on top of far richer differentiable layers than today's: closer to human abstraction and reasoning, which is precisely deep learning's greatest current weakness.
  • Models that no longer require heavy engineer involvement, eliminating endless hyperparameter tuning.
  • Systematic reuse of substructures and submodules: meta-learning systems built from reusable, modular routines.

Note that these trends are not limited to deep learning; they apply equally to self-supervised learning, reinforcement learning, and other areas of machine learning.

4.1 Models as Programs

This is a kind of model that grows out of traditional pure pattern recognition but can perform abstraction and reasoning, achieving higher-level abstraction and generalization over all kinds of data (similar to the human brain). Today, the AI programs capable of basic forms of reasoning are all written by programmers, for example programs relying on search algorithms, graph manipulation, and formal logic.

AlphaGo, developed by DeepMind, defeated the human Go player Lee Sedol in 2016, but most of its "intelligence" was designed and implemented by expert programmers (for example, Monte Carlo tree search), and the deep learning it is known for exists only in specialized submodules (such as the value network and policy network). In the future, however, such AI systems may well be learned entirely by machines, with no human involvement.

Such systems would use reusable, modular components abstracted from high-performing models learned across thousands of tasks and datasets. A meta-learning system of this kind could recognize problem-solving patterns directly and assemble a solution by invoking different submodules (much like the practice of software engineering, except carried out by machines rather than people).

4.2 Beyond Backpropagation

As our models become more like programs, they will no longer be differentiable. These programs will still use differentiable geometric layers as subroutines, but the model as a whole will not be differentiable, so adjusting network weights via backpropagation cannot solve everything. Backpropagation remains a powerful tool for differentiable problems, but future models will not be confined to them, and their learning processes will therefore need more than backpropagation.

Methods currently able to train non-differentiable models effectively include genetic algorithms, "evolution strategies", certain reinforcement learning methods, and ADMM (the alternating direction method of multipliers).
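
To show how such a method sidesteps gradients entirely, here is a toy (1+1) evolution strategy minimizing a non-differentiable function: it only ever evaluates the objective, never differentiates it. A sketch, not any particular library's API:

```python
import random

def objective(x):
    """Non-differentiable objective: absolute value plus a step discontinuity."""
    return abs(x - 3.0) + (1.0 if x < 0 else 0.0)

random.seed(0)
x = -10.0             # current candidate solution, far from the optimum at 3
best = objective(x)
for _ in range(2000):
    candidate = x + random.gauss(0.0, 0.5)  # mutate: random perturbation
    score = objective(candidate)
    if score <= best:                       # select: keep the candidate if no worse
        x, best = candidate, score
```

Backpropagation would be useless at the kink and the step, but the mutate-and-select loop walks the candidate across both and settles near the minimum; genetic algorithms and the evolution strategies mentioned above scale this same idea to whole populations of candidates.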

4.3 Automated Learning

Today, most of a deep learning engineer's work consists of writing Python scripts to wrangle data and then spending long stretches tuning the architecture and hyperparameters of a deep network to obtain a working model. This is clearly not ideal; we would like machines to do this work automatically. Unfortunately, the tuning part is hard to automate, because it usually requires domain expertise and a clear, high-level understanding of the goal.

Anyone who has implemented an existing deep learning model by hand knows that once the architecture changes even slightly, the whole training process has to start over, which is plainly inefficient. So another important direction for automated machine learning is systems that learn the model architecture along with the weights: such a system would refine the architecture while fitting the parameters on training data, eliminating as much computational redundancy as possible.

The arrival of automated learning will not make machine learning engineers' jobs disappear. Rather, engineers will move up the value-creation chain: they will be able to devote more effort to crafting complex loss functions that truly reflect business goals and to understanding how their models affect the digital ecosystems in which they are deployed.

2019-12-17 14:37:33 Together_CZ · 135 views

Recommender systems play a very important role in our daily lives. Most people who have actually worked on recommendation projects have at least skimmed the book《推荐系统实战》(Recommender Systems in Practice); I am one of its readers, and I find it a decent introductory resource. The recommender systems of large retailers and big tech companies are complex and powerful, mostly built on deep learning. This article shares practical experience from recommendation projects I have built at work, covering the technology from two angles: machine learning and deep learning.

The machine learning part uses the surprise module to design and implement a book recommender and a movie recommender; the deep learning part builds a music recommender based on a neural network recommendation model.

The dataset used in this article can be downloaded from the link below:

https://download.csdn.net/download/together_cz/10916350

For an introduction to the surprise module and usage examples, see the link below:

https://surprise.readthedocs.io/en/stable/getting_started.html


To load your own dataset with surprise, you first define a data reader that specifies the data format. A minimal reader looks like this:

from surprise import Dataset, Reader

# build the reader and load the ratings file
reader=Reader(line_format=data_format,sep=sep)
mydata=Dataset.load_from_file(data_path,reader=reader)

[Figure: design diagram of the book recommender system]

After downloading the dataset from the link above, we take a quick look at book.csv; part of the data is shown below:

id,book_id,best_book_id,work_id,books_count,isbn,isbn13,authors,original_publication_year,original_title,title,language_code,average_rating,ratings_count,work_ratings_count,work_text_reviews_count,ratings_1,ratings_2,ratings_3,ratings_4,ratings_5,image_url,small_image_url
1,2767052,2767052,2792775,272,439023483,9.78043902348e+12,Suzanne Collins,2008.0,The Hunger Games,"The Hunger Games (The Hunger Games, #1)",eng,4.34,4780653,4942365,155254,66715,127936,560092,1481305,2706317,https://images.gr-assets.com/books/1447303603m/2767052.jpg,https://images.gr-assets.com/books/1447303603s/2767052.jpg
2,3,3,4640799,491,439554934,9.78043955493e+12,"J.K. Rowling, Mary GrandPré",1997.0,Harry Potter and the Philosopher's Stone,"Harry Potter and the Sorcerer's Stone (Harry Potter, #1)",eng,4.44,4602479,4800065,75867,75504,101676,455024,1156318,3011543,https://images.gr-assets.com/books/1474154022m/3.jpg,https://images.gr-assets.com/books/1474154022s/3.jpg
3,41865,41865,3212258,226,316015849,9.78031601584e+12,Stephenie Meyer,2005.0,Twilight,"Twilight (Twilight, #1)",en-US,3.57,3866839,3916824,95009,456191,436802,793319,875073,1355439,https://images.gr-assets.com/books/1361039443m/41865.jpg,https://images.gr-assets.com/books/1361039443s/41865.jpg
4,2657,2657,3275794,487,61120081,9.78006112008e+12,Harper Lee,1960.0,To Kill a Mockingbird,To Kill a Mockingbird,eng,4.25,3198671,3340896,72586,60427,117415,446835,1001952,1714267,https://images.gr-assets.com/books/1361975680m/2657.jpg,https://images.gr-assets.com/books/1361975680s/2657.jpg
5,4671,4671,245494,1356,743273567,9.78074327356e+12,F. Scott Fitzgerald,1925.0,The Great Gatsby,The Great Gatsby,eng,3.89,2683664,2773745,51992,86236,197621,606158,936012,947718,https://images.gr-assets.com/books/1490528560m/4671.jpg,https://images.gr-assets.com/books/1490528560s/4671.jpg
6,11870085,11870085,16827462,226,525478817,9.78052547881e+12,John Green,2012.0,The Fault in Our Stars,The Fault in Our Stars,eng,4.26,2346404,2478609,140739,47994,92723,327550,698471,1311871,https://images.gr-assets.com/books/1360206420m/11870085.jpg,https://images.gr-assets.com/books/1360206420s/11870085.jpg
7,5907,5907,1540236,969,618260307,9.7806182603e+12,J.R.R. Tolkien,1937.0,The Hobbit or There and Back Again,The Hobbit,en-US,4.25,2071616,2196809,37653,46023,76784,288649,665635,1119718,https://images.gr-assets.com/books/1372847500m/5907.jpg,https://images.gr-assets.com/books/1372847500s/5907.jpg
8,5107,5107,3036731,360,316769177,9.78031676917e+12,J.D. Salinger,1951.0,The Catcher in the Rye,The Catcher in the Rye,eng,3.79,2044241,2120637,44920,109383,185520,455042,661516,709176,https://images.gr-assets.com/books/1398034300m/5107.jpg,https://images.gr-assets.com/books/1398034300s/5107.jpg
9,960,960,3338963,311,1416524797,9.78141652479e+12,Dan Brown,2000.0,Angels & Demons ,"Angels & Demons  (Robert Langdon, #1)",en-CA,3.85,2001311,2078754,25112,77841,145740,458429,716569,680175,https://images.gr-assets.com/books/1303390735m/960.jpg,https://images.gr-assets.com/books/1303390735s/960.jpg
10,1885,1885,3060926,3455,679783261,9.78067978327e+12,Jane Austen,1813.0,Pride and Prejudice,Pride and Prejudice,eng,4.24,2035490,2191465,49152,54700,86485,284852,609755,1155673,https://images.gr-assets.com/books/1320399351m/1885.jpg,https://images.gr-assets.com/books/1320399351s/1885.jpg
11,77203,77203,3295919,283,1594480001,9.78159448e+12,Khaled Hosseini,2003.0,The Kite Runner ,The Kite Runner,eng,4.26,1813044,1878095,59730,34288,59980,226062,628174,929591,https://images.gr-assets.com/books/1484565687m/77203.jpg,https://images.gr-assets.com/books/1484565687s/77203.jpg
12,13335037,13335037,13155899,210,62024035,9.78006202404e+12,Veronica Roth,2011.0,Divergent,"Divergent (Divergent, #1)",eng,4.24,1903563,2216814,101023,36315,82870,310297,673028,1114304,https://images.gr-assets.com/books/1328559506m/13335037.jpg,https://images.gr-assets.com/books/1328559506s/13335037.jpg
13,5470,5470,153313,995,451524934,9.78045152494e+12,"George Orwell, Erich Fromm, Celâl Üster",1949.0,Nineteen Eighty-Four,1984,eng,4.14,1956832,2053394,45518,41845,86425,324874,692021,908229,https://images.gr-assets.com/books/1348990566m/5470.jpg,https://images.gr-assets.com/books/1348990566s/5470.jpg
14,7613,7613,2207778,896,452284244,9.78045228424e+12,George Orwell,1945.0,Animal Farm: A Fairy Story,Animal Farm,eng,3.87,1881700,1982987,35472,66854,135147,433432,698642,648912,https://images.gr-assets.com/books/1424037542m/7613.jpg,https://images.gr-assets.com/books/1424037542s/7613.jpg
15,48855,48855,3532896,710,553296981,9.78055329698e+12,"Anne Frank, Eleanor Roosevelt, B.M. Mooyaart-Doubleday",1947.0,Het Achterhuis: Dagboekbrieven 14 juni 1942 - 1 augustus 1944,The Diary of a Young Girl,eng,4.1,1972666,2024493,20825,45225,91270,355756,656870,875372,https://images.gr-assets.com/books/1358276407m/48855.jpg,https://images.gr-assets.com/books/1358276407s/48855.jpg
16,2429135,2429135,1708725,274,307269752,9.78030726975e+12,"Stieg Larsson, Reg Keeland",2005.0,Män som hatar kvinnor,"The Girl with the Dragon Tattoo (Millennium, #1)",eng,4.11,1808403,1929834,62543,54835,86051,285413,667485,836050,https://images.gr-assets.com/books/1327868566m/2429135.jpg,https://images.gr-assets.com/books/1327868566s/2429135.jpg
17,6148028,6148028,6171458,201,439023491,9.7804390235e+12,Suzanne Collins,2009.0,Catching Fire,"Catching Fire (The Hunger Games, #2)",eng,4.3,1831039,1988079,88538,10492,48030,262010,687238,980309,https://images.gr-assets.com/books/1358273780m/6148028.jpg,https://images.gr-assets.com/books/1358273780s/6148028.jpg
18,5,5,2402163,376,043965548X,9.78043965548e+12,"J.K. Rowling, Mary GrandPré, Rufus Beck",1999.0,Harry Potter and the Prisoner of Azkaban,"Harry Potter and the Prisoner of Azkaban (Harry Potter, #3)",eng,4.53,1832823,1969375,36099,6716,20413,166129,509447,1266670,https://images.gr-assets.com/books/1499277281m/5.jpg,https://images.gr-assets.com/books/1499277281s/5.jpg
19,34,34,3204327,566,618346252,9.78061834626e+12,J.R.R. Tolkien,1954.0, The Fellowship of the Ring,"The Fellowship of the Ring (The Lord of the Rings, #1)",eng,4.34,1766803,1832541,15333,38031,55862,202332,493922,1042394,https://images.gr-assets.com/books/1298411339m/34.jpg,https://images.gr-assets.com/books/1298411339s/34.jpg
20,7260188,7260188,8812783,239,439023513,9.78043902351e+12,Suzanne Collins,2010.0,Mockingjay,"Mockingjay (The Hunger Games, #3)",eng,4.03,1719760,1870748,96274,30144,110498,373060,618271,738775,https://images.gr-assets.com/books/1358275419m/7260188.jpg,https://images.gr-assets.com/books/1358275419s/7260188.jpg
21,2,2,2809203,307,439358078,9.78043935807e+12,"J.K. Rowling, Mary GrandPré",2003.0,Harry Potter and the Order of the Phoenix,"Harry Potter and the Order of the Phoenix (Harry Potter, #5)",eng,4.46,1735368,1840548,28685,9528,31577,180210,494427,1124806,https://images.gr-assets.com/books/1387141547m/2.jpg,https://images.gr-assets.com/books/1387141547s/2.jpg
22,12232938,12232938,1145090,183,316166685,9.78031616668e+12,Alice Sebold,2002.0,The Lovely Bones,The Lovely Bones,eng,3.77,1605173,1661562,36642,62777,131188,404699,583575,479323,https://images.gr-assets.com/books/1457810586m/12232938.jpg,https://images.gr-assets.com/books/1457810586s/12232938.jpg
23,15881,15881,6231171,398,439064864,9.78043906487e+12,"J.K. Rowling, Mary GrandPré",1998.0,Harry Potter and the Chamber of Secrets,"Harry Potter and the Chamber of Secrets (Harry Potter, #2)",eng,4.37,1779331,1906199,34172,8253,42251,242345,548266,1065084,https://images.gr-assets.com/books/1474169725m/15881.jpg,https://images.gr-assets.com/books/1474169725s/15881.jpg
24,6,6,3046572,332,439139600,9.7804391396e+12,"J.K. Rowling, Mary GrandPré",2000.0,Harry Potter and the Goblet of Fire,"Harry Potter and the Goblet of Fire (Harry Potter, #4)",eng,4.53,1753043,1868642,31084,6676,20210,151785,494926,1195045,https://images.gr-assets.com/books/1361482611m/6.jpg,https://images.gr-assets.com/books/1361482611s/6.jpg
25,136251,136251,2963218,263,545010225,9.78054501022e+12,"J.K. Rowling, Mary GrandPré",2007.0,Harry Potter and the Deathly Hallows,"Harry Potter and the Deathly Hallows (Harry Potter, #7)",eng,4.61,1746574,1847395,51942,9363,22245,113646,383914,1318227,https://images.gr-assets.com/books/1474171184m/136251.jpg,https://images.gr-assets.com/books/1474171184s/136251.jpg
26,968,968,2982101,350,307277674,9.78030727767e+12,Dan Brown,2003.0,The Da Vinci Code,"The Da Vinci Code (Robert Langdon, #2)",eng,3.79,1447148,1557292,41560,71345,126493,340790,539277,479387,https://images.gr-assets.com/books/1303252999m/968.jpg,https://images.gr-assets.com/books/1303252999s/968.jpg
27,1,1,41335427,275,439785960,9.78043978597e+12,"J.K. Rowling, Mary GrandPré",2005.0,Harry Potter and the Half-Blood Prince,"Harry Potter and the Half-Blood Prince (Harry Potter, #6)",eng,4.54,1678823,1785676,27520,7308,21516,136333,459028,1161491,https://images.gr-assets.com/books/1361039191m/1.jpg,https://images.gr-assets.com/books/1361039191s/1.jpg
28,7624,7624,2766512,458,140283331,9.78014028333e+12,William Golding,1954.0,Lord of the Flies ,Lord of the Flies,eng,3.64,1605019,1671484,26886,92779,160295,425648,564916,427846,https://images.gr-assets.com/books/1327869409m/7624.jpg,https://images.gr-assets.com/books/1327869409s/7624.jpg
29,18135,18135,3349450,1937,743477111,9.78074347712e+12,"William Shakespeare, Robert           Jackson",1595.0,An Excellent conceited Tragedie of Romeo and Juliet,Romeo and Juliet,eng,3.73,1628519,1672889,14778,57980,153179,452673,519822,489235,https://images.gr-assets.com/books/1327872146m/18135.jpg,https://images.gr-assets.com/books/1327872146s/18135.jpg
30,8442457,19288043,13306276,196,297859382,9.78029785938e+12,Gillian Flynn,2012.0,Gone Girl,Gone Girl,eng,4.03,512475,1626519,121614,38874,80807,280331,616031,610476,https://images.gr-assets.com/books/1339602131m/8442457.jpg,https://images.gr-assets.com/books/1339602131s/8442457.jpg
31,4667024,4667024,4717423,183,399155341,9.78039915534e+12,Kathryn Stockett,2009.0,The Help,The Help,eng,4.45,1531753,1603545,78204,10235,25117,134887,490754,942552,https://images.gr-assets.com/books/1346100365m/4667024.jpg,https://images.gr-assets.com/books/1346100365s/4667024.jpg
32,890,890,40283,373,142000671,9.78014200067e+12,John Steinbeck,1937.0,Of Mice and Men ,Of Mice and Men,eng,3.84,1467496,1518741,24642,46630,110856,355169,532291,473795,https://images.gr-assets.com/books/1437235233m/890.jpg,https://images.gr-assets.com/books/1437235233s/890.jpg
33,930,929,1558965,220,739326228,9.78073932622e+12,Arthur Golden,1997.0,Memoirs of a Geisha,Memoirs of a Geisha,eng,4.08,1300209,1418172,25605,23500,59033,258700,517157,559782,https://s.gr-assets.com/assets/nophoto/book/111x148-bcc042a9c91a29c1d680899eff700a03.png,https://s.gr-assets.com/assets/nophoto/book/50x75-a91bf249278a81aabab721ef782c4a74.png
34,10818853,10818853,15732562,169,1612130291,9.78161213029e+12,E.L. James,2011.0,Fifty Shades of Grey,"Fifty Shades of Grey (Fifty Shades, #1)",eng,3.67,1338493,1436818,75437,165455,152293,252185,294976,571909,https://images.gr-assets.com/books/1385207843m/10818853.jpg,https://images.gr-assets.com/books/1385207843s/10818853.jpg
35,865,865,4835472,458,61122416,9.78006112242e+12,"Paulo Coelho, Alan R. Clarke",1988.0,O Alquimista,The Alchemist,eng,3.82,1299566,1403995,55781,74846,123614,289143,412180,504212,https://images.gr-assets.com/books/1483412266m/865.jpg,https://images.gr-assets.com/books/1483412266s/865.jpg
36,3636,3636,2543234,192,385732554,9.78038573255e+12,Lois Lowry,1993.0,The Giver,"The Giver (The Giver, #1)",eng,4.12,1296825,1345445,54084,26497,59652,225326,448691,585279,https://images.gr-assets.com/books/1342493368m/3636.jpg,https://images.gr-assets.com/books/1342493368s/3636.jpg
37,100915,100915,4790821,474,60764899,9.78006076489e+12,C.S. Lewis,1950.0,"The Lion, the Witch and the Wardrobe","The Lion, the Witch, and the Wardrobe (Chronicles of Narnia, #1)",eng,4.19,1531800,1584884,15186,19309,55542,262038,513366,734629,https://images.gr-assets.com/books/1353029077m/100915.jpg,https://images.gr-assets.com/books/1353029077s/100915.jpg
38,14050,18619684,2153746,167,965818675,9.78096581867e+12,Audrey Niffenegger,2003.0,The Time Traveler's Wife,The Time Traveler's Wife,eng,3.95,746287,1308667,43382,44339,85429,257805,427210,493884,https://images.gr-assets.com/books/1437728815m/14050.jpg,https://images.gr-assets.com/books/1437728815s/14050.jpg
39,13496,13496,1466917,101,553588486,9.78055358848e+12,George R.R. Martin,1996.0,A Game of Thrones,"A Game of Thrones (A Song of Ice and Fire, #1)",eng,4.45,1319204,1442220,46205,19988,28983,114092,404583,874574,https://images.gr-assets.com/books/1436732693m/13496.jpg,https://images.gr-assets.com/books/1436732693s/13496.jpg
40,19501,19501,3352398,185,143038419,9.78014303841e+12,Elizabeth Gilbert,2006.0,"Eat, pray, love: one woman's search for everything across Italy, India and Indonesia","Eat, Pray, Love",eng,3.51,1181647,1206597,49714,100373,149549,310212,332191,314272,https://images.gr-assets.com/books/1503066414m/19501.jpg,https://images.gr-assets.com/books/1503066414s/19501.jpg
41,28187,28187,3346751,159,786838655,9.78078683865e+12,Rick Riordan,2005.0,The Lightning Thief,"The Lightning Thief (Percy Jackson and the Olympians, #1)",eng,4.23,1366265,1411114,46006,18303,48294,219638,435514,689365,https://images.gr-assets.com/books/1400602609m/28187.jpg,https://images.gr-assets.com/books/1400602609s/28187.jpg
42,1934,1934,3244642,1707,451529308,9.78045152930e+12,Louisa May Alcott,1868.0,Little Women,"Little Women (Little Women, #1)",en-US,4.04,1257121,1314293,17090,31645,70011,250794,426280,535563,https://s.gr-assets.com/assets/nophoto/book/111x148-bcc042a9c91a29c1d680899eff700a03.png,https://s.gr-assets.com/assets/nophoto/book/50x75-a91bf249278a81aabab721ef782c4a74.png
43,10210,10210,2977639,2568,142437204,9.78014243721e+12,"Charlotte Brontë, Michael Mason",1847.0,Jane Eyre,Jane Eyre,eng,4.1,1198557,1286135,31212,35132,64274,212294,400214,574221,https://images.gr-assets.com/books/1327867269m/10210.jpg,https://images.gr-assets.com/books/1327867269s/10210.jpg
44,15931,15931,1498135,190,553816713,9.78055381672e+12,Nicholas Sparks,1996.0,The Notebook,"The Notebook (The Notebook, #1)",eng,4.06,1053403,1076749,17279,41395,63432,176469,298259,497194,https://images.gr-assets.com/books/1385738917m/15931.jpg,https://images.gr-assets.com/books/1385738917s/15931.jpg
45,4214,4214,1392700,264,770430074,9.78077043008e+12,Yann Martel,2001.0,Life of Pi,Life of Pi,,3.88,1003228,1077431,42962,39768,74331,218702,384164,360466,https://images.gr-assets.com/books/1320562005m/4214.jpg,https://images.gr-assets.com/books/1320562005s/4214.jpg
46,43641,43641,3441236,128,1565125606,9.78156512560e+12,Sara Gruen,2006.0,Water for Elephants,Water for Elephants,eng,4.07,1068146,1108839,55732,16705,49832,200154,417328,424820,https://images.gr-assets.com/books/1494428973m/43641.jpg,https://images.gr-assets.com/books/1494428973s/43641.jpg
47,19063,19063,878368,251,375831002,9.780375831e+12,Markus Zusak,2005.0,The Book Thief,The Book Thief,eng,4.36,1159741,1287798,93611,17892,35360,135272,377218,722056,https://images.gr-assets.com/books/1390053681m/19063.jpg,https://images.gr-assets.com/books/1390053681s/19063.jpg
48,4381,4381,1272463,507,307347974,9.78030734798e+12,Ray Bradbury,1953.0,Fahrenheit 451,Fahrenheit 451,spa,3.97,570498,1176240,30694,28366,64289,238242,426292,419051,https://images.gr-assets.com/books/1351643740m/4381.jpg,https://images.gr-assets.com/books/1351643740s/4381.jpg
49,49041,49041,3203964,194,316160199,9.78031616019e+12,Stephenie Meyer,2006.0,"New Moon (Twilight, #2)","New Moon (Twilight, #2)",eng,3.52,1149630,1199000,44020,102837,160660,294207,290612,350684,https://images.gr-assets.com/books/1361039440m/49041.jpg,https://images.gr-assets.com/books/1361039440s/49041.jpg

The overall design is simple, with nothing hard to grasp. Next, the concrete implementation:

def bookRecommendSystem(map_data='book.csv',train_data='rating.csv',data_format='book user rating',sep=',',flag='SVD',k=10):
    '''
    Book recommender system
    '''
    id_name_dic,name_id_dic=bookDataMapping(map_data)
    myModel,dataset=buildModel(data_path=train_data,data_format=data_format,sep=sep,flag=flag)
    print('==================model Training Finished========================')
    performance=evaluationModel(myModel,dataset)
    print('==================model performance===================')
    print(performance)
    current_playlist_id='1239'
    print('Current user id: '+current_playlist_id)
    current_playlist_name=id_name_dic[current_playlist_id]
    print('Current book title: '+current_playlist_name)
    playlist_inner_id=myModel.trainset.to_inner_uid(current_playlist_id)
    print('Current internal user id: '+str(playlist_inner_id))
    # recommend based on the k nearest neighbours
    playlist_neighbors=myModel.get_neighbors(playlist_inner_id,k=k)
    playlist_neighbors_id=(myModel.trainset.to_raw_uid(inner_id) for inner_id in playlist_neighbors)
    playlist_neighbors_name=(id_name_dic[playlist_id] for playlist_id in playlist_neighbors_id)
    print('The', k, 'books closest to <', current_playlist_name, '> are:\n')
    for playlist_name in playlist_neighbors_name:
        print(playlist_name, name_id_dic[playlist_name])

The function above implements the book recommender; the comments explain the details, so I won't elaborate further. Below, the key helper functions are described one by one.

Model initialization module:

def initModel(flag='NormalPredictor'):
    '''
    Select among several recommendation algorithms
    '''
    if flag=='NormalPredictor':  # random prediction from the training distribution
        return NormalPredictor()
    elif flag=='BaselineOnly':  # baseline estimates only
        return BaselineOnly()
    elif flag=='KNNBasic':  # basic collaborative filtering
        return KNNBasic()
    elif flag=='KNNWithMeans':  # collaborative filtering with mean ratings
        return KNNWithMeans()
    elif flag=='KNNBaseline':  # collaborative filtering with baselines
        return KNNBaseline()
    elif flag=='SVD':  # SVD
        return SVD()
    elif flag=='SVDpp':  # SVD++
        return SVDpp()
    elif flag=='NMF':  # NMF
        return NMF()
    else:
        return SVD()

Recommendation model construction module:

def buildModel(data_path='rating.csv',data_format='user item rating',sep=',',flag='KNNBasic'):
    '''
    Build the recommendation model
    '''
    # build the reader
    reader=Reader(line_format=data_format,sep=sep)
    mydata=Dataset.load_from_file(data_path,reader=reader)
    # build the full training set from the data
    train_set=mydata.build_full_trainset()
    print('================model training================')
    model=initModel(flag=flag)
    model.fit(train_set)
    return model,mydata

Dataset id-to-title mapping module:

def bookDataMapping(data_path='book.csv'):
    '''
    Load the raw book data and build id<->title dictionary mappings
    '''
    csv_reader=csv.reader(open(data_path))
    id_name_dic,name_id_dic={},{}
    for row in csv_reader:
        id_name_dic[row[0]]=row[10]
        name_id_dic[row[10]]=row[0]
    return id_name_dic, name_id_dic

A simple invocation looks like this:

bookRecommendSystem(map_data='book.csv',train_data='RRR.csv',data_format='user item rating',sep=',',flag='KNNBasic',k=10)

KNN is used here, with five-fold cross-validation; the output is as follows:

------------
Fold 1
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.9211
MAE:  0.7108
FCP:  0.7038
------------
Fold 2
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.9211
MAE:  0.7093
FCP:  0.6996
------------
Fold 3
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.9234
MAE:  0.7133
FCP:  0.7010
------------
Fold 4
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.9210
MAE:  0.7119
FCP:  0.7017
------------
Fold 5
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.9268
MAE:  0.7167
FCP:  0.6983
------------
------------
Mean RMSE: 0.9227
Mean MAE : 0.7124
Mean FCP : 0.7009
------------
------------
==================model performance===================
defaultdict(<type 'list'>, {u'fcp': [0.703847835793307, 0.6995619798679573, 0.7009530691688108, 0.7017142119961722, 0.6982634284783771], u'mae': [0.7107821167817494, 0.7093057204220446, 0.7132818148803571, 0.7119004793330316, 0.7167381500990199], u'rmse': [0.921100168545926, 0.9210542860057216, 0.9234120678271927, 0.9209873056186509, 0.9267740800608146]})
Current user id: 1239
Current book title: Chronicle of a Death Foretold
Current internal user id: 537
The 10 books closest to < Chronicle of a Death Foretold > are:
('Frostbite (Vampire Academy, #2)', '384')
('The Call of the Wild', '375')
('The Knife of Never Letting Go (Chaos Walking, #1)', '1050')
('The Neverending Story', '877')
('Lord of the Flies', '28')
('Olive Kitteridge', '930')
('Twenty Thousand Leagues Under the Sea', '699')
("1st to Die (Women's Murder Club, #1)", '336')
('The Big Short: Inside the Doomsday Machine', '985')
('The Black Echo (Harry Bosch, #1; Harry Bosch Universe, #1)', '902')

The above completes the entire pipeline from parsing the raw data to building the book recommender. Next, I build a movie recommender on the dataset bundled with surprise. Essentially, the book and movie recommenders are technically very similar; the only real difference is the dataset. Let's look at the movie recommender.

The dataset here is the built-in ml-100k dataset, which can also be found with a quick web search if needed.

Part of u.data looks like this:

196	242	3	881250949
186	302	3	891717742
22	377	1	878887116
244	51	2	880606923
166	346	1	886397596
298	474	4	884182806
115	265	2	881171488
253	465	5	891628467
305	451	3	886324817
6	86	3	883603013
62	257	2	879372434
286	1014	5	879781125
200	222	5	876042340
210	40	3	891035994
224	29	3	888104457
303	785	3	879485318
122	387	5	879270459
194	274	2	879539794
291	1042	4	874834944
234	1184	2	892079237
119	392	4	886176814
167	486	4	892738452
299	144	4	877881320
291	118	2	874833878
308	1	4	887736532
95	546	2	879196566
38	95	5	892430094
102	768	2	883748450
63	277	4	875747401
160	234	5	876861185
50	246	3	877052329
301	98	4	882075827
225	193	4	879539727
290	88	4	880731963
97	194	3	884238860
157	274	4	886890835
181	1081	1	878962623
278	603	5	891295330
276	796	1	874791932
7	32	4	891350932
10	16	4	877888877
284	304	4	885329322
201	979	2	884114233
276	564	3	874791805
287	327	5	875333916
246	201	5	884921594
242	1137	5	879741196
249	241	5	879641194
99	4	5	886519097
178	332	3	882823437

Part of u.item looks like this:

1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0
2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye%20(1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
4|Get Shorty (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Get%20Shorty%20(1995)|0|1|0|0|0|1|0|0|1|0|0|0|0|0|0|0|0|0|0
5|Copycat (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Copycat%20(1995)|0|0|0|0|0|0|1|0|1|0|0|0|0|0|0|0|1|0|0
6|Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)|01-Jan-1995||http://us.imdb.com/Title?Yao+a+yao+yao+dao+waipo+qiao+(1995)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0
7|Twelve Monkeys (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Twelve%20Monkeys%20(1995)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|1|0|0|0
8|Babe (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Babe%20(1995)|0|0|0|0|1|1|0|0|1|0|0|0|0|0|0|0|0|0|0
9|Dead Man Walking (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Dead%20Man%20Walking%20(1995)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0
10|Richard III (1995)|22-Jan-1996||http://us.imdb.com/M/title-exact?Richard%20III%20(1995)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|1|0
11|Seven (Se7en) (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Se7en%20(1995)|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|1|0|0
12|Usual Suspects, The (1995)|14-Aug-1995||http://us.imdb.com/M/title-exact?Usual%20Suspects,%20The%20(1995)|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|1|0|0
13|Mighty Aphrodite (1995)|30-Oct-1995||http://us.imdb.com/M/title-exact?Mighty%20Aphrodite%20(1995)|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0
14|Postino, Il (1994)|01-Jan-1994||http://us.imdb.com/M/title-exact?Postino,%20Il%20(1994)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0
15|Mr. Holland's Opus (1995)|29-Jan-1996||http://us.imdb.com/M/title-exact?Mr.%20Holland's%20Opus%20(1995)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0
16|French Twist (Gazon maudit) (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Gazon%20maudit%20(1995)|0|0|0|0|0|1|0|0|0|0|0|0|0|0|1|0|0|0|0
17|From Dusk Till Dawn (1996)|05-Feb-1996||http://us.imdb.com/M/title-exact?From%20Dusk%20Till%20Dawn%20(1996)|0|1|0|0|0|1|1|0|0|0|0|1|0|0|0|0|1|0|0
18|White Balloon, The (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Badkonake%20Sefid%20(1995)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0
19|Antonia's Line (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Antonia%20(1995)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0
20|Angels and Insects (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Angels%20and%20Insects%20(1995)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|1|0|0|0|0

        The sub-files obtained after unpacking the dataset are described as follows:

Here are brief descriptions of the data.

ml-data.tar.gz   -- Compressed tar file.  To rebuild the u data files do this:
                gunzip ml-data.tar.gz
                tar xvf ml-data.tar
                mku.sh

u.data     -- The full u data set, 100000 ratings by 943 users on 1682 items.
              Each user has rated at least 20 movies.  Users and items are
              numbered consecutively from 1.  The data is randomly
              ordered. This is a tab separated list of 
	         user id | item id | rating | timestamp. 
              The time stamps are unix seconds since 1/1/1970 UTC   

u.info     -- The number of users, items, and ratings in the u data set.

u.item     -- Information about the items (movies); this is a tab separated
              list of
              movie id | movie title | release date | video release date |
              IMDb URL | unknown | Action | Adventure | Animation |
              Children's | Comedy | Crime | Documentary | Drama | Fantasy |
              Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi |
              Thriller | War | Western |
              The last 19 fields are the genres, a 1 indicates the movie
              is of that genre, a 0 indicates it is not; movies can be in
              several genres at once.
              The movie ids are the ones used in the u.data data set.

u.genre    -- A list of the genres.

u.user     -- Demographic information about the users; this is a tab
              separated list of
              user id | age | gender | occupation | zip code
              The user ids are the ones used in the u.data data set.

u.occupation -- A list of the occupations.

u1.base    -- The data sets u1.base and u1.test through u5.base and u5.test
u1.test       are 80%/20% splits of the u data into training and test data.
u2.base       Each of u1, ..., u5 have disjoint test sets; this is for
u2.test       5 fold cross validation (where you repeat your experiment
u3.base       with each training and test set and average the results).
u3.test       These data sets can be generated from u.data by mku.sh.
u4.base
u4.test
u5.base
u5.test

ua.base    -- The data sets ua.base, ua.test, ub.base, and ub.test
ua.test       split the u data into a training set and a test set with
ub.base       exactly 10 ratings per user in the test set.  The sets
ub.test       ua.test and ub.test are disjoint.  These data sets can
              be generated from u.data by mku.sh.

allbut.pl  -- The script that generates training and test sets where
              all but n of a user's ratings are in the training data.

mku.sh     -- A shell script to generate all the u data sets from u.data.
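The tab-separated u.data format described above can be parsed with nothing but the standard library. A minimal sketch (the three sample rows below are hypothetical, made up in the u.data layout):

```python
import csv
from io import StringIO

# Hypothetical rows in the u.data layout: user id \t item id \t rating \t timestamp
sample = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n22\t377\t1\t878887116\n"

ratings = []
for user_id, item_id, rating, ts in csv.reader(StringIO(sample), delimiter='\t'):
    ratings.append((int(user_id), int(item_id), int(rating), int(ts)))

print(len(ratings))      # 3
print(ratings[0])        # (196, 242, 3, 881250949)
```

In the real pipeline the `StringIO` buffer would simply be replaced by `open('u.data')`.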

         Once you are familiar with the dataset, you can start modeling it. First, build the id-to-name mapping dictionaries as follows:

def dataMapping(data='item.txt'):
    '''
    Build the id->name and name->id mapping dictionaries
    '''
    id_name_dict,name_id_dict={},{}
    with open(data) as f:
        data_list=[one_line.strip().split('|') for one_line in f.readlines() if one_line.strip()]
    for one_list in data_list:
        id_name_dict[one_list[0]]=one_list[1]
        name_id_dict[one_list[1]]=one_list[0]
    return id_name_dict,name_id_dict
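The same mapping logic, exercised on two in-memory rows copied from the u.item excerpt above (truncated after the URL field):

```python
# Two rows from the u.item sample, truncated after the URL field
lines = [
    "5|Copycat (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Copycat%20(1995)",
    "7|Twelve Monkeys (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Twelve%20Monkeys%20(1995)",
]
id_name_dict, name_id_dict = {}, {}
for one_list in (line.split('|') for line in lines):
    id_name_dict[one_list[0]] = one_list[1]   # id -> title
    name_id_dict[one_list[1]] = one_list[0]   # title -> id

print(id_name_dict['7'])                       # Twelve Monkeys (1995)
print(name_id_dict['Copycat (1995)'])          # 5
```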

          The rest of the recommendation workflow is similar to the book recommendation above, so we won't repeat the details; here is the code:

def movieRecommendSystem():
    '''
    Movie recommendation system
    '''
    #build the training set and the KNN model
    movie_data=Dataset.load_builtin('ml-100k')
    trainset=movie_data.build_full_trainset()
    algo=KNNBasic()
    algo.fit(trainset)  #train() was renamed to fit() in newer Surprise releases
    #build the id-name mappings
    id_name_dict,name_id_dict=dataMapping(data='item.txt')
    #movie recommendation, seeded with Army of Darkness (1993)
    raw_id=name_id_dict['Army of Darkness (1993)']  #look up the raw movie id
    inner_id=algo.trainset.to_inner_iid(raw_id)  #convert to the model's inner id
    neighbors=algo.get_neighbors(inner_id,10)  #the 10 nearest neighbours
    res_ids=[algo.trainset.to_raw_iid(_id) for _id in neighbors]  #inner ids back to raw movie ids
    movies=[id_name_dict[raw_id] for raw_id in res_ids]  #raw ids to movie titles
    print("========================10个最相似的电影:========================")
    for movie in movies:
        print(name_id_dict[movie],'==========>',movie)

      A quick test on the real data produces the following recommendations:

Done computing similarity matrix.
========================10个最相似的电影:========================
242 ==========> Kolya (1996)
486 ==========> Sabrina (1954)
88 ==========> Sleepless in Seattle (1993)
603 ==========> Rear Window (1954)
20 ==========> Angels and Insects (1995)
479 ==========> Vertigo (1958)
1336 ==========> Kazaam (1996)
673 ==========> Cape Fear (1962)
568 ==========> Speed (1994)
623 ==========> Angels in the Outfield (1994)

       All of the recommendation models used so far in this article are KNN-based. Surprise also ships many other models and tools; singular value decomposition (SVD) is one of the most widely used, so we ran a simple performance comparison between KNN and SVD, with the results shown in the figure below:

       This wraps up the machine-learning-based recommendation practice; if you are interested, you can experiment with the datasets and code above. Next, we build a music recommendation system of our own on top of deep learning models.

     First, a word on the dataset. It was crawled from NetEase Cloud Music; because of copyright and the site's official terms, the crawler code cannot be published. If you genuinely need the data, you can contact me for the experimental dataset, for academic research only; please do not use it for any other purpose. Let's take a first look at the data; partial samples are shown below:
      A sample of the song id-name mapping data:

4875306,逍遥叹-胡歌
376417,一生有你-水木年华
177575,让我一次爱个够-庾澄庆
5255987,你若成风-许嵩
165375,专属味道-汪苏泷
63650,独家记忆-陈小春
191819,当你孤单你会想起谁-张栋梁
504826080,路过南京,路过你-江皓南
408250378,写给我第一个喜欢的女孩的歌-西瓜Kune
505451285,青春住了谁-杨丞琳
31426805,Tell Her You Belong To Me-Beth Hart
5094255,So Nice-Jim Tomlinson
27008758,Angel-Randy Crawford
427416048,Boom Boom Baby-Sean Hayes
458496129,moonlight.-Sleep2.
1308441,What a Difference the Day Made-Eddie Higgins
3163956,Moon Song-Norah Jones
2391318,Soledad-Concha Buika
2639938,The Girl From Impanema-Gabriela Anders
16952047,More-Matt Dusk
536680802,江南雨巷-绯村柯北
463425816,倾杯有酒-晃儿
35956497,一衫轻纱-陈浩东
34341487,寒江雪-Braska
454717839,宿雨冷-老虎欧巴
479177946,巷雨梨花-涵昱
530986445,晴川雪-银临
31062973,执伞-吾恩
29984203,执伞待人归-而已
30352430,旧诗行-只有影子
461347998,Something Just Like This-The Chainsmokers
411314681,This Is What You Came For -Calvin Harris
460043372,It Ain't Me-Kygo
515269424,Wolves-Selena Gomez
422132237,Cold Water-Major Lazer
521416693,So Far Away-Martin Garrix
461518855,Stay-Zedd
474581010,BOOM-Tiësto
420922950,Let Me Love You-DJ Snake
31370725,Say My Name (Kids Want Techno Remix)-Odesza
28310930,涩-纣王老胡
437755447,途-倪健
490106148,山下-方拾贰
30635613,秋酿-房东的猫
417859220,皆非-马頔
399340140,来信-陈鸿宇
443967407,短叹-房东的猫
436514312,成都-赵雷
29572804,傲寒-马頔
408814900,借我-谢春花
29482203,風巻立つ-増田俊郎
32743519,江上清风游-变奏的梦想
32743521,明月逐人归-变奏的梦想
31477886,人闲桂花静-F.Be.I
785507,天照大御神-Musical Jarβ
683826,春よ、来い-松任谷由実
507152393,既听云深-秩厌
428203067,行雲流水-流派未月亭
509466710,饮酒赋诗-韦卓成
31649696,千年の風-天地雅楽
488953797,You Might Be (GoldFish Remix)-Autograf
524149482,Sunset City-Andreas Phazer
485612576,Creep-Gamper & Dadoni
451703286,Burn (Gryffin Remix)-Gryffin
32238090,Animals (Gryffin Remix)-Maroon 5
464674974,Baby Boy (Famba Remix)-Famba
34690580,How Deep Is Your Love (Liva K Remix)-Liva K
407002710,Desire (Gryffin Remix)-Years & Years
443292315,Deep Of The Night (Extended Mix)-Goldfish
451701288,Am I Wrong (Gryffin Remix)-Gryffin

        A sample of the album id-name mapping data is shown below:

2106881647,寒假都快结束了,暑假为什么不接上?
2095806875,日系治愈男声,逐步陷入温柔的世界
2089907261,『無前奏 | 女嗓』开口即跪 心醉神迷.
2099302296,纯音乐|钢琴与旧书页,会跳舞的黑白键
2099228567,Bedroom-pop丨天马行空的绮梦
2092484970,日系治愈|请问您今天要来点少女心吗?
2091038787,如果可以,我想和古人谈一场恋爱
2093273437,人声后摇 I 怎奈何琼夜一嗓伶俜
2092474396,「抒情摇滚」过去的时光与黯然的诗
2097548733,放完寒假,还是要继续追逐梦想
2095135687,2018年韩国平昌冬奥会花样滑冰比赛BGM
2129450612,有没有一首歌会让你想起周华健
2097045424,音室Vol.4丨细 数 一 些 旧 时 光
2094937822,2018年平昌冬奥会花样滑冰音乐选曲爆点精选
2088799380,日系温柔女声丨沐浴在歌声中的暖阳下
2084738832,「赶走阴霾」来一首欢快的欧美小调
2092015892,我们在苏打绿的小宇宙里再遇见
2081400147,【妖说】沧海桑田 只待君归
2081133164,古风词作‖他们只是喜欢用歌的方式来讲故事
2083911279,【Kobalt推荐】清晨最温柔的旋律
2087725103,【超级碗2018】贾老板的中场盛事
2082564528,听什么歌都像在唱自己
2088028082,『曲作精选』细数古风圈原创作曲人❷
2075587022,助眠集 | 自然音,伴灵动乐符萦绕耳畔
2086732756,「深度睡眠」音符伴你入梦 愿你一夜好眠
2074273616,想要办一场古风婚礼,许一世天地作嫁
2075961982,告白恋语|喜欢你,直至生命最后一刻
2074681032,「古风精选」你眼中是江湖 我眼中是你
2086647823,最强大脑第五季BGM
2077299279,『热血街舞团』参演曲目及出场BGM
2088338811,『 孝利家民宿2 』允儿篇
2078747658,『这!就是街舞』最全BGM合集持更
2076551016,那片蔚蓝的天空静止了。
2074505134,我们在寒冷的冬天停止不了摇滚与热吻
2076170419,▶ 2018年欧美流行新歌速递
2076565475,偶像练习生  参赛曲目 ( 3.23 New )
2073678124,和声优谈恋爱什么感觉丨日本男声优撩妹现场
2071490150,百首日系治愈 呼唤你的心灵 呼喊你的名字
2073263803,[萌系/俏皮小调]不期而至的心动❤
2069140707,恋爱系Melody,不知觉已陷入暗恋之心
2065854146,古典清香 I 我的茶馆里住着巴赫与肖邦
2062601053,日系摇滚『东京日和° 一番街的清冽少年』
2069998416,Moombahton:异域风情的Drop最强音
2064112746,那些年我们听错过的歌词「古风版」
2067901040,空灵澄澈|梦寻空中花园
2061447491,综艺《这!就是街舞》BGM合辑
2061240468,「恬静英文」愿你酣然入梦
2069189356,『Trap』低频轰炸机 令人上瘾的黑暗氛围
2065668633,「提神醒脑」学习工作健身游戏必备
2065515420,综艺《偶像练习生》BGM合辑
2062160307,好莱坞黄金时代|歌梦盛世
2060642136,長眠這裡吧妳已经沒有活下去的理由.
2055571883,小姐姐搭档电子乐,声控党新潮的标配~
2058497430,只是想安静的 享受那份呼吸的感觉。o・°。
2063413626,流行禁区 • 摄人心魄的性感律动
2066200314,「无前奏」喜欢无需铺垫 一秒便沦陷
2059430694,痛彻心扉地哭,然后刻骨铭心地记住
2063777060,「前奏沦陷」●迷醉在Absolut伏特加中
2057752377,「古风」歌暖如茶,满城花开
2054127850,节奏向|原谅我这一生不羁放纵爱自由
2062522327,2018年冬季新番音乐之旅
2060692794,【Jazz Blues】爵士乐句演绎12小节布鲁斯
2059412026,华语 | 80/90都听过的经典老歌【怀旧篇】
2052793999,(旋律控)|可可布朗尼般甜蜜
2065782856,〖偶像练习生〗丨参赛曲目合辑(持更…)
2048970456,百首日系抒情,总有一首触动你的心
2049903536,日系对唱丨聆听他们绽放在青春的美好
2041615881,『曲作精选』细数古风圈原创作曲人❶
2040074016,「女声控」音色沁人心 旋律美如画
2044527707,降燥八音盒 你要来一杯柠檬薄荷苏打水吗?
2050704516,2018全年抖腿指南,老铁你怕了吗?
2047743292,♪V家歌姬唱英文的时候♪
2047424322,粤语女声 I 故事太多 没人会听你诉说
2042009605,你绝对不应该错过的100首英文歌
2042006896,华语 | 听歌最怕应景,触景最怕生情
2059465574,韩语 | 一听就会中毒的韩文歌
2055505250,「情书予你」江湖太远,我就不去了。

      A sample of the user rating data is shown below:

2148086011,32743519,100.0,1433001600000
2148086011,32743521,95.0,1433001600000
2148086011,31477886,95.0,1295712000000
2148086011,785507,90.0,1293724800000
2148086011,683826,100.0,1265126400000
2148086011,507152393,80.0,1505725430579
2148086011,428203067,95.0,1471017600000
2148086011,509466710,70.0,1506495600000
2148086011,31649696,90.0,1216310400000
2139118830,488953797,100.0,1499356800007
2139118830,524149482,100.0,1512662400007
2139118830,485612576,100.0,1497974400007
2139118830,451703286,100.0,1395763200007
2139118830,32238090,100.0,1431878400007
2139118830,464674974,95.0,1489334400007
2139118830,34690580,100.0,1441987200007
2139118830,407002710,100.0,1458319287803
2139118830,443292315,95.0,1480003200007
2139118830,451701288,100.0,1407772800007
2139491961,287035,100.0,1172678400000
2139491961,254485,100.0,965059200000
2139491961,254432,100.0,1038672000000
2139491961,224000,100.0,978278400000
2139491961,210281,100.0,1295913600000
2139491961,186010,100.0,1067616000000
2139491961,188674,100.0,975600000000
2139491961,375394,100.0,1130774400000
2139491961,27747329,100.0,1379952000007
2139491961,25641873,100.0,1104422400007
2139305008,186345,100.0,962380800000
2139305008,234841,100.0,892051200007
2139305008,187564,100.0,938707200000
2139305008,187600,100.0,938707200000
2139305008,143474,100.0,539107200000
2139305008,188222,100.0,691516800000
2139305008,156427,100.0,1185897600000
2139305008,153784,100.0,765129600000
2139305008,5242750,100.0,1262275200000
2139305008,194186,100.0,741456000000
2144281377,108251,100.0,1291737600000
2144281377,507815173,100.0,1505983872874
2144281377,375100,100.0,1178812800000
2144281377,385973,100.0,1122825600000
2144281377,186021,100.0,1059580800000
2144281377,29343809,100.0,1410278400007
2144281377,82203,100.0,1230307200000
2144281377,25699094,100.0,1199980800007
2144281377,254045,100.0,1344556800000
2144281377,287251,100.0,1128096000000
2139324915,108390,100.0,1256832000000
2139324915,375394,100.0,1130774400000
2139324915,254574,100.0,941385600000
2139324915,190072,100.0,975600000000
2139324915,186560,100.0,891360000000
2139324915,168089,100.0,1038672000000
2139324915,110400,100.0,817747200000
2139324915,287035,100.0,1172678400000
2139324915,32507038,100.0,1433433600007
2139324915,126946,100.0,1104508800004
2139566312,2182015,95.0,1208822400000
2139566312,26256399,85.0,1208822400000
2139566312,526935207,80.0,1199808000007
2139566312,526935208,80.0,1199808000007
2139566312,527013149,75.0,1236787200007
2139566312,527013150,80.0,1236787200007
2139566312,526977909,65.0,1227542400007
2139566312,526977910,55.0,1227542400007
2139566312,527013280,75.0,1238428800007
2139566312,25646006,80.0,1220918400000
2140187381,486473539,100.0,1358438400007
2140187381,426881088,100.0,589042800000
2140187381,426881089,95.0,589042800000
2140187381,426881090,90.0,589042800000
2140187381,426881091,85.0,589042800000
2140187381,426881092,85.0,589042800000
2140187381,426881093,85.0,589042800000
2140187381,426881094,80.0,589042800000
2140187381,426881095,80.0,589042800000
2140187381,426881096,75.0,589042800000
2128755383,459089,100.0,1104508800000
2128755383,23039253,100.0,1349049600000
2128755383,459093,100.0,1104508800000
2128755383,29744089,100.0,1416240000000
2128755383,459097,100.0,1104508800000
2128755383,21994019,95.0,1175443200000
2128755383,21993923,95.0,1247500800000
2128755383,459101,95.0,1104508800000
2128755383,5044797,95.0,1194883200007
2128755383,459105,90.0,1104508800000
2131958935,29787426,100.0,1395014400000
2131958935,16823382,100.0,1141660800000
2131958935,427542109,100.0,1474848000000
2131958935,33255655,100.0,1426348800007
2131958935,20953761,100.0,992016000000
2131958935,34376545,100.0,1453518281446
2131958935,4175444,100.0,1275580800007
2131958935,34229976,100.0,1380643200000
2131958935,432464943,100.0,1475259769146
2131958935,33211676,100.0,1416672000000
2132468088,534065427,100.0,1516896000007
2132468088,26209672,95.0,1364313600007
2132468088,4936840,90.0,1274803200000
2132468088,33367836,90.0,1421769600000
2132468088,541480254,90.0,1509033600007
2132468088,546730516,75.0,1414080000007
2132468088,4920894,85.0,1324396800000
2132468088,34928242,85.0,1443024000007
2132468088,28636414,85.0,1364486400000
2132468088,536622468,80.0,1498752000007
2129467457,277382,100.0,1014912000000
2129467457,108640,100.0,941385600000
2129467457,277822,100.0,875635200000
2129467457,277817,100.0,875635200000
2129467457,277820,100.0,875635200000
2129467457,277759,100.0,943977600000
2129467457,277804,100.0,896630400000
2129467457,277586,100.0,970329600000
2129467457,276939,100.0,1214323200000
2129467457,277836,100.0,820425600000
2134421380,4877892,100.0,1007136000000
2134421380,4877894,100.0,1007136000000
2134421380,4873340,100.0,1208448000000
2134421380,4873343,100.0,1208448000000
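Each rating row above has the shape `userId,songId,rating,timestamp`, with ratings on a 0–100 scale. A minimal parsing sketch (sample rows copied from the excerpt above; normalizing the score to 0–1 is my own assumption, not something the source states):

```python
raw = """2148086011,32743519,100.0,1433001600000
2148086011,32743521,95.0,1433001600000
2139118830,488953797,100.0,1499356800007"""

ratings = []
for line in raw.splitlines():
    user_id, song_id, score, ts = line.split(',')
    # normalize the 0-100 score to 0-1 for downstream models (an assumption)
    ratings.append((user_id, song_id, float(score) / 100.0, int(ts)))

print(ratings[1][2])   # 0.95
```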

       The architecture of the deep-learning-based music recommendation system is illustrated below:

      The diagram above shows the workflow of this article's music recommendation system and should be fairly easy to follow. Now for the concrete implementation:

def dataPre(one_line):
    '''
    Strip punctuation and other noise characters from a single line
    (stopword removal is handled separately in seg)
    '''
    sigmod_list=[',','。','(',')','-','——','\n','“','”','*','#','《','》','、','[',']','(',')','-',
                   '.','/','】','【','……','!','!',':',':','…','@','~@','~','「一」','「','」',
                '?','"','?','~','_',' ',';','◆','①','②','③','④','⑤','⑥','⑦','⑧','⑨','⑩',
                '⑾','⑿','⒀','⒁','⒂','&quot;',' ','/','·','…','!!!','】','!',',',
                '。','[',']','【','、','?','/^/^','/^','”',')','(','~','》','《','。。。',
                '=','⑻','⑴','⑵','⑶','⑷','⑸','⑹','⑺','…','|']
    for one_sigmod in sigmod_list:
        one_line=one_line.replace(one_sigmod,'')
    return one_line
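The replace loop above makes one pass over the string per symbol; `str.translate` performs the same single-character deletions in one pass. A sketch with an abbreviated symbol list:

```python
# Abbreviated symbol list; the full list is in dataPre above
sigmod_list = [',', '。', '(', ')', '-', '《', '》', '!', '?']
table = str.maketrans('', '', ''.join(sigmod_list))

def data_pre_fast(one_line):
    # drop every listed symbol in a single scan of the string
    return one_line.translate(table)

print(data_pre_fast('让我一次爱个够-庾澄庆'))   # 让我一次爱个够庾澄庆
```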


def seg(one_content,stopwords=None):
    '''
    Tokenize a string with jieba and remove stopwords
    one_content: a single text string
    stopwords: optional list of stopwords
    '''
    stopwords=stopwords or []
    segs=jieba.cut(one_content,cut_all=False)
    seg_set=set(segs)-set(stopwords)
    return list(seg_set)


def word2vecModel(con_list,model_path='my.model'):
    '''
    gensim.models.word2vec.Word2Vec(sentences=None,size=100,alpha=0.025,window=5,min_count=5,
    max_vocab_size=None,sample=0.001,seed=1,workers=3,min_alpha=0.0001,sg=0,hs=0,negative=5,
    cbow_mean=1,iter=5,null_word=0,trim_rule=None,sorted_vocab=1,batch_words=10000)
    Key parameters:
    1.sentences: can be a plain list; for large corpora, use BrownCorpus, Text8Corpus
      or LineSentence instead.
    2.sg: training algorithm; 0 (default) selects CBOW, sg=1 selects skip-gram.
    3.size: dimensionality of the output word vectors, default 100. Larger sizes need
      more training data but give better results; tens to a few hundred is typical.
      (gensim>=4.0 renames this parameter to vector_size.)
    4.window: context window size; e.g. 8 means each word considers the 8 words before
      and after it (the code also samples a random window size <= the maximum). Default 5.
    '''
    model=word2vec.Word2Vec(con_list,sg=1,size=100,window=5,min_count=1,
                            negative=3,sample=0.001, hs=1,workers=4)
    model.save(model_path)
    return con_list,model
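Once trained, word vectors are compared by cosine similarity, which is what gensim's `most_similar` ranks by. The computation itself is just a normalized dot product; a self-contained sketch on toy 3-d vectors:

```python
import math

def cosine(u, v):
    # cosine similarity: dot product divided by the product of the norms
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

print(round(cosine([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]), 3))   # 1.0 (identical direction)
print(round(cosine([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]), 3))   # 0.0 (orthogonal)
```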


def songName2Words(data='neteasy_song_id_to_name_data.csv',save_path='music/songName.txt'):
    '''
    Turn song names into word lists for vectorization
    '''
    with open(data) as f:
        data_list=[one.strip().split(',') for one in f.readlines() if one]
    data_list.pop(0)  #drop the header row
    res_list=[]
    for one in data_list:
        musicId,content=one[0],''.join(one[1:])  #re-join name parts that contained commas
        tmp=content.split('-')
        name,author=tmp[0].replace(' ',''),''.join(tmp[1:]).replace(' ','')
        name2=dataPre(name)
        author2=dataPre(author)
        cut_list=seg(name2)
        cut_list.append(author2)
        one_line=musicId+'|#|'+'/'.join(cut_list).strip()
        res_list.append(one_line)
    with open(save_path,'w') as f:
        for one in res_list:
            f.write(one.strip()+'\n')


def song2Vec(data='music/songName.txt',model_path='music/song2Vec.model'):
    '''
    Train a word2vec model on the segmented song names
    '''
    with open(data) as f:
        data_list=[one.strip() for one in f.readlines() if one]
    data=[]
    for i in range(len(data_list)):
        musicId,content=data_list[i].split('|#|')
        con_list=content.split('/')
        data.append(con_list)
    #train the model
    word2vecModel(data,model_path=model_path)
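The word2vec model yields one vector per token; representing a whole song title means collapsing the per-word vectors into a single vector. Element-wise averaging is a common choice and presumably how the songVec data used below is produced (an assumption); a sketch with hypothetical 4-d vectors:

```python
# Hypothetical per-word vectors; the real pipeline would read them from the
# trained word2vec model (e.g. model.wv[word])
word_vectors = {
    '逍遥': [0.1, 0.2, 0.0, 0.4],
    '叹':   [0.3, 0.0, 0.2, 0.0],
    '胡歌': [0.2, 0.4, 0.4, 0.2],
}

def song_vector(words):
    # element-wise mean of the vectors of the known words
    vecs = [word_vectors[w] for w in words if w in word_vectors]
    n = len(vecs)
    return [sum(col) / n for col in zip(*vecs)]

print([round(x, 2) for x in song_vector(['逍遥', '叹', '胡歌'])])   # [0.2, 0.2, 0.2, 0.2]
```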


def mergeMovie(userVec='movie/userVec.json',songVec='movie/movieVec.json',save_path='movie/dataset.json'):
    '''
    Concatenate the movie dataset vectors into training rows
    '''
    with open(userVec) as U:
        user_vector=json.load(U)
    with open(songVec) as S:
        song_vector=json.load(S)
    #load the rating data
    with open('movie/ratings.dat') as f:
        data_list=[one.strip().split('::') for one in f.readlines()[:50000] if one]
    vector=[]
    for i in range(len(data_list)):
        one_list=[]
        userId,movieId,rating,T=data_list[i]
        try:
            userV=user_vector[userId]
            songV=song_vector[movieId]
            one_list+=userV
            one_list+=songV
            one_list.append(int(rating))
            vector.append(one_list)
        except Exception as e:
            print('Exception: ',e)
    with open(save_path,'w') as f:  #json.dumps returns str, so open in text mode
        f.write(json.dumps(vector))
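The row layout that mergeMovie builds — user vector, then item vector, then the rating label — can be seen in a tiny pure-Python sketch (3-d toy vectors stand in for the real 100-d word2vec outputs):

```python
# Toy stand-ins for the real 100-dimensional word2vec vectors
userV = [0.1, 0.2, 0.3]
songV = [0.4, 0.5, 0.6]
rating = 90

one_list = []
one_list += userV            # user features first
one_list += songV            # then item features
one_list.append(int(rating)) # rating label last

print(len(one_list))   # 7
print(one_list[-1])    # 90
```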

        The code above uses the word2vec vectorization toolkit to generate vector representations of songs and users for the downstream model. A quick look at the resulting feature vectors:

F0,F1,F2,...,F197,F198,F199,rank
-0.00905437208712101,-0.0007396119763143361,0.008964900858700275,...,0.0011655841954052448,-0.0015429649502038956,90.0
-0.00905437208712101,-0.0007396119763143361,0.008964900858700275,...,-0.0018130820244550705,-0.014646890573203564,95.0
-0.00905437208712101,-0.0007396119763143361,0.008964900858700275,...,-0.01083550974726677,-0.1029856726527214,100.0

(Columns truncated with "..." for readability: each row holds the 100-dimensional user vector (F0-F99), the 100-dimensional song vector (F100-F199), and the rating label rank; the rows above belong to one user, so their first 100 features are identical.)
17,0.014019387774169445,-0.01211622916162014,0.020271161571145058,-0.013096032664179802,-0.00019119825446978211,-0.00020770687842741609,-0.0034759303089231253,-0.0021791167091578245,0.03311455622315407,0.024117659777402878,-0.009091646410524845,0.04075819253921509,-0.035506218671798706,0.014370344579219818,0.02386917918920517,95.0

        We then binned and normalized the users' rating data; the result looks like this:

F0,F1,F2,…,F199,rank
-0.00905437208712101,-0.0007396119763143361,0.008964900858700275, … ,4
-0.00905437208712101,-0.0007396119763143361,0.008964900858700275, … ,4
-0.00905437208712101,-0.0007396119763143361,0.008964900858700275, … ,5
-0.00905437208712101,-0.0007396119763143361,0.008964900858700275, … ,4

(each row holds the 200 normalized feature values F0–F199 plus the binned rank; rows are truncated here for readability)
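The binning and normalization above can be sketched in pure NumPy. Note the [0, 1] min–max scaling and the five equal-width rank bins are assumptions for illustration; the post does not state the exact scaler or bin edges it used:

```python
import numpy as np

def minmax_scale(x):
    """Scale a 1-D array to [0, 1] (the same idea as scikit-learn's MinMaxScaler)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def to_rank(scores, n_bins=5):
    """Bin normalized scores into integer ranks 1..n_bins (equal-width bins)."""
    scaled = minmax_scale(scores)
    # interior bin edges; np.digitize assigns each value to its bin index
    edges = np.linspace(0, 1, n_bins + 1)[1:-1]
    return np.digitize(scaled, edges) + 1

raw_scores = [3, 10, 250, 40, 995]      # hypothetical raw user scores
print(to_rank(raw_scores))
```

In practice the project evidently keeps the fitted scaler around, since the training code later calls `scaler.inverse_transform` to map predictions back to the original scale.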

       Before building the deep learning model, we first ran some quick experiments with conventional machine-learning models, starting with a simple analysis and visualization of the features obtained above:

     The performance comparison is as follows:

       The number of features is large, and analysis showed considerable redundancy among some of them, so we performed a preliminary feature-selection step and then repeated the steps above; the resulting charts are shown below:
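One simple way to implement the redundancy-driven feature selection just mentioned is correlation pruning. This is a minimal sketch, not necessarily the method used in the project; the 0.95 threshold and the greedy keep-first strategy are assumptions:

```python
import numpy as np

def drop_redundant(X, threshold=0.95):
    """Return indices of columns to keep: a column is dropped when its
    absolute Pearson correlation with an earlier kept column exceeds
    `threshold` (a simple redundancy filter)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep

rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = rng.normal(size=100)
X = np.column_stack([a, a * 2 + 1e-6, b])   # column 1 is a rescaled copy of column 0
print(drop_redundant(X))                    # column 1 is dropped
```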

       Next comes the deep learning model itself. Many model types would work here; for project reasons I am only releasing a DNN-based baseline. If you are interested, feel free to dig deeper. The concrete implementation is below:

import os
from scipy.stats import pearsonr, spearmanr, kendalltau
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.utils import plot_model

def deepModel(data='dataset.json', saveDir='music/DNN/'):
    '''
    DNN baseline for the rating data.
    getVector, plot_both_loss_acc_pic and calPerformance are helper
    functions defined elsewhere in this project.
    '''
    if not os.path.exists(saveDir):
        os.makedirs(saveDir)
    scaler, X_train, X_test, y_train, y_test = getVector(data=data)
    model = Sequential()
    model.add(Dense(1024, input_dim=X_train.shape[1]))
    model.add(Dropout(0.3))
    model.add(Dense(1024, activation='linear'))
    model.add(Dropout(0.3))
    model.add(Dense(1024, activation='sigmoid'))
    model.add(Dropout(0.3))
    model.add(Dense(1, activation='tanh'))  # alternatives tried: softmax, relu
    optimizer = Adam(lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
    model.compile(loss='mae', optimizer=optimizer)
    early_stopping = EarlyStopping(monitor='val_loss', patience=20)
    checkpointer = ModelCheckpoint(filepath=saveDir + 'checkpointer.hdf5',
                                   verbose=1, save_best_only=True)
    history = model.fit(X_train, y_train, batch_size=128, epochs=50,
                        validation_split=0.3, verbose=1, shuffle=True,
                        callbacks=[checkpointer, early_stopping])
    lossdata, vallossdata = history.history['loss'], history.history['val_loss']
    plot_both_loss_acc_pic(lossdata, vallossdata,
                           picpath=saveDir + 'both_loss_epoch.png')
    # undo the scaling so predictions are back on the original rating scale
    y_predict = model.predict(X_test)
    y_predict_list = scaler.inverse_transform(y_predict.reshape(-1, 1))
    y_true_list = scaler.inverse_transform(y_test.reshape(-1, 1))
    # evaluate and visualize the fit
    y_predict_list = [int(one[0]) for one in y_predict_list.tolist()]
    y_true_list = [int(one[0]) for one in y_true_list.tolist()]
    res_list = calPerformance(y_true_list, y_predict_list)
    print('pearsonr:  ', pearsonr(y_true_list, y_predict_list)[0])
    print('spearmanr: ', spearmanr(y_true_list, y_predict_list)[0])
    print('kendalltau:', kendalltau(y_true_list, y_predict_list)[0])
    model.save(saveDir + 'DL.model')
    plot_model(model, to_file=saveDir + 'model_structure.png', show_shapes=True)
    print('-------------------------model summary---------------------------------')
    model.summary()  # summary() prints the table itself and returns None

      Once the model is built and the feature data generated, we can train it. A screenshot of the training process is shown below:

       With the offline model in hand, we can run recommendation analysis on top of it. The concrete code for recommending to a single specified user is as follows:

import json
import numpy as np
from keras.models import load_model

def singleUserRecommend(userId='2230728513', model_path='results/music/DL/DL.model'):
    '''
    Given a user id, print the recommended songs.
    '''
    # load the song-id -> song-name mapping
    with open('data/music/id_song.json') as f:
        song_dict = json.load(f)
    # load the raw interaction data
    with open('data/music/neteasy_playlist_recommend_data.csv') as f:
        data_list = [one.strip().split(',') for one in f.readlines() if one]
    user_song = {}
    user, song = [], []
    # use fresh loop variables here: the original code overwrote the
    # `userId` argument inside this loop, which broke the lookup below
    for uId, sId, rating, T in data_list:
        user.append(uId)
        song.append(sId)
        user_song.setdefault(uId, []).append(sId)
    user = list(set(user))
    song = list(set(song))
    with open('data/music/user2Vec.json') as U:
        user_vector = json.load(U)
    with open('data/music/song2Vec.json') as S:
        song_vector = json.load(S)
    model = load_model(model_path)
    try:
        one_song_list = user_song[userId]
        # songs the user has not listened to yet
        no_listen_list = [one for one in song if one not in one_song_list]
        one_no_dict = {}
        for one_no in no_listen_list:
            try:
                # concatenate the user vector and song vector as model input
                one = user_vector[userId] + song_vector[one_no]
                score = model.predict(np.array([one]))
                one_no_dict[one_no] = score.tolist()[0]
            except Exception as e:
                print('Exception1: ', e)
        # rank the unheard songs by predicted score and take the top 10
        one_no_sorted = sorted(one_no_dict.items(), key=lambda e: e[1], reverse=True)
        recommend_id_list = [one[0] for one in one_no_sorted][:10]
        for oneId in recommend_id_list:
            print('songId:   ', oneId)
            print('songName: ', song_dict[oneId])
    except Exception as e:
        print('Exception2: ', e)

       The output looks like this:

       Judging from the results, this user seems quite fond of Japanese songs; we have accidentally let slip one of the user's little secrets.

       That wraps up the recommender-system walkthrough shared in this post. If you are interested in this area, you are welcome to get in touch so we can learn and improve together. I hope this article helps you; best wishes for your work and your studies!

2016-03-04 17:00:25 to_fung, 111 views

I have recently been working on music emotion recognition and needed a dataset; this one is said to be fairly comprehensive.

2018-09-08 20:05:06 hz371071798, 3957 views

LSTM (Long Short-Term Memory) is a neural-network model for processing sequence data. It can be applied to language modeling, machine translation, image captioning, automatic music generation, and more, and it holds an indispensable place in deep learning and big data.

Deep Learning

Why we need deep learning

  • With the explosive growth of data and improvements in computing power, traditional neural networks were held back by their own limitations from gaining further efficiency and performance on big-data problems;
  • Traditional machine learning often cannot work on data in its raw form. For image classification, for instance, we would not feed all of an image's pixel values directly into the network; instead, features were first extracted by hand, converted to numeric form, and only then used to train the network.

What deep learning is

  • Start with representation learning: representation learning is a family of methods that let a machine automatically discover, from raw data, the features useful for detection or classification;
  • Deep learning is a form of representation learning with many representation layers, composed of simple but non-linear modules; each layer transforms the representation into a more abstract one and passes it to the next layer.

How to build a deep learning model

Two model families are the most common in deep learning:

  • CNN (Convolutional Neural Network): it can be seen as an upgraded version of the standard neural network. It contains convolutional layers, pooling layers, and fully-connected layers, a structure that lets it take an image's raw pixel values directly, with no manual feature extraction;
  • RNN (Recurrent Neural Network): a model quite different from the CNN, dedicated to modeling the dynamic behavior of sequence data.
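To make the convolutional-layer idea concrete, here is a minimal "valid" 2-D convolution in NumPy (like most deep-learning libraries, it actually computes cross-correlation); the edge-detecting kernel and the toy image are illustrative assumptions:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution: slide the kernel over the image and take
    the elementwise product-sum at every position."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# a horizontal difference kernel fires on the boundary of a half-white image,
# which is exactly the "edge detector" behavior of early CNN layers
img = np.zeros((5, 6))
img[:, 3:] = 1.0
edge = np.array([[1.0, -1.0]])
print(conv2d(img, edge))
```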

Recurrent Neural Networks (RNN)

An RNN is an artificial neural network whose node connections form a directed graph along a sequence.

It has three main characteristics:

  • it processes the input sequence together with an internal state;
  • it exhibits the temporal dynamic behavior of a time sequence;
  • it handles continuous, correlated tasks.

Algorithms

Forward propagation

Compared with a conventional neural network, the key difference of an RNN is that it maintains an internal state. When a new element of the data sequence arrives, the output is computed not only from the input but also from the internal state at the previous time step, and that internal state is updated along the way.

Suppose the time sequence x consists of x1, x2, x3, ...: when x1 is fed in, a0 is updated to a1 and y_hat1 is computed; when x2 is fed in, a1 is updated to a2 and y_hat2 is computed; and so on.
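The recurrence just described (a_t computed from a_{t-1} and x_t, with y_hat_t read off a_t) can be sketched as follows; the tanh state update and linear read-out are the common textbook choice and an assumption here, along with the toy dimensions:

```python
import numpy as np

def rnn_step(x_t, a_prev, Wax, Waa, Wya, ba, by):
    """One RNN time step: new hidden state a_t and output y_hat_t."""
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)
    y_t = Wya @ a_t + by    # a linear read-out; tasks often add a softmax
    return a_t, y_t

rng = np.random.default_rng(1)
n_x, n_a, n_y = 3, 4, 2                 # toy input/state/output sizes
Wax = rng.normal(size=(n_a, n_x))
Waa = rng.normal(size=(n_a, n_a))
Wya = rng.normal(size=(n_y, n_a))
ba, by = np.zeros(n_a), np.zeros(n_y)

a = np.zeros(n_a)                       # a0
for x in rng.normal(size=(5, n_x)):     # x1 .. x5
    a, y_hat = rnn_step(x, a, Wax, Waa, Wya, ba, by)
print(a.shape, y_hat.shape)
```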

Backpropagation

Like ordinary neural networks, an RNN is trained with backpropagation, updating its weights and biases to minimize a loss function so that the network output fits the real data as closely as possible.

In practice, RNN frameworks implement this algorithm for you; as a user you do not need to understand the internal math, only how to set a few key parameters.

Types

RNNs come in four types.

One-to-one

This RNN is equivalent to a conventional neural network: both the input and the output are a single item.

Many-to-one

This RNN consumes a data sequence but produces only a final result. For example, to judge the sentiment of a sentence, the network takes the sentence's words as a sequence, one at a time, and finally produces a value corresponding to the sentiment.

One-to-many

The opposite of many-to-one: this RNN takes a single input but produces a series of consecutive outputs. In automatic music generation, for instance, we feed in a value corresponding to a genre, and the RNN outputs the notes of an entire song in that genre.

Many-to-many

This is the most common type: it takes a data sequence in and puts a data sequence out. The number of elements in the input sequence and in the output sequence may or may not be equal.

Vanishing gradients

Like other neural networks trained with backpropagation, RNNs suffer from the vanishing-gradient problem: when the data sequence is long enough, errors at late time steps can hardly influence the computation at earlier time steps. This also means an RNN struggles to connect context that is far apart, a problem known as the long-term dependency problem.

For example, take the two sentences

The cat, which already had fish and milk, is full.

The cats, which already had fish and milk, are full.

Here the RNN may be unable to decide whether the verb should be singular or plural, because the verb sits relatively far from its subject.

LSTM (Long Short-Term Memory) networks were therefore designed to solve this problem.
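The vanishing-gradient effect can be seen numerically: backpropagating through T time steps multiplies T Jacobian factors, each of magnitude below 1 here, so the gradient signal decays geometrically. This is a hypothetical scalar RNN with an assumed weight w = 0.5, purely for illustration:

```python
import numpy as np

# Scalar RNN a_t = tanh(w * a_{t-1} + x_t); backprop through T steps
# multiplies T factors d a_t / d a_{t-1} = w * (1 - a_t**2).
w = 0.5
a, grad, grads = 0.0, 1.0, []
for x in np.linspace(-1, 1, 50):
    a = np.tanh(w * a + x)
    grad *= w * (1 - a ** 2)   # each factor has magnitude below 1
    grads.append(abs(grad))
print(grads[0], grads[-1])     # the gradient w.r.t. early steps has all but vanished
```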


Long Short-Term Memory networks (LSTM)

LSTM is a variant of the RNN with the ability to solve the long-term dependency problem.

Compared with a plain RNN, an LSTM replaces the internal memory with a cell, and the data it maintains there is called the cell state. The cell state runs through the entire LSTM with only a few minor linear interactions, which lets information flow along largely unchanged.

Moreover, where a plain RNN has a single network layer, an LSTM has four, interacting in a particular way. A standard LSTM unit consists of the following four parts.

Architecture

Forget Gate

First, when new data enters the LSTM, we decide which old information to throw away from the cell state. This is decided by the forget gate, a sigmoid layer.

Input Gate

The second step is deciding which new information to store in the cell state, in two parts. First a sigmoid layer, the input gate, decides which values to update; then a tanh layer builds a vector of candidate values to add to the cell state.

Cell

Now we can update the cell state, again in two steps. First we remove the information the forget gate decided to drop; then we add the input gate's candidate values, scaled by how much we decided to update each state value.

Output Gate

Finally, we decide what to output. The output is based on the new cell state, suitably processed. First a sigmoid layer decides which parts of the cell state to output; then we pass the cell state through a tanh (to squash the values into the (-1, 1) range) and multiply it by the sigmoid layer's output to obtain the final output.
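The four parts above can be written out directly. This is a minimal NumPy sketch of a single LSTM step; the stacked weight layout and the toy sizes are assumptions for compactness:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W: (4*n_h, n_h + n_x) stacked gate weights,
    b: (4*n_h,) stacked biases, in the order f, i, g, o."""
    n_h = h_prev.size
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[:n_h])               # forget gate: what to drop from c
    i = sigmoid(z[n_h:2 * n_h])        # input gate: what to write
    g = np.tanh(z[2 * n_h:3 * n_h])    # candidate values
    o = sigmoid(z[3 * n_h:])           # output gate
    c_t = f * c_prev + i * g           # update the cell state
    h_t = o * np.tanh(c_t)             # squash to (-1, 1), then gate the output
    return h_t, c_t

rng = np.random.default_rng(0)
n_x, n_h = 3, 4
W = rng.normal(scale=0.1, size=(4 * n_h, n_h + n_x))
b = np.zeros(4 * n_h)
h, c = np.zeros(n_h), np.zeros(n_h)
for x in rng.normal(size=(6, n_x)):
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)
```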

LSTM variants

In practice, almost every paper that touches LSTM tweaks it in some way; here, briefly, are the three most common variants.

Adding "peephole connections"

Coupling the forget gate with the input gate

GRU (Gated Recurrent Unit)
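Of the three variants listed above, the GRU is easy to show in the same style: an update gate z and a reset gate r replace the LSTM's three gates, and the hidden state doubles as the memory. Again a minimal sketch with assumed toy sizes:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gru_step(x_t, h_prev, Wz, Wr, Wh):
    """One GRU step: no separate cell state, just a gated hidden state."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ hx)    # update gate: how much new content to blend in
    r = sigmoid(Wr @ hx)    # reset gate: how much old state the candidate sees
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]))
    return (1 - z) * h_prev + z * h_cand

rng = np.random.default_rng(0)
n_x, n_h = 3, 4
Wz, Wr, Wh = (rng.normal(scale=0.1, size=(n_h, n_h + n_x)) for _ in range(3))
h = np.zeros(n_h)
for x in rng.normal(size=(5, n_x)):
    h = gru_step(x, h, Wz, Wr, Wh)
print(h.shape)
```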

Applications of LSTM

LSTM has become one of the go-to methods for sequence problems with long-term dependencies:

  • language models (character- or word-level);
  • machine translation;
  • image captioning;
  • automatic image generation;
  • automatic music generation;
  • ............

References


English version of this post: https://ertong.blog/2018/09/07/deep-into-lstm-from-deep-learning

2018-02-16 22:26:40 lqfarmer, 1372 views

Course description:

This is an introductory course on deep learning methods; deep learning is mainly applied to machine translation, image recognition, games, image generation, and so on. The course also includes two very fun hands-on projects:

(1) Music generation with an RNN

(2) Basic detection on X-ray images; GitHub: github.com/aamini/intro

All the course videos, slides (PPT), and accompanying code are linked at the end of this post.



Course schedule:

Session 1

Part 1: Deep learning in detail

Part 2: Deep sequence modeling in detail

Lab 1: Music generation with an RNN


Session 2

Part 1: Deep computer vision in detail

Part 2: Deep generative models in detail

Lab 2: Basic detection on X-ray images


Session 3

Part 1: Deep reinforcement learning in detail

Part 2: Limitations of deep learning and future research directions


Session 4

Part 1: Guest Lecture: Google

Part 2: Guest Lecture: NVIDIA


Session 5

Part 1: Guest Lecture: IBM

Part 2: Guest Lecture: Tencent


Video and slide (PPT) download:

Link: pan.baidu.com/s/1qZ0KDt

Password: reply "mdl" to the official WeChat account to receive the password


Recommended past content:

OpenAI-2018年强化学习领域7大最新研究方向全盘点

麻省理工学院(MIT)-2018年最新自动驾驶视频课程分享

最前沿的深度学习论文、架构及资源分享

<模型汇总-6>堆叠自动编码器Stacked_AutoEncoder-SAE

<模型汇总_5>生成对抗网络GAN及其变体SGAN_WGAN_CGAN_DCGAN_InfoGAN_StackGAN

纯干货15 48个深度学习相关的平台和开源工具包,一定有很多你不知道的!!!

模型汇总19 强化学习(Reinforcement Learning)算法基础及分类

吴恩达-斯坦福CS229机器学习课程-2017(秋)最新课程分享

神经机器翻译(NMT)的一些重要资源分享

《纯干货16》调整学习速率以优化神经网络训练

《模型汇总-20》深度学习背后的秘密:初学者指南-深度学习激活函数大全

模型汇总22 机器学习相关基础数学理论、概念、模型思维导图分享
