  • Hands-on recommender systems with surprise

    2019-09-27 17:54:49
    Collaborative filtering

    # http://surprise.readthedocs.io/en/stable/index.html
    # http://files.grouplens.org/datasets/movielens/ml-100k-README.txt
    from surprise import KNNBasic,SVD
    from surprise import Dataset
    from surprise.model_selection import cross_validate
    
    # Load the movielens-100k dataset
    data = Dataset.load_builtin('ml-100k')
    # collaborative filtering with a basic KNN model
    algo = KNNBasic()
    cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=3, verbose=True)
    

    Grid search with cross-validation

    from surprise.model_selection import GridSearchCV
    
    # lr_all: learning rate; reg_all: regularization strength
    param_grid = {'n_epochs': [5, 10], 'lr_all': [0.002, 0.005],
                  'reg_all': [0.4, 0.6]}
    
    # rmse: root mean squared error; fcp: fraction of concordant pairs
    grid_search = GridSearchCV(SVD, param_grid, measures=['rmse', 'fcp'], cv=3)
    grid_search.fit(data)
    
    # best RMSE score
    print(grid_search.best_score['rmse'])
    
    # combination of parameters that gave the best RMSE score
    print(grid_search.best_params['rmse'])
    
    # best FCP score
    print(grid_search.best_score['fcp'])
    
    # combination of parameters that gave the best FCP score
    print(grid_search.best_params['fcp'])
    


    import pandas as pd  
    # convert the results dict into a pandas DataFrame
    results_df = pd.DataFrame.from_dict(grid_search.cv_results)
    results_df
    

    Finding similar items and printing recommendations

    import io  # needed because of weird encoding of u.item file
    
    from surprise import KNNBaseline
    from surprise import Dataset
    from surprise import get_dataset_dir
    
    
    def read_item_names():
        """Read the u.item file from MovieLens 100-k dataset and return two
        mappings to convert raw ids into movie names and movie names into raw ids.
        """
    
        file_name = get_dataset_dir() + '/ml-100k/ml-100k/u.item'
        rid_to_name = {}
        name_to_rid = {}
        with io.open(file_name, 'r', encoding='ISO-8859-1') as f:
            for line in f:
                line = line.split('|')
                rid_to_name[line[0]] = line[1]
                name_to_rid[line[1]] = line[0]
    
        return rid_to_name, name_to_rid
    
    
    # First, train the algorithm to compute the similarities between items
    data = Dataset.load_builtin('ml-100k')
    # The ratings come one per line; build_full_trainset converts them into
    # the sparse user-item matrix the algorithm trains on
    trainset = data.build_full_trainset()
    # Pearson-baseline similarity; 'user_based': False means item-based CF
    sim_options = {'name': 'pearson_baseline', 'user_based': False}
    algo = KNNBaseline(sim_options=sim_options)
    # train
    algo.fit(trainset)
    
    # Item-based CF works on ids, but we have movie names at hand
    rid_to_name, name_to_rid = read_item_names()
    
    # Find the movies closest to 'Toy Story (1995)'. The algorithm does not
    # understand names directly, so look up the raw id with the mapping above
    toy_story_raw_id = name_to_rid['Toy Story (1995)']
    
    # Raw ids are the ids in the data file; after build_full_trainset they
    # must be converted to the trainset's inner (matrix) ids
    toy_story_inner_id = algo.trainset.to_inner_iid(toy_story_raw_id)
    
    # Retrieve the k=10 nearest neighbors
    toy_story_neighbors = algo.get_neighbors(toy_story_inner_id, k=10)
    
    # Convert inner ids back to raw ids, then raw ids to movie names
    toy_story_neighbors = (algo.trainset.to_raw_iid(inner_id)
                           for inner_id in toy_story_neighbors)
    toy_story_neighbors = (rid_to_name[rid]
                           for rid in toy_story_neighbors)
    
    print()
    print('The 10 nearest neighbors of Toy Story are:')
    for movie in toy_story_neighbors:
        print(movie)
    

  • The surprise recommendation toolkit: the Baseline and Slope One algorithms

    The surprise recommendation toolkit

    surprise ships with many recommendation algorithms.
    This post covers two of them: Baseline and Slope One.

    The Baseline algorithm

    Paper: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.476.4158&rep=rep1&type=pdf
    The Baseline algorithm predicts ratings from statistical baseline scores:

        b_ui = mu + b_u + b_i

    where b_ui is the predicted rating, mu the global mean rating, b_u the
    user bias, and b_i the item bias.
    Following the paper's example:
    1. The average rating over all movies is 3.7, so mu = 3.7.
    2. Titanic is a good movie, rated about 0.5 above the average, so b_i = 0.5.
    3. Joe, however, is a critical rater who scores about 0.3 below average, so b_u = -0.3.
    The predicted rating of Joe for Titanic is therefore 3.7 + 0.5 - 0.3 = 3.9.
    The objective function to minimize is

        sum over rated (u,i) of (r_ui - (mu + b_u + b_i))^2 + lambda * (sum of b_u^2 + sum of b_i^2)

    There are two sets of variables, b_u and b_i, and the objective can be
    optimized with ALS or SGD.
    Optimizing with ALS:
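    The arithmetic of the worked example can be checked in a few lines; a minimal sketch where mu, b_u and b_i are the example's illustrative numbers, not learned parameters:

    ```python
    mu = 3.7    # global mean rating over all movies
    b_i = 0.5   # Titanic's item bias: rated about 0.5 above average
    b_u = -0.3  # Joe's user bias: rates about 0.3 below average

    # baseline prediction: b_ui = mu + b_u + b_i
    b_ui = mu + b_u + b_i
    print(round(b_ui, 1))  # 3.9
    ```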

    Step 1: fix b_u, optimize b_i
    Step 2: fix b_i, optimize b_u
    For more on the ALS and SGD optimizers, see:
    Recommendation algorithms: Matrix Factorization (MF)
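    The two alternating steps can be sketched in plain Python. This is a toy illustration of the closed-form ALS updates only; the ratings are made up, and reg_u = 12, reg_i = 5 simply mirror the values used in the code later in this post:

    ```python
    # Toy (user, item, rating) triples -- hypothetical data for illustration
    ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 2.0), (2, 0, 3.0)]
    n_users, n_items = 3, 2
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean

    reg_u, reg_i = 12, 5  # regularization for user and item biases
    bu = [0.0] * n_users
    bi = [0.0] * n_items

    for _ in range(10):
        # Step 1: fix bu, solve each bi in closed form
        for i in range(n_items):
            devs = [r - mu - bu[u] for u, j, r in ratings if j == i]
            bi[i] = sum(devs) / (reg_i + len(devs))
        # Step 2: fix bi, solve each bu in closed form
        for u in range(n_users):
            devs = [r - mu - bi[j] for v, j, r in ratings if v == u]
            bu[u] = sum(devs) / (reg_u + len(devs))

    # baseline prediction for user 0 on item 1
    print(mu + bu[0] + bi[1])
    ```

    Item 0 is rated above average and item 1 below it, so after the updates bi[0] comes out positive and bi[1] negative, as expected.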

    The Slope One algorithm

    Paper: https://arxiv.org/pdf/cs/0702144.pdf
    Slope One is an item-based collaborative filtering algorithm.
    In the paper's example, to predict user B's rating of item j we look at
    the deviation between item i and item j: user A rated item j 0.5 higher
    than item i (1.5 vs 1), so user B's predicted rating of item j is also
    0.5 above their item i rating: 2 + (1.5 - 1) = 2.5.
    The Slope One algorithm:
    Step 1: for each item pair, compute the mean rating difference over the
    users who rated both items; call this the rating deviation.
    Step 2: from the item deviations and the user's historical ratings,
    predict the user's rating for each unrated item.
    Step 3: sort the predicted ratings and recommend the top-N items.
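    Applied to the paper's toy example (user A rates item i = 1 and item j = 1.5; user B rates item i = 2), Steps 1 and 2 can be sketched as:

    ```python
    # Toy user -> item -> rating data, mirroring the example above
    ratings = {
        'userA': {'item_i': 1.0, 'item_j': 1.5},
        'userB': {'item_i': 2.0},
    }

    # Step 1: mean rating deviation dev(j, i) over users who rated both items
    def deviation(j, i):
        diffs = [r[j] - r[i] for r in ratings.values() if j in r and i in r]
        return sum(diffs) / len(diffs)

    # Step 2: predict user B's rating of item j from their item i rating
    pred = ratings['userB']['item_i'] + deviation('item_j', 'item_i')
    print(pred)  # 2.5
    ```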

    Applying Baseline and Slope One to the MovieLens dataset

    # dataset: https://www.kaggle.com/jneupane12/movielens/download
    # baseline paper: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.476.4158&rep=rep1&type=pdf
    # surprise docs: https://surprise.readthedocs.io/en/stable/
    from surprise import Dataset
    from surprise import Reader
    from surprise import BaselineOnly
    from surprise import accuracy
    from surprise.model_selection import KFold
    
    # read the data
    reader = Reader(line_format='user item rating timestamp', sep=',', skip_lines=1)
    data = Dataset.load_from_file('./ratings.csv', reader=reader)
    train_set = data.build_full_trainset()
    
    # optimize with ALS (other methods such as 'sgd' can be chosen);
    # reg_u and reg_i regularize the user and item biases
    bsl_options = {'method': 'als', 'n_epochs': 5, 'reg_u': 12, 'reg_i': 5}
    model = BaselineOnly(bsl_options=bsl_options)
    
    # k-fold cross-validation
    kf = KFold(n_splits=5)
    for trainset, testset in kf.split(data):
        model.fit(trainset)
        pred = model.test(testset)
        # compute RMSE
        accuracy.rmse(pred)
    
    uid = str(300)
    iid = str(180)
    
    # predict uid's rating for iid
    pred = model.predict(uid, iid, r_ui=4, verbose=True)
    

    Output:

    Estimating biases using als...
    RMSE: 0.8586
    Estimating biases using als...
    RMSE: 0.8632
    Estimating biases using als...
    RMSE: 0.8626
    Estimating biases using als...
    RMSE: 0.8641
    Estimating biases using als...
    RMSE: 0.8616
    user: 300        item: 180        r_ui = 4.00   est = 3.58   {'was_impossible': False}
    
    # Slope One paper: https://arxiv.org/pdf/cs/0702144.pdf
    from surprise import Dataset
    from surprise import Reader
    from surprise import SlopeOne
    from surprise import accuracy
    
    reader = Reader(line_format='user item rating timestamp', sep=',', skip_lines=1)
    data = Dataset.load_from_file('./ratings.csv', reader=reader)
    trainset = data.build_full_trainset()
    
    # use the Slope One algorithm
    algo = SlopeOne()
    algo.fit(trainset)
    # predict the rating for a given user and item
    uid = str(200)
    iid = str(100)
    pred = algo.predict(uid, iid, r_ui=4, verbose=True)
    

    Output:

    user: 200        item: 100        r_ui = 4.00   est = 3.41   {'was_impossible': False}
    

    Code repository: surprise recommendation toolkit

  • Installing and using scikit-surprise

    Install directly with pip:

    pip install scikit-surprise

    1. A small first example (print_perf and evaluate no longer exist in
    recent surprise versions; use cross_validate instead and print the
    returned score dict)

    from surprise import Dataset
    from surprise import KNNBasic
    from surprise.model_selection import cross_validate
    
    data = Dataset.load_builtin('ml-100k')
    
    ### use KNNBasic
    algo = KNNBasic()
    
    # cross_validate returns a dict of per-fold scores and timings
    perf = cross_validate(algo, data, measures=['RMSE', 'MAE'])
    
    print(perf)

    Output: {'test_rmse': array([0.98543101, 0.98183076, 0.97598058, 0.97887157, 0.97005494]), 'test_mae': array([0.7777833 , 0.7760822 , 0.76922771, 0.77475951, 0.76503628]), 'fit_time': (0.5605289936065674, 0.5774590969085693, 0.6522552967071533, 0.658240795135498, 0.687117338180542), 'test_time': (4.028203248977661, 3.928906202316284, 4.177773714065552, 4.001417398452759, 4.107551574707031)}

    2. Top-10 movie recommendation example (maintain two dicts: one mapping id → movie name, the other movie name → id)

    import os
    from surprise import KNNBaseline
    import io
    from surprise import Dataset
    
    # step 1 : train model
    def TrainModel():
        data = Dataset.load_builtin('ml-100k')
        trainset = data.build_full_trainset()
        # use pearson_baseline to compute similarity
        sim_options = {'name' : 'pearson_baseline', 'user_based' : False}
        algo = KNNBaseline(sim_options=sim_options)
        # train
        algo.fit(trainset)
        return algo
    
    # step 2 : get id_name and name_id
    def Get_Dict():
        file_name = os.path.expanduser(r'C:\Users\lonng\.surprise_data\ml-100k\ml-100k\u.item')
        id_name = {}
        name_id = {}
        with open(file_name, 'r', encoding='ISO-8859-1') as f:
            for line in f:
                line = line.split('|')
                id_name[line[0]] = line[1]
                name_id[line[1]] = line[0]
    #     print(id_name,name_id)
        return id_name, name_id
    
    # step 3 : recommend movies based on the model
    def RecommendMovie(movieName, algo, id_name, name_id, recommendNum):
        # get the movie's raw id
        raw_id = name_id[movieName]
        print("raw id:", raw_id)
        # translate raw_id to inner_id
        inner_id = algo.trainset.to_inner_iid(raw_id)
        print("inner id:", inner_id)
        # find the nearest-neighbor items
        recommendations = algo.get_neighbors(inner_id, recommendNum)
        print("neighbor inner ids:", recommendations)
        # translate inner_ids back to raw_ids
        raw_ids = [algo.trainset.to_raw_iid(inner_id) for inner_id in recommendations]
        print("neighbor raw ids:", raw_ids)
        # look up the movie names
        movies = [id_name[raw_id] for raw_id in raw_ids]
        for movie in movies:
            print(movie)
    
    if __name__ == '__main__':
        id_name, name_id = Get_Dict()
        algo = TrainModel()
        RecommendMovie('Craft, The (1996)', algo, id_name, name_id, 10)
    
    

    3. Reading your own data file

    import os
    from surprise import Reader, Dataset
    from surprise.model_selection import KFold
    # path to the data file
    file_path = os.path.expanduser('./popular_music_suprise_format.txt')
    # describe the file format
    reader = Reader(line_format='user item rating timestamp', sep=',')
    # load the dataset from the file
    music_data = Dataset.load_from_file(file_path, reader=reader)
    # 5-fold split (Dataset.split was removed; use model_selection.KFold)
    kf = KFold(n_splits=5)
    
  • Supports many recommendation algorithms: SVD, PMF, SVD++, NMF, neighborhood methods, and baseline algorithms
  • The steps of collaborative filtering, and a comparison of user-based vs. item-based CF

    What steps are needed to implement collaborative filtering?

    1. Collect user preferences
    2. Find similar users or items
    3. Compute recommendations
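    A minimal sketch of these three steps for user-based CF, using cosine similarity and a made-up rating matrix (all numbers are illustrative, not from any dataset):

    ```python
    from math import sqrt

    # Step 1: collected user preferences (0.0 means "not rated")
    R = [
        [5.0, 3.0, 0.0, 1.0],   # user 0 has not rated item 2
        [4.0, 0.0, 4.0, 1.0],
        [1.0, 1.0, 5.0, 5.0],
    ]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

    # Step 2: similarity of each other user to user 0
    sims = {u: cosine(R[0], R[u]) for u in range(1, len(R))}

    # Step 3: predict user 0's rating of item 2 as a similarity-weighted
    # average over the users who did rate it
    raters = [u for u in sims if R[u][2] > 0]
    pred = sum(sims[u] * R[u][2] for u in raters) / sum(sims[u] for u in raters)
    print(round(pred, 2))
    ```

    The prediction lands between the two observed ratings for item 2, pulled toward the rating of the more similar user.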


    Comparing user-based and item-based collaborative filtering


    Example:

    # Note: evaluate, print_perf and Dataset.split were removed in recent
    # surprise versions; this example needs an old (<= 1.0.x) release. The
    # modern equivalent is cross_validate from surprise.model_selection.
    from surprise import KNNBasic, SVD
    from surprise import Dataset
    from surprise import evaluate, print_perf
    # http://surprise.readthedocs.io/en/stable/index.html
    # http://files.grouplens.org/datasets/movielens/ml-100k-README.txt
    
    # Load the movielens-100k dataset (download it if needed),
    # and split it into 3 folds for cross-validation.
    data = Dataset.load_builtin('ml-100k')
    data.split(n_folds=3)
    
    # We'll use the famous KNNBasic algorithm.
    algo = KNNBasic()
    
    # Evaluate performances of our algorithm on the dataset.
    perf = evaluate(algo, data, measures=['RMSE', 'MAE'])
    
    print_perf(perf)
    
    Evaluating RMSE, MAE of algorithm KNNBasic.
    
    ------------
    Fold 1
    Computing the msd similarity matrix...
    Done computing similarity matrix.
    RMSE: 0.9876
    MAE:  0.7807
    ------------
    Fold 2
    Computing the msd similarity matrix...
    Done computing similarity matrix.
    RMSE: 0.9871
    MAE:  0.7796
    ------------
    Fold 3
    Computing the msd similarity matrix...
    Done computing similarity matrix.
    RMSE: 0.9902
    MAE:  0.7818
    ------------
    ------------
    Mean RMSE: 0.9883
    Mean MAE : 0.7807
    ------------
    ------------
            Fold 1  Fold 2  Fold 3  Mean    
    MAE     0.7807  0.7796  0.7818  0.7807  
    RMSE    0.9876  0.9871  0.9902  0.9883  
    

    Matrix factorization model

    # Note: GridSearch was removed in recent surprise versions; use
    # GridSearchCV from surprise.model_selection instead.
    from surprise import GridSearch
    
    param_grid = {'n_epochs': [5, 10], 'lr_all': [0.002, 0.005],
                  'reg_all': [0.4, 0.6]}
    grid_search = GridSearch(SVD, param_grid, measures=['RMSE', 'FCP'])
    data = Dataset.load_builtin('ml-100k')
    data.split(n_folds=3)
    
    grid_search.evaluate(data)
    
    ------------
    Parameters combination 1 of 8
    params:  {'lr_all': 0.002, 'n_epochs': 5, 'reg_all': 0.4}
    ------------
    Mean RMSE: 0.9972
    Mean FCP : 0.6843
    ------------
    ------------
    Parameters combination 2 of 8
    params:  {'lr_all': 0.005, 'n_epochs': 5, 'reg_all': 0.4}
    ------------
    Mean RMSE: 0.9734
    Mean FCP : 0.6946
    ------------
    ------------
    Parameters combination 3 of 8
    params:  {'lr_all': 0.002, 'n_epochs': 10, 'reg_all': 0.4}
    ------------
    Mean RMSE: 0.9777
    Mean FCP : 0.6926
    ------------
    ------------
    Parameters combination 4 of 8
    params:  {'lr_all': 0.005, 'n_epochs': 10, 'reg_all': 0.4}
    ------------
    Mean RMSE: 0.9635
    Mean FCP : 0.6987
    ------------
    ------------
    Parameters combination 5 of 8
    params:  {'lr_all': 0.002, 'n_epochs': 5, 'reg_all': 0.6}
    ------------
    Mean RMSE: 1.0029
    Mean FCP : 0.6875
    ------------
    ------------
    Parameters combination 6 of 8
    params:  {'lr_all': 0.005, 'n_epochs': 5, 'reg_all': 0.6}
    ------------
    Mean RMSE: 0.9820
    Mean FCP : 0.6953
    ------------
    ------------
    Parameters combination 7 of 8
    params:  {'lr_all': 0.002, 'n_epochs': 10, 'reg_all': 0.6}
    ------------
    Mean RMSE: 0.9860
    Mean FCP : 0.6943
    ------------
    ------------
    Parameters combination 8 of 8
    params:  {'lr_all': 0.005, 'n_epochs': 10, 'reg_all': 0.6}
    ------------
    Mean RMSE: 0.9733
    Mean FCP : 0.6991
    ------------
    

    Evaluating the results

    import pandas as pd  
    
    results_df = pd.DataFrame.from_dict(grid_search.cv_results)
    results_df
    


    Item-based collaborative filtering

    from __future__ import (absolute_import, division, print_function,
                            unicode_literals)
    import os
    import io
    
    from surprise import KNNBaseline
    from surprise import Dataset
    
    
    def read_item_names():
        """Map raw movie ids to names and back, from a local copy of u.item."""
        file_name = './ml-100k/u.item'
        rid_to_name = {}
        name_to_rid = {}
        with io.open(file_name, 'r', encoding='ISO-8859-1') as f:
            for line in f:
                line = line.split('|')
                rid_to_name[line[0]] = line[1]
                name_to_rid[line[1]] = line[0]
    
        return rid_to_name, name_to_rid
    
    
    
    data = Dataset.load_builtin('ml-100k')
    trainset = data.build_full_trainset()
    sim_options = {'name': 'pearson_baseline', 'user_based': False}
    algo = KNNBaseline(sim_options=sim_options)
    algo.fit(trainset)  # train() was renamed to fit() in newer surprise versions
    

    Recommending 10 movies the user may like

    # look up Toy Story's inner id, then fetch its 10 nearest neighbors
    rid_to_name, name_to_rid = read_item_names()
    toy_story_inner_id = algo.trainset.to_inner_iid(name_to_rid['Toy Story (1995)'])
    toy_story_neighbors = algo.get_neighbors(toy_story_inner_id, k=10)
    toy_story_neighbors
    
    toy_story_neighbors = (algo.trainset.to_raw_iid(inner_id)
                           for inner_id in toy_story_neighbors)
    toy_story_neighbors = (rid_to_name[rid]
                           for rid in toy_story_neighbors)
    
    print()
    print('The 10 nearest neighbors of Toy Story are:')
    for movie in toy_story_neighbors:
        print(movie)
    
    The 10 nearest neighbors of Toy Story are:
    While You Were Sleeping (1995)
    Batman (1989)
    Dave (1993)
    Mrs. Doubtfire (1993)
    Groundhog Day (1993)
    Raiders of the Lost Ark (1981)
    Maverick (1994)
    French Kiss (1995)
    Stand by Me (1986)
    Net, The (1995)
    
  • Surprise: a Python recommender-system library (1)
  • An introduction to the Surprise library
  • A tutorial on the surprise recommender library
  • Troubleshooting the surprise installation ("Microsoft Visual C++ 14.0 is required")
  • Surprise's built-in recommendation algorithms
  • Python recommender systems with Surprise
  • The Python recommender library Surprise
  • Recommender systems with Python + surprise
  • A movie recommender built on the Python library surprise
