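  • Recommender System Project in Practice. I strongly recommend this book: the material is comprehensive and well organized. For the new year I am learning something new by working through it, and I plan to finish in two weeks. Dataset Code # -*- coding:utf-8 -*- """ Author: Thinkgamer Desc: Listing 2-1 ...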

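    Recommender System Project in Practice

    I strongly recommend this book: the material is comprehensive and well organized.
    For the new year I am learning something new by working through this book, and I plan to finish it in two weeks.

    1. Dataset. Link: https://pan.baidu.com/s/1MVsdKM2q6cq-mL_I5DOt7A
      Extraction code: 0tqo

    2. Code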
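    # -*- coding:utf-8 -*-
    
    """
        Author: Thinkgamer
        Desc:
            Listing 2-1  Example 1: build your first recommender system, a movie recommender
            Randomly selects 1000 users from the data set for the computation
    """
    import os
    import json
    import random
    import math
    
    class FirstRec:
        """
            Constructor arguments
                file_path: path to the raw data files
                seed: seed for the random number generator
                k: number of nearest-neighbor users to use
                n_items: number of movies to recommend to each user
        """
        def __init__(self,file_path,seed,k,n_items):
            # Set every attribute before calling the helper methods that read them
            self.file_path = file_path
            self.seed = seed
            self.k = k
            self.n_items = n_items
            # Seed before sampling so the user selection can be repeated
            random.seed(self.seed)
            self.users_1000 = self.__select_1000_users()
            self.train,self.test = self._load_and_split_data()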
    
        # Collect all users and randomly select 1000 of them
        def __select_1000_users(self):
            print("Randomly selecting 1000 users!")
            # If the cached split already exists, the user list is not needed
            if os.path.exists("data/train.json") and os.path.exists("data/test.json"):
                return list()
            else:
                users = set()
                # Collect all user IDs
                for file in os.listdir(self.file_path):
                    one_path = "{}/{}".format(self.file_path, file)
                    print("{}".format(one_path))
                    with open(one_path, "r") as fp:
                        for line in fp.readlines():
                            # Skip the "<movieID>:" header line
                            if line.strip().endswith(":"):
                                continue
                            userID, _ , _ = line.split(",")
                            users.add(userID)
                # Randomly select 1000; sorting first gives random.sample a stable input order
                users_1000 = random.sample(sorted(users),1000)
                print(users_1000)
                return users_1000
    
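        # Load the data and split it into a training set and a test set
        def _load_and_split_data(self):
            train = dict()
            test = dict()
            if os.path.exists("data/train.json") and os.path.exists("data/test.json"):
                print("Loading the training and test sets from file")
                with open("data/train.json") as fp:
                    train = json.load(fp)
                with open("data/test.json") as fp:
                    test = json.load(fp)
                print("Finished loading data from file")
            else:
                # Seed the random number generator so every run produces the same split
                random.seed(self.seed)
                # Set membership is O(1), unlike a lookup in a 1000-element list
                selected_users = set(self.users_1000)
                for file in os.listdir(self.file_path):
                    one_path = "{}/{}".format(self.file_path, file)
                    print("{}".format(one_path))
                    with open(one_path,"r") as fp:
                        # The first line of each file is "<movieID>:"
                        movieID = fp.readline().split(":")[0]
                        for line in fp.readlines():
                            # strip() first: the raw line ends with a newline, never with ":"
                            if line.strip().endswith(":"):
                                continue
                            userID, rate, _ = line.split(",")
                            # Keep only ratings from the 1000 selected users
                            if userID in selected_users:
                                # Roughly 1 rating in 50 goes to the test set
                                if random.randint(1,50) == 1:
                                    test.setdefault(userID, {})[movieID] = int(rate)
                                else:
                                    train.setdefault(userID, {})[movieID] = int(rate)
                print("Saving data to data/train.json and data/test.json")
                os.makedirs("data", exist_ok=True)
                with open("data/train.json","w") as fp:
                    json.dump(train,fp)
                with open("data/test.json","w") as fp:
                    json.dump(test,fp)
                print("Finished loading data")
            return train,test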
    
        """
            Compute the Pearson correlation coefficient
                rating1: ratings of user 1, in the form {"movieid1":rate1,"movieid2":rate2,...}
                rating2: ratings of user 2, in the form {"movieid1":rate1,"movieid2":rate2,...}
        """
        def pearson(self,rating1,rating2):
            sum_xy = 0
            sum_x = 0
            sum_y = 0
            sum_x2 = 0
            sum_y2 = 0
            num = 0
            for key in rating1.keys():
                if key in rating2.keys():
                    num += 1
                    x = rating1[key]
                    y = rating2[key]
                    sum_xy += x * y
                    sum_x += x
                    sum_y += y
                    sum_x2 += math.pow(x,2)
                    sum_y2 += math.pow(y,2)
            if num == 0:
                return  0
            # Denominator of the Pearson correlation coefficient
            denominator = math.sqrt( sum_x2 - math.pow(sum_x,2) / num) * math.sqrt( sum_y2 - math.pow(sum_y,2) / num )
            if denominator == 0:
                return  0
            else:
                return ( sum_xy - ( sum_x * sum_y ) / num ) / denominator
    
        """
            Recommend movies for the user userID
                userID: the user's ID
        """
        def recommend(self,userID):
            neighborUser = dict()
            for user in self.train.keys():
                if userID != user:
                    distance = self.pearson(self.train[userID],self.train[user])
                    neighborUser[user]=distance
            # Sort the neighbors by similarity, highest first
            newNU = sorted(neighborUser.items(),key = lambda k:k[1] ,reverse=True)
    
            movies = dict()
            for (sim_user,sim) in newNU[:self.k]:
                for movieID in self.train[sim_user].keys():
                    movies.setdefault(movieID,0)
                    movies[movieID] += sim * self.train[sim_user][movieID]
            newMovies = sorted(movies.items(), key = lambda  k:k[1], reverse=True)
            return newMovies
    
        """
            Evaluate the recommender
                num: number of randomly sampled users used to compute precision
        """
        def evaluate(self,num=30):
            print("Starting precision evaluation")
            precisions = list()
            random.seed(10)
            # random.sample needs a sequence; passing dict keys directly fails on Python 3.11+
            for userID in random.sample(list(self.test.keys()),num):
                hit = 0
                result = self.recommend(userID)[:self.n_items]
                for (item,rate) in result:
                    if item in self.test[userID]:
                        hit += 1
                precisions.append(hit/self.n_items)
            return  sum(precisions) / len(precisions)
    
    # Main entry point of the program
    if __name__ == "__main__":
        file_path = "data/netflix/training_set"
        seed = 30
        k = 15
        n_items = 20
        f_rec = FirstRec(file_path,seed,k,n_items)
        # Compute the Pearson correlation between users 195100 and 1547579
        r = f_rec.pearson(f_rec.train["195100"],f_rec.train["1547579"])
        print("Pearson correlation between 195100 and 1547579: {}".format(r))
        # Recommend movies for user 195100
        result = f_rec.recommend("195100")
        print("Movies recommended for user 195100: {}".format(result))
        print("Recommendation precision: {}".format(f_rec.evaluate()))
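As a quick sanity check on the `pearson` method, the sketch below applies the same single-pass formula to toy rating dictionaries and compares it with the textbook mean-centered definition. The function names and the ratings are made up for illustration and are not part of the book's code.

```python
import math

def pearson_single_pass(r1, r2):
    """Single-pass Pearson over co-rated items (same formula as FirstRec.pearson)."""
    common = [k for k in r1 if k in r2]
    n = len(common)
    if n == 0:
        return 0
    sum_x = sum(r1[k] for k in common)
    sum_y = sum(r2[k] for k in common)
    sum_xy = sum(r1[k] * r2[k] for k in common)
    sum_x2 = sum(r1[k] ** 2 for k in common)
    sum_y2 = sum(r2[k] ** 2 for k in common)
    denominator = math.sqrt(sum_x2 - sum_x ** 2 / n) * math.sqrt(sum_y2 - sum_y ** 2 / n)
    if denominator == 0:
        return 0
    return (sum_xy - sum_x * sum_y / n) / denominator

def pearson_centered(r1, r2):
    """Textbook definition: covariance divided by the product of standard deviations."""
    common = [k for k in r1 if k in r2]
    n = len(common)
    if n == 0:
        return 0
    mean_x = sum(r1[k] for k in common) / n
    mean_y = sum(r2[k] for k in common) / n
    cov = sum((r1[k] - mean_x) * (r2[k] - mean_y) for k in common)
    var_x = sum((r1[k] - mean_x) ** 2 for k in common)
    var_y = sum((r2[k] - mean_y) ** 2 for k in common)
    if var_x == 0 or var_y == 0:
        return 0
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy ratings; only m1, m2 and m3 are rated by both users
user_a = {"m1": 5, "m2": 3, "m3": 4, "m4": 1}
user_b = {"m1": 4, "m2": 2, "m3": 5, "m5": 3}

print(pearson_single_pass(user_a, user_b))
print(pearson_centered(user_a, user_b))
# With only two co-rated movies the coefficient is always +1, -1, or undefined (0 here)
print(pearson_single_pass({"m1": 1, "m2": 2}, {"m1": 4, "m2": 5}))
```

The last call is worth noting: when two users share only two rated movies, Pearson can only be ±1 (or undefined). That may be one reason why the recommendation scores in the results of this post are so close to whole numbers, since each score is a sum of similarity times an integer rating.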
    
    3. Results
    Randomly selecting 1000 users!
    Loading the training and test sets from file
    Finished loading data from file
    Pearson correlation between 195100 and 1547579: 0.1194695382178992
    Movies recommended for user 195100: [('3938', 22.0), ('14538', 19.000000000000004), ('14103', 19.0), ('15205', 18.000000000000004), ('17355', 18.0), ('1905', 18.0), ('12317', 16.000000000000004), ('13255', 16.000000000000004), ('5317', 14.000000000000004), ('11283', 14.0), ('14240', 14.0), ('6974', 14.0), ('16265', 14.0), ('6206', 14.0), ('11521', 14.0), ('1145', 13.000000000000005), ('17169', 13.000000000000005), ('9340', 13.000000000000004), ('4306', 13.0), ('11132', 13.0), ('17324', 13.0), ('14313', 12.000000000000002), ('16879', 12.0), ('3917', 12.0), ('7624', 12.0), ('8644', 12.0), ('13593', 12.0), ('6844', 11.000000000000002), ('758', 11.0), ('313', 11.0), ('8393', 11.0), ('11089', 11.0), ('13050', 11.0), ('14454', 11.0), ('16882', 11.0), ('12911', 10.000000000000005), ('15582', 10.000000000000005), ('30', 10.0), ('14621', 10.0), ('16377', 10.0), ('5582', 10.0), ('9628', 10.0), ('3274', 10.0), ('5496', 10.0), ('16082', 10.0), ('10550', 9.999999999999998), ('1220', 9.999999999999998), ('1804', 9.999999999999998), ('12721', 9.999999999999998), ('12672', 9.000000000000005), ('6386', 9.0), ('12918', 9.0), ('13052', 9.0), ('5085', 9.0), ('6030', 9.0), ('7928', 9.0), ('9189', 9.0), ('12293', 9.0), ('14410', 9.0), ('14550', 9.0), ('14574', 9.0), ('223', 9.0), ('12161', 9.0), ('197', 9.0), ('1191', 9.0), ('3427', 9.0), ('13087', 9.0), ('17303', 9.0), ('1110', 9.0), ('15646', 9.0), ('17330', 9.0), ('2452', 9.0), ('3624', 9.0), ('13673', 9.0), ('996', 8.999999999999998), ('5577', 8.999999999999998), ('11022', 8.999999999999998), ('13258', 8.999999999999998), ('2152', 8.000000000000004), ('4972', 8.000000000000004), ('12470', 8.000000000000004), ('6972', 8.0), ('16668', 8.0), ('3756', 8.0), ('4123', 8.0), ('5087', 8.0), ('7406', 8.0), ('10583', 8.0), ('11607', 8.0), ('16452', 8.0), ('3894', 8.0), ('16242', 8.0), ('1406', 7.999999999999999), ('1962', 7.999999999999999), ('2342', 7.999999999999999), ('2862', 7.999999999999999), ('6134', 7.999999999999999), ('6615', 
7.999999999999999), ('15563', 7.999999999999999), ('3638', 7.999999999999999), ('4384', 7.999999999999999), ('9818', 7.999999999999999), ('5320', 7.999999999999999), ('6475', 7.999999999999999), ('6859', 7.999999999999999), ('15063', 7.999999999999999), ('15099', 7.999999999999999), ('15409', 7.999999999999999), ('10729', 7.999999999999998), ('13380', 7.999999999999998), ('11149', 7.000000000000005), ('6287', 7.0000000000000036), ('14712', 7.000000000000001), ('3282', 7.0), ('11677', 7.0), ('15107', 7.0), ('15788', 7.0), ('4262', 7.0), ('12056', 7.0), ('14187', 7.0), ('10421', 7.0), ('13728', 6.999999999999999), ('17149', 6.999999999999999), ('9054', 6.999999999999999), ('11314', 6.999999999999999), ('11182', 6.999999999999998), ('5814', 6.999999999999998), ('2112', 6.999999999999998), ('4996', 6.0), ('7987', 6.0), ('12155', 6.0), ('6037', 6.0), ('3860', 5.999999999999999), ('10429', 5.999999999999999), ('571', 5.999999999999998), ('6648', 5.999999999999998), ('7060', 5.0), ('14533', 5.0), ('1102', 5.0), ('3962', 5.0), ('4356', 5.0), ('5531', 5.0), ('11040', 5.0), ('12870', 5.0), ('15101', 5.0), ('15296', 5.0), ('15844', 5.0), ('17157', 5.0), ('166', 5.0), ('199', 5.0), ('788', 5.0), ('1661', 5.0), ('17014', 5.0), ('17479', 5.0), ('762', 5.0), ('2989', 5.0), ('5285', 5.0), ('7429', 5.0), ('11370', 5.0), ('12433', 5.0), ('14302', 5.0), ('15124', 5.0), ('16147', 5.0), ('819', 5.0), ('937', 5.0), ('1364', 5.0), ('1542', 5.0), ('1590', 5.0), ('1914', 5.0), ('2023', 5.0), ('2140', 5.0), ('2162', 5.0), ('2254', 5.0), ('2326', 5.0), ('2594', 5.0), ('2612', 5.0), ('2953', 5.0), ('3807', 5.0), ('3825', 5.0), ('4829', 5.0), ('5875', 5.0), ('6119', 5.0), ('6194', 5.0), ('6448', 5.0), ('6482', 5.0), ('7186', 5.0), ('7617', 5.0), ('8192', 5.0), ('8339', 5.0), ('8595', 5.0), ('9036', 5.0), ('9188', 5.0), ('9326', 5.0), ('9471', 5.0), ('9756', 5.0), ('10123', 5.0), ('10359', 5.0), ('11433', 5.0), ('11805', 5.0), ('12766', 5.0), ('13090', 5.0), ('13217', 5.0), ('13462', 5.0), 
('13810', 5.0), ('13851', 5.0), ('14167', 5.0), ('14755', 5.0), ('14963', 5.0), ('15170', 5.0), ('15755', 5.0), ('15798', 5.0), ('16139', 5.0), ('17053', 5.0), ('17250', 5.0), ('17441', 5.0), ('17707', 5.0), ('16128', 5.0), ('14376', 5.0), ('457', 5.0), ('1803', 5.0), ('3612', 5.0), ('4008', 5.0), ('4432', 5.0), ('6027', 5.0), ('6042', 5.0), ('8118', 5.0), ('8160', 5.0), ('11337', 5.0), ('12338', 5.0), ('12785', 5.0), ('13359', 5.0), ('17004', 5.0), ('17293', 5.0), ('17405', 5.0), ('17627', 5.0), ('290', 4.999999999999999), ('2913', 4.999999999999999), ('3138', 4.999999999999999), ('5695', 4.999999999999999), ('5947', 4.999999999999999), ('6366', 4.999999999999999), ('6450', 4.999999999999999), ('7193', 4.999999999999999), ('7713', 4.999999999999999), ('7786', 4.999999999999999), ('8966', 4.999999999999999), ('8993', 4.999999999999999), ('10189', 4.999999999999999), ('10986', 4.999999999999999), ('12367', 4.999999999999999), ('14264', 4.999999999999999), ('15209', 4.999999999999999), ('17339', 4.999999999999999), ('17449', 4.999999999999999), ('8954', 4.999999999999999), ('175', 4.999999999999999), ('210', 4.999999999999999), ('473', 4.999999999999999), ('561', 4.999999999999999), ('872', 4.999999999999999), ('1741', 4.999999999999999), ('1848', 4.999999999999999), ('2348', 4.999999999999999), ('2480', 4.999999999999999), ('3139', 4.999999999999999), ('3374', 4.999999999999999), ('4477', 4.999999999999999), ('5283', 4.999999999999999), ('5561', 4.999999999999999), ('5653', 4.999999999999999), ('5862', 4.999999999999999), ('6117', 4.999999999999999), ('6221', 4.999999999999999), ('6445', 4.999999999999999), ('6545', 4.999999999999999), ('6807', 4.999999999999999), ('6808', 4.999999999999999), ('7170', 4.999999999999999), ('7433', 4.999999999999999), ('7516', 4.999999999999999), ('7523', 4.999999999999999), ('7586', 4.999999999999999), ('7735', 4.999999999999999), ('8806', 4.999999999999999), ('8829', 4.999999999999999), ('8832', 4.999999999999999), ('8893', 
4.999999999999999), ('8951', 4.999999999999999), ('9076', 4.999999999999999), ('9330', 4.999999999999999), ('9426', 4.999999999999999), ('10276', 4.999999999999999), ('10661', 4.999999999999999), ('11573', 4.999999999999999), ('11899', 4.999999999999999), ('12417', 4.999999999999999), ('12942', 4.999999999999999), ('14061', 4.999999999999999), ('14210', 4.999999999999999), ('14525', 4.999999999999999), ('15333', 4.999999999999999), ('15657', 4.999999999999999), ('16175', 4.999999999999999), ('16306', 4.999999999999999), ('16431', 4.999999999999999), ('16482', 4.999999999999999), ('16721', 4.999999999999999), ('17412', 4.999999999999999), ('17472', 4.999999999999999), ('270', 4.999999999999999), ('798', 4.999999999999999), ('985', 4.999999999999999), ('1256', 4.999999999999999), ('2938', 4.999999999999999), ('3078', 4.999999999999999), ('4345', 4.999999999999999), ('4577', 4.999999999999999), ('4951', 4.999999999999999), ('5309', 4.999999999999999), ('5414', 4.999999999999999), ('6034', 4.999999999999999), ('7057', 4.999999999999999), ('7155', 4.999999999999999), ('7158', 4.999999999999999), ('7230', 4.999999999999999), ('8438', 4.999999999999999), ('8840', 4.999999999999999), ('10988', 4.999999999999999), ('11271', 4.999999999999999), ('12184', 4.999999999999999), ('12453', 4.999999999999999), ('12530', 4.999999999999999), ('13663', 4.999999999999999), ('14961', 4.999999999999999), ('15070', 4.999999999999999), ('15307', 4.999999999999999), ('15609', 4.999999999999999), ('15689', 4.999999999999999), ('16083', 4.999999999999999), ('17023', 4.999999999999999), ('17328', 4.999999999999999), ('15151', 4.999999999999999), ('9939', 4.000000000000004), ('3610', 4.0), ('7635', 4.0), ('17431', 4.0), ('708', 4.0), ('759', 4.0), ('886', 4.0), ('1073', 4.0), ('1174', 4.0), ('1931', 4.0), ('2743', 4.0), ('3079', 4.0), ('3605', 4.0), ('4330', 4.0), ('4640', 4.0), ('5056', 4.0), ('6274', 4.0), ('6408', 4.0), ('6630', 4.0), ('6833', 4.0), ('7364', 4.0), ('9728', 4.0), ('10808', 
4.0), ('12471', 4.0), ('13622', 4.0), ('13763', 4.0), ('13883', 4.0), ('14507', 4.0), ('14827', 4.0), ('15968', 4.0), ('16286', 4.0), ('17088', 4.0), ('660', 4.0), ('1646', 4.0), ('5084', 4.0), ('6362', 4.0), ('10982', 4.0), ('13923', 4.0), ('17426', 4.0), ('642', 4.0), ('8561', 4.0), ('283', 4.0), ('607', 4.0), ('896', 4.0), ('1045', 4.0), ('1610', 4.0), ('1625', 4.0), ('1645', 4.0), ('2430', 4.0), ('2541', 4.0), ('3021', 4.0), ('3127', 4.0), ('3242', 4.0), ('3542', 4.0), ('3737', 4.0), ('3905', 4.0), ('3999', 4.0), ('4263', 4.0), ('4533', 4.0), ('5421', 4.0), ('5503', 4.0), ('5897', 4.0), ('6281', 4.0), ('6555', 4.0), ('6692', 4.0), ('7019', 4.0), ('7076', 4.0), ('7077', 4.0), ('7633', 4.0), ('8253', 4.0), ('8278', 4.0), ('9205', 4.0), ('9617', 4.0), ('10809', 4.0), ('10921', 4.0), ('11103', 4.0), ('11669', 4.0), ('12101', 4.0), ('12102', 4.0), ('12273', 4.0), ('12299', 4.0), ('13523', 4.0), ('13656', 4.0), ('13805', 4.0), ('14144', 4.0), ('14149', 4.0), ('14593', 4.0), ('14856', 4.0), ('15048', 4.0), ('15247', 4.0), ('15540', 4.0), ('16339', 4.0), ('16516', 4.0), ('16724', 4.0), ('17035', 4.0), ('17559', 4.0), ('17743', 4.0), ('257', 4.0), ('3907', 4.0), ('5293', 4.0), ('7745', 4.0), ('8764', 4.0), ('12508', 4.0), ('13651', 4.0), ('15500', 4.0), ('15700', 4.0), ('16384', 4.0), ('17321', 4.0), ('273', 4.0), ('7234', 4.0), ('8204', 4.0), ('10255', 4.0), ('12739', 4.0), ('3526', 4.0), ('4315', 4.0), ('4522', 4.0), ('5284', 4.0), ('5621', 4.0), ('6060', 4.0), ('6267', 4.0), ('6329', 4.0), ('6437', 4.0), ('6698', 4.0), ('6874', 4.0), ('6971', 4.0), ('7852', 4.0), ('9662', 4.0), ('10358', 4.0), ('10906', 4.0), ('11812', 4.0), ('11910', 4.0), ('12600', 4.0), ('12966', 4.0), ('13330', 4.0), ('14467', 4.0), ('14999', 4.0), ('16380', 4.0), ('16707', 4.0), ('16793', 4.0), ('17174', 4.0), ('564', 4.0), ('1324', 4.0), ('2649', 4.0), ('3864', 4.0), ('4109', 4.0), ('5926', 4.0), ('6552', 4.0), ('7067', 4.0), ('9458', 4.0), ('13081', 4.0), ('13582', 4.0), ('14531', 4.0), 
('14571', 4.0), ('14691', 4.0), ('14897', 4.0), ('16438', 4.0), ('16469', 4.0), ('16872', 4.0), ('14644', 3.9999999999999996), ('4914', 3.9999999999999996), ('8', 3.999999999999999), ('443', 3.999999999999999), ('2580', 3.999999999999999), ('3125', 3.999999999999999), ('5345', 3.999999999999999), ('5762', 3.999999999999999), ('6131', 3.999999999999999), ('6454', 3.999999999999999), ('6518', 3.999999999999999), ('6917', 3.999999999999999), ('7517', 3.999999999999999), ('8801', 3.999999999999999), ('8976', 3.999999999999999), ('9778', 3.999999999999999), ('10433', 3.999999999999999), ('10582', 3.999999999999999), ('11227', 3.999999999999999), ('12534', 3.999999999999999), ('12838', 3.999999999999999), ('13015', 3.999999999999999), ('14233', 3.999999999999999), ('14274', 3.999999999999999), ('14549', 3.999999999999999), ('16240', 3.999999999999999), ('16495', 3.999999999999999), ('17033', 3.999999999999999), ('17184', 3.999999999999999), ('17312', 3.999999999999999), ('6829', 3.999999999999999), ('14527', 3.999999999999999), ('15483', 3.999999999999999), ('599', 3.999999999999999), ('1466', 3.999999999999999), ('2175', 3.999999999999999), ('2965', 3.999999999999999), ('3106', 3.999999999999999), ('3879', 3.999999999999999), ('4139', 3.999999999999999), ('7384', 3.999999999999999), ('7419', 3.999999999999999), ('8526', 3.999999999999999), ('10004', 3.999999999999999), ('10162', 3.999999999999999), ('10662', 3.999999999999999), ('10832', 3.999999999999999), ('10920', 3.999999999999999), ('11295', 3.999999999999999), ('11575', 3.999999999999999), ('11904', 3.999999999999999), ('12360', 3.999999999999999), ('13082', 3.999999999999999), ('13186', 3.999999999999999), ('13317', 3.999999999999999), ('13909', 3.999999999999999), ('16810', 3.999999999999999), ('1144', 3.999999999999999), ('3538', 3.999999999999999), ('4570', 3.999999999999999), ('5939', 3.999999999999999), ('7233', 3.999999999999999), ('7331', 3.999999999999999), ('14215', 3.999999999999999), ('17215', 
3.999999999999999), ('17762', 3.999999999999999), ('2192', 3.999999999999999), ('3347', 3.999999999999999), ('13342', 3.999999999999999), ('5071', 3.0000000000000036), ('12694', 3.0000000000000036), ('3197', 3.0), ('4745', 3.0), ('7446', 3.0), ('8782', 3.0), ('11064', 3.0), ('11837', 3.0), ('12343', 3.0), ('15339', 3.0), ('16765', 3.0), ('720', 3.0), ('1180', 3.0), ('1673', 3.0), ('2874', 3.0), ('3730', 3.0), ('4043', 3.0), ('4488', 3.0), ('5952', 3.0), ('6347', 3.0), ('7649', 3.0), ('8784', 3.0), ('9381', 3.0), ('10042', 3.0), ('10423', 3.0), ('10818', 3.0), ('13384', 3.0), ('13413', 3.0), ('13636', 3.0), ('13827', 3.0), ('13845', 3.0), ('14367', 3.0), ('14653', 3.0), ('15902', 3.0), ('16792', 3.0), ('16891', 3.0), ('2678', 3.0), ('3434', 3.0), ('3772', 3.0), ('5819', 3.0), ('7032', 3.0), ('14977', 3.0), ('5528', 3.0), ('5760', 3.0), ('8799', 3.0), ('14278', 3.0), ('2518', 3.0), ('4092', 3.0), ('5604', 3.0), ('6311', 3.0), ('7322', 3.0), ('10789', 3.0), ('15529', 3.0), ('17129', 3.0), ('17175', 3.0), ('17381', 3.0), ('16113', 3.0), ('11681', 3.0), ('15641', 3.0), ('1138', 3.0), ('5793', 3.0), ('5828', 3.0), ('5836', 3.0), ('6860', 3.0), ('7184', 3.0), ('7281', 3.0), ('8295', 3.0), ('10860', 3.0), ('11931', 3.0), ('12322', 3.0), ('14113', 3.0), ('15764', 3.0), ('312', 2.999999999999999), ('1283', 2.999999999999999), ('2779', 2.999999999999999), ('2958', 2.999999999999999), ('3151', 2.999999999999999), ('4493', 2.999999999999999), ('4695', 2.999999999999999), ('6497', 2.999999999999999), ('7238', 2.999999999999999), ('7971', 2.999999999999999), ('9415', 2.999999999999999), ('9442', 2.999999999999999), ('10773', 2.999999999999999), ('13061', 2.999999999999999), ('13214', 2.999999999999999), ('14890', 2.999999999999999), ('14940', 2.999999999999999), ('15343', 2.999999999999999), ('17062', 2.999999999999999), ('17111', 2.999999999999999), ('9645', 2.999999999999999), ('15034', 2.999999999999999), ('963', 2.999999999999999), ('1464', 2.999999999999999), ('406', 
2.999999999999999), ('442', 2.999999999999999), ('2172', 2.999999999999999), ('2942', 2.999999999999999), ('4877', 2.999999999999999), ('5154', 2.999999999999999), ('7739', 2.999999999999999), ('8535', 2.999999999999999), ('10375', 2.999999999999999), ('11047', 2.999999999999999), ('11090', 2.999999999999999), ('11696', 2.999999999999999), ('13736', 2.999999999999999), ('15471', 2.999999999999999), ('305', 2.999999999999999), ('1307', 2.999999999999999), ('10101', 2.999999999999999), ('12303', 2.999999999999999), ('28', 2.0), ('6720', 2.0), ('12774', 2.0), ('15474', 2.0), ('1700', 2.0), ('2226', 2.0), ('16095', 2.0), ('17345', 2.0), ('1068', 2.0), ('11170', 2.0), ('6255', 2.0), ('8418', 2.0), ('17031', 2.0), ('17251', 2.0), ('331', 2.0), ('2477', 2.0), ('7249', 2.0), ('10947', 2.0), ('13519', 2.0), ('16640', 2.0), ('16859', 2.0), ('468', 1.9999999999999996), ('2856', 1.9999999999999996), ('4733', 1.9999999999999996), ('6084', 1.9999999999999996), ('8824', 1.9999999999999996), ('10078', 1.9999999999999996), ('13565', 1.9999999999999996), ('13855', 1.9999999999999996), ('14440', 1.9999999999999996), ('14898', 1.9999999999999996), ('15608', 1.9999999999999996), ('16603', 1.9999999999999996), ('16730', 1.9999999999999996), ('17704', 1.9999999999999996), ('9800', 1.9999999999999996), ('658', 1.9999999999999996), ('2391', 1.9999999999999996), ('2486', 1.9999999999999996), ('5837', 1.9999999999999996), ('10775', 1.9999999999999996), ('15777', 1.9999999999999996), ('3314', 1.9999999999999996), ('4590', 1.9999999999999996), ('7521', 1.9999999999999996), ('11065', 1.9999999999999996), ('13043', 1.9999999999999996), ('14389', 1.9999999999999996), ('17387', 1.000000000000001), ('1975', 1.0), ('2361', 1.0), ('4103', 1.0), ('5725', 1.0), ('16145', 1.0), ('191', 1.0), ('6975', 1.0), ('14332', 1.0), ('3713', 1.0), ('7904', 1.0), ('5991', 1.0), ('6596', 1.0), ('1012', 0.9999999999999998), ('2939', 0.9999999999999998), ('7780', 0.9999999999999998), ('3161', 0.9999999999999998), 
('13471', 0.9999999999999998), ('14154', 0.9999999999999998)]
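    Starting precision evaluation
    Recommendation precision: 0.005000000000000001
    
    4. Summary
      Only 1000 sampled users were used for training, so the result is not very good. The full training set is much larger, so it would probably work better.
      I will post GPU results later when I have time.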
  • Recommender System Project in Practice

    2020-03-14 20:44:56
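  • Architecture and business flow. Base data layer: includes business data and user... Recall, Ranking, CTR prediction model, Feature processing and model evaluation. Recommendation service layer: exposes RPC interfaces through which recommendation services are integrated. Feed recommendation: the "Today's recommendations" scenario, where users can keep pulling down to refresh

    Architecture and business flow

    • Base data layer:
      • Includes business data and user behavior log data.
        • Business data mainly consists of user data and article data. User data is the basic data of users registered with a Toutiao-style app; article data is the basic information of articles uploaded through the self-media platform.
        • User behavior log data comes from front-end event tracking.
      • Business data is stored in bulk on HDFS for offline analysis.
      • Log data flows into Kafka in real time for real-time computation.
    • Data processing layer:
      • Basic computation: user profiles and article profiles are computed from the base data, using both offline and real-time data.
      • Recall and ranking
        • The recall stage uses various algorithms to select, from a massive pool of articles, a candidate set the user is likely to be interested in; the set size is on the order of thousands. Ranking then orders the candidate articles by a per-user model score and produces a ranked list.
        • Recall
        • Ranking
          • CTR prediction model
          • Feature processing, model evaluation
    • Recommendation service layer: exposes RPC interfaces through which recommendation services are integrated
      • Feed recommendation: the "Today's recommendations" scenario, where users can keep pulling down to refresh these pages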
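The two-stage flow described above (recall a candidate set, then rank it with a model score) can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the tag-overlap recall rule, the `predicted_ctr` lookup standing in for a CTR model, and all article IDs are hypothetical.

```python
from typing import Callable, Dict, List, Set

def recall(user_tags: Set[str], articles: Dict[str, Set[str]], limit: int) -> List[str]:
    """Recall stage: keep articles that share at least one tag with the user."""
    candidates = [aid for aid, tags in articles.items() if tags & user_tags]
    return candidates[:limit]

def rank(candidates: List[str], score_fn: Callable[[str], float]) -> List[str]:
    """Ranking stage: order the candidates by a per-user model score, highest first."""
    return sorted(candidates, key=score_fn, reverse=True)

# Hypothetical article pool: article ID -> tags
articles = {
    "a1": {"python", "ml"},
    "a2": {"sports"},
    "a3": {"ml", "spark"},
    "a4": {"python"},
}
user_tags = {"ml", "python"}

# Stand-in for a CTR prediction model: a fixed lookup table of made-up scores
predicted_ctr = {"a1": 0.12, "a2": 0.30, "a3": 0.25, "a4": 0.05}

candidates = recall(user_tags, articles, limit=1000)
feed = rank(candidates, lambda aid: predicted_ctr[aid])
print(feed)
```

Note that `a2` has the highest score but never reaches the ranking stage, because recall has already filtered it out. Keeping recall cheap and broad while ranking only a bounded candidate set is the usual motivation for splitting the pipeline this way.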
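  • Profile construction is a very important part of a recommender system; profiles can serve as an important basis for recommendation or marketing across the whole product, and they have to be built using a variety of methods. Article content tagging: a set of tags is defined qualitatively from the content; these can be descriptive tags. For...

    1 Offline profile pipeline

    What profile construction covers:

    Profile construction is a very important part of a recommender system; profiles can serve as an important basis for recommendation or marketing across the whole product, and they have to be built using a variety of methods.

    • Article content tagging: a set of tags is defined qualitatively from the content; these can be descriptive tags. For an article, the tags are the words related to its content.

      • The article's keywords and topic words
    • User tagging: this step studies how much users like particular content; the content a user likes is treated as that user's preference tags.

      • This is where the behaviors recorded in the user behavior log table become important. Browsing (duration/frequency), clicks, shares/favorites/follows, and other commercial or key signals each indicate, to a different degree, how much the user likes the content.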
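One simple way to turn behavior logs into user preference tags is to give each behavior type a weight and add that weight to every keyword of the article involved. The sketch below assumes this scheme; the weight values, behavior names, and article data are hypothetical and would need tuning against real data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical weights: stronger signals of interest get larger values
BEHAVIOR_WEIGHTS = {"click": 1.0, "read": 2.0, "collect": 3.0, "share": 4.0}

def build_user_tags(
    logs: Iterable[Tuple[str, str]],
    article_keywords: Dict[str, List[str]],
) -> Dict[str, float]:
    """Aggregate (article_id, behavior) log records into weighted user tags."""
    scores: Dict[str, float] = defaultdict(float)
    for article_id, behavior in logs:
        weight = BEHAVIOR_WEIGHTS.get(behavior, 0.0)  # unknown behaviors count as 0
        for keyword in article_keywords.get(article_id, []):
            scores[keyword] += weight
    # Return tags ordered by preference score, highest first
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

# Hypothetical article profiles and one user's behavior log
article_keywords = {"a1": ["python", "spark"], "a2": ["spark", "kafka"]}
logs = [("a1", "click"), ("a2", "share"), ("a1", "read")]

user_tags = build_user_tags(logs, article_keywords)
print(user_tags)
```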

     

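    2 Offline article profile computation

     

    Components of the offline article profile

    An article profile assigns a set of words to each article.

    • Keywords: words jointly determined by TextRank and IDF
    • Topic words: words that appear in both the TextRank and TF-IDF results

    Steps:

    1. Merge the raw article table data

    2. Compute TF-IDF for all historical articles

    3. Compute TextRank for all historical articles

    2.1 TF-IDF computation

    2.1.1 Goal

    • Compute the TF-IDF value of each word in every article, to be used for extracting the profile

    2.1.2 Training steps for the TF-IDF model

    • Read the data for N articles
    • Segment the article text into words
    • Train and save the TF-IDF model; Spark computes it from term counts and IDF
    • Use the model to compute the TF-IDF values for the N articles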
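The training steps above can be illustrated without Spark by a small pure-Python sketch: count document frequencies to "train" the IDF table, then multiply term counts by IDF for each article. The smoothed formula log((N + 1) / (df + 1)) is used here on the assumption that it matches what Spark ML's IDF computes; the pre-tokenized toy documents stand in for the word segmentation step, and all names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List

def train_idf(docs: List[List[str]]) -> Dict[str, float]:
    """'Train' the model: compute a smoothed IDF value for every word in the corpus."""
    n_docs = len(docs)
    doc_freq: Counter = Counter()
    for words in docs:
        doc_freq.update(set(words))  # count each word at most once per document
    return {w: math.log((n_docs + 1) / (df + 1)) for w, df in doc_freq.items()}

def tfidf(words: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Apply the model: raw term count times IDF for each word of one article."""
    counts = Counter(words)
    return {w: c * idf.get(w, 0.0) for w, c in counts.items()}

# Hypothetical, already segmented articles
docs = [
    ["spark", "kafka", "spark"],
    ["python", "spark"],
    ["python", "flask"],
]

idf = train_idf(docs)
for words in docs:
    print(tfidf(words, idf))
```

In the first toy article, `kafka` ends up with a higher weight than `spark` even though `spark` occurs twice, because `spark` appears in more documents and is therefore less distinctive.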

    2.1.3 Implementation

    To compute with TF-IDF, a model must first be trained and its result saved

    • Create a new file named compute_tfidf.ipynb

     

    2.2 TextRank computation

    Steps:

    • 1. TextRank storage structure
    • 2. TextRank computation with filtering
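For readers unfamiliar with TextRank, here is a minimal unweighted sketch: filter out unwanted words, build a co-occurrence graph over a sliding window, and run a PageRank-style iteration on it. It is not the post's implementation; the stopword filter, window size, damping factor, and iteration count are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

def textrank(
    words: List[str],
    stopwords: Iterable[str] = (),
    window: int = 2,
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Score words by running PageRank on their co-occurrence graph."""
    stop = set(stopwords)
    kept = [w for w in words if w not in stop]  # filtering step

    # Link every pair of distinct words that fall inside the same window
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for i, word in enumerate(kept):
        for j in range(i + 1, min(i + window, len(kept))):
            if kept[j] != word:
                neighbors[word].add(kept[j])
                neighbors[kept[j]].add(word)

    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        updated = {}
        for w in neighbors:
            # Each neighbor shares its score equally among its own neighbors
            incoming = sum(scores[u] / len(neighbors[u]) for u in neighbors[w])
            updated[w] = (1 - damping) + damping * incoming
        scores = updated
    return scores

# Hypothetical segmented article
words = ["spark", "streaming", "reads", "kafka", "and", "spark", "writes", "hbase"]
scores = textrank(words, stopwords={"and"})
for word, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(word, round(score, 3))
```

With `window=2` only adjacent words are linked, so `spark`, which has the most distinct neighbors in this toy text, receives the highest score.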

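    2.3 Article profile result

    Computing the profile for each article

    • Steps:
      • 1. Load the IDF values, keep the keywords, and compute their weights (TextRank * IDF)
      • 2. Merge the keyword weights into a dictionary
      • 3. Take the words that appear in both the TF-IDF and TextRank results as topic words
      • 4. Merge the topic-word table and the keyword table and insert the result into a table

    Load the IDF values, keep the keywords, and compute their weights (TextRank * IDF)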
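The four steps can be sketched on plain dictionaries as below. This assumes the TextRank scores, the IDF table, and the list of top TF-IDF words for an article are already available; the function name, the `top_k` cutoff, and the sample values are hypothetical, and the final "insert into a table" step is reduced to returning a dict.

```python
from typing import Dict, List

def build_article_profile(
    textrank_scores: Dict[str, float],
    idf: Dict[str, float],
    tfidf_words: List[str],
    top_k: int = 10,
) -> Dict[str, object]:
    """Combine TextRank, IDF and TF-IDF results into one article profile."""
    # Steps 1 and 2: keyword weight = TextRank * IDF, collected into a dictionary
    weighted = {w: s * idf[w] for w, s in textrank_scores.items() if w in idf}
    top = sorted(weighted.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    keywords = dict(top)

    # Step 3: topic words are the words present in both TF-IDF and TextRank results
    topics = sorted(set(textrank_scores) & set(tfidf_words))

    # Step 4: merge both parts into a single record
    return {"keywords": keywords, "topics": topics}

# Hypothetical inputs for one article
textrank_scores = {"spark": 1.5, "kafka": 1.0, "hbase": 0.5}
idf = {"spark": 0.2, "kafka": 0.7, "hbase": 1.0, "python": 0.4}
tfidf_words = ["kafka", "python", "hbase"]

profile = build_article_profile(textrank_scores, idf, tfidf_words, top_k=2)
print(profile)
```

Multiplying by IDF pushes down words such as `spark` that score well in TextRank but occur in many articles, which is presumably why the two signals are combined.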

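  • Big Data Project in Practice: E-commerce Recommender System

    1,000+ learners 2019-03-01 10:46:10
    The e-commerce recommender system project is built on a modified Chinese Amazon e-commerce data set and on the real business architecture of an e-commerce site. It includes both offline and real-time recommendation, and combines collaborative filtering with content-based recommendation to provide...
  • A carefully designed machine learning and recommender systems course that gives equal weight to machine learning theory and hands-on recommender system projects. It systematically reviews and explains the fundamentals of machine learning and recommender systems, and works through a movie recommendation website as a practical project, for those who want to gain big data project experience, ...
  • The book 《推荐系统开发实战》 has been on the market for half a year. Because many readers have run into problems and questions when reproducing the case studies in the last three chapters, here is a more detailed explanation. Getting the code: see the first few pages of the book for how to obtain it; the code for these three case studies was previously put on GitHub...
  • E-commerce Big Data Project: Recommender System in Practice (1), environment setup and log, demographic, and product analysis https://blog.51cto.com/6989066/2325073 E-commerce Big Data Project: Recommender System in Practice, recommendation algorithms https://blog.51cto.com/6989066/2326209 E-commerce Big Data Project: Recommender...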
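  • After everything covered so far (recommender system cold start, collaborative filtering, clustering, association rules, hybrid algorithms), the question is: which algorithm should I use for my own apartment rental recommender system? Either go by experience, or test every algorithm and see which one performs best! The results...
  • I. Cold start: principles and project practice 1 Cold start principles and techniques. A recommender system predicts users' future behavior and interests from their historical behavior and interests, so a large amount of user behavior data is an important component of, and a prerequisite for, a recommender system. Many, from the very beginning, want personalized...
  • Big Data Project in Practice: E-commerce Recommender System. Instructor at 尚硅谷, master's degree from Liaoning Technical University, previously worked at...
  • This article is another case study of building a recommender system with Mahout: a book recommender system. The approach is similar to the previous two articles; the focus here is on how to make use of book attributes. The data comes from the Amazon website and was collected by a crawler. Contents: Project background, Requirements analysis ...
  • Collaborative filtering is one of the most important and most widely used algorithms in recommender systems. This course is project-driven: it explains how item-based collaborative filtering is applied and, by repeatedly optimizing the algorithm, improves the precision and recall of the recommendations.
  • I happened to have a data set covering many movies, so I decided to do a quick analysis of it and try to build a simple "recommender system". Let's take a look. Libraries: Pandas, Numpy, Re. Tool: Ipython Notebook. As an aside, 小密圈 once had an article that everyone...
  • Latest machine learning hands-on projects: music recommender system + Pytorch + machine translation + financial anti-fraud, and more =============== Course contents =============== ├<Lecture notes and code> │ ├<Lesson 01> │ │ ├《推荐系统》 data...
  • This tutorial is officially authorized. Big data has become a key focus for major internet companies, and recommender systems are arguably one of its best practical applications, having already brought enterprises...The tailor-made e-commerce recommender system project is built on a modified Chinese Amazon...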
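  • This project is a big data e-commerce recommender system based on Spark MLLib, written in Scala and Java. A Python-based recommender system project will be covered in a separate blog post. Before reading this post you need the following background: 1. basic Linux commands 2. mathematics at high-school level or above...
  • Recommender Systems: CTR Prediction Theory and Project Practice

    1,000+ reads, hot discussion 2019-03-25 09:21:54
    CTR prediction theory and project practice 1. Review of machine learning recommendation models. Recall (coarse ranking): use business rules combined with machine learning recommendation algorithms to get initial recommendations, yielding a recall set of some products; recall via ALS\UserCF\ItemCF\FP-Growth\rules and similar methods. Ranking (fine ranking) 1...
  • Hive e-commerce recommender system development in practice: 1. Build the data warehouse 2. Data cleaning 3. Implement the recommendation algorithm 4. Data ETL. Main project workflow: ...
  • The course I am about to share, 《机器学习/大数据 推荐系统实战》, was made by a very accomplished big data engineer. A must for data people. 梁爽, alias 中天. Currently works at one of China's BAT companies, with many years of big data R&D experience and more than 7 years of hands-on big data teaching experience...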
