  • 今日头条 (Toutiao) app search scraper in Python
    import json
    import time
    from urllib.parse import quote
    from urllib import request
    import requests
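    """
    Search tabs, in the same order as tab_list below.
    tab_list is indexed from 0, so pass 0 for the first tab.
    1. General
    2. Video
    3. News
    4. Short video
    5. Images
    6. Users
    7. Music
    8. Q&A
    9. 微头条 (micro-posts)
    10. Topics
    """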
    tab_list = [
        "pd=synthesis&from=search_tab",
        "pd=video&from=video",
        "pd=information&from=news",
        "pd=xiaoshipin&from=xiaoshipin",
        "pd=atlas&from=gallery",
        "pd=user&from=media",
        "pd=music&from=music",
        "pd=question&from=question",
        "pd=weitoutiao&from=weitoutiao",
        "pd=huati&from=huati"
    ]
    headers = {
            "User-Agent": "Mozilla/5.0 (Linux; U; Android 6.0.1; zh-cn; MI 5s Build/MXB48T) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/53.0.2785.146 Mobile Safari/537.36 XiaoMi/MiuiBrowser/8.7.1"
        }
    
    # keyWords: search keyword; page: 1-based page number; tab: index into tab_list
    def queryList(keyWords,page,tab):
        keyWords = quote(keyWords, safe=";/?:@&=+$,", encoding="utf-8")
        # Current time in seconds and in milliseconds
        time_second,time_second_min = get_time()
        count = 10
        offset = (page-1) * count
        tab_str = tab_list[tab]
        url = "http://ic.snssdk.com/api/search/content/?qc_query=&offset="+str(offset)+"&action_type=input_keyword_search&has_count=&is_from_native=1&count="+str(count)+"&format=json&source=input&keyword_type=&search_id=&search_position=search_bar&"+tab_str+"&keyword="+str(keyWords)+"&from_search_subtab=1&iid=57820401425&device_id=54550815314&ac=wifi&channel=xiaomi&aid=13&app_name=news_article&version_code=707&version_name=7.0.7&device_platform=android&ab_version=475404%2C680425%2C687252%2C684578%2C571130%2C665173%2C674056%2C639003%2C612193%2C691933%2C170988%2C643891%2C374117%2C687462%2C688267%2C655402%2C702095%2C613176%2C550042%2C686297%2C690816%2C687745%2C690975%2C649426%2C614097%2C677129%2C685523%2C522766%2C701302%2C416055%2C684977%2C703944%2C689886%2C693247%2C558140%2C586260%2C555254%2C471406%2C603441%2C700492%2C596392%2C660510%2C598626%2C701730%2C700540%2C686885%2C701724%2C677898%2C603383%2C603401%2C603403%2C603405%2C638928%2C699227%2C696109%2C703265%2C686031%2C661904%2C662644%2C703737%2C668775%2C673945%2C692060%2C693468%2C629151%2C645714%2C607361%2C609338%2C666965%2C698916%2C635529%2C669649%2C662099%2C696796%2C701078%2C693364%2C703077%2C697038%2C703339%2C689538%2C697022%2C668774%2C683805%2C698097%2C698380%2C688105%2C554836%2C694759%2C549647%2C699616%2C31240%2C572465%2C656568%2C644058%2C615291%2C606547%2C681183%2C703370%2C673168%2C702884%2C671426%2C546701%2C702195%2C641190%2C281297%2C678046%2C325620%2C678477%2C665474%2C696624%2C669034%2C700459%2C625065%2C652953%2C696373%2C696990%2C698915%2C693900%2C703230%2C680284%2C638336%2C467514%2C679100%2C697663%2C702714%2C702994%2C699109%2C702878%2C699036%2C595556%2C697759%2C702757%2C670151%2C661453%2C654127%2C698630%2C660830%2C688723%2C690189%2C691671%2C686376%2C699478%2C677774%2C697104%2C700437%2C486951%2C701439%2C662176%2C662350%2C633486%2C662684%2C661781%2C457480%2C649403%2C655988%2C648317%2C654049&ab_client=a1%2Cc4%2Ce1%2Cf1%2Cg2%2Cf7&ab_feature=94563%2C102749&abflag=3&ssmix=a&device_type=MI+8&device_brand=Xiaomi&language=zh&os_api=27&os_version=8.1.0&openudid=1a16ce94f2005274&manifest_version_code=707&resolution=1080*2118&dpi=440&update_version_code=70714&_rticket="+str(time_second_min)+"&plugin=26958&fp=9lT_FSDqFYPZFlwIFrU1FYwIPM4q&tma_jssdk_version=1.10.3.4&rom_version=miui_v10_8.8.31&ts="+str(time_second)+"&as=a2c555b4d565fcd9004533&mas=005bc89b119dd3e1d3f552f76df48fc2a6f6cdc4e4660e08ab"
        response = requests.post(url=url,timeout=100,headers=headers)
        response_str = str(response.content,encoding="utf-8")
        print(response_str)
        result_json = json.loads(response_str)
        return result_json
    
    def test():
        url = "http://ic.snssdk.com/api/search/content/?qc_query=&offset=10&action_type=input_keyword_search&has_count=&is_from_native=1&count=10&format=json&source=input&keyword_type=&search_id=&search_position=search_bar&pd=information&from=news&keyword=%E5%8D%8E%E4%B8%BA&from_search_subtab=3&iid=57820401425&device_id=54550815314&ac=wifi&channel=xiaomi&aid=13&app_name=news_article&version_code=707&version_name=7.0.7&device_platform=android&ab_versionab_client=a1%2Cc4%2Ce1%2Cf1%2Cg2%2Cf7&ab_feature=94563%2C102749&abflag=3&ssmix=a&device_type=MI+8&device_brand=Xiaomi&language=zh&os_api=27&os_version=8.1.0&openudid=1a16ce94f2005274&manifest_version_code=707&resolution=1080*2118&dpi=440&update_version_code=70714&_rticket=1547795488503&plugin=26958&fp=9lT_FSDqFYPZFlwIFrU1FYwIPM4q&tma_jssdk_version=1.10.3.4&rom_version=miui_v10_8.8.31&ts=1547795488&as=a2c5879430624c8cd12044&mas=00f71df35ab69fe5b9d8e4e1ec4ea19fc10f42e68cc0e4e63a"
        headers = {
            "User-Agent": "Mozilla/5.0 (Linux; U; Android 6.0.1; zh-cn; MI 5s Build/MXB48T) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/53.0.2785.146 Mobile Safari/537.36 XiaoMi/MiuiBrowser/8.7.1"
        }
        response = requests.post(url=url, timeout=100, headers=headers)
        text = str(response.content,encoding='utf-8')
        print(text)
    
    def get_detail_url(result_list):
        # Current time in seconds and in milliseconds
        time_second, time_second_min = get_time()
        detail_url_param = "iid=57820401425&device_id=54550815314&ac=wifi&channel=xiaomi&aid=13&app_name=news_article&version_code=707&version_name=7.0.7&device_platform=android&ab_version=475404%2C680425%2C687252%2C684578%2C571130%2C665173%2C674056%2C639003%2C612193%2C691933%2C170988%2C643891%2C374117%2C687462%2C688267%2C655402%2C702095%2C613176%2C550042%2C686297%2C690816%2C687745%2C690975%2C649426%2C614097%2C677129%2C685523%2C522766%2C701302%2C416055%2C684977%2C703944%2C689886%2C693247%2C558140%2C586260%2C555254%2C471406%2C603441%2C700492%2C596392%2C660510%2C598626%2C701730%2C700540%2C686885%2C701724%2C677898%2C603383%2C603401%2C603403%2C603405%2C638928%2C699227%2C696109%2C703265%2C686031%2C661904%2C662644%2C703737%2C668775%2C673945%2C692060%2C693468%2C629151%2C645714%2C607361%2C609338%2C666965%2C698916%2C635529%2C669649%2C662099%2C696796%2C701078%2C693364%2C703077%2C697038%2C703339%2C689538%2C697022%2C668774%2C683805%2C698097%2C698380%2C688105%2C554836%2C694759%2C549647%2C699616%2C31240%2C572465%2C656568%2C644058%2C615291%2C606547%2C681183%2C703370%2C673168%2C702884%2C671426%2C546701%2C702195%2C641190%2C281297%2C678046%2C325620%2C678477%2C665474%2C696624%2C669034%2C700459%2C625065%2C652953%2C696373%2C696990%2C698915%2C693900%2C703230%2C680284%2C638336%2C467514%2C679100%2C697663%2C702714%2C702994%2C699109%2C702878%2C699036%2C595556%2C697759%2C702757%2C670151%2C661453%2C654127%2C698630%2C660830%2C688723%2C690189%2C691671%2C686376%2C699478%2C677774%2C697104%2C700437%2C486951%2C701439%2C662176%2C662350%2C633486%2C662684%2C661781%2C457480%2C649403%2C655988%2C648317%2C654049&ab_client=a1%2Cc4%2Ce1%2Cf1%2Cg2%2Cf7&ab_feature=94563%2C102749&abflag=3&ssmix=a&device_type=MI+8&device_brand=Xiaomi&language=zh&os_api=27&os_version=8.1.0&openudid=1a16ce94f2005274&manifest_version_code=707&resolution=1080*2118&dpi=440&update_version_code=70714&_rticket=" + \
            str(time_second_min) + "&plugin=26958&fp=9lT_FSDqFYPZFlwIFrU1FYwIPM4q&tma_jssdk_version=1.10.3.4&rom_version=miui_v10_8.8.31&ts=" + \
            str(time_second) + "&as=a2c555b4d565fcd9004533&mas=005bc89b119dd3e1d3f552f76df48fc2a6f6cdc4e4660e08ab"
        detail_url_head = "http://a.pstatp.com/article/full/22/1/"
        detail_url_center = "/0/0/0/0?"
        comment_url_head = "https://www.toutiao.com/api/comment/list/?group_id="
        comment_url_tail = "&offset=0&count=5"
        detail_list = []
        for item in result_list:
            id_str = item.get("id", None)
            if not id_str:
                id_str = item.get("group_id", None)
            title = item.get("title",None)
            detail_url = detail_url_head + str(id_str) + "/" + str(id_str) + detail_url_center + detail_url_param
            comment_url = comment_url_head + str(id_str) + "&item_id=" + str(id_str) + comment_url_tail
            detail_data = {
                "detailUrl" : detail_url,
                "commentUrl" : comment_url,
                "id" : id_str,
                "title" : title
            }
            detail_list.append(detail_data)
        return detail_list
    
    def load_detail(detail_list):
        # Output goes to the detail/ and comment/ directories, which must already exist.
        if len(detail_list) < 1:
            return
        for item in detail_list:
            detailUrl = item["detailUrl"]
            commentUrl = item["commentUrl"]
            id_str = item["id"]
            title = item.get("title",None)
            if title:
                title = title.replace("/","").replace("\n","").replace("\r","").replace(" ","")
            else:
                continue
            response = requests.post(url=detailUrl, timeout=100, headers=headers)
            response_str = str(response.content, encoding="utf-8")
            print(response_str)
            # json.loads() no longer accepts an encoding argument (removed in Python 3.9)
            response_json = json.loads(response_str)
            with open("detail/"+str(id_str) + title + ".txt",mode="w",encoding="utf-8") as file:
                file.write(json.dumps(response_json,ensure_ascii = False))
            response = requests.post(url=commentUrl, timeout=100, headers=headers)
            response_str = str(response.content, encoding="utf-8")
            print(response_str)
            # json.loads() no longer accepts an encoding argument (removed in Python 3.9)
            response_json = json.loads(response_str)
            with open("comment/"+str(id_str) + title + ".txt",mode="w",encoding="utf-8") as file:
                file.write(json.dumps(response_json,ensure_ascii = False))
    
    
    # Get the current Unix time as (seconds, milliseconds)
    def get_time():
        t = time.time()
        # Milliseconds
        time_second_min = int(round(t * 1000))
        # Seconds
        time_second = int(t)
        return time_second,time_second_min
    
    
    if __name__ == '__main__':
        # test()
        keyWords = input("Enter keyword: ")
        page = input("Enter page number (starting at 1): ")
        tab_index = input("Enter tab index (0-9): ")
        result_json = queryList(keyWords,int(page),int(tab_index))
        result_list = result_json["data"]
        detail_list = get_detail_url(result_list)
        print(detail_list)
        load_detail(detail_list)
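
    The script above builds its search URL by string concatenation and escapes the keyword with quote(..., safe=";/?:@&=+$,"), which leaves characters such as & and = unescaped inside the keyword. The following is a minimal sketch of an alternative, not the author's method: it assembles only the per-request parameters with urllib.parse.urlencode. The helper name build_search_query and the shortened TAB_LIST are assumptions made for illustration, and the fixed device parameters are left out.

```python
from urllib.parse import parse_qsl, urlencode

# Same tab fragments as tab_list in the script above (first three only).
TAB_LIST = [
    "pd=synthesis&from=search_tab",
    "pd=video&from=video",
    "pd=information&from=news",
]


def build_search_query(keyword, page, tab_index, count=10):
    """Return the query string for one page of search results.

    Hypothetical helper: only the parameters that change per request are
    included; the fixed device parameters are omitted for brevity.
    """
    params = [
        ("offset", (page - 1) * count),  # page is 1-based
        ("count", count),
        ("format", "json"),
    ]
    # Split the "pd=...&from=..." fragment back into key/value pairs.
    params.extend(parse_qsl(TAB_LIST[tab_index]))
    params.append(("keyword", keyword))
    # urlencode percent-encodes every value (UTF-8 by default).
    return urlencode(params)


if __name__ == "__main__":
    print(build_search_query("华为", page=2, tab_index=2))
```

    With urlencode, a keyword that itself contains & or = is escaped and cannot break the query string apart.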

     


    Reposted from: https://www.cnblogs.com/procedureMonkey/p/10320304.html

  • An introduction to scraping the 今日头条 (Toutiao) app

    2018-09-27 22:14:13

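    Work has kept me busy lately, mostly with scraping news content.

    The main targets were large sites: 今日头条 (Toutiao), 凤凰 (Phoenix), 网易 (NetEase), and 腾讯 (Tencent). In summary:

    1. You must know how to configure a mobile packet-capture tool; otherwise you cannot capture the API endpoints reliably.

    2. Look for patterns in the endpoints.

    3. Decide exactly which content you need.

    4. Write the scraper.

    Through the API I found the endpoint that returns all the categories: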

    classify_url = 'https://is.snssdk.com/article/category/get_subscribed/v4/?iid=45032656046&device_id=43306941482&ac=wifi&channel=update&aid=13&app_name=news_article&version_code=693&version_name=6.9.3&device_platform=android&ab_version=425531%2C511489%2C512527%2C421244%2C486953%2C494121%2C513028%2C519225%2C239095%2C500091%2C467914%2C170988%2C493249%2C398175%2C519895%2C442127%2C374116%2C437000%2C478532%2C517767%2C489317%2C501961%2C519804%2C276206%2C519509%2C459645%2C500387%2C416055%2C510641%2C392461%2C470730%2C495896%2C378451%2C471406%2C510754%2C519795%2C516760%2C509305%2C512393%2C512914%2C468954%2C271178%2C424178%2C326524%2C326532%2C496389%2C508197%2C345191%2C519949%2C516309%2C518639%2C515800%2C489801%2C510935%2C455646%2C424176%2C214069%2C497615%2C507003%2C482355%2C510710%2C519295%2C442255%2C519259%2C519017%2C520601%2C512958%2C489514%2C280447%2C520688%2C281294%2C513401%2C325616%2C515839%2C498551%2C520553%2C386888%2C520089%2C498375%2C516137%2C513578%2C467513%2C515673%2C513283%2C444465%2C304488%2C261581%2C403270%2C484178%2C457480%2C502680%2C512027%2C510536&ab_client=a1%2Cc4%2Ce1%2Cf1%2Cg2%2Cf7&ab_group=94570%2C102754%2C181429&ab_feature=94570%2C102754&abflag=3&ssmix=a&device_type=NX563J&device_brand=nubia&language=zh&os_api=25&os_version=7.1.1&uuid=864460031530349&openudid=f1082e56b1908c9c&manifest_version_code=692&resolution=1080*1920&dpi=480&update_version_code=69305&_rticket=1538042842567&fp=GSTqFS4MLrx7FlPZc2U1Flx7P24M&tma_jssdk_version=1.3.0.1&pos=5r_-9Onkv6e_eBEKeScxeCUfv7G_8fLz-vTp6Pn4v6esrKuzr6WpqKSxv_H86fTp6Pn4v6eupLOlrqmtqqSxv_zw_O3e9Onkv6e_eBEKeScxeCUfv7G__PD87dHy8_r06ej5-L-nrKyrs6mkrKWoqrG__PD87dH86fTp6Pn4v6eupLOkrKmqpKTg&rom_version=25&plugin=26894&ts=1538042842&as=a2d5ea8a7aed3bfbec7259&mas=00f531ef9a8037a65e770c80d5e613fbf128caa4888a605ed5'

    Next, find the endpoint for the list page:

    base_url = 'https://is.snssdk.com/api/news/feed/v88/?list_count=17&category={}&refer=1&refresh_reason=5&session_refresh_idx=1&count=20&min_behot_time=1537635643&last_refresh_sub_entrance_interval=1538041336&loc_mode=0&loc_time=1537701890&latitude=39.834079&longitude=116.28459&city=%E5%8C%97%E4%BA%AC%E5%B8%82&tt_from=enter_auto&lac=4282&cid=7752303&plugin_enable=3&iid=45032656046&device_id=43306941482&ac=wifi&channel=update&aid=13&app_name=news_article&version_code=693&version_name=6.9.3&device_platform=android&ab_version=425531%2C511489%2C512527%2C421244%2C486953%2C494121%2C513028%2C519225%2C239095%2C500091%2C467914%2C170988%2C493249%2C398175%2C519895%2C442127%2C374116%2C437000%2C478532%2C517767%2C489317%2C501961%2C519804%2C276206%2C519509%2C459645%2C500387%2C416055%2C510641%2C392461%2C470730%2C495896%2C378451%2C471406%2C510754%2C519795%2C516760%2C509305%2C512393%2C512914%2C468954%2C271178%2C424178%2C326524%2C326532%2C496389%2C508197%2C345191%2C519949%2C516309%2C518639%2C515800%2C489801%2C510935%2C455646%2C424176%2C214069%2C497615%2C507003%2C482355%2C510710%2C519295%2C442255%2C519259%2C519017%2C520601%2C512958%2C489514%2C280447%2C520688%2C281294%2C513401%2C325616%2C515839%2C498551%2C520553%2C386888%2C520089%2C498375%2C516137%2C513578%2C467513%2C515673%2C513283%2C444465%2C510536%2C304488%2C261581%2C403270%2C484178%2C457480%2C502680%2C512027&ab_client=a1%2Cc4%2Ce1%2Cf1%2Cg2%2Cf7&ab_group=94570%2C102754%2C181429&ab_feature=94570%2C102754&abflag=3&ssmix=a&device_type=NX563J&device_brand=nubia&language=zh&os_api=25&os_version=7.1.1&uuid=864460031530349&openudid=f1082e56b1908c9c&manifest_version_code=692&resolution=1080*1920&dpi=480&update_version_code=69305&_rticket=1538041336618&fp=GSTqFS4MLrx7FlPZc2U1Flx7P24M&tma_jssdk_version=1.3.0.1&pos=5r_-9Onkv6e_eBEKeScxeCUfv7G_8fLz-vTp6Pn4v6esrKuzr6WpqKSxv_H86fTp6Pn4v6eupLOlrqmtqqSxv_zw_O3e9Onkv6e_eBEKeScxeCUfv7G__PD87dHy8_r06ej5-L-nrKyrs6mkrKWoqrG__PD87dH86fTp6Pn4v6eupLOkrKmqpKTg&rom_version=25&plugin=26894&ts=1538041336&as=a2d56aba88bfab35ec7222&mas=00b339523bce59cab47cb99ee6d66e76d36864a4888a8080da&cp=58b0a9cfaa5f8q1'
    

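    Note: the `category={}` placeholder in the URL is filled with the category you want to fetch.

    The value for `category` can be obtained from the category API.

    The code that extracts this field is as follows: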

            res = requests.get(classify_url)
            html = json.loads(res.text)
            datas = html['data']['data']
            print(len(datas))
            for data in datas:
                # column name
                column = data['name']
                print(column)
                # category
                category = data['category']

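    Splicing this field into the URL gives the corresponding list page. Once you have the list page, the next step is to get the addresses of the detail pages.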
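The splicing step could be sketched roughly as follows, assuming the category API returns JSON shaped like `{"data": {"data": [{"name": ..., "category": ...}]}}` as in the snippet above. `FEED_TEMPLATE` is a shortened, illustrative stand-in for `base_url` (the real URL carries many more query parameters), and `build_list_urls` and the sample entries are hypothetical.

```python
from urllib.parse import quote

# Shortened, illustrative stand-in for base_url; the real URL has many more parameters.
FEED_TEMPLATE = "https://is.snssdk.com/api/news/feed/v88/?list_count=17&category={}&count=20"


def build_list_urls(classify_json, template=FEED_TEMPLATE):
    """Map each column name to the list-page URL for its category."""
    urls = {}
    for entry in classify_json["data"]["data"]:
        category = entry.get("category")
        if not category:
            continue  # skip entries that carry no category field
        urls[entry["name"]] = template.format(quote(category, safe=""))
    return urls


# Made-up sample response; real column names and categories come from the API.
sample = {"data": {"data": [
    {"name": "Hot", "category": "news_hot"},
    {"name": "Games", "category": "news_game"},
    {"name": "Broken"},
]}}
list_urls = build_list_urls(sample)
```

Percent-encoding the category before formatting keeps the URL valid even if a category value ever contains reserved characters.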

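    The detail-page address also comes from an API endpoint.

    That makes things much simpler. There are several workable approaches; I will describe just one here.

    I found the endpoint with a packet-capture tool: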

    text_url = "http://a3.pstatp.com/article/content/21/1/{}/{}/1/0/?iid=37457543399&device_id=55215909025&ac=wifi&channel=tengxun2&aid=13&app_name=news_article&version_code=682&version_name=6.8.2&device_platform=android&ab_version=261581%2C403271%2C197606%2C293032%2C405731%2C418881%2C413287%2C271178%2C357705%2C377637%2C326524%2C326532%2C405403%2C415915%2C409847%2C416819%2C402597%2C369470%2C239096%2C170988%2C416198%2C390549%2C404717%2C374117%2C416708%2C416648%2C265169%2C415090%2C330633%2C297058%2C410260%2C276203%2C413705%2C320832%2C397738%2C381405%2C416055%2C416153%2C401106%2C392484%2C385726%2C376443%2C378451%2C401138%2C392717%2C323233%2C401589%2C391817%2C346557%2C415482%2C414664%2C406427%2C411774%2C345191%2C417119%2C377633%2C413565%2C414156%2C214069%2C31211%2C414225%2C411334%2C415564%2C388526%2C280449%2C281297%2C325614%2C324092%2C357402%2C414393%2C386890%2C411663%2C361348%2C406418%2C252782%2C376993%2C418024&ab_client=a1%2Cc4%2Ce1%2Cf1%2Cg2%2Cf7&ab_feature=102749%2C94563&abflag=3&ssmix=a&device_type=MI+3C&device_brand=Xiaomi&language=zh&os_api=19&os_version=4.4.4&uuid=99000549116036&openudid=efcc6d4284c6c458&manifest_version_code=682&resolution=1080*1920&dpi=480&update_version_code=68210&_rticket=1532142082952&rom_version=miui_v7_5.12.4&plugin=32&pos=5r_88Pzt0fzp9Ono-fi_p66ps6-oraylqrG__PD87d706eS_p794Iw14KgN4JR-_sb_88Pzt0fLz-vTp6Pn4v6esrKqzrKSvqq6k4A%3D%3D&fp=z2T_L2mOLSxbFlHIPlU1FYweFzKe&ts=1532142082&as=a255cac5b2208bd2a23862&mas=00e35bc961329fe4e2da0242394f32b692264a2c00d8a582a8"
           

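    Note: the two `{}` placeholders also need to be filled in, and the value can be taken from the list page. The list page sometimes contains this field and sometimes does not, so I use exception handling.

    # The code for getting this field is as follows: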

                res = requests.get(base_url, headers=self.headers)
                html = json.loads(res.text)
                print(res.status_code, '-------')
                datas = html['data']
                for data in datas:
                    try:
                        # id of the detail page
                        group_id = (json.loads(data["content"]))["group_id"]
                    except (KeyError, TypeError, ValueError):
                        # some list entries do not carry this field
                        group_id = 0
                    if group_id != 0:
                        print(group_id)

    # Next, build the detail-page URL
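A minimal sketch of building the detail-page URLs, under stated assumptions: `DETAIL_TEMPLATE` is a shortened stand-in for `text_url`, the helper names are hypothetical, and both `{}` placeholders are filled with the same `group_id`. The original post does not say what the second placeholder holds, so check that against your own captured traffic.

```python
import json

# Shortened, illustrative stand-in for text_url; the real URL has many more parameters.
DETAIL_TEMPLATE = "http://a3.pstatp.com/article/content/21/1/{}/{}/1/0/?aid=13&app_name=news_article"


def extract_group_id(item):
    """Return the group_id embedded in a feed item's JSON 'content' string, or None."""
    try:
        return json.loads(item["content"])["group_id"]
    except (KeyError, TypeError, ValueError):
        return None  # the field is missing or the content is not valid JSON


def build_detail_urls(feed_json, template=DETAIL_TEMPLATE):
    """Collect one detail-page URL per feed item that carries a group_id."""
    urls = []
    for item in feed_json.get("data", []):
        group_id = extract_group_id(item)
        if group_id is None:
            continue
        # Assumption: both placeholders take the same id.
        urls.append(template.format(group_id, group_id))
    return urls


# Made-up feed response with one usable item and two unusable ones.
sample_feed = {"data": [
    {"content": json.dumps({"group_id": 123456})},
    {"content": "not json"},
    {"other": 1},
]}
detail_urls = build_detail_urls(sample_feed)
```

Returning `None` instead of `0` for a missing id keeps "no id" distinct from any real value, and catching only the expected exception types avoids hiding unrelated bugs.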

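    After that, it is just a matter of extracting the title and the content. I will not go into that here, since it is not technically difficult.

    If you want the source code, you can contact me. I hope you will use a packet-capture tool to find the endpoints yourself and then follow my approach to finish the job. The site's anti-scraping measures mainly limit how often the API is accessed; you also need to rotate the User-Agent and the IP address. I will cover other news sites in later posts. Thanks for following!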
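The three countermeasures mentioned above (limiting request frequency, rotating the User-Agent, rotating the IP) could be combined along these lines. This is only a sketch using the standard library: `PoliteFetcher`, the delay range, and the proxy list are assumptions, not part of the original post, and the two User-Agent strings are simply the ones that appear in the code on this page.

```python
import itertools
import random
import time
import urllib.request

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36",
    "Mozilla/5.0 (Linux; U; Android 6.0.1; zh-cn; MI 5s Build/MXB48T) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/53.0.2785.146 Mobile Safari/537.36 XiaoMi/MiuiBrowser/8.7.1",
]
PROXIES = [None]  # hypothetical pool, e.g. "http://127.0.0.1:8888"; None means a direct connection


class PoliteFetcher:
    """Waits a random delay before each request and rotates User-Agent and proxy."""

    def __init__(self, min_delay=1.0, max_delay=3.0, proxies=PROXIES, sleep=time.sleep):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self._proxies = itertools.cycle(proxies)  # round-robin over the proxy pool
        self._sleep = sleep

    def next_request_settings(self):
        """Pick the headers, proxy and delay for the next request."""
        return {
            "headers": {"User-Agent": random.choice(USER_AGENTS)},
            "proxy": next(self._proxies),
            "delay": random.uniform(self.min_delay, self.max_delay),
        }

    def fetch(self, url, timeout=10):
        """Sleep for the chosen delay, then fetch the URL and return the raw bytes."""
        settings = self.next_request_settings()
        self._sleep(settings["delay"])
        handlers = []
        if settings["proxy"]:
            proxy = settings["proxy"]
            handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
        opener = urllib.request.build_opener(*handlers)
        request = urllib.request.Request(url, headers=settings["headers"])
        with opener.open(request, timeout=timeout) as response:
            return response.read()
```

A call such as `PoliteFetcher(min_delay=2, max_delay=5).fetch(list_url)` would then replace a bare `requests.get`; the right delay depends on how aggressively the server throttles, which the post does not quantify.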

  • #coding:utf-8 import base64 import random, re import sqlite3 import redis, pickle import json, time import urllib3, urllib2, hashlib from datetime import datetime import threading import logging.handlers import sys reload(s...

    #coding:utf-8

    import base64
    import random, re
    import sqlite3
    import redis, pickle
    import json, time
    import urllib3, urllib2, hashlib
    from datetime import datetime
    import threading
    import logging.handlers
    import sys

    reload(sys)
    sys.setdefaultencoding('utf-8')
    import uuid
    import requests

    session = requests.session()

    # Hash the link with MD5 to generate a unique primary key
    def md5(str):
        import hashlib
        m = hashlib.md5()
        m.update(str)
        return m.hexdigest()

    def jinri():
        list_data = []
        for i in range(1, 20):
            # Request the feed to get the article links
            url = "http://www.toutiao.com/api/pc/feed/"
            data = {"category":"news_game","utm_source":"toutiao","widen":str(i),"max_behot_time":"0","max_behot_time_tmp":"0","tadrequire":"true","as":"479BB4B7254C150","cp":"7E0AC8874BB0985",
            }
            headers = {"Host":"www.toutiao.com","Connection":"keep-alive","Accept":"text/javascript, text/html, application/xml, text/xml, */*","X-Requested-With":"XMLHttpRequest","User-Agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36","Content-Type":"application/x-www-form-urlencoded","Referer":"http://www.toutiao.com/ch/news_hot/","Accept-Encoding":"gzip, deflate","Accept-Language":"zh-CN,zh;q=0.8",
            }
            result1 = session.get(url=url, params=data, headers=headers).text
            result2 = json.loads(result1)
            if result2["message1"] == "success":
                for i in result2["data"]:
                    source_url = i["source_url"]
                    headers = {"Host":"www.toutiao.com","Connection":"keep-alive","Cache-Control":"max-age=0","Upgrade-Insecure-Requests":"1","User-Agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36","Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8","Accept-Encoding":"gzip, deflate","Accept-Language":"zh-CN,zh;q=0.8",
                    }
                    url1 = "http://www.toutiao.com" + str(source_url)
                    try:
                        return_data = session.get(url=url1, headers=headers).content
                    except:
                        # fall back to an empty page so the name is always defined
                        return_data = ""
                    #print return_data
                    try:
                        contentData = re.findall('(.*?)', return_data)[0]
                    except:
                        contentData = ""
                    cx = sqlite3.connect("C:\\Users\\xuchunlin\\PycharmProjects\\study\\db.sqlite3", check_same_thread=False)
                    cx.text_factory = str
                    try:
                        print "Inserting data for link %s" % (url)
                        chinese_ta = i["chinese_tag"]
                        media_avatar_url = i["media_avatar_url"]
                        is_feed_ad = i["is_feed_ad"]
                        tag_url = i["tag_url"]
                        title = i["title"]
                        tag = i["tag"]
                        label = str(i["label"])
                        abstract = i["abstract"]
                        source_url = i["source_url"]
                        print title
                        print chinese_ta
                        print media_avatar_url
                        print is_feed_ad
                        print tag_url
                        print tag
                        print label
                        print abstract
                        print source_url
                        # the MD5 of the article URL serves as the primary key
                        url2 = md5(str(url1))
                        cx.execute("INSERT INTO toutiao (title,chinese_ta,media_avatar_url,is_feed_ad,tag_url,tag,label,abstract,source_url,url,contentData)VALUES (?,?,?,?,?,?,?,?,?,?,?)",
                                   (str(title), str(chinese_ta), str(media_avatar_url), str(is_feed_ad), str(tag_url), str(tag), str(label), str(abstract), str(source_url), str(url2), str(contentData)))
                        cx.commit()
                        #time.sleep(2)
                    except Exception as e:
                        print e
                        print "Insert failed"
                    cx.close()
            else:
                print "Request failed"
        return list_data

    print jinri()
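The script above is Python 2. The idea it relies on, hashing the article URL with MD5 and using the digest as a unique primary key, might look like this in Python 3. This is a simplified sketch: the two-column `toutiao` table and the helper names `url_key` and `save_article` are assumptions for illustration (the original table has eleven columns), and `INSERT OR IGNORE` is swapped in so that re-crawled links are skipped rather than raising an error.

```python
import hashlib
import sqlite3


def url_key(url):
    """Return the MD5 hex digest of a URL, used as a fixed-length primary key."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def save_article(conn, url, title):
    """Insert an article; return True if it was new, False if it was already stored."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO toutiao (url, title) VALUES (?, ?)",
        (url_key(url), title),
    )
    conn.commit()
    return cursor.rowcount == 1


# In-memory database with a reduced schema, just for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toutiao (url TEXT PRIMARY KEY, title TEXT)")
first = save_article(conn, "http://www.toutiao.com/group/1/", "example title")
second = save_article(conn, "http://www.toutiao.com/group/1/", "example title")
```

Here `first` reports a new row and `second` reports a duplicate, so a crawler can use the return value to decide whether the detail page still needs to be fetched.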

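  • In the previous articles we built a fairly complete body of crawler knowledge through theory and three crawler examples, all of which targeted web pages. This article uses Fiddler and Python to build a crawler that captures traffic from a mobile app, so that app data has nowhere to hide either!...
  • II. Toutiao app packet capture in practice. 1. Getting the source. First, with the environment configured as above, open the Toutiao app on the phone and search for "疫情" (epidemic). Many entries then show up in Fiddler; after inspecting them and filtering by experience, the URL containing `search` turns out to be the one we want. Double-click this URL...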
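  • def ttapi(url): #### APP mode channel = re.search('ch/(.*?)/', url).group(1) s = requests.session() headers = { 'Accept':'image/webp,image/*;q=0.8', 'User-Agent':'News/6.9.8.36 CFNetwork/975.0.3 ...
  • Scrapy, part 4: scraping an app and storing the data in MongoDB. Ahem, don't get me wrong, the title is not trying to make big news; it just happens to be a Toutiao crawler... So far I have only scraped data from web pages; today let's scrape data from a mobile app. The idea is really very simple: capture packets and call the API,...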
  • "0CF8C421&device_id=51911855605&channel=App%20Store&resolution=750*1334&aid=13&ab_version=304488,346137,349052,271178,326588,326524,326532,338589,3" \ "36927,295827,325048,345778,239096,348856,...
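  • Scraping Toutiao images for [a keyword]: fetch Toutiao images for [a keyword] and save each image set to its own local folder. Analysis, rough approach: analyze the search page and find where the search-result data comes from. The analysis shows that the data comes from a JSON file, and the links are stored in that file's...
  • While using the Toutiao mobile app over the past few days, I noticed it had opened a page showing epidemic data. As a programmer with ideals and ambitions who works with data, I thought it would be good to grab that data so that we can do some data processing ourselves, and thereby...
  • Today's goal is to search for "街拍" (street snaps) from the Toutiao home page and download the large images under each title, as follows: I am feeling a bit lazy today, so I will just share the code for you to study and digest: import requests import os from hashlib import md5 for i in range(3): offset=i*20 url='...
  • Emmm, this was too simple. I wrote four programs and only one is barely passable, so make do with it. I will just lay out the program; if you need it, analyze it yourself, it is not hard. import requests ...app_name=web_search&offset=0&format=json&keyw...
  • Compared with the Toutiao app, people probably know much less about Qutoutiao. Qutoutiao is a news app that attracts readers with "rewards for reading" and has a large user base. Its home page, shown below, is much like that of other content apps: it consists of a list page (sample address) and a detail page (sample address)....
  • Lately I have been hooked on Python crawlers. I am studying Cui Qingcai's book python3网络爬虫开发实战. It is a good book, but because the technology has moved on, some of its code no longer works, so I am writing this post to record the pitfalls I ran into. Scraping result: if the image set you scrape only has...
  • Open-course recommendation using Toutiao's method. The server side is Alibaba Cloud CentOS 7 + Play! + Scala + Docker + Apache Mahout; the crawler is Scrapy; there is an Android client and a simple iOS client. The code is open source at https://github.com/foamliu/hackathon-ocw. Below is the server side...
  • 1. Step one: knowing that Toutiao's endpoints are Ajax endpoints with dynamic parameters, I chose Selenium to simulate a browser, but it was extremely inefficient. 2. Step two: look for endpoints on the Toutiaohao web side and crack the as, cp, and sign parameters, but it was still unstable. 3. Step three: tentatively, on the app side...
  • After the first seven chapters, I believe you have a fairly complete picture of web crawling. They covered four cases: static page scraping, dynamic Ajax page scraping, Selenium browser-simulated scraping, and scraping the Toutiao app with Fiddler, which together cover the basic crawler patterns....
  • As everyone knows, the internet has entered its second half. A huge amount of data accumulated in the first half, and target data has to be extracted from it efficiently for data analysis, artificial intelligence, or even copying similar products (see Toutiao). OK, efficiently extracting target data is the point of a crawler. Sources of data ...
  • We know that for apps like Toutiao and UC Toutiao, most of the content comes from crawlers. Crawlers can be written in many languages, such as C/C++, Java, Python, PHP, and NodeJS, and there are many common frameworks, such as Python's Scrapy, NodeJS's cheerio, ...
  • Ways to scrape. Following Cui Qingcai's talk, scraping can be divided by target into web-page scraping and app...(2) Client-side rendering: the page content is rendered by JavaScript and the real data is fetched via Ajax, for example Taobao and the Toutiao web pages. When an action fetches more data,
  • Toutiao articles, Douban movies, Dazhong reviews (大众评论), Lianjia rentals, 5i5j rentals, JD products, JD product reviews, Taobao products, Tmall products, Tmall product reviews, Amazon products, Amazon product reviews, Kickstarter comments, Kickstarter users, Weibo user info, Weibo user follows, Douyin short videos...
  • Scraping: the content sources are 60% Qutoutiao, 20% Tencent, and 20% Toutiao. The content formats are illustrated articles, image galleries, and short videos. After the crawler fetches the content, copyright information is removed and the content is stored, then served to the client through a collaborative filtering algorithm. User acquisition: referral-based viral growth (the core of the product), which basically achieves low-cost, large-scale acquisition of...
  • Content: scraping content the way Toutiao does is something few companies do now, because nobody can build a second Toutiao app. d. Credit reporting: this has basically been cracked down on until nothing is left, and I do not know how this kind of crawler came about or where the personal-information endpoints came from. At present crawlers are mostly used in e-commerce and business registration...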
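  • Outstanding blog posts of 2017

    2017-08-24 15:03:52
    An open-source Toutiao clone. A hands-on guide to building a good-looking app from scratch. KeepGank.IO: another open-source Gank.IO client, stars welcome. An app for consolidating Android fundamentals. A small demo that scrapes the Jianshu home page with jsoup. About to graduate, so I knocked out a small project (new
  • News from Toutiao, NetEase, Tencent, and others; computer-book-lover books; JK (uniform photos) crawler; K: Kanzhihu; Kechengge campus-belle ranking; konachan; L: Lianjia; Lianjia sold, for-sale, and for-rent listings; Lagou; Hearthstone; leetcode; LinkedIn Sales Navigator crawler LinkedInSalesNavigator; M ...
  • A collection of anti-scraping cases

    2019-10-18 15:24:11
    After years of writing crawlers, here is a summary of anti-scraping schemes and how anti-scraping is implemented: ...`source` tinyint(4) NOT NULL DEFAULT '0' COMMENT '来源2:今日头条 14小年糕小程序 15种子视频 16西瓜视频 17人民日报客户端 18央视新闻客户...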
