
Google Chrome is the browser developers love most, and the device-emulation feature in its developer tools (essentially a built-in simulator) is extremely handy; similar features in other browsers are derived from and tuned against Chrome's engine. However, the official list of test devices is rather short, and missing devices have to be added by hand. To make lookup easier, I compiled the table below for everyone to bookmark and use.

Officially this feature is part of "Chrome DevTools". Each test device requires the following parameters:
• Device Name: the device's name
• Width / Height: viewport width and height in CSS pixels
• device pixel ratio (DPR): the ratio of physical pixels to CSS pixels
• User agent string: the browser identification string

| Device Name | Width | Height | DPR | User agent string |
| --- | --- | --- | --- | --- |
| iPhone 11 | 414 | 896 | 2 | Mozilla/5.0 (iPhone; CPU iPhone OS 13_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Mobile/15E148 Safari/604.1 |
| iPhone 11 Pro | 375 | 812 | 3 | Mozilla/5.0 (iPhone; CPU iPhone OS 13_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Mobile/15E148 Safari/604.1 |
| iPhone 11 Pro Max | 414 | 896 | 3 | Mozilla/5.0 (iPhone; CPU iPhone OS 13_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1 |
| iPad 10.2" (2019) | 810 | 1080 | 2 | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Safari/605.1.15 |
| iPhone Xs | 375 | 812 | 3 | Mozilla/5.0 (iPhone; CPU iPhone OS 12_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1 |
| iPhone Xs Max | 414 | 896 | 3 | Mozilla/5.0 (iPhone; CPU iPhone OS 12_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1 |
| iPhone XR | 414 | 896 | 2 | Mozilla/5.0 (iPhone; CPU iPhone OS 12_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1 |
| iPhone X | 375 | 812 | 3 | Mozilla/5.0 (iPhone; CPU iPhone OS 11_1 like Mac OS X) AppleWebKit/604.3.5 (KHTML, like Gecko) Version/11.0 Mobile/15B93 Safari/604.1 |
| iPhone 8 Plus | 414 | 736 | 3 | Mozilla/5.0 (iPhone; CPU iPhone OS 11_1 like Mac OS X) AppleWebKit/604.3.5 (KHTML, like Gecko) Version/11.0 Mobile/15B93 Safari/604.1 |
| iPhone 8 | 375 | 667 | 3 | Mozilla/5.0 (iPhone; CPU iPhone OS 11_1 like Mac OS X) AppleWebKit/604.3.5 (KHTML, like Gecko) Version/11.0 Mobile/15B93 Safari/604.1 |
| iPhone 7 Plus | 414 | 736 | 3 | Mozilla/5.0 (iPhone; CPU iPhone OS 11_1_1 like Mac OS X) AppleWebKit/604.3.5 (KHTML, like Gecko) Version/11.0 Mobile/15B150 Safari/604.1 |
| iPhone 7 | 375 | 667 | 2 | Mozilla/5.0 (iPhone; CPU iPhone OS 11_1_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/11.0 Mobile/14E304 Safari/604.1 |
| iPhone SE | 320 | 568 | 2 | Mozilla/5.0 (iPhone; CPU iPhone OS 9_3 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13E233 Safari/601.1 |
| iPad Mini 4 | 768 | 1024 | 2 | Mozilla/5.0 (iPad; CPU OS 11_1 like Mac OS X) AppleWebKit/604.3.5 (KHTML, like Gecko) Version/11.0 Mobile/15B93 Safari/604.1 |
| iPad Pro (10.5") | 834 | 1112 | 2 | Mozilla/5.0 (iPad; CPU OS 11_1 like Mac OS X) AppleWebKit/604.3.5 (KHTML, like Gecko) Version/11.0 Mobile/15B101 Safari/604.1 |
| iPad Pro (12.9") | 1024 | 1366 | 2 | Mozilla/5.0 (iPad; CPU OS 11_1 like Mac OS X) AppleWebKit/604.3.5 (KHTML, like Gecko) Version/11.0 Mobile/15B101 Safari/604.1 |
| Samsung Galaxy Tab S3 | 768 | 1024 | 2 | Mozilla/5.0 (Linux; Android 7.0; SM-T827V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.83 Safari/537.36 |
| Samsung Galaxy Tab S4 | 1138 | 712 | 2.25 | Mozilla/5.0 (Linux; Android 8.1.0; SM-T837A) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.80 Safari/537.36 |
| Samsung Galaxy Note 8 | 412 | 846 | 3.5 | Mozilla/5.0 (Linux; Android 7.1.1; SAMSUNG SM-N950U Build/NMF26X) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/6.2 Chrome/56.0.2924.87 Mobile Safari/537.36 |
| Samsung Galaxy S8 | 360 | 740 | 3 | Mozilla/5.0 (Linux; Android 7.1.1; SAMSUNG SM-N950U Build/NMF26X) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/6.2 Chrome/56.0.2924.87 Mobile Safari/537.36 |
| Samsung Galaxy S8 | 360 | 740 | 3 | Mozilla/5.0 (Linux; Android 7.0; SM-G950U Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.84 Mobile Safari/537.36 |
| Samsung Galaxy S8+ | 412 | 846 | 3.5 | Mozilla/5.0 (Linux; Android 7.0; SM-G955U Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.84 Mobile Safari/537.36 |
| Samsung Galaxy S7 | 360 | 640 | 4 | Mozilla/5.0 (Linux; Android 6.0.1; SM-G935V Build/MMB29M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.83 Mobile Safari/537.36 |

For convenient offline use you can also download the table directly, but note that the downloaded copy is not kept up to date.
Download link

This table is intended as a reference tool and will be kept updated with the device models currently popular on the market.
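
For automated testing, the same four parameters from the table can drive Chrome's device emulation programmatically, for example through Selenium's mobile-emulation option. A minimal sketch (the helper function name is my own; running it requires Selenium and Chrome installed, so the driver lines are left commented out):

```python
# Hypothetical helper: builds the "mobileEmulation" settings dict that
# ChromeDriver expects, from the same four parameters the table lists.
def mobile_emulation(width, height, dpr, user_agent):
    return {
        "deviceMetrics": {"width": width, "height": height, "pixelRatio": dpr},
        "userAgent": user_agent,
    }

# iPhone 11 row from the table above
iphone_11 = mobile_emulation(
    414, 896, 2,
    "Mozilla/5.0 (iPhone; CPU iPhone OS 13_3 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Mobile/15E148 Safari/604.1",
)

# With Selenium installed, the dict plugs straight into ChromeOptions:
# from selenium import webdriver
# options = webdriver.ChromeOptions()
# options.add_experimental_option("mobileEmulation", iphone_11)
# driver = webdriver.Chrome(options=options)
```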


In this article, we walk through developing a simple feature set representation for identifying malicious URLs. We will create feature vectors for URLs and use these to develop a classification model for identifying malicious URLs. To evaluate how good the features are in separating malicious URLs from benign URLs, we build a Decision-Tree based machine learning model to predict the maliciousness of a given URL.


Malicious websites are well-known threats in cybersecurity. They act as an efficient tool for propagating viruses, worms, and other types of malicious code online and are responsible for over 60% of cyber attacks. Malicious URLs can be delivered via email links, text messages, browser pop-ups, page advertisements, etc. These URLs may be links to dodgy websites or, more likely, have embedded 'downloadables'. These embedded downloads can be spyware, keyloggers, viruses, worms, etc. As such, it has become a priority for cyber defenders to promptly detect and mitigate the spread of malicious code within their networks. Malicious URL detectors have previously relied mainly on URL blacklisting or signature blacklisting, and most of these techniques offer 'after-the-fact' solutions. To improve the timeliness and generality of malicious URL detection, machine learning techniques are increasingly being adopted.


To develop a machine learning model, we need a feature extraction framework for featurizing URLs, i.e., converting URLs into feature vectors. In this article, we will collect samples of known malicious URLs and known benign URLs. We then develop a fingerprinting framework and extract a given set of M features for all URLs in the sample. We test the usefulness of these features in separating malicious URLs from benign URLs by developing a simple predictive model with them. Finally, we measure the model's ability to predict the maliciousness of URLs as the effectiveness of the features in separating malicious URLs from benign URLs.


The image below is an overview of the methodological process in this article.


# The Data

We collected data from two sources: the Alexa Top 1000 sites and phishtank.com. 1000 assumed-benign URLs were crawled from the Alexa top 1000 websites and 1000 suspected malicious URLs were crawled from phishtank.com. Due to VirusTotal API rate limits, we randomly sampled 500 assumed-benign URLs and 500 assumed-malicious URLs. The URLs were then scanned through VirusTotal. URLs with 0 malicious detections were labeled as benign (b_urlX) and URLs with at least 8 detections were labeled as malicious (m_urlX). We dumped the JSON results of each scan into corresponding files 'b_urlX.json' and 'm_urlX.json'. You can find these files here.

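
The labeling rule described above can be sketched as a small helper (the 'positives' field name follows VirusTotal's v2 report format; the cutoffs of 0 and 8 detections are the ones stated in the text):

```python
# Label a VirusTotal scan result according to the rule above:
# 0 engine detections -> benign, >= 8 detections -> malicious,
# anything in between is ambiguous and dropped from the sample.
def label_url(scan_result):
    positives = scan_result.get("positives", 0)
    if positives == 0:
        return "benign"
    if positives >= 8:
        return "malicious"
    return None  # ambiguous: excluded from the labeled dataset
```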

```python
from requests import get
from os import listdir
import pandas as pd
import numpy as np
from pandas.io.json import json_normalize
import seaborn as sns
import matplotlib.pyplot as plt
import math
from datetime import datetime

plt.rcParams["figure.figsize"] = (20, 20)
```

# Handling API Rate Limits and IP Blocking

To confirm that the malicious URLs in the sample are indeed malicious, we need to send multiple requests to VirusTotal. VirusTotal provides aggregated results from multiple virus scan engines. We also pass URLs through [Shodan](https://shodan.io). Shodan is a search engine for all devices connected to the internet, providing service-based features of the URL's server. VirusTotal and Shodan currently have API rate limits of 4 requests per minute and at least 10,000 requests per month respectively, per API key. While the number of URL requests for the data fell within the Shodan API request limits, VirusTotal proved a little more difficult. This is addressed by creating several VT API keys (be kind, 4 at most) and randomly sampling them on each request. In addition to limits on the number of API requests, sending multiple requests within a short period will lead to IP blocking by the VT and Shodan servers. We write a small crawler to get the latest set of elite IP addresses from https://free-proxy-list.net/ and, given the very short lifespan of free proxies, create a new proxy list on each request. In addition to IP pooling, we use Python's fake-useragent library to switch User-Agents on each request.


Finally, with a fresh proxy and user agent on each request, we can send 16 requests per minute as opposed to the previous 4. Each request has the following request parameters:


• 1 VirusTotal key: sampled from the VT API key pool.
• 1 Shodan key: sampled from the Shodan API key pool.
• 1 IP: a request to https://free-proxy-list.net/ fetches the latest free elite proxy.
• 1 User-Agent: a usable user agent sampled via Python's [fake-useragent](https://pypi.org/project/fake-useragent/) library.
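
Assembling those four parameters per request can be sketched as follows (the key values and the pool arguments here are placeholders for illustration, not the article's actual keys or proxy handling):

```python
from random import choice

# Placeholder pools; real VT/Shodan keys and a freshly crawled proxy list
# would be substituted in practice.
VT_KEYS = ["vt-key-1", "vt-key-2", "vt-key-3", "vt-key-4"]
SHODAN_KEYS = ["shodan-key-1", "shodan-key-2"]

def request_params(proxies, user_agents):
    # Sample one item from each pool for this outgoing request
    return {
        "vt_key": choice(VT_KEYS),
        "shodan_key": choice(SHODAN_KEYS),
        "proxy": choice(proxies),
        "headers": {"User-Agent": choice(user_agents)},
    }
```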

The scans from Shodan and VT produced the following dataset. From Shodan, we extract the following features:


• numServices: total number of services (open ports) running on the host
• robotstxt: whether the site has robots.txt enabled
```python
from random import choice, shuffle
from pyquery import PyQuery
from fake_useragent import UserAgent

## This gets a fresh elite proxy from https://free-proxy-list.net/ and a
## random Chrome user agent on every request.
## (get_page is the author's helper that fetches a URL and returns a PyQuery object.)
class Pooling(object):
    def __init__(self):
        self.proxies_url = ''

    def proxy_pool(self, url='https://free-proxy-list.net/'):
        '''returns a random currently available elite proxy'''
        pq, proxies = get_page(url), []
        tr = pq('table#proxylisttable.table tbody tr')
        rows = [j.text() for j in [PyQuery(i)('td') for i in tr]]
        rows = [i for i in rows if 'elite' in i]
        for row in rows:
            row = row.split()
            data = {}
            data['ip'] = row[0]
            data['port'] = row[1]
            data['country'] = row[3]
            data['proxy'] = {
                'http': 'http://{}:{}'.format(data['ip'], data['port']),
                'https': 'https://{}:{}'.format(data['ip'], data['port'])
            }
            proxies.append(data)
        return choice(proxies)

    def ua_pool(self):
        '''returns a random Chrome user agent'''
        ua = UserAgent()
        chromes = ua.data['browsers']['chrome'][5:40]
        shuffle(chromes)
        return choice(chromes)
```

The final dataset after scanning is available here. You can download this data and run your analysis.


# Fingerprinting URLs (Featurizing URLs for Malware URL Detection)

The goal is to extract URL characteristics that are important in separating malicious URLs from good URLs. First, let’s look at the relevant parts in the structure of a URL.


A URL (short for Uniform Resource Locator) is a reference that specifies the location of a web resource on a computer network and a mechanism for retrieving it. The URL is made up of different components as shown in the figure below. The protocol or scheme specifies how (or what is needed for) information is to be transferred. The hostname is a human-readable unique reference of the computer’s IP address on the computer network. The Domain Name Service (DNS) naming hierarchy maps an IP address to a hostname. Compromised URLs are used to perpetrate cyber-attacks online. These attacks may be in any or more forms of phishing emails, spam emails, and drive-by downloads.

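
The components just described can be pulled apart with Python's standard library; the example URL below is made up:

```python
from urllib.parse import urlparse

# Split a URL into the components discussed above
parts = urlparse("https://shop.example.com/catalog/item?id=42#reviews")

print(parts.scheme)    # https       (protocol/scheme)
print(parts.hostname)  # shop.example.com
print(parts.path)      # /catalog/item   (free URL part: file path)
print(parts.query)     # id=42           (free URL part: parameters)
print(parts.fragment)  # reviews
```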

Regarding domains, owners buy domains that people find easy to remember. Owners normally want names specific to the brand, product, or service they deliver. This part of the URL (the domain) cannot be changed once set. Malicious domain owners may opt for multiple cheap domain names, for example 'xsertyh.com'.


The free URL parameters are parts of a URL that can be changed to create new URLs. These include directory names, file paths, and URL parameters. These free URL parameters are usually manipulated by attackers to create new URLs, embed malicious codes and propagate them.


There are many techniques for malicious URL detection, the two main ones being a) blacklisting techniques and b) machine learning techniques. Blacklisting involves maintaining a database of known malicious domains and comparing the hostname of a new URL to hostnames in that database. This has an 'after-the-fact' problem: it is unable to detect new and unseen malicious URLs, which are only added to the blacklist after they have been observed attacking a victim. Machine learning approaches, on the other hand, provide a predictive approach that is generalizable across platforms and independent of prior knowledge of known signatures. Given a sample of malicious and benign URLs, ML techniques extract features of known good and bad URLs and generalize these features to identify new and unseen good or bad URLs.


The URL fingerprinting process targets 3 types of URL features:


• URL String Characteristics: features derived from the URL string itself.
• URL Domain Characteristics: domain characteristics of the URL's domain, including whois and Shodan information.
• Page Content Characteristics: features extracted from the URL's page (if any).

A summary of all extracted features is shown in the table below:


```python
import math
from datetime import datetime

import whois  # the 'whois' package, providing whois.query
from pyquery import PyQuery
from requests import get

# This creates a feature vector from a URL
class UrlFeaturizer(object):
    def __init__(self, url):
        self.url = url
        self.domain = url.split('//')[-1].split('/')[0]
        self.today = datetime.now()

        try:
            self.whois = whois.query(self.domain).__dict__
        except Exception:
            self.whois = None

        try:
            self.response = get(self.url)
            self.pq = PyQuery(self.response.text)
        except Exception:
            self.response = None
            self.pq = None

    ## URL string features
    def entropy(self):
        # Shannon entropy of the URL's character distribution
        string = self.url.strip()
        prob = [float(string.count(c)) / len(string) for c in dict.fromkeys(list(string))]
        return -sum([p * math.log(p) / math.log(2.0) for p in prob])

    def numDigits(self):
        digits = [i for i in self.url if i.isdigit()]
        return len(digits)

    def urlLength(self):
        return len(self.url)

    def numParameters(self):
        params = self.url.split('&')
        return len(params) - 1

    def numFragments(self):
        fragments = self.url.split('#')
        return len(fragments) - 1

    def numSubDomains(self):
        subdomains = self.url.split('http')[-1].split('//')[-1].split('/')
        return len(subdomains) - 1

    def domainExtension(self):
        ext = self.url.split('.')[-1].split('/')[0]
        return ext

    ## URL domain features
    def hasHttp(self):
        return 'http:' in self.url

    def hasHttps(self):
        return 'https:' in self.url

    def urlIsLive(self):
        return self.response is not None and self.response.status_code == 200

    def daysSinceRegistration(self):
        if self.whois and self.whois['creation_date']:
            diff = self.today - self.whois['creation_date']
            diff = str(diff).split(' days')[0]
            return diff
        else:
            return 0

    def daysSinceExpiration(self):
        if self.whois and self.whois['expiration_date']:
            diff = self.whois['expiration_date'] - self.today
            diff = str(diff).split(' days')[0]
            return diff
        else:
            return 0

    ## URL page features
    def bodyLength(self):
        if self.pq is not None:
            return len(self.pq('html').text()) if self.urlIsLive() else 0
        else:
            return 0

    def numTitles(self):
        if self.pq is not None:
            titles = ['h{}'.format(i) for i in range(7)]
            titles = [self.pq(i).items() for i in titles]
            return len([item for s in titles for item in s])
        else:
            return 0

    def numImages(self):
        if self.pq is not None:
            return len([i for i in self.pq('img').items()])
        else:
            return 0

    def numLinks(self):
        if self.pq is not None:
            return len([i for i in self.pq('a').items()])
        else:
            return 0

    def scriptLength(self):
        if self.pq is not None:
            return len(self.pq('script').text())
        else:
            return 0

    def specialCharacters(self):
        if self.pq is not None:
            bodyText = self.pq('html').text()
            schars = [i for i in bodyText if not i.isdigit() and not i.isalpha()]
            return len(schars)
        else:
            return 0

    def scriptToSpecialCharsRatio(self):
        schars = self.specialCharacters()
        return self.scriptLength() / schars if schars else 0

    def scriptTobodyRatio(self):
        blength = self.bodyLength()
        return self.scriptLength() / blength if blength else 0

    def bodyToSpecialCharRatio(self):
        blength = self.bodyLength()
        return self.specialCharacters() / blength if blength else 0

    def run(self):
        data = {}
        data['entropy'] = self.entropy()
        data['numDigits'] = self.numDigits()
        data['urlLength'] = self.urlLength()
        data['numParams'] = self.numParameters()
        data['hasHttp'] = self.hasHttp()
        data['hasHttps'] = self.hasHttps()
        data['urlIsLive'] = self.urlIsLive()
        data['bodyLength'] = self.bodyLength()
        data['numTitles'] = self.numTitles()
        data['numImages'] = self.numImages()
        data['scriptLength'] = self.scriptLength()
        data['specialChars'] = self.specialCharacters()
        data['ext'] = self.domainExtension()
        data['dsr'] = self.daysSinceRegistration()
        data['dse'] = self.daysSinceExpiration()
        data['sscr'] = self.scriptToSpecialCharsRatio()
        data['sbr'] = self.scriptTobodyRatio()
        data['bscr'] = self.bodyToSpecialCharRatio()
        return data
```

Running the script above produces the following data with 23 features. We will separate integer, boolean, and object column names into separate lists for easier data access.


```python
objects = [i for i in data.columns if 'object' in str(data.dtypes[i])]
booleans = [i for i in data.columns if 'bool' in str(data.dtypes[i])]
ints = [i for i in data.columns if 'int' in str(data.dtypes[i]) or 'float' in str(data.dtypes[i])]
```

# Removing Highly Correlated Features

Most linear analyses assume no multicollinearity between predictor variables, i.e., pairs of predictor features must not be strongly correlated. The intuition behind this assumption is that multiple correlated features add no extra information to a model, because the same information is already captured by one of them.


Multi-correlated features are also indicative of redundant features in the data, and dropping them is a good first step for dimension reduction. By removing correlated features (keeping only one from each group of observed correlated features), we can address the issues of feature redundancy and collinearity between predictors.


Let’s create a simple correlation grid to observe the correlation between the derived features for malicious and benign URL and remove one or more of highly correlated features.


```python
corr = data[ints + booleans].corr()

# Generate a mask for the upper triangle
mask = np.triu(np.ones_like(corr, dtype=np.bool))

# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(20, 15))

# Generate a custom diverging colormap
cmap = sns.diverging_palette(220, 10, as_cmap=True)

# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5}, annot=True)
```

However, we do not want to remove all correlated variables, only those with a very strong correlation that add no extra information to the model. For this, we define a threshold (0.7) on the observed positive and negative correlation.


We see that most of the highly correlated features are negatively correlated. For example, there is a 0.56 negative correlation coefficient between the number of characters in a URL and the entropy of the URL, which suggests that shorter URLs tend to have higher entropy.

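
As a quick illustration of the entropy feature being discussed, the character-level Shannon entropy of two made-up URLs can be computed directly:

```python
import math

# Character-level Shannon entropy of a string
def url_entropy(s):
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

short, long_ = "http://a.co", "http://example.com/path?q=search&id=1234567890"
print(url_entropy(short), url_entropy(long_))
```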

Here we will create a function to identify and drop one of multiple correlated features.


```python
def dropMultiCorrelated(cormat, threshold):
    ## Remove one of each pair of features whose correlation coefficient
    ## exceeds the threshold in absolute value
    # Select upper triangle of correlation matrix
    upper = cormat.abs().where(np.triu(np.ones(cormat.shape), k=1).astype(np.bool))
    # Find feature columns with any correlation greater than the threshold
    to_drop = [column for column in upper.columns if any(upper[column] > threshold)]
    for d in to_drop:
        print("Dropping {}....".format(d))
    return to_drop

data2 = data[corr.columns].drop(dropMultiCorrelated(corr, 0.7), axis=1)
```

Dropping urlLength....
Dropping scriptLength....
Dropping specialChars....
Dropping bscr....
Dropping hasHttps....

# Predicting Maliciousness of URLs (Decision Trees)

Modeling builds a blueprint for explaining data, from previously observed patterns in the data. Modeling is often predictive in that it tries to use this developed ‘blueprint’ in predicting the values of future or new observations based on what it has observed in the past.


Based on the extracted features, we want the best predictive model that tells us if an unseen URL is malicious or benign. Therefore, we seek a unique combination of useful features to accurately separate malicious from benign URLs. We will go through two stages, feature selection, where we select only features useful in predicting the target variable and modeling with decision trees to develop a predictive model for malicious and benign URLs.


# Feature Selection

What variables are most useful in identifying a URL as 'malicious' or 'benign'? Computationally, we can automatically select the most useful variables by testing which ones 'improve' or 'fail to improve' the overall performance of the prediction model. This process is called 'feature selection'. Feature selection also serves the purpose of reducing the dimension of the data, addressing issues of computational complexity and model performance. The goal of feature selection is to obtain a useful subset of the original data that is predictive of the target feature in such a way that useful information is not lost (considering all predictors together). Although feature selection goes beyond simple correlation elimination, for this article we limit our feature selection to simply retaining the uncorrelated features. Let's create a subset of the original data that contains only uncorrelated features.


```python
predictor_columns = data2.columns
d = data2[predictor_columns]
x, y = d[predictor_columns], data['vt_class']
```

We keep only features that are unique in their contribution to the model. We can now start developing the model with 70% of the original sample and these 14 features. We will keep 30% of the sample to evaluate the model’s performance on new data.


• numServices
• entropy
• numDigits
• numParams
• bodyLength
• numTitles
• numImages
• dsr
• dse
• sscr
• sbr
• robots
• hasHttp
```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=100)
```

# Decision Trees

```python
from sklearn import tree
from sklearn.metrics import accuracy_score
from sklearn.externals.six import StringIO
from IPython.display import Image
from sklearn.tree import export_graphviz
import pydotplus
```

There are multiple machine learning (classification) algorithms that can be applied to identifying malicious URLs. After converting URLs to a representative feature vector, we model the 'malicious URL identification problem' as a binary classification problem: one that trains a predictive model for a class with only two outcomes, 'Malicious' and 'Benign'. Batch learning algorithms are machine learning algorithms that work under the following assumptions:


- the entire training data is available before model development, and
- the target variable is known before the model training task.

Batch algorithms are ideal and effective in that they are explainable discriminative learning models that use simple loss minimization between training data points. Decision trees are one such batch learning algorithm in machine learning.


In decision analysis, a decision tree is a visual representation of a model’s decision-making process to arrive at certain conclusions. The basic idea behind decision trees is an attempt to understand what factors influence class membership or why a data point belongs to a class label. A decision tree explicitly shows the conditions on which class members are made. Therefore they are a visual representation of the decision-making process.


Decision tree builds predictive models by breaking down the data set into smaller and smaller parts. The decision to split a subset is based on maximizing the information gain or minimizing information loss from splitting. Starting with the root node (the purest feature with no uncertainty), the tree is formed by creating various leaf nodes based on the purity of the subset.


In this case, the decision tree will explain class boundaries for each feature to classify a URL as malicious or benign. There are two main factors to consider when building a decision tree:


- a) what criteria to use in splitting or creating leaf nodes, and
- b) tree pruning: how long a tree is allowed to grow, to control the risk of over-fitting.

The criterion parameter of the decision tree algorithm specifies which split-quality measure (Gini or entropy) to use, while the max_depth parameter controls how far the tree is allowed to grow. The Gini measure is the probability of a random sample being classified incorrectly if we randomly pick a label according to the distribution in a branch. Entropy is a measure of information (or rather the lack thereof).

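
As a worked example of the two criteria, consider a node holding 8 malicious and 2 benign URLs (illustrative numbers, not from the article's data):

```python
import math

# Gini impurity: probability of mislabeling a random sample drawn from the node
def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# Shannon entropy of the node's label distribution, in bits
def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

print(gini([8, 2]))     # 0.32
print(entropy([8, 2]))  # ~0.722 bits; a 50/50 node would score 0.5 and 1.0
```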

Unfortunately, since there is no prior knowledge of the right combination of criteria and tree depth, we would have to iteratively test for the optimal values of these two parameters. We test a max_depth for 50 iterations for both criteria and visualize the model accuracy scores.


```python
maxd, gini, entropy = [], [], []
for i in range(1, 50):
    ### Gini criterion
    dtree = tree.DecisionTreeClassifier(criterion='gini', max_depth=i)
    dtree.fit(X_train, y_train)
    pred = dtree.predict(X_test)
    gini.append(accuracy_score(y_test, pred))

    ### entropy criterion
    dtree = tree.DecisionTreeClassifier(criterion='entropy', max_depth=i)
    dtree.fit(X_train, y_train)
    pred = dtree.predict(X_test)
    entropy.append(accuracy_score(y_test, pred))

    maxd.append(i)

d = pd.DataFrame({'gini': pd.Series(gini), 'entropy': pd.Series(entropy), 'max_depth': pd.Series(maxd)})

# visualizing changes in parameters
plt.plot('max_depth', 'gini', data=d, label='Gini Index')
plt.plot('max_depth', 'entropy', data=d, label='Entropy')
plt.xlabel('Max Depth')
plt.ylabel('Accuracy')
plt.legend()
```

It seems the best model is the simplest one, using the Gini index and a max depth of 4, with 84% out-of-sample accuracy. Also, the entropy criterion does not seem to produce better results, suggesting that new parameters added to the model do not necessarily give new information but may produce improved node probability purity. So we can fit and visualize the tree with max_depth = 4 and the Gini criterion to identify which features are most important in separating malicious and benign URLs.


Build the model….


```python
### create decision tree classifier object
DT = tree.DecisionTreeClassifier(criterion="gini", max_depth=4)
## fit decision tree model with training data
DT.fit(X_train, y_train)
## test data prediction
DT_expost_preds = DT.predict(X_test)
```

Visualize the tree …


```python
dot_data = StringIO()
export_graphviz(DT, out_file=dot_data,
                filled=True, rounded=True, special_characters=True,
                feature_names=X_train.columns, class_names=DT.classes_)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())
```

The accuracy of prediction models is very sensitive to parameter tuning of the max_depth (tree pruning) and split quality criteria (node splitting). This also helps in achieving the simplest parsimonious model that prevents over-fitting and performs just as well on unseen data. These parameters are specific to different data problems and it is good practice to test a combination of different parameter values.


The model shows that malicious URLs have a lower script-to-special-character ratio (sscr) and URL characters that are relatively more ‘ordered’ or monotonous. Additionally, malicious URLs may have domains that expired somewhere between 5–9 months ago. We also know of ‘malvertising’ schemes, where scammers take ownership of expired legitimate domains to distribute downloadable malicious code. Finally, probably the most distinguishing feature of benign URLs is longevity: they tend to show a moderate script-to-special-character ratio in the HTML body content together with a longer domain lifetime of 4–8 years.


by Fernando García Álvarez


# How to code Chrome’s T-Rex as a Telegram game using Node.js

Last month I was really interested in learning how the Telegram game platform works. And as I was also really bored of playing Chrome’s T-Rex game alone, I decided to make it work as a Telegram game.


While developing it I noticed there weren’t many Telegram game bot tutorials, and none that explained the whole process of building one from start to finish. So I decided to write about it.


If you want to see the result, the game is available as trexjumpbot in Telegram and is hosted here.


### Requirements

You need to have Node.js installed


### Step 1: Creating our bot

In order to create a game, we must first create an inline bot. We do this by talking to BotFather and sending the command


/newbot


Then, we are asked to enter a name and a username for our bot and we are given an API token. We need to save it as we will need it later.


We can also complete our bot info by changing its description (which will be shown when a user enters a chat with our bot under the “What can this bot do?” section) with


/setdescription


And also set its picture, in order to make it distinguishable from the chat list. The image must be square and we can set it with the following command:


/setuserpic


We can also set the about text, which will appear on the bot’s profile page and also when sharing it with other users.

Our bot has to be inline in order to be able to use it for our game. To do this, we simply have to execute the following command and follow the instructions:


/setinline


### Step 2: Creating our game

Now that we have our inline bot completely set up, it’s time to ask BotFather to create a game:


/newgame


We simply follow the instructions, and finally we have to specify a short name for our game. This will act as its unique identifier, which we will need later along with our bot API token.


### Step 3: Getting the T-Rex game source code

As Chromium is open source, some users have extracted the T-Rex game from it and we can easily find its source code online.


In order to make the game, I have used the code available in this GitHub repo, so go ahead and clone it:


git clone https://github.com/wayou/t-rex-runner.git

### Step 4: Setting up dependencies

First, go into the cloned folder and move all its files into a new folder called “public”


mkdir public && mv * public/

And init the project


npm init

You can fill in the requested info as you want (you can leave the default values); leave the entry point as index.js.


We will need Express and node-telegram-bot-api in order to easily interact with Telegram’s API:


npm install express --save
npm install node-telegram-bot-api --save

We are going to add a start script, since it’s necessary in order to deploy the game to Heroku. Open package.json and add the start script under the scripts section:


"scripts": {
"test": "echo \"Error: no test specified\" && exit 1",
"start": "node index.js"
}

### Step 5: Coding our server

Now that we have all dependencies set up, it’s time to code the server for our bot. Go ahead and create the index.js file:


const express = require("express");
const path = require("path");
const TelegramBot = require("node-telegram-bot-api");
const TOKEN = "YOUR_API_TOKEN_GOES_HERE";
const server = express();
const bot = new TelegramBot(TOKEN, { polling: true });
const port = process.env.PORT || 5000;
const gameName = "SHORT_NAME_YOUR_GAME";
const queries = {};

The code above is pretty straightforward. We simply require our dependencies and set the token we got from BotFather and also the short name we defined as the game identifier. Also, we set up the port, initialize Express and declare a queries empty object. This will act as a map to store the Telegram user object under his id, in order to retrieve it later.
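The role of the queries map can be sketched in isolation with hypothetical data — the same store/retrieve round trip the two handlers below will perform:

```javascript
// Hypothetical callback query, shaped like the objects Telegram sends
const queries = {};
const query = { id: "5721", from: { id: 42, first_name: "Alice" } };

// store: done in the "callback_query" handler when the play button is pressed
queries[query.id] = query;

// retrieve: done in the /highscore route, using the id passed in the game URL
const later = queries["5721"];
console.log(later.from.id); // 42
```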


Next, we need to make the contents of the public directory available as static files


server.use(express.static(path.join(__dirname, 'public')));

Now we are going to start defining our bot logic. First, let’s code the /help command


bot.onText(/help/, (msg) => bot.sendMessage(msg.from.id, "This bot implements a T-Rex jumping game. Say /game if you want to play."));

We have to specify the command as a regex on the first parameter of onText and then specify the bot’s reply with sendMessage. Note we can access the user id in order to reply by using msg.from.id
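One detail worth noting: onText matches with a plain regular expression, so an unanchored pattern like /help/ fires for any message that merely contains the word. A quick sketch with hypothetical messages:

```javascript
// The same patterns the bot registers with onText
const helpPattern = /help/;
const gamePattern = /start|game/;

console.log(helpPattern.test("/help"));       // true
console.log(helpPattern.test("I need help")); // true — unanchored, so this matches too
console.log(gamePattern.test("/game"));       // true
console.log(gamePattern.test("hello"));       // false
```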


When our bot receives the /start or /game command we are going to send the game to the user using bot.sendGame


bot.onText(/start|game/, (msg) => bot.sendGame(msg.from.id, gameName));

Now the user will be shown the game’s title, his high score and a button to play it, but the play button still doesn’t work. So, we are going to implement its logic


bot.on("callback_query", function (query) {
if (query.game_short_name !== gameName) {
bot.answerCallbackQuery(query.id, "Sorry, '" + query.game_short_name + "' is not available.");
} else {
queries[query.id] = query;
let gameurl = "https://YOUR_URL_HERE/index.html?  id="+query.id;
callback_query_id: query.id,
url: gameurl
});
}
});

When the user clicks the play button, Telegram sends us a callback. In the code above, when we receive this callback we first check that the requested game is, in fact, our game; if not, we show an error to the user.


If all is correct, we store the query into the queries object defined earlier under its id, in order to retrieve it later to set the high score if necessary. Then we need to answer the callback by providing the game’s URL. Later we are going to upload it to Heroku so you’ll have to enter the URL here. Note that I’m passing the id as a query parameter in the URL, in order to be able to set a high score.


Right now we have a fully functional game, but we are still missing high scores and inline behavior. Let’s start with implementing inline mode and offering our game:


bot.on("inline_query", function(iq) {
bot.answerInlineQuery(iq.id, [ { type: "game", id: "0", game_short_name: gameName } ] );
});

Last, we are going to implement the high score logic:


server.get("/highscore/:score", function(req, res, next) {
if (!Object.hasOwnProperty.call(queries, req.query.id)) return   next();
let query = queries[req.query.id];
let options;
if (query.message) {
options = {
chat_id: query.message.chat.id,
message_id: query.message.message_id
};
} else {
options = {
inline_message_id: query.inline_message_id
};
}
bot.setGameScore(query.from.id, parseInt(req.params.score),  options,
function (err, result) {});
});

In the code above, we listen for URLs like /highscore/300?id=5721. We simply retrieve the user from the queries object given its id (if it exists) and then use bot.setGameScore to send the high score to Telegram. The options object differs depending on whether the user launched the game inline or not, so we handle both situations as defined in the Telegram Bot API.
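How Express carves up a URL like /highscore/300?id=5721 can be sketched by hand with the WHATWG URL class (example.com is a placeholder host):

```javascript
const u = new URL("https://example.com/highscore/300?id=5721");

// Express would expose the "300" path segment as req.params.score (a string),
// which the route handler then parses with parseInt
const score = parseInt(u.pathname.split("/")[2], 10);

// ...and the query string as req.query.id
const id = u.searchParams.get("id");

console.log(score, id); // 300 '5721'
```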


The last thing we have to do on our server is to simply listen in the previously defined port:


server.listen(port);

### Step 6: Modifying the T-Rex game

We have to modify the T-Rex game we cloned from the GitHub repo in order for it to send the high score to our server.


Open the index.js file under the public folder, and at the top of it add the following lines in order to retrieve the player id from the URL:


var url = new URL(location.href);
var playerid = url.searchParams.get("id");

Last, we are going to locate the setHighScore function and add the following code to the end of it, in order to submit the high score to our server:


// Submit highscore to Telegram
var xmlhttp = new XMLHttpRequest();
var url = "https://YOUR_URL_HERE/highscore/" + distance +
  "?id=" + playerid;
xmlhttp.open("GET", url, true);
xmlhttp.send();

### Step 7: Deploying to Heroku

Our game is complete, but without uploading it to a server we can’t test it on Telegram, and Heroku provides us a very straightforward way to upload it.


Start by creating a new app:


Change our URL placeholders with the actual URL (replace with your own):


Replace the URL in the setHighScore function:


var url = "https://trexgame.herokuapp.com/highscore/" + distance +
"?id=" + playerid;

And also on the callback on the server:


let gameurl = "https://trexgame.herokuapp.com/index.html?id="+query.id;

Finally, let’s upload our game to Heroku, following the steps detailed on the Heroku page. After installing the Heroku CLI, log in from the project folder and push the files:


heroku login
git init
heroku git:remote -a YOUR_HEROKU_APP_NAME
git add .
git commit -m "initial commit"
git push heroku master

And that’s it! Now you finally have a fully working Telegram game. Go ahead and try it!


The full source code of this example is available on GitHub.



# 1. Environment

-Windows 8 (x64)

# 2. Background

The browser used is Chrome.

-The file is a SQLite 3 database
-encrypted_value is the encrypted BLOB content
-On Windows, encryption uses DPAPI; the Chromium source shows the encryption function uses CryptProtectData
-While the browser is running, other processes cannot open the file and get the following error:

java.sql.SQLException: [SQLITE_BUSY]  The database file is locked (database is locked)

# 4. Java DPAPI

Official site: Java Data Protection API
http://jdpapi.sourceforge.net/

The downloaded package contains two files:
--jdpapi-java-1.0.jar
--jdpapi-native-1.0.dll

jdpapi-native-1.0.dll is the IA 32-bit build; using it directly fails, so you need to compile an AMD64 (64-bit) build yourself.

Get the source code:

svn checkout https://svn.code.sf.net/p/jdpapi/code/ jdpapi-code

Save it to E:\tool\jdpapi\jdpapi-code (the project root; all paths below are relative to it). Of note:
-jdpapi\BUILD.txt: build instructions
-jdpapi\jdpapi-native\pom.xml: Maven build file

The C++ compiler is Visual Studio 2010, building the x64 version. In <compilerStartOption>, add the include paths for jni.h and jni_md.h.

Make the following changes in <configuration>:

<envFactoryName>
  org.codehaus.mojo.natives.msvc.MSVC2010x86AMD64EnvFactory
</envFactoryName>
<compilerStartOptions>
  <compilerStartOption>/LD /I"C:\Java\jdk1.8.0_91\include" /I"C:\Java\jdk1.8.0_91\include\win32"</compilerStartOption>
</compilerStartOptions>

<javahOS>x64</javahOS>

Since MSVC 2010 is used to build the x64 binary, the environment variables must be set first (so the expected cl.exe is found).

Go to c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\amd64 and run:

vcvars64.bat

Then run:

cd jdpapi
mvn clean package assembly:assembly

This generates jdpapi-java-1.0.1.jar, but the jdpapi-native build fails.

Generate the C++ header file:

cd jdpapi\jdpapi-java\target\classes
javah net.sourceforge.jdpapi.DPAPI

This creates net_sourceforge_jdpapi_DPAPI.h in the current directory; copy it to jdpapi\jdpapi-native\src\main\native.

Run the build again:

cd jdpapi
mvn clean package assembly:assembly

After a successful build, the following files are generated:
-jdpapi\jdpapi-java\target\jdpapi-java-1.0.1.jar
-jdpapi\jdpapi-native\target\jdpapi-native.dll

Copy these two files to the test directory E:\tool\jdpapi.

# 5. Implementation

The implementation is in Java. Decryption uses Java DPAPI, which calls the Windows DPAPI through JNI.

The code is as follows:

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import net.sourceforge.jdpapi.DataProtector;

public class testCoolies {

    static Connection conn = null;
    static String dbPath = "C:\\Users\\Think\\AppData\\Local\\Google\\Chrome\\User Data\\Default\\";
    static String dbName = dbPath + "Cookies_copy";

    private static final DataProtector protector = new DataProtector();

    private static String decrypt(byte[] data) {
        return protector.unprotect(data);
    }

    public static void main(String[] args) {
        try {
            Class.forName("org.sqlite.JDBC");
            conn = DriverManager.getConnection("jdbc:sqlite:" + dbName, null, null);

            conn.setAutoCommit(false);
            Statement stmt = conn.createStatement();
            stmt.setQueryTimeout(3);
            String sql = "select * from cookies where host_key like '%.cainiao.com%' and name='cna'";

            ResultSet rs = stmt.executeQuery(sql);
            while (rs.next()) {
                String name = rs.getString("name");
                String value = rs.getString("value");
                // read the encrypted BLOB into a byte array
                InputStream inputStream = rs.getBinaryStream("encrypted_value");
                ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
                int ch;
                while ((ch = inputStream.read()) != -1) {
                    byteArrayOutputStream.write(ch);
                }
                byte[] b = byteArrayOutputStream.toByteArray();
                byteArrayOutputStream.close();

                // decrypt with DPAPI and print the cookie
                System.out.println(String.format("name=%s value=%s encrypted_value=%s",
                        name, value, testCoolies.decrypt(b)));
            }
            rs.close();
            conn.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Use version 3.23.1 of the sqlite-jdbc driver; some versions throw the following error:

java.sql.SQLException: not implemented by SQLite JDBC driver

Add the sqlite-jdbc dependency to pom.xml:

<!-- https://mvnrepository.com/artifact/org.xerial/sqlite-jdbc -->
<dependency>
  <groupId>org.xerial</groupId>
  <artifactId>sqlite-jdbc</artifactId>
  <version>3.23.1</version>
</dependency>
