Q&A
  • A flaw in the autocorrelation plot from matplotlib's plt.acorr
    2020-12-08 14:51:04

    This is the result of differing common definitions between statistics and signal processing. Basically, the signal-processing definition assumes that you will handle detrending yourself. The statistical definition assumes that subtracting the mean is all the detrending you will do, and it does that for you.

    First, let's demonstrate the problem with a standalone example:

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    from statsmodels.graphics import tsaplots

    def label(ax, string):
        ax.annotate(string, (1, 1), xytext=(-8, -8), ha='right', va='top',
                    size=14, xycoords='axes fraction', textcoords='offset points')

    np.random.seed(1977)
    data = np.random.normal(0, 1, 100).cumsum()

    fig, axes = plt.subplots(nrows=4, figsize=(8, 12))
    fig.tight_layout()

    axes[0].plot(data)
    label(axes[0], 'Raw Data')

    axes[1].acorr(data, maxlags=data.size-1)
    label(axes[1], 'Matplotlib Autocorrelation')

    tsaplots.plot_acf(data, axes[2])
    label(axes[2], 'Statsmodels Autocorrelation')

    # note: in modern pandas this function lives at pd.plotting.autocorrelation_plot
    pd.tools.plotting.autocorrelation_plot(data, ax=axes[3])
    label(axes[3], 'Pandas Autocorrelation')

    # Remove some of the titles and labels that were automatically added
    for ax in axes.flat:
        ax.set(title='', xlabel='')
    plt.show()

    So, why do I say they're all correct? They're clearly different!

    Let's write our own autocorrelation function to demonstrate what plt.acorr is doing:

    def acorr(x, ax=None):
        if ax is None:
            ax = plt.gca()
        autocorr = np.correlate(x, x, mode='full')
        autocorr /= autocorr.max()
        return ax.stem(autocorr)

    If we plot this with our data, we'll get a more-or-less identical result to plt.acorr (I haven't labeled the lags properly, purely out of laziness):

    fig, ax = plt.subplots()
    acorr(data)
    plt.show()

    This is a perfectly valid autocorrelation. It's all a matter of whether your background is signal processing or statistics.

    This is the definition used in signal processing. The assumption is that you're going to handle detrending of the data yourself (note the detrend kwarg to plt.acorr). If you want the data detrended, you ask for it explicitly (and probably do something better than just subtracting the mean); otherwise it shouldn't be assumed.

    In statistics, simply subtracting the mean is assumed to be what you wanted to do.

    All of the other functions subtract the mean of the data before the correlation, similar to:

    def acorr(x, ax=None):
        if ax is None:
            ax = plt.gca()
        x = x - x.mean()
        autocorr = np.correlate(x, x, mode='full')
        autocorr /= autocorr.max()
        return ax.stem(autocorr)

    fig, ax = plt.subplots()
    acorr(data)
    plt.show()

    However, we still have one large difference. This one is purely a plotting convention.

    In most signal processing textbooks (that I've seen, anyway), the "full" autocorrelation is displayed, so that zero lag is in the center and the result is symmetric on each side. R, on the other hand, has the very reasonable convention of displaying only one side of it. (After all, the other side is completely redundant.) The statistical plotting functions follow the R convention, while plt.acorr follows what Matlab does, which is the opposite convention.

    Basically, you'd want this:

    def acorr(x, ax=None):
        if ax is None:
            ax = plt.gca()
        x = x - x.mean()
        autocorr = np.correlate(x, x, mode='full')
        autocorr = autocorr[x.size:]
        autocorr /= autocorr.max()
        return ax.stem(autocorr)

    fig, ax = plt.subplots()
    acorr(data)
    plt.show()

  • pandas linear regression

    This post was originally published here

    In this post, we’ll walk through building linear regression models to predict housing prices resulting from economic activity. Topics covered will include:

    1. What is regression?
    2. Variable selection
    3. Reading in the data with pandas
    4. Ordinary least squares assumptions
    5. Simple linear regression
    6. Regression plots
    7. Multiple linear regression
    8. Another look at partial regression plots
    9. Conclusion

    Future posts will cover related topics such as exploratory analysis, regression diagnostics, and advanced regression modeling, but I wanted to jump right in so readers could get their hands dirty with data.

    What is Regression?

    Linear regression is a model that predicts a relationship of direct proportionality between the dependent variable (plotted on the vertical or Y axis) and the predictor variables (plotted on the X axis) that produces a straight line, like so:

    [Figure: linear regression]

    Linear regression will be discussed in greater detail as we move through the modeling process.

    Variable Selection

    For our dependent variable we’ll use housing_price_index (HPI), which measures price changes of residential housing.

    For our predictor variables, we use our intuition to select drivers of macro- (or “big picture”) economic activity, such as unemployment, interest rates, and gross domestic product (total productivity). For an explanation of our variables, including assumptions about how they impact housing prices, and all the sources of data used in this post, see here.

    Reading in the Data with pandas

    Once we’ve downloaded the data, read it in using pandas’ read_csv method.

    import pandas as pd
    # read in from csv using pd.read_csv
    # be sure to use the file path where you saved the data
    housing_price_index = pd.read_csv('/Users/tdobbins/Downloads/hpi/monthly-hpi.csv')
    unemployment = pd.read_csv('/Users/tdobbins/Downloads/hpi/unemployment.csv')
    federal_funds_rate = pd.read_csv('/Users/tdobbins/Downloads/hpi/fed_funds.csv')
    shiller = pd.read_csv('/Users/tdobbins/Downloads/hpi/shiller.csv')
    gross_domestic_product = pd.read_csv('/Users/tdobbins/Downloads/hpi/gdp.csv')

    Once we have the data, invoke pandas’ merge method to join the data together in a single dataframe for analysis. Some data is reported monthly, others are reported quarterly. No worries. We merge the dataframes on a certain column so each row is in its logical place for measurement purposes. In this example, the best column to merge on is the date column. See below.

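    The merge step itself isn't reproduced in this copy of the post; a minimal sketch, assuming each CSV shares a date column, chains pandas' merge on that column to build the df used in the models below:

    # sketch: join all frames on the shared date column (column name assumed)
    df = (housing_price_index
          .merge(unemployment, on='date')
          .merge(federal_funds_rate, on='date')
          .merge(shiller, on='date')
          .merge(gross_domestic_product, on='date'))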

    Let’s get a quick look at our variables with pandas’ head method. The headers in bold text represent the date and the variables we’ll test for our model. Each row represents a different time period.

    Out[23]:
       date        sp500    consumer_price_index  long_interest_rate  housing_price_index  total_unemployed  more_than_15_weeks  not_in_labor_searched_for_work  multi_jobs  leavers  losers  federal_funds_rate  total_expenditures  labor_force_pr  producer_price_index  gross_domestic_product
    0  2011-01-01  1282.62  220.22                3.39                181.35               16.2              8393                2800                            6816        6.5      60.1    0.17                5766.7              64.2            192.7                 14881.3
    1  2011-04-01  1331.51  224.91                3.46                180.80               16.1              8016                2466                            6823        6.8      59.4    0.10                5870.8              64.2            203.1                 14989.6
    2  2011-07-01  1325.19  225.92                3.00                184.25               15.9              8177                2785                            6850        6.8      59.2    0.07                5802.6              64.0            204.6                 15021.1
    3  2011-10-01  1207.22  226.42                2.15                181.51               15.8              7802                2555                            6917        8.0      57.9    0.07                5812.9              64.1            201.1                 15190.3
    4  2012-01-01  1300.58  226.66                1.97                179.13               15.2              7433                2809                            7022        7.4      57.1    0.08                5765.7              63.7            200.7                 15291.0

    Usually, the next step after gathering data would be exploratory analysis. Exploratory analysis is the part of the process where we analyze the variables (with plots and descriptive statistics) and figure out the best predictors of our dependent variable. For the sake of brevity, we’ll skip the exploratory analysis. Keep in the back of your mind, though, that it’s of utmost importance and that skipping it in the real world would preclude ever getting to the predictive section.

    We’ll use ordinary least squares (OLS), a basic yet powerful way to assess our model.

    Ordinary Least Squares Assumptions

    OLS measures the accuracy of a linear regression model.

    OLS is built on assumptions which, if held, indicate the model may be the correct lens through which to interpret our data. If the assumptions don’t hold, our model’s conclusions lose their validity. Take extra effort to choose the right model to avoid Auto-esotericism/Rube-Goldberg’s Disease.

    Here are the OLS assumptions:

    1. Linearity: A linear relationship exists between the dependent and predictor variables. If no linear relationship exists, linear regression isn’t the correct model to explain our data.
    2. No multicollinearity: Predictor variables are not collinear, i.e., they aren’t highly correlated. If the predictors are highly correlated, try removing one or more of them. Since additional predictors are supplying redundant information, removing them shouldn’t drastically reduce the Adj. R-squared (see below).
    3. Zero conditional mean: The average of the distances (or residuals) between the observations and the trend line is zero. Some will be positive, others negative, but they won’t be biased toward a set of values.
    4. Homoskedasticity: The certainty (or uncertainty) of our dependent variable is equal across all values of a predictor variable; that is, there is no pattern in the residuals. In statistical jargon, the variance is constant.
    5. No autocorrelation (serial correlation): Autocorrelation is when a variable is correlated with itself across observations. For example, a stock price might be serially correlated if one day’s stock price impacts the next day’s stock price.

    Let’s begin modeling.

    Simple Linear Regression

    Simple linear regression uses a single predictor variable to explain a dependent variable. A simple linear regression equation is as follows:
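
    (The equation appeared as an image in the original post; given the variable definitions below, it is the standard simple linear form $y = \alpha + \beta x + \varepsilon$.)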

    Where:

    y = dependent variable

    ß = regression coefficient

    α = intercept (expected mean value of housing prices when our independent variable is zero)

    x = predictor (or independent) variable used to predict Y

    ε = the error term, which accounts for the randomness that our model can’t explain.

    Using statsmodels’ ols function, we construct our model setting housing_price_index as a function of total_unemployed. We assume that an increase in the total number of unemployed people will have downward pressure on housing prices. Maybe we’re wrong, but we have to start somewhere!

    The code below shows how to set up a simple linear regression model with total_unemployment as our predictor variable.

    from IPython.display import HTML, display
    
    import statsmodels.api as sm
    from statsmodels.formula.api import ols
    
    # fit our model with .fit() and show results
    # we use statsmodels' formula API to invoke the syntax below,
    # where we write out the formula using ~
    housing_model = ols("housing_price_index ~ total_unemployed", data=df).fit()
    # summarize our model
    housing_model_summary = housing_model.summary()
    
    # convert our table to HTML and add colors to headers for explanatory purposes
    HTML(
    housing_model_summary\
    .as_html()\
    .replace(' Adj. R-squared: ', ' Adj. R-squared: ')\
    .replace('coef', 'coef')\
    .replace('std err', 'std err')\
    .replace('P>|t|', 'P>|t|')\
    .replace('[95.0% Conf. Int.]', '[95.0% Conf. Int.]')
    )
    Out[24]:
    OLS Regression Results
    Dep. Variable:    housing_price_index    R-squared:          0.952
    Model:            OLS                    Adj. R-squared:     0.949
    Method:           Least Squares          F-statistic:        413.2
    Date:             Fri, 17 Feb 2017       Prob (F-statistic): 2.71e-15
    Time:             17:57:05               Log-Likelihood:     -65.450
    No. Observations: 23                     AIC:                134.9
    Df Residuals:     21                     BIC:                137.2
    Df Model:         1
    Covariance Type:  nonrobust

                       coef      std err   t        P>|t|   [95.0% Conf. Int.]
    Intercept          313.3128  5.408     57.938   0.000   302.067   324.559
    total_unemployed   -8.3324   0.410     -20.327  0.000   -9.185    -7.480

    Omnibus:        0.492    Durbin-Watson:     1.126
    Prob(Omnibus):  0.782    Jarque-Bera (JB):  0.552
    Skew:           0.294    Prob(JB):          0.759
    Kurtosis:       2.521    Cond. No.          78.9

    Referring to the OLS regression results above, we’ll offer a high-level explanation of a few metrics to understand the strength of our model: Adj. R-squared, coefficients, standard errors, and p-values.

    To explain:

    Adj. R-squared indicates that 95% of housing prices can be explained by our predictor variable, total_unemployed.

    The regression coefficient (coef) represents the change in the dependent variable resulting from a one unit change in the predictor variable, all other variables being held constant. In our model, a one unit increase in total_unemployed reduces housing_price_index by 8.33. In line with our assumptions, an increase in unemployment appears to reduce housing prices.
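
    For example, plugging the first row of our data into the fitted line gives housing_price_index ≈ 313.31 - 8.33 * 16.2 ≈ 178.3, close to the observed value of 181.35.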

    The standard error measures the accuracy of total_unemployed‘s coefficient by estimating the variation of the coefficient if the same test were run on a different sample of our population. Our standard error, 0.41, is low and therefore appears accurate.

    The p-value means the probability of an 8.33 decrease in housing_price_index due to a one unit increase in total_unemployed is 0%, assuming there is no relationship between the two variables. A low p-value (in general, less than 0.05) indicates that the results are statistically significant.

    The confidence interval is a range within which our coefficient is likely to fall. We can be 95% confident that total_unemployed‘s coefficient will be within our confidence interval, [-9.185, -7.480].

    Let’s use statsmodels’ plot_regress_exog function to help us understand our model.
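
    The plotting call itself isn't shown in this copy of the post; a minimal sketch using statsmodels' graphics API (the figure size is an assumption) would be:

    import matplotlib.pyplot as plt

    # four diagnostic panels for the total_unemployed predictor
    fig = plt.figure(figsize=(12, 8))
    fig = sm.graphics.plot_regress_exog(housing_model, "total_unemployed", fig=fig)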

    Regression Plots

    Please see the four graphs below.

    1. The “Y and Fitted vs. X” graph plots the dependent variable against our predicted values with a confidence interval. The inverse relationship in our graph indicates that housing_price_index and total_unemployed are negatively correlated, i.e., when one variable increases the other decreases.
    2. The “Residuals versus total_unemployed” graph shows our model’s errors versus the specified predictor variable. Each dot is an observed value; the line represents the mean of those observed values. Since there’s no pattern in the distance between the dots and the mean value, the OLS assumption of homoskedasticity holds.
    3. The “Partial regression plot” shows the relationship between housing_price_index and total_unemployed, taking in to account the impact of adding other independent variables on our existing total_unemployed coefficient. We’ll see later how this same graph changes when we add more variables.
    4. The Component and Component Plus Residual (CCPR) plot is an extension of the partial regression plot, but shows where our trend line would lie after adding the impact of adding our other independent variables on our existing total_unemployed coefficient. More on this plot here.

    Simple Linear Regression Plot

    The next plot graphs our trend line (green), the observations (dots), and our confidence interval (red).

    # this produces our trend line
    
    from statsmodels.sandbox.regression.predstd import wls_prediction_std
    import numpy as np
    import matplotlib.pyplot as plt
    
    # predictor variable
    x = df[['total_unemployed']]
    # dependent variable
    y = df[['housing_price_index']]
    
    # retrieve our confidence interval values
    # _ is a dummy variable since we don't actually use it for plotting but need it as a placeholder
    # since wls_prediction_std(housing_model) returns 3 values
    _, confidence_interval_lower, confidence_interval_upper = wls_prediction_std(housing_model)
    
    fig, ax = plt.subplots(figsize=(10,7))
    
    # plot the dots
    # 'o' specifies the shape (circle), we can also use 'd' (diamonds), 's' (squares)
    ax.plot(x, y, 'o', label="data")
    
    # plot the trend line
    # g-- and r-- specify the color to use
    ax.plot(x, housing_model.fittedvalues, 'g--.', label="OLS")
    # plot upper and lower ci values
    ax.plot(x, confidence_interval_upper, 'r--')
    ax.plot(x, confidence_interval_lower, 'r--')
    # plot legend
    ax.legend(loc='best')

    [Figure: trend line (green), observations (dots), and confidence interval (red)]

    So far, our model looks decent. Let’s add some more variables and see how total_unemployed reacts.

    Multiple Linear Regression

    Mathematically, multiple linear regression is:
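
    (The equation appeared as an image in the original post; in standard notation it is $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \varepsilon$.)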

    We know that unemployment cannot entirely explain housing prices. To get a clearer picture of what influences housing prices, we add and test different variables and analyze the regression results to see which combinations of predictor variables satisfy OLS assumptions, while remaining intuitively appealing from an economic perspective.

    We arrive at a model that contains the following variables: fed_funds, consumer_price_index, long_interest_rate, and gross_domestic_product, in addition to our original predictor, total_unemployed.

    Adding the new variables decreased the impact of total_unemployed on housing_price_index. total_unemployed‘s impact is now more unpredictable (standard error increased from 0.41 to 2.399), and, since the p-value is higher (from 0 to 0.943), less likely to influence housing prices.

    Although total_unemployed may be correlated with housing_price_index, our other predictors seem to capture more of the variation in housing prices. The real-world interconnectivity among our variables can’t be encapsulated by a simple linear regression alone; a more robust model is required. This is why our multiple linear regression model’s results change drastically when introducing new variables.

    That all our newly introduced variables are statistically significant at the 5% threshold, and that our coefficients follow our assumptions, indicates that our multiple linear regression model is better than our simple linear model.

    The code below sets up a multiple linear regression with our new predictor variables.
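
    The code block itself was lost in this copy of the post; based on the predictors listed above and the formula API used earlier, it would look roughly like this:

    # sketch of the multiple regression (mirrors the simple model above)
    housing_model = ols("""housing_price_index ~ total_unemployed
                           + long_interest_rate
                           + federal_funds_rate
                           + consumer_price_index
                           + gross_domestic_product""", data=df).fit()
    housing_model.summary()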

    Out[27]:
    OLS Regression Results
    Dep. Variable:    housing_price_index    R-squared:          0.980
    Model:            OLS                    Adj. R-squared:     0.974
    Method:           Least Squares          F-statistic:        168.5
    Date:             Fri, 17 Feb 2017       Prob (F-statistic): 7.32e-14
    Time:             18:02:42               Log-Likelihood:     -55.164
    No. Observations: 23                     AIC:                122.3
    Df Residuals:     17                     BIC:                129.1
    Df Model:         5
    Covariance Type:  nonrobust

                             coef       std err   t       P>|t|   [95.0% Conf. Int.]
    Intercept                -389.2234  187.252   -2.079  0.053   -784.291  5.844
    total_unemployed         -0.1727    2.399     -0.072  0.943   -5.234    4.889
    long_interest_rate       5.4326     1.524     3.564   0.002   2.216     8.649
    federal_funds_rate       32.3750    9.231     3.507   0.003   12.898    51.852
    consumer_price_index     0.7785     0.360     2.164   0.045   0.020     1.537
    gross_domestic_product   0.0252     0.010     2.472   0.024   0.004     0.047

    Omnibus:        1.363    Durbin-Watson:     1.899
    Prob(Omnibus):  0.506    Jarque-Bera (JB):  1.043
    Skew:           -0.271   Prob(JB):          0.594
    Kurtosis:       2.109    Cond. No.          4.58e+06

    Another Look at Partial Regression Plots

    Now let’s plot our partial regression graphs again to visualize how the total_unemployed variable was impacted by including the other predictors. The lack of trend in the partial regression plot for total_unemployed (in the figure below, upper right corner), relative to the regression plot for total_unemployed (above, lower left corner), indicates that total unemployment isn’t as explanatory as the first model suggested. We also see that the observations from the latest variables are consistently closer to the trend line than the observations for total_unemployment, which reaffirms that fed_funds, consumer_price_index, long_interest_rate, and gross_domestic_product do a better job of explaining housing_price_index.

    These partial regression plots reaffirm the superiority of our multiple linear regression model over our simple linear regression model.

    # this produces our six partial regression plots
    
    fig = plt.figure(figsize=(20,12))
    fig = sm.graphics.plot_partregress_grid(housing_model, fig=fig)

    [Figure: partial regression plots]

    Conclusion

    We have walked through setting up basic simple linear and multiple linear regression models to predict housing prices resulting from macroeconomic forces and how to assess the quality of a linear regression model on a basic level.

    To be sure, explaining housing prices is a difficult problem. There are many more predictor variables that could be used. And causality could run the other way; that is, housing prices could be driving our macroeconomic variables; and even more complex still, these variables could be influencing each other simultaneously.

    I encourage you to dig into the data and tweak this model by adding and removing variables while remembering the importance of OLS assumptions and the regression results.

    Most importantly, know that the modeling process, being based in science, is as follows: test, analyze, fail, and test some more.

    This post is a basic introduction to regression modeling, but a seasoned data scientist would spot several flaws in our approach and model, including:

    • No Lit Review: While it’s tempting to dive in to the modeling process, ignoring the existing body of knowledge is perilous. A lit review might have revealed that linear regression isn’t the proper model to predict housing prices. It also might have improved variable selection. And spending time on a lit review at the outset can save a lot of time in the long run.
    • Small sample size: Modeling something as complex as the housing market requires more than six years of data. Our small sample size is biased toward the events after the housing crisis and is not representative of long-term trends in the housing market.
    • Multicollinearity: A careful observer would’ve noticed the warnings produced by our model regarding multicollinearity. We have two or more variables telling roughly the same story, overstating the value of each of the predictors.
    • Autocorrelation: Autocorrelation occurs when past values of a predictor influence its current and future values. Careful reading of the Durbin-Watson score would’ve revealed that autocorrelation is present in our model.

      In a future post, we’ll attempt to resolve these flaws to better understand the economic predictors of housing prices.


    Translated from: https://www.pybloggers.com/2017/03/predicting-housing-prices-linear-regression-using-python-pandas-statsmodels/

  • Deciphering the Markets with Technical Analysis

    Deciphering the Markets with Technical Analysis

    In this chapter, we will go through some popular methods of technical analysis and show how to apply them while analyzing market data. We will perform basic algorithmic trading using market trends, support, and resistance.

         You may be wondering how we can come up with our own strategies, and whether there are any naive strategies that worked in the past that we can use by way of reference.

         As you read in the first chapter https://blog.csdn.net/Linli522362242/article/details/121337016, mankind has been trading assets for centuries. Numerous strategies have been created to increase the profit or sometimes just to keep the same profit. In this zero-sum game, the competition is considerable. It necessitates a constant innovation in terms of trading models and also in terms of technology. In this race to get the biggest part of the pie first, it is important to know the basic foundation of analysis in order to create trading strategies. When predicting the market, we mainly assume that the past repeats itself in future. In order to predict future prices and volumes, technical analysts study the historical market data. Based on behavioral economics and quantitative analysis, the market data is divided into two main areas.

         First, are chart patterns. This side of technical analysis is based on recognizing trading patterns and anticipating when they will reproduce in the future. This is usually more difficult to implement.

         Second, are technical indicators. This other side uses mathematical calculation to forecast the financial market direction. The list of technical indicators is sufficiently long to fill an entire book on this topic alone, but they are composed of a few different principal domains: trend, momentum, volume, volatility, and support and resistance. We will focus on the support and resistance strategy as an example to illustrate one of the most well-known technical analysis approaches.

         In this chapter, we will cover the following topics:

    • Designing a trading strategy based on trend-and momentum-based indicators
    • Creating trading signals based on fundamental technical analysis
    • Implementing advanced concepts, such as seasonality, in trading instruments

    Designing a trading strategy based on trend-and momentum-based indicators

         Trading strategies based on trend and momentum are pretty similar. If we can use a metaphor to illustrate the difference, the trend strategy uses speed, whereas the momentum strategy uses acceleration. With the trend strategy, we will study the price historical data. If this price keeps increasing for the last fixed amount of days, we will open a long position (long positions make money when market prices are higher than the price of the position, and lose money when market prices are lower than the price of the position), assuming that the price will keep rising.

         The trading strategy based on momentum is a technique where we send orders based on the strength of past behavior. The price momentum is the quantity of motion that a price has. The underlying rule is to bet that an asset price with a strong movement in a given direction will keep going in the same direction in the future. We will review a number of technical indicators expressing momentum in the market. Support and resistance are examples of indicators predicting future behavior.
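
         As a toy illustration of the trend-versus-momentum distinction above (this sketch is not from the book), a trend signal can check whether the price rose on each of the last n days, while a momentum signal can look at the sign of the n-day price change:

    import pandas as pd

    def trend_signal(prices: pd.Series, n: int = 5) -> pd.Series:
        # 1 (long) when the price increased on each of the last n days, else 0
        rising = prices.diff() > 0
        return (rising.rolling(n).sum() == n).astype(int)

    def momentum_signal(prices: pd.Series, n: int = 20) -> pd.Series:
        # 1 (long) when the n-day price change is positive, else 0
        return (prices - prices.shift(n) > 0).astype(int)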

    Support and resistance indicators

         In the first chapter, we explained the principle of the evolution of prices based on supply and demand. The price decreases when there is an increase in supply, and the price increases when demand rises.

    • When there is a fall in price, we expect the price fall to pause due to a concentration of demand (since traders may flatten their positions, converting an unrealized loss into a realized loss to limit further losses). This virtual limit will be referred to as a support line. Since the price becomes lower, it is more likely to find buyers.
    • Inversely, when the price starts rising, we expect a pause in this increase due to a concentration of supply (since traders may flatten their positions, converting an unrealized profit into a realized profit). This is referred to as the resistance line. It is based on the same principle, showing that a high price leads sellers to sell.

    This exploits the market psychology of investors following this trend of buying when the price is low and selling when the price is high.

         To illustrate an example of a technical indicator (in this part, support and resistance), we will use the Google data from the first chapter https://blog.csdn.net/Linli522362242/article/details/121337016. Since you will use the data for testing many times, you should store this data frame to your disk. Doing this will help you save time when you want to replay the data. To avoid complications with stock split, we will only take dates without splits. Therefore, we will keep only 620 days. Let's have a look at the following code:

    import pandas as pd
    from pandas_datareader import data
    
    start_date = '2014-01-01'
    end_date = '2018-01-01'
    SRC_DATA_FILENAME = 'goog_data.pkl'
    
    try:
        goog_data2 = pd.read_pickle( SRC_DATA_FILENAME )
    except:    
        # Call the function DataReader from the class data
        goog_data2 = data.DataReader( 'GOOG',  # ticker
                                      'yahoo', # source 
                                       start_date, end_date
                                    )
        goog_data2.to_pickle( SRC_DATA_FILENAME )

    In the following code, we plot the daily highs and lows of the GOOG price, together with the support and resistance levels estimated from the first 200 days:

    import matplotlib.pyplot as plt
    
    # the derivation of highs/lows is not shown in this excerpt; a reasonable
    # reconstruction keeps only the 620 days without splits, as described above
    goog_data = goog_data2.tail(620)
    lows = goog_data['Low']
    highs = goog_data['High']
    
    fig = plt.figure( figsize=(8,6) )
    ax1 = fig.add_subplot( 111 )
    
    ax1.plot( highs, color='c', lw=2. )
    ax1.plot( lows, color='y', lw=2. )
    plt.hlines( highs.head(200).max(), lows.index.values[0],
                                       lows.index.values[-1],
                linewidth=2, color='g'
              )
    plt.hlines( lows.head(200).min(), lows.index.values[0],
                                      lows.index.values[-1],
                linewidth=2, color='r'
              )
    # we use .axvline rather than .vlines because .vlines needs explicit ymin and ymax values
    plt.axvline( x=lows.index.values[200], # ymin=0, ymax=1
                linewidth=3, color='b', linestyle='--'
               )
    
    plt.setp( ax1.get_xticklabels(), rotation=45, horizontalalignment='right', fontsize=12 )
    # plt.xticks(fontsize=14)
    plt.yticks(fontsize=12)
    ax1.set_ylabel('Google price in $', fontsize=14, rotation=90)
    
    plt.show()

     
    In this plot, the following applies:

    • We draw the highs and lows of the GOOG price.
    • The green line represents the resistance level( highs.head(200).max() = 789.869995 ), and the red line represents the support level( lows.head(200).min() = 565.04998779).
    • To build these lines, we use the maximum value of the GOOG price and the minimum value of the GOOG price stored daily.
    • After the 200th day (dotted vertical blue line), we will buy when we reach the support line, and sell when we reach the resistance line. In this example, we used 200 days so that we have sufficient data points to get an estimate of the trend.
    • It is observed that the GOOG price will reach the resistance line around August 2016. This means that we have a signal to enter a short position (sell).
    • Once traded, we will wait to get out of this short position when the GOOG price will reach the support line.
    • With this historical data, it is easily noticeable that this condition will not happen. This will result in carrying a short position in a rising market without having any signal to sell it, thereby resulting in a huge loss.
    • This means that, even if the trading idea based on support/resistance has strong grounds in terms of economic behavior, in reality, we will need to modify this trading strategy to make it work.
    • Moving the support/resistance line to adapt to the market evolution will be key to the trading strategy efficiency.

         In the middle of the following chart, we show three fixed-size time windows. We took care of adding the tolerance margin that we will consider to be sufficiently close to the limits (support and resistance):

    import matplotlib.pyplot as plt
    
    fig = plt.figure( figsize=(10,6) )
    ax1 = fig.add_subplot( 111 )
    
    ax1.plot( highs, color='c', lw=2. )
    ax1.plot( lows, color='y', lw=2. )
    plt.hlines( highs.head(200).max(), lows.index.values[0],
                                       lows.index.values[-1],
                linewidth=2, color='g'
              )
    plt.hlines( lows.head(200).min(), lows.index.values[0],
                                      lows.index.values[-1],
                linewidth=2, color='r'
              )
    
    # adding the tolerance margin to be close to the limits (support and resistance)
    plt.fill_betweenx( [ highs.head(200).max()*0.96, highs.head(200).max() ],
                       lows.index.values[200], lows.index.values[400],
                       facecolor='green', alpha=0.5
                     )
    plt.fill_betweenx( [ lows.head(200).min(), lows.head(200).min() * 1.05 ],
                       lows.index.values[200], lows.index.values[400],
                       facecolor='r', alpha=0.5
                     )
    
    # why not use .vlines since it need to provide the values of ymin and ymax
    plt.axvline( x=lows.index.values[200], # ymin=0, ymax=1
                linewidth=3, color='b', linestyle='--'
               )
    plt.axvline( x=lows.index.values[400], # ymin=0, ymax=1
                linewidth=3, color='b', linestyle=':'
               )
    
    plt.setp( ax1.get_xticklabels(), rotation=45, horizontalalignment='right', fontsize=12 )
    # plt.xticks(fontsize=14)
    plt.yticks(fontsize=12)
    ax1.set_ylabel('Google price in $', fontsize=14, rotation=90)
    
    plt.show()

         If we take a new 200-day window after the first one, the support/resistance levels will be recalculated. We observe that the trading strategy will not get rid of the GOOG position (while the market keeps rising) since the price does not go back to the support level.

         Since the algorithm cannot get rid of a position, we will need to add more parameters to change the behavior in order to enter a position. The following parameters can be added to the algorithm to change its position:

    • There can be a shorter rolling window.
    • We can count the number of times the price reaches a support or resistance line.
    • A tolerance margin can be added to consider that a support or resistance value can attain around a certain percentage of this value.

         This phase is critical when creating your trading strategy. You will start by observing how your trading idea will perform using historical data, and then you will increase the number of parameters of this strategy to adjust to more realistic test cases. 

    In our example, we can introduce two further parameters:

    • The minimum number of times that a price needs to reach the support/resistance level.
    • We will define the tolerance margin of what we consider being close to the support/resistance level.

    Let's now have a look at the code: 

    import pandas as pd
    from pandas_datareader import data
    
    start_date = '2014-01-01'
    end_date = '2018-01-01'
    SRC_DATA_FILENAME = 'goog_data.pkl'
    
    try:
        goog_data = pd.read_pickle( SRC_DATA_FILENAME )
        print( 'File found...reading GOOG data')
    except:
        print( 'File not found...downloading GOOG data')
        # Call the function DataReader from the class data
        goog_data = data.DataReader( 'GOOG',  # ticker
                                      'yahoo', # source 
                                       start_date, end_date
                                    )
        goog_data.to_pickle( SRC_DATA_FILENAME )
        
    goog_data_signal = pd.DataFrame( index=goog_data.index )
    goog_data_signal['price'] = goog_data['Adj Close']


    ###################

    import yfinance as yf
    import pandas as pd
    
    start_date = '2014-01-01'
    end_date = '2018-01-01'
    SRC_DATA_FILENAME = 'goog_data2.pkl'
    
    try:
        goog_data2 = pd.read_pickle( SRC_DATA_FILENAME )
        print( 'File found...reading GOOG data')
    except:
        print( 'File not found...downloading GOOG data')
        goog_data2 = yf.download( 'goog', start=start_date, end=end_date) 
        goog_data2.to_pickle( SRC_DATA_FILENAME )
        
    goog_data2.head()


    ###################

    goog_data_signal.head()

     

    Now, let's have a look at the other part of the code where we will implement the trading strategy:

    import numpy as np
                                          # a shorter rolling window.
    def trading_support_resistance( data, bin_width=20 ):
        # tolerance margin of what we consider being close to the support/resistance level
        data['sup_tolerance'] = np.zeros( len(data) )
        data['res_tolerance'] = np.zeros( len(data) )
        
        # count the number of times the price reaches a support or resistance line.
        data['sup_count'] = np.zeros( len(data) )
        data['res_count'] = np.zeros( len(data) )
        
        data['sup'] = np.zeros( len(data) )
        data['res'] = np.zeros( len(data) )
        
        data['positions'] = np.zeros( len(data) )
        data['signal'] = np.zeros( len(data) )
        
        in_support=0
        in_resistance=0
        
        # assume len(data) >= 2*window_size, then jump over first window_size,
        # and window_size=bin_width
        for idx in range( bin_width-1+bin_width, len(data)):
            data_section = data[idx-bin_width:idx+1] # start_idx(hidden:jump):idx-bin_width=bin_width-1
            
            # The level of support and resistance is calculated by 
            # taking the maximum and minimum price and 
            # then subtracting and adding a 20% margin.
            support_level = min( data_section['price'] )
            resistance_level = max( data_section['price'] )
            data['sup'][idx] = support_level
            data['res'][idx] = resistance_level
            
            range_level = resistance_level-support_level
            data['sup_tolerance'][idx] = support_level + 0.2*range_level
            data['res_tolerance'][idx] = resistance_level - 0.2*range_level
            
            if data['res_tolerance'][idx] <= data['price'][idx] <= data['res'][idx]:
                in_resistance+=1
                data['res_count'][idx] = in_resistance
            elif data['sup'][idx] <= data['price'][idx] <= data['sup_tolerance'][idx]:
                in_support+=1
                data['sup_count'][idx] = in_support
            else:
                in_support = 0
                in_resistance=0
                
            if in_resistance>2: # The price is continuously hovering within the resistance margin
                data['signal'][idx] = 1 # The price may reach or break through the resistance level
            elif in_support>2:  # The price is continuously hovering within the support margin
                data['signal'][idx] = 0 # The price may reach or break through the support level
            else:
                data['signal'][idx] = data['signal'][idx-1]
        data['positions'] = data['signal'].diff()# (long) positions>0 ==> buy, positions=0 ==> wait
                                                 # (short) positions<0 ==> sell
    trading_support_resistance( goog_data_signal )
    goog_data_signal.info()

    goog_data_signal.reset_index(inplace=True) ###########
    
    import matplotlib.pyplot as plt
     
    fig = plt.figure(figsize=(8,6))
    ax1 = fig.add_subplot( 111, ylabel='Google price in $' )
     
    ax1.plot( goog_data_signal['Date'][40:],
              goog_data_signal['sup'][40:], 
              color='g', lw=2., label='sup' )
    ax1.plot( goog_data_signal['Date'][40:],
              goog_data_signal['res'][40:], 
              color='b', lw=2., label='res')
    ax1.plot( goog_data_signal['Date'],
              goog_data_signal['price'],
              color='r', lw=2., label='price'
            )
    
    
    # draw an up arrow when we buy one Google share: 
    ax1.plot( goog_data_signal[ goog_data_signal.positions == 1 ]['Date'],
              goog_data_signal[ goog_data_signal.positions == 1 ]['price'],
              '^', markersize=7, color='k', label='buy',
            )
    ax1.plot( goog_data_signal.loc[goog_data_signal.positions==-1.0]['Date'],
              goog_data_signal[goog_data_signal.positions == -1.0]['price'],
              'v', markersize=7, color='y', label='sell',
            )
    
    ax1.set_xlabel('Date')
    plt.setp( ax1.get_xticklabels(), rotation=45, horizontalalignment='right' )
    plt.legend()
    plt.show()

         The code will return the following output. The plot shows a 20-day rolling window calculating resistance and support (note that we skip the first window (window_size=20) and use the data from the second window onward).

     From this plot, it is observed that a buy order is sent when a price stays in the resistance tolerance margin for 2 consecutive days, and that a sell order is sent when a price stays in the support tolerance margin for 2 consecutive days.

    ############################
    Why did we jump over the first window (window_size=20) and use the data from the second window?

    import numpy as np
    import pandas as pd
    from pandas_datareader import data
    
    start_date = '2014-01-01'
    end_date = '2018-01-01'
    SRC_DATA_FILENAME = 'goog_data.pkl'
    
    try:
        goog_data = pd.read_pickle( SRC_DATA_FILENAME )
        print( 'File found...reading GOOG data')
    except:
        print( 'File not found...downloading GOOG data')
        # Call the function DataReader from the class data
        goog_data = data.DataReader( 'GOOG',  # ticker
                                      'yahoo', # source 
                                       start_date, end_date
                                    )
        goog_data.to_pickle( SRC_DATA_FILENAME )
        
    goog_data_signal = pd.DataFrame( index=goog_data.index )
    goog_data_signal['price'] = goog_data['Adj Close']
    
                                          # a shorter rolling window.
    def trading_support_resistance( data, bin_width=20 ):
        # tolerance margin of what we consider being close to the support/resistance level
        data['sup_tolerance'] = np.zeros( len(data) )
        data['res_tolerance'] = np.zeros( len(data) )
        
        # count the number of times the price reaches a support or resistance line.
        data['sup_count'] = np.zeros( len(data) )
        data['res_count'] = np.zeros( len(data) )
        
        data['sup'] = np.zeros( len(data) )
        data['res'] = np.zeros( len(data) )
        
        data['positions'] = np.zeros( len(data) )
        data['signal'] = np.zeros( len(data) )
        
        in_support=0
        in_resistance=0
        
        # assume len(data) >= 2*window_size, then jump over first window_size,
        # and window_size=bin_width
        for idx in range( bin_width-1, len(data)):###
            data_section = data[idx-bin_width+1:idx] # start_idx(hidden:jump):idx-bin_width=bin_width-1
            
            # The level of support and resistance is calculated by 
            # taking the maximum and minimum price and 
            # then subtracting and adding a 20% margin.
            support_level = min( data_section['price'] )
            resistance_level = max( data_section['price'] )
            data['sup'][idx] = support_level
            data['res'][idx] = resistance_level
            
            range_level = resistance_level-support_level
            data['sup_tolerance'][idx] = support_level + 0.2*range_level
            data['res_tolerance'][idx] = resistance_level - 0.2*range_level
            
            if data['res_tolerance'][idx] <= data['price'][idx] <= data['res'][idx]:
                in_resistance+=1
                data['res_count'][idx] = in_resistance
            elif data['sup'][idx] <= data['price'][idx] <= data['sup_tolerance'][idx]:
                in_support+=1
                data['sup_count'][idx] = in_support
            else:
                in_support = 0
                in_resistance=0
                
            if in_resistance>2: # The price is continuously hovering within the resistance margin
                data['signal'][idx] = 1 # The price may reach or break through the resistance level
            elif in_support>2:  # The price is continuously hovering within the support margin
                data['signal'][idx] = 0 # The price may reach or break through the support level
            else:
                data['signal'][idx] = data['signal'][idx-1]
        data['positions'] = data['signal'].diff()# (long) positions>0 ==> buy, positions=0 ==> wait
                                                 # (short) positions<0 ==> sell
    trading_support_resistance( goog_data_signal )
    
    goog_data_signal.reset_index(inplace=True)
    
    import matplotlib.pyplot as plt
     
    fig = plt.figure(figsize=(8,6))
    ax1 = fig.add_subplot( 111, ylabel='Google price in $' )
     
    ax1.plot( goog_data_signal['Date'][20:],###
              goog_data_signal['sup'][20:], ###
              color='g', lw=2., label='sup' )###
    ax1.plot( goog_data_signal['Date'][20:],###
              goog_data_signal['res'][20:], ###
              color='b', lw=2., label='res')
    ax1.plot( goog_data_signal['Date'],
              goog_data_signal['price'],
              color='r', lw=2., label='price'
            )
    
    
    # draw an up arrow when we buy one Google share: 
    ax1.plot( goog_data_signal[ goog_data_signal.positions == 1 ]['Date'],
              goog_data_signal[ goog_data_signal.positions == 1 ]['price'],
              '^', markersize=7, color='k', label='buy',
            )
    ax1.plot( goog_data_signal.loc[goog_data_signal.positions==-1.0]['Date'],
              goog_data_signal[goog_data_signal.positions == -1.0]['price'],
              'v', markersize=7, color='y', label='sell',
            )
    
    ax1.set_xlabel('Date')
    plt.setp( ax1.get_xticklabels(), rotation=45, horizontalalignment='right' )
    plt.legend()
    plt.show()

     

     Compared with the first plot, we found that the adjusted close price line overlaps with the support level line, and it is very dangerous to fail to respond in time (not selling the GOOG shares would make us lose more money).

    Backtesting

    initial_capital = float( 1000.0 )
    
    positions = pd.DataFrame( index=goog_data_signal.index ).fillna(0.0)
    portfolio = pd.DataFrame( index=goog_data_signal.index ).fillna(0.0)
    
    
    # Next, we will store the GOOG positions in the following data frame:
    positions['GOOG'] = goog_data_signal['signal'] # 1: price hovered in the resistance margin (go long), 0: price hovered in the support margin (stay flat/sell)
     
    # Then, we will store the amount of the GOOG positions for the portfolio in this one:
    portfolio['positions'] = ( positions.multiply( goog_data_signal['price'], 
                                                   axis=0
                                                 )
                             )
     
    # Next, we will calculate the non-invested money (cash or remaining cash):
                                          # positions.diff() == goog_data_signal['positions']
                                          # +1 : buy, -1: sell, 0:you not have any position on the market
    portfolio['cash'] = initial_capital - ( positions.diff().multiply( goog_data_signal['price'],
                                                                       axis=0
                                                                     )
                                          ).cumsum() # if current row in the result of cumsum() <0 : +profit + cash
                                                     # if current row in the result of cumsum() >0 : -loss + cash  
     
    # The total investment will be calculated by summing the positions and the cash:
    portfolio['total'] = portfolio['positions'] + portfolio['cash']
     
    fig = plt.figure( figsize=(8,6) )
    ax = fig.add_subplot( 111 )
     
    ax.plot( goog_data_signal['Date'], portfolio)
     
    plt.setp( ax.get_xticklabels(), rotation=45, horizontalalignment='right' )
    ax.set_xlabel('Date')
             # ['positions', 'cash', 'total']
    ax.legend(portfolio.columns, loc='upper left')
     
    plt.show()

    In the plot: total = current cash + current value of the stock position (holdings)

         When we create a trading strategy, we have an initial amount of money (cash). We will invest this money (holdings). This holding value is based on the market value of the investment. If we own a stock and the price of this stock increases, the value of the holding will increase. When we decide to sell, we move the value of the holding corresponding to this sale to the cash amount. The sum total of the assets is the sum of the cash and the holdings. The preceding chart shows that the strategy is profitable since the amount of cash increases toward the end. The graph allows you to check whether your trading idea can generate money.
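     To put a number on this, here is a minimal sketch (my own addition, reusing the portfolio DataFrame and initial_capital defined above) that prints the final asset value and the simple return of the strategy:

    # final value of cash + holdings at the end of the backtest
    final_total = portfolio['total'].iloc[-1]
    
    # simple return relative to the initial capital of $1,000
    strategy_return = (final_total - initial_capital) / initial_capital
    
    print( 'Final portfolio value : {:.2f}'.format(final_total) )
    print( 'Strategy return       : {:.2%}'.format(strategy_return) )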

    ############################

         In this section, we learned the difference between trend and momentum trading strategies (the trend strategy uses speed (each day's price move), whereas the momentum strategy uses acceleration (a rolling window)), and we implemented a widely used momentum trading strategy based on support and resistance levels. We will now explore new ideas for creating trading strategies by using more technical analysis.

    Creating trading signals based on fundamental technical analysis

         This section will show you how to use technical analysis to build trading signals. We will start with one of the most common methods, the simple moving average, and we will discuss more advanced techniques along the way. Here is a list of the signals we will cover:

    • Simple Moving Average (SMA)
    • Exponential Moving Average (EMA)
    • Absolute Price Oscillator (APO)
    • Moving Average Convergence Divergence (MACD)
    • Bollinger Bands (BBANDS)
    • Relative Strength Indicator (RSI)
    • Standard Deviation (STDEV)
    • Momentum (MOM)

    Simple moving average

         Simple moving average, which we will refer to as SMA, is a basic technical analysis indicator. The simple moving average, as you may have guessed from its name, is computed by adding up the price of an instrument over a certain period of time divided by the number of time periods. It is basically the price average over a certain time period, with equal weight being used for each price. The time period over which it is averaged is often referred to as the lookback period or history. Let's have a look at the following formula of the simple moving average:

     SMA = (P1 + P2 + ... + PN) / N

     Here, the following applies:

    • Pi : Price at time period i
    • N : Number of prices added together, or the number of time periods

         Let's implement a simple moving average that computes an average over a 20-day moving window. We will then compare the SMA values against daily prices, and it should be easy to observe the smoothing that SMA achieves. 

    import pandas as pd
    from pandas_datareader import data
    
    start_date = '2014-01-01'
    end_date = '2018-01-01'
    SRC_DATA_FILENAME = 'goog_data.pkl'
    
    try:
        goog_data2 = pd.read_pickle( SRC_DATA_FILENAME )
    except:    
        # Call the function DataReader from the class data
        goog_data2 = data.DataReader( 'GOOG',  # ticker
                                      'yahoo', # source 
                                       start_date, end_date
                                    )
        goog_data2.to_pickle( SRC_DATA_FILENAME )
    
    goog_data = goog_data2.tail(620)   
    goog_data.head()

     

    Implementation of the simple moving average

         In this section, the code demonstrates how you would implement a simple moving average, using a list (history) to maintain a moving window of prices and a list (sma_values) to maintain a list of SMA values; the result is equivalent to goog_data['Close'].rolling(window=20, min_periods=1).mean()

    close = goog_data['Close']
    
    import statistics as stats
    
    time_period = 20 # number of days over which to average
    history = [] # to track a history of prices
    sma_values = [] # to track simple moving average values
    
    for close_price in close:
        history.append( close_price )
        if len(history) > time_period: # we remove oldest price because we only
            del( history[0] )          # average over last ' time_period' prices
        sma_values.append( stats.mean(history) )
    
    goog_data = goog_data.assign( ClosePrice = pd.Series( close,
                                                          index = goog_data.index
                                                        )
                                )
    goog_data = goog_data.assign( Simple20DayMovingAverage = pd.Series( sma_values,
                                                                        index = goog_data.index
                                                                      )
                                )
    goog_data.head()

    goog_data.tail()

    close_price = goog_data['ClosePrice']
    sma = goog_data['Simple20DayMovingAverage']
    
    import matplotlib.pyplot as plt
    import datetime
    import matplotlib.ticker as ticker
    
    fig = plt.figure( figsize= (10,6) )
    ax1 = fig.add_subplot(111, xlabel='Date', ylabel='Google close price in $')
    ax1.plot( goog_data.index.values, close_price, color='g', lw=2., label='close_price' )
    ax1.plot( goog_data.index.values, sma, color='r', lw=2., label='sma' )
    
    ax1.xaxis.set_major_locator(ticker.MaxNLocator(12)) # 24%12=0: we need 10 xticklabels and 12 is close to 10
    # or plt.autoscale(enable=True, axis='x', tight=True)
    ax1.autoscale(enable=True, axis='x', tight=True) # move all curves to left(touch y-axis)
    ax1.margins(0,0.05) # move all curves to up
    
    from matplotlib.dates import DateFormatter
    ax1.xaxis.set_major_formatter( DateFormatter('%Y-%m') ) # 2015-08-30 ==> 2015-08
    plt.setp( ax1.get_xticklabels(), rotation=30, horizontalalignment='right' )
    plt.legend()
    
    plt.show()

         In this plot, it is easy to observe that the 20-day SMA has the intended smoothing effect and evens out the micro-volatility in the actual stock price, yielding a more stable price curve.

    use rolling() to calculate SMA

    goog_data['SMA_20'] = goog_data['Close'].rolling(20).mean()
    goog_data[:25]

    close_price = goog_data['ClosePrice']
    sma = goog_data['SMA_20'] ###
    
    import matplotlib.pyplot as plt
    import datetime
    import matplotlib.ticker as ticker
    
    fig = plt.figure( figsize= (10,6) )
    ax1 = fig.add_subplot(111, xlabel='Date', ylabel='Google close price in $')
    ax1.plot( goog_data.index.values, close_price, color='g', lw=2., label='close_price' )
    ax1.plot( goog_data.index.values, sma, color='r', lw=2., label='sma' )
    
    ax1.xaxis.set_major_locator(ticker.MaxNLocator(12)) # 24%12=0: we need 10 xticklabels and 12 is close to 10
    # or plt.autoscale(enable=True, axis='x', tight=True)
    ax1.autoscale(enable=True, axis='x', tight=True) # move all curves to left(touch y-axis)
    ax1.margins(0,0.05) # move all curves to up
    
    from matplotlib.dates import DateFormatter
    ax1.xaxis.set_major_formatter( DateFormatter('%Y-%m') ) # 2015-08-30 ==> 2015-08
    plt.setp( ax1.get_xticklabels(), rotation=30, horizontalalignment='right' )
    plt.legend()
    
    plt.show()

     

         Note the difference from the previous SMA curve: here the SMA value is NaN for the first 19 days, because rolling(20) needs a full window of 20 observations before it produces a value.

    min_periods  int, default None

         Minimum number of observations in window required to have a value (otherwise result is NA). For a window that is specified by an offset, min_periods will default to 1. Otherwise, min_periods will default to the size of the window.

    goog_data['SMA_20'] = goog_data['Close'].rolling(window=20, min_periods=1).mean()
    goog_data[:25]

     

    SMA from yahoo finance


         Yahoo Finance uses interval = 1W to make the GOOG close price smoother, so a 20-period SMA there covers a much longer span; set the SMA period to about 5 instead (5 weeks ≈ 20-25 trading days), and the moving average you see will look much closer to the one we drew.

    Exponential moving average

         The exponential moving average, which we will refer to as the EMA, is the single most well-known and widely used technical analysis indicator for time series data.

         The EMA is similar to the simple moving average, but, instead of weighing all prices in the history equally, it places more weight on the most recent price observation and less weight on the older price observations. This is endeavoring to capture the intuitive idea that the new price observation has more up-to-date information than prices in the past. It is also possible to place more weight on older price observations and less weight on the newer price observations. This would try to capture the idea that longer-term trends have more information than short-term volatile price movements.

         The weighting depends on the selected time period of the EMA;

    • The shorter the time period, the more reactive the EMA is to new price observations; in other words, the EMA converges to new price observations faster and forgets older observations faster. This is also referred to as a Fast EMA.
    • The longer the time period, the less reactive the EMA is to new price observations; that is, the EMA converges to new price observations more slowly and forgets older observations more slowly. This is also referred to as a Slow EMA.

         Based on the description of the EMA, it is formulated as a weight factor μ applied to new price observations and a weight factor applied to the current value of the EMA to get the new value of the EMA. Since the sum of the weights should be 1 to keep the EMA units the same as price units, that is, $s, the weight factor applied to the old EMA value turns out to be (1 - μ). Hence, we get the following two formulations of new EMA values based on old EMA values and new price observations, which are the same definition written in two different forms:

     EMA_new = μ * P + (1 - μ) * EMA_old

     OR

     EMA_new = (P - EMA_old) * μ + EMA_old

     Here, the following applies:
    P : Current price of the instrument
    EMA_old : EMA value prior to the current price observation
    μ : Smoothing constant, most commonly set to 2 / (n + 1)
    n : Number of time periods (similar to what we used in the simple moving average)

    Implementation of the exponential moving average 

         Let's implement an exponential moving average with 20 days as the number of time periods to compute the average over. We will use a default smoothing factor of 2 / (n + 1) for this implementation. Similar to SMA, EMA also achieves an evening out across normal daily prices. EMA has the advantage of allowing us to weigh recent prices with higher weights than an SMA does, which does uniform weighting. 

    In the following code, we will see the implementation of the exponential moving average:

    close = goog_data['Close']
    num_periods = 20 # number of days over which to average
    K = 2/(num_periods+1) # smoothing constant
    ema_p = 0
    ema_values = [] #  to hold computed EMA values
    
    for close_price in close:
        if ema_p == 0: # first observation, EMA = current-price
            ema_p = close_price
        else:
            ema_p = ( close_price - ema_p )*K + ema_p
        
        ema_values.append( ema_p )
                # append operation: goog_data['ClosePrice']
    goog_data = goog_data.assign( ClosePrice=pd.Series( close,
                                                        index=goog_data.index
                                                      )
                                )
    goog_data = goog_data.assign( Exponential20DayMovingAverage = pd.Series( ema_values, 
                                                                             index=goog_data.index 
                                                                           )
                                )
    close_price = goog_data['ClosePrice']
    ema = goog_data['Exponential20DayMovingAverage']
    
    import matplotlib.pyplot as plt
    
    fig = plt.figure( figsize=(10,6) )
    ax1 = fig.add_subplot( 111 )#, xlabel='Date', ylabel='Google price in $'
    ax1.plot( goog_data.index.values, close_price, color='g', lw=2., label='ClosePrice' )
    ax1.plot( goog_data.index.values, ema, color='b', lw=2., label='Exponential20DayMovingAverage' )
    ax1.set_xlabel('Date',fontsize=12)
    ax1.set_ylabel('Google price in $',fontsize=12)
    
    ax1.xaxis.set_major_locator(ticker.MaxNLocator(12)) # 24%12=0: we need 10 xticklabels and 12 is close to 10
    # or plt.autoscale(enable=True, axis='x', tight=True)
    ax1.autoscale(enable=True, axis='x', tight=True) # move all curves to left(touch y-axis)
    ax1.margins(0,0.05) # move all curves to up
    
    from matplotlib.dates import DateFormatter
    ax1.xaxis.set_major_formatter( DateFormatter('%Y-%m') ) # 2015-08-30 ==> 2015-08
    plt.setp( ax1.get_xticklabels(), rotation=30, horizontalalignment='right' )
    plt.legend()
    
    plt.show()

     

    ewm or ewma(Exponential Weighted Moving Average)

    https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.ewm.html

    adjust   bool, default True
    Divide by a decaying adjustment factor in beginning periods to account for the imbalance in relative weightings (viewing the EWMA as a moving average).

    • When adjust=True (default), the EW function is calculated using the weights w_i = (1 - alpha)^i.
      For example, the EW moving average of the series [x0, x1, ..., xt] (or a price list) of the instrument would be:

      y_t = ( x_t + (1-alpha)*x_{t-1} + (1-alpha)^2 * x_{t-2} + ... + (1-alpha)^t * x_0 )
            / ( 1 + (1-alpha) + (1-alpha)^2 + ... + (1-alpha)^t )

      The numerator is the weighted sum of the prices from the current one back to the first one, with weight factor (1-alpha)^i; the denominator is the sum of all those weight factors, a geometric series:
      1 + (1-alpha) + ... + (1-alpha)^t = ( 1 - (1-alpha)^(t+1) ) / alpha   (common ratio 1-alpha)

    • When adjust=False, the exponentially weighted function is calculated recursively:
      y_0 = x_0
      y_t = (1-alpha) * y_{t-1} + alpha * x_t
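     To make the two definitions concrete, here is a small numeric check on a made-up three-price series (my own toy example, not from the book); it compares the formulas above with pandas' ewm() output:

    import pandas as pd
    
    s = pd.Series( [10.0, 11.0, 12.0] )   # hypothetical toy prices
    alpha = 2/(20+1)                      # same smoothing constant as span=20
    
    # adjust=True: weighted sum of all observations divided by the sum of the weights
    w = [ (1-alpha)**2, (1-alpha)**1, (1-alpha)**0 ]   # weights for x0, x1, x2
    manual_true = sum( wi*xi for wi, xi in zip(w, s) ) / sum(w)
    
    # adjust=False: recursive update y_t = (1-alpha)*y_{t-1} + alpha*x_t, with y_0 = x_0
    y = s.iloc[0]
    for x in s.iloc[1:]:
        y = (1-alpha)*y + alpha*x
    manual_false = y
    
    print( manual_true,  s.ewm(span=20, adjust=True ).mean().iloc[-1] )  # should match
    print( manual_false, s.ewm(span=20, adjust=False).mean().iloc[-1] )  # should match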

    close = goog_data['Close']
    num_periods = 20 # number of days over which to average
    
    
    goog_data['close_20_ema'] = goog_data['Close'].ewm( ignore_na=False,
                                                        span=num_periods, # K = 2/(num_periods+1) # smoothing constant
                                                        min_periods=0,
                                                        adjust=False   ###
                                                      ).mean()
    
    goog_data.head(21)

    if adjust=True: https://blog.csdn.net/Linli522362242/article/details/121172551

    close = goog_data['Close']
    num_periods = 20 # number of days over which to average
    
    
    goog_data['close_20_ema'] = goog_data['Close'].ewm( ignore_na=False,
                                                        span=num_periods, # K = 2/(num_periods+1) # smoothing constant
                                                        min_periods=0,
                                                        adjust=True   ###
                                                      ).mean()
    
    
    close_price = goog_data['ClosePrice']
    ema = goog_data['Exponential20DayMovingAverage']
    ema_20 = goog_data['close_20_ema']
    
    import matplotlib.pyplot as plt
    
    fig = plt.figure( figsize=(10,6) )
    ax1 = fig.add_subplot( 111 )#, xlabel='Date', ylabel='Google price in $'
    ax1.plot( goog_data.index.values, close_price, color='g', lw=2., label='ClosePrice' )
    ax1.plot( goog_data.index.values, ema, color='b', lw=2., label='Exponential20DayMovingAverage' )
    ax1.plot( goog_data.index.values, ema_20, color='k', lw=2., label='close_20_ewma' )
    
    ax1.set_xlabel('Date',fontsize=12)
    ax1.set_ylabel('Google price in $',fontsize=12)
    
    ax1.xaxis.set_major_locator(ticker.MaxNLocator(12)) # 24%12=0: we need 10 xticklabels and 12 is close to 10
    # or plt.autoscale(enable=True, axis='x', tight=True)
    ax1.autoscale(enable=True, axis='x', tight=True) # move all curves to left(touch y-axis)
    ax1.margins(0,0.05) # move all curves to up
    
    from matplotlib.dates import DateFormatter
    ax1.xaxis.set_major_formatter( DateFormatter('%Y-%m') ) # 2015-08-30 ==> 2015-08
    plt.setp( ax1.get_xticklabels(), rotation=30, horizontalalignment='right' )
    plt.legend()
    

    adjust=True and adjust=False only differ in the initial periods and then converge to the same values; the initial EWMA with adjust=True follows the price trend more closely.

     goog_data.tail()

    %timeit goog_data['Close'].ewm( ignore_na=False,span=num_periods,  min_periods=0,adjust=True ).mean()

     Faster!

    %timeit goog_data['Close'].ewm( ignore_na=False,span=num_periods,  min_periods=0,adjust=False ).mean()

    import matplotlib.pyplot as plt
    
    fig = plt.figure( figsize=(12,8) )
    ax1 = fig.add_subplot( 111 )#, xlabel='Date', ylabel='Google price in $'
    ax1.plot( goog_data.index.values, close_price, color='g', lw=2., label='ClosePrice' )
    ax1.plot( goog_data.index.values, ema, color='b', lw=2., label='Exponential20DayMovingAverage' )
    ax1.plot( goog_data.index.values, ema_20, color='k', lw=2., label='close_20_ewma' )
    ax1.plot( goog_data.index.values, sma, color='y', lw=2., label='sma' )
    
    ax1.set_xlabel('Date',fontsize=12)
    ax1.set_ylabel('Google price in $',fontsize=12)
    
    ax1.xaxis.set_major_locator(ticker.MaxNLocator(12)) # 24%12=0: we need 10 xticklabels and 12 is close to 10
    # or plt.autoscale(enable=True, axis='x', tight=True)
    ax1.autoscale(enable=True, axis='x', tight=True) # move all curves to left(touch y-axis)
    ax1.margins(0,0.05) # move all curves to up
    
    from matplotlib.dates import DateFormatter
    ax1.xaxis.set_major_formatter( DateFormatter('%Y-%m') ) # 2015-08-30 ==> 2015-08
    plt.setp( ax1.get_xticklabels(), rotation=30, horizontalalignment='right' )
    plt.legend()
    
    plt.show()

     From the plot, it is observed that EMA has a very similar smoothing effect to SMA (here the EWMA tracks the price better than the SMA), as expected, and it reduces the noise in the raw prices. However, the extra parameter μ, available in EMA in addition to the parameter n, allows us to control the relative weight placed on the new price observation, as compared to older price observations. This allows us to build different variants of EMA by varying the parameter μ to make fast and slow EMAs, even for the same parameter n. We will explore fast and slow EMAs more in the rest of this chapter and in later chapters.

    Absolute price oscillator

         The absolute price oscillator, which we will refer to as APO, is a class of indicators that builds on top of moving averages of prices to capture specific short-term deviations in prices.

         The absolute price oscillator is computed by finding the difference between a fast exponential moving average and a slow exponential moving average. Intuitively, it is trying to measure how far the more reactive EMA (EMA_fast) is deviating from the more stable EMA (EMA_slow). A large difference is usually interpreted as one of two things: instrument prices are starting to trend or break out, or instrument prices are far away from their equilibrium prices, in other words, overbought or oversold:

     APO = EMA_fast - EMA_slow

    Implementation of the absolute price oscillator

         Let's now implement the absolute price oscillator, with the faster EMA using a period of 10 days and a slower EMA using a period of 40 days, and default smoothing factors being 2/11 and 2/41, respectively, for the two EMAs: 

    import yfinance as yf
    import pandas as pd
    
    start_date = '2014-01-01'
    end_date = '2018-01-01'
    SRC_DATA_FILENAME = 'goog_data2.pkl'
    
    try:
        goog_data2 = pd.read_pickle( SRC_DATA_FILENAME )
        print( 'File found...reading GOOG data')
    except:
        print( 'File not found...downloading GOOG data')
        goog_data2 = yf.download( 'goog', start=start_date, end=end_date) 
        goog_data2.to_pickle( SRC_DATA_FILENAME )
        
    goog_data=goog_data2.tail(620)

    close = goog_data['Close']
    
    num_periods_fast = 10 # time period for the fast EMA
    K_fast = 2/(num_periods_fast+1) # smoothing factor for fast EMA
    ema_fast = 0 # initial ema
    
    num_periods_slow = 40 # time period for slow EMA
    K_slow = 2/(num_periods_slow+1) # smoothing factor for slow EMA
    ema_slow = 0 # initial ema
    
    ema_fast_values = [] # we will hold fast EMA values for visualization purposes
    ema_slow_values = [] # we will hold slow EMA values for visualization purposes
    apo_values = [] # track computed absolute price oscillator values
    
    for close_price in close:
        if ema_fast == 0: # first observation
            ema_fast = close_price
            ema_slow = close_price
        else:
            ema_fast = (close_price - ema_fast) * K_fast + ema_fast
            ema_slow = (close_price - ema_slow) * K_slow + ema_slow
            
        ema_fast_values.append( ema_fast )
        ema_slow_values.append( ema_slow )
        apo_values.append( ema_fast - ema_slow )

         The preceding code generates APO values that have higher positive and negative values when the prices are moving away from long-term EMA(here, num_periods_slow=40) very quickly (breaking out), which can have a trend-starting interpretation or an overbought/sold interpretation. Now, let's visualize the fast and slow EMAs and visualize the APO values generated:

    goog_data = goog_data.assign( ClosePrice=pd.Series(close,
                                                       index=goog_data.index
                                                      )
                                )
    goog_data = goog_data.assign( FastExponential10DayMovingAverage = pd.Series( ema_fast_values, 
                                                                                 index=goog_data.index
                                                                               )
                                )
    goog_data = goog_data.assign( SlowExponential40DayMovingAverage = pd.Series( ema_slow_values, 
                                                                                 index=goog_data.index
                                                                               )
                                )
    goog_data = goog_data.assign( AbsolutePriceOscillator = pd.Series( apo_values,
                                                                       index=goog_data.index
                                                                     )
                                )
    
    close_price = goog_data['ClosePrice']
    ema_f = goog_data['FastExponential10DayMovingAverage']
    ema_s = goog_data['SlowExponential40DayMovingAverage']
    apo = goog_data['AbsolutePriceOscillator']
    
    import matplotlib.pyplot as plt
    
    fig = plt.figure( figsize=(15,8) )
    
    ax1 = fig.add_subplot(211)
    ax1.plot( goog_data.index.values, close_price, color='g', lw=2., label='ClosePrice' )
    ax1.plot( goog_data.index.values, ema_f, color='b', lw=2., label='FastExponential_10_DayMovingAverage' )
    ax1.plot( goog_data.index.values, ema_s, color='k', lw=2., label='SlowExponential_40_DayMovingAverage' )
    # ax1.set_xlabel('Date',fontsize=12)
    ax1.set_ylabel('Google price in $',fontsize=12)
    ax1.legend()
    
    ax2 = fig.add_subplot( 212 )
    ax2.plot( goog_data.index.values, apo, color='k', lw=2., label='AbsolutePriceOscillator')
    ax2.set_ylabel('APO', fontsize=12)
    ax2.set_xlabel('Date', fontsize=12)
    ax2.legend()
    
    ax1.xaxis.set_major_locator(ticker.MaxNLocator(12)) # 24%12=0: we need 10 xticklabels and 12 is close to 10
    # or plt.autoscale(enable=True, axis='x', tight=True)
    ax1.autoscale(enable=True, axis='x', tight=True) # move all curves to left(touch y-axis)
    ax1.margins(0,0.05) # move all curves to up
    
    ax2.xaxis.set_major_locator(ticker.MaxNLocator(12)) # 24%12=0: we need 10 xticklabels and 12 is close to 10
    # or plt.autoscale(enable=True, axis='x', tight=True)
    ax2.autoscale(enable=True, axis='x', tight=True) # move all curves to left(touch y-axis)
    ax2.margins(0,0.05) # move all curves to up
    
    from matplotlib.dates import DateFormatter
    ax1.xaxis.set_major_formatter( DateFormatter('%Y-%m') ) # 2015-08-30 ==> 2015-08
    plt.setp( ax1.get_xticklabels(), rotation=30, horizontalalignment='right' )
    
    ax2.xaxis.set_major_formatter( DateFormatter('%Y-%m') ) # 2015-08-30 ==> 2015-08
    plt.setp( ax2.get_xticklabels(), rotation=30, horizontalalignment='right' )
    
    plt.show()

         One observation here is the difference in behavior between fast and slow EMAs. The faster one is more reactive to new price observations, and the slower one is less reactive to new price observations and decays slower.

    • The APO values are positive when prices are breaking out to the upside, and the magnitude of the APO values captures the magnitude of the breakout.
    • The APO values are negative when prices are breaking out to the downside, and the magnitude of the APO values captures the magnitude of the breakout.
    • In a later chapter in this book, we will use this signal in a realistic trading strategy.
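     As a cross-check on the loop above, a minimal pandas-based sketch of the same APO (my own addition; it assumes ewm(span=..., adjust=False) reproduces the recursive EMA update used in the loop, which it does apart from floating-point noise):

    # fast/slow EMAs via pandas; adjust=False matches the recursive update above
    ema_fast_pd = goog_data['Close'].ewm( span=10, adjust=False ).mean()
    ema_slow_pd = goog_data['Close'].ewm( span=40, adjust=False ).mean()
    apo_pd = ema_fast_pd - ema_slow_pd
    
    print( apo_pd.tail() )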

    Moving average convergence divergence

         The moving average convergence divergence is another in the class of indicators that builds on top of moving averages of prices. We'll refer to it as MACD. This goes a step further than the APO. Let's look at it in greater detail.

         The moving average convergence divergence was created by Gerald Appel. It is similar in spirit to an absolute price oscillator in that it establishes the difference between a fast exponential moving average and a slow exponential moving average. However, in the case of MACD, we apply a smoothing exponential moving average to the MACD value itself in order to get the final signal output from the MACD indicator. Optionally, you may also look at the difference between MACD values and the EMA of the MACD values (signal) and visualize it as a histogram. A properly configured MACD signal can successfully capture the direction, magnitude, and duration of a trending instrument price:

        MACD_EMA_SHORT = 12
        MACD_EMA_LONG = 26
        MACD_EMA_SIGNAL = 9
    
        @classmethod
        def _get_macd(cls, df):
            """ Moving Average Convergence Divergence
            This function will initialize all following columns.
            MACD Line (macd): (12-day EMA - 26-day EMA)
            Signal Line (macds): 9-day EMA of MACD Line
            MACD Histogram (macdh): MACD Line - Signal Line
            :param df: data
            :return: None
            """
            ema_short = 'close_{}_ema'.format(cls.MACD_EMA_SHORT)
            ema_long = 'close_{}_ema'.format(cls.MACD_EMA_LONG)
            ema_signal = 'macd_{}_ema'.format(cls.MACD_EMA_SIGNAL)
            fast = df[ema_short]
            slow = df[ema_long]
            df['macd'] = fast - slow
            df['macds'] = df[ema_signal]
            df['macdh'] = (df['macd'] - df['macds'])
            cls._drop_columns(df, [ema_short, ema_long, ema_signal])
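     For reference, the snippet above appears to come from the stockstats package; a minimal usage sketch follows (an assumption on my part: stockstats' retype() lower-cases the column names, and a column is computed the first time it is accessed):

    # pip install stockstats  (assumed dependency, not used elsewhere in these notes)
    from stockstats import StockDataFrame
    
    stock_df = StockDataFrame.retype( goog_data2.copy() )    # column names become lowercase
    print( stock_df[['macd', 'macds', 'macdh']].tail() )     # accessing the columns triggers _get_macd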

    Implementation of the moving average convergence divergence 

        Let's implement a moving average convergence divergence signal with a fast EMA period of 10 days, a slow EMA period of 40 days, and with default smoothing factors of 2/11 and 2/41, respectively: 

    import yfinance as yf
    import pandas as pd
    
    start_date = '2014-01-01'
    end_date = '2018-01-01'
    SRC_DATA_FILENAME = 'goog_data2.pkl'
    
    try:
        goog_data2 = pd.read_pickle( SRC_DATA_FILENAME )
        print( 'File found...reading GOOG data')
    except:
        print( 'File not found...downloading GOOG data')
        goog_data2 = yf.download( 'goog', start=start_date, end=end_date) 
        goog_data2.to_pickle( SRC_DATA_FILENAME )
        
    goog_data=goog_data2.tail(620)
    close = goog_data['Close']
    
    num_periods_fast = 10 # time period for the fast EMA
    K_fast = 2/(num_periods_fast+1) # smoothing factor for fast EMA
    ema_fast = 0 # initial ema
    
    num_periods_slow = 40 # time period for slow EMA
    K_slow = 2/(num_periods_slow+1) # smoothing factor for slow EMA
    ema_slow = 0 # initial ema
    
    num_periods_macd = 20 # MACD ema time period
    K_macd = 2/(num_periods_macd+1) # MACD EMA smoothing factor
    ema_macd= 0
    
    ema_fast_values = [] # we will hold fast EMA values for visualization purposes
    ema_slow_values = [] # we will hold slow EMA values for visualization purposes
    macd_values = [] # track MACD values for visualization purposes # MACD = EMA_fast - EMA_slow
    
    macd_signal_values = [] # MACD EMA values tracker # MACD_signal = EMA_MACD
    
    macd_histogram_values = [] # MACD = MACD - MACD_signal
    
    for close_price in close:
        if ema_fast == 0: # first observation
            ema_fast = close_price
            ema_slow = close_price
        else:
            ema_fast = (close_price - ema_fast) * K_fast + ema_fast
            ema_slow = (close_price - ema_slow) * K_slow + ema_slow
            
        ema_fast_values.append( ema_fast )
        ema_slow_values.append( ema_slow )
        
        macd = ema_fast - ema_slow # MACD is fast_MA - slow_EMA # apo_values
        if ema_macd == 0 :
            ema_macd = macd
        else:
            ema_macd = (macd-ema_macd) * K_macd + ema_macd # signal is EMA of MACD values
            
        macd_values.append( macd )
        macd_signal_values.append( ema_macd )
        macd_histogram_values.append( macd-ema_macd )

    In the preceding code, the following applies:

    • The MACD signal EMA used a time period of 20 days and a default smoothing factor of 2/21.
    • We also computed a MACD histogram, which is the difference between the MACD value and its signal EMA (macd - ema_macd).

         Let's look at the code to plot and visualize the different signals and see what we can understand from it:

    goog_data = goog_data.assign( ClosePrice=pd.Series(close,
                                                       index=goog_data.index
                                                      )
                                )
    goog_data = goog_data.assign( FastExponential10DayMovingAverage = pd.Series( ema_fast_values, 
                                                                                 index=goog_data.index
                                                                               )
                                )
    goog_data = goog_data.assign( SlowExponential40DayMovingAverage = pd.Series( ema_slow_values, 
                                                                                 index=goog_data.index
                                                                               )
                                )
    goog_data = goog_data.assign( MovingAverageConvergenceDivergence = pd.Series( macd_values,
                                                                                  index=goog_data.index
                                                                                )
                                )
    goog_data = goog_data.assign( Exponential20DayMovingAverageOfMACD = pd.Series( macd_signal_values,
                                                                                   index=goog_data.index
                                                                                 )
                                )
    goog_data = goog_data.assign( MACDHistorgram = pd.Series( macd_histogram_values,
                                                              index=goog_data.index
                                                            )
                                )
    
    close_price = goog_data['ClosePrice']
    ema_f = goog_data['FastExponential10DayMovingAverage']
    ema_s = goog_data['SlowExponential40DayMovingAverage']
    macd = goog_data['MovingAverageConvergenceDivergence']
    ema_macd = goog_data['Exponential20DayMovingAverageOfMACD']
    macd_histogram = goog_data['MACDHistorgram']
    
    import matplotlib.pyplot as plt
    
    fig = plt.figure( figsize=(15,8) )
    
    ax1 = fig.add_subplot(311)
    ax1.plot( goog_data.index.values, close_price, color='g', lw=2., label='ClosePrice' )
    ax1.plot( goog_data.index.values, ema_f, color='b', lw=2., 
              label='FastExponential_{}_DayMovingAverage'.format(num_periods_fast) )
    ax1.plot( goog_data.index.values, ema_s, color='k', lw=2.,
              label='SlowExponential_{}_DayMovingAverage'.format(num_periods_slow) )
    # ax1.set_xlabel('Date',fontsize=12)
    ax1.set_ylabel('Google price in $',fontsize=12)
    ax1.legend()
    
    ax2 = fig.add_subplot( 312 )
    ax2.plot( goog_data.index.values, macd, color='k', lw=2., label='MovingAverageConvergenceDivergence' )
    ax2.plot( goog_data.index.values, ema_macd, color='g', lw=2.,
              label='Exponential_{}_DayMovingAverageOfMACD'.format(num_periods_macd))
    #ax2.axhline( y=0, lw=2, color='0.7' )
    ax2.set_ylabel('MACD', fontsize=12)
    ax2.legend()
    
    ax3 = fig.add_subplot( 313 )
    ax3.bar( goog_data.index.values, macd_histogram, color='r', label='MACDHistorgram', width=0.9 )
    ax3.set_ylabel('MACD', fontsize=12)
    ax3.legend()
    
    ax1.xaxis.set_major_locator(ticker.MaxNLocator(12)) # 24%12=0: we need 10 xticklabels and 12 is close to 10
    # or plt.autoscale(enable=True, axis='x', tight=True)
    ax1.autoscale(enable=True, axis='x', tight=True) # move all curves to left(touch y-axis)
    ax1.margins(0,0.05) # move all curves to up
    
    ax2.xaxis.set_major_locator(ticker.MaxNLocator(12)) # 24%12=0: we need 10 xticklabels and 12 is close to 10
    # or plt.autoscale(enable=True, axis='x', tight=True)
    ax2.autoscale(enable=True, axis='x', tight=True) # move all curves to left(touch y-axis)
    ax2.margins(0,0.05) # move all curves to up
    
    ax3.autoscale(enable=True, axis='x', tight=True) # move all curves to left(touch y-axis)
    ax3.margins(0,0.05) # move all curves to up
    ax3.set_xticks([])#plt.xticks([]) ###
    ax3.set_ylim(bottom=-30, top=30)
    
    from matplotlib.dates import DateFormatter
    ax1.xaxis.set_major_formatter( DateFormatter('%Y-%m') ) # 2015-08-30 ==> 2015-08
    plt.setp( ax1.get_xticklabels(), rotation=30, horizontalalignment='right' )
    
    ax2.xaxis.set_major_formatter( DateFormatter('%Y-%m') ) # 2015-08-30 ==> 2015-08
    plt.setp( ax2.get_xticklabels(), rotation=30, horizontalalignment='right' )
    
    plt.subplots_adjust( hspace=0.3 )
    plt.show()

     The preceding code will return the following output. Let's have a look at the plot:

         The MACD signal is very similar to the APO, as we expected, but now, in addition, the MACD signal line (the EMA of the raw MACD values) is an additional smoothing factor on top of raw MACD values, capturing lasting trending periods by smoothing out the noise of the raw values. Finally, the MACD histogram, which is the difference between the two series, captures

    • (a) the time period when the trend is starting or reversing, and
    • (b) the magnitude of lasting trends when the MACD histogram values stay positive or negative after reversing signs.

         In practice, MACD first computes a fast moving average (usually 12-day) and a slow moving average (usually 26-day). These two values are used to measure the divergence (DIF) between the fast and slow lines: DIF = 12-day EMA - 26-day EMA. In a sustained rally, the 12-day EMA sits above the 26-day EMA and the positive divergence (+DIF) keeps growing; conversely, in a downtrend the divergence can turn negative (-DIF) and its absolute value keeps growing. When the market starts to turn, the positive or negative DIF must shrink to a certain degree before it is truly a reversal signal. The MACD reversal signal is defined as the 9-day moving average of DIF (the 9-day EMA of DIF, MACD_ema). In the EMA formulas used by MACD, each new (T+1) trading day's value is given its weight share; taking the currently popular parameters 12 and 26 as an example:

    close = goog_data['Close']
    
    num_periods_fast = 12 # time period for the fast EMA
    K_fast = 2/(num_periods_fast+1) # smoothing factor for fast EMA
    ema_fast = 0 # initial ema
    
    num_periods_slow = 26 # time period for slow EMA
    K_slow = 2/(num_periods_slow+1) # smoothing factor for slow EMA
    ema_slow = 0 # initial ema
    
    num_periods_macd = 9 # MACD ema time period
    K_macd = 2/(num_periods_macd+1) # MACD EMA smoothing factor
    ema_macd= 0
    
    ema_fast_values = [] # we will hold fast EMA values for visualization purposes
    ema_slow_values = [] # we will hold slow EMA values for visualization purposes
    macd_values = [] # track MACD values for visualization purposes # MACD = EMA_fast - EMA_slow
    
    macd_signal_values = [] # MACD EMA values tracker # MACD_signal = EMA_MACD
    
    macd_histogram_values = [] # MACD = MACD - MACD_signal
    
    for close_price in close:
        if ema_fast == 0: # first observation
            ema_fast = close_price
            ema_slow = close_price
        else:
            ema_fast = (close_price - ema_fast) * K_fast + ema_fast
            ema_slow = (close_price - ema_slow) * K_slow + ema_slow
            
        ema_fast_values.append( ema_fast )
        ema_slow_values.append( ema_slow )
        
        macd = ema_fast - ema_slow # MACD is fast_MA - slow_EMA # apo_values
        if ema_macd == 0 :
            ema_macd = macd
        else:
            ema_macd = (macd-ema_macd) * K_macd + ema_macd # signal is EMA of MACD values
            
        macd_values.append( macd )
        macd_signal_values.append( ema_macd )
        macd_histogram_values.append( macd-ema_macd )
    goog_data = goog_data.assign( ClosePrice=pd.Series(close,
                                                       index=goog_data.index
                                                      )
                                )
    goog_data = goog_data.assign( FastExponential10DayMovingAverage = pd.Series( ema_fast_values, 
                                                                                 index=goog_data.index
                                                                               )
                                )
    goog_data = goog_data.assign( SlowExponential40DayMovingAverage = pd.Series( ema_slow_values, 
                                                                                 index=goog_data.index
                                                                               )
                                )
    goog_data = goog_data.assign( MovingAverageConvergenceDivergence = pd.Series( macd_values,
                                                                                  index=goog_data.index
                                                                                )
                                )
    goog_data = goog_data.assign( Exponential20DayMovingAverageOfMACD = pd.Series( macd_signal_values,
                                                                                   index=goog_data.index
                                                                                 )
                                )
    goog_data = goog_data.assign( MACDHistorgram = pd.Series( macd_histogram_values,
                                                              index=goog_data.index
                                                            )
                                )
    
    close_price = goog_data['ClosePrice']
    ema_f = goog_data['FastExponential10DayMovingAverage']
    ema_s = goog_data['SlowExponential40DayMovingAverage']
    macd = goog_data['MovingAverageConvergenceDivergence']
    ema_macd = goog_data['Exponential20DayMovingAverageOfMACD']
    macd_histogram = goog_data['MACDHistorgram']
    
    import matplotlib.pyplot as plt
    
    fig = plt.figure( figsize=(15,8) )
    
    ax1 = fig.add_subplot(311)
    ax1.plot( goog_data.index.values, close_price, color='g', lw=2., label='ClosePrice' )
    ax1.plot( goog_data.index.values, ema_f, color='b', lw=2., 
              label='FastExponential_{}_DayMovingAverage'.format(num_periods_fast) )
    ax1.plot( goog_data.index.values, ema_s, color='k', lw=2.,
              label='SlowExponential_{}_DayMovingAverage'.format(num_periods_slow) )
    ax1.set_xlabel('Date',fontsize=12)
    ax1.set_ylabel('Google price in $',fontsize=12)
    ax1.legend()
    
    ax2 = fig.add_subplot( 312 )
    ax2.plot( goog_data.index.values, macd, color='k', lw=2., label='MovingAverageConvergenceDivergence' )
    ax2.plot( goog_data.index.values, ema_macd, color='g', lw=2.,
              label='Exponential_{}_DayMovingAverageOfMACD'.format(num_periods_macd))
    #ax2.axhline( y=0, lw=2, color='0.7' )
    ax2.set_ylabel('MACD', fontsize=12)
    ax2.legend()
    
    ax3 = fig.add_subplot( 313 )
    ax3.bar( goog_data.index.values, macd_histogram, color='r', label='MACDHistorgram', width=0.9 )
    ax3.set_ylabel('MACD', fontsize=12)
    ax3.legend()
    
    ax1.xaxis.set_major_locator(ticker.MaxNLocator(12)) # 24%12=0: we need 10 xticklabels and 12 is close to 10
    # or plt.autoscale(enable=True, axis='x', tight=True)
    ax1.autoscale(enable=True, axis='x', tight=True) # move all curves to left(touch y-axis)
    ax1.margins(0,0.05) # move all curves to up
    
    ax2.xaxis.set_major_locator(ticker.MaxNLocator(12)) # 24%12=0: we need 10 xticklabels and 12 is close to 10
    # or plt.autoscale(enable=True, axis='x', tight=True)
    ax2.autoscale(enable=True, axis='x', tight=True) # move all curves to left(touch y-axis)
    ax2.margins(0,0.05) # move all curves to up
    
    ax3.autoscale(enable=True, axis='x', tight=True) # move all curves to left(touch y-axis)
    ax3.margins(0,0.05) # move all curves to up
    ax3.set_xticks([])#plt.xticks([]) ###
    ax3.set_ylim(bottom=-30, top=30)
    
    from matplotlib.dates import DateFormatter
    ax1.xaxis.set_major_formatter( DateFormatter('%Y-%m') ) # 2015-08-30 ==> 2015-08
    plt.setp( ax1.get_xticklabels(), rotation=30, horizontalalignment='right' )
    
    ax2.xaxis.set_major_formatter( DateFormatter('%Y-%m') ) # 2015-08-30 ==> 2015-08
    plt.setp( ax2.get_xticklabels(), rotation=30, horizontalalignment='right' )
    
    plt.subplots_adjust( hspace=0.3 )
    plt.show()

          So the MACD indicator is made up of two lines and one histogram: the fast line (black) is DIF (MovingAverageConvergenceDivergence), the slow line (green) is DEA (MACD_ema), and the histogram is MACD. In various kinds of investing, the following methods are offered for investors' reference: https://baike.baidu.com/item/MACD%E6%8C%87%E6%A0%87/6271283?fromtitle=MACD&fromid=3334786&fr=aladdin

    1. When DIF and DEA are both above 0 (i.e., above the zero line on the chart) and moving upward, the market is generally in a bullish phase; you can open long positions or hold existing longs.

    2. When DIF and DEA are both below 0 (i.e., below the zero line on the chart) and moving downward, the market is generally in a bearish phase; you can open short positions or stay on the sidelines.

    3. When DIF and DEA are both above 0 but moving downward, the market is generally in a declining stage; you can open short positions or stay on the sidelines.

    4. When DIF and DEA are both below 0 but moving upward, the market is generally about to rise and the stock is likely to go up; you can open long positions or hold existing longs.

    The moving average convergence/divergence indicator, MACD for short, is a technical indicator that uses the convergence and separation between a short-term exponential average and a long-term exponential average to judge buy and sell timing.

    Developed from the moving-average principle, MACD both overcomes the moving average's weakness of frequent false signals and keeps most of the gains a moving average can capture.

    Its trading rules are:

    1. When DIF (MovingAverageConvergenceDivergence) and DEA (MACD_ema) are both positive and DIF crosses above DEA, it can be used as a buy-signal reference.

    2. When DIF and DEA are both negative and DIF crosses below DEA, it can be used as a sell-signal reference.

    3. When the DIF line diverges from the candlestick (price) line, the market may be giving a reversal signal.

    4. DIF or DEA turning from positive to negative, or from negative to positive, is not in itself a trading signal, because these changes lag the market.

    Basic usage

    1. MACD golden cross: DIFF crosses above DEA from below, a buy signal.

    2. MACD death cross: DIFF crosses below DEA from above, a sell signal (a small code sketch of these two crossover rules follows the Drawbacks list below).

    3. MACD turns from green to red: the MACD (bar) value turns from negative to positive, and the market turns from bearish to bullish.

    4. MACD turns from red to green: the MACD (bar) value turns from positive to negative, and the market turns from bullish to bearish.

    5. When DIFF and DEA are both positive, i.e., above the zero line, the overall market is bullish; DIFF crossing above DEA can be taken as a buy signal.

    6. When DIFF and DEA are both negative, i.e., below the zero line, the overall market is bearish; DIFF crossing below DEA can be taken as a sell signal.

    7. When the DEA line diverges from the candlestick (price) trend, it is a reversal signal.

    8. DEA gives a higher error rate in sideways, consolidating markets, but combining it with the RSI and KDJ indicators can partly make up for this weakness.

    Drawbacks

    1. Since MACD is a medium- to long-term indicator, the gap between the buy/sell points and the lowest/highest prices can be large. When the market only swings up and down within a small range, or is consolidating, entering on a signal and then having to exit right away may leave no profit between the trades, and you may even lose a little on the spread or the commissions.

    2. When prices rise or fall sharply within a day or two, MACD cannot react in time: because MACD moves quite smoothly and lags the price action, it does not produce a signal immediately when the market moves sharply and quickly, and at such moments MACD is of little use.
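     A minimal code sketch of the golden-cross / death-cross rules above (my own illustration, reusing the macd (DIF) and ema_macd (DEA) series computed earlier; the crossover bookkeeping is not from the book):

    import numpy as np
    import pandas as pd
    
    cross = pd.DataFrame( index=goog_data.index )
    cross['dif'] = macd        # fast line (DIF)
    cross['dea'] = ema_macd    # slow line (DEA), the 9-day EMA of DIF
    
    # 1 while DIF is above DEA, 0 otherwise
    cross['above'] = np.where( cross['dif'] > cross['dea'], 1.0, 0.0 )
    
    # +1: golden cross (DIF crosses above DEA), -1: death cross (DIF crosses below DEA)
    cross['crossover'] = cross['above'].diff()
    
    print( cross[ cross['crossover'] ==  1.0 ].head() )  # golden-cross dates
    print( cross[ cross['crossover'] == -1.0 ].head() )  # death-cross dates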

    Bollinger bands

    import yfinance as yf
    import pandas as pd
    
    start_date = '2014-01-01'
    end_date = '2018-01-01'
    SRC_DATA_FILENAME = 'goog_data2.pkl'
    
    try:
        goog_data2 = pd.read_pickle( SRC_DATA_FILENAME )
        print( 'File found...reading GOOG data')
    except:
        print( 'File not found...downloading GOOG data')
        goog_data2 = yf.download( 'goog', start=start_date, end=end_date) 
        goog_data2.to_pickle( SRC_DATA_FILENAME )
        
    goog_data=goog_data2.tail(620)

         Bollinger bands (BBANDS) also builds on top of moving averages, but incorporates recent price volatility that makes the indicator more adaptive to different market conditions. Let's now discuss this in greater detail.

          Bollinger bands is a well-known technical analysis indicator developed by John Bollinger. It

    • computes a moving average of the prices (you can use the simple moving average or the exponential moving average or any other variant). In addition, it
    • computes the standard deviation of the prices in the lookback period by treating the moving average as the mean price. It then
    • creates an upper band that is
      • a moving average,
      • plus
      • some multiple of standard price deviations,
    • and a lower band that is
      • a moving average
      • minus 
      • multiple standard price deviations.

         This band represents the expected volatility of the prices by treating the moving average of the price as the reference price.

         Now, when prices move outside of these bands, that can be interpreted as a breakout/trend signal or as an overbought/oversold mean-reversion signal.

         Let's look at the equations to compute the upper Bollinger band, BBAND_upper, and the lower Bollinger band, BBAND_lower. Both depend, in the first instance, on the middle Bollinger band, BBAND_middle, which is simply the simple moving average of the previous n time periods (in this case, the last 20 days), denoted by SMA. The upper and lower bands are then computed by adding/subtracting β * σ to/from BBAND_middle, where σ is the standard deviation of prices over the same period, which we've seen before, and β is a standard deviation factor of our choice. The larger the value of β chosen, the greater the Bollinger bandwidth for our signal, so it is just a parameter that controls the width in our trading signal:

     BBAND_middle = SMA( last n prices )
     BBAND_upper  = BBAND_middle + β * σ
     BBAND_lower  = BBAND_middle - β * σ

     Here, the following applies:
    β : Standard deviation factor of our choice
    To compute the standard deviation, first we compute the variance:
     σ² = ( Σ ( P_i - SMA )² ) / n    (sum over the last n prices)
    Then, the standard deviation is simply the square root of the variance:
     σ = √σ²

    Implementation of Bollinger bands

         We will implement and visualize Bollinger bands, with 20 days as the time period for the SMA (the middle band, BBAND_middle).

         In the following code, we use a stdev factor β of 2 to compute the upper band and the lower band from the middle band and the standard deviation we compute.

    import statistics as stats
    import math as math
    
    close = goog_data['Close']
    
    time_period = 20 # history length for Simple Moving Average for middle band
    stdev_factor = 2 # Standard Deviation Scaling factor for the upper and lower bands
    
    history = []    # price history for computing simple moving average
    sma_values = [] # moving average of prices for visualization purposes
    upper_band = [] # upper band values
    lower_band = [] # lower band values
    
    for close_price in close:
        # step1: sma
        history.append( close_price )
        if len(history) > time_period: # only maintain at most 'time_period' number of price observations
            del (history[0])
        
        sma = stats.mean( history )
        sma_values.append( sma ) # simple moving average or middle band
        
        # step2: stdev
        variance = 0 # variance is the square of standard deviation
        
        for hist_price in history:
            variance += ( (hist_price-sma)**2 )
        stdev = math.sqrt( variance/len(history) ) # square root to get standard deviation
        
        # step3: 
        upper_band.append( sma + stdev_factor*stdev )
        lower_band.append( sma - stdev_factor*stdev )
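     As with the SMA earlier, the same three bands can also be obtained with rolling() (my own equivalence check, not from the book); note std(ddof=0), so that pandas uses the population standard deviation computed by the loop above:

    # middle band and rolling standard deviation over the same 20-day window
    mid_band_pd   = goog_data['Close'].rolling( window=20, min_periods=1 ).mean()
    rolling_stdev = goog_data['Close'].rolling( window=20, min_periods=1 ).std( ddof=0 )
    
    upper_band_pd = mid_band_pd + 2*rolling_stdev
    lower_band_pd = mid_band_pd - 2*rolling_stdev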

         Now, let's add some code to visualize the Bollinger bands and make some observations:

    goog_data = goog_data.assign( ClosePrice = pd.Series( close,
                                                          index = goog_data.index
                                                        )
                                )
    goog_data = goog_data.assign( MiddleBollingerBand_20DaySMA = pd.Series( sma_values,
                                                                           index = goog_data.index
                                                                         )
                                )
    goog_data = goog_data.assign( UpperBollingerBand_20DaySMA_2StdevFactor = pd.Series( upper_band,
                                                                                      index = goog_data.index
                                                                                    )
                                )
    goog_data = goog_data.assign( LowerBollingerBand_20DaySMA_2StdevFactor = pd.Series( lower_band,
                                                                                      index = goog_data.index
                                                                                    )
                                )
    close_price = goog_data['ClosePrice']
    boll_m = goog_data['MiddleBollingerBand_20DaySMA']
    boll_ub = goog_data['UpperBollingerBand_20DaySMA_2StdevFactor']
    boll_lb = goog_data['LowerBollingerBand_20DaySMA_2StdevFactor']
    
    import matplotlib.pyplot as plt
    
    fig = plt.figure( figsize=(12,6) )
    
    ax1 = fig.add_subplot(111)
    ax1.plot( goog_data.index.values, close_price, color='k', lw=2., label='ClosePrice' )
    ax1.plot( goog_data.index.values, boll_m, color='b', lw=2., label='MiddleBollingerBand_20DaySMA')
    ax1.plot( goog_data.index.values, boll_ub, color='g', lw=2., label='UpperBollingerBand_20DaySMA_2StdevFactor')
    ax1.plot( goog_data.index.values, boll_lb, color='r', lw=2., label='LowerBollingerBand_20DaySMA_2StdevFactor')
    ax1.fill_between( goog_data.index.values, boll_ub, boll_lb, alpha=0.1 )
    ax1.set_xlabel('Date', fontsize=12)
    ax1.set_ylabel('Google price in $', fontsize=12)
    
    ax1.xaxis.set_major_locator(ticker.MaxNLocator(12)) # 24%12=0: we need 10 xticklabels and 12 is close to 10
    # or plt.autoscale(enable=True, axis='x', tight=True)
    ax1.autoscale(enable=True, axis='x', tight=True) # move all curves to left(touch y-axis)
    ax1.margins(0,0.05) # move all curves to up
    
    ax1.xaxis.set_major_formatter( DateFormatter('%Y-%m') ) # 2015-08-30 ==> 2015-08
    plt.setp( ax1.get_xticklabels(), rotation=30, horizontalalignment='right' )
    
    ax1.legend()
    plt.show()

    The price fluctuates within the band between the upper and lower limits. The width of this band changes with the magnitude of the price swings: when the price rises and falls over a wider range, the band widens; when the price moves within a narrow range and consolidates, the band narrows.

    Bollinger bands use the band to indicate safe high and low price levels.

    When volatility shrinks and the band narrows, a sharp price move may happen at any time.

    When a high or a low pierces the edge of the band and then immediately comes back inside the band, a pullback tends to follow.

    Once the band starts to move and the price crosses into the other half of the band in this way, it is quite helpful for finding target values.

    The rule of application is this: when a stock's price fluctuates very little over a period of time, the Bollinger band stays narrow for a long stretch. If, on some trading day, the closing price breaks above the band's resistance line on relatively heavy volume, and at that moment the band clearly changes from contracting to expanding, the investor should buy decisively (this is clearly visible on that day's candlestick chart). The reason is that the stock is turning from weak to strong: the short-term upward momentum will not last only one day, and a new short-term high is bound to appear, so one can step in decisively.

         For Bollinger bands, when prices stay within the upper and lower bounds, then not much can be said, but, when prices traverse the upper band, then one interpretation can be that prices are breaking out to the upside and will continue to do so. Another interpretation of the same event can be that the trading instrument is overbought and we should expect a bounce back down.

         The other case is when prices traverse the lower band, then one interpretation can be that prices are breaking out to the downside and will continue to do so. Another interpretation of the same event can be that the trading instrument is oversold and we should expect a bounce back up. In either case, Bollinger bands helps us to quantify and capture the exact time when this happens.

    BOLL indicator application tips

    1) When the price runs in the region between the middle band and the upper band, as long as it does not break below the middle band, the market is in a bullish phase; only consider buying the dips, and do not consider shorting (see the code sketch after this list).

    2) When the price runs between the middle band and the lower band, as long as it does not break above the middle band (the middle band is the 20-day simple moving average, SMA_20), the market is bearish; the strategy is to sell the rallies and not to buy.

    3) When the market price runs along the upper band, the market is in a one-sided rally; hold on to your long positions patiently as long as the price does not leave the upper-band region.

    4) When the price runs along the lower band, the market is currently in a one-sided decline, usually a fast one; hold your short positions patiently as long as the price does not leave the lower-band region.

    5) When the price runs around the middle band, the market is consolidating and range-bound; for trend traders this is the easiest kind of market in which to lose money, so it should be avoided, and staying flat and watching is best.

    6) The contraction state of the Bollinger channel: the price oscillates around the middle band while the upper and lower bands gradually narrow. This is a sign that a big move is coming; stay flat, watch, and wait for the right moment.

    7) A sudden expansion after the channel has contracted means an explosive move is arriving; after that, the market is very likely to trend one way, so you can actively adjust and build positions and go with the trend.

    8) After the Bollinger channel contracts, false breakouts often appear before a big move arrives; these are traps set by the large players, so stay alert, and they can be defused by adjusting position size.

    9) The Bollinger channel's time frame should mainly be the weekly chart; in a one-sided market, when open positions already carry large profits, you can exit according to the daily Bollinger channel rules to guard against a big pullback.
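     A minimal sketch of the middle-band filter described in tips 1) and 2) above (my own illustration, reusing the close_price and boll_m series computed earlier):

    import numpy as np
    import pandas as pd
    
    # long bias while the close holds above the middle band, short bias while below it
    bias = pd.Series( np.where( close_price > boll_m,
                                'long bias (buy dips)',
                                'short bias (sell rallies)' ),
                      index=goog_data.index )
    print( bias.tail() )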

    Relative strength indicator

         The relative strength indicator, which we will refer to as RSI, is quite different from the previous indicators we saw that were based on moving averages of prices. This is based on price changes over periods to capture the strength/magnitude of price moves

         The relative strength indicator was developed by J Welles Wilder. It comprises a lookback period, which it uses to compute the magnitude of the average of gains/price increases over that period, as well as the magnitude of the averages of losses/price decreases over that period. Then, it computes the RSI value that normalizes the signal value to stay between 0 and 100, and attempts to capture if there have been many more gains relative to the losses, or if there have been many more losses relative to the gains. RSI values over 50% indicate an uptrend, while RSI values below 50% indicate a downtrend.

    For the last n periods, if the price increased from the previous period, the following applies:

        Gain_t = Price_t - Price_{t-1},   Loss_t = 0

    Otherwise, the following applies:

        Gain_t = 0,   Loss_t = Price_{t-1} - Price_t

    The averages of these gains and losses over the last n periods are then combined, and the following applies:

        RS = AvgGain_n / AvgLoss_n,   RSI = 100 - 100 / (1 + RS)

    Implementation of the relative strength indicator

    import yfinance as yf
    import pandas as pd
    
    start_date = '2014-01-01'
    end_date = '2018-01-01'
    SRC_DATA_FILENAME = 'goog_data2.pkl'
    
    try:
        goog_data2 = pd.read_pickle( SRC_DATA_FILENAME )
        print( 'File found...reading GOOG data')
    except:
        print( 'File not found...downloading GOOG data')
        goog_data2 = yf.download( 'goog', start=start_date, end=end_date) 
        goog_data2.to_pickle( SRC_DATA_FILENAME )
        
    goog_data=goog_data2.tail(620)

    Now, let's implement and plot a relative strength indicator on our dataset: 

    avg_gain and avg_loss use the simple average (SMA) over the lookback period

    import statistics as stats
    
    close = goog_data['Close'] # closing prices iterated over below
    time_period = 20 # look back period to compute gains & losses
    
    gain_history = [] # history of gains over look back period (0 if no gain, magnitude of gain if gain)
    loss_history = [] # history of losses over look back period (0 if no loss, magnitude of loss if loss)
    
    avg_gain_values = [] # track avg gains for visualization purposes
    avg_loss_values = [] # track avg losses for visualization purposes
    
    rsi_values = [] # track computed RSI values
    
    last_price = 0 # current_price - last_price > 0 => gain. 
                   # current_price - last_price < 0 => loss.
    for close_price in close:
        if last_price ==0:
            last_price = close_price
        
        gain_history.append( max(0, close_price-last_price) )
        loss_history.append( max(0, last_price-close_price) )
        last_price = close_price
        
        if len(gain_history) > time_period: # maximum observations is equal to lookback period
            del ( gain_history[0] )
            del ( loss_history[0] )
        
        avg_gain = stats.mean( gain_history ) # average gain over lookback period
        avg_loss = stats.mean( loss_history ) # average loss over lookback period
        avg_gain_values.append( avg_gain )
        avg_loss_values.append( avg_loss )
        
        rs = 0
        if avg_loss > 0: # to avoid division by 0, which is undefined
            rs = avg_gain/avg_loss
        rsi = 100 - ( 100/(1+rs) )
        rsi_values.append( rsi )

    In the preceding code, the following applies:

    • We have used 20 days as our time period over which we computed the average gains and losses and then normalized it to be between 0 and 100 based on our formula for RSI values.
    • For our dataset where prices have been steadily rising, it is obvious that the RSI values are consistently over 50% or more.

    Now, let's look at the code to visualize the final signal as well as the components involved:

    goog_data = goog_data.assign( ClosePrice = pd.Series( close,
                                                          index = goog_data.index
                                                        )
                                )
    goog_data = goog_data.assign( RelativeStrengthAvg_GainOver_20Days = pd.Series( avg_gain_values,
                                                                                   index = goog_data.index
                                                                                 )
                                )
    goog_data = goog_data.assign( RelativeStrengthAvg_LossOver_20Days = pd.Series( avg_loss_values,
                                                                                   index = goog_data.index
                                                                                 )
                                )
    goog_data = goog_data.assign( RelativeStrength_IndicatorOver_20Days = pd.Series( rsi_values,
                                                                                     index = goog_data.index
                                                                                   )
                                )
    close_price = goog_data['ClosePrice']
    rs_gain = goog_data['RelativeStrengthAvg_GainOver_20Days']
    rs_loss = goog_data['RelativeStrengthAvg_LossOver_20Days']
    rsi = goog_data['RelativeStrength_IndicatorOver_20Days']
    
    import matplotlib.pyplot as plt
    
    fig = plt.figure( figsize=(15,10) )
    
    ax1 = fig.add_subplot( 311 )
    ax1.plot( goog_data.index.values, close_price, color='k', lw=2., label='ClosePrice' )
    ax1.set_ylabel( 'Google price in $', fontsize=12 )
    ax1.legend()
    
    ax2 = fig.add_subplot( 312 )
    ax2.plot( goog_data.index.values, rs_gain, color='g', lw=2., label='RelativeStrengthAvg_GainOver_20Days' )
    ax2.plot( goog_data.index.values, rs_loss, color='r', lw=2., label='RelativeStrengthAvg_LossOver_20Days' )
    ax2.set_ylabel( 'RS', fontsize=12 )
    ax2.legend()
    
    ax3 = fig.add_subplot( 313 )
    ax3.plot( goog_data.index.values, rsi, color='b', lw=2., label='RelativeStrength_IndicatorOver_20Days' )
    ax3.axhline( y=50, lw=2, color='0.7' )
    ax3.set_ylabel( 'RSI', fontsize=12 )
    ax3.legend()
    
    from matplotlib.dates import DateFormatter
    
    for ax in(ax1, ax2, ax3):
        ax.xaxis.set_major_locator(ticker.MaxNLocator(12)) # 24%12=0: we need 10 xticklabels and 12 is close to 10
        # or plt.autoscale(enable=True, axis='x', tight=True)
        ax.autoscale(enable=True, axis='x', tight=True) # move all curves to left(touch y-axis)
        ax.margins(0,0.05) # move all curves to up
    
        ax.xaxis.set_major_formatter( DateFormatter('%Y-%m') ) # 2015-08-30 ==> 2015-08
        plt.setp( ax.get_xticklabels(), rotation=30, horizontalalignment='right' )
    
    plt.subplots_adjust( hspace=0.3 ) # space between axes   
    plt.show()

     The preceding code will return the following output. Let's have a look at the plot:

    goog_data[goog_data['RelativeStrength_IndicatorOver_20Days']>50].count(axis=0)['RelativeStrength_IndicatorOver_20Days'] /\
    goog_data[goog_data['RelativeStrength_IndicatorOver_20Days']<=50].count(axis=0)['RelativeStrength_IndicatorOver_20Days']

     

         The first observation we can make from our analysis of the RSI signal applied to our GOOGLE dataset is that the AverageGain over our time frame of 20 days more often than not exceeds the AverageLoss over the same time frame, which intuitively makes sense because Google has been a very successful stock, increasing in value more or less consistently. Based on that, the RSI indicator also stays above 50% for the majority of the lifetime of the stock (the ratio of days above 50% to days at or below 50% computed above is about 1.70), again reflecting the continued gains in the Google stock over the course of its lifetime.

        def _get_smma(cls, df, column, windows):
            """ get smoothed moving average.
            :param df: data
            :param windows: range
            :return: result series
            """
            window = cls.get_only_one_positive_int(windows)
            column_name = '{}_{}_smma'.format(column, window)
            smma = df[column].ewm(
                ignore_na=False, alpha=1.0 / window,
                min_periods=0, adjust=True).mean()
            df[column_name] = smma
            return smma
       
        def _get_rsi(cls, df, n_days):
            """ Calculate the RSI (Relative Strength Index) within N days
            calculated based on the formula at:
            https://en.wikipedia.org/wiki/Relative_strength_index
            :param df: data
            :param n_days: N days
            :return: None
            """
            n_days = int(n_days)
            d = df['close_-1_d']
     
            df['closepm'] = (d + d.abs()) / 2
            df['closenm'] = (-d + d.abs()) / 2
            closepm_smma_column = 'closepm_{}_smma'.format(n_days)
            closenm_smma_column = 'closenm_{}_smma'.format(n_days)
            p_ema = df[closepm_smma_column]
            n_ema = df[closenm_smma_column]
     
            rs_column_name = 'rs_{}'.format(n_days)
            rsi_column_name = 'rsi_{}'.format(n_days)
            df[rs_column_name] = rs = p_ema / n_ema
            df[rsi_column_name] = 100 - 100 / (1.0 + rs)
     
            columns_to_remove = ['closepm',
                                 'closenm',
                                 closepm_smma_column,
                                 closenm_smma_column]
            cls._drop_columns(df, columns_to_remove)
    n_days_7=7
    n_days_14=14
    n_days_20 = 20
    # # close_-1_d — this is the price difference between time t and t-1
    goog_data['close_-1_s'] = goog_data['Close'].shift(1)
    d = goog_data['close_-1_d'] = goog_data['Close']-goog_data['close_-1_s']
     
    goog_data['closepm'] = ( d+d.abs() )/2  # if d>0: (d+d)/2= d, if d<0, (d+(-d))/2= 0 
    goog_data['closenm'] = ( -d+d.abs() )/2 # if d>0: (-d+d)/= 0, if d<0, ((-d)+(-d))/2= -d (>0)
    
    for n_days in (n_days_20,):
        p_ema = goog_data['closepm'].ewm( com = n_days - 1,
                                          min_periods=0, # default 0
                                          adjust=True,
                                        ).mean()
        n_ema = goog_data['closenm'].ewm( com = n_days - 1,
                                          min_periods=0,
                                          adjust=True,
                                        ).mean()
        rs_column_name = 'rs_{}'.format(n_days)
        rsi_column_name = 'rsi_{}'.format(n_days)
        goog_data['p_ema'] = p_ema
        goog_data['n_ema'] = n_ema
        goog_data[rs_column_name] = rs = p_ema / n_ema
        goog_data[rsi_column_name] = 100 - 100 / (1.0 + rs)   
     
    goog_data=goog_data.drop(['closepm','closenm','close_-1_s', 'close_-1_d'], axis=1)
    goog_data[['RelativeStrengthAvg_GainOver_20Days',
               'p_ema',
               'RelativeStrengthAvg_LossOver_20Days',
               'n_ema',
               'RelativeStrength_IndicatorOver_20Days',
               'rsi_20'
              ]
             ].head(25)
    import matplotlib.pyplot as plt
    
    fig = plt.figure( figsize=(15,10) )
    
    ax1 = fig.add_subplot( 311 )
    ax1.plot( goog_data.index.values, close_price, color='k', lw=2., label='ClosePrice' )
    ax1.set_ylabel( 'Google price in $', fontsize=12 )
    ax1.legend()
    
    ax2 = fig.add_subplot( 312 )
    ax2.plot( goog_data.index.values, goog_data['p_ema'], color='g', lw=2., label='p_ema_20day' )
    ax2.plot( goog_data.index.values, goog_data['n_ema'], color='r', lw=2., label='n_ema_20day' )
    ax2.set_ylabel( 'RS', fontsize=12 )
    ax2.legend()
    
    ax3 = fig.add_subplot( 313 )
    ax3.plot( goog_data.index.values, goog_data['rsi_20'], color='b', lw=2., label='rsi_20' )
    ax3.plot( goog_data.index.values, rsi, color='r', lw=2., label='RelativeStrength_IndicatorOver_20Days' )
    ax3.axhline( y=30, lw=2, color='0.7') # Line for oversold threshold
    ax3.axhline( y=50, lw=2, linestyle='--', color='0.8' ) # Neutral RSI
    ax3.axhline( y=70, lw=2, color='0.7') # Line for overbought threshold
    
    ax3.set_ylabel( 'RSI', fontsize=12 )
    ax3.legend()
    
    from matplotlib.dates import DateFormatter
    
    for ax in(ax1, ax2, ax3):
        ax.xaxis.set_major_locator(ticker.MaxNLocator(12)) # 24%12=0: we need 10 xticklabels and 12 is close to 10
        # or plt.autoscale(enable=True, axis='x', tight=True)
        ax.autoscale(enable=True, axis='x', tight=True) # move all curves to left(touch y-axis)
        ax.margins(0,0.05) # move all curves to up
    
        ax.xaxis.set_major_formatter( DateFormatter('%Y-%m') ) # 2015-08-30 ==> 2015-08
        plt.setp( ax.get_xticklabels(), rotation=30, horizontalalignment='right' )
    
    plt.subplots_adjust( hspace=0.3 ) # space between axes   
    plt.show()

         Readings below 30 generally indicate that the stock is oversold, while readings above 70 indicate that it is overbought. Traders will often place this RSI chart below the price chart for the security, so they can compare its recent momentum against its market price 

         Some traders will consider it a “buy signal” if a security’s RSI reading moves below 30, based on the idea that the security has been oversold and is therefore poised for a rebound. However, the reliability of this signal will depend in part on the overall context. If the security is caught in a significant downtrend, then it might continue trading at an oversold level for quite some time. Traders in that situation might delay buying until they see other confirmatory signals.

    IF PREVIOUS RSI > 30 AND CURRENT RSI < 30 ==> BUY SIGNAL
    IF PREVIOUS RSI < 70 AND CURRENT RSI > 70 ==> SELL SIGNAL
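
    A minimal sketch of these two crossover rules applied to the rsi_20 column computed above (this is only an illustration of the rules, not a complete trading strategy):

    import numpy as np

    rsi_series = goog_data['rsi_20']
    prev_rsi = rsi_series.shift(1)

    # previous RSI above 30 and current RSI below 30 -> oversold crossover, buy signal
    buy_signal = (prev_rsi > 30) & (rsi_series < 30)
    # previous RSI below 70 and current RSI above 70 -> overbought crossover, sell signal
    sell_signal = (prev_rsi < 70) & (rsi_series > 70)

    goog_data['rsi_crossover_signal'] = np.where(buy_signal, 1, np.where(sell_signal, -1, 0))
    print(goog_data.loc[buy_signal | sell_signal, ['Close', 'rsi_20', 'rsi_crossover_signal']])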

    Although using the SMA to compute RSI may give us the correct buy signal at some points in time, it may also give us a wrong sell signal at other points; using the EWMA to compute RSI is more reliable.

    goog_data[goog_data['rsi_20']>50].count(axis=0)['rsi_20'] /\
    goog_data[goog_data['rsi_20']<=50].count(axis=0)['rsi_20']

     

    RSI_7 and RSI_14

    n_days_7=7
    n_days_14=14
    
    # # close_-1_d — this is the price difference between time t and t-1
    goog_data['close_-1_s'] = goog_data['Close'].shift(1)
    d = goog_data['close_-1_d'] = goog_data['Close']-goog_data['close_-1_s']
     
    goog_data['closepm'] = ( d+d.abs() )/2  # if d>0: (d+d)/2= d, if d<0, (d+(-d))/2= 0 
    goog_data['closenm'] = ( -d+d.abs() )/2 # if d>0: (-d+d)/= 0, if d<0, ((-d)+(-d))/2= -d (>0)
    
    for n_days in (n_days_7, n_days_14):
        p_ema = goog_data['closepm'].ewm( com = n_days - 1,
                                          min_periods=0, # default 0
                                          adjust=True,
                                        ).mean()
        n_ema = goog_data['closenm'].ewm( com = n_days - 1,
                                          min_periods=0,
                                          adjust=True,
                                        ).mean()
        rs_column_name = 'rs_{}'.format(n_days)
        rsi_column_name = 'rsi_{}'.format(n_days)
        goog_data['p_ema'] = p_ema
        goog_data['n_ema'] = n_ema
        goog_data[rs_column_name] = rs = p_ema / n_ema
        goog_data[rsi_column_name] = 100 - 100 / (1.0 + rs)   
     
    goog_data=goog_data.drop(['close_-1_s', 'close_-1_d', 'closepm', 'closenm'], axis=1)
    import matplotlib.pyplot as plt
    
    fig = plt.figure( figsize=(15,10) )
    
    ax1 = fig.add_subplot( 211 )
    ax1.plot( goog_data.index.values, close_price, color='k', lw=2., label='ClosePrice' )
    ax1.set_ylabel( 'Google price in $', fontsize=12 )
    ax1.legend()
    
    ax3 = fig.add_subplot( 212 )
    ax3.plot( goog_data.index.values, goog_data['rsi_7'], color='b', lw=2., label='rsi_7' )
    ax3.plot( goog_data.index.values, goog_data['rsi_14'], color='g', lw=2., label='rsi_14' )
    ax3.axhline( y=30, lw=2, color='0.7') # Line for oversold threshold
    ax3.axhline( y=50, lw=2, linestyle='--', color='0.8' ) # Neutral RSI
    ax3.axhline( y=70, lw=2, color='0.7') # Line for overbought threshold
    
    ax3.set_ylabel( 'RSI', fontsize=12 )
    ax3.legend()
    
    from matplotlib.dates import DateFormatter
    
    for ax in(ax1, ax3):
        ax.xaxis.set_major_locator(ticker.MaxNLocator(12)) # 24%12=0: we need 10 xticklabels and 12 is close to 10
        # or plt.autoscale(enable=True, axis='x', tight=True)
        ax.autoscale(enable=True, axis='x', tight=True) # move all curves to left(touch y-axis)
        ax.margins(0,0.05) # move all curves to up
    
        ax.xaxis.set_major_formatter( DateFormatter('%Y-%m') ) # 2015-08-30 ==> 2015-08
        plt.setp( ax.get_xticklabels(), rotation=30, horizontalalignment='right' )
    
    plt.subplots_adjust( hspace=0.3 ) # space between axes   
    plt.show()

     RSI values range between 0 and 100.

    In a long-only stock market (such as the domestic A-share market), RSI readings are usually distributed between 20 and 80:

    80-100: extremely strong, sell

    50-80: strong, buy

    20-50: weak, wait and watch

    0-20: extremely weak, buy

    In two-sided markets (domestic futures, London gold, FX, and so on), RSI readings are usually distributed between 30 and 70:

    70-100: overbought zone, go short

    30-70: neutral zone, be cautious about entering

    0-30: oversold zone, go long
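
    These thresholds can be wrapped in a small helper that maps an RSI reading to a rough zone label; the zone names and cut-offs below simply mirror the two lists above and are not part of the original code:

    def rsi_zone(rsi_value, two_sided_market=False):
        """Map an RSI reading to a rough zone label.
        two_sided_market=False uses the 20/50/80 bands of a long-only stock market,
        two_sided_market=True uses the 30/70 bands of futures/FX style markets."""
        if two_sided_market:
            if rsi_value >= 70:
                return 'overbought (consider short)'
            elif rsi_value <= 30:
                return 'oversold (consider long)'
            return 'neutral (wait)'
        else:
            if rsi_value >= 80:
                return 'extremely strong (sell)'
            elif rsi_value >= 50:
                return 'strong (buy)'
            elif rsi_value >= 20:
                return 'weak (wait)'
            return 'extremely weak (buy)'

    print(rsi_zone(65), '|', rsi_zone(65, two_sided_market=True))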

    Standard deviation

         Standard deviation, which will be referred to as STDEV, is a basic measure of price volatility that is used in combination with a lot of other technical analysis indicators to improve them. We'll explore that in greater detail in this section.

         Standard deviation is a standard measure that is computed by measuring the squared deviation of individual prices from the mean price, and then finding the average of all those squared deviation values. This value is known as variance, and the standard deviation is obtained by taking the square root of the variance. Larger STDEV values are a mark of more volatile markets or of larger expected price moves, so trading strategies need to factor that increased volatility into risk estimates and other trading behavior.

    To compute standard deviation, first we compute the variance over the last n periods:

        Variance = ( Σ (Price_i - SMA)^2 ) / n, summing over the last n prices

    Then, standard deviation is simply the square root of the variance:

        STDEV = sqrt( Variance )

    SMA : Simple moving average over n time periods.

    Implementing standard deviation

         Let's have a look at the following code, which demonstrates the implementation of standard deviation.

         We are going to import the statistics and math libraries we need to perform basic mathematical operations. We define the lookback period with the variable time_period, store the past prices in the list history, and store the SMA and standard deviation values in sma_values and stddev_values. In the code, we first calculate the variance and then the standard deviation. Finally, we append the results to the goog_data data frame that we will use to display the chart:

    import yfinance as yf
    import pandas as pd
    
    start_date = '2014-01-01'
    end_date = '2018-01-01'
    SRC_DATA_FILENAME = 'goog_data2.pkl'
    
    try:
        goog_data2 = pd.read_pickle( SRC_DATA_FILENAME )
        print( 'File found...reading GOOG data')
    except:
        print( 'File not found...downloading GOOG data')
        goog_data2 = yf.download( 'goog', start=start_date, end=end_date) 
        goog_data2.to_pickle( SRC_DATA_FILENAME )
        
    goog_data=goog_data2.tail(620)

    import statistics as stats
    import math as math
    import matplotlib.ticker as ticker
    from matplotlib.dates import DateFormatter
    
    close = goog_data['Close']
    time_period = 20 # look back period
    
    history = []    # history of prices
    sma_values = [] # to track moving average values for visualization purposes
    stddev_values = [] # history of computed stddev values
    
    for close_price in close:
        history.append( close_price )
        if len(history) >time_period: # we track at most ' time_period' number of prices
            del (history[0])
        
        sma = stats.mean(history)
        sma_values.append( sma )
        
        variance = 0 # variance is square of standard deviation
        for hist_price in history:
            variance += ( (hist_price-sma)**2 )
            
        stddev = math.sqrt( variance/len(history) )
        stddev_values.append( stddev )
        
    goog_data = goog_data.assign( ClosePrice = pd.Series( close,
                                                          index=goog_data.index
                                                        )
                                )
    goog_data = goog_data.assign( StandardDeviationOver_20Days = pd.Series( stddev_values,
                                                                            index=goog_data.index
                                                                          )
                                )
    close_price = goog_data['ClosePrice']
    stddev = goog_data['StandardDeviationOver_20Days']
    
    import matplotlib.pyplot as plt
    
    fig = plt.figure( figsize=(10,6) )
    
    ax1 = fig.add_subplot( 211 )
    ax1.plot( goog_data.index.values, close_price, color='g', lw=2., label='ClosePrice' )
    ax1.set_ylabel('Google price in $', fontsize=12)
    ax1.legend()
    
    ax2 = fig.add_subplot( 212 )
    ax2.plot( goog_data.index.values, stddev, color='b', lw=2., label='StandardDeviationOver_20Days' )
    ax2.axhline( y=stddev.mean(), color='k', ls='--' )
    ax2.set_xlabel('Date')
    ax2.set_ylabel('Stddev in $')
    ax2.legend()
    
    for ax in (ax1, ax2):
        ax.xaxis.set_major_locator(ticker.MaxNLocator(12)) # 24%12=0: we need 10 xticklabels and 12 is close to 10
        # or plt.autoscale(enable=True, axis='x', tight=True)
        ax.autoscale(enable=True, axis='x', tight=True) # move all curves to left(touch y-axis)
        ax.margins(0,0.05) # move all curves to up
    
        ax.xaxis.set_major_formatter( DateFormatter('%Y-%m') ) # 2015-08-30 ==> 2015-08
        plt.setp( ax.get_xticklabels(), rotation=30, horizontalalignment='right' )
        
    plt.subplots_adjust( hspace=0.3 )    
    plt.show()

         From the output, it seems that the volatility measure (the 20-day standard deviation, STDEV) ranges from roughly $8 to $40, with about $15 being the average.

     

         Here, the standard deviation quantifies the volatility in the price moves during the last 20 days. Volatility spikes when the Google stock prices spike up or spike down or go through large changes over the last 20 days. We will revisit the standard deviation as an important volatility measure in later chapters.

    use pandas' rolling().std() to get the volatility

    time_period = 20 # look back period
    
    
    goog_data['std_20']= ( goog_data['Close'] ).rolling( window=time_period,
                                                         min_periods=1,
                                                       ).std()                  
    goog_data.head(25)

          Because the first rolling window contains only a single observation, the first value of std_20 is NaN (pandas' default sample standard deviation with ddof=1 is undefined for one point). In addition, pandas divides by n-1 while the manual loop above divides by n, which explains the small deviation from StandardDeviationOver_20Days in the plot below.
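
    To reproduce the hand-rolled (population) standard deviation exactly, ddof=0 can be passed to pandas; this is a minimal sketch on top of the same goog_data frame, with a column name chosen here just for the comparison:

    # population standard deviation (divide by n), matching the manual loop above
    goog_data['std_20_population'] = goog_data['Close'].rolling(window=20, min_periods=1).std(ddof=0)

    # the difference from the loop-based column should be numerically negligible
    print((goog_data['std_20_population'] - goog_data['StandardDeviationOver_20Days']).abs().max())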

    import matplotlib.pyplot as plt
    
    fig = plt.figure( figsize=(10,6) )
    
    ax1 = fig.add_subplot( 211 )
    ax1.plot( goog_data.index.values, close_price, color='g', lw=2., label='ClosePrice' )
    ax1.set_ylabel('Google price in $', fontsize=12)
    ax1.legend()
    
    ax2 = fig.add_subplot( 212 )       ###         
    ax2.plot( goog_data.index.values, goog_data['std_20'], color='b', lw=2., label='std_20days_volatility' )
    ax2.set_xlabel('Date')
    ax2.set_ylabel('Stddev in $')
    ax2.legend()
    
    for ax in (ax1, ax2):
        ax.xaxis.set_major_locator(ticker.MaxNLocator(12)) # 24%12=0: we need 10 xticklabels and 12 is close to 10
        # or plt.autoscale(enable=True, axis='x', tight=True)
        ax.autoscale(enable=True, axis='x', tight=True) # move all curves to left(touch y-axis)
        ax.margins(0,0.05) # move all curves to up
    
        ax.xaxis.set_major_formatter( DateFormatter('%Y-%m') ) # 2015-08-30 ==> 2015-08
        plt.setp( ax.get_xticklabels(), rotation=30, horizontalalignment='right' )
        
    plt.subplots_adjust( hspace=0.3 )    
    plt.show()

     

    Momentum

         Momentum, also referred to as MOM, is an important measure of speed and magnitude of price moves. This is often a key indicator of trend/breakout-based trading algorithms.

         In its simplest form, momentum is simply the difference between the current price and price of some fixed time periods in the past. Consecutive periods of positive momentum values indicate an uptrend; conversely, if momentum is consecutively negative, that indicates a downtrend. Often, we use simple/exponential moving averages of the MOM indicator, as shown here, to detect sustained trends:

        MOM_t = Price_t - Price_{t-n}

    Here, the following applies:

        Price_t : price at time t
        Price_{t-n} : price n time periods before time t

    import yfinance as yf
    import pandas as pd
    
    start_date = '2014-01-01'
    end_date = '2018-01-01'
    SRC_DATA_FILENAME = 'goog_data2.pkl'
    
    try:
        goog_data2 = pd.read_pickle( SRC_DATA_FILENAME )
        print( 'File found...reading GOOG data')
    except:
        print( 'File not found...downloading GOOG data')
        goog_data2 = yf.download( 'goog', start=start_date, end=end_date) 
        goog_data2.to_pickle( SRC_DATA_FILENAME )
        
    goog_data=goog_data2.tail(620)

     

    Implementation of momentum 

    Now, let's have a look at the code that demonstrates the implementation of momentum:

    close = goog_data['Close'] # closing prices iterated over below
    time_period = 20 # how far to look back to find reference price to compute momentum
    history = []     # history of observed prices to use in momentum calculation
    mom_values = []  # track momentum values for visualization purposes
    
    for close_price in close:
        history.append( close_price )
        if len(history) > time_period: # history is at most 'time_period' number of observations
            del (history[0])
        mom = close_price - history[0]
        mom_values.append( mom )

         This maintains a list history of past prices and, at each new observation, computes the momentum to be the difference between the current price and the price time_period days ago, which, in this case, is 20 days:
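
    Once the history window is full, the loop is equivalent to subtracting the close observed time_period - 1 rows earlier, so the same momentum series can also be obtained with a single pandas shift; the sketch below only illustrates that equivalence (the warm-up rows differ, because the loop falls back to the first available price while shift() produces NaN):

    # vectorised equivalent of the loop once the window is full:
    # history[0] is the close observed time_period - 1 rows before the current one
    mom_vectorised = goog_data['Close'] - goog_data['Close'].shift(time_period - 1)
    print(mom_vectorised.tail())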

    goog_data = goog_data.assign( ClosePrice=pd.Series( close,
                                                        index=goog_data.index
                                                      )
                                )
    goog_data = goog_data.assign( MomentumFromPrice_20DaysAgo=pd.Series( mom_values,
                                                                         index = goog_data.index
                                                                       )
                                )
    close_price = goog_data['ClosePrice']
    mom = goog_data['MomentumFromPrice_20DaysAgo']
    
    import matplotlib.pyplot as plt
    
    fig = plt.figure( figsize=(12,6))
    
    ax1 = fig.add_subplot( 211 )
    ax1.set_ylabel('Google price in $')
    ax1.plot( goog_data.index.values, close_price, color='g', lw=2., label='ClosePrice' )
    ax1.legend()
    
    ax2 = fig.add_subplot( 212 )
    ax2.set_ylabel('Momentum in $')
    ax2.plot( goog_data.index.values, mom, color='b', lw=2., label='MomentumFromPrice_20DaysAgo')
    ax2.legend()
    
    for ax in (ax1, ax2):
        ax.xaxis.set_major_locator(ticker.MaxNLocator(12)) # 24%12=0: we need 10 xticklabels and 12 is close to 10
        # or plt.autoscale(enable=True, axis='x', tight=True)
        ax.autoscale(enable=True, axis='x', tight=True) # move all curves to left(touch y-axis)
        ax.margins(0,0.05) # move all curves to up
    
        ax.xaxis.set_major_formatter( DateFormatter('%Y-%m') ) # 2015-08-30 ==> 2015-08
        plt.setp( ax.get_xticklabels(), rotation=30, horizontalalignment='right' )
        
    plt.subplots_adjust( hspace=0.3 )    
    plt.show()

     The plot for momentum shows us the following:

    • Momentum values peak when the stock price changes by a large amount as compared to the price 20 days ago.
    • Here, most momentum values are positive, mainly because, as we discussed in the previous section, Google stock has been increasing in value over the course of its lifetime and has large upward momentum values from time to time.
    • During the brief periods where the stock prices drop in value, we can observe negative momentum values.

         In this section, we learned how to create trading signals based on technical analysis. In the next section, we will learn how to implement advanced concepts, such as seasonality, in trading instruments. 

    Implementing advanced concepts, such as seasonality, in trading instruments

         In trading, the price we receive is a collection of data points at constant time intervals called time series. They are time dependent and can have increasing or decreasing trends and seasonality trends, in other words, variations specific to a particular time frame. Like any other retail products, financial products follow trends and seasonality during different seasons. There are multiple seasonality effects: weekend, monthly, and holidays.

    In this section, we will use the GOOG data from 2001 to 2018 to study price variations
    based on the months.

    1. We will write the code to re-group the data by months, calculate and return the monthly returns, and then compare these returns in a histogram. We will observe that GOOG has a higher return in October:
      import yfinance as yf
      import pandas as pd
      import matplotlib.pyplot as plt
      
      start_date = '2001-01-01'
      end_date = '2018-01-01'
      SRC_DATA_FILENAME = 'goog_data_large.pkl'
      
      try:
          goog_data = pd.read_pickle( SRC_DATA_FILENAME )
          print( 'File found...reading GOOG data')
      except:
          print( 'File not found...downloading GOOG data')
          goog_data = yf.download( 'goog', start=start_date, end=end_date) 
          goog_data.to_pickle( SRC_DATA_FILENAME )

      goog_monthly_return = goog_data['Adj Close'].pct_change().groupby([
                              goog_data['Adj Close'].index.year,
                              goog_data['Adj Close'].index.month,
                            ]).mean()
      goog_monthly_return

       

      goog_monthly_return_list = []
      for ym_idx in range( len(goog_monthly_return) ):
                                            # goog_monthly_return.index[ym_idx]:  (2004, 8) or (2004, 9) or ....
          goog_monthly_return_list.append( {'month':goog_monthly_return.index[ym_idx][1],
                                            'monthly_return':goog_monthly_return[goog_monthly_return.index[ym_idx]]
                                           }
                                         )
      goog_monthly_return_list = pd.DataFrame( goog_monthly_return_list,
                                               columns=('month','monthly_return')
                                             )
      goog_monthly_return_list

       

      goog_monthly_return_list.boxplot( column=['monthly_return'],
                                        by='month', # Column in the DataFrame to pandas.DataFrame.groupby()
                                        figsize=(10,5),
                                        fontsize=12,
                                      )
      ax = plt.gca()
      labels = [ item.get_text() 
                 for item in ax.get_xticklabels()
               ]
      labels=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun','Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
      ax.set_xticklabels( labels )
      ax.set_ylabel('GOOG return')
      ax.set_title('GOOG Monthly return 2001-2018')
      
      plt.suptitle("")
      
      plt.show()

           The preceding code will return the following output. The following screenshot represents the GOOG monthly return:

           In this screenshot, we observe repetitive patterns (for example, in September, October, and December the first quartile of the monthly returns is above zero, so the return is positive in the large majority of years, and the median monthly return is highest in October). October is the month when the return seems to be the highest (see the median value in the box), unlike November, where we observe a drop in the return.
      #################

      goog_y_m_return_list = []
      for ym_idx in range( len(goog_monthly_return) ):
                                            # goog_monthly_return.index[ym_idx]:  (2004, 8) or (2004, 9) or ....
          goog_y_m_return_list.append( { 'year':goog_monthly_return.index[ym_idx][0],
                                         'month':goog_monthly_return.index[ym_idx][1],
                                         'monthly_return':goog_monthly_return[goog_monthly_return.index[ym_idx]]
                                       }
                                         )
      goog_y_m_return_list = pd.DataFrame( goog_y_m_return_list,
                                           columns=('year','month','monthly_return')
                                         )
      goog_y_m_return_list[:17]

      plt.figure( figsize=(10,10) )
      
      import seaborn as sns
      sns.barplot( x='month', y='monthly_return',hue='year',
                   linewidth=1, edgecolor='w',
                   data=goog_y_m_return_list[5:]
                 )
      plt.show()

           It can be seen that, from 2005 to 2017, the average monthly return was consistently positive in some months across the years, while in other months it was mostly negative.
      #################

    2. Since it is a time series, we will study its stationarity (whether the mean and variance remain constant over time). In the following code, we will check this property, because the time series models that follow work on the assumption that the series is stationary:
      Constant mean
      Constant variance
      Time-independent autocovariance

      # Displaying rolling statistics
      def plot_rolling_statistics_ts( ts, titletext, ytext, window_size=12 ):
          ts.plot( color='red', label='Original', lw=0.5 )
          ts.rolling( window_size ).mean().plot( color='blue', label='Rolling Mean' )
          ts.rolling( window_size ).std().plot( color='black', label='Rolling Std' )
          
          plt.legend( loc='best' )
          plt.ylabel( ytext )
          plt.xlabel( 'Date')
          plt.setp( plt.gca().get_xticklabels(), rotation=30, horizontalalignment='right' )
          plt.title( titletext )
          plt.show( block=False )
      plot_rolling_statistics_ts( goog_monthly_return[1:], 
                                  'GOOG prices rolling mean and standard deviation',
                                  'Monthly return'
                                )
      
      plot_rolling_statistics_ts( goog_data['Adj Close'], 
                                 ' GOOG prices rolling mean and standard deviation',
                                 'Daily prices',
                                 365
                                )

           The preceding code will return the following two charts, where we will compare the difference using two different time series.
      * One shows the GOOG daily prices, and the other one shows the GOOG monthly return.
      * We observe that the rolling average and rolling standard deviation are not constant when using the daily prices instead of the daily returns. (The daily return measures the dollar change in a stock's price as a percentage of the previous day's closing price. A positive return means the stock has grown in value, while a negative return means it has lost value. A stock with lower positive and negative daily returns is typically less risky than a stock with higher daily returns, which create larger swings in value.)

      #                          the daily historical log returns
      plot_rolling_statistics_ts( np.log(goog_data['Adj Close']/goog_data['Adj Close'].shift(1) ), 
                                 ' GOOG prices rolling mean and standard deviation',
                                 'Daily prices',
                                )

      #                          the daily historical returns
      plot_rolling_statistics_ts( goog_data['Adj Close']/goog_data['Adj Close'].shift(1), 
                                 ' GOOG prices rolling mean and standard deviation',
                                 'Daily prices',
                                )


      * This means that the first time series representing the daily prices is not stationary. Therefore, we will need to make this time series stationary.
      * The non-stationary for a time series can generally be attributed to two factors: trend and seasonality.

      The following plot shows GOOG daily prices

      When observing the plot of the GOOG daily prices, the following can be stated:
         We can see that the price is growing over time; this is a trend.
         The wave effect we are observing on the GOOG daily prices comes from seasonality(see previous boxplot).
         When we make a time series stationary, we remove the trend and seasonality by modeling and removing them from the initial data.
         Once we find a model predicting future values for the data without seasonality and trend, we can apply back the seasonality and trend values to get the actual forecasted data.
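
      One common way to carry out the modeling-and-removal step described above is statsmodels' seasonal_decompose. The sketch below applies an additive decomposition to the adjusted close and uses roughly one year of trading days (252) as the seasonal period; the period choice is our assumption for illustration, and in older statsmodels versions the argument is named freq instead of period:

      from statsmodels.tsa.seasonal import seasonal_decompose
      import matplotlib.pyplot as plt

      # decompose the daily prices into trend + seasonal + residual components
      decomposition = seasonal_decompose(goog_data['Adj Close'], model='additive', period=252)

      # subtracting trend and seasonality leaves the (hopefully stationary) residual part
      detrended_deseasonalised = goog_data['Adj Close'] - decomposition.trend - decomposition.seasonal

      decomposition.plot()
      plt.show()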

           The following plot shows the GOOG monthly return:
           

         For the data using the GOOG daily prices, we can just remove the trend by subtracting the moving average from the daily prices in order to obtain the following screenshot:

    plot_rolling_statistics_ts( goog_data['Adj Close']-goog_data['Adj Close'].rolling(365).mean(), 
                               'GOOG daily price without trend',
                               'Daily prices',
                               365
                              )

     

    • We can now observe the trend disappeared.
    • Additionally, we also want to remove seasonality; for that, we can apply differentiation.
    • For the differentiation, we will calculate the difference between two consecutive days; we will then use the difference as data points.
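
      A minimal sketch of that differencing step with pandas on the same goog_data frame (the column name diff_1 is ours, chosen just for illustration):

      # first-order differencing: difference between two consecutive daily prices
      goog_data['diff_1'] = goog_data['Adj Close'].diff()

      # the differenced series (dropping the initial NaN) is what we would test for stationarity
      print(goog_data['diff_1'].dropna().head())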
       

         We recommend that you read a book on time series to go deeper in an analysis of the same: Practical Time Series Analysis: Master Time Series Data Processing, Visualization, and Modeling Using Python, Packt edition

    3. To confirm our observation, in the code, we use the popular statistical test: the augmented Dickey-Fuller test:

    • This determines the presence of a unit root in time series.
    • If a unit root is present, the time series is not stationary.
    • The null hypothesis of this test is that the series has a unit root.
    • If we reject the null hypothesis, this means that we don't find a unit root.
    • If we fail to reject the null hypothesis, we can say that the time series is non-stationary:
    conda install -c conda-forge statsmodels

     

    statsmodels.tsa.stattools.adfuller(x, maxlag=None, regression='c', autolag='AIC', store=False, regresults=False)
    https://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.adfuller.html

    Augmented Dickey-Fuller unit root test.

    The Augmented Dickey-Fuller test can be used to test for a unit root in a univariate process in the presence of serial correlation.

    Returns

    • adf : float
      The test statistic.
    • pvalue : float
      MacKinnon's approximate p-value based on MacKinnon (1994, 2010).
    • usedlag : int
      The number of lags used.
    • nobs : int
      The number of observations used for the ADF regression and calculation of the critical values.
    • critical values : dict
      Critical values for the test statistic at the 1%, 5%, and 10% levels. Based on MacKinnon (2010).
    • icbest : float
      The maximized information criterion if autolag is not None.
    • resstore : ResultStore, optional
      A dummy class with results attached as attributes.

    Parameters

    autolag  {“AIC”, “BIC”, “t-stat”, None}

             Method to use when automatically determining the lag length among the values 0, 1, …, maxlag.

    If “AIC” (default, the Akaike information criterion) or “BIC” (the Bayesian information criterion), then the number of lags is chosen to minimize the corresponding information criterion.
    https://blog.csdn.net/Linli522362242/article/details/105973507

    The two criteria are defined as follows:

        AIC = 2k - 2 ln(L̂)
        BIC = k ln(n) - 2 ln(L̂)

    Here, the following applies:

    • n is the number of instances, in other words the number of data points in X, the number of observations, or equivalently, the sample size;
    • k is the number of parameters learned by the model, in other words the number of parameters estimated by the model. For example, in multiple linear regression, the estimated parameters are the intercept, the slope parameters, and the constant variance of the errors, so k equals the number of slope parameters plus two;
    • L̂ is the maximized value of the likelihood function of the model M, in other words L̂ = p(X | θ̂, M), where θ̂ are the parameter values that maximize the likelihood function and X is the observed data.

      Figure 9-20. A model's parametric function (top left), and some derived functions: a PDF (lower left), a likelihood function (top right), and a log likelihood function (lower right)

           To estimate the probability distribution of a future outcome x, you need to set the model parameter θ. For example, if you set θ to 1.3 (the horizontal line), you get the probability density function f(x; θ=1.3) shown in the lower-left plot. Say you want to estimate the probability that x will fall between –2 and +2. You must calculate the integral of the PDF on this range (i.e., the surface of the shaded region).

           But what if you don't know θ, and instead you have observed a single instance x=2.5 (the vertical line in the upper-left plot)? In this case, you get the likelihood function ℒ(θ|x=2.5)=f(x=2.5; θ), represented in the upper-right plot. (See also https://blog.csdn.net/Linli522362242/article/details/96480059.)

           In short, the PDF is a function of x (with θ fixed), while the likelihood function is a function of θ (with x fixed). It is important to understand that the likelihood function is not a probability distribution: if you integrate a probability distribution over all possible values of x, you always get 1; but if you integrate the likelihood function over all possible values of θ, the result can be any positive value.

           Given a dataset X, a common task is to try to estimate the most likely values for the model parameters. To do this, you must find the values that maximize the likelihood function, given X. In this example, if you have observed a single instance x=2.5, the maximum likelihood estimate (MLE) of θ is the value θ̂ that maximizes ℒ(θ|x=2.5). If a prior probability distribution g over θ exists, it is possible to take it into account by maximizing ℒ(θ|x)g(θ) rather than just maximizing ℒ(θ|x). This is called maximum a-posteriori (MAP) estimation. Since MAP constrains the parameter values, you can think of it as a regularized version of MLE.
    • AIC and BIC are mainly used for model selection; the smaller the AIC or BIC, the better.
      When comparing different models, the more the AIC or BIC drops, the better that model fits. The guiding idea for choosing the best model looks at two aspects: maximizing the likelihood function and minimizing the number of unknown parameters in the model. A larger likelihood value indicates a better fit, but we cannot judge a model purely by fitting accuracy; doing so would push the number of unknown parameters k ever higher, make the model more and more complex, and cause overfitting. A good model is therefore a joint optimum of fitting accuracy and the number of unknown parameters.
    • When two models differ substantially, the difference shows up mainly in the likelihood term; when the likelihood difference is not significant, the first term of the formula, in other words the model-complexity penalty, takes over, so the model with fewer parameters is the better choice.

      AIC: in general, as model complexity increases (k grows), the likelihood also increases, which lowers the AIC; but when k becomes too large, the likelihood grows more slowly and the AIC increases again, and an overly complex model easily overfits. The goal is to pick the model with the smallest AIC: AIC rewards goodness of fit (maximum likelihood) but also introduces a penalty term that keeps the number of parameters as small as possible, which helps reduce the risk of overfitting.

      Both AIC and BIC introduce a penalty term related to the number of model parameters. The BIC penalty is larger than the AIC penalty and takes the sample size into account; when the sample size is large, it effectively prevents the model from becoming overly complex just to squeeze out extra accuracy (the k ln(n) penalty also helps avoid the curse of dimensionality when the dimensionality is high and the training sample is relatively small).

         Both the BIC and the AIC penalize models that have more parameters to learn (e.g., more clusters) and reward models that fit the data well. They often end up selecting the same model. When they differ, the model selected by the BIC tends to be simpler (fewer parameters) than the one selected by the AIC, but tends to not fit the data quite as well (this is especially true for larger datasets).
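
      As a small worked illustration of the two formulas above (the numbers are hypothetical, purely for demonstration):

      import numpy as np

      def aic_bic(log_likelihood, k, n):
          # AIC = 2k - 2 ln(L_hat), BIC = k ln(n) - 2 ln(L_hat)
          aic = 2 * k - 2 * log_likelihood
          bic = k * np.log(n) - 2 * log_likelihood
          return aic, bic

      # hypothetical maximized log-likelihood, parameter count, and sample size
      print(aic_bic(log_likelihood=-120.5, k=3, n=250))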

     “t-stat” based choice of maxlag. Starts with maxlag and drops a lag until the t-statistic on the last lag length is significant using a 5%-sized test. https://blog.csdn.net/Linli522362242/article/details/91037961

    • autolag If None, then the number of included lags is set to maxlag.

    from statsmodels.tsa.stattools import adfuller
    
    def test_stationarity( timeseries ):
        print( "Results of Dickey-Fuller Test:" )
        df_test = adfuller( timeseries[1:], autolag='AIC' )
        print(df_test)
        df_output = pd.Series( df_test[0:4], index=['Test Statistic',
                                                    'p-value',
                                                    "#Lags Used",
                                                    "Number of Observations Used"
                                                   ]
                             )
        print( df_output )
    test_stationarity( goog_data['Adj Close'])

      This test returns a p-value of 0.996. Therefore, the time series is not stationary. 

    4. Let's have a look at the test:

    test_stationarity( goog_monthly_return[1:] )

         When the p-value is small enough, that is, smaller than the significance level (the probability, assuming that the null hypothesis is true, of the test statistic Z falling in the rejection region), we can reject the null hypothesis. 

      
         This test returns a p-value of less than 0.05, so we reject the null hypothesis of a unit root; therefore, we cannot say that the time series is non-stationary, and the monthly returns can be treated as stationary. We recommend using daily returns when studying financial products. In this stationary example, we can observe that no transformation is needed.

    test_stationarity( np.log(goog_data['Adj Close']/goog_data['Adj Close'].shift(1)) )

    5. The last step of the time series analysis is to forecast the time series. We have two possible scenarios: 

    • A strictly stationary series without dependencies among values. We can use a regular linear regression to forecast values.
    • A series with dependencies among values. We will be forced to use other statistical models. In this chapter, we chose to focus on using the Auto-Regression Integrated Moving Averages (ARIMA) model. This model has three parameters:
      • Autoregressive (AR) term (p)—lags of dependent variables. Example for 3, the predictors for x(t) is x(t-1) + x(t-2) + x(t-3).
      • Moving average (MA) term (q)—lags for errors in prediction. Example for 3, the predictor for x(t) is e(t-1) + e(t-2) + e(t-3), where e(i) is the difference between the moving average value and the actual value.
      • Differentiation (d)— This is the d number of occasions where we apply differentiation between values, as was explained when we studied the GOOG daily price. If d=1, we proceed with the difference between two consecutive values.

         The parameter values for AR(p) and MA(q) can be found by using the partial autocorrelation function (PACF) and the autocorrelation function (ACF), respectively.

    from statsmodels.graphics.tsaplots import plot_acf
    from statsmodels.graphics.tsaplots import plot_pacf
    import matplotlib.pyplot as plt
    from matplotlib import pyplot
    
    plt.figure()
    
    plt.subplot(211)
    plot_acf( goog_monthly_return[1:], ax=pyplot.gca(), lags=10 )
    # plt.yticks([0,0.25,0.5,0.75,1])
    plt.autoscale(enable=True, axis='y', tight=True)
    
    plt.subplot(212)
    plot_pacf( goog_monthly_return[1:], ax=pyplot.gca(), lags=10 )
    plt.autoscale(enable=True, axis='y', tight=True)
    
    plt.subplots_adjust( hspace=0.5 )
    plt.show()

    https://www.statsmodels.org/devel/generated/statsmodels.graphics.tsaplots.plot_acf.html?highlight=acf
    Plot the autocorrelation function.

    Plots lags on the horizontal and the correlations on vertical axis.


         When we observe the two preceding diagrams, we can draw the confidence interval on either side of 0. We will use this confidence interval to determine the parameter values for the AR(p) and MA(q).

    • q: The lag value is q=1 when the ACF plot crosses the upper confidence interval for the first time.
    • p: The lag value is p=1 when the PACF chart crosses the upper confidence interval for the first time.
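
    The same reading can be done numerically with statsmodels' acf and pacf functions, which can also return confidence intervals; the sketch below finds the first lag whose coefficient falls outside the band centred at zero (a rough heuristic meant only to mirror the two rules above, not a replacement for inspecting the plots):

    from statsmodels.tsa.stattools import acf, pacf

    returns = goog_monthly_return[1:]

    def first_significant_lag(values, confint):
        # the band half-width at lag k is confint[k, 1] - values[k]; a lag is
        # significant when the coefficient itself exceeds that half-width
        for k in range(1, len(values)):
            if abs(values[k]) > (confint[k, 1] - values[k]):
                return k
        return None

    acf_vals, acf_confint = acf(returns, nlags=10, alpha=0.05, fft=True)
    pacf_vals, pacf_confint = pacf(returns, nlags=10, alpha=0.05)

    print('suggested q =', first_significant_lag(acf_vals, acf_confint))
    print('suggested p =', first_significant_lag(pacf_vals, pacf_confint))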

    6. These two graphs suggest using q=1 and p=1. We will apply the ARIMA model in the following code (see also Chapter 8, ARIMA models, in Forecasting: Principles and Practice, 2nd ed):

    https://www.statsmodels.org/dev/generated/statsmodels.tsa.arima.model.ARIMA.html

    endog : array_like, optional

    The observed time-series process y.

    exog : array_like, optional

    Array of exogenous regressors.

    order : tuple, optional

         The (p,d,q) order of the model for the autoregressive, differences, and moving average components. d is always an integer, while p and q may either be integers or lists of integers.

    from statsmodels.tsa.arima.model import ARIMA
    
    model = ARIMA( goog_monthly_return[1:], order=(2,0,2) )
    
    fitted_results = model.fit()
    goog_monthly_return[1:].plot()
    fitted_results.fittedvalues.plot( color='red' )
    
    plt.setp( plt.gca().get_xticklabels(), rotation=30, horizontalalignment='right' )
    
    plt.show()
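
    Since the stated goal of this last step is forecasting, the fitted model can also produce out-of-sample forecasts; a minimal sketch (the 12-month horizon is arbitrary):

    # forecast the next 12 monthly returns together with a 95% confidence interval
    forecast = fitted_results.get_forecast(steps=12)
    print(forecast.predicted_mean)
    print(forecast.conf_int(alpha=0.05))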

     

    Summary

         In this chapter, we explored concepts of generating trading signals, such as support and resistance, based on the intuitive ideas of supply and demand that are fundamental forces
    that drive market prices. We also briefly explored how you might use support and resistance to implement a simple trading strategy. Then, we looked into a variety of technical analysis indicators, explained the intuition behind them, and implemented and visualized their behavior during different price movements. We also introduced and implemented the ideas behind advanced mathematical approaches, such as Autoregressive (AR), Moving Average (MA), Differentiation (D), AutoCorrelation Function (ACF), and Partial Autocorrelation Function (PACF) for dealing with non-stationary time series datasets. Finally, we briefly introduced an advanced concept such as seasonality, which explains how there are repeating patterns in financial datasets, basic time series analysis and concepts of stationary or non-stationary time series, and how you may model financial data that displays that behavior.

         In the next chapter, we will review and implement some simple regression and classification methods and understand the advantages of applying supervised statistical learning methods to trading.
