CNN-LSTM Forecasting
Background: LSTMs vs. CNNs
An LSTM (long short-term memory network) is a type of recurrent neural network that can account for sequential dependencies in a time series.
Observations in a time series are usually correlated with one another (a phenomenon known as autocorrelation). A standard feed-forward neural network would treat all observations as independent, which is erroneous and would generate misleading results.
A convolutional neural network is one that applies a process known as convolution to its inputs. Given two functions f and g, the convolution integral expresses how the shape of one function is modified by the other. Such networks are traditionally used for image classification, and do not account for sequential dependencies in the way that a recurrent neural network is able to do.
However, the main advantage of CNNs that makes them suited to forecasting time series is dilated convolutions: filters whose inputs are spaced apart rather than adjacent. The dilation rate sets the size of the gap between the cells that a filter reads, which in turn allows the neural network to better understand the relationships between observations that lie further apart in the time series.
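To make the idea of dilation concrete, here is a minimal sketch of which input positions a filter reads for a given output position. The function name `dilated_taps` is hypothetical and chosen only for illustration; it also assumes a causal filter that looks at the current step and earlier ones.

```python
def dilated_taps(t, kernel_size, dilation_rate):
    """Indices of the inputs a causal filter reads to produce output t."""
    return [t - i * dilation_rate for i in reversed(range(kernel_size))]


# With a dilation rate of 1 the filter reads adjacent steps.
print(dilated_taps(10, kernel_size=3, dilation_rate=1))  # [8, 9, 10]

# With a dilation rate of 4 the same three weights span a much wider stretch.
print(dilated_taps(10, kernel_size=3, dilation_rate=4))  # [2, 6, 10]
```

The number of weights is unchanged in both cases; only the spacing between the inputs grows.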
For this reason, LSTM and CNN layers are often combined when forecasting a time series. The LSTM layer accounts for sequential dependencies in the time series, while the CNN layer further informs this process through the use of dilated convolutions.
That said, standalone CNNs are increasingly being used for time series forecasting, and a combination of several Conv1D layers can produce quite impressive results, rivalling those of a model that uses both CNN and LSTM layers.
How is this possible? Let's find out!
The example below was designed using a CNN template from the Intro to TensorFlow for Deep Learning course from Udacity. This particular topic is found in Lesson 8: Time Series Forecasting by Aurélien Géron.
Our Time Series Problem
The analysis below is based on data from Antonio, Almeida and Nunes (2019): Hotel booking demand datasets.
Imagine this scenario. A hotel is having difficulty forecasting booking cancellations on a day-to-day basis. This makes it hard to forecast revenues and to allocate hotel rooms efficiently.
The hotel would like to solve this problem by building a time series model that can forecast the fluctuations in daily hotel cancellations with reasonably high accuracy.
Here is a time series plot of the fluctuations in daily hotel cancellations:
Source: Jupyter Notebook Output
Model Configuration
The neural network is structured as follows:
Source: Image Created By Author
Here are the important model parameters that must be accounted for.
Kernel Size
The kernel size is set to 3, meaning that each output is calculated from three consecutive input time steps.
Here is a rough illustration:
Source: Image Created by Author. Template adopted from Udacity, Intro to TensorFlow for Deep Learning: Time Series Forecasting
Setting the correct kernel size is a matter of experimentation, as a low kernel size risks poor model performance, while a high kernel size risks overfitting.
As can be seen from the diagram, three input time steps are taken and used to generate a single output.
In this instance, causal padding is used so that the output sequence has the same length as the input sequence. In other words, the network pads time steps on the left side of the series only, so that future values on the right side of the series are never used in generating the forecast. Using future values would lead to false results, and we would end up overestimating the accuracy of our model.
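The effect of causal padding can be checked with a small pure-Python sketch. This is not how Keras implements Conv1D internally; `causal_conv1d` and the kernel weights are hypothetical and serve only to illustrate left-side zero padding with a kernel size of 3.

```python
def causal_conv1d(series, kernel):
    """Convolve `series` with `kernel`, padding zeros on the left only."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(series)
    # Output t is built from inputs t-k+1 .. t, never from later steps.
    return [
        sum(w * x for w, x in zip(kernel, padded[t:t + k]))
        for t in range(len(series))
    ]


series = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [0.2, 0.3, 0.5]  # hypothetical weights; the last one multiplies the current step
out = causal_conv1d(series, kernel)
print(len(out) == len(series))  # True: same length as the input

# Changing a later value leaves all earlier outputs untouched.
changed = causal_conv1d(series[:3] + [100.0, 5.0], kernel)
print(out[:3] == changed[:3])  # True
```

The second check is the property that matters for forecasting: no output depends on a value that lies in its future.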
Strides
The stride length is set to one, which means that the filter slides forward one time step at a time when forecasting future values.
However, this could be set higher. For instance, setting the stride length to two would mean that the output sequence is approximately half the length of the input sequence.
A long stride length means that the model might discard valuable data in generating the forecast, but increasing the stride length can be useful for capturing longer-term trends and smoothing out noise in the series.
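The relationship between stride and output length can be sketched as follows. This assumes causal (left-only) padding, where one output is produced at every stride-th position of the input; the helper name `causal_output_length` is hypothetical.

```python
def causal_output_length(input_length, stride):
    """Number of outputs when the filter advances `stride` steps at a time."""
    return (input_length + stride - 1) // stride  # ceiling division


print(causal_output_length(30, 1))  # 30: one output per input step
print(causal_output_length(30, 2))  # 15: half as many outputs
print(causal_output_length(31, 2))  # 16: "approximately" half for odd lengths
```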
Here is the model configuration:
from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Conv1D(filters=32, kernel_size=3,
                        strides=1, padding="causal",
                        activation="relu",
                        input_shape=[None, 1]),
    keras.layers.LSTM(32, return_sequences=True),
    keras.layers.Dense(1),
    keras.layers.Lambda(lambda x: x * 200)
])
lr_schedule = keras.callbacks.LearningRateScheduler(
    lambda epoch: 1e-8 * 10**(epoch / 20))
optimizer = keras.optimizers.SGD(learning_rate=1e-8, momentum=0.9)
model.compile(loss=keras.losses.Huber(),
              optimizer=optimizer,
              metrics=["mae"])
Results
First, let's make forecasts using the above model with different window sizes.
It is important that the window size is large enough to account for the volatility across time steps.
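As a generic illustration of what a window size means, the sketch below pairs each run of consecutive values with the value that follows it. It is not the notebook's own windowing helper, which is not shown in this article; `make_windows` is a hypothetical name.

```python
def make_windows(series, window_size):
    """Pair each run of `window_size` values with the value that follows it."""
    return [
        (series[i:i + window_size], series[i + window_size])
        for i in range(len(series) - window_size)
    ]


windows = make_windows([10, 20, 30, 40, 50, 60], window_size=3)
for inputs, target in windows:
    print(inputs, "->", target)
```

A larger window gives the model more history per example, at the cost of fewer examples from the same series.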
window_size = 5
The training loss is as follows:
plt.semilogx(history.history["lr"], history.history["loss"])
plt.axis([1e-8, 1e-4, 0, 30])
Source: Jupyter Notebook Output
Here is a visual of the forecasts versus actual daily cancellation values:
rnn_forecast = model_forecast(model, series[:, np.newaxis], window_size)
rnn_forecast = rnn_forecast[split_time - window_size:-1, -1, 0]
plt.figure(figsize=(10, 6))
plot_series(time_valid, x_valid)
plot_series(time_valid, rnn_forecast)
Source: Jupyter Notebook Output
The mean absolute error is calculated:
>>> keras.metrics.mean_absolute_error(x_valid, rnn_forecast).numpy()
9.113908
With a mean of 19.89 across the validation set, the model accuracy is reasonable. However, we do see from the diagram above that the model falls short in forecasting more extreme values.
window_size = 30
What if the window size was increased to 30?
The mean absolute error decreases from 9.11 to 7.38:
>>> keras.metrics.mean_absolute_error(x_valid, rnn_forecast).numpy()
7.377962
As mentioned, the stride length can be set higher if we wish to smooth out the forecast, with the caveat that such a forecast (the output sequence) will have fewer data points than the input sequence.
Forecasting without LSTM layer
Unlike an LSTM, a CNN is not recurrent, which means that it does not retain memory of previous time series patterns. Instead, it can only learn from the data that is fed to the model at a particular time step.
However, by stacking several Conv1D layers together, it is in fact possible for a convolutional neural network to effectively learn long-term dependencies in the time series.
This can be done using a WaveNet architecture. Essentially, this means that the model defines every layer as a 1D convolutional layer with a stride length of 1 and a kernel size of 2. The second convolutional layer uses a dilation rate of 2, which means that every second input time step in the series is skipped. The third layer uses a dilation rate of 4, the fourth layer uses a dilation rate of 8, and so on.
The reason for this is that it allows the lower layers to learn short-term patterns in the time series, while the higher layers learn longer-term patterns.
The WaveNet model is defined as follows:
model = keras.models.Sequential()
model.add(keras.layers.InputLayer(input_shape=[None, 1]))
for dilation_rate in (1, 2, 4, 8, 16, 32):
    model.add(
        keras.layers.Conv1D(filters=32,
                            kernel_size=2,
                            strides=1,
                            dilation_rate=dilation_rate,
                            padding="causal",
                            activation="relu")
    )
model.add(keras.layers.Conv1D(filters=1, kernel_size=1))
optimizer = keras.optimizers.Adam(learning_rate=3e-4)
model.compile(loss=keras.losses.Huber(),
              optimizer=optimizer,
              metrics=["mae"])
model_checkpoint = keras.callbacks.ModelCheckpoint(
    "my_checkpoint.h6", save_best_only=True)
early_stopping = keras.callbacks.EarlyStopping(patience=50)
history = model.fit(train_set, epochs=500,
                    validation_data=valid_set,
                    callbacks=[early_stopping, model_checkpoint])
A window size of 64 is used in training the model. In this instance, we are using a larger window size than was used with the CNN-LSTM model, in order to ensure that the CNN model picks up longer-term dependencies.
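One way to see how far back such a stack can look is to compute its receptive field, meaning the number of input time steps that can influence a single output. The sketch below assumes stride 1 in every layer, as in the model above; `receptive_field` is a hypothetical helper name.

```python
def receptive_field(kernel_size, dilation_rates):
    """Input steps visible to one output of a stack of stride-1 dilated layers."""
    # Each layer extends the view by (kernel_size - 1) * dilation_rate steps.
    return 1 + sum((kernel_size - 1) * d for d in dilation_rates)


print(receptive_field(2, (1, 2, 4, 8, 16, 32)))  # 64
print(receptive_field(2, (1, 1, 1, 1, 1, 1)))    # 7 without dilation
```

For the six dilation rates used here the result is 64, which happens to equal the window size used in training. Whether the window size was chosen for that reason is not stated in the article. Without dilation, six layers of kernel size 2 would only see 7 steps.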
Note that early stopping is used when training the neural network. Its purpose is to halt training at the point where further training would result in overfitting. Determining this point manually is quite arbitrary, so early stopping can greatly assist with this.
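The patience logic behind early stopping can be sketched in a few lines. This is a simplified illustration rather than the Keras implementation; the function name and the loss values are made up for the example.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the epoch index at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss  # new best validation loss: reset the counter
            wait = 0
        else:
            wait += 1    # no improvement this epoch
            if wait >= patience:
                return epoch
    return None


losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.63]  # made-up validation losses
print(early_stopping_epoch(losses, patience=3))  # 5
```

In this example the best loss (0.6) is reached at epoch 2, and training stops after three further epochs without improvement.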
Let's now generate forecasts using the standalone CNN model that we just built.
cnn_forecast = model_forecast(model, series[..., np.newaxis], window_size)
cnn_forecast = cnn_forecast[split_time - window_size:-1, -1, 0]
Here is a plot of the forecasted vs. actual data.
Source: Jupyter Notebook Output
The mean absolute error came in slightly higher at 7.49.
Note that for both models, the Huber loss was used as the loss function. This type of loss tends to be more robust to outliers, in that it is quadratic for smaller errors and linear for larger ones.
This type of loss is suitable for this scenario, as some outliers are present in the data. Using MSE (mean squared error) would overly inflate the forecast error yielded by the model, whereas MAE on its own would likely underestimate the size of the error by not taking such outliers into account. The Huber loss offers a happy medium.
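The quadratic-then-linear behaviour can be written out directly. The sketch below is a scalar version of the standard Huber formula with a threshold `delta` of 1.0; it is meant as an illustration, not as a replacement for `keras.losses.Huber`.

```python
def huber(error, delta=1.0):
    """Huber loss for a single error value."""
    a = abs(error)
    if a <= delta:
        return 0.5 * error ** 2      # quadratic for small errors
    return delta * (a - 0.5 * delta)  # linear for large errors


for e in (0.5, 10.0):
    print(f"error={e}: squared={e ** 2}, absolute={abs(e)}, huber={huber(e)}")
```

For an error of 10 the squared error is 100, while the Huber loss is 9.5, so a single outlier contributes far less to the total.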
>>> keras.metrics.mean_absolute_error(x_valid, cnn_forecast).numpy()
7.490844
Even with a slightly higher MAE, the CNN model has performed quite well in forecasting daily hotel cancellations, without having to be combined with an LSTM layer in order to learn long-term dependencies.
Conclusion
In this example, we have seen:
- The similarities and differences between CNNs and LSTMs in forecasting time series
- How dilated convolutions assist CNNs in forecasting time series
- Modification of kernel size, padding and strides in forecasting a time series with CNN
- Use of a WaveNet architecture to conduct a time series forecast using stand-alone CNN layers
In particular, we saw how a CNN can produce similarly strong results compared to a CNN-LSTM model through the use of dilation.
Many thanks for your time, and any questions, suggestions or feedback are greatly appreciated.
As mentioned, this topic is also covered in the Intro to TensorFlow for Deep Learning course from Udacity. I highly recommend the chapter on Time Series Forecasting for further detail on this topic.
You can also find the full Jupyter Notebook that I used for running this example on hotel cancellations here.
The original Jupyter Notebook (Copyright 2018, The TensorFlow Authors) can also be found here.
Translated from: https://towardsdatascience.com/cnn-lstm-predicting-daily-hotel-cancellations-e1c75697f124
LSTM Stock Prediction
When trying to look at examples of LSTMs in Keras, I've found a lot that focus on using them to predict stock prices in the future. Most are pretty bare-bones though, consisting of little more than a basic LSTM network and a quick plot of the prediction. Though I think the utility of these models is a little questionable, it brought a question into my head: how accurate are the predictions made by a model trained on one stock if it's predicting on another stock?
The full code can be found here.
Problem Description
Stocks are correlated with each other to varying degrees, so the behaviors of any given pair of stocks may or may not track each other. The correlation between stocks is usually measured as the correlation of their returns (or at least, that's what I've seen), and it's easy to compute those yourself.
In addition, there are an immense number of posts about predicting stock prices with neural networks. These examples usually don't go too deep, though, and they invariably train and check the model using data from the same stock. That's reasonable enough, but it raises the question of how generalizable these models are. It doesn't seem likely that a model would create good predictions if there was weak correlation between the stock it was trained on and the one it's predicting on, but maybe it would work well enough for stocks that are more strongly correlated.
So the goal here is:
- Get data on a large number of stocks (preferably hundreds).
- Compute the correlations between the stocks.
- Train an LSTM on a single, reference stock.
- Make predictions for the other stocks using that LSTM model.
- See how some error metric varies with correlation.
Getting the Data
Since I'm aiming to get data on a few hundred stocks, the first list that jumps to mind is the S&P 500. There are actually 505 tickers on there, but that's because five of the companies have multiple share classes. I just discarded one class for each stock with multiple share classes; the list I ended up using is in the GitHub repo for this post.
I downloaded the data from Tiingo via the pandas_datareader library. Tiingo limits free accounts to 500 unique symbols per month, so it's feasible to grab this all at once, although you won't be able to get data for any other ticker with that account for the remainder of the month.
The download takes several minutes to execute. If you're running this code yourself, I recommend saving the data immediately afterward: the file that my run produced was almost 300 megabytes and contained about 2.3 million rows, so it's not something you want to repeatedly download.
Selecting & Scaling Data
Since we're dealing with an LSTM, we'd like to have data scaled down to a range that's better handled by the LSTM inputs. And since the scales of the stocks differ, we need individual scalers for each stock. However, even though only the reference stock will have its data used for training, I want to ensure that all of the stocks have complete data for the same timeframe, since scaling the other stocks on just their test data would exaggerate how big some of the movements in those stocks were.
It turns out that complete data from 2001-01-01 to 2019-12-31 exists for 370 of the stocks, so I opted to just filter down to those.
adj_closes = sp500_data["adjClose"].unstack("symbol")

INPUT_LENGTH = 200
OUTPUT_LENGTH = 40
REFERENCE_STOCK = "ALL"
It didn't matter to me what the reference stock was, so I just picked one using random.choice() from Python's standard library. The result was ALL (The Allstate Corporation), so we can set that as a constant, along with the lengths of the inputs and outputs for the network.
For scaling the data, most of the posts I saw used sklearn.preprocessing.MinMaxScaler() to get the data to a scale that the LSTM would work better with. I took it one step further, though: monitoring the stock's value in terms of percent change seemed a bit more consistent than using the absolute price. For example, if we consider the somewhat extreme case of Apple (AAPL):
Daily changes in AAPL value by absolute change (difference in price) and percent change.
To deal with this, I made a child class of MinMaxScaler() which takes the logarithm of the data before applying the usual MinMaxScaler() functionality. As a result, equal percentage changes become equal absolute changes.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

class LogMinMaxScaler(MinMaxScaler):
    """
    Essentially a modified version of the MinMaxScaler, where fitting
    the scaler includes taking the base-10 logarithm of the data.
    """

    def fit(self, X, output_length, **fit_params):
        log_X = np.log10(X)
        log_X = log_X[:-(2*output_length), :]   # fit only on the data used for training
        return super().fit(log_X, y=None, **fit_params)

    def transform(self, X):
        log_X = np.log10(X)
        return super().transform(log_X)

    def fit_transform(self, X, output_length, **fit_params):
        return self.fit(X, output_length, **fit_params).transform(X)

    def inverse_transform(self, X):
        log_X = super().inverse_transform(X)
        return np.power(10, log_X)

lmms = LogMinMaxScaler()
scaled_data = lmms.fit_transform(full_adj_closes.drop(REFERENCE_STOCK, axis=1).values, OUTPUT_LENGTH)
Making a child of MinMaxScaler() has several advantages over coding your own. The biggest for me is that MinMaxScaler() already does independent scaling on each column of the data and stores all the necessary information. That's exactly what's needed for these few hundred stocks, and this way I don't need to try to reimplement that myself.
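The claim that taking logarithms turns percentage changes into absolute changes can be checked with a few made-up prices. The numbers below are purely illustrative and are not taken from the stock data.

```python
import math

# Two 10% moves at very different price levels (made-up prices).
prices = [10.0, 11.0, 100.0, 110.0]
log_prices = [math.log10(p) for p in prices]

small = log_prices[1] - log_prices[0]  # 10 -> 11
large = log_prices[3] - log_prices[2]  # 100 -> 110

print(prices[1] - prices[0], prices[3] - prices[2])  # raw differences: 1.0 vs 10.0
print(math.isclose(small, large))                    # True: equal steps in log space
```

Both moves are the same size after the log transform, namely log10(1.1), regardless of the price level.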
We also need the correlation matrix. Thankfully, pandas has pandas.DataFrame.corr() for this, so we just need to calculate the returns and drop the reference stock's correlation with itself.
def get_return_correlations(adj_close_df, ticker):
    # Daily returns; this line is missing from the listing and is reconstructed here.
    returns = adj_close_df.pct_change()
    correlations = returns.corr()
    correlations = correlations.loc[correlations.index != ticker, ticker]
    return correlations

# Reconstructed call (absent from the listing); assumes full_adj_closes holds the filtered prices.
correlations = get_return_correlations(full_adj_closes, REFERENCE_STOCK)
np.quantile(correlations, [0, 0.25, 0.5, 0.75, 1])
## array([0.10957719, 0.29965404, 0.35403168, 0.42932382, 0.65370964])
The correlations do vary a decent amount, although I would describe the bulk of stocks as just being mildly correlated. The fact that they're all positive probably reflects the general tendency for the market to go up over time, especially in the time window we're considering here.
Finally, create the arrays to hold the training data and the other stock data to predict on.
import pandas as pd

def make_input_output_data(data_series, history_length, future_length):
    shifted_data = {}
    for i in range(-future_length, history_length):
        shifted_data[f"d_{-1*i}"] = data_series.shift(periods=i)
    data_df = pd.DataFrame(shifted_data).dropna()
    data_df = data_df.iloc[:, ::-1]
    input_data = data_df.iloc[:-(future_length), :history_length].copy()
    output_data = data_df.iloc[:-(future_length), history_length:].copy()
    test_input = data_df.iloc[-1, :history_length].copy()
    test_output = data_df.iloc[-1, history_length:].copy()
    return input_data.values, output_data.values, test_input.values, test_output.values

def get_test_data(data_array, input_length, output_length):
    inputs = data_array[-(input_length + output_length):-(output_length)]
    outputs = data_array[-output_length:, :]
    return inputs, outputs

ref_scaler = LogMinMaxScaler()
scaled_reference = ref_scaler.fit_transform(reference_closes.values.reshape([-1, 1]), OUTPUT_LENGTH)
scaled_reference = pd.Series(scaled_reference.reshape([-1]))

train_in, train_out, ref_test_in, ref_test_out = make_input_output_data(scaled_reference,
                                                                        INPUT_LENGTH, OUTPUT_LENGTH)
train_in = train_in.reshape([*train_in.shape, 1])
test_in, test_out = get_test_data(scaled_data, INPUT_LENGTH, OUTPUT_LENGTH)
The LSTM Model
LSTM models in the posts I saw typically used 50 nodes per hidden layer with two to four hidden layers. But they also only predicted one point at a time, and I wanted to see how well a sequence could be predicted. So I made the following model, largely based on an example from here.
from keras import layers, Input
from keras.models import Sequential

def make_stock_model(L1, L2):
    model = Sequential([
        Input(shape=[INPUT_LENGTH, 1]),
        layers.LSTM(L1, return_sequences=False),
        layers.RepeatVector(OUTPUT_LENGTH),
        layers.LSTM(L2, return_sequences=True),
        layers.TimeDistributed(layers.Dense(1))
    ])
    # The listing never compiles the model, which fit() requires;
    # Adam with mean squared error is an assumption, not the author's confirmed choice.
    model.compile(optimizer="adam", loss="mse")
    return model
The combination of RepeatVector() and TimeDistributed() is what allows the prediction of multiple points. The predictions don't feed back into the model, though, so every point in the prediction is based on the same data.
Since I was a little unsure about the sizes of the LSTM layers in the model, I tried doing some grid search cross-validation. (I know random hyperparameter searches are more efficient, but since I only have two variables I didn't think it would make much difference.) Of course, since this is temporal data, we need to split the data appropriately, lest data leaks confuse things.
# Note: keras.wrappers.scikit_learn has been removed from recent Keras releases; this import needs an older version.
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

parameters = {"L1": [80, 100, 120, 140], "L2": [80, 100, 120, 140]}
model = KerasRegressor(build_fn=make_stock_model, epochs=8, batch_size=200)
ts_split = TimeSeriesSplit(n_splits=5)
grid_search = GridSearchCV(estimator=model, param_grid=parameters, cv=ts_split)

grid_search.fit(train_in, train_out)
The best model in the grid search had 140 neurons in each LSTM layer, so that's what the final model uses.
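To illustrate what splitting temporal data "appropriately" means, here is a hand-rolled sketch of an expanding-window split in which every validation fold comes strictly after its training fold. It is written to mimic the spirit of scikit-learn's TimeSeriesSplit with default settings, not to replace it; `expanding_splits` is a hypothetical name.

```python
def expanding_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) with the test fold always after the train fold."""
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = n_samples - (n_splits - k + 1) * fold
        yield list(range(train_end)), list(range(train_end, train_end + fold))


splits = list(expanding_splits(n_samples=12, n_splits=5))
for train, test in splits:
    print("train:", train, "test:", test)
```

Because no training index is ever later than a test index, the model is never validated on data that precedes what it was trained on, which is the leak an ordinary shuffled k-fold split would introduce.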
from tensorflow.random import set_seed as tf_seed

tf_seed(1)
final_model = make_stock_model(140, 140)
final_model.fit(train_in, train_out, epochs=8, batch_size=200)
Making Predictions
Making predictions on future stock prices means predicting the actual price, not just a scaled version of it. As such, the error metric shouldn't be distorted by having some stock prices in the tens of dollars and others in the hundreds or thousands. Mean absolute percentage error seemed like a good metric that fit this requirement.
So first, a check on the reference stock. How well did the model do on it?
def mean_absolute_percentage_error(y_true, y_pred, axis=None):
    return np.mean(np.abs((y_true - y_pred) / y_true), axis=axis) * 100

ref_prediction = final_model.predict(ref_test_in.reshape([1, -1, 1]))
# Note: the prediction is passed as y_true here, so the error is measured relative to the prediction.
mean_absolute_percentage_error(ref_prediction.reshape([-1]), ref_test_out.reshape([-1]))
## 1.771431502350249
It's okay: it's off by 1-2% for most of these estimations, which isn't too bad. Of course, the important bit is how accurate it is when the data isn't scaled.
unscaled_prediction = ref_scaler.inverse_transform(ref_prediction.reshape([-1,1]))
unscaled_test = ref_scaler.inverse_transform(ref_test_out.reshape([-1,1]))
mean_absolute_percentage_error(unscaled_prediction, unscaled_test)
## 4.092220035446507
Once it's unscaled, we end up with about a 4.1% MAPE. I'm not sure if that's particularly good or not, but it brings us to the main question: How do the other stocks fare?
test_in_array = test_in.transpose()
test_in_array = test_in_array.reshape([*test_in_array.shape, 1])
test_out_array = test_out.transpose()
test_out_predictions = final_model.predict(test_in_array)
test_out_mape = mean_absolute_percentage_error(test_out_array, test_out_predictions[:,:,0], axis=1)
It's hard to tell exactly what is or isn't there. It seems like there are fewer extreme MAPE values at both high and low correlations, but maybe that's just because there are fewer points out there. We can try something a little stricter, by binning the data based on correlation and running an ANOVA on the bins (with a boxplot for visualization purposes).
import scipy.stats as stats  # for one-way ANOVA

levels = ["<0.2", "0.2-0.3", "0.3-0.4", "0.4-0.5", "0.5-0.6", ">0.6"]
cuts = [0,0.2,0.3,0.4,0.5,0.6,1]
correlations_boxed = pd.cut(correlations, bins=cuts, labels=levels)

corr_df = pd.DataFrame({"Correlation":correlations_boxed, "MAPE":test_out_mape})
groups = [pd.DataFrame(x)["MAPE"] for _, x in corr_df.groupby("Correlation", as_index=False)]
stats.f_oneway(*groups)
## F_onewayResult(statistic=0.859307579038562, pvalue=0.5086123809619022)
A p-value of around 0.51 is much larger than any typical significance level, so it looks like there are no statistically discernible differences between the above groups, despite the boxplot suggesting otherwise. But this is all on the scaled data. What if it's unscaled?
unscaled_out = lmms.inverse_transform(test_out_array.transpose())
unscaled_predictions = lmms.inverse_transform(test_out_predictions[:,:,0].transpose())
unscaled_mape = mean_absolute_percentage_error(unscaled_out, unscaled_predictions, axis=0)
The main takeaway from this, which could also be seen on the reference stock, is that unscaling the data increases the MAPE by a lot, to somewhat worrying levels in a lot of cases. There's still nothing very strong looking in this plot, especially with the MAPE values being considerably more spread out than before. It again looks like the higher correlations might not have as much spread, but it's still tenuous.
unscaled_corr_df = pd.DataFrame({"Correlation":correlations_boxed, "MAPE":unscaled_mape})
unscaled_groups = [pd.DataFrame(x)["MAPE"] for _, x in unscaled_corr_df.groupby("Correlation", as_index=False)]
stats.f_oneway(*unscaled_groups)
## F_onewayResult(statistic=2.274528720161754, pvalue=0.046747350218964395)
The ANOVA is a lot more suggestive this time around, though. With p=0.047, this would be statistically significant at some common significance levels (including 0.05), though not all (it's still above 0.01, for instance).
Conclusions
With this basic LSTM model, there might be some relationship between prediction error and stock correlation. Given how the MAPE values for the unscaled predictions on the non-reference stocks looked, there's clearly work to be done on creating a more accurate model. Running this code a number of times would also be necessary to get a strong picture of the truth, given the random initialization that comes with neural networks.
Translated from: https://medium.com/@gjanesch/stock-correlation-versus-lstm-prediction-error-5ca96a110336
• Using an LSTM in TensorFlow to predict caipiao (lottery) numbers
• stock_price_prediction_LSTM: predicting stock prices with an LSTM.
• Training an LSTM on the airline passengers dataset to predict passenger numbers
• 3. Multi-Step LSTM forecasting (shampoo-sales) 1_1. Multi-Step LSTM forecasting 1 1. Static model forecasting 1_2. Multi-Step LSTM forecasting 2 1. LSTM network for multi-step forecasting II. LSTM_Fly (airline-passengers) 1. LSTM regression network (1→1) 2. Moving...

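Gathering materials first; to be completed later

LSTM
An improved RNN that addresses the RNN's failure to retain long-term prior information.
For details, see: "A First Understanding of the Essence of LSTM"

Predicting the next day's stock price from daily stock prices

References

TensorFlow LSTM
An app for predicting stock prices with LSTM
Alexandre Xavier, Verification | How bad are the results of predicting stock prices with LSTM alone (code included)
Building a stock prediction model with LSTM-RNN
Jakob Aungiers, Predicting the stock market with LSTM neural networks
Applying LSTM to stock data
Implementing LSTM in Keras

These are the better case studies available online; I will implement them one by one later to see how well they work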


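• https://blog.csdn.net/xspyzm/article/details/105367729 Data used for LSTM prediction of the COVID-19 epidemic
• LSTM stock prediction: this project covers training and prediction on stock data with an LSTM. Features: concise and modular; supports the three mainstream deep learning frameworks PyTorch, Keras and TensorFlow; parameters, models and frameworks are highly customizable; supports incremental training; supports simultaneous...
• LSTM in the MATLAB Deep Learning Toolbox: forecasting from a historical sequence, a MATLAB application example that uses the toolbox directly for sequence prediction
• I tried to predict the Google stock price with an LSTM. A long short-term memory (LSTM) unit (or block) is a building unit for layers of a recurrent neural network (RNN). An RNN composed of LSTM units is often called an LSTM network. A common LSTM unit is composed of a cell, an input gate, an output gate and a forget gate. The cell is responsible for...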

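Multi-Step LSTM Forecasting (2)

Link to the original tutorial

Related tutorials:
Python Time Series LSTM Forecasting Tutorial Series (10): Multi-Step Forecasting
Python Time Series LSTM Forecasting Tutorial Series (11): Multi-Step Forecasting

LSTM network for multi-step forecasting

Data preparation
1. Make the data stationary
2. Scale the data
Python Time Series LSTM Forecasting Tutorial Series (2): Univariate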
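The two preparation steps above (making the series stationary, then scaling it) can be sketched on a toy series as follows. This is a plain NumPy stand-in for first-order differencing plus min-max scaling to [-1, 1]; the helper names `scale_to_range` and `unscale` and the sample values are hypothetical. It also shows that both steps can be inverted, which is what a forecast needs before it can be compared with the raw data.

```python
import numpy as np

def difference(values, interval=1):
    """Remove trend by subtracting the observation `interval` steps earlier."""
    values = np.asarray(values, dtype=float)
    return values[interval:] - values[:-interval]

def scale_to_range(values, lo=-1.0, hi=1.0):
    """Min-max scale to [lo, hi]; also return (min, max) for later inversion."""
    vmin, vmax = values.min(), values.max()
    scaled = lo + (values - vmin) * (hi - lo) / (vmax - vmin)
    return scaled, (vmin, vmax)

def unscale(scaled, bounds, lo=-1.0, hi=1.0):
    """Invert scale_to_range using the stored (min, max)."""
    vmin, vmax = bounds
    return vmin + (scaled - lo) * (vmax - vmin) / (hi - lo)

raw = np.array([10.0, 12.0, 15.0, 14.0, 18.0])   # toy series
diff = difference(raw)                            # step 1: stationary differences
scaled, bounds = scale_to_range(diff)             # step 2: scale to [-1, 1]

# Inversion: undo the scaling, then rebuild levels from the first observation
restored = raw[0] + np.cumsum(unscale(scaled, bounds))
print(diff, scaled, restored)
```

The restored values match `raw[1:]`, since differencing drops the first observation.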

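LSTM model forecasting workflow

1. Preprocess and prepare the data
2. Define the model
3. Train the model
4. Forecast
5. Invert the data transforms
6. Evaluate

Code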

from pandas import DataFrame
from pandas import Series
from pandas import concat
# `from pandas import datetime` was removed from recent pandas releases;
# the standard-library class provides the same strptime
from datetime import datetime
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from math import sqrt
from matplotlib import pyplot
from numpy import array

# parse the dataset's date labels by prefixing '190' to form a four-digit year
def parser(x):
    return datetime.strptime('190'+x, '%Y-%m')

# convert time series into supervised learning problem
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # put it all together
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return Series(diff)

# transform series into train and test sets for supervised learning
def prepare_data(series, n_test, n_lag, n_seq):
    # extract raw values
    raw_values = series.values
    # transform data to be stationary
    diff_series = difference(raw_values, 1)
    diff_values = diff_series.values
    diff_values = diff_values.reshape(len(diff_values), 1)
    # rescale values to -1, 1
    scaler = MinMaxScaler(feature_range=(-1, 1))
    scaled_values = scaler.fit_transform(diff_values)
    scaled_values = scaled_values.reshape(len(scaled_values), 1)
    # transform into supervised learning problem X, y
    supervised = series_to_supervised(scaled_values, n_lag, n_seq)
    supervised_values = supervised.values
    # split into train and test sets
    train, test = supervised_values[0:-n_test], supervised_values[-n_test:]
    return scaler, train, test

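# fit an LSTM network to training data
def fit_lstm(train, n_lag, n_seq, n_batch, nb_epoch, n_neurons):
    # reshape training into [samples, timesteps, features]
    X, y = train[:, 0:n_lag], train[:, n_lag:]
    X = X.reshape(X.shape[0], 1, X.shape[1])
    # design network
    # NOTE: the layer definitions and compile step are missing from the listing.
    # The lines below are an assumption (Keras 2.x API): a stateful LSTM, which
    # the reset_states() call in the loop implies, followed by a Dense layer
    # with one output per forecast step.
    model = Sequential()
    model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
    model.add(Dense(y.shape[1]))
    model.compile(loss='mean_squared_error', optimizer='adam')
    # fit network one epoch at a time, resetting the LSTM state between epochs
    for i in range(nb_epoch):
        model.fit(X, y, epochs=1, batch_size=n_batch, verbose=0, shuffle=False)
        model.reset_states()
    return model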

# make one forecast with an LSTM
def forecast_lstm(model, X, n_batch):
    # reshape input pattern to [samples, timesteps, features]
    X = X.reshape(1, 1, len(X))
    # make forecast
    forecast = model.predict(X, batch_size=n_batch)
    # convert to a plain list
    return [x for x in forecast[0, :]]

# make a forecast with the fitted LSTM for every row of the test set
def make_forecasts(model, n_batch, train, test, n_lag, n_seq):
    forecasts = list()
    for i in range(len(test)):
        # split the row into input lags (X) and expected outputs (y, unused here)
        X, y = test[i, 0:n_lag], test[i, n_lag:]
        # make forecast
        forecast = forecast_lstm(model, X, n_batch)
        # store the forecast
        forecasts.append(forecast)
    return forecasts

# invert differenced forecast
def inverse_difference(last_ob, forecast):
    # invert first forecast
    inverted = list()
    inverted.append(forecast[0] + last_ob)
    # propagate difference forecast using inverted first value
    for i in range(1, len(forecast)):
        inverted.append(forecast[i] + inverted[i-1])
    return inverted

# inverse data transform on forecasts
def inverse_transform(series, forecasts, scaler, n_test):
    inverted = list()
    for i in range(len(forecasts)):
        # create array from forecast
        forecast = array(forecasts[i])
        forecast = forecast.reshape(1, len(forecast))
        # invert scaling
        inv_scale = scaler.inverse_transform(forecast)
        inv_scale = inv_scale[0, :]
        # invert differencing
        index = len(series) - n_test + i - 1
        last_ob = series.values[index]
        inv_diff = inverse_difference(last_ob, inv_scale)
        # store
        inverted.append(inv_diff)
    return inverted

# evaluate the RMSE for each forecast time step
def evaluate_forecasts(test, forecasts, n_lag, n_seq):
    for i in range(n_seq):
        actual = [row[i] for row in test]
        predicted = [forecast[i] for forecast in forecasts]
        rmse = sqrt(mean_squared_error(actual, predicted))
        print('t+%d RMSE: %f' % ((i+1), rmse))

# plot the forecasts in the context of the original dataset
def plot_forecasts(series, forecasts, n_test):
    # plot the entire dataset in blue
    pyplot.plot(series.values)
    # plot the forecasts in red
    for i in range(len(forecasts)):
        off_s = len(series) - n_test + i - 1
        off_e = off_s + len(forecasts[i]) + 1
        xaxis = [x for x in range(off_s, off_e)]
        yaxis = [series.values[off_s]] + forecasts[i]
        pyplot.plot(xaxis, yaxis, color='red')
    # show the plot
    pyplot.show()

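# load dataset
# NOTE: the loading step is missing from the listing. The file name
# 'shampoo-sales.csv' and its layout (date label in the first column,
# sales in the second) are assumptions; adjust them to your data.
from pandas import read_csv
series = read_csv('shampoo-sales.csv', header=0, index_col=0).squeeze('columns')
series.index = [parser(x) for x in series.index]
# configure
n_lag = 1
n_seq = 3
n_test = 10
n_epochs = 1500
n_batch = 1
n_neurons = 1
# prepare data
scaler, train, test = prepare_data(series, n_test, n_lag, n_seq)
# fit model
model = fit_lstm(train, n_lag, n_seq, n_batch, n_epochs, n_neurons)
# make forecasts
forecasts = make_forecasts(model, n_batch, train, test, n_lag, n_seq)
# inverse transform forecasts and test
forecasts = inverse_transform(series, forecasts, scaler, n_test+2)
actual = [row[n_lag:] for row in test]
actual = inverse_transform(series, actual, scaler, n_test+2)
# evaluate forecasts
evaluate_forecasts(actual, forecasts, n_lag, n_seq)
# plot forecasts
plot_forecasts(series, forecasts, n_test+2)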
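To see what the supervised framing looks like for `n_lag = 1` and `n_seq = 3`, the sketch below builds the same kind of input/output rows with a plain sliding window. `make_windows` is a hypothetical helper written for this illustration, not part of the tutorial code, and the integer series is toy data.

```python
import numpy as np

def make_windows(values, n_lag, n_seq):
    """Split a 1-D series into rows of n_lag inputs followed by n_seq targets."""
    values = np.asarray(values)
    width = n_lag + n_seq
    # One row per window position that fits entirely inside the series
    rows = np.array([values[i:i + width] for i in range(len(values) - width + 1)])
    return rows[:, :n_lag], rows[:, n_lag:]

X, y = make_windows([1, 2, 3, 4, 5, 6], n_lag=1, n_seq=3)
print(X.tolist())   # [[1], [2], [3]]
print(y.tolist())   # [[2, 3, 4], [3, 4, 5], [4, 5, 6]]
```

Each row pairs one past observation with the next three, so a single model output vector covers all three forecast steps at once.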


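• Based on the low, high, open, and close prices, trading volume, turnover, price change, and other factors in historical stock data... For single-factor input features and an introduction to RNNs and LSTMs, see the previous post, "TensorFlow example: predicting a stock's daily high with an LSTM (I)". Importing packages and declaring constants: import pandas a
• Simple deep-learning LSTM forecasting of sunspot activity. Notebooks: LSTM-Sunpots-tn_b.ipynb: models monthly sunspot data; LSTM-Sunpots-tn_c.ipynb: models daily sunspot data; LSTM-Sunpots-tn_f.ipynb: models monthly sunspot data; LSTM-...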
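• pytorch_LSTM stock market prediction

2021-04-29 22:28:58
7.8 Predicting stock market movements with an LSTM 7.8.1 Importing data # Tushare is a free, open-source Python package for financial data. It mainly covers the pipeline from data collection and cleaning to storage for stocks and other financial data. import tushare as ts cons = ts.get_apis() # get...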
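• Multi-Step LSTM Forecasting (1): link to the original tutorial; Dataset: Python Time Series LSTM Forecasting Tutorial Series (1): Univariate; Data preparation and model evaluation: 1. Split into training and test data. Training data = the first two years of shampoo sales; test data = the remaining year of shampoo sales...
• Forecasting time series data with an LSTM. Contents: Background; Conclusions; Code; Experimental results; Differences between RNNs and DNNs; Differences between RNNs and LSTMs. Background: reproducing the experiment from "使用Keras进行LSTM实战" (Hands-on LSTM with Keras), https://blog.csdn.net/u012735708/article/details/82769711; getting familiar with using LSTM...
• An LSTM prediction model based on an improved firefly algorithm, by Han Xianbin and Qi Feng. To address slow convergence and difficult hyperparameter tuning in LSTM neural network prediction, this paper proposes a model that optimizes the network structure with the firefly algorithm to improve traffic prediction performance...
• Predicting user quality with an LSTM: uses users' page-click behavior data to predict whether a user is good or bad. Code: import numpy as np from numpy.random import seed seed(1) from tensorflow import set_random_seed set_random_seed(2) import pdb ...
• SFO-landings-EDA-and-LSTM: EDA and LSTM forecasting of air traffic landing statistics for San Francisco International Airport (SFO)
• LSTM word prediction: As part of my summer internship with Linagora’s R&D team, I was tasked with developing a next word prediction and autocomplete system akin to that of Google’s Smart Compose. In this ...
• LSTM networks: long short-term memory networks, which we call LSTMs, were designed specifically to solve the long-term dependency problem. All RNNs take the form of a chain of repeating neural network modules. In a standard RNN, this repeating module has a very simple structure, ...
• Time series forecasting using LSTM.
• I. Introduction 1. Origin of the name: the Grey Model was proposed by Professor Deng Julong in 1982. Common system classes: a white system is one whose internal characteristics are fully known, that is, information about the system is complete. A black system is one whose internal information is, to the outside world, ...
• Multivariate LSTM forecasting model (3). Prerequisite tutorials: Python Time Series LSTM Forecasting Tutorial Series (7): Multivariate; Python Time Series LSTM Forecasting Tutorial Series (8): Multivariate. Defining and training the model: 1. Split the data into training and test sets. This tutorial trains on the first year of data, ...
• LSTM autoencoder and LSTM future predictor in TensorFlow. A simple implementation based on this article: : Requirements: TensorFlow 1.4.0; Python 3.5.4; Python packages: numpy, matplotlib, os, argparse, scipy. Usage: after data generation, run reconstruction or future prediction...
• LSTM trajectory forecasting: Note: This is an update to my previous article Forecasting Average Daily Rate Trends for Hotels Using LSTM. I since recognised a couple of technical errors in the original analysis,...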

...