
High dimension sparse matrix
20201205 15:52:13 With this flag set to `True`, you can print a high-dimension confusion matrix more readably by ignoring rows and columns filled entirely with 0s. (This question comes from the open-source project sepandhaghighi/pycm)
High dimension spaces with constraints
20201208 23:11:07 Unfortunately, because of the high dimension, a trivial execution of pymultinest always gives points outside the valid region. Basically, because of the high dimension, pymultinest can never find the...
High Dimension Similarity Search
20180729 22:54:58 1. Supports similarity computation for high-dimensional vectors. 2. Provides a user-friendly front-end interface.
integration by part in high dimension
20160315 20:02:20
how can we derive (2.10) from (2.9)?
Give the definition of integration by part in high dimension from wiki first.
{[from https://en.wikipedia.org/wiki/Integration_by_parts]
Higher dimensions
The formula for integration by parts can be extended to functions of several variables. Instead of an interval, one needs to integrate over an n-dimensional set. Also, one replaces the derivative with a partial derivative.
More specifically, suppose Ω is an open bounded subset of $\mathbb{R}^{n}$ with a piecewise smooth boundary Γ. If u and v are two continuously differentiable functions on the closure of Ω, then the formula for integration by parts is
$$\int_{\Omega} \frac{\partial u}{\partial x_{i}}\, v \,d\Omega = \int_{\Gamma} u\, v\, \nu_{i} \,d\Gamma - \int_{\Omega} u\, \frac{\partial v}{\partial x_{i}} \,d\Omega,$$
where $\nu$ is the outward unit surface normal to Γ, $\nu_{i}$ is its i-th component, and i ranges from 1 to n.
Replacing v in the above formula with $v_{i}$ and summing over i gives the vector formula
$$\int_{\Omega} \nabla u \cdot \mathbf{v} \,d\Omega = \int_{\Gamma} u\, \mathbf{v} \cdot \boldsymbol{\nu} \,d\Gamma - \int_{\Omega} u\, \nabla \cdot \mathbf{v} \,d\Omega,$$
where $\mathbf{v}$ is a vector-valued function with components $v_{1}, \ldots, v_{n}$.
Setting u equal to the constant function 1 in the above formula gives the divergence theorem
$$\int_{\Omega} \nabla \cdot \mathbf{v} \,d\Omega = \int_{\Gamma} \mathbf{v} \cdot \boldsymbol{\nu} \,d\Gamma.$$
For $\mathbf{v} = \nabla v$ where $v \in C^{2}(\bar{\Omega})$, one gets
$$\int_{\Omega} \nabla u \cdot \nabla v \,d\Omega = \int_{\Gamma} u\, \frac{\partial v}{\partial \nu} \,d\Gamma - \int_{\Omega} u\, \Delta v \,d\Omega,$$
which is the first Green's identity.
}Then we give the relationship between the gradient and the directional derivative:
{[from a math guidebook for the graduate entrance examination]
The directional derivative of f along a unit vector $\mathbf{l}$ is $\frac{\partial f}{\partial \mathbf{l}} = \nabla f \cdot \mathbf{l}$, so it is maximized when $\mathbf{l}$ points in the direction of the gradient.
}
At the end, the whole derivation process will be shown:

Optimize print in high dimension matrix
20201126 01:12:51 (This question comes from the open-source project sepandhaghighi/pycm)
Clarity List Types High-Dimension Behavior
20201208 18:39:01 Every entry in the list must be of the same type. High-dimension lists are "square" in that the max length of the top-level list is also the max length for any child lists. ...
Dimension and Step Damage Identification for High Rise Frame Structure
20150605 17:29:46 Dimension and Step Damage Identification for High Rise Frame Structure
Similarity Search in High Dimension via Hashing
20170816 17:32:48 Locality-sensitive hashing
Weird variance change with mirrors in high dimension
20201225 23:54:00 This happens only with selective delayed mirrors and diagonal and active update and large population size and large dimension, but irrespective of the step-size adaptation method. (This question comes from the ...)
huge memory cost in initialization with high dimension input
20201122 05:24:41 I am trying cvxpylayer on a problem with dimension 4k. It takes quite a while to initialize the layer. Moreover, the memory cost is also huge, with 400G consumption. The memory is NOT released after...
the empirical idea about the performance of Bayesian optimization for high dimension (parameters).
20201208 23:16:14 I am going to test a high dimension (20~40 parameters, or more) objective function. Do you have any empirical insight into how the BO learning process behaves in high dimension? From my review about ...
Quantum secure direct communication with high dimension quantum superdense coding
20200227 23:47:37 A quantum secure direct communication scheme using high-dimension quantum superdense coding, by Wang Chuan and Deng Fuguo. This paper proposes a quantum direct communication scheme using high-dimension quantum dense coding. The scheme combines the ideas of block transmission, quantum ping-pong direct communication, and quantum dense coding. This scheme...
CMF download fails for newform with very high dimension
20201209 02:58:14 The download all data link for 983.2.c.a throws a server error (the other download links seem to work) ... LMFDB/lmfdb
Simbody CMAES Optimizer get stuck for high dimension problem
20201128 19:51:34 I am trying to perform Anderson F.C., et al., A Dynamic Optimization Solution for Vertical Jumping, and to find a solution through optimization. There is something that me and some students ...
[Paper Notes] Deep Neural Networks for High Dimension, Low Sample Size Data
20210203 23:22:39 Deep Neural Networks for High Dimension, Low Sample Size Data
Publication: IJCAI'17: Proceedings of the 26th International Joint Conference on Artificial Intelligence, August 2017
Code
GBFS algorithm: http://www.cse.wustl.edu/˜xuzx/research/code/GBFS.zip (link no longer reachable)
HSIC-Lasso code: http://www.makotoyamadaml.com/software.html (expired on that page)
Dataset
Biological datasets: http://featureselection.asu.edu/datasets.php
Introduction
In bioinformatics, gene expression data suffers from the growing challenges of high dimensionality and low sample size. This kind of high dimension, low sample size (HDLSS) data is also vital for scientific discoveries in other areas such as chemistry and financial engineering [Fan and Li, 2006]. When processing this kind of data, severe overfitting and high-variance gradients are the major challenges for the majority of machine learning algorithms [Friedman et al., 2000].
Feature selection has been widely regarded as one of the most powerful tools for analyzing HDLSS data. However, selecting the optimal subset of features is known to be NP-hard [Amaldi and Kann, 1998]. Instead, a large body of compromise methods for feature selection has been proposed.
- Lasso [Tibshirani, 1996] pursues sparse linear models: sparse linear models ignore the nonlinear input-output relations and the interactions among features.
- Nonlinear feature selection via kernel methods [Li et al., 2005; Yamada et al., 2014] or gradient boosted trees: these address the curse of dimensionality only under the blessing of a large sample size.
Deep neural network (DNN) methods light up new scientific discoveries. DNNs have achieved breakthroughs in modeling nonlinearity in a wide range of applications. The deeper the architecture of a DNN is, the more complex the relations it can model. DNNs have harvested initial successes in bioinformatics for modeling splicing [Xiong et al., 2015] and sequence specificity [Alipanahi et al., 2015]. However, estimating the huge number of parameters of a DNN may suffer from severe overfitting even with abundant samples, not to mention in the HDLSS setting.
To address the challenges of the HDLSS data, we propose an end-to-end DNN model called Deep Neural Pursuit (DNP). DNP simultaneously selects features and learns a classifier to alleviate the severe overfitting caused by high dimensionality. By averaging over multiple dropouts, DNP is robust against the high-variance gradients resulting from the small sample size. From the perspective of feature selection, the DNP model selects features greedily and incrementally, similar to matching pursuit [Pati et al., 1993]. More concretely, starting from an empty subset of features and a bias, DNP incrementally selects individual features according to the backpropagated gradients. Meanwhile, once more features are selected, DNP is updated using the backpropagation algorithm.
The main contribution of this paper is to tailor the DNN for the HDLSS setting using feature selection and multiple dropouts.
Related Work
We discuss feature selection methods that are used to analyze the HDLSS data, including linear, nonlinear and incremental methods.
The sparsity-inducing regularizer is one of the dominating feature selection methods for the HDLSS data.
Lasso [Tibshirani, 1996] minimizes the objective function penalized by the $l_1$ norm of the feature weights, leading to a sparse model. Unfortunately, Lasso ignores the nonlinearity and the interactions among features.
(1) Kernel methods are often used for nonlinear feature selection.
Feature Vector Machine (FVM) [Li et al., 2005];
HSIC-Lasso [Yamada et al., 2014] improves FVM by allowing different kernel functions for features and labels;
LAND [Yamada et al., 2016] further accelerates HSIC-Lasso for data with a large sample size via kernel approximation and distributed computation.
(2) Decision tree models are also qualified for modeling nonlinear inputoutput relations.
random forests [Breiman, 2001]
Gradient boosted feature selection (GBFS) [Xu et al., 2014]
The aforementioned nonlinear methods, including FVM, random forests and GBFS, require training data with a large sample size.
HSIC-Lasso and LAND fit the HDLSS setting. However, compared to the proposed DNP model, which is end-to-end, HSIC-Lasso and LAND are two-stage algorithms that separate feature selection from classification.
Besides the DNP method, there exist other greedy and incremental feature selection algorithms.
**SpAM:** sequentially selects individual features in an additive manner, thereby missing important interactions among features.
**Grafting & convex neural networks:** only consider a single hidden layer and differ from DNP in motivation (grafting focuses on the acceleration of algorithms, and convex neural networks focus on the theoretical understanding of neural networks).
**Deep feature selection (DFS):** selects features in the context of DNNs. However, according to our experiments, DFS fails to achieve sparse connections when facing the HDLSS data.
DNP Model
We introduce notations: $F \in \mathbb{R}^{d}$: the input feature space in d dimensions;
$X=(X_{1},X_{2},\ldots,X_{n})$, $y=(y_{1},y_{2},\ldots,y_{n})^{T}$: the data matrix of n samples and their corresponding labels (d ≫ n);
$f(XW)$: a feedforward neural network whose weights over all connections are denoted by W;
$W_{F}$: the input weights, i.e., the weights of the connections between the input layer and the first hidden layer;
$G_{F}$: the corresponding gradients.
Figure 1: (1) The selected features and the corresponding subnetwork. (2) The selection of a single feature. (3) Calculating gradients with lower variance via multiple dropouts.
DNP for High Dimensionality
We detail the DNP model for feature selection, which alleviates the overfitting caused by high dimensionality.
For a feedforward neural network, we select a specific input feature if at least one of the connections associated with that feature has nonzero weight.
To achieve this goal, we place the $l_{p,1}$ norm as a constraint on the input weights, i.e., $\|W_{F}\|_{p,1}$.
Let $W_{F_{j}}$ denote the weights associated with the j-th input node in $W_{F}$.
We define the $l_{p,1}$ norm of the input weights as $\|W_{F}\|_{p,1}=\sum_{j}\|W_{F_{j}}\|_{p}$, where $\|\cdot\|_{p}$ is the $l_{p}$ norm of a vector.
We assume that the weights in $W_{F_{j}}$ form a group.
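As a concrete illustration, the group norm above can be computed in a few lines of NumPy (a minimal sketch; the matrix `W_F` below is a hypothetical input-weight matrix, not one from the paper):

```python
import numpy as np

def l_p1_norm(W_F, p=2):
    """l_{p,1} norm of the input weights: the sum over input nodes j of
    the l_p norm of the weight group W_{F_j} (here, row j)."""
    return sum(np.linalg.norm(W_F[j], ord=p) for j in range(W_F.shape[0]))

# Each row groups the connections from one input feature to the first hidden layer.
W_F = np.array([[3.0, 4.0],   # feature 1: l_2 norm 5 -> selected
                [0.0, 0.0],   # feature 2: zero group -> not selected
                [1.0, 0.0]])  # feature 3: l_2 norm 1 -> selected
print(l_p1_norm(W_F, p=2))  # 6.0
```

Driving whole rows of `W_F` to zero is what makes this norm a feature selector: a feature is dropped exactly when its entire group vanishes.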
A general form of the objective function for training the feedforward network is formulated as
$$\min_{W}\ \sum_{i=1}^{n} L\big(y_{i}, f(x_{i};W)\big) \quad \text{s.t. } \|W_{F}\|_{p,1} \le \lambda. \qquad (1)$$
We only consider the binary classification problem and use the logistic loss in problem (1). (Extensions to multi-class classification, regression or unsupervised reconstruction are straightforward.) Directly optimizing problem (1) over the HDLSS data is highly tricky for two reasons: (1) directly minimizing the $l_{p,1}$-constrained problem is difficult for the backpropagation algorithm; (2) direct optimization using all features easily gets stuck in a local optimum, which suffers from severe overfitting.
Instead, we optimize problem (1) in a greedy and incremental manner. The main idea of the proposed DNP: we optimize problem (1) over a small subnetwork containing a small subset of features, which is less difficult. The information obtained during training, in turn, guides us to incorporate more features, and the subnetwork serves as the initialization for a larger subnetwork with more features involved.
The DNP method enjoys two advantages: (1) the optimization improves to a large extent; (2) DNP simultaneously selects features and minimizes the training loss over the labeled data in an end-to-end manner, so the selection process is not independent of the learning process.
The whole process of feature selection in DNP:
We maintain two sets, i.e., a selected set $S$ and a candidate set $C$, with $S ∪ C = F$.
(Step 7) How do we select features using $G_{F}$? The magnitude of a gradient implies how much the objective function may decrease by updating the corresponding weight; the norm of a group of gradients indicates how much the loss may decrease by updating this group of weights together.
We assume that the larger $\|G_{F_{j}}\|_{q}$ is, the more the j-th feature contributes to minimizing problem (1). Consequently, we select the feature with the maximum $\|G_{F_{j}}\|_{q}$.
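The greedy loop (training the subnetwork, computing candidate gradients, and picking the max-norm group) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: `train_subnetwork` and `input_gradients` are hypothetical stand-ins for backpropagation over the current subnetwork.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 6, 4  # feature dimension, width of the first hidden layer

def train_subnetwork(W_F, selected):
    """Stand-in for step 5: update the input weights of selected features."""
    W_F[sorted(selected)] += 0.1  # placeholder for backprop updates
    return W_F

def input_gradients(W_F):
    """Stand-in for step 6: the gradients G_F w.r.t. all input weights."""
    return rng.normal(size=W_F.shape)  # placeholder for backprop gradients

selected, candidate = set(), set(range(d))
W_F = np.zeros((d, h))
k = 3  # number of features to select
for _ in range(k):
    W_F = train_subnetwork(W_F, selected)
    G_F = input_gradients(W_F)
    # step 7: move the candidate with the largest gradient group norm into S
    j_star = max(candidate, key=lambda j: np.linalg.norm(G_F[j], ord=2))
    selected.add(j_star)
    candidate.remove(j_star)

print(sorted(selected))
```

Each iteration grows the selected set S by one feature and shrinks the candidate set C accordingly, mirroring the S ∪ C = F invariant above.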
DNP for Small Sample Size
We present the use of multiple dropouts to handle the high-variance gradients caused by the small sample size.
Multiple dropouts improve our DNP method in two algorithmic aspects: (1) in step 6, DNP randomly drops neurons multiple times, computes $G_{F_{c}}$ based on the remaining neurons and connections, and averages the multiple $G_{F_{c}}$. This yields averaged gradients with low variance.
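The averaging in step 6 can be sketched as follows (an illustrative sketch only: `grad_with_mask` is a hypothetical stand-in for one backward pass through a dropped-out subnetwork):

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_with_mask(W, mask):
    """Stand-in for one backprop pass with neurons dropped by `mask`;
    in DNP this would produce one sample of G_{F_c}."""
    return W * mask  # hypothetical gradient, not real backpropagation

def averaged_gradient(W, n_dropouts=10, keep_prob=0.5):
    """Average gradients over several random dropout masks to cut variance."""
    grads = []
    for _ in range(n_dropouts):
        mask = rng.random(W.shape) < keep_prob  # random dropout mask
        grads.append(grad_with_mask(W, mask))
    return np.mean(grads, axis=0)

W = np.ones((4, 3))
G = averaged_gradient(W, n_dropouts=100)
print(G.shape)  # (4, 3)
```

Averaging over independent masks reduces the variance of the gradient estimate, which is the point of the multiple-dropout trick.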
(2) multiple dropouts empower DNP with stable feature selection. Multiple dropouts combine the features selected over many random subnetworks, making the DNP method more stable and powerful.
Stagewise vs Stepwise
Updating input weights $W_{S}$ in step 5 of Algorithm 1 has two choices, i.e., the stagewise and stepwise approaches.
We combine both approaches. We dynamically adapt the learning rate of each weight according to Adagrad. As a result, like the stepwise approach, all selected weights $W_{S}$ enjoy updates, but, like the stagewise approach, the weights of newly selected features enjoy larger ones.
Time Complexity
The time complexity of DNP is dominated by backpropagation, which is O(hknd), where h is a constant decided by the network structure of DNP. The time complexity grows linearly with respect to the number of selected features k, the sample size n, and the feature dimension d.
Experiments
General experimental part
We compare the proposed DNP method with three representative feature selection algorithms: $l_{1}$-penalized logistic regression (LogR-$l_{1}$), gradient boosted feature selection (GBFS) [Xu et al., 2014], and HSIC-Lasso [Yamada et al., 2014].
Evaluation standards: the F1 score of correctly selecting the true features (can the method identify the features on which the labels truly depend?); the test AUC score (can the method learn an accurate classifier based on the selected features?).
Experiments on Synthetic Data
We first synthesize highly complex and nonlinear data to investigate the performance of different algorithms.
To generate the synthetic data, we first draw input samples X from the uniform distribution U(-1, 1) (the feature dimension d is fixed to 10,000). Afterwards, we obtain the corresponding labels by passing X into a feedforward neural network with {50, 30, 15, 10} ReLU hidden units in four hidden layers. The input weights connecting to the first m dimensions, i.e., $W_{F_{1\ldots m}}$, are randomly sampled from a Gaussian distribution N(0, 0.5), and the remaining connections are kept at zero (so the first m features are the true features that decide the label). In order to add noise to the data, we randomly flip 5% of the labels. For each setting of m, we generate 800 training samples, 200 validation samples, and 7,500 test samples (sample sizes ≪ d).
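The protocol above can be sketched as follows. This is a hedged sketch under stated assumptions: the notes do not specify the output layer or how the network output is turned into binary labels, so the output weights, the median threshold, and the random seed below are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize(n, d=10_000, m=2, hidden=(50, 30, 15, 10)):
    """Labels depend only on the first m of d features, via a random ReLU net."""
    X = rng.uniform(-1.0, 1.0, size=(n, d))
    h = X[:, :m]  # only the m true features carry signal into the network
    for width in hidden:
        W = rng.normal(0.0, 0.5, size=(h.shape[1], width))
        h = np.maximum(h @ W, 0.0)  # ReLU hidden layer
    score = h @ rng.normal(0.0, 0.5, size=(h.shape[1],))
    y = (score > np.median(score)).astype(int)  # assumed binarization
    flip = rng.random(n) < 0.05  # flip 5% of the labels as noise
    y[flip] = 1 - y[flip]
    return X, y

X, y = synthesize(n=200)
print(X.shape, y.shape)  # (200, 10000) (200,)
```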
When m = 2, we visualize the decision boundaries learned by different algorithms:
Figure 2: Decision boundaries learned by different algorithms on 10,000-dimensional synthetic data with two true features. The x-axis and y-axis denote the two true features.
Figures (a) and (b) plot the positive samples in black, and (c)-(f) plot the predicted positive samples in black. LogR-$l_1$ only learns a linear decision boundary, which is insufficient for highly complex and nonlinear data. GBFS uses a regression tree as the base learner, thereby achieving an axis-parallel decision boundary. HSIC-Lasso and the proposed DNP not only model nonlinear decision boundaries but also exactly identify the two true features.
Table 1: Performance of classification and feature selection on synthetic datasets with different numbers of true features. The statistically best performance is shown in bold.
In terms of the test AUC score, DNP and HSIC-Lasso both show superior performance over the others. DNP performs best on all the datasets and significantly outperforms HSIC-Lasso when m = 10 in terms of the t-test (p-value < 0.05).
In terms of the F1 score for feature selection, DNP performs the best on all datasets and even outperforms the most competitive baseline, HSIC-Lasso, by 8.65% on average.
GBFS consistently performs worst in terms of both classification and feature selection.
Experiments on RealWorld Biological Datasets
To investigate the performance of DNP on real-world datasets, we use six public biological datasets, all of which suffer from the HDLSS problem.
We report the average results over 10 random splits, with 80% of the data for training, 10% for validation, and 10% for testing.
In Fig. 3, we investigate the average test AUC scores with respect to the number of selected features.
We use a circle as an indicator when DNP is outperformed by the best baseline and a star when DNP outperforms the best baseline significantly (t-test, p-value < 0.05).
On all six datasets, the test AUC scores of DNP converge quickly, within fewer than 10 iterations.
On the leukemia dataset, the proposed DNP method significantly outperforms the best baseline no matter how many features are selected.
For the ALLAML and Prostate GE datasets, LogR-$l_1$ serves as a competitive baseline, as it outperforms the other methods when few features are selected. However, DNP achieves a comparable test AUC score when more features are involved.
For the other three datasets, DNP outperforms GBFS significantly and performs comparably to LogR-$l_1$ and HSIC-Lasso.
On average across the six real-world datasets, DNP outperforms the most competitive baseline, HSIC-Lasso, by 2.53% in terms of the average test AUC score.
In summary, DNP achieves comparable or better performance than the baselines on the six real-world datasets.
Unique experiments in the improved model
the effect of the size of the training data
we compare DNP with the baselines by varying the sample size for training, while the sample sizes for validation and test are kept fixed.
Fig. 4 shows the average test AUC scores across the six real-world datasets.
All the methods in comparison suffer as the training sample size decreases. GBFS, designed for a large sample size, suffers the most. LogR-$l_1$, HSIC-Lasso, and DNP perform similarly with small sample sizes. However, when only 10% or 30% of the training samples are used, DNP slightly outperforms the other baselines.
the role of multiple dropouts
we compare the performance of DNP with and without multiple dropouts.
We can see that multiple dropouts improve the test AUC score on five out of six datasets. We measure the stability of the algorithms with the Tanimoto distance [Kalousis et al., 2007]: we measure the stability of DNP by averaging the similarities calculated from all pairs of training sets generated by 10-fold cross validation. A higher stability score implies a more stable algorithm.
DNP with multiple dropouts is clearly more stable than DNP without dropout on all six datasets.
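For reference, the set-overlap measure behind this stability score can be sketched as follows (a minimal sketch with hypothetical feature indices; stability averages this similarity over all pairs of runs, and the Tanimoto distance is one minus it):

```python
def tanimoto_similarity(a, b):
    """Overlap between two selected-feature sets: |A n B| / |A u B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are identical
    return len(a & b) / len(a | b)

# selected features from two hypothetical training splits
run1 = {3, 17, 42, 99}
run2 = {3, 42, 99, 256}
print(tanimoto_similarity(run1, run2))  # 0.6
```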
how the hyperparameters influence the performance of DNP
For a DNP model with a specific number of hidden layers, we calculate the average test AUC score over 10 random splits.
Table 3: The best number of hidden layers for DNP
On five real-world datasets, DNP with three or four hidden layers outperforms that with one, two or five hidden layers. The results coincide with our motivation that deeper neural networks are better suited to complex datasets. Meanwhile, due to the small sample size, training DNNs with more hidden layers is extremely challenging, which incurs inferior performance.
Conclusions
We propose a DNP model tailored for the high dimension, low sample size data.
DNP can select features in a nonlinear way. By selecting features in an incremental manner, DNP is robust to high dimensionality.
By using the multiple dropouts technique, DNP can learn from a small number of samples and is stable for feature selection.
Moreover, the training of DNP is endtoend. Empirical results verify its good performance in both classification and feature selection.
In the future, we plan to use sophisticated network architectures in place of a simple multilayer perceptron and apply DNP to more domains that suffer from the HDLSS problem.
(A self-compiled presentation (PPT) summarizing this paper is available; contact me privately if you need it.)

6531507365520760highresolutiononedimension.rar
20210507 15:06:30 Research on the effect of target motion on high-resolution one-dimensional range profiles
Integer overflow for random DSCC matrices with a high dimension.
20201229 20:56:43 When trying to create a random sparse matrix of size 10^6 x 10^6, a bug occurs based on an integer overflow. Test to validate: public void hugeDimensionalMatrix() { ...
Algorithm: Principal Component Analysis for High Dimension Reduction Data
20190924 09:06:38 Instead, as mentioned in the lectures, you can implement PCA in a more efficient manner, which we call "PCA for high dimensional data" (PCA_high_dim). Below are the steps for performing PCA for ...
The data preprocessing: standardization or feature scaling:
https://en.wikipedia.org/wiki/Feature_scaling
Before we implement PCA, we will need to do some data preprocessing. In this assessment, some of these steps will be implemented by you; others we will take care of. However, when you are working on real-world problems, you will need to do all these steps by yourself!
The preprocessing steps we will do are
- Convert the unsigned integer 8 (uint8) encoding of pixels to a floating point number between 0 and 1.
- Subtract from each image the mean μ.
- Scale each dimension of each image by 1/σ, where σ is the standard deviation.
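The three steps can be sketched together as follows (a minimal sketch assuming an (N, D) uint8 array; the guard for zero-σ dimensions mirrors the note in the graded `normalize` function later in this post):

```python
import numpy as np

def preprocess(images_uint8):
    """Apply the three preprocessing steps to an (N, D) uint8 pixel array."""
    X = images_uint8.astype(np.float64) / 255.0  # 1. uint8 -> float in [0, 1]
    mu = X.mean(axis=0)
    X = X - mu                                   # 2. subtract the mean image
    std = X.std(axis=0)
    std[std == 0] = 1.0                          # avoid dividing by zero sigma
    X = X / std                                  # 3. scale each dimension by 1/sigma
    return X, mu, std

demo = np.random.default_rng(0).integers(0, 256, size=(5, 4), dtype=np.uint8)
Xp, mu, std = preprocess(demo)
print(Xp.shape)  # (5, 4)
```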
1. PCA
Now we will implement PCA. Before we do that, let's pause for a moment and think about the steps for performing PCA. Assume that we are performing PCA on some dataset X for M principal components. We then need to perform the following steps, which we break into parts:
1. Data normalization (normalize).
2. Find the eigenvalues and corresponding eigenvectors of the covariance matrix S, sorted by the largest eigenvalues and their corresponding eigenvectors (eig).
After these steps, we can then compute the projection and reconstruction of the data onto the space spanned by the top n eigenvectors.
Recall that the principal basis is the eigenvector associated with the largest eigenvalue of the covariance matrix.
Code:

# PACKAGE: DO NOT EDIT THIS CELL
import numpy as np
import timeit

# PACKAGE: DO NOT EDIT THIS CELL
import matplotlib as mpl
mpl.use('Agg')
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
from ipywidgets import interact
from load_data import load_mnist

MNIST = load_mnist()
images, labels = MNIST['data'], MNIST['target']

# GRADED FUNCTION: DO NOT EDIT THIS LINE
def normalize(X):
    """Normalize the given dataset X

    Args:
        X: ndarray, dataset

    Returns:
        (Xbar, mean, std): tuple of ndarray, Xbar is the normalized dataset
        with mean 0 and standard deviation 1; mean and std are the mean and
        standard deviation respectively.

    Note:
        You will encounter dimensions where the standard deviation is zero;
        for those, the normalized data would be NaN. Handle this by using
        `std = 1` for those dimensions when normalizing.
    """
    mu = np.mean(X, axis=0, keepdims=True)
    std = np.std(X, axis=0, keepdims=True)
    std_filled = std.copy()
    std_filled[std == 0] = 1.
    Xbar = (X - mu) / std_filled
    return Xbar, mu, std

def eig(S):
    """Compute the eigenvalues and corresponding eigenvectors of the covariance matrix S.

    Args:
        S: ndarray, covariance matrix

    Returns:
        (eigvals, eigvecs): ndarray, the eigenvalues and eigenvectors

    Note:
        The eigenvalues and eigenvectors should be sorted in descending
        order of the eigenvalues.
    """
    eVals, eVecs = np.linalg.eig(S)
    order = np.argsort(eVals)[::-1]  # sort the eigenvalues in descending order
    eVals = eVals[order]
    eVecs = eVecs[:, order]
    return (eVals, eVecs)

def projection_matrix(B):
    """Compute the projection matrix onto the space spanned by `B`

    Args:
        B: ndarray of dimension (D, M), the basis for the subspace

    Returns:
        P: the projection matrix
    """
    P = B @ np.linalg.inv(B.T @ B) @ B.T
    return P

def PCA(X, num_components):
    """
    Args:
        X: ndarray of size (N, D), where D is the dimension of the data,
           and N is the number of datapoints
        num_components: the number of principal components to use.

    Returns:
        X_reconstruct: ndarray of the reconstruction of X from the first
        `num_components` principal components.
    """
    # the solution takes advantage of the functions implemented above
    N, D = X.shape
    X_normalized, mu, std = normalize(X)
    # covariance matrix with mean 0
    S = (X_normalized.T @ X_normalized) / N
    code, onb = eig(S)
    code = code[:num_components]
    onb = onb[:, :num_components]
    # P with dimension (D, D)
    P = projection_matrix(onb)
    X_projection = P @ X.T
    return X_projection.T

## Some preprocessing of the data
NUM_DATAPOINTS = 1000
X = (images.reshape(-1, 28 * 28)[:NUM_DATAPOINTS]) / 255.
Xbar, mu, std = normalize(X)

for num_component in range(1, 20):
    from sklearn.decomposition import PCA as SKPCA
    # We can compute a standard solution given by scikit-learn's implementation of PCA
    pca = SKPCA(n_components=num_component, svd_solver='full')
    sklearn_reconst = pca.inverse_transform(pca.fit_transform(Xbar))
    reconst = PCA(Xbar, num_component)
    np.testing.assert_almost_equal(reconst, sklearn_reconst)
    print(np.square(reconst - sklearn_reconst).sum())
Result:
(8.5153870005e24+0j) (8.09790151532e24+0j) (9.61487939311e24+0j) (6.39164394758e24+0j) (1.19817697147e23+0j) (9.18939009489e24+0j) (2.46356799263e23+0j) (2.04450491509e23+0j) (2.35281327024e23+0j) (2.33297802189e22+0j) (9.45193136857e23+0j) (9.82734807213e23+0j) (1.596514124e22+0j) (7.20916435378e23+0j) (2.9098190907e23+0j) (3.7462168164e23+0j) (3.22053322424e23+0j) (2.71427239921e23+0j) (1.11240190546e22+0j)
Calculate the MSE for the data set

def mse(predict, actual):
    """Helper function for computing the mean squared error (MSE)"""
    return np.square(predict - actual).sum(axis=1).mean()

loss = []
reconstructions = []
# iterate over different numbers of principal components, and compute the MSE
for num_component in range(1, 100):
    reconst = PCA(Xbar, num_component)
    error = mse(reconst, Xbar)
    reconstructions.append(reconst)
    print('n = {:d}, reconstruction_error = {:f}'.format(num_component, error))
    loss.append((num_component, error))

reconstructions = np.asarray(reconstructions)
reconstructions = reconstructions * std + mu  # "unnormalize" the reconstructed image
loss = np.asarray(loss)

import pandas as pd
# create a table showing the number of principal components and the MSE
pd.DataFrame(loss).head()

fig, ax = plt.subplots()
ax.plot(loss[:, 0], loss[:, 1])
ax.axhline(100, linestyle='--', color='r', linewidth=2)
ax.xaxis.set_ticks(np.arange(1, 100, 5))
ax.set(xlabel='num_components', ylabel='MSE', title='MSE vs number of principal components')

@interact(image_idx=(0, 1000))
def show_num_components_reconst(image_idx):
    fig, ax = plt.subplots(figsize=(20., 20.))
    actual = X[image_idx]
    # concatenate the actual and reconstructed images into one large image before plotting
    x = np.concatenate([actual[np.newaxis, :], reconstructions[:, image_idx]])
    ax.imshow(np.hstack(x.reshape(-1, 28, 28)[np.arange(10)]), cmap='gray')
    ax.axvline(28, color='orange', linewidth=2)

@interact(i=(0, 10))
def show_pca_digits(i=1):
    """Show the i-th digit and its reconstruction"""
    plt.figure(figsize=(4, 4))
    actual_sample = X[i].reshape(28, 28)
    reconst_sample = (reconst[i, :] * std + mu).reshape(28, 28)
    plt.imshow(np.hstack([actual_sample, reconst_sample]), cmap='gray')
    plt.show()
2. PCA for highdimensional datasets
Sometimes, the dimensionality of our dataset may be larger than the number of samples we have. Then it might be inefficient to perform PCA with your implementation above. Instead, as mentioned in the lectures, you can implement PCA in a more efficient manner, which we call "PCA for high dimensional data" (PCA_high_dim).
Below are the steps for performing PCA for high dimensional datasets:
# GRADED FUNCTION: DO NOT EDIT THIS LINE
### PCA for high dimensional datasets

def PCA_high_dim(X, n_components):
    """Compute PCA for small sample size but high-dimensional features.

    Args:
        X: ndarray of size (N, D), where D is the dimension of the sample,
           and N is the number of samples
        n_components: the number of principal components to use.

    Returns:
        X_reconstruct: (N, D) ndarray. The reconstruction of X from the first
        `n_components` principal components.
    """
    N, D = X.shape
    S_prime = (X @ X.T) / N
    code_prime, onb_prime = eig(S_prime)
    code_prime = code_prime[:n_components]
    onb_prime = onb_prime[:, :n_components]
    # recover the principal subspace U in the original D-dimensional space
    U = X.T @ onb_prime  # (D, N) @ (N, n_components)
    P = projection_matrix(U)
    X_projection = P @ X.T
    return X_projection.T
Test CASE
np.testing.assert_almost_equal(PCA(Xbar, 2), PCA_high_dim(Xbar, 2))
Time Complexity Analysis:
Now let's compare the running time between PCA and PCA_high_dim.
Tips for running benchmarks or computationally expensive code:
When you have some computation that takes up a non-negligible amount of time, try separating the code that produces output from the code that analyzes the results (e.g. plotting the results, computing statistics of the results). This way, you don't have to recompute when you want to produce more analysis.
The next cell includes a function that records the time taken to execute a function f by repeating it repeat times. You do not need to modify the function, but you can use it to compare the running times of the functions you are interested in.

def time(f, repeat=10):
    times = []
    for _ in range(repeat):
        start = timeit.default_timer()
        f()
        stop = timeit.default_timer()
        times.append(stop - start)
    return np.mean(times), np.std(times)
times_mm0 = []
times_mm1 = []
# iterate over datasets of different size
for datasetsize in np.arange(4, 784, step=20):
    XX = Xbar[:datasetsize]  # select the first `datasetsize` samples in the dataset
    # record the running time for computing X.T @ X
    mu, sigma = time(lambda: XX.T @ XX)
    times_mm0.append((datasetsize, mu, sigma))
    # record the running time for computing X @ X.T
    mu, sigma = time(lambda: XX @ XX.T)
    times_mm1.append((datasetsize, mu, sigma))

times_mm0 = np.asarray(times_mm0)
times_mm1 = np.asarray(times_mm1)

fig, ax = plt.subplots()
ax.set(xlabel='size of dataset', ylabel='running time')
bar = ax.errorbar(times_mm0[:, 0], times_mm0[:, 1], times_mm0[:, 2],
                  label="$X^T X$ (PCA)", linewidth=2)
ax.errorbar(times_mm1[:, 0], times_mm1[:, 1], times_mm1[:, 2],
            label="$X X^T$ (PCA_high_dim)", linewidth=2)
ax.legend()
Benchmark for PCA and PCA_high_dim
times0 = []
times1 = []
# iterate over datasets of different size
for datasetsize in np.arange(4, 784, step=100):
    XX = Xbar[:datasetsize]
    npc = 2
    mu, sigma = time(lambda: PCA(XX, npc), repeat=10)
    times0.append((datasetsize, mu, sigma))
    mu, sigma = time(lambda: PCA_high_dim(XX, npc), repeat=10)
    times1.append((datasetsize, mu, sigma))

times0 = np.asarray(times0)
times1 = np.asarray(times1)

fig, ax = plt.subplots()
ax.set(xlabel='number of datapoints', ylabel='run time')
ax.errorbar(times0[:, 0], times0[:, 1], times0[:, 2], label="PCA", linewidth=2)
ax.errorbar(times1[:, 0], times1[:, 1], times1[:, 2], label="PCA_high_dim", linewidth=2)
ax.legend();

Highdimension experimental tomography of a pathencoded photon quantum state
20210222 06:08:14 Quantum information protocols often rely on tomographic techniques to determine the state of the system. A popular method of encoding information is on the different paths a photon may take, e.g., ...
Fatal Exception: java.lang.OutOfMemoryError when using high dimension images as icons
20201204 13:04:20 ... is invoked and the app crashes when high dimension images are used in production (in debug mode some of the images turn gray). To Reproduce: 1) Create a map 2) Add 500+ points with ...
Similarity Search in High Dimension via Hashing: LSH Original Algorithm Explained
20161220 13:30:22 Abstract: The nearest-neighbor or near-neighbor query problem arises in a large number of database applications, usually in the context of similarity search. Recently, there has been interest in building search index structures for performing similarity search over high-dimensional data, e.g., image databases, document collections, time-series databases, and genome databases. ... Reference: http://wenku.baidu.com/link?url=gWqvPnuc16G3zZSAXJLvTdnRw81RBFNqoj1gCGB9DZxLYMC10fGqqoYiDCsgjCrZakyd6yR9PmYQHuMKdWha0pVSE_Udperc0AQc3vuu
LSH原文：www.people.csail.mit.edu/indyk/

Analysis of the Principles of High Cardinality and Line Item Dimension
20130319 18:16:00 5. For a Line Item Dimension with high-cardinality values such as "Sales Number", also set it as a High Cardinality dimension. Reposted from: https://www.cnblogs.com/hanmos/archive/2013/03/19/2969710.html
DIMENSION NUMERICAL MODELING OF SRRS EFFECTS ON BEAM DISTRIBUTION IN NEAR FIELD AFTER ICF HIGH POWER...
20210206 17:32:59 STUDY ON FOUR-DIMENSION NUMERICAL MODELING OF SRRS EFFECTS ON BEAM DISTRIBUTION IN NEAR FIELD AFTER ICF HIGH POWER ULTRAVIOLET LASER PROPAGATING THROUGH A LONG AIR PATH
sep(a)CMAES does escape the solution basin on fsphere in high dimension with gradient injection
20201203 04:36:02 First, comparing sep(a)CMAES with and without gradient injection (GI), GI allows fast convergence. The difference in initial value is due to not showing iteration 0 (will be ticketed and ...
论文研究Control technology of pore dimension in the photoelectrochemical etching for high aspect ...
20190817 01:15:43 Pore dimension control technology in the photoelectrochemical etching of high-aspect-ratio macroporous silicon arrays, by Wang Guozheng and Wang Ji. High-aspect-ratio macroporous silicon arrays (MSA) have broad application prospects in fields such as photonic crystals, silicon microchannel plates, and MEMS devices, and have attracted wide attention. To fabricate an ideal MSA structure, this paper...
Lag in the Asteroids dimension
20201229 13:38:38 I think the amount of mob spawns in this dimension is a bit much. Since there's no time and it's always "night", mobs can constantly spawn. Eventually the amount of mobs at ...
convert dimension for original video
20201226 01:31:13 Our video has a high resolution (2048*1536), which decreases after entering the process of finding objects. After finding the location of the objects, we will convert the coordinates of ...
Mismatch of dimension in the CNNpolicy
20210107 04:35:31 ((processed_observations - ob_space.low) / (ob_space.high - ob_space.low)) The tensor "processed_observations" has shape (, 8) but the pandas variable ob_space.low has shape (8,). The ...
IIP crashes on images large in one dimension
20201202 15:19:17 If we ask for a thumbnail 20 pixels high, for example, it works fine on the same image because this only gives us an image 712px high. If we ask for one that is 30px, however, it tries to ...