
Cython linking LAPACK

Updated Jan 14, 2019
  

Currently, I'm working to remove the overhead between Python and Fortran LAPACK code by using the Cython interface directly. I will be wrapping:

  1. [S,D]LACPY: fast copy of X into A. Scipy uses Numpy's primitive asfortranarray copy, which is very slow; in fact, the bottleneck of large matrix operations is the copy itself! [I tested Scipy's SGESDD vs new: approx 5 to 30% faster, but on very small problems (n < 100, p < 10), 50% slower.]

  2. [S,D]GESDD, [S,D]GESVD. Have you ever hit MemoryErrors? Did you know Scipy copies your data 2 times? https://github.com/scipy/scipy/issues/9682. Currently fixing. Test: N = 1,000,000, P = 100: old Scipy uses 450MB and runs in 3.25s; mine uses a minuscule 30MB and runs in 2.05s.

  3. [S,D]GEQRF. For Randomized SVD, only the Q Orthogonal factor is needed. A separate function will be made.

  4. [S,D]GEQRT3: For tall skinny matrices, a recursive QR algo might be faster since X @ Q is needed.

  5. [S,D]SYEVR: MRRR Eigenvector algo using O(N^2) FLOPS.

  6. [S,D]SYEVD: Divide-and-Conquer O(N^3) Eigenvector algo

  7. [S,D]GETRF: LU Factorization, returning only the permuted L factor. Uses [S,D]LASWP as well. Wraps Scipy's LU into 1 nice fast function to get only the L factor used in Randomized SVD.

  8. [S,D]POTRF: Cholesky Factorization. Separate chained Epsilon Jitter Cholesky. Reduces overhead.

  9. [S,D]POTRS: Cholesky Solve for XTX, XTy. Since Fortran arrays are faster, will use XTy for a single y, but (y.T @ X).T for multiple ys (Fortran array output).

  10. [S,D]POTRI: Cholesky Inverse. Used in PINVC.

  11. Probably [C,Z]HEEVD / [C,Z]HEEVR / [C,Z]GESDD / [C,Z]GESVD for Dynamic Mode Decomposition.
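The copy overhead described in (1) and (2) can be sketched at the Python level with Scipy's existing f2py LAPACK wrappers. This is a rough illustration of the goal, not the planned Cython implementation: hand `dgesdd` a Fortran-ordered array with `overwrite_a=1` so LAPACK can work in place instead of the wrapper making further defensive copies.

```python
import numpy as np
from scipy.linalg.lapack import dgesdd

def svd_low_copy(X):
    # One explicit copy into Fortran (column-major) order, which is
    # what the Fortran LAPACK routines expect.
    A = np.asfortranarray(X)
    # overwrite_a=1 lets dgesdd destroy A in place, avoiding the
    # extra internal copy a defensive wrapper would make.
    u, s, vt, info = dgesdd(A, compute_uv=1, full_matrices=0, overwrite_a=1)
    if info != 0:
        raise np.linalg.LinAlgError("dgesdd failed with info = %d" % info)
    return u, s, vt
```

Reconstructing `(u * s) @ vt` recovers X, confirming the thin SVD is intact.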

Time Series Modelling

Updated Aug 28, 2018
  
  1. MA Moving Average Models
  2. AR AutoRegressive Models
  3. ARMA Models
  4. ARIMA Models
  5. VAR Vector AR models
  6. VARMA models
  7. Ridge penalised
  8. FFT based
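For the AR family above, a minimal AR(p) fit is just ordinary least squares on a lagged design matrix. A sketch (`fit_ar` is a hypothetical helper; the real implementation would presumably route through the fast Cython solvers):

```python
import numpy as np

def fit_ar(y, p):
    """Fit y[t] = c + a1*y[t-1] + ... + ap*y[t-p] by OLS."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Design matrix: intercept plus the p lagged columns y[t-1] ... y[t-p].
    X = np.column_stack([np.ones(n - p)] +
                        [y[p - j : n - j] for j in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef  # [intercept, a1, ..., ap]
```

MA, ARMA and ARIMA need iterative estimation on top of this, but the AR block is the shared core.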

Simultaneous Inference

Updated Aug 28, 2018
  
  1. Bonferroni Adjustment
  2. Holm-Bonferroni Adjustment
  3. Sidak Adjustment / Test
  4. Wald Tests using Chi2 instead of T tests
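The Holm-Bonferroni step-down procedure in (2) is simple enough to sketch directly (`holm_bonferroni` is an illustrative name, not an existing API): reject the ordered hypotheses while p_(i) <= alpha / (m - i), and stop at the first failure.

```python
import numpy as np

def holm_bonferroni(pvals, alpha=0.05):
    """Holm step-down adjustment: returns a boolean reject mask."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    reject = np.zeros(m, dtype=bool)
    # Walk the p-values in ascending order with shrinking denominators.
    for rank, idx in enumerate(np.argsort(p)):
        if p[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: once one test fails, all larger p-values fail
    return reject
```

Holm is uniformly more powerful than plain Bonferroni while controlling the same family-wise error rate, which is why both appear in the list.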

Statistical Inference on Models

Updated Aug 28, 2018
  

Confidence, Prediction Intervals, PRESS, AIC, BIC

  1. Linear Regression: CI, PI, AIC, BIC, PRESS, R2, Adj R2, Hypothesis Tests, P-Values, F-Statistic

  2. Ridge Regression: CI, PI, AIC, BIC, PRESS, R2, Adj R2, Hypothesis Tests, P-Values, F-Statistic. RidgeCV: AIC and LOOCV based.

  3. Logistic Regression: CI, PI, AIC, BIC, PRESS, R2, Adj R2, Hypothesis Tests, P-Values, F-Statistic. Hessian Matrix for CI, PI.

  4. Softmax Regression: CI, PI, AIC, BIC, PRESS, R2, Adj R2, Hypothesis Tests, P-Values, F-Statistic. Maybe use Uber's Pyro for Bayesian Credible Intervals etc.
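For the linear-regression diagnostics in (1), AIC, BIC and PRESS all fall out of a single OLS fit: PRESS needs only the hat-matrix leverages, not n leave-one-out refits. A minimal sketch (hypothetical helper, using the Gaussian-likelihood form of AIC/BIC):

```python
import numpy as np

def ols_diagnostics(X, y):
    """OLS fit plus AIC, BIC and PRESS via hat-matrix leverages."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid
    # Gaussian-likelihood criteria; k coefficients plus sigma^2 are estimated.
    aic = n * np.log(rss / n) + 2 * (k + 1)
    bic = n * np.log(rss / n) + np.log(n) * (k + 1)
    # Leverages h_ii = diag(X (X'X)^-1 X'), without forming the full hat matrix.
    h = np.einsum('ij,ji->i', X, np.linalg.solve(X.T @ X, X.T))
    press = np.sum((resid / (1.0 - h)) ** 2)
    return beta, aic, bic, press
```

Since each leverage satisfies 0 < h_ii < 1, PRESS is always at least the residual sum of squares, which is a handy sanity check.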

Discriminant Analysis

Updated Aug 28, 2018
  

LDA & QDA

  1. Linear Discriminant Analysis
  2. Quadratic Discriminant Analysis
  3. LDA Inference

Dimensionality Reduction

Updated Aug 28, 2018
  
  1. PCA using new SVD
  2. Fast PCA using new Truncated SVD
  3. Port UMAP to HyperLearn
  4. PyTorch autoencoder drop in replacement
  5. LDA based reduction
  6. PCA Biplots, inference.
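"PCA using new SVD" reduces to a thin SVD of the centered data matrix; a plain Numpy sketch of that pipeline (the planned version would presumably swap in the fast [S,D]GESDD wrapper from the LAPACK list):

```python
import numpy as np

def pca_svd(X, n_components):
    """PCA via thin SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]              # principal axes, one per row
    scores = U[:, :n_components] * S[:n_components]   # projected data
    explained_var = S[:n_components] ** 2 / (len(X) - 1)
    return scores, components, explained_var
```

Truncated PCA (item 2) keeps the same interface but computes only the leading singular triplets, e.g. via a randomized SVD, instead of the full decomposition.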

Least Squares Solving

Updated Aug 28, 2018
  
  1. PyTorch does not have lstsq, so use Numba instead.
  2. Batch Sequential lstsq?
  3. PyTorch Gradient Descent solving for regression
  4. PyTorch Gradient Descent solving for classification
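The sequential solves in (2) pair naturally with the [S,D]POTRF/POTRS route from the LAPACK list: form the normal equations once, then solve by Cholesky. A Scipy-level sketch with an optional epsilon jitter for near-singular X'X (`cholesky_lstsq` is a hypothetical helper):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def cholesky_lstsq(X, y, jitter=0.0):
    """Solve min ||X b - y||^2 via the normal equations X'X b = X'y
    using a Cholesky factorization (the POTRF/POTRS route)."""
    XtX = X.T @ X
    if jitter:
        # Epsilon jitter keeps the factorization stable when X'X is
        # near-singular, at the cost of a tiny ridge-like bias.
        XtX = XtX + jitter * np.eye(X.shape[1])
    c, low = cho_factor(XtX)
    return cho_solve((c, low), X.T @ y)
```

For batch-sequential updates, X'X and X'y can be accumulated chunk by chunk and re-solved cheaply, since only the k-by-k normal equations need refactoring.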