Currently, I'm working to remove the overhead between Python and Fortran LAPACK code by using the Cython interface directly. I will be wrapping:
- [S,D]LACPY: fast copy of X into A. SciPy relies on NumPy's primitive asfortranarray copy, which is very slow; in fact, the copy is the bottleneck of large matrix operations! (Tested SciPy's SGESDD vs the new version --> approx 5 to 30% faster, but on very small problems (n < 100, p < 10), 50% slower.)
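As a quick illustration (plain NumPy, not the planned Cython wrapper): asfortranarray on a C-ordered matrix always allocates a brand-new buffer, which is exactly the overhead a direct LACPY call is meant to avoid.

```python
import numpy as np

X = np.random.rand(1000, 100)       # NumPy arrays are C (row-major) ordered by default
A = np.asfortranarray(X)            # LAPACK wants Fortran (column-major) order -> full copy

assert X.flags['C_CONTIGUOUS']
assert A.flags['F_CONTIGUOUS']
assert not np.shares_memory(X, A)   # a second, duplicate buffer now exists
```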
- [S,D]GESDD, [S,D]GESVD: have you ever seen MemoryErrors? Did you know SciPy copies your data 2 times? https://github.com/scipy/scipy/issues/9682. Currently fixing. Test: N=1,000,000, P=100 --> old SciPy uses 450MB and runs in 3.25s; the new version uses a minuscule 30MB and runs in 2.05s.
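Until the Cython path lands, some of the extra copying can already be trimmed with plain SciPy keyword arguments; a sketch:

```python
import numpy as np
from scipy.linalg import svd

# Allocate in Fortran order up front so LAPACK does not need a layout copy,
# and let gesdd destroy the input instead of duplicating it.
X = np.asfortranarray(np.random.rand(2000, 50))
U, s, Vt = svd(X, full_matrices=False,    # thin SVD: U is (N, P), not (N, N)
               overwrite_a=True,          # no internal defensive copy
               check_finite=False,        # skip the O(NP) isfinite scan
               lapack_driver='gesdd')
```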
- [S,D]GEQRF: for Randomized SVD, only the orthogonal Q factor is needed. A separate function will be made.
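A sketch of the intended two-call sequence using SciPy's existing f2py wrappers (geqrf factors, orgqr materializes only the thin Q):

```python
import numpy as np
from scipy.linalg.lapack import dgeqrf, dorgqr

X = np.asfortranarray(np.random.rand(1000, 10))   # tall skinny matrix
qr_raw, tau, work, info = dgeqrf(X)               # Householder factorization
Q, work, info = dorgqr(qr_raw, tau)               # build only the (1000, 10) thin Q
assert np.allclose(Q.T @ Q, np.eye(10))
```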
- [S,D]GEQRT3: for tall skinny matrices, a recursive QR algorithm might be faster since X @ Q is needed.
- [S,D]SYEVR: MRRR eigenvector algorithm using O(N^2) FLOPS.
- [S,D]SYEVD: divide and conquer O(N^3) eigenvector algorithm.
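Both drivers are already reachable through scipy.linalg.eigh via the driver keyword (SciPy 1.5+); a sketch comparing them:

```python
import numpy as np
from scipy.linalg import eigh

A = np.random.rand(200, 50)
S = A.T @ A                        # symmetric positive semi-definite, 50 x 50

w_r, V_r = eigh(S, driver='evr')   # MRRR (syevr)
w_d, V_d = eigh(S, driver='evd')   # divide and conquer (syevd)
assert np.allclose(w_r, w_d)       # same spectrum, different algorithms
```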
- [S,D]GETRF: LU factorization, but only the permuted L factor. Uses [S,D]LASWP as well. Wraps SciPy's LU into 1 nice fast function to get only the L factor used in Randomized SVD.
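SciPy's high-level lu already exposes the permuted-L shortcut; the planned wrapper would produce the same factor in one LAPACK pass. A sketch:

```python
import numpy as np
from scipy.linalg import lu

X = np.random.rand(100, 20)
PL, U = lu(X, permute_l=True)      # PL = P @ L, the factor Randomized SVD needs
assert np.allclose(PL @ U, X)
```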
- [S,D]POTRF: Cholesky factorization. Separate chained epsilon-jitter Cholesky. Reduces overhead.
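A guess at what the chained epsilon-jitter Cholesky could look like (hypothetical helper; the name and retry schedule are assumptions):

```python
import numpy as np
from scipy.linalg import cholesky, LinAlgError

def jitter_cholesky(S, eps=1e-8, max_tries=6):
    """Retry Cholesky with a growing diagonal jitter until S + jitter*I is
    positive definite. Hypothetical sketch of the 'chained epsilon jitter' idea."""
    I = np.eye(S.shape[0])
    for k in range(max_tries):
        try:
            return cholesky(S + eps * (10.0 ** k) * I, lower=False)
        except LinAlgError:
            continue
    raise LinAlgError("matrix not positive definite even after jitter")
```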
- [S,D]POTRS: Cholesky solve for XTX, XTy. Since Fortran arrays are faster, will use XTy for 1 array, but (y.T @ X).T for multiple ys (Fortran array output).
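The normal-equations solve maps onto SciPy's cho_factor/cho_solve pair (potrf + potrs under the hood); a sketch:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))
y = rng.standard_normal(500)

c, low = cho_factor(X.T @ X)             # potrf on the Gram matrix
beta = cho_solve((c, low), X.T @ y)      # potrs: solve (X'X) beta = X'y
assert np.allclose(beta, np.linalg.lstsq(X, y, rcond=None)[0])
```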
- [S,D]POTRI: Cholesky inverse. Used in PINVC.
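potri is reachable through SciPy's raw LAPACK bindings; note that it fills only one triangle, so the result has to be symmetrized. A sketch:

```python
import numpy as np
from scipy.linalg.lapack import dpotrf, dpotri

S = np.eye(4) + 0.1 * np.ones((4, 4))      # small SPD matrix
c, info = dpotrf(S, lower=0)               # Cholesky factor in the upper triangle
inv, info = dpotri(c, lower=0)             # inverse from the Cholesky factor
inv = np.triu(inv) + np.triu(inv, 1).T     # dpotri writes one triangle only
assert np.allclose(inv @ S, np.eye(4))
```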
- Probably [C,Z]HEEVD / [C,Z]HEEVR / [C,Z]GESDD / [C,Z]GESVD for Dynamic Mode Decomposition.
- MA Moving Average Models
- AR AutoRegressive Models
- ARMA Models
- ARIMA Models
- VAR Vector AR models
- VARMA models
- Ridge penalised
- FFT based
- Bonferroni Adjustment
- Holm-Bonferroni Adjustment
- Sidak Adjustment / Test
- Wald Tests using Chi2 instead of T tests
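Hypothetical helpers sketching three of the p-value adjustments above (the eventual API may differ):

```python
import numpy as np

def bonferroni(p):
    """Bonferroni: multiply each p-value by the number of tests, cap at 1."""
    p = np.asarray(p, dtype=float)
    return np.minimum(p * p.size, 1.0)

def sidak(p):
    """Sidak: 1 - (1 - p)^m, exact under independence."""
    p = np.asarray(p, dtype=float)
    return 1.0 - (1.0 - p) ** p.size

def holm(p):
    """Holm-Bonferroni step-down: sort ascending, scale by (m - rank),
    enforce monotonicity, then restore the original order."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    adj = np.maximum.accumulate((m - np.arange(m)) * p[order])
    out = np.empty(m)
    out[order] = np.minimum(adj, 1.0)
    return out
```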
Confidence, Prediction Intervals, PRESS, AIC, BIC
- Linear Regression: CI, PI, AIC, BIC, PRESS, R2, Adj R2, Hypothesis Tests, P-Values, F-Statistic
- Ridge Regression: CI, PI, AIC, BIC, PRESS, R2, Adj R2, Hypothesis Tests, P-Values, F-Statistic. RidgeCV --> AIC, LOOCV based.
- Logistic Regression: CI, PI, AIC, BIC, PRESS, R2, Adj R2, Hypothesis Tests, P-Values, F-Statistic. Hessian matrix used for CI, PI.
- Softmax Regression: CI, PI, AIC, BIC, PRESS, R2, Adj R2, Hypothesis Tests, P-Values, F-Statistic. Maybe use Uber's Pyro for Bayesian credible intervals etc.
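PRESS, listed for every model above, never needs n refits for linear models: the leave-one-out residuals follow directly from the hat-matrix diagonal. A sketch with a hypothetical helper:

```python
import numpy as np

def press_statistic(X, y):
    """PRESS = sum of squared leave-one-out residuals, computed without
    refitting: e_(i) = e_i / (1 - h_ii), where h_ii is the hat-matrix diagonal."""
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix H = X (X'X)^-1 X'
    e = y - H @ y                           # ordinary residuals
    return float(np.sum((e / (1.0 - np.diag(H))) ** 2))
```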
LDA & QDA
- Linear Discriminant Analysis
- Quadratic Discriminant Analysis
- LDA Inference
- PCA using new SVD
- Fast PCA using new Truncated SVD
- Port UMAP to HyperLearn
- PyTorch autoencoder drop in replacement
- LDA based reduction
- PCA Biplots, inference.
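The truncated SVD behind fast PCA can be sketched with a standard randomized range finder (hypothetical helper, not HyperLearn's final API):

```python
import numpy as np

def randomized_svd(X, k, oversample=10, seed=0):
    """Rank-k SVD via random projection: sample the range of X, orthonormalize
    with QR, then take an exact SVD of the small projected matrix."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((X.shape[1], k + oversample))
    Q, _ = np.linalg.qr(X @ G)                         # basis for range(X)
    Uh, s, Vt = np.linalg.svd(Q.T @ X, full_matrices=False)
    return (Q @ Uh)[:, :k], s[:k], Vt[:k]
```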
- PyTorch does not have lstsq, so use Numba instead.
- Batch Sequential lstsq?
- PyTorch Gradient Descent solving for regression
- PyTorch Gradient Descent solving for classification
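A NumPy stand-in for the gradient-descent regression solver idea (full-batch least squares; the PyTorch version would swap in tensors and autograd):

```python
import numpy as np

def gd_least_squares(X, y, lr=0.1, iters=5000):
    """Full-batch gradient descent on the mean squared error 0.5/n * ||Xw - y||^2."""
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(iters):
        w -= lr * (X.T @ (X @ w - y)) / n   # gradient step
    return w
```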