Currently, I'm working to remove the overhead between Python and Fortran LAPACK code by using the Cython interface directly. I will be wrapping:
- [S,D]LACPY: fast copy of X into A. SciPy relies on NumPy's primitive asfortranarray copy, which is very slow; in fact, the copy is the bottleneck of large matrix operations! (Tested SciPy's SGESDD vs the new version --> approx 5 to 30% faster, but on very small problems (n < 100, p < 10), 50% slower.)
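As a quick illustration (plain NumPy, not the planned Cython wrapper): asfortranarray on a C-ordered matrix always allocates a brand-new buffer, which is exactly the overhead a direct LACPY call is meant to avoid.

```python
import numpy as np

X = np.random.rand(1000, 100)       # NumPy arrays are C (row-major) ordered by default
A = np.asfortranarray(X)            # LAPACK wants Fortran (column-major) order -> full copy

assert X.flags['C_CONTIGUOUS']
assert A.flags['F_CONTIGUOUS']
assert not np.shares_memory(X, A)   # a second, duplicate buffer now exists
```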
- [S,D]GESDD, [S,D]GESVD: have you ever seen MemoryErrors? Did you know SciPy copies your data 2 times? https://github.com/scipy/scipy/issues/9682. Currently fixing. Test: N=1,000,000, P=100 --> old SciPy uses 450MB and runs in 3.25s; the new version uses a minuscule 30MB and runs in 2.05s.
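Until the Cython path lands, some of the extra copying can already be trimmed with plain SciPy keyword arguments; a sketch:

```python
import numpy as np
from scipy.linalg import svd

# Allocate in Fortran order up front so LAPACK does not need a layout copy,
# and let gesdd destroy the input instead of duplicating it.
X = np.asfortranarray(np.random.rand(2000, 50))
U, s, Vt = svd(X, full_matrices=False,    # thin SVD: U is (N, P), not (N, N)
               overwrite_a=True,          # no internal defensive copy
               check_finite=False,        # skip the O(NP) isfinite scan
               lapack_driver='gesdd')
```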
- [S,D]GEQRF: for Randomized SVD, only the orthogonal Q factor is needed. A separate function will be made.
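A sketch of the intended two-call sequence using SciPy's existing f2py wrappers (geqrf factors, orgqr materializes only the thin Q):

```python
import numpy as np
from scipy.linalg.lapack import dgeqrf, dorgqr

X = np.asfortranarray(np.random.rand(1000, 10))   # tall skinny matrix
qr_raw, tau, work, info = dgeqrf(X)               # Householder factorization
Q, work, info = dorgqr(qr_raw, tau)               # build only the (1000, 10) thin Q
assert np.allclose(Q.T @ Q, np.eye(10))
```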
- [S,D]GEQRT3: for tall skinny matrices, a recursive QR algorithm might be faster since X @ Q is needed.
- [S,D]SYEVR: MRRR eigenvector algorithm using O(N^2) FLOPS.
- [S,D]SYEVD: divide and conquer O(N^3) eigenvector algorithm.
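Both drivers are already reachable through scipy.linalg.eigh via the driver keyword (SciPy 1.5+); a sketch comparing them:

```python
import numpy as np
from scipy.linalg import eigh

A = np.random.rand(200, 50)
S = A.T @ A                        # symmetric positive semi-definite, 50 x 50

w_r, V_r = eigh(S, driver='evr')   # MRRR (syevr)
w_d, V_d = eigh(S, driver='evd')   # divide and conquer (syevd)
assert np.allclose(w_r, w_d)       # same spectrum, different algorithms
```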
- [S,D]GETRF: LU factorization, but only the permuted L factor. Uses [S,D]LASWP as well. Wraps SciPy's LU into 1 nice fast function to get only the L factor used in Randomized SVD.
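SciPy's high-level lu already exposes the permuted-L shortcut; the planned wrapper would produce the same factor in one LAPACK pass. A sketch:

```python
import numpy as np
from scipy.linalg import lu

X = np.random.rand(100, 20)
PL, U = lu(X, permute_l=True)      # PL = P @ L, the factor Randomized SVD needs
assert np.allclose(PL @ U, X)
```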
- [S,D]POTRF: Cholesky factorization. Separate chained epsilon-jitter Cholesky. Reduces overhead.
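A guess at what the chained epsilon-jitter Cholesky could look like (hypothetical helper; the name and retry schedule are assumptions):

```python
import numpy as np
from scipy.linalg import cholesky, LinAlgError

def jitter_cholesky(S, eps=1e-8, max_tries=6):
    """Retry Cholesky with a growing diagonal jitter until S + jitter*I is
    positive definite. Hypothetical sketch of the 'chained epsilon jitter' idea."""
    I = np.eye(S.shape[0])
    for k in range(max_tries):
        try:
            return cholesky(S + eps * (10.0 ** k) * I, lower=False)
        except LinAlgError:
            continue
    raise LinAlgError("matrix not positive definite even after jitter")
```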
- [S,D]POTRS: Cholesky solve for XTX, XTy. Since Fortran arrays are faster, will use XTy for 1 array, but (y.T @ X).T for multiple ys (Fortran array output).
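The normal-equations solve maps onto SciPy's cho_factor/cho_solve pair (potrf + potrs under the hood); a sketch:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))
y = rng.standard_normal(500)

c, low = cho_factor(X.T @ X)             # potrf on the Gram matrix
beta = cho_solve((c, low), X.T @ y)      # potrs: solve (X'X) beta = X'y
assert np.allclose(beta, np.linalg.lstsq(X, y, rcond=None)[0])
```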
- [S,D]POTRI: Cholesky inverse. Used in PINVC.
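potri is reachable through SciPy's raw LAPACK bindings; note that it fills only one triangle, so the result has to be symmetrized. A sketch:

```python
import numpy as np
from scipy.linalg.lapack import dpotrf, dpotri

S = np.eye(4) + 0.1 * np.ones((4, 4))      # small SPD matrix
c, info = dpotrf(S, lower=0)               # Cholesky factor in the upper triangle
inv, info = dpotri(c, lower=0)             # inverse from the Cholesky factor
inv = np.triu(inv) + np.triu(inv, 1).T     # dpotri writes one triangle only
assert np.allclose(inv @ S, np.eye(4))
```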
- Probably [C,Z]HEEVD / [C,Z]HEEVR / [C,Z]GESDD / [C,Z]GESVD for Dynamic Mode Decomposition.
- MA Moving Average Models
- AR AutoRegressive Models
- ARMA Models
- ARIMA Models
- VAR Vector AR models
- VARMA models
- Ridge penalised
- FFT based
- Bonferroni Adjustment
- Holm-Bonferroni Adjustment
- Sidak Adjustment / Test
- Wald Tests using Chi2 instead of T tests
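Hypothetical helpers sketching three of the p-value adjustments above (the eventual API may differ):

```python
import numpy as np

def bonferroni(p):
    """Bonferroni: multiply each p-value by the number of tests, cap at 1."""
    p = np.asarray(p, dtype=float)
    return np.minimum(p * p.size, 1.0)

def sidak(p):
    """Sidak: 1 - (1 - p)^m, exact under independence."""
    p = np.asarray(p, dtype=float)
    return 1.0 - (1.0 - p) ** p.size

def holm(p):
    """Holm-Bonferroni step-down: sort ascending, scale by (m - rank),
    enforce monotonicity, then restore the original order."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    adj = np.maximum.accumulate((m - np.arange(m)) * p[order])
    out = np.empty(m)
    out[order] = np.minimum(adj, 1.0)
    return out
```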
Confidence, Prediction Intervals, PRESS, AIC, BIC
- Linear Regression: CI, PI, AIC, BIC, PRESS, R2, Adj R2, Hypothesis Tests, P-Values, F-Statistic
- Ridge Regression: CI, PI, AIC, BIC, PRESS, R2, Adj R2, Hypothesis Tests, P-Values, F-Statistic. RidgeCV --> AIC, LOOCV based.
- Logistic Regression: CI, PI, AIC, BIC, PRESS, R2, Adj R2, Hypothesis Tests, P-Values, F-Statistic. Hessian matrix used for CI, PI.
- Softmax Regression: CI, PI, AIC, BIC, PRESS, R2, Adj R2, Hypothesis Tests, P-Values, F-Statistic. Maybe use Uber's Pyro for Bayesian credible intervals etc.
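PRESS, listed for every model above, never needs n refits for linear models: the leave-one-out residuals follow directly from the hat-matrix diagonal. A sketch with a hypothetical helper:

```python
import numpy as np

def press_statistic(X, y):
    """PRESS = sum of squared leave-one-out residuals, computed without
    refitting: e_(i) = e_i / (1 - h_ii), where h_ii is the hat-matrix diagonal."""
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix H = X (X'X)^-1 X'
    e = y - H @ y                           # ordinary residuals
    return float(np.sum((e / (1.0 - np.diag(H))) ** 2))
```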
LDA & QDA
- Linear Discriminant Analysis
- Quadratic Discriminant Analysis
- LDA Inference
- PCA using new SVD
- Fast PCA using new Truncated SVD
- Port UMAP to HyperLearn
- PyTorch autoencoder drop in replacement
- LDA based reduction
- PCA Biplots, inference.
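The truncated SVD behind fast PCA can be sketched with a standard randomized range finder (hypothetical helper, not HyperLearn's final API):

```python
import numpy as np

def randomized_svd(X, k, oversample=10, seed=0):
    """Rank-k SVD via random projection: sample the range of X, orthonormalize
    with QR, then take an exact SVD of the small projected matrix."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((X.shape[1], k + oversample))
    Q, _ = np.linalg.qr(X @ G)                         # basis for range(X)
    Uh, s, Vt = np.linalg.svd(Q.T @ X, full_matrices=False)
    return (Q @ Uh)[:, :k], s[:k], Vt[:k]
```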
- PyTorch does not have lstsq, so use Numba instead.
- Batch Sequential lstsq?
- PyTorch Gradient Descent solving for regression
- PyTorch Gradient Descent solving for classification
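A NumPy stand-in for the gradient-descent regression solver idea (full-batch least squares; the PyTorch version would swap in tensors and autograd):

```python
import numpy as np

def gd_least_squares(X, y, lr=0.1, iters=5000):
    """Full-batch gradient descent on the mean squared error 0.5/n * ||Xw - y||^2."""
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(iters):
        w -= lr * (X.T @ (X @ w - y)) / n   # gradient step
    return w
```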