Not AI
I like big .vimrc and I cannot lie
- Sofia, Bulgaria
- 15:00 (UTC +03:00)
- https://ggerganov.com
- @ggerganov
- user/ggerganov
1,098 contributions in the last year
Contribution activity
March 2023
Created 128 commits in 3 repositories
Created 1 repository
- ggerganov/llama.cpp (C)
Created a pull request in ggerganov/llama.cpp that received 12 comments
- Reduce memory usage and allocate enough memory for largest context
  Utilize ggml scratch buffers to reduce memory usage (see ggerganov/whisper.cpp#431 for more info). Disable BLAS for matrix multiplications where src0…
  +307 −80 • 12 comments
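The scratch-buffer idea in the PR description above amounts to carving intermediate results out of one preallocated region that is reset between graph evaluations, rather than allocating per operation. Below is a minimal illustrative sketch of that pattern; the names `scratch_t`, `scratch_alloc`, and `scratch_reset` are hypothetical and do not reflect the actual ggml API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hedged sketch of a scratch-buffer (bump-allocator) pattern, similar in
 * spirit to ggml's scratch buffers: intermediate tensors draw from one
 * preallocated region instead of a fresh malloc per operation. */
typedef struct {
    uint8_t *data; /* preallocated backing memory    */
    size_t   size; /* total capacity in bytes        */
    size_t   offs; /* current bump-allocation offset */
} scratch_t;

/* Bump-allocate n bytes from the region; returns NULL on overflow. */
static void *scratch_alloc(scratch_t *s, size_t n) {
    /* round the offset up to 16 bytes, as SIMD kernels typically require */
    size_t offs = (s->offs + 15) & ~(size_t)15;
    if (offs + n > s->size) {
        return NULL;
    }
    s->offs = offs + n;
    return s->data + offs;
}

/* Reset the region so the next evaluation reuses the same memory. */
static void scratch_reset(scratch_t *s) {
    s->offs = 0;
}
```

Because allocation is a pointer bump and "free" is a single reset, the peak memory of a whole forward pass is bounded by the scratch size chosen up front, which is what makes it possible to pre-allocate enough memory for the largest context.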
Opened 11 other pull requests in 2 repositories
ggerganov/llama.cpp (8 merged, 1 closed)
- Attempt to SIMD-ify dequantize_row_q4_0() for ARM_NEON
- Retire the ggml_mul_mat() branch for transposed src0
- Immediately start processing the prompt before user input has been provided
- Avoid the "non-contiguous X" branch in the Z = X * Y matrix multiplication
- IMPORTANT: Introduce C-style API - Major Refactoring
- Add tokenizer test + revert to C++11
- Use vdotq_s32 to improve performance
- Add Github CI
ggerganov/whisper.cpp (1 merged, 1 open)
Reviewed 72 pull requests in 2 repositories
ggerganov/llama.cpp (25 pull requests)
- Be more strict about converting float to double
- Refactor quantized processing functions
- (Windows) Set console to UTF-8 on init
- CMake / CI additions
- Add AVX2 implementation of dequantize_row_q4_1
- Add timings for the prompt evaluation
- Add support for file load progress reporting callbacks
- additional optimizations for POWER9
- Support calling mlock() on loaded model data on Linux and macOS
- Fix quantize script not finding models in parent directory
- Replace EOS with newline to prevent context/memory being flushed by EOS in interactive mode
- Generate library with CMake
- Proof of concept TCP server mode
- Deduplicate q4 quantization functions
- fix: add POSIX functionality for Linux compilation
- cmake: make llama an actual library
- Add a Package.swift for SwiftPM support
- Add embedding mode with arg flag. Currently working
- fix perplexity after c-api refactor
- IMPORTANT: Introduce C-style API - Major Refactoring
- Fix color codes emitting mid-UTF8 code.
- Importer for GPTQ quantized LLaMA models
- Added script to invoke alpaca model
- Add chatLLaMa script
- Compute perplexity over prompt
Some pull request reviews not shown.
ggerganov/whisper.cpp (3 pull requests)
Created an issue in ggerganov/llama.cpp that received 41 comments
Create a logo
We should probably make a logo for this project. Like an image of a…
41 comments
Opened 14 other issues in 3 repositories
ggerganov/llama.cpp (9 open, 2 closed)
- Help populating the examples README.md files
- Create "instruct" example
- Move the Flake scripts to a separate repository
- 2-bit integer quantization
- Eliminate ggml_forward_mul_mat_xxx() branch for non-contiguous src0
- Investigate alternative approach for Q4 quantization
- Add proper instructions for using Alpaca models
- Update the convert-gptq-to-ggml.py with the new tokenizer output
- Add instructions for using Alpaca
- Study how LM Evaluation Harness works and try to implement it
- Store KV cache of computed prompts to disk to avoid re-compute in follow-up runs
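One of the issues listed above concerns 2-bit integer quantization. The storage half of that idea is straightforward to illustrate: four 2-bit codes fit in each byte. The sketch below shows only that bit layout; the helper names are hypothetical and this is not the quantization format ggml actually uses, which also carries per-block scale factors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hedged sketch of 2-bit packing: four codes (each in 0..3) per byte,
 * least-significant pair first. Illustrative only. */

/* Pack n codes (each 0..3) into ceil(n/4) output bytes. */
static void pack_q2(const uint8_t *codes, size_t n, uint8_t *out) {
    for (size_t i = 0; i < n; i++) {
        if (i % 4 == 0) {
            out[i / 4] = 0; /* start a fresh byte */
        }
        out[i / 4] |= (uint8_t)((codes[i] & 0x3) << (2 * (i % 4)));
    }
}

/* Extract code i back out of the packed stream. */
static uint8_t unpack_q2(const uint8_t *packed, size_t i) {
    return (packed[i / 4] >> (2 * (i % 4))) & 0x3;
}
```

At 2 bits per weight this is a 16x reduction versus 32-bit floats before accounting for scale metadata, which is why such low-bit formats are attractive for edge inference.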
ggerganov/whisper.cpp (2 open)
antimatter15/alpaca.cpp (1 closed)
Started 2 discussions in 1 repository
ggerganov/llama.cpp
- Roadmap (short-term) (Mar 24)
- Inference at the edge (Mar 16)