Not AI
I like big .vimrc and I cannot lie
- Sofia, Bulgaria
- 05:04 (UTC +03:00)
- https://ggerganov.com
- @ggerganov
- user/ggerganov
1,123 contributions in the last year
Contribution activity
March 2023
Created 153 commits in 3 repositories
Created 1 repository
- ggerganov/llama.cpp (C)
Created a pull request in ggerganov/llama.cpp that received 12 comments
Reduce memory usage and allocate enough memory for largest context
Utilize ggml scratch buffers to reduce memory usage (see ggerganov/whisper.cpp#431 for more info)
Disable BLAS for matrix multiplications where src0…
+307 −80 • 12 comments
Opened 12 other pull requests in 2 repositories
ggerganov/llama.cpp: 8 merged, 1 closed
- Attempt to SIMD-ify dequantize_row_q4_0() for ARM_NEON
- Retire the ggml_mul_mat() branch for transposed src0
- Immediately start processing the prompt before user input has been provided
- Avoid the "non-contiguous X" branch in the Z = X * Y matrix multiplication
- IMPORTANT: Introduce C-style API - Major Refactoring
- Add tokenizer test + revert to C++11
- Use vdotq_s32 to improve performance
- Add Github CI
ggerganov/whisper.cpp: 2 merged, 1 open
Reviewed 85 pull requests in 3 repositories
ggerganov/llama.cpp: 25 pull requests
- CI: Re-enable AVX512 testing (Windows-MSVC)
- Create chat-13B.bat
- Use the same batch size threshold for enabling OpenBLAS and disabling ggml threading
- Be more strict about converting float to double
- CI: fix subdirectory path globbing
- Revert 7e53955 (#542) and fix properly
- Add AVX2 implementation of quantize_row_q4_1
- Add script to convert old ggml files to newer version
- Converting GGML back to Torch checkpoint for HuggingFace/Pytorch consumption/training/finetuning
- Refactored code for reduced memory usage and improved readability
- Refactor quantized processing functions
- (Windows) Set console to UTF-8 on init
- CMake / CI additions
- Add AVX2 implementation of dequantize_row_q4_1
- Add timings for the prompt evaluation
- Add support for file load progress reporting callbacks
- additional optimizations for POWER9
- Support calling mlock() on loaded model data on Linux and macOS
- Fix quantize script not finding models in parent directory
- Replace EOS with newline to prevent context/memory being flushed by EOS in interactive mode
- Generate library with CMake
- Proof of concept TCP server mode
- Deduplicate q4 quantization functions
- fix: add POSIX functionality for Linux compilation
- cmake: make llama an actual library
- Some pull request reviews not shown.
ggerganov/whisper.cpp: 6 pull requests
ggerganov/ggml: 1 pull request
Created an issue in ggerganov/llama.cpp that received 42 comments
Create a logo
We should probably make a logo for this project. Like an image of a …
42 comments
Opened 17 other issues in 3 repositories
ggerganov/llama.cpp: 3 closed, 10 open
- Update the convert-unversioned-ggml-to-ggml.py script to support GPT4All ggml models
- Fix failing CI test using thread sanitizer
- Help populating the examples README.md files
- Create "instruct" example
- Move the Flake scripts to a separate repository
- 2-bit integer quantization
- Eliminate ggml_forward_mul_mat_xxx() branch for non-contiguous src0
- Investigate alternative approach for Q4 quantization
- Add proper instructions for using Alpaca models
- Update the convert-gptq-to-ggml.py with the new tokenizer output
- Add instructions for using Alpaca
- Study how LM Evaluation Harness works and try to implement it
- Store KV cache of computed prompts to disk to avoid re-compute in follow-up runs
ggerganov/whisper.cpp: 3 open
antimatter15/alpaca.cpp: 1 closed
Started 2 discussions in 1 repository
ggerganov/llama.cpp
- Roadmap (short-term) (made on Mar 24)
- Inference at the edge (made on Mar 16)