Hash improvements #15941
base: master
Conversation
The benchmarks were reusing the same input buffer over and over again, giving the optimizer a chance to compute the hash only once and multiply the output by the number of iterations. After this fix, the results are *very* different (not in a good way, but at least they are closer to reality). Also add xxhash and siphash. xxhash's hash() function had a different API from all the other std.hash functions, forcing the key to be 0. Make it consistent with the other functions, especially since this is not a cryptographic hash function. Along the way, remove some useless `pub` qualifiers and declarations from siphash.
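To illustrate the fix described above, here is a hedged sketch of a benchmark loop that defeats this kind of hoisting (names and structure are illustrative, not the actual benchmark code; it assumes `std.rand.DefaultPrng` and `std.mem.doNotOptimizeAway` as found in the standard library around this time):

```zig
const std = @import("std");

pub fn main() !void {
    var buf: [8192]u8 = undefined;
    var prng = std.rand.DefaultPrng.init(0);
    prng.random().bytes(&buf);

    var sum: u64 = 0;
    var i: usize = 0;
    while (i < 1000) : (i += 1) {
        // Mutate one byte per iteration so every hash input is distinct
        // and the call cannot be hoisted out of the loop.
        buf[i % buf.len] +%= 1;
        const h = std.hash.Wyhash.hash(0, &buf);
        // Keep each result observable so it is not optimized away.
        std.mem.doNotOptimizeAway(h);
        sum +%= h;
    }
    std.debug.print("checksum: {x}\n", .{sum});
}
```

Either the per-iteration mutation or the `doNotOptimizeAway` call alone is usually enough; using both guards against the optimizer collapsing the loop into a single hash computation.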
const std = @import("std");

/// Abstracts buffer management for unaligned input blocks for hash and crypto functions.
pub fn StreamingCache(comptime round_length: usize) type {
Should rename. This isn't really a cache, more just temporary storage for streaming.
StreamingBuffer? AFAIU, you are using a temporary buffer for streaming.
The explanation doesn't give any hint about what round_length represents or what update does with it.
To me this looks like a ring buffer being filled on every update.
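For context, the pattern under discussion looks roughly like this (a simplified sketch under the naming suggested here, not the actual implementation): a fixed buffer accumulates input until a full round_length block is available, at which point it is flushed to the round function:

```zig
const std = @import("std");

/// Simplified sketch of the partial-block buffering pattern.
/// `round_length` is the block size the underlying round function consumes.
pub fn StreamingBuffer(comptime round_length: usize) type {
    return struct {
        buf: [round_length]u8 = undefined,
        len: usize = 0, // bytes currently buffered; always < round_length after update

        const Self = @This();

        /// Feed input, invoking `round` once for each completed block.
        pub fn update(self: *Self, input: []const u8, round: *const fn ([]const u8) void) void {
            var rest = input;
            // Top up a partially filled buffer first.
            if (self.len > 0) {
                const take = @min(round_length - self.len, rest.len);
                @memcpy(self.buf[self.len..][0..take], rest[0..take]);
                self.len += take;
                rest = rest[take..];
                if (self.len == round_length) {
                    round(&self.buf);
                    self.len = 0;
                }
            }
            // Process full blocks directly from the input, no copy.
            while (rest.len >= round_length) {
                round(rest[0..round_length]);
                rest = rest[round_length..];
            }
            // Stash the tail for the next update/final.
            @memcpy(self.buf[0..rest.len], rest);
            self.len = rest.len;
        }
    };
}
```

Note that this is closer to a bounce buffer than a ring buffer: full blocks bypass the copy entirely, and only partial tails are ever stored.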
I agree that it is very confusing. How about HashBuffer or something like that?
That being said, maybe there's an easier and more efficient approach: simply add a hashSlice(comptime N: usize, m: [N]u8) function, since small keys generally have a comptime-known size.
The optimizer may then be able to remove all the unnecessary steps for a given size.
I tried that approach, and it seems to be working pretty well.
xxhash32 before: 5049 MiB/s [1ffbad07bda51000]
xxhash32 after: 14895 MiB/s
Just by adding a hashSlice() function like this:
pub fn hashSlice(comptime T: usize, seed: u32, input: *const [T]u8) u32 {
var hasher = @call(.always_inline, init, .{seed});
@call(.always_inline, update, .{ &hasher, input });
return @call(.always_inline, final, .{&hasher});
}

Before:
xxhash64 iterative: 10459 MiB/s [cca8095733931502]  small keys: 10409 MiB/s [aaa83a86d42ed224]
xxhash32 iterative: 10021 MiB/s [9c53377900000000]  small keys: 10036 MiB/s [faea3b5b3e000000]

After:
xxhash64 iterative: 10669 MiB/s [cca8095733931502]  small keys: 10662 MiB/s [aaa83a86d42ed224]
xxhash32 iterative: 10275 MiB/s [9c53377900000000]  small keys: 10270 MiB/s [faea3b5b3e000000]
Before:
xxhash64 iterative: 10620 MiB/s [cca8095733931502]  small keys: 1762 MiB/s [3bbbd0444f8570d5]
xxhash32 iterative: 10186 MiB/s [9c53377900000000]  small keys: 2100 MiB/s [22220afec948fc00]

After:
xxhash64 iterative: 10724 MiB/s [cca8095733931502]  small keys: 4771 MiB/s [3bbbd0444f8570d5]
xxhash32 iterative: 10273 MiB/s [9c53377900000000]  small keys: 5423 MiB/s [22220afec948fc00]
SMHasher uses a key size of 262144, so increase ours so we can test with the same key size.
4dd2c35 to aa76d18
@@ -1,3 +1,5 @@
pub const StreamingCache = @import("hash/streaming_cache.zig").StreamingCache;
Shouldn't be public.
This updates to the new v4 version, so existing hashes will fail and this is not backwards compatible.

Before:
wyhash iterative: 14715 MiB/s [bbdf2d8ce988cc64]  small keys: 10440 MiB/s [b77f63adf023efb5]

After:
wyhash iterative: 25814 MiB/s [b87f2e67dc918d9a]  small keys: 14909 MiB/s [4fcbb3814468e1d0]
This is a bit of clean-up on the hashing functions and some performance improvements.
The core of this PR is abstracting the internal buffer management for hash/crypto functions which require fixed-size blocks of data to process. Currently we naively call these via the init, update, final sequence from most hash calls. This is not ideal when the data is small (less than the internal block size), as it results in unnecessary memory copies which slow these functions down enormously.

The intended approach is to bypass this buffering in the hash function of a hasher to improve small-key performance. This is usually trivially done by calling the internal round component directly and moving the final function to a small shim which computes with the remaining buffer. This way we can call with the known buffer (in the case of small keys) or defer to the internal buffer (when hashing long keys/streaming data). There is additional improvement possible for small keys at the level of individual hashes; this PR focuses on the more generic improvements.
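The approach described above can be sketched as follows. All names here (`State`, `round`, `finalWith`, `block_len`) are illustrative placeholders with a toy mixing function, not the actual hasher internals; the point is the control flow, where full blocks go straight to the round function and the tail goes to a small final shim, so no copy into a streaming buffer ever happens:

```zig
const std = @import("std");

const block_len = 32;

// Toy state: real hashers keep wider internal state; this only
// illustrates the round/final split.
const State = struct {
    acc: u64,

    fn round(self: *State, block: *const [block_len]u8) void {
        for (block) |b| self.acc = self.acc *% 31 +% b;
    }

    // The `final` shim: folds in whatever tail remains.
    fn finalWith(self: *State, tail: []const u8) u64 {
        for (tail) |b| self.acc = self.acc *% 31 +% b;
        return self.acc ^ @as(u64, tail.len);
    }
};

pub fn hash(seed: u64, input: []const u8) u64 {
    var state = State{ .acc = seed };
    var rest = input;
    // Full blocks are fed to the round component directly: no
    // intermediate copy into a streaming buffer.
    while (rest.len >= block_len) {
        state.round(rest[0..block_len]);
        rest = rest[block_len..];
    }
    // The remaining tail (always < block_len bytes) is handled by the
    // final shim, which is the common case for small keys.
    return state.finalWith(rest);
}
```

A streaming API can reuse the same `round`/`finalWith` pair, buffering partial blocks only when input arrives in pieces.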