
Hash improvements #15941

Open · wants to merge 10 commits into master
Conversation

@tiehuis (Member) commented Jun 3, 2023

This is a bit of clean-up on the hashing functions and some performance improvements.

The core of this PR is abstracting the internal buffer management for hash/crypto functions that process data in blocks. Currently we naively call these via the init, update, final sequence from most hash calls. However, this is not ideal when the data is small (less than the internal block size), as it results in unnecessary memory copies which slow these functions down enormously.

The intended approach is to bypass this in the hash function of a hasher to improve small-key performance. This is usually trivially done by calling the internal round component directly and moving the final function into a small shim that computes over the remaining buffer. This way we can call with the known buffer (in the case of small keys) or defer to the internal buffer (when hashing long keys/streaming data).
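The fast path described above might look something like this (a minimal sketch with hypothetical `round`/`finalize` functions and block size, not the PR's actual code): whole blocks are consumed directly from the caller's slice, and only the tail goes through finalization, so no bytes are copied into an internal buffer.

```zig
const std = @import("std");

const block_len = 32; // hypothetical internal block size

// Stand-in round function: mixes one full block into the state.
fn round(state: u64, block: *const [block_len]u8) u64 {
    return (state ^ std.mem.readInt(u64, block[0..8], .little)) *% 0x9e3779b97f4a7c15;
}

// Stand-in finalization over the remaining (< block_len) bytes.
fn finalize(state: u64, tail: []const u8) u64 {
    var s = state;
    for (tail) |b| s = (s ^ b) *% 0x100000001b3;
    return s;
}

// Direct hash: processes the caller's slice in place, bypassing
// the init/update/final sequence and its internal buffer copies.
pub fn hash(seed: u64, input: []const u8) u64 {
    var state = seed;
    var i: usize = 0;
    while (i + block_len <= input.len) : (i += block_len) {
        state = round(state, input[i..][0..block_len]);
    }
    return finalize(state, input[i..]);
}
```

A streaming hasher can then keep its internal buffer, call `round` on each filled block, and reuse `finalize` as the small shim inside its `final` function.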

There are additional improvements that can be made for small keys at the level of individual hash functions; this PR focuses on the more generic improvements.

The benchmarks were reusing the same input buffer over and over again,
giving the optimizer a chance to compute the hash only once and
multiply the output by the number of iterations.

After this change, the results are *very* different (not in a good way,
but at least they are closer to reality).
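A hedged sketch of a corrected measurement loop (hypothetical names, not the actual benchmark code): perturbing the buffer each iteration and accumulating the results prevents the optimizer from computing the hash once and multiplying.

```zig
fn bench(comptime Hash: type, buf: []u8, iterations: usize) u64 {
    var acc: u64 = 0;
    var i: usize = 0;
    while (i < iterations) : (i += 1) {
        // Change the input each round so the compiler cannot
        // hoist the hash computation out of the loop.
        buf[0] = @as(u8, @truncate(i));
        acc +%= Hash.hash(0, buf);
    }
    // Return the accumulator so the loop is not eliminated
    // as dead code.
    return acc;
}
```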

Also add xxhash and siphash.

xxhash's hash() function had a different API from all the other std.hash
functions, forcing the key to be 0. Make it consistent with the other
functions, especially since this is not a cryptographic hash function.

While at it, remove some useless `pub` qualifiers and declarations from siphash.
lib/std/crypto/benchmark.zig (outdated, resolved)
const std = @import("std");

/// Abstracts buffer management for unaligned input blocks for hash and crypto functions.
pub fn StreamingCache(comptime round_length: usize) type {
@tiehuis (Member, Author)
Should rename. This isn't really a cache, more just temporary storage for streaming.

@matu3ba (Contributor) commented Jun 3, 2023
StreamingBuffer? AFAIU, you are using a temporary buffer for streaming.
The doc comment does not give any hint as to what round_length represents or what update does with it.

To me this looks like a ringbuffer being filled on every update.

A contributor commented:
I agree that it is very confusing. How about HashBuffer or something like that?

That being said, maybe there's an easier and more efficient approach: simply add a hashSlice(comptime N: usize, m: [N]u8) function, since small keys generally have a comptime-known size.

The optimizer may then be able to remove all the unnecessary steps for a given size.

@jedisct1 (Contributor) commented Jun 3, 2023

I tried that approach, and it seems to be working pretty well.

xxhash32 before: 5049 MiB/s [1ffbad07bda51000]
xxhash32 after : 14895 MiB/s

Just by adding a hashSlice() function like this:

    pub fn hashSlice(comptime T: usize, seed: u32, input: *const [T]u8) u32 {
        var hasher = @call(.always_inline, init, .{seed});
        @call(.always_inline, update, .{ &hasher, input });
        return @call(.always_inline, final, .{&hasher});
    }
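A possible call site (assuming the proposed hashSlice lands on a type such as std.hash.XxHash32; this is an illustration, not confirmed API): because the length is comptime-known, the inlined init/update/final chain can be fully specialized for that size.

```zig
const std = @import("std");

pub fn main() void {
    const key = "hello";
    // key.len is comptime-known, so the whole inlined
    // init/update/final chain specializes to a 5-byte input.
    const h = std.hash.XxHash32.hashSlice(key.len, 0, key);
    std.debug.print("{x}\n", .{h});
}
```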

@ Before

xxhash64
   iterative: 10459 MiB/s [cca8095733931502]
  small keys: 10409 MiB/s [aaa83a86d42ed224]
xxhash32
   iterative: 10021 MiB/s [9c53377900000000]
  small keys: 10036 MiB/s [faea3b5b3e000000]

@ After

xxhash64
   iterative: 10669 MiB/s [cca8095733931502]
  small keys: 10662 MiB/s [aaa83a86d42ed224]
xxhash32
   iterative: 10275 MiB/s [9c53377900000000]
  small keys: 10270 MiB/s [faea3b5b3e000000]

@ Before

xxhash64
   iterative: 10620 MiB/s [cca8095733931502]
  small keys:  1762 MiB/s [3bbbd0444f8570d5]
xxhash32
   iterative: 10186 MiB/s [9c53377900000000]
  small keys:  2100 MiB/s [22220afec948fc00]

@ After

xxhash64
   iterative: 10724 MiB/s [cca8095733931502]
  small keys:  4771 MiB/s [3bbbd0444f8570d5]
xxhash32
   iterative: 10273 MiB/s [9c53377900000000]
  small keys:  5423 MiB/s [22220afec948fc00]

SMHasher uses a key size of 262144, so increase ours so we can test with the same key size.
@@ -1,3 +1,5 @@
pub const StreamingCache = @import("hash/streaming_cache.zig").StreamingCache;
@tiehuis (Member, Author)
Shouldn't be public.

@dweiller mentioned this pull request Jun 3, 2023
This updates to the new v4 version, so existing hash values will change;
this is not backwards compatible.

@ Before

wyhash
   iterative: 14715 MiB/s [bbdf2d8ce988cc64]
  small keys: 10440 MiB/s [b77f63adf023efb5]

@ After

wyhash
   iterative: 25814 MiB/s [b87f2e67dc918d9a]
  small keys: 14909 MiB/s [4fcbb3814468e1d0]