-
-
Notifications
You must be signed in to change notification settings - Fork 715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle empty memoryviews of bytearrays when (de)serializing #6576
Conversation
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 15 files ± 0 15 suites ±0 7h 21m 38s ⏱️ + 3m 11s For more details on these failures, see this check. Results for commit fbe4dc0. ± Comparison against base commit 96c13a5. ♻️ This comment has been updated with latest results. |
|
Merging EOD unless there are other comments |
|
Sorry Ben I'll try to look, but it may not happen right away. Have some other high priority things that needed to be resolved first |
|
LGTM, thx @quasiben |
|
One issue that may come up here is the handling of arrays with multiple dimensions, one of which is In [1]: import numpy
...:
...: from distributed.protocol import deserialize, serialize
In [2]: mv = numpy.ones((2, 0), dtype="f8").data
In [3]: deserialize(*serialize(mv))Side note: This doesn't require NumPy. One could use Current Behavior (error):---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 deserialize(*serialize(mv))
File ~/miniconda/envs/dask/lib/python3.9/site-packages/distributed/protocol/serialize.py:416, in deserialize(header, frames, deserializers)
411 raise TypeError(
412 "Data serialized with %s but only able to deserialize "
413 "data with %s" % (name, str(list(deserializers)))
414 )
415 dumps, loads, wants_context = families[name]
--> 416 return loads(header, frames)
File ~/miniconda/envs/dask/lib/python3.9/site-packages/distributed/protocol/serialize.py:58, in dask_loads(header, frames)
56 typ = pickle.loads(header["type-serialized"])
57 loads = dask_deserialize.dispatch(typ)
---> 58 return loads(header["sub-header"], frames)
File ~/miniconda/envs/dask/lib/python3.9/site-packages/distributed/protocol/serialize.py:786, in _deserialize_memoryview(header, frames)
784 else:
785 out = memoryview(b"".join(frames))
--> 786 out = out.cast(header["format"], header["shape"])
787 return out
TypeError: memoryview: cannot cast view with zeros in shape or stridesThis would not be reconstructed correctly during deserialization. How should we handle this case?
With this change, we are doing 3. Though maybe we should be doing one of the others instead (silent failures can be difficult to debug). Edit: Updated the example above to show how serialization comes into play (and how it errors currently). |
|
Without this change we are still silently doing nothing.
Also, is there a usecase where passing empty memoryviews of numpy arrays comes up ? At the moment, I'm assuming not because I don't think Dask has correctly handled this case in the past. Is this right ? |
|
Not exactly. Without this change we are raising an error, it is just indiscriminate in nature. We could narrow it for example. We could also move it from Think @adamklein would be better able to answer those questions than I. Being a user of this functionality (serialization of |
This case is different enough from the `array` test that it is worth handling on its own.
It is not possible to correctly deserialize an empty `memoryview` with multiple dimensions using builtin Python functionality alone. So raise when trying to serialize such an object to provide this information the user as close to the error producing object as possible (and not when it is deserialized somewhere else). Should help the user find this and either flatten the `memoryview` themselves, switching to using NumPy where this is handled correctly, or do something else.
Unlike NumPy arrays, this works with `memoryview`s. So check whether it is trivial or not this way.
bed24db
to
77ab6fc
Compare
This was done for us before with `cast`. However it is not handled in this branch. So do that ourselves.
| if not obj and obj.ndim > 1: | ||
| raise ValueError("Cannot serialize empty non-1-D `memoryview`") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we can't properly reconstruct these during deserialization, just raise here. This should be as close as possible to user code. So should hopefully make it clearer what data triggered this error.
| def test_ser_empty_nd_memoryview(): | ||
| mv = memoryview(b"12").cast("B", (1, 2))[:0] | ||
|
|
||
| # serialize empty `memoryview` | ||
| with pytest.raises(TypeError): | ||
| serialize(mv, on_error="raise") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test the error case
|
Ok pushed some changes above. Please let me know what you think :) |
| out = out.cast(header["format"], header["shape"]) | ||
| else: | ||
| out = out.cast(header["format"]) | ||
| assert out.shape == header["shape"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this assertion necessary ? I'm thinking about what happens if a user gets this ? Is the intent to inform the user something in serializing/deserializing has gone awry ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the .cast effectively does this check for us. However it is not handled in the else case so added an assert
If this assertion raises, it means something has gone horribly wrong. For example we expected a non-empty memoryview, but got an empty one. IOW data was lost in communication somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based off offline discussion, it sounds like this is ok. Though happy to discuss further if needed
|
Added a single comment but planning to merge tomorrow before Friday's release cc @jsignell |
|
Thanks Ben! 🙏 |
Closes #6542
pre-commit run --all-files@jakirkham can you review and take over if anything else is needed ?
cc @adamklein