Supporting out-of-band buffers with pickle protocol 5 #5472
Labels: enhancement, feat / serialize, help wanted
Feature description
Typically, pickling in Python creates a large bytes object with types, functions, and data all packed in to allow easy reconstruction later. Originally, pickling was focused on reading/writing to disk. However, these days it is increasingly used as a serialization protocol for objects on the wire. In this case, the copies of data required to put everything in a single bytes object hurt performance and don't offer much, as the data could be shipped along in separate buffers without copying.

For these reasons, Python added support for out-of-band buffers in pickle, which allows the user to flag buffers of data for pickle to extract and send alongside the typical bytes object (thus avoiding unneeded copying of data). This was submitted and accepted as PEP 574 and is part of Python 3.8 (along with a backport package for Python 3.5, 3.6, and 3.7).

On the implementation side, this just comes down to implementing __reduce_ex__ instead of __reduce__ (basically the same, but with a protocol version argument) and placing any bytes-like data (like NumPy arrays and memoryviews) into PickleBuffer objects. For older pickle protocols this step can simply be skipped. Here's an example. The rest is up to libraries that use protocol 5 (like Dask) to implement and use.

Could the feature be a custom component or spaCy plugin?
If so, we will tag it as project idea so other users can take it on.

I don't think so, as this relies on changing the pickle implementations of spaCy objects. Though I could be wrong :)
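For illustration, here is a minimal sketch of the implementation side described in the feature description, using only the standard library (Python 3.8+). ArrayHolder is a hypothetical stand-in for a spaCy object that owns array data, and a bytearray plays the role of a NumPy array; neither is a spaCy API. The object's __reduce_ex__ wraps its data in pickle.PickleBuffer for protocol 5 and falls back to a plain bytes copy for older protocols. The consumer side (what a library like Dask would do) collects the buffers out-of-band via buffer_callback and hands them back through the buffers argument of pickle.loads.

```python
import pickle


class ArrayHolder:
    """Hypothetical stand-in for an object that owns a large data buffer.

    The class name and the single ``data`` attribute are illustrative only.
    """

    def __init__(self, data):
        # Any object supporting the buffer protocol (bytes, bytearray,
        # memoryview, a NumPy array, a PickleBuffer, ...).
        self.data = data

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # Protocol 5: wrap the raw data in a PickleBuffer so pickle can
            # hand it to buffer_callback (out-of-band) instead of copying it
            # into the pickle stream.
            return type(self), (pickle.PickleBuffer(self.data),)
        # Protocols <= 4 cannot serialize PickleBuffer, so skip that step
        # and pickle an in-band bytes copy instead.
        return type(self), (bytes(self.data),)


payload = bytearray(b"x" * 1_000_000)
holder = ArrayHolder(payload)

# Out-of-band: the large buffer is collected in `buffers`, not copied into `stream`.
buffers = []
stream = pickle.dumps(holder, protocol=5, buffer_callback=buffers.append)
restored = pickle.loads(stream, buffers=buffers)

# In-band paths still work: protocol 5 without a callback, and protocol 4.
inband5 = pickle.loads(pickle.dumps(holder, protocol=5))
inband4 = pickle.loads(pickle.dumps(holder, protocol=4))

print(len(stream), len(buffers))  # the stream stays tiny; the payload is in buffers
```

On the out-of-band path the restored object holds the very buffer object passed to pickle.loads, so the payload is never copied; code that needs a specific type (for example a NumPy array via numpy.frombuffer) would convert it when reconstructing.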