Make the hdf5 videos store as int8 format #1559

Merged: 4 commits, Oct 20, 2023
2 changes: 2 additions & 0 deletions — AUTHORS
@@ -11,3 +11,5 @@
 John Smith <john at example.com> Example Inc.

 Jeremy Delahanty <jdelahanty@salk.edu> The Salk Institute for Biological Studies
+
+Lili Karashchuk <lili.karashchuk@alleninstitute.org> Allen Institute of Neural Dynamics
20 changes: 15 additions & 5 deletions — sleap/io/video.py
@@ -1443,19 +1443,29 @@ def to_hdf5(

         def encode(img):
             _, encoded = cv2.imencode("." + format, img)
-            return np.squeeze(encoded)
+            return np.squeeze(encoded).astype("int8")

+        # pad with zeroes to guarantee int8 type in hdf5 file
+        frames = []
+        for i in range(len(frame_numbers)):
+            frames.append(encode(frame_data[i]))
+
+        max_frame_size = max([len(x) for x in frames])
Comment on lines +1448 to +1453

@coderabbitai (bot), Oct 17, 2023:

The new code creates a list of encoded frames and calculates the maximum frame size. This is a change from the previous implementation where each frame was encoded and directly stored in the HDF5 dataset. This change could potentially increase memory usage if the number of frames is large. Consider a more memory-efficient approach if this is a concern.

@lambdaloop (Contributor, PR author) replied:

Yes, as @coderabbitai says, this stores all the encoded frames in RAM in order to find the maximum frame size. I think this is rarely a concern for the number of frames most people deal with, including all the example SLEAP datasets.

Note that the current SLEAP code already stores the full frames in memory anyway, through the get_frames() function.

In the future, if someone has far more frames than fit into memory, a two-pass encoding could be used; it would be roughly twice as slow but would not consume any more memory than the previous code.

@coderabbitai (bot) replied:

@lambdaloop, thank you for your response. You're correct that the current implementation already stores the full frames in memory, and for most use cases, this won't be a problem.

Your suggestion of a two-pass encoding process for larger datasets is a good one. This would involve a first pass to calculate the maximum frame size and a second pass to encode and store the frames. While this would indeed be slower, it would keep memory usage in check.

Here's a rough sketch of how that could be implemented:

# First pass to calculate max frame size
max_frame_size = 0
for i in range(len(frame_numbers)):
    encoded_frame = encode(frame_data[i])
    if len(encoded_frame) > max_frame_size:
        max_frame_size = len(encoded_frame)

# Create the dataset with the calculated max frame size
dset = f.create_dataset(
    dataset + "/video",
    (len(frame_numbers), max_frame_size),
    dtype="int8",
)

# Second pass to encode and store the frames
for i in range(len(frame_numbers)):
    encoded_frame = encode(frame_data[i])
    dset[i, 0:len(encoded_frame)] = encoded_frame

This way, we only keep one frame in memory at a time during the encoding process. However, as you mentioned, this would double the encoding time. It's a trade-off between memory usage and processing time, and the best approach would depend on the specific requirements of the use case.
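[Editor's note] The equivalence of the two approaches can be checked without HDF5 at all. Below is a hedged, numpy-only sketch (the encode stub and frame sizes are made up for illustration, not taken from the PR) showing that the one-pass approach (keep all encoded frames in memory) and the two-pass approach (re-encode each frame) produce identical zero-padded rows:

```python
import numpy as np

# Stub standing in for the cv2.imencode-based encode(); lengths are arbitrary.
def encode(i):
    return np.arange(1, 2 + i, dtype="int8")  # frame i encodes to i + 1 bytes

n_frames = 4

# One-pass (as in the PR): keep all encoded frames in memory.
frames = [encode(i) for i in range(n_frames)]
max_frame_size = max(len(x) for x in frames)
one_pass = np.zeros((n_frames, max_frame_size), dtype="int8")
for i, frame in enumerate(frames):
    one_pass[i, : len(frame)] = frame

# Two-pass: re-encode each frame instead of storing, trading time for memory.
max_frame_size = 0
for i in range(n_frames):
    max_frame_size = max(max_frame_size, len(encode(i)))
two_pass = np.zeros((n_frames, max_frame_size), dtype="int8")
for i in range(n_frames):
    enc = encode(i)
    two_pass[i, : len(enc)] = enc

assert np.array_equal(one_pass, two_pass)
```

The only difference is whether each frame is encoded once (and held) or twice (and discarded); the resulting padded dataset contents are byte-for-byte the same.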

@lambdaloop (Contributor, PR author) replied:

Wow, I didn't expect this bot to write the full code for me, that's awesome!

Anyhow, I don't think it's worth implementing the two-pass system at this time. The rest of the code should be refactored first anyway.

@coderabbitai (bot) replied:

@lambdaloop, I'm glad you found the code snippet helpful! You're right, it's important to prioritize refactoring and optimizing the existing code base before introducing new changes. The two-pass system can be considered in the future when handling larger datasets becomes a necessity. Feel free to reach out if you need further assistance or code reviews in the future. Happy coding!


dtype = h5.special_dtype(vlen=np.dtype("int8"))
dset = f.create_dataset(
dataset + "/video", (len(frame_numbers),), dtype=dtype
dataset + "/video",
(len(frame_numbers), max_frame_size),
dtype="int8",
compression="gzip",
)
dset.attrs["format"] = format
dset.attrs["channels"] = self.channels
dset.attrs["height"] = self.height
dset.attrs["width"] = self.width

for i in range(len(frame_numbers)):
dset[i] = encode(frame_data[i])
for i, frame in enumerate(frames):
dset[i, 0 : len(frame)] = frame

else:
f.create_dataset(
dataset + "/video",
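[Editor's note] The core of the dtype change above is that cv2.imencode produces a uint8 byte stream, while the HDF5 dataset is now declared int8; the cast reinterprets bytes 128–255 as their negative two's-complement counterparts, and casting back on read recovers them exactly. A minimal numpy-only sketch of that round-trip (illustrative, not code from the PR; the cv2 and h5py calls are omitted):

```python
import numpy as np

# Stand-in for cv2.imencode output: raw encoded bytes as uint8.
encoded = np.array([0, 127, 128, 255], dtype="uint8")

# What goes into the int8 HDF5 dataset: a C-style cast, bit pattern preserved.
stored = encoded.astype("int8")
assert stored.tolist() == [0, 127, -128, -1]

# Padding to max_frame_size, as in the PR: shorter frames are zero-padded.
max_frame_size = 6
row = np.zeros(max_frame_size, dtype="int8")
row[: len(stored)] = stored

# Reading back: cast to uint8 to recover the original byte stream.
# (JPEG/PNG decoders typically tolerate the trailing zero padding, since
# they stop at the end-of-image marker, but that is format-dependent.)
recovered = row.astype("uint8")[: len(encoded)]
assert np.array_equal(recovered, encoded)
```

In other words, the int8 declaration changes only how HDF5 labels the bytes, not the bytes themselves, which is why the encoded images survive the round trip.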