S3 streaming has a bug #11
Comments
I have the same issue with my substreams tier2 instances. They would consume all available tcp_mem until the system crashed. I've been poking around trying to figure this out for a while, and I think I've found the issue: reader.Body is not closed here (Line 335 in 3924b3b).
I added a
or am I wrong because it is closed later?
It's closed later indeed, that's the point of the
We had a "sample" binary that was doing, in a loop,
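To make the "closed later" point concrete, here is a rough sketch of the pattern under discussion (this is not the actual dstore code; the function name, SDK version and shape are assumptions): the S3 GetObject response body is returned as an io.ReadCloser, and closing it is deferred to whoever consumes the stream. If that consumer never calls Close, the underlying HTTP connection leaks, which matches the tcp_mem exhaustion described above.

```go
package example

import (
	"io"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
)

// openObject returns the S3 object body as a streaming io.ReadCloser.
// The body is deliberately NOT closed here; the caller that consumes the
// stream is expected to call Close() when done. Forgetting to do so leaks
// the underlying connection.
func openObject(client *s3.S3, bucket, key string) (io.ReadCloser, error) {
	out, err := client.GetObject(&s3.GetObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	})
	if err != nil {
		return nil, err
	}
	return out.Body, nil // closed later, by the consumer of the stream
}
```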
It appears our streaming code for S3 (or maybe the S3 library itself) has a bug that leads to a weird file issue where the content is not fully read.
The bug seems to happen non-systematically, which makes me think it could be incorrect "error" handling when the stream closes unexpectedly.
This problem has been reported a few times over the past 1-2 years, against Ceph, S3 directly and SeaweedFS. The current workaround is to set
DSTORE_S3_BUFFERED_READ=true
which reads everything in one shot into memory and then acts as an io.Reader. This however creates memory pressure, as the full file is held in memory before being streamed.
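For illustration, here is a minimal sketch of what such a buffered read amounts to (the function name and shape are assumptions, not the actual dstore implementation): the whole object is drained into memory, the network body is closed immediately, and callers then read from an in-memory reader.

```go
package example

import (
	"bytes"
	"io"
)

// bufferedRead is a hypothetical helper showing the idea behind
// DSTORE_S3_BUFFERED_READ=true: drain the whole S3 response body into memory,
// close the network stream right away, and hand back an in-memory io.Reader.
// This sidesteps truncated-stream issues but holds the entire file in RAM.
func bufferedRead(body io.ReadCloser) (io.Reader, error) {
	defer body.Close()

	data, err := io.ReadAll(body) // full object buffered in memory
	if err != nil {
		return nil, err
	}
	return bytes.NewReader(data), nil
}
```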
See streamingfast/firehose-core#15 for some details, and some logs from SeaweedFS. We can see there that SeaweedFS sees internal failures, but those later lead to Firehose trying to read corrupted blocks. Which means that somehow the "consumer" saw an end of the stream, but the actual reading code failed because some bytes were missing.
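One way to make such short reads fail loudly instead of masquerading as a clean end of stream would be a check against the advertised Content-Length. This is only a sketch of the idea, not code from dstore; the type and field names are made up for illustration.

```go
package example

import (
	"fmt"
	"io"
)

// lengthCheckedReader is a hypothetical wrapper: it counts the bytes read and
// only accepts io.EOF as a clean end of stream when the count matches the
// Content-Length the server advertised. A short read then surfaces as an
// explicit error instead of silently looking like the end of the object.
type lengthCheckedReader struct {
	rc       io.ReadCloser // underlying S3 response body
	expected int64         // Content-Length advertised by the server
	read     int64         // bytes read so far
}

func (r *lengthCheckedReader) Read(p []byte) (int, error) {
	n, err := r.rc.Read(p)
	r.read += int64(n)
	if err == io.EOF && r.read != r.expected {
		return n, fmt.Errorf("short read: got %d bytes, expected %d: %w",
			r.read, r.expected, io.ErrUnexpectedEOF)
	}
	return n, err
}

func (r *lengthCheckedReader) Close() error { return r.rc.Close() }
```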