
S3 streaming has a bug #11

Open
maoueh opened this issue Oct 2, 2023 · 3 comments

Comments

@maoueh
Contributor

maoueh commented Oct 2, 2023

It appears our streaming code for S3 (or maybe the S3 library itself) has a bug that leads to a weird file issue where the content is not read fully.

The bug happens non-systematically, which makes me think it could be incorrect "error" handling when the stream closes unexpectedly.

This problem has been reported a few times over the past 1-2 years, against Ceph, S3 directly, and SeaweedFS. The current workaround is to set DSTORE_S3_BUFFERED_READ=true, which reads everything into memory in one shot and then acts as an io.Reader. This however creates memory pressure, as the full file is held in memory before being streamed.
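For context, a minimal sketch of what the buffered-read workaround amounts to, using only standard library primitives; the function name is illustrative, not dstore's actual implementation:

// Buffered-read pattern: drain the whole object into memory first,
// then hand the consumer an io.Reader over the buffered bytes.
// Illustration only, not dstore's actual code.
package s3stream

import (
	"bytes"
	"io"
)

func bufferedRead(body io.ReadCloser) (io.Reader, error) {
	defer body.Close()

	// The full file is held in memory here, which is the source of
	// the memory pressure mentioned above.
	data, err := io.ReadAll(body)
	if err != nil {
		return nil, err
	}
	return bytes.NewReader(data), nil
}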

See streamingfast/firehose-core#15 for some details and some logs from SeaweedFS. We can see there that SeaweedFS sees internal failures, but those later lead to Firehose trying to read corrupted blocks:

panic: unable to decode block #16651067 (fb80c53f0b9ad8a026d21cf9aab801e42ea6db209de86053fead6a751f8f6477) payload (kind: ETH, version: 3, size: 1047315, sha256: 0614c58482dfdd1ebfd10abda656531bd8b81e15852dc54138ad8e0f592e9f3c): unable to decode payload: proto: cannot parse invalid wire-format data

Payload: [OMITTING HUGE LINE OF BINARY DATA]

goroutine 345531 [running]:
github.com/streamingfast/bstream.(*Block).ToProtocol(0xc02376c500)
	/home/runner/go/pkg/mod/github.com/streamingfast/bstream@v0.0.2-0.20230510131449-6b591d74130d/block.go:246 +0x6e9
github.com/streamingfast/firehose-ethereum/transform.(*CombinedFilter).Transform(0xc0003b05f8?, 0x681d52?, {0x64c0f1?, 0x3476c60?})
	/home/runner/work/firehose-ethereum/firehose-ethereum/transform/combined_filter.go:185 +0x36
github.com/streamingfast/bstream/transform.(*Registry).BuildFromTransforms.func1(0xc01fe9be00)
	/home/runner/go/pkg/mod/github.com/streamingfast/bstream@v0.0.2-0.20230510131449-6b591d74130d/transform/builder.go:82 +0x1d3
github.com/streamingfast/bstream.(*FileSource).preprocess(0xc001bcf5c0, 0xc01fe9be00, 0xc0404bb860)
	/home/runner/go/pkg/mod/github.com/streamingfast/bstream@v0.0.2-0.20230510131449-6b591d74130d/filesource.go:506 +0x5b
created by github.com/streamingfast/bstream.(*FileSource).streamReader
	/home/runner/go/pkg/mod/github.com/streamingfast/bstream@v0.0.2-0.20230510131449-6b591d74130d/filesource.go:495 +0x68a

Which means somehow the "consumer" saw an end of the stream, but the actual reading code failed due to some missing bytes.
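To make that suspected failure mode concrete: if the underlying stream drops early, the consumer may just see a clean io.EOF and believe the object ended normally. A hypothetical guard (not dstore's code) that compares bytes read against the expected Content-Length would turn that silent truncation into a hard error:

// Hypothetical guard: convert a premature io.EOF into
// io.ErrUnexpectedEOF when fewer bytes than expected were delivered.
package s3stream

import (
	"fmt"
	"io"
)

type lengthCheckedReader struct {
	r        io.Reader
	expected int64 // e.g. taken from the S3 response's Content-Length
	read     int64
}

func (l *lengthCheckedReader) Read(p []byte) (int, error) {
	n, err := l.r.Read(p)
	l.read += int64(n)
	if err == io.EOF && l.read < l.expected {
		// The stream ended before delivering all bytes: report it
		// instead of letting the caller treat this as a clean EOF.
		return n, fmt.Errorf("short object read (%d of %d bytes): %w",
			l.read, l.expected, io.ErrUnexpectedEOF)
	}
	return n, err
}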

@johnkozan

I have the same issue with my substreams tier2 instances. They would consume all available tcp_mem until the system crashed.

I've been poking around trying to figure this out for a while, and I think I've found the issue: the reader.Body is not closed here:

out, err = s.uncompressedReader(reader.Body)

I added a reader.Body.Close() and that seems to have fixed the issue; tcp_mem is down to nothing.
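For what it's worth, a sketch of the leak shape this suggests, under the assumption that the call quoted above wraps the S3 response body in a decompression reader: if that wrapping fails, nothing ever closes the body, so the connection and its TCP buffers linger, which would match the tcp_mem growth. The function name here is hypothetical, not dstore's actual code:

// Hypothetical shape of the error-path leak: if gzip.NewReader fails,
// the S3 response body must still be closed by someone.
package s3stream

import (
	"compress/gzip"
	"io"
)

func openCompressed(body io.ReadCloser) (io.Reader, error) {
	gz, err := gzip.NewReader(body)
	if err != nil {
		// Error path: without this Close, the response body (and its
		// underlying TCP connection buffers) leaks.
		body.Close()
		return nil, err
	}
	return gz, nil
}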

johnkozan added a commit to johnkozan/dstore that referenced this issue Oct 3, 2023
@johnkozan

Or am I wrong because it is closed later?

@maoueh
Contributor Author

maoueh commented Oct 3, 2023

It's closed later indeed, that's the point of OpenObject. But your investigation points to something in that vein: there is definitely something not being closed properly while the streaming is happening.

We had a "sample" binary that called OpenObject in a loop and then tried to read the merged blocks. I'll try to find it again and set up the infra to reproduce the error more easily.
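A rough sketch of what such a repro loop could look like: open the same object repeatedly and read it to the end, so any truncation or leak shows up quickly. The store URL, object name, and the exact NewStore/OpenObject signatures are assumptions about the streamingfast/dstore API, not the original sample binary:

// Hypothetical repro loop: repeatedly open and fully read one object.
// A short read should surface as an error from io.Copy; a body leak
// should surface as growing tcp_mem while this runs.
package main

import (
	"context"
	"fmt"
	"io"
	"log"

	"github.com/streamingfast/dstore"
)

func main() {
	// Assumed signature: NewStore(baseURL, extension, compressionType, overwrite).
	store, err := dstore.NewStore("s3://bucket/path?region=us-east-1", "", "", false)
	if err != nil {
		log.Fatal(err)
	}

	ctx := context.Background()
	for i := 0; ; i++ {
		// "0016651000.merged" is a placeholder object name.
		rc, err := store.OpenObject(ctx, "0016651000.merged")
		if err != nil {
			log.Fatalf("iteration %d: open: %s", i, err)
		}

		n, err := io.Copy(io.Discard, rc)
		rc.Close()
		if err != nil {
			// This is where the truncation should show up.
			log.Fatalf("iteration %d: read failed after %d bytes: %s", i, n, err)
		}
		fmt.Printf("iteration %d: read %d bytes\n", i, n)
	}
}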
