Weird DataCloneError occurring on failed jobs #871
Comments
The DataCloneError doesn't happen in the CLI, which isn't so surprising. The final output state is fine, nothing weird in it. Tomorrow I'll try to boil this down to a repro in the worker/engine and see if I can work out what's up.
Yup. I assume it happens in the Worker when data is being sent out.
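For context, here is a minimal sketch of how a DataCloneError can arise when data leaves a worker. It assumes the data crosses a structured-clone boundary (which is how `postMessage` serializes values) and uses the global `structuredClone` to stand in for that boundary. The helper `cloneStatus` and the sample objects are made up for illustration and are not taken from the engine.

```javascript
// Hypothetical helper: reports whether a value survives structured cloning.
// Returns 'ok' on success, otherwise the name of the error that was thrown.
function cloneStatus(value) {
  try {
    structuredClone(value);
    return 'ok';
  } catch (e) {
    return e.name;
  }
}

// Plain JSON-like state clones without trouble.
const plainState = { data: { id: 1 }, references: [] };

// An object carrying a function (as HTTP client config objects often do)
// cannot be cloned.
const objectWithFunction = {
  message: 'request failed',
  config: { transform: () => {} },
};

console.log(cloneStatus(plainState)); // ok
console.log(cloneStatus(objectWithFunction)); // DataCloneError
```

This would be consistent with the final state being fine while something attached to the error is not cloneable.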
Ok this is weird: Here are logs on prod:
I've redacted a bunch of axios stuff (although the fact that it's trying to log axios is surely part of the problem here).
For the same job, here are logs locally:
Prod throws the DataCloneError, and local does not. Staging looks like my local btw. It looks like it goes through different error processing code? My local run doesn't have this bit:
Baffled by this. Why is the error tracing different in different environments?
When the error is caught from the run, we look at the stack and try to decide whether it's an adaptor error (basically, did the error come from vm code directly or from something in the adaptor's path?). Locally, this test fails: the runtime does not think the error comes from the adaptor. In this case I think it's because the error comes from axios, with a path like:

So if I broaden this test to say "it's an adaptor error if the error came out of the repo", I can reproduce.

I can't quite remember why I'm being so careful about this adaptor error thing. I think I'm just paranoid really. If it's not a vm error, it's probably an adaptor error, right? I just don't want to track runtime errors (because those should filter through to a crash).

I don't really understand why this is different locally and in production. But I suppose if the local repo in production is structured differently (like if axios is saved to the adaptor, not to the top of the repo) it'll trip this. So I think it's coincidence: it depends on which adaptors were installed previously.

So, anyway, two learnings:
Pausing for today. Struggling a bit to work out exactly how to reproduce this DataCloneError in the engine. First attempt to build a unit test around it did not seem to work :(
Came back to try something - this should repro on common:
The message bit is important because this is what happens inside the engine:
Basically we call the runtime, catch any error it might throw, and pass What happens to my error classes with I'll have to do something to e.message to ensure it's safe. Or maybe I can just catch the error and say "error serializing error object" - because the real error stuff should be properly logged anyway.
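One way the "catch it and say error serializing error object" idea could look is sketched below. This is only an illustration under the assumption that the error details cross a structured-clone boundary. `toSafeErrorPayload` is a hypothetical name, not an engine function, and the payload shape is invented.

```javascript
// Hypothetical helper: builds an error payload that is safe to send across
// a structured-clone boundary. If the message cannot be cloned, it is
// replaced with a fixed fallback string.
function toSafeErrorPayload(err) {
  const payload = {
    name: err && err.name ? String(err.name) : 'Error',
    message: err ? err.message : undefined,
  };
  try {
    // Probe the payload with the same algorithm postMessage relies on.
    structuredClone(payload);
    return payload;
  } catch (cloneError) {
    return { name: payload.name, message: 'error serializing error object' };
  }
}

// An error whose message was set to a non-cloneable object.
const e = new Error('placeholder');
e.message = { status: 500, retry: () => {} };

console.log(toSafeErrorPayload(e).message); // error serializing error object
console.log(toSafeErrorPayload(new Error('boom')).message); // boom
```

The trade-off is that the fallback drops the original message, which is acceptable only if the full error has already been logged elsewhere.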
Turn fails into crashes with this one weird trick:
We've been seeing these weird crashes occurring lately: