
Weird DataCloneError occurring on failed jobs #871

Closed
josephjclark opened this issue Feb 11, 2025 · 6 comments · Fixed by #873

@josephjclark
Collaborator

josephjclark commented Feb 11, 2025

Turn fails into crashes with this one weird trick:

```js
fn((state) => {
  // The configuration carries no credentials, so the request below fails with a 401
  state.configuration = {
      "apiVersion": "v2",
      "baseURL": "https://kf.kobotoolbox.org/",
  };
  return state;
});
getSubmissions({ formId: 'aPY2nSgrcBaByvHsCMU6PG' }, (state) => {
  console.log(state.data);
  return state;
});
```

We've been seeing these weird crashes occurring lately:

DataCloneError: function httpAdapter(config) {
  return new Promise(function dispatchHttpRequest(resolvePromise, rejectPromise)...<omitted>...
} could not be cloned.
josephjclark self-assigned this Feb 11, 2025
github-project-automation bot moved this to New Issues in v2 Feb 11, 2025
@josephjclark
Collaborator Author

The DataCloneError doesn't happen in the CLI, which isn't so surprising. The final output state is fine, nothing weird in it.

Tomorrow I'll try and boil this down to a repro in the worker/engine and see if I can work out what's up.

@doc-han
Collaborator

doc-han commented Feb 11, 2025

Yup. I assume it happens in the Worker when data is being sent out.
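For context on the error itself: a DataCloneError is what the structured clone algorithm throws when a value contains something it cannot copy, such as a function. Worker threads use structured cloning when they post messages, which fits the idea that this happens as data leaves the worker. The snippet below is a minimal sketch, not engine code. The `fakeAxiosConfig` object and its `adapter` property are made up to mimic an axios-style object carrying a function like the `httpAdapter` named in the error, and it assumes Node 17+ for the global `structuredClone`.

```typescript
// Made-up stand-in for an axios-style config: plain data plus a function.
// The `adapter` property name is illustrative only.
const fakeAxiosConfig = {
  url: "https://kf.kobotoolbox.org/api/v2/assets/aPY2nSgrcBaByvHsCMU6PG/data/?format=json",
  method: "get",
  adapter: function httpAdapter(config: unknown): Promise<unknown> {
    return Promise.resolve(config);
  },
};

// Returns null if the value clones cleanly, otherwise the name of the thrown error.
function cloneErrorName(value: unknown): string | null {
  try {
    structuredClone(value);
    return null;
  } catch (e) {
    return (e as { name?: string }).name ?? "unknown";
  }
}

// Plain data is fine...
console.log(cloneErrorName({ url: fakeAxiosConfig.url, method: fakeAxiosConfig.method })); // → null
// ...but a single function makes the whole object uncloneable.
console.log(cloneErrorName(fakeAxiosConfig)); // → DataCloneError
```

One function anywhere in the object graph is enough to reject the whole value, which is why a logged or forwarded axios object is a likely trigger.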

@josephjclark
Collaborator Author

Ok this is weird:

Here are logs on prod:

RTE Memory limit: 128mb
RTE Timeout: 60s
RTE Payload limit: 10mb
VER Versions:
    ▸ node.js                         22.13.0
    ▸ worker                          1.9.1
    ▸ @openfn/language-kobotoolbox    2.4.3
R/T Executing d574a360-690a-4d51-ac1d-d6d98012abe9
R/T Starting step Test Kobo API
R/T [linker] loading module @openfn/language-kobotoolbox
R/T [linker] Loading module @openfn/language-kobotoolbox from /tmp/openfn/worker/repo/node_modules/@openfn/language-kobotoolbox_2.4.3/dist/index.cjs
R/T Resolved adaptor @openfn/language-kobotoolbox to version 2.4.3
R/T Executing expression (2 operations)
R/T Starting operation 1
R/T Operation 1 complete in 0ms
R/T Starting operation 2
ADA The common.http.get function has been deprecated. This adaptor should migrate to use common.util.http instead.
ADA {
  "message": "Request failed with status code 401",
  "name": "AxiosError"
}
R/T Test Kobo API aborted with error (2.654s)
R/T Cleaning up state. Removing keys: configuration
R/T Error reported by "getSubmissions()" operation line 11:
R/T AxiosError: Request failed with status code 401
    @openfn/language-kobotoolbox_2.4.3/node_modules/axios/dist/node/axios.cjs:1268:12)
    @openfn/language-kobotoolbox_2.4.3/node_modules/axios/dist/node/axios.cjs:2446:11)
R/T Additional error details:
R/T {
  "code": "ERR_BAD_REQUEST",
  "config": {
    "auth": {},
    "env": {},
    "headers": {
      "Accept": "application/json, text/plain, */*",
      "Accept-Encoding": "gzip, deflate, br",
      "User-Agent": "axios/1.1.3"
    },
    "maxBodyLength": -1,
    "maxContentLength": -1,
    "method": "get",
    "timeout": 0,
    "transformRequest": [
      null
    ],
    "transformResponse": [
      null
    ],
    "transitional": {
      "clarifyTimeoutError": false,
      "forcedJSONParsing": true,
      "silentJSONParsing": true
    },
    "url": "https://kf.kobotoolbox.org/api/v2/assets/aPY2nSgrcBaByvHsCMU6PG/data/?format=json",
    "xsrfCookieName": "XSRF-TOKEN",
    "xsrfHeaderName": "X-XSRF-TOKEN"
  },
  "message": "Request failed with status code 401",
  "name": "AxiosError",
 -- REDACTED--
  "type": "AxiosError"
}
R/T Check state.errors.03efb3fd-d410-4138-8b27-3ee268008a24 for details
R/T Run complete with status: crash
DataCloneError: function httpAdapter(config) {
  return new Promise(function dispatchHttpRequest(resolvePromise, rejectPromise)...<omitted>...
} could not be cloned.

I've redacted a bunch of axios stuff (although the fact that it's trying to log the axios error at all is surely part of the problem here).

For the same job, here are logs locally:

RTE Memory limit: 128mb
RTE Timeout: 300s
RTE Payload limit: 10mb
VER Versions:
    ▸ node.js                         22.13.0
    ▸ worker                          1.9.1
    ▸ @openfn/language-kobotoolbox    2.4.3
R/T Executing 055fb935-380c-464d-a9a8-138a823e1ead
R/T Starting step New job
R/T [linker] loading module @openfn/language-kobotoolbox
R/T [linker] Loading module @openfn/language-kobotoolbox from /tmp/openfn/worker/repo/node_modules/@openfn/language-kobotoolbox_2.4.3/dist/index.cjs
R/T Resolved adaptor @openfn/language-kobotoolbox to version 2.4.3
R/T Executing expression (2 operations)
R/T Starting operation 1
R/T Operation 1 complete in 0ms
R/T Starting operation 2
ADA The common.http.get function has been deprecated. This adaptor should migrate to use common.util.http instead.
ADA {
  "message": "Request failed with status code 401",
  "name": "AxiosError"
}
R/T New job aborted with error (1.271s)
R/T Cleaning up state. Removing keys: configuration
R/T Request failed with status code 401
R/T Check state.errors.e66101dd-886b-4860-8766-91cfb2ca1909 for details
R/T Run complete with status: fail
JobError: Request failed with status code 401

Prod throws the DataCloneError, and local does not. Staging looks like my local btw.

It looks like it goes through different error processing code? My local run doesn't have this bit:

R/T Error reported by "getSubmissions()" operation line 11:
R/T AxiosError: Request failed with status code 401
    @openfn/language-kobotoolbox_2.4.3/node_modules/axios/dist/node/axios.cjs:1268:12)
    @openfn/language-kobotoolbox_2.4.3/node_modules/axios/dist/node/axios.cjs:2446:11)
R/T Additional error details:

Baffled by this. Why is the error tracing different in different environments?

@josephjclark
Collaborator Author

When the error is caught from the run, we look at the stack and try to decide whether it's an adaptor error (basically, did the error come from vm code directly, or from something in the adaptor's path?).

Locally, this check fails - the runtime does not think the error comes from the adaptor. I think that's because the error comes from axios, with a path like /tmp/openfn/worker/repo/node_modules/axios/dist/node/axios.cjs:1268:12. Axios isn't saved under the adaptor's node_modules; it sits at the top level of the repo's node_modules.

So if I broaden this test to say "it's an adaptor error if the error came out of the repo", I can reproduce the crash locally.
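To make the two checks concrete, here is a sketch contrasting a narrow "is the frame inside the adaptor's folder?" test with the broader "is the frame anywhere in the repo?" test. It is not the runtime's actual code: `stackPaths`, `isAdaptorErrorNarrow` and `isAdaptorErrorBroad` are hypothetical, and the stack frames (including the `handler` frame name) are synthetic. The directories come from the logs above; the nested-axios layout for prod is an assumption, since the prod stack paths are truncated.

```typescript
// Hypothetical sketch, not the runtime's real classification code.
const REPO_DIR = "/tmp/openfn/worker/repo";
const ADAPTOR_DIR = `${REPO_DIR}/node_modules/@openfn/language-kobotoolbox_2.4.3`;

// Extract absolute file paths from V8 stack frames such as
// "    at handler (/some/file.cjs:1268:12)" or "    at /some/file.cjs:1:2".
function stackPaths(stack: string): string[] {
  const paths: string[] = [];
  for (const line of stack.split("\n")) {
    const match = line.match(/\(?(\/[^():]+):\d+:\d+\)?\s*$/);
    if (match) paths.push(match[1]);
  }
  return paths;
}

// Narrow: only frames inside the adaptor's own folder count as adaptor code.
function isAdaptorErrorNarrow(stack: string, adaptorDir: string): boolean {
  return stackPaths(stack).some((p) => p.startsWith(adaptorDir + "/"));
}

// Broad: any frame inside the repo counts, including hoisted dependencies.
function isAdaptorErrorBroad(stack: string, repoDir: string): boolean {
  return stackPaths(stack).some((p) => p.startsWith(repoDir + "/"));
}

// Synthetic stacks; the frame name `handler` is made up.
const header = "AxiosError: Request failed with status code 401";
// axios installed inside the adaptor's node_modules (assumed prod layout)
const nestedStack = `${header}\n    at handler (${ADAPTOR_DIR}/node_modules/axios/dist/node/axios.cjs:1268:12)`;
// axios hoisted to the top of the repo (the local layout)
const hoistedStack = `${header}\n    at handler (${REPO_DIR}/node_modules/axios/dist/node/axios.cjs:1268:12)`;

console.log(isAdaptorErrorNarrow(nestedStack, ADAPTOR_DIR)); // → true
console.log(isAdaptorErrorNarrow(hoistedStack, ADAPTOR_DIR)); // → false
console.log(isAdaptorErrorBroad(hoistedStack, REPO_DIR)); // → true
```

The narrow check flips depending on where npm happened to place axios, which would explain the prod/local difference; the broad check classifies both layouts the same way.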

I can't quite remember why I'm being so careful about this adaptor error thing. I think I'm just paranoid really - if it's not a vm error, it's probably an adaptor error, right? I just don't want to track runtime errors (because that should filter through to a crash).

I don't really understand why this is different locally and in production. But I suppose if the worker's repo in production is structured differently - like if axios is saved under the adaptor rather than at the top of the repo - it'll trip this. So I think it's coincidence: it depends on which adaptors were installed previously.

So, anyway, two learnings:

  • My "did this error come from an adaptor?" tests are too narrow, and need to change to "did this error come from the repo?"
  • The error itself includes unserializable stuff, and that's what's triggering this clone error.

@josephjclark
Copy link
Collaborator Author

Pausing for today. Struggling a bit to work out exactly how to reproduce this DataCloneError in the engine. First attempt to build a unit test around it did not seem to work :(

@josephjclark
Collaborator Author

Came back to try something - this should repro on common:

```js
fn((state) => {
  // `message` is an object holding a function, which can't be cloned across threads
  throw { message: { fn: () => {} } };
});
```

The message bit is important because this is what happens inside the engine:

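```ts
// Excerpt: `tasks`, `publish` and the ENGINE_* event names are defined elsewhere in the engine
const run = (task: string, args: any[]) => {
  tasks[task](...args)
    .then((result) => {
      publish(ENGINE_RESOLVE_TASK, {
        result,
      });
    })
    .catch((e) => {
      // Only these three fields are sent back through the thread.
      // If e.message is an object containing a function, cloning this payload fails.
      publish(ENGINE_REJECT_TASK, {
        error: {
          severity: e.severity || 'crash',
          message: e.message,
          type: e.type || e.name,
        },
      });
    });
};
```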

Basically we call the runtime, catch any error it might throw, and pass { severity, message, type } through the thread. If message is an object and contains something non-serializable, it's game over.
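The same failure can be reproduced outside the engine with a bare `MessageChannel` from `node:worker_threads`, assuming `publish` ultimately posts its payload over a thread port. This sketch mirrors the `{ severity, message, type }` shape from the excerpt above; `tryPost` is a hypothetical helper. Because `postMessage` serializes synchronously, the DataCloneError surfaces as a thrown exception at the call site.

```typescript
import { MessageChannel } from "node:worker_threads";

// Mirrors the payload shape published with ENGINE_REJECT_TASK.
type RejectPayload = {
  error: { severity: unknown; message: unknown; type: unknown };
};

// Hypothetical helper: returns null if the payload was posted, else the error name.
function tryPost(payload: RejectPayload): string | null {
  const { port1 } = new MessageChannel();
  try {
    port1.postMessage(payload); // structured-clones the payload synchronously
    return null;
  } catch (e) {
    return (e as { name?: string }).name ?? "unknown";
  } finally {
    port1.close(); // closes both ends of the channel
  }
}

// What the excerpt builds for `throw { message: { fn: () => {} } }`:
// severity falls back to 'crash', type/name are missing, message is an object.
const thrown: Record<string, unknown> = { message: { fn: () => {} } };
const payload: RejectPayload = {
  error: {
    severity: thrown.severity || "crash",
    message: thrown.message,
    type: thrown.type || thrown.name,
  },
};

console.log(tryPost(payload)); // → DataCloneError
console.log(
  tryPost({ error: { severity: "crash", message: "Request failed with status code 401", type: "AxiosError" } }),
); // → null
```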

What happens to my error classes with { fix, message, description } etc? Well, they'll be logged to console (safely) and written to state (safely). So all this really means is that the error object sent from the runtime to Lightning might lack detail: Lightning's error object may be incomplete, but everything else should be in place.

I'll have to do something to e.message to ensure it's safe. Or maybe I can just catch the error and say "error serializing error object", because the real error details should be properly logged anyway.
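One way to do the second option is to pre-check the payload and swap in a generic message when it can't be cloned. This is a hypothetical sketch (`toSafeTaskError` is not an engine function). It assumes that a value which survives `structuredClone` will also survive `postMessage`, and it keeps only string `severity` and `type` values in the fallback so the fallback itself is always cloneable.

```typescript
type TaskError = { severity: unknown; message: unknown; type: unknown };

const FALLBACK_MESSAGE = "error serializing error object";

const asString = (v: unknown): string | undefined =>
  typeof v === "string" ? v : undefined;

// Hypothetical helper: build the { severity, message, type } payload from
// whatever the runtime threw, guaranteeing it can cross the thread boundary.
function toSafeTaskError(thrown: unknown): TaskError {
  const e = (thrown ?? {}) as Record<string, unknown>;
  const error: TaskError = {
    severity: e.severity || "crash",
    message: e.message,
    type: e.type || e.name,
  };
  try {
    structuredClone(error); // cheap pre-check for what postMessage will do
    return error;
  } catch {
    // Full details were already logged and written to state, so
    // Lightning gets a generic message instead of the run crashing.
    return {
      severity: asString(error.severity) ?? "crash",
      message: FALLBACK_MESSAGE,
      type: asString(error.type),
    };
  }
}

console.log(toSafeTaskError({ message: { fn: () => {} } }).message); // → error serializing error object
console.log(toSafeTaskError(new Error("boom")).message); // → boom
```

The trade-off matches the reasoning above: well-formed errors pass through untouched, and only the unserializable case loses detail on the way to Lightning.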

josephjclark mentioned this issue Feb 13, 2025
github-project-automation bot moved this from New Issues to Done in v2 Feb 13, 2025
Labels
None yet
Projects
Status: Done

2 participants