Uploading a stream to a block blob extremely slow #29

Open
connor4312 opened this issue Nov 28, 2018 · 3 comments
Labels: enhancement (New feature or request), question (Further information is requested)

connor4312 commented Nov 28, 2018

Which service(blob, file, queue, table) does this issue concern?

Blob

Which version of the SDK was used?

"@azure/storage-blob": "^10.2.0-preview"

What's the Node.js/Browser version?

Node.js

What problem was encountered?

Storage streams never drained / flushed.

Steps to reproduce the issue?

For one scenario we're using the tar package and uploading streams directly from tar archives, using the entry event.

> Emits 'entry' events with tar.ReadEntry objects, which are themselves readable streams that you can pipe wherever. Each entry will not emit until the one before it is flushed through, so make sure to either consume the data (with on('data', ...) or .pipe(...)) or throw it away with .resume() to keep the stream flowing.

A few files upload fine, but then the process gets stuck and takes several minutes before slowly continuing on to upload more files. This indicates that BlockBlobURL.upload does not consume the file stream. The promise returned from blobUrl.upload also does not resolve.

Our code is something like this:

const parser = new Parse();
createReadStream(tarball).pipe(parser);

parser.on('entry', file => {
  blobUrl.upload(Aborter.none, () => file, file.size, {
    blobHTTPHeaders: {
      blobContentType: mime.getType(file.path)!,
    },
  });
});

There's a bit more stuff in there around handling promises and such, but that's the gist of it.
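
The stall can be modeled without tar or Azure. The following is a minimal sketch using only Node's built-in stream module; `makeEntry`, `endsWithin`, and `demo` are hypothetical helpers, and the in-memory stream merely stands in for a tar entry. It shows that a readable stream nobody reads from never reaches 'end', while calling .resume() (or otherwise consuming it) lets it finish, which is what the parser waits for before emitting the next entry.

```typescript
import { Readable } from "node:stream";

// Hypothetical stand-in for a tar entry: a small in-memory readable stream.
function makeEntry(): Readable {
  return Readable.from([Buffer.from("part 1"), Buffer.from("part 2")]);
}

// Resolves true if the stream emits 'end' within `ms` milliseconds, else false.
// Listening for 'end' does not by itself start the stream flowing.
function endsWithin(stream: Readable, ms: number): Promise<boolean> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve(false), ms);
    stream.once("end", () => {
      clearTimeout(timer);
      resolve(true);
    });
  });
}

async function demo(): Promise<{ untouched: boolean; resumed: boolean }> {
  const untouched = makeEntry(); // nobody reads this one
  const resumed = makeEntry();

  // Attach the 'end' listeners first, then start draining one stream.
  const untouchedEnds = endsWithin(untouched, 200);
  const resumedEnds = endsWithin(resumed, 200);
  resumed.resume(); // discard the data, but keep the stream flowing

  return { untouched: await untouchedEnds, resumed: await resumedEnds };
}
```

In this model, an upload call that never reads from the entry behaves like the `untouched` stream: the entry never ends, so a parser that waits for it has nothing to emit next.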

Using storage.createBlockBlobFromStreamAsync from the previous SDK worked fine in this scenario, and also manually calling uploadStreamToBlockBlob works...

uploadStreamToBlockBlob(Aborter.none, file, blobUrl, 2 * 1024 * 1024, 20, {
  blobHTTPHeaders: {
    blobContentType: mime.getType(file.path),
  },
});

...but the more ergonomic blobUrl.upload does not.

Have you found a mitigation/solution?

Above ^

@XiaoningLiu XiaoningLiu self-assigned this Nov 29, 2018
@XiaoningLiu XiaoningLiu added the question Further information is requested label Nov 29, 2018
XiaoningLiu (Member) commented Nov 29, 2018

Hi @connor4312

I think the main reason for the slow upload is that BlockBlobURL.upload does not upload in parallel; it is a convenience-layer upload method.

For high performance uploading, please use uploadStreamToBlockBlob and other public methods provided in highlevel.ts. They fully support parallel upload.

BTW, how do you know that "BlockBlobURL.upload does not consume the file stream"? uploadStreamToBlockBlob internally calls into BlockBlobURL.upload to upload chunks.
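
For readers unfamiliar with what a chunked, parallel upload involves, here is a rough sketch of the general idea using only Node's stream module. It is not the SDK's implementation: `splitIntoBlocks`, `uploadBlocks`, and the `uploadBlock` callback are hypothetical names, and the callback stands in for whatever call actually sends one block. The `blockSize` and `maxParallel` parameters play a role similar to the two numeric arguments passed to uploadStreamToBlockBlob earlier in this thread.

```typescript
import { Readable } from "node:stream";

// Re-chunk a stream into blocks of at most `blockSize` bytes.
async function* splitIntoBlocks(stream: Readable, blockSize: number): AsyncGenerator<Buffer> {
  let pending: Buffer = Buffer.alloc(0);
  for await (const chunk of stream) {
    const data: Buffer = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk);
    pending = Buffer.concat([pending, data]);
    while (pending.length >= blockSize) {
      yield pending.subarray(0, blockSize);
      pending = pending.subarray(blockSize);
    }
  }
  if (pending.length > 0) yield pending; // final, possibly shorter block
}

// Send blocks through a hypothetical `uploadBlock` callback, keeping at most
// `maxParallel` calls in flight. Resolves with the number of blocks sent.
async function uploadBlocks(
  stream: Readable,
  blockSize: number,
  maxParallel: number,
  uploadBlock: (block: Buffer, index: number) => Promise<void>
): Promise<number> {
  const inFlight = new Set<Promise<void>>();
  let index = 0;
  for await (const block of splitIntoBlocks(stream, blockSize)) {
    const task: Promise<void> = uploadBlock(block, index++).finally(() => {
      inFlight.delete(task);
    });
    inFlight.add(task);
    // Stop reading from the source until a slot frees up.
    if (inFlight.size >= maxParallel) await Promise.race(inFlight);
  }
  await Promise.all(inFlight);
  return index;
}
```

Because the loop keeps reading from the source whenever a slot is free, the source stream is drained steadily. Error handling is deliberately minimal here; a real uploader would also need to cancel or settle the remaining blocks when one fails.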

connor4312 (Author) commented Nov 29, 2018

> how do you know that "BlockBlobURL.upload does not consume the file stream"? uploadStreamToBlockBlob internally calls into BlockBlobURL.upload to upload chunks.

I can tell because the Parser stops emitting entry events. The quoted section of the tar docs says that each entry is not emitted until the one before it has been flushed through. Using .upload causes it to stall, but using uploadStreamToBlockBlob works, and just piping to a temp file on the filesystem also works.

> I think the main reason for the slow upload is that BlockBlobURL.upload does not upload in parallel; it is a convenience-layer upload method. For high performance uploading, please use uploadStreamToBlockBlob and other public methods provided in highlevel.ts. They fully support parallel upload.

Should this be the way it works? As a naive consumer I would expect blob.upload() to "just work". Instead I got mysterious timeouts, and only tried uploadStreamToBlockBlob by chance because I was looking through GitHub issues and stumbled upon a mention of it in another issue. It's very non-obvious that attempting parallelism with blob.upload() would fail in this way.

@XiaoningLiu XiaoningLiu added the enhancement New feature or request label Nov 30, 2018
XiaoningLiu (Member) commented

Hi @connor4312

Is the timeout error thrown by blobURL.upload? If blobURL.upload throws any errors, please share the full error message with us. You can also enable logging and share the logs with us for debugging.

  const pipeline = StorageURL.newPipeline(sharedKeyCredential, {
    logger: {
      log: console.log,
      minimumLogLevel: HttpPipelineLogLevel.INFO
    }
  });

In any case, you should not use blobUrl.upload in this scenario, because blobUrl.upload accepts a stream factory method instead of a stream. The stream factory method needs to return a new stream starting from offset 0 in the data source every time it is called. This is because blobUrl.upload retries when the network connection breaks, and each retry needs a new stream to start a new HTTP request. But the method () => file shares the existing file stream across calls. I'm not sure this is the root cause, but I cannot think of any other reason besides poor network speed that would stall an upload, because the readable stream is passed directly into axios (https://github.com/axios/axios), the underlying request module.

parser.on('entry', file => {
  blobUrl.upload(Aborter.none, () => file, file.size, {
    blobHTTPHeaders: {
      blobContentType: mime.getType(file.path)!,
    },
  });
});
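
The difference between a shared stream and a true stream factory can be illustrated with Node's stream module alone. This is a minimal sketch under stated assumptions: `readAll`, `bytesPerAttempt`, and `toReplayableFactory` are hypothetical helpers, and `bytesPerAttempt` only imitates a consumer that asks the factory for a stream on every attempt; it is not how the SDK is implemented. One possible workaround, shown here, is to buffer the one-shot entry once and hand out a fresh in-memory stream per call, at the cost of holding the whole entry in memory.

```typescript
import { Readable } from "node:stream";

// Drain a readable stream fully into one Buffer.
async function readAll(stream: Readable): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
}

// Hypothetical retrying consumer: every attempt asks the factory for a stream
// and records how many bytes it could read. A stream that has already ended
// or been destroyed has nothing left, so it is counted as 0 without reading.
async function bytesPerAttempt(factory: () => Readable, attempts: number): Promise<number[]> {
  const seen: number[] = [];
  for (let i = 0; i < attempts; i++) {
    const stream = factory();
    const spent = stream.readableEnded || stream.destroyed;
    seen.push(spent ? 0 : (await readAll(stream)).length);
  }
  return seen;
}

// Buffer a one-shot stream once, then return a factory that creates a new
// stream starting at offset 0 on every call.
async function toReplayableFactory(source: Readable): Promise<() => Readable> {
  const body = await readAll(source);
  return () => Readable.from([body]);
}
```

With a shared stream such as `() => file`, the second attempt in this sketch sees no data because the first attempt already consumed it; with the buffered factory, every attempt sees the full payload.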

We will make the documentation for blobURL.upload and the highlevel APIs clearer.

XiaoningLiu added a commit to XiaoningLiu/azure-storage-js-1 that referenced this issue Jan 2, 2019