
Fail to download full folder with 15K images and 7K text files #328

Closed
dbickson opened this issue Jul 7, 2023 · 17 comments
Labels
bug Something isn't working p2 This is a standard priority issue

Comments


dbickson commented Jul 7, 2023

Describe the bug

We are following the example by compiling the s3 demo copy program and running it on ubuntu 20.04 using the command line:
nohup time aws-c-s3/build/samples/s3/s3 cp s3://vl-sample-dataset-kitti/Kitti/ ~/Kitti --region us-east-2 &

The program runs without error, but the number of received files is around 11K, meaning roughly half the files are missing.

# Kitti is the folder of received files with s3 c demo
ubuntu@ip-172-31-30-217:~/Kitti$ du -sh .
6.1G	.
ubuntu@ip-172-31-30-217:~/Kitti$ find . -type f | wc
  11647   11647  391999

# Kitti2 is the data downloaded using aws s3 sync
ubuntu@ip-172-31-30-217:/mnt/data/crtsdk$ find ~/Kitti2/ | wc
  22487   22487 1161674
ubuntu@ip-172-31-30-217:/mnt/data/crtsdk$ du -sh ~/Kitti2
12G	/home/ubuntu/Kitti2

Expected Behavior

All files should be copied locally

Current Behavior

Only 50% of the files are copied, there is no error.
The machine has enough disk space for the copy:

ubuntu@ip-172-31-30-217:~$ df -k .
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/root      304681132 245898460  58766288  81% /
ubuntu@ip-172-31-30-217:~$

Attached below are the run logs: listing the bucket via aws-c-s3 (the file count is correctly 22K), listing via the aws s3 command (again 22K), and the full output of the run that copied only 11K files.
We have opened the bucket permissions so you can try on your own.

logs.zip

s3_exec.zip

Reproduction Steps

nohup time aws-c-s3/build/samples/s3/s3 cp s3://vl-sample-dataset-kitti/Kitti/ ~/Kitti --region us-east-2 &

Possible Solution

No response

Additional Information/Context

Interestingly, when counting the download printouts with
grep "download: s3://vl-sample-dataset-kitti/Kitti/raw" nohup.out | sort -u | wc
I see somewhere between 34K and 44K printouts. Maybe due to multithreading?

aws-c-s3 version used

latest from repo compiled July 6, 2023

Compiler and version used

ubuntu@ip-172-31-30-217:~/Kitti$ cmake --version
cmake version 3.16.3
CMake suite maintained and supported by Kitware (kitware.com/cmake).

Operating System and version

ubuntu@ip-172-31-30-217:~/Kitti$ uname -a
Linux ip-172-31-30-217 5.15.0-1039-aws #44~20.04.1-Ubuntu SMP Thu Jun 22 12:21:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
The instance was a t2.xlarge.

@dbickson dbickson added the bug Something isn't working label Jul 7, 2023

dbickson commented Jul 7, 2023

filelist.zip

Those are the full file lists as downloaded by aws s3 sync vs the s3 C demo (sorted). As can be seen, many files are missing, without any specific pattern.
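For anyone reproducing this, the missing set can be computed from the two sorted lists with comm; the file names and list contents below are illustrative stand-ins, not the real dataset:

```shell
# Demo data standing in for the two sorted file lists (hypothetical names):
# sync.txt = files reported by `aws s3 sync`, crt.txt = files written by the s3 demo.
printf 'Kitti/a.png\nKitti/b.png\nKitti/c.txt\n' > sync.txt
printf 'Kitti/a.png\nKitti/c.txt\n' > crt.txt

# comm -23 prints lines unique to the first (sorted) file,
# i.e. objects that were listed but never saved to disk.
comm -23 sync.txt crt.txt
# → Kitti/b.png
```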

waahm7 commented Jul 7, 2023

@dbickson Thank you for creating the issue with the details. We will take a look and let you know as soon as we have any updates.

@waahm7 waahm7 added the p2 This is a standard priority issue label Jul 7, 2023

TingDaoK commented Jul 7, 2023

We have opened the bucket permissions so you can try on your own.

I still got an AccessDenied from s3://vl-sample-dataset-kitti/Kitti/


TingDaoK commented Jul 7, 2023

I tried to create a bucket with the same file directory, but with fake files each containing a single character.

I cannot reproduce the error you see either. I successfully downloaded all 22480 files with ./s3 cp -v ERROR -r us-west-2 s3://test-bucket-asd/raw/ ./test &>cp-out

I saw something in the output that's suspicious

22.20user 17.14system 0:59.49elapsed 66%CPU (0avgtext+0avgdata 182988maxresident)k
16096inputs+12727272outputs (64major+161888minor)pagefaults 0swaps

And after that, you did another download of the same file? I am not sure why?


dbickson commented Jul 7, 2023

Hi @TingDaoK, can we do a short Zoom session? I can share my screen; it takes less than a minute to reproduce the issue on my side.

Screen Shot 2023-07-07 at 21 57 51

According to our management console, the bucket is public.

Can the issue originate from the fact I am using t2.xlarge instance?


TingDaoK commented Jul 7, 2023

Not sure. And from your nohup.out, we can actually see all 22480 files are downloaded. So, I guess something went wrong writing to the disk?


dbickson commented Jul 7, 2023

Hi @TingDaoK, I have granted additional bucket permissions; can you try accessing it again?


dbickson commented Jul 7, 2023

I agree: the nohup output contains exactly 22,480 unique download filename printouts, which is correct. But not all of them are saved to disk.
grep download nohup.out | sort -u > 1
cut -f 2 1 -d ' '> 2
sort -u 2 | wc
22480 22480 1498642
ubuntu@ip-172-31-30-217:/mnt/data/wakar$ find ~/Kitti -type f | wc
11287 11287 571716


dbickson commented Jul 7, 2023

Hi @TingDaoK, I am a little closer to a breakthrough. I have added the following trace:

if (!transfer_ctx->output_sink) {
    printf("Failed to download to %s\n", (char *)file_path);
    return AWS_OP_ERR;
}

I see a lot of failures here which may explain the missing files. Please advise?


TingDaoK commented Jul 7, 2023

Can you run with -v ERROR and attach the log for us?


TingDaoK commented Jul 7, 2023

HI @TingDaoK I have given additional bucket permissions, can you try accessing again.

Cool. Now I have access; I'll try to reproduce it with your bucket.


dbickson commented Jul 7, 2023

Some more hints:

aws-error:45(AWS_ERROR_MAX_FDS_EXCEEDED)
[ERROR] [2023-07-07T19:56:51Z] [00007f0146ffd700] [common-io] - static: Failed to open file. path:'/home/ubuntu/Kitti/raw/testing/image_2/005993.png' mode:'wb' errno:24 aws-error:45(AWS_ERROR_MAX_FDS_EXCEEDED)
[ERROR] [2023-07-07T19:56:51Z] [00007f0146ffd700] [common-io] - static: Failed to open file. path:'/home/ubuntu/Kitti/raw/testing/image_2/005994.png' mode:'wb' errno:24 aws-error:45(AWS_ERROR_MAX_FDS_EXCEEDED)
[ERROR] [2023-07-07T19:56:51Z] [00007f0146ffd700] [common-io] - static: Failed to open file. path:'/home/ubuntu/Kitti/raw/testing/image_2/005995.png' mode:'wb' errno:24 aws-error:45(AWS_ERROR_MAX_FDS_EXCEEDED)
[ERROR] [2023-07-07T19:56:51Z] [00007f0146ffd700] [common-io] - static: Failed to open file. path:'/home/ubuntu/Kitti/raw/testing/image_2/005996.png' mode:'wb' errno:24 aws-error:45(AWS_ERROR_MAX_FDS_EXCEEDED)
[ERROR] [2023-07-07T19:56:51Z] [00007f0146ffd700] [common-io] - static: Failed to open file. path:'/home/ubuntu/Kitti/raw/testing/image_2/005997.png' mode:'wb' errno:24 aws-error:45(AWS_ERROR_MAX_FDS_EXCEEDED)
[ERROR] [2023-07-07T19:56:51Z] [00007f0146ffd700] [common-io] - static: Failed to open file. path:'/home/ubuntu/Kitti/raw/testing/image_2/005998.png' mode:'wb' errno:24 aws-error:45(AWS_ERROR_MAX_FDS_EXCEEDED)
[ERROR] [2023-07-07T19:56:51Z] [00007f0146ffd700] [common-io] - static: Failed to open file. path:'/home/ubuntu/Kitti/raw/testing/image_2/005999.png' mode:'wb' errno:24 aws-error:45(AWS_ERROR_MAX_FDS_EXCEEDED)
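errno 24 is EMFILE: the process has hit its per-process open-file-descriptor limit. A quick way to confirm on Linux (illustrative commands, not part of the original report):

```shell
# Current soft limit on open file descriptors for this shell:
ulimit -n

# Number of descriptors a process currently holds (via Linux /proc).
# $$ is this shell's PID; substitute the PID of the s3 demo to watch
# it approach the limit during a large download.
ls /proc/$$/fd | wc -l
```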


dbickson commented Jul 7, 2023

Hi @TingDaoK, what is the correct way to increase the open-file limit? I tried with ulimit but it did not work here.


TingDaoK commented Jul 7, 2023

Aha, so it's basically too many open files hitting the system's file-descriptor limit.

Easiest way maybe https://stackoverflow.com/questions/11342167/how-to-increase-ulimit-on-amazon-ec2-instance
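For reference, the usual sequence looks like this (values illustrative; without root, the soft limit can only be raised up to the hard limit):

```shell
# 1. Check current limits: the soft limit can be raised only up to the hard limit.
ulimit -Sn   # soft limit
ulimit -Hn   # hard limit

# 2. Raise the soft limit for this shell session; child processes inherit it.
ulimit -n "$(ulimit -Hn)"
ulimit -n    # verify

# 3. Re-run the download from this same shell, e.g.:
#    nohup time aws-c-s3/build/samples/s3/s3 cp s3://vl-sample-dataset-kitti/Kitti/ ~/Kitti --region us-east-2 &

# For a persistent, system-wide change, raise the hard limit in
# /etc/security/limits.conf and start a new login session
# (as described in the Stack Overflow link above).
```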


dbickson commented Jul 8, 2023

Issue is solved and now all files are received.
My only suggestion is to make -v ERROR the default, since I was expecting some feedback about the problem, yet the run appeared to finish fine.
Thanks again for your great support!

TingDaoK commented:

I have an ongoing PR #330 to improve the error handling for samples. Should actually error out in your case with the change.

graebm added a commit that referenced this issue Jul 18, 2023
**Issue**
The XML API was hard to use right, leading to bugs like this: #328

**Description of changes:**
- Adapt to API changes from: awslabs/aws-c-common#1043
- Break up node traversal functions, to ensure we're processing the correct XML elements.
    - Previously, the same callback was used for all XML elements. This could cause errors if an element with the same name occurred at different parts of the document tree.
- Improved error checking
    - Previously, many calls to `aws_xml_node_as_body()` weren't being checked for error.
- Replace ~~aws_xml_get_top_level_tag()~~ and ~~aws_xml_get_top_level_tag_with_root_name()~~ with `aws_xml_get_body_at_path()`
   - ~~aws_xml_get_top_level_tag()~~ didn't check the name of the root node
   - ~~aws_xml_get_top_level_tag_with_root_name()~~ was clunky to use (IMHO)
   - so replace them with an API that can retrieve an element at any depth (not just 2), checking names the whole way, with a nicer interface (IMHO)
   - the new function gives `aws_byte_cursor` instead of `aws_string`; the user was usually just deleting the string afterwards, which made their error handling more complicated
- Trivial stuff:
    - Remove unused functions ~~aws_s3_list_objects_operation_new()~~ and ~~aws_s3_initiate_list_parts()~~
    - `aws_replace_quote_entities()` returns `aws_byte_buf` by value, instead of as out-param
    - Some functions take `aws_byte_cursor` by value, instead taking `aws_string *` or `aws_byte_buf *` or `aws_byte_cursor *` by pointer
TingDaoK commented:

We have updated the error handling in #332. It should properly error out now.
