
Fail to download full folder with 15K images and 7K text files #328

Closed
dbickson opened this issue Jul 7, 2023 · 17 comments
Labels
bug Something isn't working p2 This is a standard priority issue

Comments


dbickson commented Jul 7, 2023

Describe the bug

We are following the example by compiling the s3 demo copy program and running it on ubuntu 20.04 using the command line:
nohup time aws-c-s3/build/samples/s3/s3 cp s3://vl-sample-dataset-kitti/Kitti/ ~/Kitti --region us-east-2 &

The program runs without error, but the number of received files is around 11K, meaning roughly half the files are missing.

# Kitti is the folder of received files with s3 c demo
ubuntu@ip-172-31-30-217:~/Kitti$ du -sh .
6.1G	.
ubuntu@ip-172-31-30-217:~/Kitti$ find . -type f | wc
  11647   11647  391999

# Kitti2 is the data downloaded using aws s3 sync
ubuntu@ip-172-31-30-217:/mnt/data/crtsdk$ find ~/Kitti2/ | wc
  22487   22487 1161674
ubuntu@ip-172-31-30-217:/mnt/data/crtsdk$ du -sh ~/Kitti2
12G	/home/ubuntu/Kitti2

Expected Behavior

All files should be copied locally

Current Behavior

Only 50% of the files are copied, there is no error.
The machine has enough disk space for the copy:

ubuntu@ip-172-31-30-217:~$ df -k .
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/root      304681132 245898460  58766288  81% /
ubuntu@ip-172-31-30-217:~$

Attached below are the run logs: listing the bucket via aws-c-s3 (the file count is correctly 22K), listing via the aws s3 command (again 22K), and the full output of the run that copied only 11K files.
We have opened the bucket permissions so you can try on your own.

logs.zip

s3_exec.zip

Reproduction Steps

nohup time aws-c-s3/build/samples/s3/s3 cp s3://vl-sample-dataset-kitti/Kitti/ ~/Kitti --region us-east-2 &

Possible Solution

No response

Additional Information/Context

Interestingly, when counting the download printouts with
grep "download: s3://vl-sample-dataset-kitti/Kitti/raw" nohup.out | sort -u | wc
I see somewhere between 34K and 44K printouts. Maybe due to multithreading?

aws-c-s3 version used

latest from repo compiled July 6, 2023

Compiler and version used

ubuntu@ip-172-31-30-217:~/Kitti$ cmake --version
cmake version 3.16.3
CMake suite maintained and supported by Kitware (kitware.com/cmake).

Operating System and version

ubuntu@ip-172-31-30-217:~/Kitti$ uname -a
Linux ip-172-31-30-217 5.15.0-1039-aws #44~20.04.1-Ubuntu SMP Thu Jun 22 12:21:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
The instance was a t2.xlarge.

@dbickson dbickson added the bug Something isn't working label Jul 7, 2023

dbickson commented Jul 7, 2023

filelist.zip

Those are the full file lists as downloaded by aws s3 sync vs the s3 C demo (sorted). As can be seen, many files are missing, without any specific pattern.
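For anyone reproducing this, the missing set can be computed from the two sorted lists with comm; the file names and list contents below are illustrative stand-ins, not the real dataset:

```shell
# Demo data standing in for the two sorted file lists (hypothetical names):
# sync.txt = files reported by `aws s3 sync`, crt.txt = files written by the s3 demo.
printf 'Kitti/a.png\nKitti/b.png\nKitti/c.txt\n' > sync.txt
printf 'Kitti/a.png\nKitti/c.txt\n' > crt.txt

# comm -23 prints lines unique to the first (sorted) file,
# i.e. objects that were listed but never saved to disk.
comm -23 sync.txt crt.txt
# → Kitti/b.png
```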

waahm7 commented Jul 7, 2023

@dbickson Thank you for creating the issue with the details. We will take a look and let you know as soon as we have any updates.

@waahm7 waahm7 added the p2 This is a standard priority issue label Jul 7, 2023

TingDaoK commented Jul 7, 2023

We have opened the bucket permissions so you can try on your own.

I still got an AccessDenied from s3://vl-sample-dataset-kitti/Kitti/


TingDaoK commented Jul 7, 2023

I tried to create a bucket with the same file directory, but with fake files each containing a single character.

I cannot reproduce the error you see either. I successfully downloaded all 22480 files with ./s3 cp -v ERROR -r us-west-2 s3://test-bucket-asd/raw/ ./test &>cp-out

I saw something in the output that's suspicious

22.20user 17.14system 0:59.49elapsed 66%CPU (0avgtext+0avgdata 182988maxresident)k
16096inputs+12727272outputs (64major+161888minor)pagefaults 0swaps

And after that, you did another download of the same file? I am not sure why?


dbickson commented Jul 7, 2023

Hi @TingDaoK, can we do a short Zoom session? I can share my screen; it takes less than a minute to reproduce the issue on my side.

Screen Shot 2023-07-07 at 21 57 51

According to our management console, the bucket is public.

Can the issue originate from the fact I am using t2.xlarge instance?


TingDaoK commented Jul 7, 2023

Not sure. And from your nohup.out, we can actually see all 22480 files are downloaded. So, I guess something went wrong writing to the disk?


dbickson commented Jul 7, 2023

Hi @TingDaoK, I have granted additional bucket permissions; can you try accessing it again?


dbickson commented Jul 7, 2023

I agree: the nohup output contains exactly 22,480 unique download filename printouts, which is correct. But not all of them are saved to disk.
grep download nohup.out | sort -u > 1
cut -f 2 1 -d ' '> 2
sort -u 2 | wc
22480 22480 1498642
ubuntu@ip-172-31-30-217:/mnt/data/wakar$ find ~/Kitti -type f | wc
11287 11287 571716


dbickson commented Jul 7, 2023

Hi @TingDaoK, I am a little closer to a breakthrough. I have added the following trace:

if (!transfer_ctx->output_sink) {
    printf("Failed to download to %s\n", (char *)file_path);
    return AWS_OP_ERR;
}

I see a lot of failures here which may explain the missing files. Please advise?


TingDaoK commented Jul 7, 2023

Can you run with -v ERROR and attach the log for us?


TingDaoK commented Jul 7, 2023

HI @TingDaoK I have given additional bucket permissions, can you try accessing again.

Cool. Now I have access; I'll try to reproduce it with your bucket.


dbickson commented Jul 7, 2023

Some more hints:

aws-error:45(AWS_ERROR_MAX_FDS_EXCEEDED)
[ERROR] [2023-07-07T19:56:51Z] [00007f0146ffd700] [common-io] - static: Failed to open file. path:'/home/ubuntu/Kitti/raw/testing/image_2/005993.png' mode:'wb' errno:24 aws-error:45(AWS_ERROR_MAX_FDS_EXCEEDED)
[ERROR] [2023-07-07T19:56:51Z] [00007f0146ffd700] [common-io] - static: Failed to open file. path:'/home/ubuntu/Kitti/raw/testing/image_2/005994.png' mode:'wb' errno:24 aws-error:45(AWS_ERROR_MAX_FDS_EXCEEDED)
[ERROR] [2023-07-07T19:56:51Z] [00007f0146ffd700] [common-io] - static: Failed to open file. path:'/home/ubuntu/Kitti/raw/testing/image_2/005995.png' mode:'wb' errno:24 aws-error:45(AWS_ERROR_MAX_FDS_EXCEEDED)
[ERROR] [2023-07-07T19:56:51Z] [00007f0146ffd700] [common-io] - static: Failed to open file. path:'/home/ubuntu/Kitti/raw/testing/image_2/005996.png' mode:'wb' errno:24 aws-error:45(AWS_ERROR_MAX_FDS_EXCEEDED)
[ERROR] [2023-07-07T19:56:51Z] [00007f0146ffd700] [common-io] - static: Failed to open file. path:'/home/ubuntu/Kitti/raw/testing/image_2/005997.png' mode:'wb' errno:24 aws-error:45(AWS_ERROR_MAX_FDS_EXCEEDED)
[ERROR] [2023-07-07T19:56:51Z] [00007f0146ffd700] [common-io] - static: Failed to open file. path:'/home/ubuntu/Kitti/raw/testing/image_2/005998.png' mode:'wb' errno:24 aws-error:45(AWS_ERROR_MAX_FDS_EXCEEDED)
[ERROR] [2023-07-07T19:56:51Z] [00007f0146ffd700] [common-io] - static: Failed to open file. path:'/home/ubuntu/Kitti/raw/testing/image_2/005999.png' mode:'wb' errno:24 aws-error:45(AWS_ERROR_MAX_FDS_EXCEEDED)
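errno 24 is EMFILE: the process has hit its per-process open-file-descriptor limit. A quick way to confirm on Linux (illustrative commands, not part of the original report):

```shell
# Current soft limit on open file descriptors for this shell:
ulimit -n

# Number of descriptors a process currently holds (via Linux /proc).
# $$ is this shell's PID; substitute the PID of the s3 demo to watch
# it approach the limit during a large download.
ls /proc/$$/fd | wc -l
```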


dbickson commented Jul 7, 2023

Hi @TingDaoK, what is the correct way to increase the open-file limit? I tried with ulimit but it did not work here.


TingDaoK commented Jul 7, 2023

Aha, so it's basically too many open files hitting the system's file-descriptor limit.

Easiest way maybe https://stackoverflow.com/questions/11342167/how-to-increase-ulimit-on-amazon-ec2-instance
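For reference, the usual sequence looks like this (values illustrative; without root, the soft limit can only be raised up to the hard limit):

```shell
# 1. Check current limits: the soft limit can be raised only up to the hard limit.
ulimit -Sn   # soft limit
ulimit -Hn   # hard limit

# 2. Raise the soft limit for this shell session; child processes inherit it.
ulimit -n "$(ulimit -Hn)"
ulimit -n    # verify

# 3. Re-run the download from this same shell, e.g.:
#    nohup time aws-c-s3/build/samples/s3/s3 cp s3://vl-sample-dataset-kitti/Kitti/ ~/Kitti --region us-east-2 &

# For a persistent, system-wide change, raise the hard limit in
# /etc/security/limits.conf and start a new login session
# (as described in the Stack Overflow link above).
```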


dbickson commented Jul 8, 2023

Issue is solved and now all files are received.
My only suggestion is to make -v ERROR the default, since I was expecting some feedback about the problem, yet the run appeared to finish fine.
Thanks again for your great support!

TingDaoK commented:

I have an ongoing PR #330 to improve the error handling for samples. Should actually error out in your case with the change.

graebm added a commit that referenced this issue Jul 18, 2023
**Issue**
The XML API was hard to use right, leading to bugs like this: #328

**Description of changes:**
- Adapt to API changes from: awslabs/aws-c-common#1043
- Break up node traversal functions, to ensure we're processing the correct XML elements.
    - Previously, the same callback was used for all XML elements. This could cause errors if an element with the same name occurred at different parts of the document tree.
- Improved error checking
    - Previously, many calls to `aws_xml_node_as_body()` weren't being checked for error.
- Replace ~~aws_xml_get_top_level_tag()~~ and ~~aws_xml_get_top_level_tag_with_root_name()~~ with `aws_xml_get_body_at_path()`
   - ~~aws_xml_get_top_level_tag()~~ didn't check the name of the root node
   - ~~aws_xml_get_top_level_tag_with_root_name()~~ was clunky to use (IMHO)
   - so replace them with an API that can retrieve an element at any depth (not just 2), checking names the whole way, with a nicer interface (IMHO)
   - the new function gives `aws_byte_cursor` instead of `aws_string`; the user was usually just deleting the string afterwards, which made their error handling more complicated
- Trivial stuff:
    - Remove unused functions ~~aws_s3_list_objects_operation_new()~~ and ~~aws_s3_initiate_list_parts()~~
    - `aws_replace_quote_entities()` returns `aws_byte_buf` by value, instead of as out-param
    - Some functions take `aws_byte_cursor` by value, instead taking `aws_string *` or `aws_byte_buf *` or `aws_byte_cursor *` by pointer
TingDaoK commented:

We have updated the error handling in #332. It should properly error out now.
