Retrieving data from Glacier

These instructions assume that the project was archived using aws_backup.sh. This process is best done on an EC2 instance with a volume large enough to hold the archives and their extracted contents.
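Before starting, check that the volume has enough free space. A minimal sketch of how you might do that (the bucket and prefix below are placeholders; substitute the actual location of your archives):

# Check free space on the volume you plan to use
df -h ~

# Hypothetical example: estimate the total archive size before restoring
aws s3 ls s3://your-archive-bucket/your-archive-prefix/ --recursive --summarize --human-readable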

Create a temporary directory in which to unarchive the files

mkdir ~/ebs_tmp/
cd ~/ebs_tmp

Clone this repo

git clone https://github.com/broadinstitute/imaging-backup-scripts.git

Define variables

PROJECT_NAME=2015_10_05_DrugRepurposing_AravindSubramanian_GolubLab_Broad
BATCH_ID=2016_04_01_a549_48hr_batch1

Create a list of plates to be restored. Use the shortened plate name that appears throughout the rest of the profiling workflow, e.g. SQ00015147, not SQ00015147__2016-05-17T07_49_53-Measurement1:

echo "SQ00015147" > list_of_plates.txt

Run the retrieval process.

In this example, we retrieve only the backend (--get_backend). To retrieve only the images, use --get_images; to retrieve both, pass both flags.

cd imaging-backup-scripts
parallel \
  --results restore \
  -a ../list_of_plates.txt \
  ./glacier_restore.sh \
  --project_name ${PROJECT_NAME} \
  --batch_id ${BATCH_ID} \
  --plate_id {1} \
  --get_backend

The retrieval may take several hours. After a few hours, check the status to ensure that all files are available by running the same command as above, with the --check_status flag added:

parallel \
  --results restore \
  -a ../list_of_plates.txt \
  ./glacier_restore.sh \
  --project_name ${PROJECT_NAME} \
  --batch_id ${BATCH_ID} \
  --plate_id {1} \
  --get_backend \
  --check_status

This creates a stdout file per plate at restore/1/<plate_id>/stdout. If a restore request has been made and is still in progress, the response (in stdout) will be similar to the following:

> "Restore": "ongoing-request=\"true\""
> "StorageClass": "GLACIER"

After the restore is complete, the response is similar to the following:

> "Restore": "ongoing-request=\"false\", expiry-date=\"Sun, 13 Aug 2017 00:00:00 GMT\""

If no request has been made, the "Restore" key will be absent.
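Rather than inspecting each file by hand, you can list the plates whose restores are still in progress with a literal-string grep over all the stdout files (assuming the output format shown above):

# -F treats the pattern literally, including the escaped quotes; -l prints matching file names
grep -lF 'ongoing-request=\"true\"' restore/1/*/stdout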

Once all files have been restored, download the backend files from Glacier.

First, collect the URLs:

cd ~/ebs_tmp
parallel -a list_of_plates.txt "grep ^Download imaging-backup-scripts/restore/1/{1}/stdout|sed s,Download:,,1" > url_list.txt

Do the same for the MD5 checksum files:

parallel -a list_of_plates.txt "grep MD5Download imaging-backup-scripts/restore/1/{1}/stdout|sed s,MD5Download:,,1" > md5_url_list.txt

Next, download these files:

parallel -a url_list.txt aws s3 cp {1} .
parallel -a md5_url_list.txt aws s3 cp {1} .

Uncompress the files

Set TARSET to match the archive set you downloaded. For backend:

TARSET=backend

For images, illum, and analysis:

TARSET=images_illum_analysis

Then extract the archives:

parallel -a list_of_plates.txt tar -xvzf ${PROJECT_NAME}_${BATCH_ID}_{1}_${TARSET}.tar.gz

Verify the MD5 checksums

parallel -a list_of_plates.txt \
  "md5sum ${PROJECT_NAME}_${BATCH_ID}_{1}_${TARSET}.tar.gz > ${PROJECT_NAME}_${BATCH_ID}_{1}_${TARSET}.md5.local"
parallel -a list_of_plates.txt \
  diff \
  ${PROJECT_NAME}_${BATCH_ID}_{1}_${TARSET}.md5.local \
  ${PROJECT_NAME}_${BATCH_ID}_{1}_${TARSET}.md5 > md5_diffs.txt

Inspect md5_diffs.txt and confirm that there are no diffs.
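A scripted version of that check, which simply tests whether the diff output is empty:

# -s is true if the file exists and is non-empty, i.e. at least one checksum differed
if [ -s md5_diffs.txt ]; then
  echo "MD5 mismatches found; re-download the affected archives"
else
  echo "All checksums match"
fi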

Sync to S3 bucket (if you want to restore to the original location on s3://imaging-platform).

WARNING: Be cautious because this step overwrites files at the destination.
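To preview what would be transferred without writing anything, add the aws CLI's --dryrun flag to any of the sync commands below, e.g. for the backend sync:

parallel \
  -a list_of_plates.txt \
  aws s3 sync --dryrun \
  ${PROJECT_NAME}_${BATCH_ID}_{1}/${PROJECT_NAME}/workspace/backend/${BATCH_ID}/ \
  s3://imaging-platform/projects/${PROJECT_NAME}/workspace/backend/${BATCH_ID}/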

For backend:

parallel \
  -a list_of_plates.txt \
  aws s3 sync \
  ${PROJECT_NAME}_${BATCH_ID}_{1}/${PROJECT_NAME}/workspace/backend/${BATCH_ID}/ \
  s3://imaging-platform/projects/${PROJECT_NAME}/workspace/backend/${BATCH_ID}/

For analysis:

parallel \
  -a list_of_plates.txt \
  aws s3 sync \
  ${PROJECT_NAME}_${BATCH_ID}_{1}/${PROJECT_NAME}/workspace/analysis/${BATCH_ID}/ \
  s3://imaging-platform/projects/${PROJECT_NAME}/workspace/analysis/${BATCH_ID}/

For images:

parallel \
  -a list_of_plates.txt \
  aws s3 sync \
  ${PROJECT_NAME}_${BATCH_ID}_{1}/${PROJECT_NAME}/${BATCH_ID}/images/ \
  s3://imaging-platform/projects/${PROJECT_NAME}/${BATCH_ID}/images/

For illum:

parallel \
  -a list_of_plates.txt \
  aws s3 sync \
  ${PROJECT_NAME}_${BATCH_ID}_{1}/${PROJECT_NAME}/${BATCH_ID}/illum/ \
  s3://imaging-platform/projects/${PROJECT_NAME}/${BATCH_ID}/illum/
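Optionally, once you have verified that the restored data is intact on S3, reclaim the space by removing the temporary directory:

cd ~
rm -rf ~/ebs_tmp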