Create notebook to update collection hrefs #112

Merged
merged 5 commits into from
Apr 8, 2024
Changes from 4 commits
150 changes: 150 additions & 0 deletions transformation-scripts/update-hrefs.ipynb
@@ -0,0 +1,150 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Notebook to update hrefs in particular collections"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import boto3\n",
"import json"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"AWS_ACCESS_KEY_ID = \"[CHANGE ME]\"\n",
"AWS_SECRET_ACCESS_KEY = \"[CHANGE ME]\"\n",
"AWS_SESSION_TOKEN = \"[CHANGE ME]\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"s3_client = boto3.client(\n",
"    \"s3\",\n",
"    aws_access_key_id=AWS_ACCESS_KEY_ID,\n",
"    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,\n",
"    aws_session_token=AWS_SESSION_TOKEN,\n",
")"
]
},
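{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `update_json_href` function takes a `bucket_name`, a `collection_name`, an `old_href_substring`, and a `new_href_substring`. It reads every `.json` file under `collection_name/` in the bucket, replaces `old_href_substring` with `new_href_substring` in each asset href, and uploads the file back to the same key."
]
},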
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def update_json_href(\n",
"    bucket_name, collection_name, old_href_substring, new_href_substring\n",
"):\n",
"    \"\"\"Given the bucket name and collection name,\n",
"    read every JSON file under bucket_name/collection_name/\n",
"    and replace old_href_substring with new_href_substring in each asset href.\n",
"\n",
"    Keyword arguments:\n",
"    bucket_name -- the s3 bucket name\n",
"    collection_name -- the collection name, used as the s3 prefix\n",
"    old_href_substring -- the string to replace in href\n",
"    new_href_substring -- the new href substring\n",
"    \"\"\"\n",
"    s3 = s3_client\n",
"    s3_prefix = f\"{collection_name}/\"\n",
"\n",
"    # Paginate: a single list_objects_v2 call returns at most 1000 keys\n",
"    paginator = s3.get_paginator(\"list_objects_v2\")\n",
"    json_keys = [\n",
"        item[\"Key\"]\n",
"        for page in paginator.paginate(Bucket=bucket_name, Prefix=s3_prefix)\n",
"        for item in page.get(\"Contents\", [])\n",
"        if item[\"Key\"].endswith(\".json\")\n",
"    ]\n",
"\n",
"    for key in json_keys:\n",
"        response = s3.get_object(Bucket=bucket_name, Key=key)\n",
"        json_data = response[\"Body\"].read().decode(\"utf-8\")\n",
"\n",
"        data = json.loads(json_data)\n",
"        for assets_key in data[\"assets\"]:\n",
"            # Update href property\n",
"            data[\"assets\"][assets_key][\"href\"] = data[\"assets\"][assets_key][\n",
"                \"href\"\n",
"            ].replace(old_href_substring, new_href_substring)\n",
"\n",
"        # Serialize updated JSON\n",
"        updated_json = json.dumps(data)\n",
"\n",
"        # Upload the updated JSON file back to S3, overwriting the original object\n",
"        s3.put_object(Bucket=bucket_name, Key=key, Body=updated_json)\n",
"        print(f\"Updated {key}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next two cells call `update_json_href` to update the `hlsl30-ej-reprocessed` and `hlss30-ej-reprocessed` collections in the `veda-data-store` bucket. Each call replaces every occurrence of \"covid-eo-data\" in the asset hrefs with \"veda-data-store\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"update_json_href(\n",
"    \"veda-data-store\", \"hlsl30-ej-reprocessed\", \"covid-eo-data\", \"veda-data-store\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"update_json_href(\n",
"    \"veda-data-store\", \"hlss30-ej-reprocessed\", \"covid-eo-data\", \"veda-data-store\"\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
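
Because `update_json_href` overwrites objects in place, it may help to preview the href rewrite locally before running it against the bucket. The sketch below is not part of this PR. It pulls the replacement step into a pure function, here called `replace_asset_hrefs` (a hypothetical name), and applies it to a made-up item. The asset keys and file paths in `sample_item` are assumptions chosen for illustration; only the two bucket names and the collection name come from the notebook.

```python
import copy
import json


def replace_asset_hrefs(item, old_href_substring, new_href_substring):
    """Return (updated_item, changed_count) without modifying the input dict."""
    updated = copy.deepcopy(item)
    changed = 0
    # Items without an "assets" object are returned unchanged.
    for asset in updated.get("assets", {}).values():
        href = asset.get("href")
        if isinstance(href, str) and old_href_substring in href:
            asset["href"] = href.replace(old_href_substring, new_href_substring)
            changed += 1
    return updated, changed


# Hypothetical item for illustration; the real files live in S3.
sample_item = {
    "id": "example-item",
    "assets": {
        "cog_default": {
            "href": "s3://covid-eo-data/hlsl30-ej-reprocessed/example.tif"
        },
        "thumbnail": {
            "href": "s3://veda-data-store/hlsl30-ej-reprocessed/example.png"
        },
    },
}

updated_item, changed_count = replace_asset_hrefs(
    sample_item, "covid-eo-data", "veda-data-store"
)
print(f"{changed_count} href(s) would change")
print(json.dumps(updated_item["assets"], indent=2))
```

Keeping the rewrite separate from the S3 calls means the same function can be checked on a downloaded file first, and the upload step can be skipped when `changed_count` is zero.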