This is a Lambda function written using the Serverless Framework. This function does the following -
- Scrape the following content from a URL
  - title of the page
  - any image available on the page
- Store this data in an xlsx file and upload it to S3
- Store a JSON representation of the scraped data along with a signed URL of the uploaded xlsx file
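The JSON representation mentioned above could be assembled roughly as follows. This is only a sketch: the helper name `buildScrapeRecord` and the field names (`url`, `title`, `images`, `xlsxUrl`) are assumptions made for illustration, not the schema this repo actually writes. It also assumes the title and the raw image `src` values have already been extracted from the page.

```javascript
// Hypothetical helper: the name and field names are illustrative assumptions,
// not this repo's actual JSON schema.
function buildScrapeRecord({ pageUrl, title, imageSrcs, xlsxSignedUrl }) {
  // Resolve relative <img src> values against the page URL and drop duplicates.
  const images = [
    ...new Set(
      (imageSrcs || [])
        .filter((src) => typeof src === 'string' && src.trim() !== '')
        .map((src) => new URL(src.trim(), pageUrl).href)
    ),
  ];

  return {
    url: pageUrl,
    title: (title || '').trim(),
    images,
    xlsxUrl: xlsxSignedUrl, // signed URL of the uploaded xlsx file
  };
}
```

Resolving image paths against the page URL keeps the stored record usable on its own, since relative paths lose their meaning once they leave the original page.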
This repo uses these packages -
- cheerio to parse the markup of the page
- exceljs to create an xlsx file
- serverless-local to run the lambda locally
- axios to download the page content from the remote URL
npm install
Make sure that your AWS access key and secret key, with the correct permissions, are set in the environment before you run this. You will also need a bucket in your AWS account; add its name to the environment variable AWS_BUCKET_NAME in serverless.yml.
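Inside the handler, the bucket name would then be available through `process.env`. The snippet below is a minimal sketch of reading it and failing early with a clear message when it is missing. The helper name `getBucketName` is an assumption for illustration and may not match how this repo reads the variable.

```javascript
// Hypothetical helper: reads the bucket name that serverless.yml is expected
// to expose to the function as AWS_BUCKET_NAME.
function getBucketName(env = process.env) {
  const name = (env.AWS_BUCKET_NAME || '').trim();
  if (!name) {
    // Fail fast so a missing setting surfaces before any S3 call is attempted.
    throw new Error('AWS_BUCKET_NAME is not set; add it in serverless.yml');
  }
  return name;
}
```

Accepting `env` as a parameter keeps the check easy to exercise without touching the real environment.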
npm start
Once it's running, you can test this function locally at http://localhost:3000/local/scrapeContent?url=[url-of-a-webpage-you-want-to-scrape]
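If the page you want to scrape has its own query string, it needs to be URL-encoded before it goes into the `url` parameter. A small sketch of building the local test URL, where the helper name `buildLocalScrapeUrl` is an assumption for illustration:

```javascript
// Hypothetical helper: builds the local test URL with the target page
// safely encoded into the "url" query parameter.
function buildLocalScrapeUrl(
  pageUrl,
  base = 'http://localhost:3000/local/scrapeContent'
) {
  const endpoint = new URL(base);
  // searchParams.set percent-encodes characters such as ":", "/", "?" and "=".
  endpoint.searchParams.set('url', pageUrl);
  return endpoint.toString();
}
```

You can paste the resulting string into a browser, or pass it to any HTTP client, while the local server is running.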
Make sure that your AWS access key and secret key, with the correct permissions, are set in the environment before you run this.
sls deploy --stage [dev|prod] --region [any-aws-region-of-your-choice]