How to manipulate data with JavaScript on the server side
Table of Contents
- 🎯 Objective
- 🏗 Prerequisites
- 📱 How to scrape with Node.js? 1 example to do it
- 📦 Suggested node modules
- 👕 A complete Scraping Example for dedicatedbrand.com
- 👩‍💻 Just tell me what to do
- 🛣️ Related Theme and courses
Scrape products with Node.js and use JavaScript as a server-side scripting language to manipulate and interact with arrays, objects, functions...
- Be sure to have a clean working copy.
This means that you should not have any uncommitted local changes.
❯ cd /path/to/workspace/clear-fashion
❯ git status
On branch master
Your branch is up to date with 'origin/master'.
nothing to commit, working tree clean
- Pull the master branch to update your local copy with the new remote changes
❯ git remote add upstream git@github.com:92bondstreet/clear-fashion.git
## or ❯ git remote add upstream https://github.com/92bondstreet/clear-fashion
❯ git fetch upstream
❯ git pull upstream master
- Check the terminal output for the command node sandbox.js
❯ cd /path/to/workspace/clear-fashion/server
## install dependencies
❯ yarn
## or ❯ npm install
❯ node sandbox.js
- If nothing happens or errors occur, check your node server installation (from Theme 2)
Let's try to scrape products from the e-shop brand Dedicated.
- Browse the website
- How does the e-shop https://www.dedicatedbrand.com/en/ work?
- How can I access the different product pages?
- What are the given properties for a Product: name, price, category, link...?
- Check how you can get the list of Products: the web page itself, an API, etc. (Inspect the network activity, with Chrome DevTools for instance, in any browser)
- Define the JSON object representation for a Product
- ...
- ...
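One possible JSON representation for a Product, as a minimal sketch: the property names come from the question above (name, price, category, link), while the sample values, the toProduct helper and the choice to store the price as a number are assumptions made for illustration.

```javascript
/**
 * Build a Product from raw scraped strings.
 * toProduct is a hypothetical helper, not part of the workshop code.
 * @param {Object} raw - strings as they might come out of the page
 * @return {Object} product
 */
const toProduct = raw => {
  // Keep the first number found in the price text, accepting 39.50 and 39,50
  const match = String(raw.price).replace(',', '.').match(/\d+(\.\d+)?/);

  return {
    name: raw.name.trim().replace(/\s+/g, ' '),
    price: match ? parseFloat(match[0]) : null,
    category: raw.category,
    link: raw.link
  };
};

// Sample values are made up for the example
const product = toProduct({
  name: '  T-shirt   Stockholm ',
  price: '39 EUR',
  category: 'men',
  link: 'https://www.dedicatedbrand.com/en/men/news'
});

console.log(JSON.stringify(product, null, 2));
```

Storing the price as a number rather than as the displayed text makes later sorting and filtering simpler; a price that cannot be read is kept as null instead of NaN so that it survives JSON serialization unchanged.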
Create a module called dedicatedbrand that returns the list of Products for a given url page of Dedicated.
Example of page to scrape: https://www.dedicatedbrand.com/en/men/news
const dedicatedbrand = require('dedicatedbrand');

// scrape fetches the page, so it returns a Promise: wait for the products
dedicatedbrand.scrape('https://www.dedicatedbrand.com/en/men/news').then(products => {
  products.forEach(product => {
    console.log(product.name);
  });
});
- node-fetch - A light-weight module that brings Fetch API to Node.js.
- cheerio - Fast, flexible, and lean implementation of core jQuery designed specifically for the server.
- nodemon - Monitor for any changes in your node.js application and automatically restart the server - perfect for development
server/sources/dedicatedbrand.js contains a function to scrape a given Dedicated products page.
To start the example, call it with the node CLI or use the Makefile target:
❯ cd /path/to/workspace/clear-fashion/server
❯ node sandbox.js
❯ node sandbox.js "https://www.dedicatedbrand.com/en/men/t-shirts"
❯ ## make sandbox
❯ ## ./node_modules/.bin/nodemon sandbox.js
const fetch = require('node-fetch');
const cheerio = require('cheerio');
/**
 * Parse webpage e-shop
 * @param {String} data - html response
 * @return {Array} products
 */
const parse = data => {
  const $ = cheerio.load(data);

  return $('.productList-container .productList')
    .map((i, element) => {
      const name = $(element)
        .find('.productList-title')
        .text()
        .trim()
        .replace(/\s/g, ' ');
      // Always pass the radix so the price is read as a decimal number
      const price = parseInt(
        $(element)
          .find('.productList-price')
          .text(),
        10
      );

      return {name, price};
    })
    .get();
};
/**
 * Scrape all the products for a given url page
 * @param {String} url - page to scrape
 * @return {Promise<Array|null>} products, or null if the request failed
 */
module.exports.scrape = async url => {
  try {
    const response = await fetch(url);

    if (response.ok) {
      const body = await response.text();

      return parse(body);
    }

    // The server answered with a non-2xx status
    console.error(response);

    return null;
  } catch (error) {
    // The request or the parsing threw an error
    console.error(error);
    return null;
  }
};
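The contents of sandbox.js are not shown here, so the following is only a sketch of how such a script might pick the page to scrape from the command line and report the result. The run function, its scraper parameter and the default url fallback are assumptions; scraper stands for any object exposing the async scrape(url) function above.

```javascript
const DEFAULT_URL = 'https://www.dedicatedbrand.com/en/men/news';

/**
 * Scrape one page and return its products (hypothetical helper).
 * @param {Object} scraper - object exposing an async scrape(url) function
 * @param {Array} argv - command line arguments, as in process.argv
 * @return {Promise<Array>} products, or an empty array when the scrape failed
 */
const run = async (scraper, argv) => {
  // argv[0] is the node binary, argv[1] the script, argv[2] the optional url
  const url = argv[2] || DEFAULT_URL;

  console.log(`browsing ${url}`);

  // scrape resolves to null on failure, so fall back to an empty list
  const products = (await scraper.scrape(url)) || [];

  console.log(`${products.length} products found`);

  return products;
};

// In a sandbox.js-like script, this could be called as:
// run(require('./sources/dedicatedbrand'), process.argv);
```

Passing the scraper as a parameter keeps the script independent of one brand, so the same function could be reused once other brand modules exist.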
- Scrape Products for the 3 Brands defined by the json file ../server/brands.json
- Store the list into a JSON file
- Commit your modification
❯ cd /path/to/workspace/clear-fashion
❯ git add -A && git commit -m "feat(shop): scrape new products"
(why follow a commit message convention?)
- Commit early, commit often
- Don't forget to push before the end of the workshop
❯ git push origin master
Note: if you get an authentication error, add your SSH key to your GitHub profile.
- If you need some help with git commands, read git - the simple guide
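The scrape-and-store steps of this list could be sketched as follows. The structure of brands.json is not shown in this document, so the sketch assumes each brand is an object with a name and a url; scrapeBrands, storeProducts and the products.json file name are likewise hypothetical, and scrape stands for any async function that returns a list of products or null.

```javascript
const fs = require('fs');

/**
 * Scrape every brand and tag each product with its brand name.
 * @param {Array} brands - assumed shape: [{name, url}, ...]
 * @param {Function} scrape - async url => Array|null
 * @return {Promise<Array>} products of all brands
 */
const scrapeBrands = async (brands, scrape) => {
  const lists = await Promise.all(
    brands.map(async brand => {
      // A failed scrape resolves to null: treat it as "no products"
      const products = (await scrape(brand.url)) || [];

      return products.map(product => ({...product, brand: brand.name}));
    })
  );

  // Flatten the per-brand lists into a single array
  return lists.flat();
};

/**
 * Store the products into a JSON file.
 * @param {String} file - path of the file to write
 * @param {Array} products
 */
const storeProducts = (file, products) => {
  fs.writeFileSync(file, JSON.stringify(products, null, 2));
};
```

Inside an async function, this might be used as storeProducts('products.json', await scrapeBrands(brands, scrape)), where brands is the parsed content of brands.json.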