Skip to content

Latest commit

 

History

History
79 lines (57 loc) · 1.52 KB

README.md

File metadata and controls

79 lines (57 loc) · 1.52 KB

SEO crawler

Crawl your website with javascript excuted.

Mordern web use lots of javascript but search engine crawler won't excute it. As a result pages can not be indexed correctly. So we crawl our site with js-excuted and serve it to search engine crawler.

Getting Start

Requirement

Install package

npm install

Modify config.js

For example save html snapshot in C:/snapshot/, target website is https://www.paddingleft.com/

const tasks = [{
    distFolder: 'C:/snapshot/',
    startUrl: 'https://www.paddingleft.com/'
}]

Start program

npm start

Developement

# Testing
npm test

# Testing in watch mode
npm run test:w

Debug

You could modify logLevel in config.js to see detail logs.

const logLevelPriority = {
    error: 0,
    warn: 1,
    info: 2,
    verbose: 3,
    debug: 4,
    silly: 5
}

Others

Register as a window service

  1. Download nssm
  2. Extract it and go to nssm/win64 folder
  3. Type nssm install seo-crawler from command prompt
  4. Select seo-crawler.bat as Application Path
  5. nssm start seo-crawler

Kill chrome process in command line

taskkill /F /IM chrome.exe

Run Chrome headless-ly on Windows

cd "C:\Program Files (x86)\Google\Chrome\Application"
chrome --remote-debugging-port=9222 --disable-gpu --headless

References

Chromeless