This script scrapes data from most of the wikis on fandom.com. It retrieves only the text contained in a given div and can easily be adjusted via `class/Analyzer.mjs`.
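For reference, the kind of extraction `class/Analyzer.mjs` performs looks roughly like this. This is a minimal sketch, assuming Node 18+ (global `fetch`) and the `cheerio` package; the function name is hypothetical, and `div.mw-parser-output` is the usual MediaWiki content div, which may differ from the selector the script actually uses:

```js
// Illustrative sketch only, not the script's actual code.
import * as cheerio from "cheerio";

async function extractText(pageUrl) {
  const html = await (await fetch(pageUrl)).text();
  const $ = cheerio.load(html);
  // Keep only the text of the target div, dropping all markup.
  return $("div.mw-parser-output").text().trim();
}
```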
Install the dependencies and start the scraper with npm:

```sh
npm install
npm start
```

or with pnpm:

```sh
pnpm install
pnpm start
```
You can change:
- The source fandom.
- The source of the page containing the register of all pages.
- The name of the subfolder to be created in `out/`.
- The name of the file containing the scraped page content.
- The name of the file containing the history of links present on the wiki.
For example:

```js
// Base URL of the target fandom wiki, e.g. https://some-wiki.fandom.com, without a trailing '/'.
const from = "https://naruto.fandom.com";
// Register of all pages, e.g. https://some-wiki.fandom.com/wiki/Special:AllPages
// or https://some-wiki.fandom.com/fr/wiki/Sp%C3%A9cial:Toutes_les_pages.
const entry_point_from_all_pages =
  "https://naruto.fandom.com/fr/wiki/Sp%C3%A9cial:Toutes_les_pages?from=%22Gaara%22...%21%21";
// Name of the subfolder to be created in out/ (default: the wiki's name, derived from "from").
const sub_dir = new URL(from).hostname.split(".")[0];
// Data file name.
const filename_data = `${sub_dir}-data.json`;
// History file name.
const filename_history = `${sub_dir}-history.json`;
```
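With this configuration, `sub_dir` resolves to `naruto`, so the results land in `out/naruto/naruto-data.json` and the link history in `out/naruto/naruto-history.json`.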
For example, suppose you want to scrape the contents of the Solo Leveling fandom wiki:
```js
const from = "https://solo-leveling.fandom.com";
const entry_point_from_all_pages =
  "https://solo-leveling.fandom.com/fr/wiki/Sp%C3%A9cial:Toutes_les_pages";
```
The script then takes care of:
- Preparing the future links to visit in order to capture the text data.
- Retrieving each page's content.
- Collecting the data in `out/solo-leveling/solo-leveling-data.json`.
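A rough sketch of that flow is shown below, assuming Node 18+ and cheerio. The selector `ul.mw-allpages-chunk` is the standard MediaWiki markup for the Special:AllPages listing, but the actual implementation (split between the entry script and `class/Analyzer.mjs`, and also maintaining the history file) may differ:

```js
// Illustrative outline of the crawl, not the script's actual code.
import * as cheerio from "cheerio";
import { mkdir, writeFile } from "node:fs/promises";

const from = "https://solo-leveling.fandom.com";
const allPages =
  "https://solo-leveling.fandom.com/fr/wiki/Sp%C3%A9cial:Toutes_les_pages";

// 1. Prepare the links to visit from the Special:AllPages register.
const index = cheerio.load(await (await fetch(allPages)).text());
const links = index("ul.mw-allpages-chunk a")
  .map((_, a) => new URL(index(a).attr("href"), from).href)
  .get();

// 2. Retrieve each page and keep only the content div's text.
const data = {};
for (const url of links) {
  const $ = cheerio.load(await (await fetch(url)).text());
  data[url] = $("div.mw-parser-output").text().trim();
}

// 3. Collect the results in out/solo-leveling/solo-leveling-data.json.
await mkdir("out/solo-leveling", { recursive: true });
await writeFile(
  "out/solo-leveling/solo-leveling-data.json",
  JSON.stringify(data, null, 2)
);
```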
This project is licensed under the MIT License.