Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent 057df39, commit 9cbf1d4
Showing 5 changed files with 187 additions and 14 deletions.
@@ -1,3 +1,83 @@
HTML Miner
==========

[Build Status](https://travis-ci.org/marcomontalbano/html-miner)


Install
-------

```sh
# using yarn
yarn add html-miner

# using npm
npm i --save html-miner
```


Example
-------

We have following html snippet and we want to fetch the `title`.

```html
<div class="jumbotron">
    <div class="container">
        <h1 class="display-3">Hello, world!</h1>
        <p>This is a template for a simple marketing or informational website. It includes a large callout called a jumbotron and three supporting pieces of content. Use it as a starting point to create something more unique.</p>
        <p><a class="btn btn-primary btn-lg" href="#" role="button">Learn more »</a></p>
    </div>
</div>
<div class="container">
    <div class="row">
        <div class="col-md-4">
            <h2>Heading</h2>
            <p>Donec id elit non mi porta gravida at eget metus. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus. Etiam porta sem malesuada magna mollis euismod. Donec sed odio dui. </p>
            <p><a class="btn btn-secondary" href="#" role="button">View details »</a></p>
        </div>
        <div class="col-md-4">
            <h2>Heading</h2>
            <p>Donec id elit non mi porta gravida at eget metus. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus. Etiam porta sem malesuada magna mollis euismod. Donec sed odio dui. </p>
            <p><a class="btn btn-secondary" href="#" role="button">View details »</a></p>
        </div>
        <div class="col-md-4">
            <h2>Heading</h2>
            <p>Donec sed odio dui. Cras justo odio, dapibus ac facilisis in, egestas eget quam. Vestibulum id ligula porta felis euismod semper. Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus.</p>
            <p><a class="btn btn-secondary" href="#" role="button">View details »</a></p>
        </div>
    </div>

    <hr>

    <footer>
        <p>© Company 2017</p>
    </footer>
</div>
```

```javascript
const htmlMiner = require('html-miner');

let json = htmlMiner(html, {
    title    : 'h1',
    headings : 'h2',
    greet    : $ => { return 'Hi!' }
});

console.log( json );
// {
//     title    : 'Hello, world!',
//     headings : ['Heading', 'Heading', 'Heading'],
//     greet    : 'Hi!'
// }
```


Development
-----------

```sh
yarn
yarn test
```
@@ -1,15 +1,36 @@
 'use strict';

 const cheerio = require('cheerio');
+const _ = require('lodash');

-module.exports = (html, selectors) => {
+module.exports = (html, originalSelector) => {

+    if ( ! _.isString( originalSelector ) && ! _.isArrayLike( originalSelector ) && ! _.isObjectLike( originalSelector ) ) {
+        throw new Error("'selector' must be string, array or object");
+    }
+
     const $ = cheerio.load(html);

+    let selector = _.isString(originalSelector) ? {default:originalSelector} : originalSelector;
+
     let elements = [];
-    $( selectors ).each((i, el) => {
-        elements.push( $(el).text() );
-    });
+    let data = _.isArrayLike(originalSelector) ? [] : {};
+    _.each(selector, (value, key) => {

-    return elements.length > 1 ? elements : elements[0];
+        if ( _.isFunction( value ) ) {
+            elements.push( value.apply(this, [$, data]) );
+        }
+
+        if ( _.isString( value ) ) {
+            $( value ).each((i, el) => {
+                elements.push( $(el).text().replace(/\s+\n+\s+/g, "\n").trim() );
+            });
+        }
+
+        data[key] = elements.length > 1 ? elements : elements[0];
+        elements = [];
+
+    });
+
+    return _.isString(originalSelector) ? data.default : data;
 };
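
The result-shaping logic in this change can be tried without `cheerio` or `lodash`. The sketch below is a dependency-free approximation, not the module itself: `cleanText`, `mine`, `page` and `query` are hypothetical names introduced for illustration, and `query` stands in for the `$( value ).each(...)` lookup by returning the raw text of every match. It assumes the same three rules visible in the diff: a string selector returns a bare value, an array returns an array, and an object returns an object with the same keys.

```javascript
'use strict';

// Hypothetical helper: the whitespace cleanup applied to each matched text in the diff.
const cleanText = (text) => text.replace(/\s+\n+\s+/g, '\n').trim();

// Hypothetical stand-in for the exported function. `query(selector)` replaces
// the cheerio lookup and must return an array of raw text strings.
const mine = (query, originalSelector) => {
    const isString = typeof originalSelector === 'string';
    const isObject = originalSelector !== null && typeof originalSelector === 'object';

    if (!isString && !isObject) {
        throw new Error("'selector' must be string, array or object");
    }

    // A bare string is wrapped so that every input is iterated the same way.
    const selector = isString ? { default: originalSelector } : originalSelector;
    const data = Array.isArray(originalSelector) ? [] : {};

    for (const [key, value] of Object.entries(selector)) {
        let elements = [];

        if (typeof value === 'function') {
            // Functions receive the lookup and the data collected so far.
            elements.push(value(query, data));
        }

        if (typeof value === 'string') {
            elements = query(value).map(cleanText);
        }

        // One match collapses to a single value, several matches stay an array.
        data[key] = elements.length > 1 ? elements : elements[0];
    }

    return isString ? data.default : data;
};

// Assumed sample data mirroring the README snippet's headings.
const page = {
    h1: ['  Hello, world!  '],
    h2: ['Heading', 'Heading', 'Heading'],
};
const query = (selector) => page[selector] || [];

console.log(mine(query, 'h1'));
console.log(mine(query, ['h1', 'h2']));
console.log(mine(query, { title: 'h1', headings: 'h2', greet: () => 'Hi!' }));
```

Note that a selector with no matches yields `undefined` under this rule, because `elements[0]` of an empty array is `undefined`. Callers that need to distinguish "no match" from "one match" may want to check for that explicitly.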