This repository has been archived by the owner on May 30, 2021. It is now read-only.

Merge pull request #83 from sotetsuk/develop-v0.1.0
Develop v0.1.0
sotetsuk committed May 7, 2016
2 parents 63ab769 + 34f55f4 commit c2c154e
Showing 19 changed files with 1,109 additions and 691 deletions.
1 change: 0 additions & 1 deletion .gitignore
@@ -1,3 +1,2 @@
.idea/*
*.iml
go-scholar
6 changes: 3 additions & 3 deletions .travis.yml
@@ -6,16 +6,16 @@ go:
before_install:
- go get github.com/axw/gocov/gocov
- go get github.com/mattn/goveralls
- go get github.com/Sirupsen/logrus
- if ! go get code.google.com/p/go.tools/cmd/cover; then go get golang.org/x/tools/cmd/cover; fi

install:
- go get github.com/PuerkitoBio/goquery
- go get github.com/docopt/docopt-go
- go get github.com/PuerkitoBio/goquery
- go get github.com/Sirupsen/logrus

script:
- $HOME/gopath/bin/goveralls -repotoken $COVERALLS_TOKEN

env:
global:
secure: "RCNzxxPKVQJ38Jve0dvu+VP8Guwuvm6wmgNrOIIFkDWUnKf8ZpCNpv4cqeFA3PLOWWuoa3EHxLIcnae9HkRIx1Be3uUQ+e8pq3TU0bS3yL5GtMG20DUVwS7Ely6m0fIxsi/wBM845zP94wgpgSmFV2osXwbtvbW5gJGJ3Aq7C6rQlu5gaP33OMvzV1/aSIDHWgxD2UMgV40ScuySl17fS8ab0Mnd+Jvf7NRSNFxB0Q3aysl+M5KYpAgKsvu+TYs5loqeF1OeVLtmJAe85sxpoCuCC87/lrrOzE8LVb9LLcYka6Rtrs67RXXwn9YFtSQHct2zm9FHQqUm9udUn8eaoJaSrq/L6ZDkIRS4STCU1MzYpulB+c5ujb3InqdmhdFKY6RxqgEROq5IOyXtUflD3eQno93jsEasq4V8clfvOyAlaRVhBhBECupP6q9rLTV28/qgBC3QF0P07Ia1sIIauXwiRtkFHSCqgPy6jdyxTDrl5xqpSnk7iR0A1Y9UdJXvRMOWIBUl0/YIzcKy9UT7Dz4NRKB+22gemimYmVZ6Wfh73rsiptYY1Jjzl41tXLC3KQ2kPqWRuH7x1q4sbdKaazBS/1RLHobmcHnH23oSO36+ClgM+pxsvIJUcKWMkKPiFIEE9Zqhwf71ksqJz87boa3B1wMI1WjqlqpR/tY9xnw="
secure: "OEFJ2jmTPx7LI3bvTdg2g2Not1PAFTYSBojun+XQyaeKRKO05CPn2mEdFAhtmEe9BwsXqk/TcidlFtAJDv1kS0r1J3wblGiYt6o/dRMmeHVG216udLxtABgKbvXt/Awz9PUD5n/82oZ+kdTc/Qk7GqjjiYhflcbs9dpJ00boWTXw2m6a6yK37RcKlyMBMZH1sHl3747WcaD1yqm+abciF0c9UfGU61fy9Up/M9bPWii7+S9V9jaHcHbqPDXmMLYWamFfWSE2+WMkvL05yuTt7KNW0VSPyYppCthkQyJ9CNa9zCIA75EyEwX/HRoDRIV75CyD1/Nzf9afHxzfZKLGa4o/uQCHNKiHL82kwmuPHvtWaC7gwN1cJ0cU46FfdhL7m4OcCrVOwDBd7cjWp/GH/dAKh8/BQIHyDL0PMuaaWFOO0GJYU6HQqWO/DsdtE4KXUFtHKzxOkfYrnHsucU9d730DFzPbMqUrub613EBxZbdp9td/DMsxNFpb3JNxqK9mbDhO+AjfHiy/yvWQuAmbjZIZkUfcCpTPrxeXngypkpjp40bkTp+lJlS8RyPP8rDO34eJDdzE90hIEvqc+d+Otv8gH7eUKXMqcRnMOdr/iIhfiF7dxUiznwMNkduu/ZOfvFg6YwdA9VJMh/vTUUZXx4SqM/VwEjuuUh3zMEdoExw="
158 changes: 114 additions & 44 deletions README.md
@@ -1,103 +1,173 @@
[![Build Status](https://travis-ci.org/sotetsuk/go-scholar.svg?branch=master)](https://travis-ci.org/sotetsuk/go-scholar)
[![Coverage Status](https://coveralls.io/repos/github/sotetsuk/go-scholar/badge.svg?branch=master)](https://coveralls.io/github/sotetsuk/go-scholar?branch=master)
[![GoDoc](https://godoc.org/github.com/sotetsuk/goscholar?status.svg)](https://godoc.org/github.com/sotetsuk/goscholar)
[![Build Status](https://travis-ci.org/sotetsuk/goscholar.svg?branch=master)](https://travis-ci.org/sotetsuk/goscholar)
[![Coverage Status](https://coveralls.io/repos/github/sotetsuk/goscholar/badge.svg?branch=master)](https://coveralls.io/github/sotetsuk/goscholar?branch=master)
[![license](https://img.shields.io/github/license/mashape/apistatus.svg?maxAge=2592000)]()
[![GitHub version](https://badge.fury.io/gh/sotetsuk%2Fgo-scholar.svg)](https://badge.fury.io/gh/sotetsuk%2Fgo-scholar)
[![GitHub version](https://badge.fury.io/gh/sotetsuk%2Fgoscholar.svg)](https://badge.fury.io/gh/sotetsuk%2Fgoscholar)

# go-scholar
**Go**ogle **Scholar** crawler and scraper written in **Go**
# goscholar
**Go**ogle **Scholar** scraper written in **Go**


## Install

Assume that `$GOPATH` is set and `$PATH` includes `$GOPATH/bin`.

```sh
$ go get github.com/sotetsuk/goscholar
```
$ go get github.com/sotetsuk/go-scholar
$ go-scholar -h

for command line:

```sh
$ go get github.com/sotetsuk/goscholar/cmd/goscholar
$ goscholar -h
```

## Feature

- API for Go
- API for command line
- search by keywords, title, and author
- find by ```<cluster-id>```
- fetch citing articles of ```<cluster-id>```
- crawl recursively (**not implemented yet**)
- search the articles citing ```<cluster-id>```
- JSON output

## Example
## Go API

### Example

```go
// create a Query and generate the search URL
q := goscholar.Query{Keywords: "deep learning", Author: "y bengio"}
url := q.SearchUrl()

// fetch the document by sending a request to the URL
doc, err := goscholar.Fetch(url)
if err != nil {
	log.Error(err)
	return
}

// parse articles; ParseDocument sends each parsed article on the channel
ch := make(chan *goscholar.Article, 10)
go goscholar.ParseDocument(ch, doc)

for a := range ch {
	fmt.Println(a)
}
```
$ go-scholar search --title "Deep learning via Hessian-free optimization" --num 1 | python -mjson.tool

## Command line API

### Example

```sh
$ goscholar search --keywords "deep learning nature" --author "y bengio" --after 2015 --num 1 | python -mjson.tool
[
    {
        "ClusterId": "5362332738201102290",
        "InfoId": "0qfs6zbVakoJ",
        "Link": {
            "Format": "PDF",
            "Name": "psu.edu",
            "Url": "http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.436.894&rep=rep1&type=pdf"
        },
        "NumCite": "390",
        "NumVer": "7",
        "Title": {
            "Name": "Deep learning",
            "Url": "http://www.nature.com/nature/journal/v521/n7553/abs/nature14539.html"
        },
        "Year": "2015"
    }
]
```
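
The JSON printed by the command line can also be consumed from another Go program. The following is a minimal sketch that decodes the output shown above using only the standard library. The type names (`Article`, `Link`, `Title`) and the helper `decodeArticles` are assumptions made for this illustration; they mirror the JSON keys and are not necessarily the types that goscholar itself defines.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// Link mirrors the nested "Link" object in the JSON output (hypothetical type).
type Link struct {
	Format string
	Name   string
	Url    string
}

// Title mirrors the nested "Title" object in the JSON output (hypothetical type).
type Title struct {
	Name string
	Url  string
}

// Article mirrors one element of the JSON array (hypothetical type).
// Counts and years are kept as strings because the output quotes them.
type Article struct {
	ClusterId string
	InfoId    string
	Link      Link
	NumCite   string
	NumVer    string
	Title     Title
	Year      string
}

// decodeArticles parses a JSON array of articles.
func decodeArticles(data []byte) ([]Article, error) {
	var articles []Article
	if err := json.Unmarshal(data, &articles); err != nil {
		return nil, err
	}
	return articles, nil
}

// sample is the output of the search example above.
const sample = `[
    {
        "ClusterId": "5362332738201102290",
        "InfoId": "0qfs6zbVakoJ",
        "Link": {
            "Format": "PDF",
            "Name": "psu.edu",
            "Url": "http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.436.894&rep=rep1&type=pdf"
        },
        "NumCite": "390",
        "NumVer": "7",
        "Title": {
            "Name": "Deep learning",
            "Url": "http://www.nature.com/nature/journal/v521/n7553/abs/nature14539.html"
        },
        "Year": "2015"
    }
]`

func main() {
	articles, err := decodeArticles([]byte(sample))
	if err != nil {
		log.Fatal(err)
	}
	for _, a := range articles {
		fmt.Printf("%s (%s), cited by %s\n", a.Title.Name, a.Year, a.NumCite)
	}
}
```

In practice the input would more likely be read from the command's standard output than from an embedded constant; the constant only keeps the sketch self-contained.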

```sh
$ goscholar find 15502119379559163003 | python -mjson.tool
[
{
"ClusterId": "15502119379559163003",
"InfoId": "e6RSJHGXItcJ",
"NumberOfCitations": "260",
"NumberOfVersions": "9",
"PDFLink": "http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_Martens10.pdf",
"PDFSource": "wustl.edu",
"Title": "Deep learning via Hessian-free optimization",
"URL": "http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_Martens10.pdf",
"Link": {
"Format": "PDF",
"Name": "wustl.edu",
"Url": "http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_Martens10.pdf"
},
"NumCite": "260",
"NumVer": "",
"Title": {
"Name": "Deep learning via Hessian-free optimization",
"Url": "http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_Martens10.pdf"
},
"Year": "2010"
}
]
]
```


```
$ go-scholar find 8108748482885444188 | python -mjson.tool
```sh
$ goscholar cite 15502119379559163003 --num 1 | python -mjson.tool
[
{
"ClusterId": "8108748482885444188",
"InfoId": "XOJff8gPiHAJ",
"NumberOfCitations": "376",
"NumberOfVersions": "",
"PDFLink": "",
"PDFSource": "",
"Title": "Learning in science: A comparison of deep and surface approaches",
"URL": "http://onlinelibrary.wiley.com/doi/10.1002/(SICI)1098-2736(200002)37:2%3C109::AID-TEA3%3E3.0.CO;2-7/abstract",
"Year": "2000"
"ClusterId": "3674494786452480182",
"InfoId": "tmCGO4pt_jIJ",
"Link": {
"Format": "PDF",
"Name": "toronto.edu",
"Url": "http://www.cs.toronto.edu/~asamir/papers/SPM_DNN_12.pdf"
},
"NumCite": "1452",
"NumVer": "27",
"Title": {
"Name": "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups",
"Url": "http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6296526"
},
"Year": "2012"
}
]
```

## Usage
(This article cites 15502119379559163003=Deep learning via Hessian-free optimization)

### Usage

```
$ go-scholar -h
go-scholar: Google Scholar crawler and scraper written in Go
Usage:
go-scholar search [--author=<author>] [--title=<title>] [--query=<query>]
go-scholar search [--keywords=<keywords>] [--author=<author>] [--title=<title>]
[--after=<year>] [--before=<year>] [--num=<num>] [--start=<start>]
[--json|--bibtex]
go-scholar find <cluster-id> [--json|--bibtex]
go-scholar cite <cluster-id> [--after=<year>] [--before=<year>] [--num=<num>] [--start=<start>] [--json|--bibtex]
go-scholar find <cluster-id>
go-scholar cite <cluster-id> [--after=<year>] [--before=<year>] [--num=<num>] [--start=<start>]
go-scholar -h | --help
go-scholar --version
Query-options:
<cluster-id>
--keywords=<keywords>
--author=<author>
--title=<title>
--query=<query>
Search-options:
--after=<year>
--before=<year>
--num=<num>
--start=<start>
Output-options:
--json
--bibtex
Others:
-h --help
--version
```

## Dependencies

- [github.com/docopt/docopt-go](https://github.com/docopt/docopt-go)
- [github.com/PuerkitoBio/goquery](https://github.com/PuerkitoBio/goquery)
- [github.com/Sirupsen/logrus](https://github.com/Sirupsen/logrus)

## Related Work
goscholar is inspired by [scholar.py](https://github.com/ckreibich/scholar.py)

## Contribute
Contributing is more than welcome! See [Issues](https://github.com/sotetsuk/go-scholar/issues) for what is required.
Contributing is more than welcome! See [Issues](https://github.com/sotetsuk/goscholar/issues) for what is required.

## License
MIT License