Merge pull request #78 from Wooble/counter5_wip
Counter5 wip
Wooble authored Mar 29, 2019
2 parents b203c81 + ac4da2f commit c285f31
Showing 14 changed files with 571 additions and 21 deletions.
101 changes: 101 additions & 0 deletions CHANGELOG.md
@@ -1,4 +1,105 @@
# Changelog
## 1.1.1 (2019-03-01)

### Bugfixes
* SUSHI: Deal with missing institutional identifier for customer. [Geoffrey Spear]

(see Issue #72. This should catch the exception involved in the
issue, but there may be a wider issue with the Gale report since
the "customer" object involved should not be None if there's
an actual report to read, since the report itself is in the
Customer XML element.)


### Code quality/CI

* Add pre-commit black hook, format all code with black. [Geoffrey Spear]

* Python 3.7 support. [Geoffrey Spear]

* Replace arrow library with pendulum. [Geoffrey Spear]

* Pyup: try updating all deps. [Geoffrey Spear]


## 1.1.0 (2018-08-03)

### Other

* SUSHI: Verify SSL certs by default. [Geoffrey Spear]

(Bumps version to 1.1 because this could be a breaking change for
sites that rely on requests to broken servers working without a flag)

Fixes #67

* Don't try to do string formatting on given output file name. [Geoffrey Spear]

Fixes #58


## 1.0.3 (2018-08-01)

### Bugfixes

* Negate bash regex match correctly. [Geoffrey Spear]

Issue #63

* Retry SUSHI reports if "Report Queued" message is returned. [Geoffrey Spear]

(This is kind of an ugly hack that looks for this string in the
raw XML. A nicer fix will be possible with a fix for #3 )
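
The retry behaviour described in the last bugfix can be sketched roughly as follows. This is a minimal illustration, not pycounter's actual code: `fetch_with_retry` and its `fetch`, `max_attempts` and `sleep` parameters are hypothetical names, and only the "Report Queued" string and the 60 second delay come from this commit.

```python
import time

# String looked for in the raw XML response, per the changelog entry.
QUEUED_MARKER = "Report Queued"


def fetch_with_retry(fetch, delay_seconds=60, max_attempts=5, sleep=time.sleep):
    """Call fetch() until the raw response no longer contains the queued marker.

    fetch, max_attempts and sleep are hypothetical parameters for this sketch;
    fetch is any zero-argument callable returning the raw XML as a string.
    """
    for attempt in range(max_attempts):
        raw_xml = fetch()
        if QUEUED_MARKER not in raw_xml:
            return raw_xml
        # Wait before asking again, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(delay_seconds)
    raise RuntimeError("report still queued after %d attempts" % max_attempts)
```

Passing `delay_seconds=0` plays the role that `--nodelay` is described as playing for the test suite.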

### Docs
* Help for sushiclient --nodelay option (which probably shouldn't actually be used, but was helpful for testing without making the test suite wait 60 seconds) [Geoffrey Spear]


### Code quality/CI

* Exclude builds for flake8/lint/manifest? [Geoffrey Spear]
Issue #63

* Only use pylint version 1. [Geoffrey Spear]

(version 2 drops py2 support)


## 1.0.2 (2018-05-11)

### Bugfixes
* Fix incorrect first_date_col for DB1 reports. [James Fournie]

The first date column in a DB1 report should actually have index 5 (6th column). See: https://www.projectcounter.org/code-of-practice-sections/usage-reports/#databases

### Tests

* Add failing test for PR #60. [Geoffrey Spear]

* Add test for gaps in stats being output correctly. [Geoffrey Spear]

### Code quality/CI

* Create pyup.io config file. [pyup-bot]

* Flake8: Fix whitespace after comma. [James Fournie]

* Run 2.7 flake8 with 2.7; correct matrix syntax. [Geoffrey Spear]

* Do linting through tox instead of directly in .travis.yml. [Geoffrey Spear]

* Docs: pypi stuff. [Geoffrey Spear]

fix pypi link

remove some outdated advice on running setup.py install directly

link to PyPA installing packages page


## 1.0.1 (2018-04-06)

* Use universal wheel. [Geoffrey Spear]

## 1.0.0

1 change: 1 addition & 0 deletions MANIFEST.in
@@ -6,6 +6,7 @@ recursive-include docs *.gitkeep
recursive-include docs *.py
recursive-include docs *.rst
recursive-include pycounter *.csv
recursive-include pycounter *.json
recursive-include pycounter *.tsv
recursive-include pycounter *.xlsx
recursive-include pycounter *.xml
16 changes: 16 additions & 0 deletions README.rst
@@ -38,6 +38,8 @@ Licensed under the MIT license. See the file LICENSE for details.
pycounter is tested on Python 2.7, 3.4, 3.5, 3.6, 3.7 and pypy2 (if you're still
stuck on Python 2.6 or 3.3, please use version 0.16.1 of pycounter)

pycounter 2.x will be the last version with support for Python 2.

Documentation is on `Read the Docs <http://pycounter.readthedocs.io>`_.


@@ -55,6 +57,19 @@ From inside the source distribution:
Probably do all of this in a virtualenv. `The PyPA <https://packaging.python.org/tutorials/installing-packages/>`_
has a good explanation of how to get started.)


COUNTER 5 Note
--------------

In this alpha release, reports are output in COUNTER 4 format with COUNTER 5 data,
which is wrong, and probably not a valid apples-to-apples comparison since, for example,
TR_J1 excludes Gold Open Access counts that would be included in JR1, and also has
HTML and PDF columns that will always be 0 because these are no longer reported.

Before the final 2.0 release, it will be capable of producing actual COUNTER 5 reports,
probably with an API for getting COUNTER 4 style data compatible with scripts that
were making assumptions about the data received to pass it into another system.

Usage
-----

@@ -100,3 +115,4 @@ Our code is automatically styled using black. To install the pre-commit hook:
pip install pre-commit

pre-commit install

14 changes: 14 additions & 0 deletions pycounter/constants.py
@@ -90,6 +90,7 @@
u"and Page-Type (formatted for normal browsers/delivered "
u"to mobile devices and for mobile devices/delivered to "
u"mobile devices)",
u"TR_J1": u'Journal Requests (Excluding "OA_Gold")',
}

HEADER_FIELDS = {
@@ -173,6 +174,19 @@
u"Access denied category",
u"Reporting Period Total",
),
# FIXME: this is outputting counter 5 reports in 4 format for... reasons.
"TR_J1": (
u"Journal",
u"Publisher",
u"Platform",
u"Journal DOI",
u"Proprietary Identifier",
u"Print ISSN",
u"Online ISSN",
u"Reporting Period Total",
u"Reporting Period HTML",
u"Reporting Period PDF",
),
}

TOTAL_TEXT = {
67 changes: 47 additions & 20 deletions pycounter/report.py
@@ -643,40 +643,63 @@ def parse_generic(report_reader):
"""
report = CounterReport()

report.report_type, report.report_version = _get_type_and_version(
six.next(report_reader)[0]
)
first_line = six.next(report_reader)
if first_line[0] == "Report_Name": # COUNTER 5 report
second_line = six.next(report_reader)
third_line = six.next(report_reader)
report.report_type, report.report_version = _get_c5_type_and_version(
first_line, second_line, third_line
)
else:
report.report_type, report.report_version = _get_type_and_version(first_line[0])

# noinspection PyTypeChecker
report.metric = METRICS.get(report.report_type)
if report.report_version != 5:
# noinspection PyTypeChecker
report.metric = METRICS.get(report.report_type)

report.customer = six.next(report_reader)[0]
report.customer = six.next(report_reader)[1 if report.report_version == 5 else 0]

if report.report_version == 4:
if report.report_version >= 4:
inst_id_line = six.next(report_reader)
if inst_id_line:
report.institutional_identifier = inst_id_line[0]
report.institutional_identifier = inst_id_line[
1 if report.report_version == 5 else 0
]
if report.report_type == "BR2":
report.section_type = inst_id_line[1]

six.next(report_reader)
if report.report_version == 5:
for _ in range(3):
six.next(report_reader)

covered_line = six.next(report_reader)
report.period = convert_covered(covered_line[0])
report.period = convert_covered(
covered_line[1 if report.report_version == 5 else 0]
)

six.next(report_reader)
if report.report_version < 5:
six.next(report_reader)

date_run_line = six.next(report_reader)
report.date_run = convert_date_run(date_run_line[0])
report.date_run = convert_date_run(
date_run_line[1 if report.report_version == 5 else 0]
)

if report.report_version == 5:
for _ in range(2):
# Skip Created_By and blank line
six.next(report_reader)

header = six.next(report_reader)

try:
report.year = _year_from_header(header, report)
except AttributeError:
warnings.warn("Could not determine year from malformed header")
if report.report_version < 5:
try:
report.year = _year_from_header(header, report)
except AttributeError:
warnings.warn("Could not determine year from malformed header")

if report.report_version == 4:
if report.report_version >= 4:
countable_header = header[0:8]
for col in header[8:]:
if col:
@@ -693,7 +716,7 @@ def parse_generic(report_reader):
end_date = last_day(convert_date_column(header[last_col - 1]))
report.period = (start_date, end_date)

if report.report_type != "DB1":
if report.report_type != "DB1" and report.report_version != 5:
six.next(report_reader)

if report.report_type == "DB2":
@@ -723,8 +746,8 @@ def _parse_line(line, report, last_col):
doi = ""
prop_id = ""

if report.report_version == 4:
if report.report_type.startswith("JR1"):
if report.report_version >= 4:
if report.report_type.startswith("JR1") or report.report_type == "TR_J1":
old_line = line
line = line[0:3] + line[5:7] + line[10:last_col]
doi = old_line[3]
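
To make the slicing in this hunk concrete, here is a small standalone sketch of what `line[0:3] + line[5:7] + line[10:last_col]` does to a TR_J1 row. The column order follows the `TR_J1` header tuple added to `pycounter/constants.py` in this commit; the row values are invented, and `last_col` is assumed here to be the end of the row.

```python
# Column layout taken from the TR_J1 header tuple in constants.py,
# followed by three monthly columns. All values are made up.
row = [
    "Journal A",   # 0: Journal
    "Publisher",   # 1: Publisher
    "Platform",    # 2: Platform
    "10.1/x",      # 3: Journal DOI
    "prop1",       # 4: Proprietary Identifier
    "1234-5678",   # 5: Print ISSN
    "8765-4321",   # 6: Online ISSN
    "6",           # 7: Reporting Period Total
    "0",           # 8: Reporting Period HTML
    "0",           # 9: Reporting Period PDF
    "1", "2", "3",  # 10 onward: monthly counts
]
last_col = len(row)  # assumption: no trailing columns to drop

# Same slicing as in _parse_line: keep title/publisher/platform and the two
# ISSNs, drop DOI, proprietary ID and the three totals, keep the months.
trimmed = row[0:3] + row[5:7] + row[10:last_col]
doi = row[3]

# After trimming, the monthly data starts at index 5.
monthly = trimmed[5:]
```

With this layout the monthly counts land at index 5 of the trimmed row, which is where the later loop over `line[5:]` reads them from.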
@@ -761,7 +784,7 @@ def _parse_line(line, report, last_col):
for data in line[5:]:
month_data.append((curr_month, format_stat(data)))
curr_month = next_month(curr_month)
if report.report_type.startswith("JR"):
if report.report_type.startswith("JR") or report.report_type == "TR_J1":
return CounterJournal(
metric=report.metric,
month_data=month_data,
@@ -809,6 +832,10 @@ def _get_type_and_version(specifier):
return report_type, report_version


def _get_c5_type_and_version(first_line, second_line, third_line):
return second_line[1], int(third_line[1])
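
A standalone sketch of the same idea, assuming the COUNTER 5 tabular layout in which the first three rows are `Report_Name`, `Report_ID` and `Release` with the value in the second column. The sample text and the function name `get_c5_type_and_version` are illustrative, not part of pycounter.

```python
import csv
import io

# Invented sample of the first three rows of a COUNTER 5 tabular report.
SAMPLE = (
    "Report_Name,Journal Requests (Excluding OA_Gold)\n"
    "Report_ID,TR_J1\n"
    "Release,5\n"
)


def get_c5_type_and_version(rows):
    """Read three header rows and return (report id, release as int)."""
    first_line = next(rows)
    if first_line[0] != "Report_Name":
        raise ValueError("not a COUNTER 5 tabular report")
    second_line = next(rows)  # Report_ID row
    third_line = next(rows)   # Release row
    return second_line[1], int(third_line[1])


reader = csv.reader(io.StringIO(SAMPLE))
report_type, report_version = get_c5_type_and_version(reader)
```

Note that the function in the diff accepts `first_line` but does not use it; the check on `"Report_Name"` happens in `parse_generic` before the call.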


def _year_from_header(header, report):
"""Get the year for the report from the header.
5 changes: 5 additions & 0 deletions pycounter/sushi.py
@@ -14,11 +14,13 @@
import requests
import six

from pycounter import sushi5
import pycounter.constants
import pycounter.exceptions
from pycounter.helpers import convert_date_run
import pycounter.report


logger = logging.getLogger(__name__)
NS = pycounter.constants.NS

@@ -130,6 +132,9 @@ def get_report(*args, **kwargs):
:param no_delay: don't delay in retrying Report Queued
"""
if kwargs.get("release") == 5:
return sushi5.get_report(*args, **kwargs)

no_delay = kwargs.pop("no_delay", False)
delay_amount = 0 if no_delay else 60
while True:
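
The dispatch added at the top of `get_report` can be illustrated in isolation. In this sketch `get_report_c4` and `get_report_c5` are hypothetical stand-ins for the existing COUNTER 4 path and `sushi5.get_report`; they only record which branch was taken.

```python
def get_report_c4(*args, **kwargs):
    """Hypothetical stand-in for the existing COUNTER 4 SUSHI path."""
    return ("c4", args, kwargs)


def get_report_c5(*args, **kwargs):
    """Hypothetical stand-in for sushi5.get_report."""
    return ("c5", args, kwargs)


def get_report(*args, **kwargs):
    # Same test as in the diff: only an integer 5 selects the COUNTER 5 client.
    if kwargs.get("release") == 5:
        return get_report_c5(*args, **kwargs)
    return get_report_c4(*args, **kwargs)
```

Because the comparison is against the integer `5`, a caller passing `release="5"` as a string would fall through to the COUNTER 4 path in this sketch.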