This takes the foundation I started in #223 that informed 18F's M-15-13 compliance analysis, and updates it for analyzing the impact of BOD 18-01.
The code is included in this PR. The data is very large (several GB), and I've uploaded it as a ~600MB .tar.gz as a "release" here. See the release details for more information on the directory/data structure, and which dates are included.
About the data
The data is based on one set of historical scans from 2017-02-10 until 2017-09-28 (pre-BOD), and then on another set of historical scans from 2017-11-20 until 2018-04-28 (post-BOD). There was a roughly two-month gap between those two periods. Both sets were captured by pulse.cio.gov archiving its scan data to S3 over time. During the gap, I re-engineered the Pulse stack to capture more and better data.
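For anyone slicing the data by period, here is a minimal sketch of how a scan date could be bucketed using the date ranges above. The function name `scan_period` and the constants are hypothetical and are not part of the code in this PR; the release's actual directory layout is described in the release details.

```python
from datetime import date

# Inclusive date ranges, taken from the description above.
PRE_BOD = (date(2017, 2, 10), date(2017, 9, 28))
POST_BOD = (date(2017, 11, 20), date(2018, 4, 28))


def scan_period(scan_date: date) -> str:
    """Return 'pre-BOD', 'post-BOD', or 'outside' for a given scan date.

    Hypothetical helper: 'outside' covers the gap between the two
    periods as well as any date before or after the archived scans.
    """
    if PRE_BOD[0] <= scan_date <= PRE_BOD[1]:
        return "pre-BOD"
    if POST_BOD[0] <= scan_date <= POST_BOD[1]:
        return "post-BOD"
    return "outside"
```

A scan dated 2017-10-15 falls in the gap, so it would be reported as "outside" rather than assigned to either period.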
The pre-BOD scans were less frequent, and performed on a smaller pool of data (~16K hosts) than the post-BOD scans (~26K hosts). The expansion of data sources captured more of the long tail, which lowered the overall percentages for the good practices a bit, so there's a clear inflection point at that time. 3DES detection is only captured during the post-BOD period.
None of this captures the time between M-15-13's release (~May 2015) and its own compliance deadline (Jan 2017). Unfortunately, the only good historical data over that time is for the parent domains; the graph for those can be seen in the M-15-13 analysis.
Note that the snapshot of subdomain data for December 31, 2016 contained in that blog post is of the ~26K set (which wasn't integrated into Pulse's automatic scans until 2017-11-20). Combined with the ~16K data from around 2017-02-10, it confirms that the ~26K set has generally lower numbers for HTTPS/HSTS support than the ~16K set.