BOD 18-01 (web) analysis #247

Merged
merged 16 commits into from
Jun 20, 2018

konklone
Contributor

This takes the foundation I started in #223 that informed 18F's M-15-13 compliance analysis, and updates it for analyzing the impact of BOD 18-01.

The code is included in this PR. The data is very large (several GB), so I've uploaded it as a ~600MB .tar.gz "release" here. See the release details for more information on the directory/data structure, and which dates are included.

About the data

The data is based on two sets of historical scans: one from 2017-02-10 until 2017-09-28 (pre-BOD), and another from 2017-11-20 until 2018-04-28 (post-BOD). There was a roughly two-month gap between those two periods. Both sets were captured by pulse.cio.gov archiving its scan data to S3 over time. During the gap, I re-engineered the Pulse stack to capture more and better data.
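As a rough illustration of how the two periods partition the archive, here is a minimal sketch that labels a scan date as pre-BOD or post-BOD. The date ranges come from the description above; the names `PRE_BOD`, `POST_BOD`, and `scan_period` are hypothetical and are not taken from the code in this PR.

```python
from datetime import date
from typing import Optional

# Inclusive date ranges, as stated in the PR description.
# The constant and function names are hypothetical, for illustration only.
PRE_BOD = (date(2017, 2, 10), date(2017, 9, 28))
POST_BOD = (date(2017, 11, 20), date(2018, 4, 28))


def scan_period(scan_date: date) -> Optional[str]:
    """Return "pre-BOD" or "post-BOD", or None if the date is in neither range."""
    if PRE_BOD[0] <= scan_date <= PRE_BOD[1]:
        return "pre-BOD"
    if POST_BOD[0] <= scan_date <= POST_BOD[1]:
        return "post-BOD"
    # Covers the gap between the two periods, plus anything earlier or later.
    return None
```

A date that falls in the gap, such as `date(2017, 10, 15)`, returns `None`, so any comparison across the inflection point can exclude it explicitly.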

The pre-BOD scans were less frequent, and were performed on a smaller pool of hosts (~16K) than the post-BOD scans (~26K). The expansion of data sources captured more of the long tail, which lowered the overall percentages for HTTPS/HSTS support a bit, so there's a clear inflection point at that time. 3DES detection is only captured during the post-BOD period.

None of this captures the time between M-15-13's release (~May 2015) and its own compliance deadline (Jan 2017). Unfortunately, the only good historical data for that time is for the parent domains, the graph for which can be seen in the M-15-13 analysis.

Note that the snapshot of subdomain data for December 31, 2016 contained in that blog post is of the ~26K set (which wasn't integrated into Pulse's automatic scans until 2017-11-20). Combined with the ~16K data from around 2017-02-10, it confirms that the ~26K set has generally lower numbers for HTTPS/HSTS support than the ~16K set.

@konklone konklone merged commit 69f747b into master Jun 20, 2018
@konklone konklone deleted the bod-18-01-analysis branch June 20, 2018 17:47