# eot-web-archive.yaml
Name: End of Term Web Archive Dataset
Description: >
  The End of Term Web Archive (EOT) captures and saves U.S.
  Government websites at the end of presidential administrations. The EOT has
  thus far preserved websites from administration changes in 2008, 2012, 2016,
  and 2020. Data from these web crawls have been made openly available in
  several formats in this dataset.
Documentation: https://eotarchive.org/data/
Contact: Mark Phillips <mark.phillips@unt.edu>, Sawood Alam <sawood@archive.org>
ManagedBy: "[End of Term Web Archive](https://eotarchive.org)"
UpdateFrequency: Every four years after a U.S. presidential election
Tags:
  - aws-pds
  - natural language processing
  - internet
  - web archive
  - archives
License: >
  There are no restrictions on the use, access, and/or download of data
  from the End of Term Web Archive Dataset. We request that you cite the End of
  Term Web Archive project when using the data provided from this
  dataset. <br/><br/> [Creative Commons
  Zero](https://creativecommons.org/publicdomain/zero/1.0/)
Resources:
  - Description: Web Archive Crawl Data (WARC and ARC formats)
    ARN: arn:aws:s3:::eotarchive
    Region: us-east-1
    Type: S3 Bucket
DataAtWork:
  Publications:
    - Title: Moving the End of Term Web Archive to the Cloud to Encourage Research Use and Reuse
      URL: https://digital.library.unt.edu/ark:/67531/metadc1998717/
      AuthorName: Mark Phillips and Sawood Alam