#####Q: Do you have any plans to create a GUI?
A: Yes, I do. However, I can't say when, since I work on this project in my free time and I don't have much of it.
#####Q: How are tweets organized?
A: The main idea behind Dump Scraper is to work on files saved on the filesystem. Every step (scrape, organize, classify) creates new files instead of moving the existing ones. This way, if anything goes wrong or the algorithm improves, you can process the same files again: simply delete the output and rerun the step.
All dumps are stored under the data directory:
    data
     `- organized
         `- hash
             `- YYYY-MM-DD
                 `- <tweet id>.txt
         `- plain
         `- trash
     `- processed
         `- hash
             `- YYYY-MM-DD
                 `- <tweet id>.txt
         `- plain
     `- raw
         `- YYYY-MM-DD
             `- <tweet id>.txt
             `- features.csv
- raw: Stores all the dumps downloaded from PasteBin, with one directory for each day.
- organized: Contains the dumps divided into three categories: Trash (files we don't care about), Plain (files with plain-text passwords) and Hash (files with hashed passwords).
- processed: Contains the final result: each line holds one hash or one plain-text password, depending on the original type.
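
The layout and the "copy, never move" rule described above can be sketched in a few lines of Python. This is only an illustration, not Dump Scraper's actual code: the function names (`organized_path`, `organize_dump`, `reset_stage`) are hypothetical, and the sketch assumes that the plain and trash folders use the same YYYY-MM-DD subfolders shown for hash.

```python
import shutil
from pathlib import Path

# Hypothetical names for illustration; only the directory layout
# comes from the description above.
CATEGORIES = ("hash", "plain", "trash")


def organized_path(data_dir: Path, category: str, day: str, tweet_id: str) -> Path:
    """Build data/organized/<category>/<YYYY-MM-DD>/<tweet id>.txt."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    return data_dir / "organized" / category / day / f"{tweet_id}.txt"


def organize_dump(data_dir: Path, day: str, tweet_id: str, category: str) -> Path:
    """Copy a raw dump into its category folder, leaving the raw file in place."""
    source = data_dir / "raw" / day / f"{tweet_id}.txt"
    target = organized_path(data_dir, category, day, tweet_id)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)  # copy, never move: raw stays untouched
    return target


def reset_stage(data_dir: Path, stage: str) -> None:
    """Delete the output of one derived stage so it can be regenerated."""
    if stage not in ("organized", "processed"):
        raise ValueError("only derived stages can be reset")
    shutil.rmtree(data_dir / stage, ignore_errors=True)
```

Because the raw file is copied rather than moved, calling `reset_stage(Path("data"), "organized")` throws away only the derived output; the downloaded dumps under raw remain available for another run.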
Dump Scraper and its documentation are Copyright © 2015-2016 Davide Tampellini / FabbricaBinaria.
Dump Scraper is Open Source Software, distributed under the GNU General Public License, version 3 of the license, or (at your option) any later version.
The Dump Scraper Wiki content is provided under the GNU Free Documentation License, version 1.3 of the license, or (at your option) any later version.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license can be found on the GNU site.