This repository contains a collection of scripts for various data scraping and processing tasks. Below is an overview of the scripts and their functionalities, organized by category.
A Bash script that automates these processes more efficiently is currently in development.
Scripts that serve general purposes across different platforms.
- Purpose: Communicates with OpenAI's GPT model to retrieve answers based on input from an Excel file.
- Dependencies: openai
- Purpose: Data bank containing scraped application data for various uses.
- Purpose: Calculates the Levenshtein distance and similarity ratio between application titles and descriptions.
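
The title/description comparison described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function names `levenshtein_distance` and `similarity_ratio` are hypothetical, and the ratio is assumed to be the distance normalized by the longer string's length (other libraries define the ratio differently).

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def similarity_ratio(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for no overlap."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein_distance(a, b) / longest


if __name__ == "__main__":
    # Hypothetical app titles, used only for illustration.
    title_a = "Kids Math Games"
    title_b = "Kids Maths Game"
    print(levenshtein_distance(title_a, title_b))
    print(round(similarity_ratio(title_a, title_b), 3))
```

A ratio close to 1.0 suggests the two titles or descriptions are near-duplicates; the threshold for calling two apps a match is a separate choice that this sketch does not make.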
Scripts specifically designed for scraping and analyzing data from the App Store.
- Purpose: Flask server that retrieves privacy information from an app's store listing on the App Store.
- Dependencies: flask, selenium
- Purpose: Determines whether an application targets children based on its title, description, and reviews.
- Dependencies: app-store-scraper by Facundo Olano
- Purpose: Fetches application information from the App Store.
- Dependencies: app-store-scraper, app_store_scraper_server.py
- Purpose: Finds applications on the App Store that match apps already identified on Google Play.
- Dependencies: app-store-scraper, app_store_scraper_server.py
Scripts for scraping data from Google Play.
- Purpose: Scrapes application data from Google Play.
- Dependencies: google-play-scraper
Scripts designed for extracting and processing text from privacy policies.
- Purpose: Retrieves the text of a privacy policy from a provided URL.
- Dependencies: trafilatura
- Purpose: Extracts relevant paragraphs from privacy policy texts, grouped by data type category.
- Dependencies: spacy, selenium
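
The idea of grouping policy paragraphs by data type category can be illustrated with a small sketch. Note the assumptions: the repository's script relies on spaCy, whereas this example swaps in plain regular-expression keyword matching so that it runs with the standard library alone. The category names, keywords, and the function `extract_paragraphs` are hypothetical.

```python
import re

# Hypothetical categories and keywords, chosen only for illustration.
CATEGORY_KEYWORDS = {
    "location": ["location", "gps", "geolocation"],
    "contact": ["email", "phone number", "address book"],
}


def extract_paragraphs(policy_text: str, category_keywords: dict) -> dict:
    """Map each category to the paragraphs that mention one of its keywords."""
    # Treat blank lines as paragraph separators and drop empty chunks.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", policy_text) if p.strip()]
    matches = {category: [] for category in category_keywords}
    for category, keywords in category_keywords.items():
        # Whole-word, case-insensitive match on any keyword of the category.
        alternatives = "|".join(re.escape(keyword) for keyword in keywords)
        pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
        for paragraph in paragraphs:
            if pattern.search(paragraph):
                matches[category].append(paragraph)
    return matches


if __name__ == "__main__":
    sample = (
        "We collect your GPS data.\n\n"
        "We store your email.\n\n"
        "Cookies are used."
    )
    for category, found in extract_paragraphs(sample, CATEGORY_KEYWORDS).items():
        print(category, found)
```

Keyword matching misses paraphrases and inflected forms; a spaCy-based approach can address that with lemmatization or phrase matching, at the cost of loading a language model.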