This project aims to provide an interface for exploring datasets hosted on OpenML (https://www.openml.org/search?type=data&sort=runs&status=active). Users can search for datasets by various criteria and visualize the search results. It uses Python to access OpenML data and Dash to build the web interface.
- Dataset Filtering: Users can filter datasets based on various criteria, such as date range, number of features, and number of instances.
- Pagination: The application paginates large result sets, so users can navigate through multiple pages of results.
- Interactive Visualization: Users can click on dataset items to view detailed information, including histograms and summary statistics for features.
- Error Handling: The application includes error handling for invalid input ranges and displays error messages to users when necessary.
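
The filtering, pagination, and range validation described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `DatasetInfo` record, its field names, and the helper functions `validate_range`, `filter_datasets`, and `paginate` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatasetInfo:
    # Hypothetical record; the field names are illustrative only.
    name: str
    upload_date: date
    n_features: int
    n_instances: int


def validate_range(low, high, label):
    """Raise ValueError with a user-facing message if min exceeds max."""
    if low is not None and high is not None and low > high:
        raise ValueError(
            f"Invalid {label} range: minimum {low} exceeds maximum {high}."
        )


def filter_datasets(datasets, features=(None, None),
                    instances=(None, None), dates=(None, None)):
    """Keep datasets whose values fall inside each inclusive (low, high) range.

    A bound of None means "no limit" on that side.
    """
    validate_range(*features, "feature count")
    validate_range(*instances, "instance count")
    validate_range(*dates, "date")

    def within(value, bounds):
        low, high = bounds
        return (low is None or value >= low) and (high is None or value <= high)

    return [
        d for d in datasets
        if within(d.n_features, features)
        and within(d.n_instances, instances)
        and within(d.upload_date, dates)
    ]


def paginate(items, page, page_size=10):
    """Return (items on the 1-indexed page, total number of pages)."""
    total_pages = max(1, -(-len(items) // page_size))  # ceiling division
    page = min(max(page, 1), total_pages)  # clamp out-of-range page numbers
    start = (page - 1) * page_size
    return items[start:start + page_size], total_pages
```

In a Dash callback, the `ValueError` message could be caught and shown to the user, while the page slice and page count would drive the result list and the pagination controls.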
- Clone the repository to your local machine:
    git clone https://github.com/yourusername/dataset-explorer.git
- Navigate to the project directory:
    cd dataset-explorer
- Install the required dependencies:
    pip install -r requirements.txt
- Run the application:
    python main.py
- Open a web browser and go to http://localhost:8050 to access the application.
- Use the filtering options to refine the dataset selection.
- Click on dataset items to view detailed information and visualizations.
To adjust the cache directory for storing OpenML data, modify the CACHE_DIRECTORY variable in the code.
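
As a rough sketch of how such a setting might be wired up, assuming `CACHE_DIRECTORY` is a module-level path string: the default path and the helpers `resolve_cache_directory` and `configure_openml_cache` below are hypothetical, and the project's own code may differ. The sketch uses `openml.config.set_root_cache_directory` from the openml-python package, and skips that step if the package is not installed.

```python
from pathlib import Path

# Assumed module-level setting; this default path is purely illustrative.
CACHE_DIRECTORY = "~/.cache/dataset-explorer/openml"


def resolve_cache_directory(raw_path):
    """Expand '~', make the path absolute, and create the directory if missing."""
    path = Path(raw_path).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    return path


def configure_openml_cache(raw_path=CACHE_DIRECTORY):
    """Point openml-python at the cache directory, if the package is available."""
    cache_path = resolve_cache_directory(raw_path)
    try:
        import openml
    except ImportError:
        # openml-python is not installed; return the prepared path anyway.
        return cache_path
    openml.config.set_root_cache_directory(str(cache_path))
    return cache_path
```

Calling `configure_openml_cache()` once at startup, before any dataset is fetched, would keep downloaded OpenML data under the chosen directory.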
Contributions are welcome! If you find any bugs or have suggestions for improvements, please open an issue or submit a pull request.
Special thanks to the Dash framework and the Plotly team for providing powerful tools for building interactive web applications in Python.