The ETL Analysis Toolkit is a collection of R functions for performing Extract, Transform, Load (ETL) tasks and data analysis on CSV files. It provides the following features:
- Merge CSV Files: Combines two or more CSV files into a single file.
- Filter CSV Data: Keeps only the rows of a CSV file that satisfy a provided condition.
- Remove Duplicates from CSV: Removes duplicate records from a CSV file.
- Analyze Data Distribution: Analyzes the distribution of a variable in a dataset.
- Analyze Data Correlation: Calculates and displays the correlation matrix between variables in a dataset.
- Analyze Time Series: Decomposes a time series and displays the original series, its trend, seasonal component, and residuals.
- Analyze Linear Regression: Fits a simple linear regression between two variables and displays the fitted model and a plot.
- Forecast Time Series with ARIMA: Generates forecasts for a time series using an ARIMA model and displays the results.
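The three ETL operations in the list above can be approximated in a few lines of base R. This is a sketch, not the toolkit's code: the names `merge_csv_files()`, `filter_csv()`, and `remove_duplicates_csv()` are hypothetical, it uses base `read.csv()`/`write.csv()` rather than readr, and it assumes that "merge" means stacking files that share the same column names row by row (a key-based join with `merge()` would be the alternative reading).

```r
# Hypothetical base-R sketches of the ETL features (not the toolkit's own code).

# Merge: stack two or more CSV files with the same column names into one file
merge_csv_files <- function(input_paths, output_path) {
  frames <- lapply(input_paths, read.csv, stringsAsFactors = FALSE)
  merged <- do.call(rbind, frames)  # rbind matches data frame columns by name
  write.csv(merged, output_path, row.names = FALSE)
  invisible(merged)
}

# Filter: keep rows where `condition` (an R expression given as a string) is TRUE.
# The string is evaluated as R code, so only pass trusted input.
filter_csv <- function(input_path, output_path, condition) {
  data <- read.csv(input_path, stringsAsFactors = FALSE)
  keep <- eval(parse(text = condition), envir = data)
  filtered <- data[!is.na(keep) & keep, , drop = FALSE]
  write.csv(filtered, output_path, row.names = FALSE)
  invisible(filtered)
}

# Deduplicate: drop rows that exactly repeat an earlier row
remove_duplicates_csv <- function(input_path, output_path) {
  data <- read.csv(input_path, stringsAsFactors = FALSE)
  unique_rows <- data[!duplicated(data), , drop = FALSE]
  write.csv(unique_rows, output_path, row.names = FALSE)
  invisible(unique_rows)
}
```

For example, `filter_csv("sales.csv", "big_sales.csv", "amount > 1000")` would write only the rows whose (hypothetical) `amount` column exceeds 1000.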
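The five analysis features in the list above map onto standard base R functions. This is a hypothetical sketch, not the toolkit's implementation: the function names and the `show_plot` argument are illustrative, classical `decompose()` stands in for whatever decomposition the toolkit uses, and `stats::arima()` with a fixed order replaces the forecast package (which offers, for example, `auto.arima()` for automatic order selection).

```r
# Hypothetical base-R sketches of the analysis features (not the toolkit's code).

# Distribution: summary statistics, optionally with a histogram
analyze_distribution <- function(x, show_plot = TRUE) {
  if (show_plot) hist(x, main = "Distribution", xlab = "Value")
  summary(x)
}

# Correlation: correlation matrix over the numeric columns of a data frame
analyze_correlation <- function(data) {
  numeric_cols <- data[vapply(data, is.numeric, logical(1))]
  cor(numeric_cols, use = "pairwise.complete.obs")
}

# Time series: classical decomposition into trend, seasonal and random parts.
# decompose() needs frequency > 1 and at least two full periods of data.
analyze_time_series <- function(values, frequency, show_plot = TRUE) {
  parts <- decompose(ts(values, frequency = frequency))
  if (show_plot) plot(parts)  # panels: observed, trend, seasonal, random
  parts
}

# Simple linear regression of y on x; call summary() on the result for details
analyze_regression <- function(x, y, show_plot = TRUE) {
  model <- lm(y ~ x)
  if (show_plot) {
    plot(x, y, main = "Linear regression")
    abline(model, col = "red")  # fitted regression line
  }
  model
}

# ARIMA: fit a fixed (p, d, q) order and forecast h steps ahead
forecast_arima <- function(values, order = c(1, 0, 0), h = 12) {
  fit <- arima(values, order = order)
  predict(fit, n.ahead = h)  # list with point forecasts ($pred) and standard errors ($se)
}
```

Fixing the ARIMA order keeps the sketch dependency-free; in practice the order would be chosen from the data, which is what automatic selection in the forecast package is for.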
The toolkit requires the following R packages:
- readr
- ggplot2
- forecast
Make sure these packages are installed before running the code.
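A small sketch for checking and installing the three packages listed above. `missing_packages()` is a hypothetical helper built on base R's `requireNamespace()`; outside an interactive session it only reports what is missing instead of installing, so running it from a script does not trigger downloads (a choice made for this sketch, not a toolkit requirement).

```r
# Packages this README lists as toolkit requirements
required_pkgs <- c("readr", "ggplot2", "forecast")

# Hypothetical helper: return the packages in `pkgs` that cannot be loaded
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

to_install <- missing_packages(required_pkgs)
if (length(to_install) == 0) {
  message("All required packages are installed.")
} else if (interactive()) {
  install.packages(to_install)  # may prompt for a CRAN mirror
} else {
  message("Missing packages: ", paste(to_install, collapse = ", "))
}
```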
- Clone the repository or download the `etl_analysis_toolkit.R` file.
- Open the file in an R environment (such as RStudio), or load it from an R console with `source("etl_analysis_toolkit.R")`.
- Make sure the required packages are installed.
- Call the `main_menu()` function to start the main menu.
- Choose an option by entering its number, then follow the prompts to provide the required parameters.
- Results are displayed in the console or saved to the files you specify.
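Getting started amounts to sourcing the script and calling `main_menu()`. The wrapper below adds a file-existence check first; `start_toolkit()` is a hypothetical name, and it assumes `etl_analysis_toolkit.R` defines `main_menu()` at top level, as this README describes.

```r
# Hypothetical wrapper: load the toolkit script, then start its interactive menu
start_toolkit <- function(path = "etl_analysis_toolkit.R") {
  if (!file.exists(path)) {
    stop("Toolkit script not found: ", path, call. = FALSE)
  }
  source(path)  # evaluated in the global environment; expected to define main_menu()
  main_menu()
}
```

When the path is known to be correct, calling `source("etl_analysis_toolkit.R")` and then `main_menu()` directly does the same thing.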
- Ensure you provide the correct file paths and valid parameters to avoid errors.
- The toolkit works only with CSV files; other file formats are not supported.
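Following the two notes above, a pre-flight check can catch a bad path before any CSV is read. `check_csv_path()` is a hypothetical helper, not part of the toolkit; it only verifies that the file exists and has a `.csv` extension, not that its contents parse as valid CSV.

```r
# Hypothetical pre-flight check: stop early on a missing file or a non-CSV path
check_csv_path <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path, call. = FALSE)
  }
  if (tolower(tools::file_ext(path)) != "csv") {
    stop("Only .csv files are supported: ", path, call. = FALSE)
  }
  invisible(normalizePath(path))  # absolute path, returned for convenience
}
```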
Enjoy exploring the ETL Analysis Toolkit!