This Python script generates fake donation data in CSV format, simulating contributions to a charity or non-profit organization. It's designed for data analysis projects that need structured datasets to explore metrics like donor acquisition, retention, and donation patterns over time.
📖 Data Dictionary
- donor_id: Unique ID
- donor_type: Individual or Organization
- donation_dates: Comma-separated donation dates
- donation_amounts: Comma-separated donation amounts
- acquisition channel: Source channel (Direct Mail, Online Event, etc.)
- age, gender, location: Donor demographics
The generate_unique_dates function takes three arguments: start_year, end_year, and total_dates. The first two are supplied from global variables that the user can modify. total_dates is a random integer between 1 and 10 that determines how many unique dates are created for a given donor_id. Each date is built by randomly selecting a year, a month, and a day, and this step is repeated total_dates times.
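As a rough illustration of the date-generation step described above, here is a minimal sketch. Only the function name and its three arguments come from the description; the body is an assumption about how such a function might work. The use of a set to enforce uniqueness, calendar.monthrange to keep days valid, and the sorted return value are illustrative choices, not necessarily what the script does.

```python
import calendar
import random
from datetime import date


def generate_unique_dates(start_year, end_year, total_dates):
    """Return `total_dates` distinct random dates within the given years (inclusive).

    Sketch only: the real script may build or format its dates differently.
    """
    dates = set()  # a set silently drops duplicate dates
    while len(dates) < total_dates:
        year = random.randint(start_year, end_year)
        month = random.randint(1, 12)
        # monthrange returns (weekday of the 1st, number of days in the month)
        last_day = calendar.monthrange(year, month)[1]
        day = random.randint(1, last_day)
        dates.add(date(year, month, day))
    # Sorting is an assumption made here so the output is chronological
    return sorted(dates)


# Usage: total_dates is drawn between 1 and 10, as described above
total_dates = random.randint(1, 10)
print(generate_unique_dates(2015, 2023, total_dates))
```

Capping the day at the month's real length avoids invalid dates such as February 30, which a plain 1 to 31 draw would occasionally produce.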
The donor-generation function takes three inputs: num_donors, earliest_year, and latest_year. The last two use the same global variables as generate_unique_dates. Each donor_id occupies a single row in the output, with donation details stored in two columns: donation_dates and donation_amounts. Each column holds multiple comma-separated values for the donor. The values are paired by position: the first date corresponds to the first amount, the second date to the second amount, and so on, so the order must be preserved when splitting them. Attribute values are weighted to approximate real-world proportions; for example, 30% of donors are organizations and 70% are individuals.
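Because the dates and amounts are matched by position, it helps to split both columns together. The sketch below shows one way to do that. The helper name split_donations and the sample row are hypothetical, the column names follow the data dictionary, and the ISO date format (YYYY-MM-DD) is an assumption; the generated file may use a different format.

```python
from datetime import date


def split_donations(dates_field, amounts_field):
    """Pair each date with its amount by position (hypothetical helper)."""
    dates = [d.strip() for d in dates_field.split(",")]
    amounts = [a.strip() for a in amounts_field.split(",")]
    # A count mismatch means the positional pairing can no longer be trusted
    if len(dates) != len(amounts):
        raise ValueError("date and amount counts differ")
    # Assumes ISO-formatted dates; adjust the parsing if the CSV differs
    return [(date.fromisoformat(d), float(a)) for d, a in zip(dates, amounts)]


# Made-up sample row, for illustration only
row = {
    "donor_id": "D001",
    "donation_dates": "2021-03-14, 2022-11-02",
    "donation_amounts": "50.0, 120.5",
}
pairs = split_donations(row["donation_dates"], row["donation_amounts"])
for donated_on, amount in pairs:
    print(row["donor_id"], donated_on, amount)
```

Splitting both fields in one step and checking their lengths catches rows where a value went missing before any analysis depends on the pairing.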
There’s room to improve this script, such as adding weights to acquisition channels to better mimic real data. Even so, it’s a solid foundation for practicing data cleaning, handling date formats, and creating visuals in analysis projects.
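The channel-weighting idea mentioned above could be sketched with random.choices from the standard library. The weights below are made up for illustration and are not taken from real donor data. "Referral" is a hypothetical channel added to round out the example, and pick_channel is a hypothetical helper name.

```python
import random

# Illustrative weights only; replace with proportions from a real dataset
CHANNEL_WEIGHTS = {
    "Direct Mail": 0.5,
    "Online Event": 0.3,
    "Referral": 0.2,  # hypothetical channel, not from the script
}


def pick_channel(rng=random):
    """Draw one acquisition channel according to CHANNEL_WEIGHTS."""
    channels = list(CHANNEL_WEIGHTS)
    weights = list(CHANNEL_WEIGHTS.values())
    # choices() returns a list of k items, so take the single element
    return rng.choices(channels, weights=weights, k=1)[0]


# Usage: a seeded generator makes the synthetic data reproducible
rng = random.Random(42)
sample = [pick_channel(rng) for _ in range(5)]
print(sample)
```

The same pattern would work for any other weighted attribute, such as a 70/30 split between individual and organization donors.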