
This is a Python script that generates fake data in CSV format for data analysis projects.


K-Bloch/data-generator


data-generator

This Python script generates fake donation data in CSV format, simulating contributions to a charity or non-profit organization. It's designed for data analysis projects that need structured datasets to explore metrics like donor acquisition, retention, and donation patterns over time.

📖 Data Dictionary
  • donor_id: Unique ID
  • donor_type: Individual or Organization
  • donation_dates: Comma-separated donation dates
  • donation_amounts: Comma-separated donation amounts
  • acquisition channel: Source channel (Direct Mail, Online Event, etc.)
  • age, gender, location: Donor demographics

Functions

generate_unique_dates

This function takes three arguments: start_year, end_year, and total_dates. The first two come from global variables that the user can modify. total_dates is a randomly generated number between 1 and 10 that determines how many unique dates are created for a given donor_id. Each date is built by randomly selecting a year, month, and day, and the selection is repeated total_dates times.
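The description above can be sketched roughly as follows. This is not the repository's code: the set-based loop (used here to guarantee uniqueness), the sorted return value, and the example year range are all assumptions made for illustration.

```python
import calendar
import random
from datetime import date


def generate_unique_dates(start_year, end_year, total_dates):
    """Return `total_dates` distinct random dates between the two years (inclusive)."""
    dates = set()
    # Assumption: keep drawing until enough distinct dates exist,
    # so a repeated draw never shortens the result.
    while len(dates) < total_dates:
        year = random.randint(start_year, end_year)
        month = random.randint(1, 12)
        # monthrange returns (weekday of the 1st, number of days in the month),
        # which avoids invalid days such as 31 April or 30 February.
        last_day = calendar.monthrange(year, month)[1]
        day = random.randint(1, last_day)
        dates.add(date(year, month, day))
    # Assumption: chronological order is convenient for later analysis.
    return sorted(dates)


# Hypothetical global settings; the real script's values are not shown here.
START_YEAR = 2018
END_YEAR = 2023

total_dates = random.randint(1, 10)  # inclusive on both ends
donation_dates = generate_unique_dates(START_YEAR, END_YEAR, total_dates)
```

Collecting the dates in a set means the function always returns exactly total_dates values, at the cost of a few extra iterations when a duplicate is drawn.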

generate_donor_data

This function takes three arguments: num_donors, earliest_year, and latest_year. The last two use the same global variables as generate_unique_dates. Each donor_id has a single row in the data table, with donation details held in two columns: donation_dates and donation_amounts. Each column holds multiple comma-separated values for the donor. The values are paired by position: the first date in donation_dates corresponds to the first amount in donation_amounts, and so on, so anyone splitting these columns must keep the order of the values intact. Attribute distributions are chosen to resemble real-world data; for example, 30% of donors are organizations and 70% are individuals.
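When unpacking the comma-separated columns, pairing values by position keeps each date with its amount. The helper below is a minimal sketch, not part of the script: the function name split_donations, the sample row, and the date format in it are made up, and the column names follow the Data Dictionary.

```python
def split_donations(row):
    """Expand one donor row into one (donor_id, date, amount) record per donation."""
    dates = [d.strip() for d in row["donation_dates"].split(",")]
    amounts = [float(a) for a in row["donation_amounts"].split(",")]
    if len(dates) != len(amounts):
        raise ValueError(
            f"donor {row['donor_id']}: {len(dates)} dates but {len(amounts)} amounts"
        )
    # zip pairs values by position, preserving the date-to-amount correspondence.
    return [(row["donor_id"], d, a) for d, a in zip(dates, amounts)]


# Made-up row for illustration only; real output may use a different date format.
sample_row = {
    "donor_id": "D001",
    "donation_dates": "2021-03-14, 2022-07-01",
    "donation_amounts": "50.0, 125.5",
}

records = split_donations(sample_row)
# records == [("D001", "2021-03-14", 50.0), ("D001", "2022-07-01", 125.5)]
```

Sorting or de-duplicating either column on its own before this step would break the pairing, which is why the length check runs before the values are zipped.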

Notes

There is room to improve this script, for example by adding weights to acquisition channels to better mimic real data. As it stands, it is a solid foundation for practicing data cleaning, handling date formats, and building visualizations, which makes it a practical starting point for analysis projects.
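One possible way to add channel weights is random.choices from the standard library. The sketch below is only an illustration of that idea: the weights are invented, "Referral" is a hypothetical channel, and pick_channel is not a function in the script.

```python
import random

# Hypothetical channels and weights; "Direct Mail" and "Online Event" appear in
# the Data Dictionary, while "Referral" and all numbers are assumptions.
CHANNEL_WEIGHTS = {
    "Direct Mail": 0.5,
    "Online Event": 0.3,
    "Referral": 0.2,
}


def pick_channel(rng=random):
    """Draw one acquisition channel, favoring channels with larger weights."""
    channels = list(CHANNEL_WEIGHTS)
    weights = list(CHANNEL_WEIGHTS.values())
    # choices returns a list of k items, so take the single element.
    return rng.choices(channels, weights=weights, k=1)[0]


channel = pick_channel()
```

The same call could express the 70/30 donor split, for instance random.choices(["Individual", "Organization"], weights=[70, 30]). Weights are relative, so they do not need to sum to 1.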
