A synthetic dataset of 1 million rows was generated for this purpose, with the following attributes:
- Club Member: Indicates whether the customer is a club member.
- Campaign: Represents the campaign the customer is associated with.
- State: Denotes the state where the customer resides.
- Month: Indicates the month of purchase.
- Case Count: The number of cases raised by the customer.
- Case Type Return: Specifies whether the customer returned any product in the last year.
- Case Type Shipment Damaged: Indicates whether the customer experienced any shipment damage in the last year.
- Engagement Score: Reflects the level of customer engagement, including responses to mailing campaigns, logins to the online platform, etc.
- Tenure: The number of years the customer has been part of NT.
- Clicks: The average number of clicks the customer made within one week before purchase.
- Pages Visited: The average number of page visits the customer made within one week before purchase.
- Product Purchased: Specifies the product purchased by the customer.
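As a rough illustration of how a dataset with these attributes could be produced, here is a minimal sketch using only the Python standard library. It is not the script used to build the original dataset: the column names, the category values (campaigns, states, products), and the value ranges are all assumptions made for the example. It also draws the purchased product at random, whereas a dataset meant for model training would need the product to depend on the other attributes.

```python
import csv
import random

# All category values below are illustrative assumptions, not the original data.
CAMPAIGNS = ["Email", "Social", "Search", "None"]
STATES = ["CA", "NY", "TX", "WA", "FL"]
PRODUCTS = ["Tent", "Backpack", "Hiking Boots", "Jacket"]

COLUMNS = [
    "club_member", "campaign", "state", "month", "case_count",
    "case_type_return", "case_type_shipment_damaged", "engagement_score",
    "tenure", "clicks", "pages_visited", "product_purchased",
]


def make_row(rng):
    """Build one synthetic customer record as a dict keyed by COLUMNS."""
    case_count = rng.randint(0, 5)
    return {
        "club_member": rng.randint(0, 1),
        "campaign": rng.choice(CAMPAIGNS),
        "state": rng.choice(STATES),
        "month": rng.randint(1, 12),
        "case_count": case_count,
        # A return or damage flag is only set when at least one case was raised.
        "case_type_return": int(case_count > 0 and rng.random() < 0.5),
        "case_type_shipment_damaged": int(case_count > 0 and rng.random() < 0.3),
        "engagement_score": rng.randint(1, 10),
        "tenure": rng.randint(0, 10),
        "clicks": round(rng.uniform(0, 50), 1),
        "pages_visited": round(rng.uniform(0, 20), 1),
        "product_purchased": rng.choice(PRODUCTS),
    }


def write_dataset(path, n_rows, seed=42):
    """Write n_rows synthetic records to a CSV file at path."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        for _ in range(n_rows):
            writer.writerow(make_row(rng))


if __name__ == "__main__":
    # Set n_rows=1_000_000 to match the size described above.
    write_dataset("synthetic_purchases.csv", n_rows=1_000)
```

Seeding the generator keeps the output reproducible, which helps when the same file needs to be regenerated later.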
In a real-life scenario, Data Cloud can ingest data from various sources and apply its batch and streaming transformation capabilities to build a robust dataset for model training.
The dataset can be accessed here. After downloading it, upload the CSV file to an S3 bucket so that it can be ingested as a data stream.
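The upload can be done from the AWS console, or scripted. The following is a minimal sketch that assumes the boto3 library is installed and AWS credentials are already configured; the bucket name, object key, and local file name are placeholders, and `upload_csv` is a hypothetical helper written for this example.

```python
import os


def upload_csv(s3_client, local_path, bucket, key):
    """Upload a local CSV to s3://bucket/key and return that URI."""
    if not os.path.isfile(local_path):
        raise FileNotFoundError(local_path)
    # boto3's upload_file switches to multipart transfer for large files.
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


if __name__ == "__main__":
    import boto3  # third-party: pip install boto3

    # Placeholder bucket, key, and file name; replace with your own values.
    client = boto3.client("s3")
    uri = upload_csv(
        client,
        "synthetic_purchases.csv",
        "my-example-bucket",
        "datacloud/synthetic_purchases.csv",
    )
    print(uri)
```

The bucket name and key used here are the values to enter later when configuring the S3 data stream.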
Here are the steps to create a data stream from S3 in Salesforce:
1. Log in to the org.
2. Navigate to "Data Streams" and click "New".
3. Select "Amazon S3" and click "Next".
4. Enter the S3 bucket and file details.
5. Click "Next".
6. Click "Next".
7. Select "Full Refresh".
8. Set Frequency to "None".
9. Click "Deploy" to create the data stream.