This project demonstrates advanced data analysis of e-commerce behavior using Google BigQuery, focusing on user interactions, purchasing patterns, and product performance metrics. The analysis encompasses various aspects of the e-commerce journey, from traffic sources to purchase completion, providing valuable insights for business decision-making.
- Comprehensive tracking of visits, pageviews, and transactions across different time periods
- Bounce rate analysis by traffic source to understand user engagement quality
- Deep dive into pageview patterns comparing purchasers vs. non-purchasers
- Multi-dimensional revenue tracking by traffic source, broken down by week and month
- Average transaction values and session spending patterns
- Customer purchase behavior and revenue generation metrics
- Detailed product-level analysis including cross-selling patterns
- Product view to purchase conversion funnel analysis
- Cohort mapping from product view to add-to-cart to purchase completion
This analysis provides actionable insights for:
- Marketing channel effectiveness
- User behavior patterns
- Product performance optimization
- Conversion funnel optimization
- Customer purchase patterns
- Revenue optimization opportunities
The analysis utilizes the Google Analytics sample dataset available in BigQuery:
- Dataset:
bigquery-public-data.google_analytics_sample
- Table: ga_sessions_*
- Time period: 2017
For detailed information about the Google Analytics dataset schema, please refer to the official Google Analytics documentation, available at at https://support.google.com/analytics/answer/3437719?hl=en ↩
Key data access patterns include:
- Basic session data: Direct access to ga_sessions_* table
- Product data: Requires UNNEST(hits) and UNNEST(hits.product)
- E-commerce actions: Requires UNNEST(hits) to access eCommerceAction
- Revenue data: Accessed through product.productRevenue after unnesting
The project leverages BigQuery's powerful features including:
- Complex data structures handling through UNNEST operations
- Advanced SQL aggregations and window functions
- Custom metrics calculations (bounce rates, conversion rates, etc.)
SELECT
FORMAT_DATE('%Y%m', PARSE_DATE('%Y%m%d', date)) AS month,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_2017*`
WHERE _table_suffix BETWEEN '0101' AND '0331'
GROUP BY month
ORDER BY month;
This table provides a clear summary of total visits, pageviews, and transactions for the first quarter of 2017. January and March show notably higher engagement, with March reaching the highest numbers across all metrics, particularly transactions (993). February reflects a slight dip, possibly indicating seasonal or user behavior fluctuations. Overall, this data effectively highlights monthly trends in user activity and transaction volume.
SELECT
trafficSource.source as source,
sum(totals.visits) as total_visits,
sum(totals.Bounces) as total_no_of_bounces,
round (sum (totals.bounces)/sum (totals.visits) * 100, 1) as bounce_rate
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`
GROUP BY source
ORDER BY total_visits DESC;
The bounce rate varies significantly across sources. Major traffic sources like _Google (51.6%) and (direct) (43.3%) have _moderate bounce rates, indicating relatively engaged users. However, some sources, such as l.facebook.com (88.2%), duckduckgo.com (87.5%), and search.mysearch.com (91.7%), show very high bounce rates, suggesting lower user engagement. On the other hand, sources like plus.google.com (25%) and hangouts.google.com (20%) exhibit much lower bounce rates, reflecting higher interaction levels.
with
month_data as(
SELECT
"Month" as time_type,
format_date("%Y%m", parse_date("%Y%m%d", date)) as month,
trafficSource.source AS source,
SUM(p.productRevenue)/1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201706*`,
unnest(hits) hits,
unnest(product) p
WHERE p.productRevenue is not null
GROUP BY 1,2,3
),
week_data as(
SELECT
"Week" as time_type,
format_date("%Y%W", parse_date("%Y%m%d", date)) as week,
trafficSource.source AS source,
SUM(p.productRevenue)/1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201706*`,
unnest(hits) hits,
unnest(product) p
WHERE p.productRevenue is not null
GROUP BY 1,2,3
)
select * from month_data
union all
select * from week_data
order by time_type, revenue DESC;
The table shows revenue data by month and week from various sources. In June 2017, the top revenue source was (direct) with 97,333.62, followed by google with 18,757.18. Weekly data reveals that (direct) consistently performs best, with a peak in Week 201724 at 30,908.91, while google maintains _steady contributions. Sources like phandroid.com and sites.google.com have lower revenue, suggesting areas for improvement. Overall, the data highlights key revenue drivers like _(direct) and google, and shows opportunities for optimizing other sources.
with
purchaser_data as(
select
format_date("%Y%m",parse_date("%Y%m%d",date)) as month,
(sum(totals.pageviews)/count(distinct fullvisitorid)) as avg_pageviews_purchase,
from `bigquery-public-data.google_analytics_sample.ga_sessions_2017*`
,unnest(hits) hits
,unnest(product) product
where _table_suffix between '0601' and '0731'
and totals.transactions>=1
and product.productRevenue is not null
group by month
),
non_purchaser_data as(
select
format_date("%Y%m",parse_date("%Y%m%d",date)) as month,
sum(totals.pageviews)/count(distinct fullvisitorid) as avg_pageviews_non_purchase,
from `bigquery-public-data.google_analytics_sample.ga_sessions_2017*`
,unnest(hits) hits
,unnest(product) product
where _table_suffix between '0601' and '0731'
and totals.transactions is null
and product.productRevenue is null
group by month
)
select
pd.*,
avg_pageviews_non_purchase
from purchaser_data pd
full join non_purchaser_data using(month)
order by pd.month;
The table shows that average pageviews for purchase sessions increased from June (94.02) to July (124.24), indicating higher engagement in purchasing activities in July. However, non-purchase sessions consistently have a higher average pageview count compared to purchase sessions, with the count increasing from June (316.87) to July (334.06). This trend might highlight opportunities to improve conversion strategies for non-purchase sessions.
select
format_date("%Y%m",parse_date("%Y%m%d",date)) as month,
sum(totals.transactions)/count(distinct fullvisitorid) as Avg_total_transactions_per_user
from `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`
,unnest (hits) hits,
unnest(product) product
where totals.transactions>=1
and product.productRevenue is not null
group by month;
In July 2017, the average number of transactions per user was approximately 4.16. This metric can provide insight into user activity levels and transaction frequency, potentially useful for tracking engagement or setting benchmarks for future months.
select
format_date("%Y%m",parse_date("%Y%m%d",date)) as month,
((sum(product.productRevenue)/sum(totals.visits))/power(10,6)) as avg_revenue_by_user_per_visit
from `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`
,unnest(hits) hits
,unnest(product) product
where product.productRevenue is not null
and totals.transactions>=1
group by month;
In July 2017, the average revenue generated by user per visit was approximately 43.86. This metric provides insight into the _revenue efficiency per visit, which can help assess user spending behavior and guide strategies to optimize revenue per visit.
Other products purchased by customers who purchased product "YouTube Men's Vintage Henley" in July 2017
with buyer_list as(
SELECT
distinct fullVisitorId
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`
, UNNEST(hits) AS hits
, UNNEST(hits.product) as product
WHERE product.v2ProductName = "YouTube Men's Vintage Henley"
AND totals.transactions>=1
AND product.productRevenue is not null
)
SELECT
product.v2ProductName AS other_purchased_products,
SUM(product.productQuantity) AS quantity
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`
, UNNEST(hits) AS hits
, UNNEST(hits.product) as product
JOIN buyer_list using(fullVisitorId)
WHERE product.v2ProductName != "YouTube Men's Vintage Henley"
and product.productRevenue is not null
GROUP BY other_purchased_products
ORDER BY quantity DESC;
This data lists other products purchased by customers who bought the "YouTube Men's Vintage Henley" in July 2017. The "Google Sunglasses" stands out as the most frequently purchased additional item, with a quantity of 20, followed by the "Google Women's Vintage Hero Tee Black" with 7 purchases and "SPF-15 Slim & Slender Lip Balm" with 6 purchases. Most other products have lower quantities, with the majority purchased just once or twice, highlighting a wide variety of additional interests among these customers.
with product_data as(
select
format_date('%Y%m', parse_date('%Y%m%d',date)) as month,
count(CASE WHEN eCommerceAction.action_type = '2' THEN product.v2ProductName END) as num_product_view,
count(CASE WHEN eCommerceAction.action_type = '3' THEN product.v2ProductName END) as num_add_to_cart,
count(CASE WHEN eCommerceAction.action_type = '6' and product.productRevenue is not null THEN product.v2ProductName END) as num_purchase
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
,UNNEST(hits) as hits
,UNNEST (hits.product) as product
where _table_suffix between '20170101' and '20170331'
and eCommerceAction.action_type in ('2','3','6')
group by month
order by month
)
select
*,
round(num_add_to_cart/num_product_view * 100, 2) as add_to_cart_rate,
round(num_purchase/num_product_view * 100, 2) as purchase_rate
from product_data;
The data shows the performance metrics for product engagement from January to March 2017. Overall, there is a positive trend in both the add-to-cart rate and purchase rate over the three months, with March showing the highest values at 37.29% and 12.64%, respectively. This indicates increasing customer interest and conversion effectiveness. Notably, the number of product views fluctuates, suggesting that marketing efforts or seasonal factors may influence customer engagement.
This project has provided valuable insights into consumer behavior in the e-commerce landscape through data analysis using Google Analytics on BigQuery. Key findings include:
- User Engagement: A noticeable increase in page views and transactions, particularly in March 2017, suggests improved user engagement. Ongoing optimization of marketing channels is essential.
- Traffic Source Analysis: Direct traffic has a lower bounce rate compared to other sources, indicating higher user interest. Sources with high bounce rates should be examined for content improvements.
- Product Performance: Variations in revenue and conversion rates among products highlight the need to optimize underperforming items to enhance overall sales.
- Shopping Behavior Patterns: Consumers are increasingly making multiple purchases per session, indicating a positive trend in spending habits.
- Optimization Opportunities: Continuous monitoring of high-performing traffic sources and addressing conversion weaknesses can significantly boost revenue and customer satisfaction.
Overall, these insights lay a foundation for strategic marketing and product development efforts, emphasizing the importance of ongoing data analysis for sustained success in e-commerce.