Skip to content

Used SQL in Google BigQuery to analyze e-commerce data focusing on user interactions, shopping patterns, and product performance metrics to drive marketing decisions.

Notifications You must be signed in to change notification settings

bichngocbui/SQL-in-BigQuery-E-commerce-Website-Performance-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 

Repository files navigation

SQL - BigQuery - E-commerce Website Performance Analysis

Project Overview

This project demonstrates advanced data analysis of e-commerce behavior using Google BigQuery, focusing on user interactions, purchasing patterns, and product performance metrics. The analysis encompasses various aspects of the e-commerce journey, from traffic sources to purchase completion, providing valuable insights for business decision-making.

Key Analysis Areas

Traffic and Engagement Analysis

  • Comprehensive tracking of visits, pageviews, and transactions across different time periods
  • Bounce rate analysis by traffic source to understand user engagement quality
  • Deep dive into pageview patterns comparing purchasers vs. non-purchasers

Revenue Analysis

  • Multi-dimensional revenue tracking by traffic source, broken down by week and month
  • Average transaction values and session spending patterns
  • Customer purchase behavior and revenue generation metrics

Product Performance

  • Detailed product-level analysis including cross-selling patterns
  • Product view to purchase conversion funnel analysis
  • Cohort mapping from product view to add-to-cart to purchase completion

Business Value

This analysis provides actionable insights for:

  • Marketing channel effectiveness
  • User behavior patterns
  • Product performance optimization
  • Conversion funnel optimization
  • Customer purchase patterns
  • Revenue optimization opportunities

Data access & Structure

Dataset information

The analysis utilizes the Google Analytics sample dataset available in BigQuery:

  • Dataset: bigquery-public-data.google_analytics_sample
  • Table: ga_sessions_*
  • Time period: 2017

Data Schema

For detailed information about the Google Analytics dataset schema, please refer to the official Google Analytics documentation, available at at https://support.google.com/analytics/answer/3437719?hl=en ↩

Data Access Patterns

Key data access patterns include:

  • Basic session data: Direct access to ga_sessions_* table
  • Product data: Requires UNNEST(hits) and UNNEST(hits.product)
  • E-commerce actions: Requires UNNEST(hits) to access eCommerceAction
  • Revenue data: Accessed through product.productRevenue after unnesting

Technical Implementation

The project leverages BigQuery's powerful features including:

  • Complex data structures handling through UNNEST operations
  • Advanced SQL aggregations and window functions
  • Custom metrics calculations (bounce rates, conversion rates, etc.)

Exploring the dataset

Query 1: Calculate total visit, pageview, transaction for Jan, Feb and March 2017

Syntax

SELECT 
    FORMAT_DATE('%Y%m', PARSE_DATE('%Y%m%d', date)) AS month,
    SUM(totals.visits) AS visits, 
    SUM(totals.pageviews) AS pageviews,   
    SUM(totals.transactions) AS transactions
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_2017*`
WHERE _table_suffix BETWEEN '0101' AND '0331'
GROUP BY month
ORDER BY month;

Result

image

This table provides a clear summary of total visits, pageviews, and transactions for the first quarter of 2017. January and March show notably higher engagement, with March reaching the highest numbers across all metrics, particularly transactions (993). February reflects a slight dip, possibly indicating seasonal or user behavior fluctuations. Overall, this data effectively highlights monthly trends in user activity and transaction volume.

Bounce rate per traffic source in July 2017

Syntax

SELECT
    trafficSource.source as source,
    sum(totals.visits) as total_visits,
    sum(totals.Bounces) as total_no_of_bounces,
    round (sum (totals.bounces)/sum (totals.visits) * 100, 1) as bounce_rate
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`
GROUP BY source
ORDER BY total_visits DESC;

Result

image

The bounce rate varies significantly across sources. Major traffic sources like _Google (51.6%) and (direct) (43.3%) have _moderate bounce rates, indicating relatively engaged users. However, some sources, such as l.facebook.com (88.2%), duckduckgo.com (87.5%), and search.mysearch.com (91.7%), show very high bounce rates, suggesting lower user engagement. On the other hand, sources like plus.google.com (25%) and hangouts.google.com (20%) exhibit much lower bounce rates, reflecting higher interaction levels.

Revenue by traffic source by week, by month in June 2017

Syntax

with 
month_data as(
  SELECT
    "Month" as time_type,
    format_date("%Y%m", parse_date("%Y%m%d", date)) as month,
    trafficSource.source AS source,
    SUM(p.productRevenue)/1000000 AS revenue
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201706*`,
    unnest(hits) hits,
    unnest(product) p
  WHERE p.productRevenue is not null
  GROUP BY 1,2,3
),

week_data as(
  SELECT
    "Week" as time_type,
    format_date("%Y%W", parse_date("%Y%m%d", date)) as week,
    trafficSource.source AS source,
    SUM(p.productRevenue)/1000000 AS revenue
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201706*`,
    unnest(hits) hits,
    unnest(product) p
  WHERE p.productRevenue is not null
  GROUP BY 1,2,3
)

select * from month_data
union all
select * from week_data
order by time_type, revenue DESC;

Result

image

The table shows revenue data by month and week from various sources. In June 2017, the top revenue source was (direct) with 97,333.62, followed by google with 18,757.18. Weekly data reveals that (direct) consistently performs best, with a peak in Week 201724 at 30,908.91, while google maintains _steady contributions. Sources like phandroid.com and sites.google.com have lower revenue, suggesting areas for improvement. Overall, the data highlights key revenue drivers like _(direct) and google, and shows opportunities for optimizing other sources.

Average number of pageviews by purchaser type (purchasers vs non-purchasers) in June, July 2017

Syntax

with 
purchaser_data as(
  select
      format_date("%Y%m",parse_date("%Y%m%d",date)) as month,
      (sum(totals.pageviews)/count(distinct fullvisitorid)) as avg_pageviews_purchase,
  from `bigquery-public-data.google_analytics_sample.ga_sessions_2017*`
    ,unnest(hits) hits
    ,unnest(product) product
  where _table_suffix between '0601' and '0731'
  and totals.transactions>=1
  and product.productRevenue is not null
  group by month
),

non_purchaser_data as(
  select
      format_date("%Y%m",parse_date("%Y%m%d",date)) as month,
      sum(totals.pageviews)/count(distinct fullvisitorid) as avg_pageviews_non_purchase,
  from `bigquery-public-data.google_analytics_sample.ga_sessions_2017*`
      ,unnest(hits) hits
    ,unnest(product) product
  where _table_suffix between '0601' and '0731'
  and totals.transactions is null
  and product.productRevenue is null
  group by month
)

select
    pd.*,
    avg_pageviews_non_purchase
from purchaser_data pd
full join non_purchaser_data using(month)
order by pd.month;

Result

image

The table shows that average pageviews for purchase sessions increased from June (94.02) to July (124.24), indicating higher engagement in purchasing activities in July. However, non-purchase sessions consistently have a higher average pageview count compared to purchase sessions, with the count increasing from June (316.87) to July (334.06). This trend might highlight opportunities to improve conversion strategies for non-purchase sessions.

Average number of transactions per user that made a purchase in July 2017

Syntax

select
    format_date("%Y%m",parse_date("%Y%m%d",date)) as month,
    sum(totals.transactions)/count(distinct fullvisitorid) as Avg_total_transactions_per_user
from `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`
    ,unnest (hits) hits,
    unnest(product) product
where  totals.transactions>=1
and product.productRevenue is not null
group by month;

Result

image

In July 2017, the average number of transactions per user was approximately 4.16. This metric can provide insight into user activity levels and transaction frequency, potentially useful for tracking engagement or setting benchmarks for future months.

Average amount of money spent per session. Only include purchaser data in July 2017

Syntax

select
    format_date("%Y%m",parse_date("%Y%m%d",date)) as month,
    ((sum(product.productRevenue)/sum(totals.visits))/power(10,6)) as avg_revenue_by_user_per_visit
from `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`
  ,unnest(hits) hits
  ,unnest(product) product
where product.productRevenue is not null
and totals.transactions>=1
group by month;

Result

image

In July 2017, the average revenue generated by user per visit was approximately 43.86. This metric provides insight into the _revenue efficiency per visit, which can help assess user spending behavior and guide strategies to optimize revenue per visit.

Other products purchased by customers who purchased product "YouTube Men's Vintage Henley" in July 2017

Syntax

with buyer_list as(
    SELECT
        distinct fullVisitorId
    FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`
    , UNNEST(hits) AS hits
    , UNNEST(hits.product) as product
    WHERE product.v2ProductName = "YouTube Men's Vintage Henley"
    AND totals.transactions>=1
    AND product.productRevenue is not null
)

SELECT
  product.v2ProductName AS other_purchased_products,
  SUM(product.productQuantity) AS quantity
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`
, UNNEST(hits) AS hits
, UNNEST(hits.product) as product
JOIN buyer_list using(fullVisitorId)
WHERE product.v2ProductName != "YouTube Men's Vintage Henley"
 and product.productRevenue is not null
GROUP BY other_purchased_products
ORDER BY quantity DESC;

Result

image

This data lists other products purchased by customers who bought the "YouTube Men's Vintage Henley" in July 2017. The "Google Sunglasses" stands out as the most frequently purchased additional item, with a quantity of 20, followed by the "Google Women's Vintage Hero Tee Black" with 7 purchases and "SPF-15 Slim & Slender Lip Balm" with 6 purchases. Most other products have lower quantities, with the majority purchased just once or twice, highlighting a wide variety of additional interests among these customers.

Calculate cohort map from product view to addtocart to purchase in Jan, Feb and March 2017

Syntax

with product_data as(
select
    format_date('%Y%m', parse_date('%Y%m%d',date)) as month,
    count(CASE WHEN eCommerceAction.action_type = '2' THEN product.v2ProductName END) as num_product_view,
    count(CASE WHEN eCommerceAction.action_type = '3' THEN product.v2ProductName END) as num_add_to_cart,
    count(CASE WHEN eCommerceAction.action_type = '6' and product.productRevenue is not null THEN product.v2ProductName END) as num_purchase
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
,UNNEST(hits) as hits
,UNNEST (hits.product) as product
where _table_suffix between '20170101' and '20170331'
and eCommerceAction.action_type in ('2','3','6')
group by month
order by month
)

select
    *,
    round(num_add_to_cart/num_product_view * 100, 2) as add_to_cart_rate,
    round(num_purchase/num_product_view * 100, 2) as purchase_rate
from product_data;

Result

image

The data shows the performance metrics for product engagement from January to March 2017. Overall, there is a positive trend in both the add-to-cart rate and purchase rate over the three months, with March showing the highest values at 37.29% and 12.64%, respectively. This indicates increasing customer interest and conversion effectiveness. Notably, the number of product views fluctuates, suggesting that marketing efforts or seasonal factors may influence customer engagement.

Conclusion

This project has provided valuable insights into consumer behavior in the e-commerce landscape through data analysis using Google Analytics on BigQuery. Key findings include:

  • User Engagement: A noticeable increase in page views and transactions, particularly in March 2017, suggests improved user engagement. Ongoing optimization of marketing channels is essential.
  • Traffic Source Analysis: Direct traffic has a lower bounce rate compared to other sources, indicating higher user interest. Sources with high bounce rates should be examined for content improvements.
  • Product Performance: Variations in revenue and conversion rates among products highlight the need to optimize underperforming items to enhance overall sales.
  • Shopping Behavior Patterns: Consumers are increasingly making multiple purchases per session, indicating a positive trend in spending habits.
  • Optimization Opportunities: Continuous monitoring of high-performing traffic sources and addressing conversion weaknesses can significantly boost revenue and customer satisfaction.

Overall, these insights lay a foundation for strategic marketing and product development efforts, emphasizing the importance of ongoing data analysis for sustained success in e-commerce.

About

Used SQL in Google BigQuery to analyze e-commerce data focusing on user interactions, shopping patterns, and product performance metrics to drive marketing decisions.

Topics

Resources

Stars

Watchers

Forks

Packages

No packages published