Hi, I am a junior linguist with a current interest in computational stylometry. Because this field draws on linguistics, statistics, and computational methods, I use data science and machine learning to explore it.
Currently, I'm learning how to extract informative linguistic features from text data and how to experiment with machine learning models for text classification. I am also exploring how to apply quantitative approaches to authorship attribution. In addition, I am working on data science projects in a business context to become more comfortable with numbers.
Key Projects
-
PREDICTIVE MODELING
-
Optimizing Ride Fares: A Dynamic Pricing Model for Ride-Sharing Services
- Currently, ride-sharing prices are primarily set based on ride duration, overlooking fluctuating demand and supply. This project explores a dynamic pricing model powered by machine learning to enhance profitability while keeping prices appealing to customers. By experimenting with 12 machine learning (ML) algorithms and two feature engineering techniques, the project developed a model that, when tested with a simulation of 100 customers, showed that increasing the expected ride duration by 20% through a promotional campaign could generate a net profit of $2,400. (Read More)
-
Addressing Customer Churn in an E-Commerce Company
- This project seeks to reduce an e-commerce company's customer churn rate from 16.8% to 10%. Using diagnostic analysis and a classification model, we focused on minimizing false negatives because of their higher financial impact. After testing various techniques and algorithms, we chose XGBoost and identified tenure and cashback amount as key factors for intervention. Simulations showed that, with targeted strategies, the 10% churn rate target is achievable. (Read More)
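One common way to favor fewer false negatives is to pick the classification threshold that minimizes a cost in which a missed churner counts more than a false alarm. The sketch below illustrates that idea only; it is not necessarily how the project did it. The function names, the 5:1 cost ratio, and the toy labels and probabilities are all hypothetical.

```python
def expected_cost(y_true, y_prob, threshold, fn_cost=5.0, fp_cost=1.0):
    """Total cost of predicting churn when probability >= threshold.

    fn_cost and fp_cost are illustrative assumptions, not project figures.
    """
    false_negatives = sum(
        1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold
    )
    false_positives = sum(
        1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold
    )
    return fn_cost * false_negatives + fp_cost * false_positives


def best_threshold(y_true, y_prob, fn_cost=5.0, fp_cost=1.0):
    """Scan thresholds 0.01..0.99 and return the lowest-cost one."""
    candidates = [i / 100 for i in range(1, 100)]
    return min(
        candidates,
        key=lambda t: expected_cost(y_true, y_prob, t, fn_cost, fp_cost),
    )


# Toy data (hypothetical): 1 = churned, 0 = stayed.
labels = [1, 1, 0, 0, 1, 0]
churn_probabilities = [0.9, 0.4, 0.35, 0.1, 0.6, 0.5]

threshold = best_threshold(labels, churn_probabilities)
print("chosen threshold:", threshold)
print("cost at 0.5:", expected_cost(labels, churn_probabilities, 0.5))
print("cost at chosen:", expected_cost(labels, churn_probabilities, threshold))
```

With a higher cost on false negatives, the chosen threshold tends to drop below 0.5, so more at-risk customers are flagged for intervention.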
-
DATA ANALYSIS
-
Evaluating Marketing Campaign Effectiveness for New Menu Items: An A/B Testing Approach
- This project assesses which promotional campaign best boosts sales for a fast-food company's new menu items. Statistical analysis, including the Kruskal-Wallis $H$ test and Dunn's post-hoc test, was used because the sales distributions were non-normal and contained outliers. Results showed that the first campaign achieved the highest median sales, but the practical difference ($\eta^2$) between campaigns was minor. It is recommended that the Marketing Manager re-evaluate marketing strategies and target customers to improve campaign impact. (Read More)
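To make the statistic and the effect size concrete, here is a minimal pure-Python sketch of the tie-corrected Kruskal-Wallis $H$ statistic and the rank-based effect size $\eta^2_H = (H - k + 1) / (n - k)$, where $k$ is the number of groups and $n$ the total sample size. The function names and the sales figures are hypothetical, and the project may have used a different $\eta^2$ variant. In practice, `scipy.stats.kruskal` returns the same statistic together with a p-value.

```python
from itertools import chain


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return the tie-corrected H statistic (assumes not all values are equal)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    tie_counts = {}
    for value in pooled:
        tie_counts[value] = tie_counts.get(value, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    return h / correction


def eta_squared_h(h, k, n):
    """Rank-based eta-squared effect size for a Kruskal-Wallis test."""
    return (h - k + 1) / (n - k)


# Hypothetical weekly sales (in thousands) for three campaigns.
campaigns = [[58, 61, 47, 66], [44, 52, 49, 57], [50, 46, 55, 53]]
h_stat = kruskal_wallis_h(campaigns)
n_total = sum(len(c) for c in campaigns)
print("H:", round(h_stat, 3))
print("eta squared:", round(eta_squared_h(h_stat, len(campaigns), n_total), 3))
```

A small $\eta^2_H$ alongside a significant $H$ is the pattern described above: the campaigns differ statistically, but the difference explains little of the variation in sales.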
-
Improving the Number of Reviews: Exploring Review Patterns in Bangkok's Airbnb Landscape
- Despite an increase in reviews, about 36% (5.7 thousand) of Airbnb listings in Bangkok received none from 2012 to 2022. This project explores why some listings lack reviews and offers recommendations for Airbnb Thailand. It finds that unreviewed listings often have higher prices and longer minimum stays, which may deter bookings and reviews. In contrast, reviewed listings are typically entire homes or apartments, more centrally located, and closer to popular areas. Recommendations include adjusting prices and minimum stays for unreviewed listings, running promotions to boost reviews, and improving marketing to highlight unique features and attractions. (Read More)
-
NATURAL LANGUAGE PROCESSING
-
Using Personal Names to Predict Gender: A 3-Character N-Gram Approach
- This project investigated whether conventional machine learning algorithms with character n-grams could outperform Long Short-Term Memory (LSTM) models, which achieved an F1 score of 0.93 (Septiandri, 2017). Using 3-character n-grams focusing on word boundaries to capture spacing between name parts, the Support Vector Machine with a linear kernel performed best, achieving an F1 score of 0.94. The results suggest that conventional models can match or exceed LSTM performance when using word-boundary 3-character n-grams. (Read More)
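As an illustration of the feature extraction step, the sketch below builds word-boundary 3-character n-grams by padding each name part with a space on both sides, so that the start and end of every part become visible to the model. The function name and the example name are hypothetical, and the project's own pipeline may differ. scikit-learn's `char_wb` analyzer (for example `TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 3))`) produces comparable features, which could then feed a linear SVM.

```python
from collections import Counter


def word_boundary_trigrams(name, n=3):
    """Return character n-grams taken within each space-padded name part."""
    grams = []
    for part in name.lower().split():
        padded = f" {part} "  # the padding marks the word boundaries
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


# Hypothetical example name.
features = Counter(word_boundary_trigrams("Budi Santoso"))
print(sorted(features.items()))
```

Grams such as `" bu"` and `"so "` record how a name part begins and ends, information that plain character n-grams spanning the whole string would blur across the space.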
-
Understanding User Perceptions about Products on Tokopedia
- Multiple ML experiments are being carried out to automatically extract sentiment from customer reviews on Tokopedia (still in progress). The experiments so far have involved conventional ML models, recurrent neural network models, and large language models (LLMs). Before the conventional ML experiment, an exploratory data analysis was done to inform the choice of feature engineering techniques. In short, the first experiment with a Support Vector Machine model performed well, surpassing Long Short-Term Memory (LSTM) models in terms of F1 score (0.95 vs. 0.75). The more recent experiment, which used the base IndoBERT model (uncased) with 110M parameters, achieved an F1 score of 0.98. Judging from these tentative results, a fine-tuned IndoBERT model is the most promising candidate for production. Not only does IndoBERT perform well on the test set, but its predictions can also be explained with various explainable AI techniques, which supports both performance and transparency. (Read More)