sdg-text

We leverage readily available natural-language data scraped from Wikipedia to predict localized indices (asset, sanitation, women's education) relevant to the UN Sustainable Development Goals. We examine how different text-embedding extraction methods and model architectures (logistic regression, feedforward DNNs, and NLP-CNNs) affect performance on this small-data task. Using geolocated and extracted “relevant” sentence embeddings, we achieve ROC-AUC scores of 0.80 (logistic regression), 0.70 (logistic regression), and 0.81 (feedforward DNN) for asset, sanitation, and women's education index classification, respectively.
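For readers unfamiliar with the reported metric, the sketch below shows one way ROC-AUC can be computed for a binary index classifier: as the fraction of (positive, negative) example pairs in which the positive example receives the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not code or data from this repository.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outranks a random negative.

    labels: iterable of 0/1 class labels (e.g. low vs. high index value).
    scores: iterable of classifier scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
```

This pairwise form is quadratic in the number of examples, which is acceptable for a small-data task; a score of 0.5 corresponds to random ranking and 1.0 to perfect separation.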
