Understanding Southern Asia with Machine Learning

Hannah Haegeland, Stimson Center

Stimson’s South Asia Program is using machine learning to understand how states in South Asia think about their interests, threat perceptions and strategic priorities. With natural language processing, researchers have clearer insight into how influential thinkers and experts in the region are addressing areas of concern for policymakers and analysts.

A key challenge for policymakers is understanding how other states think about their interests, threat perceptions, and strategic priorities.

Southern Asia is a critical region for U.S. interests and studies of strategic competition.

Policymakers and analysts alike would benefit from a deeper understanding of what motivates and shapes thinking on strategic issues in Southern Asia.

Stimson is using Natural Language Processing (NLP) techniques.

NLP applies statistical and machine learning tools, originally developed for numerical data, to “text data.” The researcher's role is especially critical at two points: on the front end, in selecting the right data for a question rather than just the available data, and on the back end, in contextualizing results and interpreting their implications for policy.

"Text data policy research is not a problem you can just throw a computer at; the quantitative approach has to be complemented with qualitative analysis."

Stimson is using a dataset of literature published in Southern Asia to better understand how influential thinkers and experts address issues of shared concern.

We coded our collection with variables of interest, such as the professional experience of the author, and analyzed the data using a mixed-membership, structural topic modeling approach.

This approach allows us to identify themes and trends.

In single articles...

We can see themes and trends across the collection of texts and break any given article down into the proportions of its topics. For example, you could see if an article is predominantly about Economic Development, Great Powers & Geopolitics, or Regional Strategic Concerns:

[Figure: Topic Proportions for a Single Document]
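
To make the idea of breaking an article into topic proportions concrete, here is a minimal sketch. It is not our structural topic model: it swaps in latent Dirichlet allocation (LDA), a simpler mixed-membership topic model, fitted with collapsed Gibbs sampling in plain Python. The function name fit_lda, the hyperparameters, and the three toy "documents" are all assumptions made up for illustration; none of them come from the project's dataset.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic proportions.

    docs is a list of token lists. alpha and beta are smoothing
    hyperparameters (illustrative defaults, not tuned values).
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})

    # assignments[d][i] is the topic currently assigned to token i of document d.
    assignments = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * num_topics for _ in docs]          # topic counts per document
    topic_word = [Counter() for _ in range(num_topics)]   # word counts per topic
    topic_total = [0] * num_topics                        # token counts per topic

    for d, doc in enumerate(docs):
        for i, word in enumerate(doc):
            k = assignments[d][i]
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1

                # Resample its topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1

    # Smoothed share of each topic in each document; every row sums to 1.
    return [
        [(count + alpha) / (len(doc) + alpha * num_topics) for count in counts]
        for doc, counts in zip(docs, doc_topic)
    ]


# Toy corpus (invented): two themed documents and one that mixes both themes.
docs = [
    "trade investment growth infrastructure trade".split(),
    "border security missile deterrence border".split(),
    "trade growth border security investment deterrence".split(),
]
proportions = fit_lda(docs, num_topics=2)
for row in proportions:
    print([round(p, 2) for p in row])
```

Each printed row is one document's mixture over the two topics. A structural topic model goes further by letting document-level variables, such as the author's professional experience, shape those mixtures, which this sketch does not attempt.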

And across time...

When looking at the whole dataset, we see how the salience of particular topics changes over time and can trace topics through the dataset in reference to historical events.

[Figure: Topic Proportions Over Time]
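
One simple way to trace topic salience over time, sketched below, is to average the per-document topic proportions within each publication year. This is a simplification for illustration only: a structural topic model can instead include time as a covariate when the model is fitted. The function name mean_topic_proportions_by_year, the years, and the proportions are all hypothetical and are not drawn from our results.

```python
def mean_topic_proportions_by_year(records):
    """Average topic proportions per year.

    records is an iterable of (year, proportions) pairs, where proportions
    is a list with one share per topic. Returns {year: mean proportions},
    with years in ascending order.
    """
    sums = {}
    counts = {}
    for year, proportions in records:
        if year not in sums:
            sums[year] = [0.0] * len(proportions)
            counts[year] = 0
        for k, share in enumerate(proportions):
            sums[year][k] += share
        counts[year] += 1
    return {
        year: [total / counts[year] for total in sums[year]]
        for year in sorted(sums)
    }


# Invented example: three documents, three topics, two publication years.
records = [
    (2001, [0.7, 0.2, 0.1]),
    (2001, [0.5, 0.4, 0.1]),
    (2002, [0.2, 0.2, 0.6]),
]
trend = mean_topic_proportions_by_year(records)
for year, means in trend.items():
    print(year, [round(m, 2) for m in means])
```

Plotting each topic's yearly mean as a line would give a chart of the kind shown above, which can then be read against the historical events of each period.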

The working assumption, based on early findings, is that influential thinkers and experts vary tremendously over time in how they approach key themes of shared interest in the region, as interests, threat perceptions, and strategic priorities shift.

Shaking up Conventional Wisdom

Stimson's initial findings seem to contradict long-held conventional wisdom about how important actors in South Asia conceive of their national interests, strategic priorities, and threats.

This type of research could reshape how policymakers approach the region.

What's next?

Stimson's South Asia program is building on its analytical approach to include:

  • Sentiment analysis, to assess how topics are being discussed
  • Audio/video
  • Text in multiple languages
  • Network analysis


The project is pursuing partnership and advisory relationships, including:

  • Next steps with ready tools from the data science community,
  • Partners interested in building datasets and diversifying the approach,
  • Alternative NLP approaches that are less labor intensive.

Contact: Hannah Haegeland, Research Analyst: hhaegeland@stimson.org.

This project is spearheaded by Hannah Haegeland, Research Analyst in the South Asia program, Sameer Lalwani, Senior Fellow & Director in the South Asia program, and Eyal Hanfling, Research Assistant in the South Asia program.


Created with images by Annie Spratt - "Piles of old worn books" • Markus Spiske - "untitled image"
