Coffee production. patterns. predictions.

Will my latte cost more next month?

To answer this question, I combined temperature and production data to model a benchmark indicator for global coffee prices. By tuning a random forest regressor, I was able to predict that indicator six months ahead of the data used as input.

The bean that we roast, grind, steep and drink is actually the seed from a coffee cherry, a stone fruit that grows on a spindly shrub, native to Ethiopia. Coffee plants take three to five years to bear fruit, but will produce for 25-40 years upon maturation. This makes coffee production inelastic to short-run price changes: Growers cannot quickly produce more product to adapt to spikes in demand, and neither can they adjust to overproduction – and price drops – or drops in consumption, such as recessions in principal consuming countries.

“Consumption of coffee is relatively stable, expanding or contracting only gradually in response to sharp changes in supply and price brought about by short-run factors such as variations in crops due to changes in weather.”

Despite this relative production stability, the coffee market itself is volatile and difficult to predict. Coffee prices, especially for the consumer, can be heavily impacted by distributor decisions.

Nearly all of the world's coffee beans come from one of two species: Arabica and Robusta. Arabica accounts for 75–80% of the coffee produced worldwide, and Robusta for about 20%. By focusing on the primary coffee-producing countries, my model covered 73% of global Arabica exports and 78% of global Robusta exports.

The USDA’s Foreign Agricultural Service issues semi-annual reports on coffee production; among the influences in its reports are previous production volumes, droughts, abnormally warm temperatures, frosts, pest incidences, diseases that affect coffee plants and labor shortages. These influences are quantifiable, and – to some degree – predictable.

By considering past production trends, ending stocks (the amount of coffee held in storehouses before exporting) and frost incidences for these primary producing countries, I sought to predict future prices through an industry-focused lens:

Based on past patterns in coffee production, and the effect of weather conditions on crop yield, can we predict upcoming pricing changes for coffee roasters or suppliers?

Before I can answer that, let's look at the available data sources.

Following the UN's creation of the International Coffee Agreement (1962), which included a coffee exports quota system, the International Coffee Organization (ICO) was established in 1963 to enhance cooperation between nations that consume, distribute and produce coffee. Today the ICO has 73 member countries, representing 98% of global coffee production and 83% of its consumption.

The ICO also tracks and publishes export data from its member countries. Its composite indicator price, which is a weighted average, is considered a benchmark for global coffee prices. It contains important information for key industry stakeholders, like growers that need to brace for an upcoming price slump, or large corporations, like Starbucks, that purchase coffee through futures contracts on the commodities market.

The export quota system ultimately collapsed, disappearing altogether in 1989. This means that all data from 1990 onward reflect free-market pricing, whereas earlier data are a mix of free-market and artificially stabilized prices.

I modeled composite indicator data, using the consistent pricing conditions of 1990 and later.

Source: ICO

Sometimes, coffee price increases are caused by rising demand or market changes; more often, though, the market is hit by poor harvests caused by low temperatures and frost.

Frosts can be devastating to coffee producers, as they have the potential to disrupt harvests for up to 5 years. Volatility levels are generally highest in May through August, when it’s winter in the southern hemisphere.

To capture that in the model, I used temperature data from Berkeley Earth, an independent organization that merges and cross-validates 16 data sets to track global temperature.

The dots represent the weather stations in the data that are located in latitudinal and elevation sweet spots for each coffee species. (Source: Berkeley Earth)

I also wanted to include the amount each country produced the previous year, and the amount each country held in inventory. So, I incorporated data from the USDA’s Foreign Agricultural Service. The FAS has specialists stationed all around the world, providing figures that are validated and balanced.

This chart contains data dating back to 1960. Remember that my model only looks as far back as 1990.

Notice the variability in the Brazilian Naturals category. That’s because coffee plants grow cherries every other year. This effect averages out in every country except Brazil, whose production shows distinct on- and off-years.

So, how did I create my model?

I imported the data and performed standard data cleaning and formatting, like converting certain strings into date-time values. Only two things stood out in the processing phase: First, Berkeley Earth records days using a decimal notation, shown below, which I had to change into a usable format. I also had to convert the annual production data to compare it to the other data, which was monthly. I did this by filling the gaps with the previous recorded value (a technique called padding or forward-filling).

2005.067 = 2005 + (25 − 0.5) / 365 = January 25, 2005
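The two processing steps above can be sketched in Python. This is a minimal illustration rather than the code from this project: the function name `decimal_year_to_date`, the handling of leap years, and the production figures are all assumptions made for the example.

```python
import calendar
from datetime import date, timedelta

import pandas as pd


def decimal_year_to_date(decimal_year: float) -> date:
    """Invert decimal = year + (day_of_year - 0.5) / days_in_year."""
    year = int(decimal_year)
    # Assumption: leap years are split into 366 days, other years into 365.
    days_in_year = 366 if calendar.isleap(year) else 365
    day_of_year = round((decimal_year - year) * days_in_year + 0.5)
    # Clamp in case rounding pushes the value outside the year.
    day_of_year = min(max(day_of_year, 1), days_in_year)
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


print(decimal_year_to_date(2005.067))  # 2005-01-25, the worked example above

# Placeholder annual production figures (not real data).
annual = pd.Series(
    [50.0, 55.0],
    index=pd.to_datetime(["2014-01-01", "2015-01-01"]),
)

# Monthly index to align with the other, monthly, data sets.
monthly_index = pd.date_range("2014-01-01", "2015-12-01", freq="MS")

# Forward-fill: each month takes the most recent annual value.
monthly = annual.reindex(monthly_index, method="ffill")
print(monthly.head(13))
```

Rounding is needed because the decimal notation only keeps three digits, so the recovered day of the year is not an exact integer.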

Ultimately, a random forest regressor, tuned with GridSearchCV, seemed to fit the nature of the data best. I used 200 trees and 10-fold cross-validation, both of which help guard against overfitting and reduce errors when the model sees new data.

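Random forests model the data by building many decision trees, each trained on a random sample of the data and limited to a random subset of features at each split, and then averaging their predictions to minimize overfitting.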
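A setup like the one described above could look roughly like this with scikit-learn. Only the random forest regressor, `GridSearchCV`, the 200 trees, the 10-fold cross-validation and the 6-month lag come from this project; the synthetic data, the column names and the parameter grid are placeholders chosen purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic monthly data standing in for the real features (placeholder values).
rng = np.random.default_rng(0)
months = pd.date_range("1990-01-01", periods=120, freq="MS")
features = pd.DataFrame(
    {
        "production": rng.normal(100, 10, len(months)),    # hypothetical column
        "ending_stocks": rng.normal(30, 5, len(months)),   # hypothetical column
        "min_temperature": rng.normal(15, 3, len(months)), # hypothetical column
    },
    index=months,
)
price = (
    200
    - 0.8 * features["production"]
    - 1.5 * features["ending_stocks"]
    + rng.normal(0, 5, len(months))
)

# 6-month lag: pair each month's features with the price six months later.
data = features.assign(target=price.shift(-6)).dropna()
X = data.drop(columns="target")
y = data["target"]

# Assumed parameter grid; the grid actually searched is not stated in the text.
param_grid = {"max_features": [1, 3], "min_samples_leaf": [1, 5]}

search = GridSearchCV(
    RandomForestRegressor(n_estimators=200, random_state=0),
    param_grid,
    cv=10,          # 10-fold cross-validation
    scoring="r2",
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)  # mean cross-validated R² of the best setting
```

Because the target is shifted by six months, the last six rows have no known price and are dropped before fitting; those are the rows a fitted model would then be asked to forecast.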

Here's what I found.

R² of 0.95 achieved after cross-validation

Below, I've created a zoomed-in view of the previous plot to show pricing for the last three years, 2013-2016.

Initially, I was skeptical of these results.

Having only 27 years of data can cause a dimensionality problem and makes it more likely that the model is overfitting. On the other hand, ensemble methods like random forests tend to lower generalization error (the error that appears when the model is given new data), and cross-validation helps reduce this error further.

The model results also suggest another interpretation, which comes from the fact that the coffee sector generally changes slowly. This means that coffee prices tend to follow simple supply-demand economics.

Using the previous year's data automatically supplies information about the state of the farms, especially significant changes, like the number of fruiting plants. Weather-driven anomalies are relatively infrequent.

So... what about my latte?

I can't answer that directly. While the model suggests raw coffee will become more expensive over the next few months, how much companies charge consumers depends on their business internals.

For example, while I was researching this project, the J.M. Smucker Company announced it would raise its ground coffee prices by an average of 6%. Smucker owns brands like Folgers and Café Bustelo and is licensed to sell Dunkin' Donuts brand ground coffee.

However, since the model is directionally accurate, I wanted to see what the next few months might look like.

This is how prices could change in the first half of 2017.

The shaded band represents future values.

My takeaway?

I’m going to hold off on solidifying any coffee futures contracts for a few months.

Nathan Mitchell is a General Assembly-trained data scientist who is seeking new opportunities to use the world's data to solve real-world problems. Believe it or not, he generally drinks decaf.


  • Yorgason, V. The International Coffee Agreement: Prospect and Retrospect. Development and Change, 1976, 7: 207–221.
  • Hopp, H. and Foote, R. A Statistical Analysis of Factors That Affect Prices of Coffee. American Journal of Agricultural Economics, 1955, v. 37, n. 3, 429–438.
  • “Coffee Frosts: Effects of Frost on South American Coffee Beans”. Accessed February 2017.
  • Form 10-K. Starbucks Corporation, 2016
Created By
Nathan Mitchell
