Predicting Retention: Curve Fitting and Beyond

5 min read · Aug 18, 2023

Imagine you’ve just launched your app. Users start downloading and engaging with it, but as the days pass, fewer of them continue down the funnel. After a week, your app’s retention values are in.

Now, to market smartly, you want to compute the ideal cost per install (CPI) for your campaign, and that calls for predicting retention.

Your retention table would look more or less like this for different cohorts:

Let’s say you want to predict your Day 14 retention. You could try linear regression, but a straight line fitted to declining retention will eventually go negative, so it isn’t the smartest choice.

It is natural that different cohorts have different characteristics, sizes, and hence, averages of retention values. For the calculation, we need singular data points. Notice that retention values are similar for most cohorts, with a few outliers. You can choose to exclude those outliers, but after calculating the mean and the median of the datapoints, I’ve decided that it is not necessary to exclude them. Also, generally, I don’t subscribe to deleting data points.

Hence, I will be using the average retention values across all cohorts for my calculation. (This should have been a weighted average, but I was not given the data needed for weighting at the time, so I proceeded with a simple average.)
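To see why the weighting matters, here is a minimal sketch comparing the two averages. The cohort rates and install counts are hypothetical, made up purely for illustration, and the function names are my own.

```python
def simple_mean(rates):
    """Unweighted average: every cohort counts equally."""
    return sum(rates) / len(rates)

def weighted_mean(rates, sizes):
    """Average weighted by cohort size: every user counts equally."""
    return sum(r * n for r, n in zip(rates, sizes)) / sum(sizes)

# Hypothetical Day 1 retention for three cohorts and their install counts.
day1_rates = [0.40, 0.30, 0.50]
cohort_sizes = [1000, 3000, 1000]

print(f"simple:   {simple_mean(day1_rates):.3f}")
print(f"weighted: {weighted_mean(day1_rates, cohort_sizes):.3f}")
```

When one cohort is much larger than the others, the weighted figure is pulled toward that cohort's retention, which is usually what you want for a campaign-level estimate.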

Exponential

When you graph data in MS Excel, you can simply add a trendline that fits a curve to your data. If your data is strictly positive and rises or falls at a rate proportional to its current value, the exponential trendline is the natural choice.

Its equation is: y = a * e^(b*x)

Let’s start with the exponential. We know that it won’t go negative, so it might be the ideal candidate for curve fitting. A very quick way to use it for prediction in MS Excel is to insert a trendline into your graph and use its coefficients (a and b, or alpha and beta) for the calculation.

After calculating the expected retention values, we get a Mean Absolute Percentage Error (MAPE) of 14.39%, which is clearly not great. Another problem was that I couldn’t use the Day 7 retention value in the calculation, since the retention values for Days 4, 5, and 6 were missing.
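For readers who prefer code to spreadsheets, here is a rough Python equivalent. It assumes the fit is done by ordinary least squares on ln(y), a common way to fit this form and, as far as I know, what a spreadsheet trendline does. The retention numbers are hypothetical, not the ones from my table, and the function names are my own.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_exponential(days, retention):
    """Fit y = a * e^(b*x) by regressing ln(y) on x; returns (a, b)."""
    intercept, slope = fit_line(days, [math.log(r) for r in retention])
    return math.exp(intercept), slope

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)

# Hypothetical average retention by day (NOT the article's data).
days = [1, 2, 3]
retention = [0.40, 0.28, 0.22]

a, b = fit_exponential(days, retention)
fitted = [a * math.exp(b * d) for d in days]
print(f"a={a:.4f}, b={b:.4f}, in-sample MAPE={mape(retention, fitted):.2f}%")
print(f"Predicted Day 14 retention: {a * math.exp(b * 14):.4f}")
```

Because the regression runs on ln(y), it minimizes error in log space, so the coefficients can differ slightly from a fit that minimizes error on the raw retention values.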

Power

If your data has only positive values and changes at a steady proportional rate, so that it plots as a straight line on log-log axes, you can use the power trendline.

Its equation is: y = a * x^b

Again, to compute alpha and beta I can only use the retention values for Days 1, 2, and 3, as they are consecutive and the values for Days 4, 5, and 6 are missing.

For a quick calculation, we again use MS Excel’s trendline feature and add a Power trendline to our graph.

The power equation seems to fit our data much better, resulting in a MAPE of 6.45%.
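The same fit can be sketched in Python. I am assuming a least-squares regression of ln(y) on ln(x), which is one standard way to fit a power curve. The data is hypothetical and `fit_power` is a name I made up for this example.

```python
import math

def fit_power(days, retention):
    """Fit y = a * x^b by least squares on ln(y) versus ln(x); returns (a, b)."""
    lx = [math.log(d) for d in days]
    ly = [math.log(r) for r in retention]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_x), slope

# Hypothetical average retention by day (NOT the article's data).
days = [1, 2, 3]
retention = [0.40, 0.28, 0.22]

a, b = fit_power(days, retention)
for day in (7, 14):
    print(f"Predicted Day {day} retention: {a * day ** b:.4f}")
```

A power curve decays more slowly in its tail than an exponential one, which may be part of why it tends to suit retention data better.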

This MAPE value is not bad, but can I do better than mere curve fitting?

Towards Modeling Churn: shifted-Geometric

During my master’s, I had the opportunity to take Prof. Peter Fader’s course MKTG 7760, where I learned the fundamental models for customer lifetime value.

One of the models I learned was the shifted-Beta-Geometric.

Let’s think of the churn process as working like this:

At the end of each day, a user decides to churn with probability θ (theta). For a user churning on day t, and for a user surviving beyond day t, we get the equations below.

Note that this closely resembles the Geometric distribution. Specifically, it’s the shifted Geometric distribution, where “shifted” reflects that our observation begins at day (or time) 1. So if you substitute 1 for t, you get θ, the probability of a user churning on that day. Likewise, if you substitute 2, you get the probability of the user not churning on day 1 times the probability of churning on day 2, that is, (1 − θ)θ.
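In standard notation, with T denoting the day on which a user churns (a symbol I am introducing here), the two probabilities are usually written as:

```latex
\begin{align}
P(T = t \mid \theta) &= \theta\,(1-\theta)^{t-1}, \qquad t = 1, 2, 3, \ldots \\
S(t \mid \theta) &= P(T > t \mid \theta) = (1-\theta)^{t}
\end{align}
```

The first line is the probability of churning exactly on day t. The second is the probability of still being active after day t, which is the retention value the model predicts.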

I go ahead and run the shifted-Geometric model on my retention data and determine θ using Excel’s solver. For a step-by-step calculation guide, you can refer to Prof. Daniel McCarthy’s explanation, accessible here.
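If you would rather not use a spreadsheet, the estimation can be sketched in a few lines of Python. This is a minimal sketch under my own assumptions: it maximizes the log-likelihood with a simple grid search over θ instead of a solver, it works on retention shares rather than user counts, and the data is hypothetical.

```python
import math

def sg_log_likelihood(theta, retention):
    """Per-user log-likelihood; retention[i] is the share still active after day i + 1."""
    ll = 0.0
    prev = 1.0
    for t, r in enumerate(retention, start=1):
        churn_share = prev - r                     # share of users lost on day t
        p_churn = theta * (1 - theta) ** (t - 1)   # model: P(T = t)
        ll += churn_share * math.log(p_churn)
        prev = r
    # Users still active after the last observed day contribute S(t_max).
    ll += prev * math.log((1 - theta) ** len(retention))
    return ll

def fit_theta(retention, steps=9999):
    """Grid search for theta in (0, 1); a stand-in for a numerical solver."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda th: sg_log_likelihood(th, retention))

# Hypothetical average retention for Days 1-3 (NOT the article's data).
retention = [0.40, 0.28, 0.22]
theta = fit_theta(retention)
print(f"theta = {theta:.4f}")
print(f"Predicted Day 14 retention: {(1 - theta) ** 14:.4f}")
```

With a single θ, the predicted curve is forced to lose the same fraction of remaining users every day, so it cannot reproduce a steep early drop followed by a flatter tail.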

Well, it failed: I got a MAPE of 33.45%. Maybe I pushed my luck too far and should have stuck with the power curve. At this point I can either give up, or ask why it failed and try to fix it.

Well, let’s fix it.

The shifted-Geometric failed here because it assumes that all users churn with the same probability θ. But could all users really share the same churn probability, implying complete homogeneity (sameness)?

Well, there might be certain products where high homogeneity could be observed, but for this app (a hyper-casual mobile game), it simply cannot be true.

Modeling Churn: shifted-Beta-Geometric

To account for heterogeneity among the user base, we let θ vary across users according to a Beta distribution, layered on top of our base model, the shifted-Geometric. This yields a mixture model known as the shifted-Beta-Geometric. Note that in this equation, an integral over θ is used to average across the continuum of churn probabilities in the user base.

I go ahead and run the shifted-Beta-Geometric model on my retention data and estimate alpha and beta using Excel’s Solver. The calculation walkthrough by Prof. Daniel McCarthy can be accessed here.
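In the notation commonly used for this model, where B(·,·) is the Beta function and T (a symbol I am introducing here) is the day on which a user churns, the mixture can be written as:

```latex
\begin{align}
P(T = t \mid \alpha, \beta)
  &= \int_0^1 \theta\,(1-\theta)^{t-1}\,
     \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha, \beta)}\, d\theta
   = \frac{B(\alpha+1,\ \beta+t-1)}{B(\alpha, \beta)} \\
S(t \mid \alpha, \beta)
  &= P(T > t \mid \alpha, \beta)
   = \frac{B(\alpha,\ \beta+t)}{B(\alpha, \beta)}
\end{align}
```

The integrand is the shifted-Geometric churn probability for one value of θ, weighted by how common that θ is in the user base.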
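Once alpha and beta are known, the retention curve can be computed without evaluating any Beta functions, using the well-known forward recursion for this model: P(T=1) = α/(α+β) and P(T=t) = P(T=t−1) · (β+t−2)/(α+β+t−1). Here is a small sketch, where the parameter values are placeholders and not my fitted ones.

```python
def sbg_curve(alpha, beta, horizon):
    """Return (churn_probs, survival) for days 1..horizon under the sBG model."""
    churn, surv = [], []
    p = alpha / (alpha + beta)   # P(T = 1)
    s = 1.0 - p                  # S(1)
    churn.append(p)
    surv.append(s)
    for t in range(2, horizon + 1):
        p *= (beta + t - 2) / (alpha + beta + t - 1)   # P(T = t) from P(T = t - 1)
        s -= p                                         # S(t) = S(t - 1) - P(T = t)
        churn.append(p)
        surv.append(s)
    return churn, surv

# Placeholder parameters for illustration only.
alpha, beta = 0.8, 1.2
_, survival = sbg_curve(alpha, beta, 14)
print(f"Predicted Day 7 retention:  {survival[6]:.4f}")
print(f"Predicted Day 14 retention: {survival[13]:.4f}")
```

Unlike the shifted-Geometric, the daily churn rate implied by this curve falls over time, because users with a high θ leave early and the remaining users are increasingly the loyal ones.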

The MAPE is an improvement over the power curve’s, yet it’s essential to note that the Day 7 retention remains unused in calibration, and the Day 7 error contributes significantly to the MAPE.

If I treat Days 4, 5, and 6 as one block and calculate the churn for that block of time from the Day 7 retention, I can actually get more data for calibrating my shifted-Beta-Geometric model.

I go ahead and run the model with these days included. And voilà, my MAPE drops to 3.27%, giving me a much better model for predicting retention.
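One way to sketch this idea in code is to treat every gap between observed days as a single block whose churn probability is S(previous day) − S(current day). I am assuming the block runs from the Day 3 observation to the Day 7 observation. The grid search stands in for a solver, the survival function is the standard sBG form S(t) = B(α, β+t)/B(α, β), and the data is hypothetical.

```python
import math

def sbg_survival(alpha, beta, t):
    """S(t) = B(alpha, beta + t) / B(alpha, beta), computed via log-gamma."""
    return math.exp(math.lgamma(beta + t) - math.lgamma(alpha + beta + t)
                    - math.lgamma(beta) + math.lgamma(alpha + beta))

def block_log_likelihood(alpha, beta, observed):
    """observed maps day -> retention; gaps between days are treated as one block."""
    days = sorted(observed)
    ll, prev_day, prev_ret = 0.0, 0, 1.0
    for d in days:
        lost = prev_ret - observed[d]   # observed share lost in this block
        model_lost = sbg_survival(alpha, beta, prev_day) - sbg_survival(alpha, beta, d)
        ll += lost * math.log(model_lost)
        prev_day, prev_ret = d, observed[d]
    # Users still active after the last observed day.
    ll += prev_ret * math.log(sbg_survival(alpha, beta, days[-1]))
    return ll

def fit_sbg(observed):
    """Coarse grid search over alpha, beta in 0.05..5.00; a stand-in for a solver."""
    grid = [i / 20 for i in range(1, 101)]
    return max(((a, b) for a in grid for b in grid),
               key=lambda ab: block_log_likelihood(ab[0], ab[1], observed))

# Hypothetical retention with Days 4-6 missing (NOT the article's data).
observed = {1: 0.40, 2: 0.28, 3: 0.22, 7: 0.13}
alpha, beta = fit_sbg(observed)
print(f"alpha={alpha:.2f}, beta={beta:.2f}")
print(f"Predicted Day 14 retention: {sbg_survival(alpha, beta, 14):.4f}")
```

The grid is deliberately coarse to keep the example short. A proper optimizer would refine the estimates, but the block treatment of the missing days stays the same.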

Written by Atakan Birbir