# A Way to Calculate and Compare Customer Concentration

Last year, I came up with a way to measure and compare customer concentration, which I didn’t think was particularly special at the time. That is, until I came across the Gini coefficient for income inequality, which uses similar principles to measure how concentrated income or wealth is within a given country. Since the Gini coefficient has its own Wikipedia page, and I haven’t seen my method out there for analyzing businesses, I figured it was worth sharing in case others find it helpful.

The pretty simple calculation produces a number between 0 (Perfectly Concentrated) and 1 (Evenly Distributed). The example illustration below shows 5 hypothetical customer bases, ranging from heavily concentrated (left) to evenly distributed (right), as well as the resulting concentration factors calculated for each.

In the top row, each of the four blue columns represents a customer, with its height proportional to revenue. The bottom row shows the cumulative revenue, summed from smallest to largest. The green area represents the difference between the company’s customer distribution and an idealized perfectly-even customer distribution.

I stumbled on this method while working on a due diligence project last year. We were considering an investment in a firm with a highly concentrated customer base: a few big accounts made up most of the revenue. But it turned out the market it sold into was also highly concentrated. Of all potential customers, a few big firms made up most of the market. Fortunately, there was a public list of all of the firm’s potential customers, including each potential customer’s size.

The diligence question I wanted to answer was: How does the customer concentration of Company X compare to that of the market?

I had all of the raw data, but no algorithm or real way of comparing it. How can you compare, or even measure, customer concentration? One method is to determine the portion of revenue made up by the top 5, 10, or 25 customers. But is there a better way? Or at least a different lens to use? I stumbled on one while manipulating the data to find some answers.

Consider a company with $400 of monthly recurring revenue (MRR) that has 4 customers:

- Customer A: $40/mo
- Customer B: $50/mo
- Customer C: $60/mo
- Customer D: $250/mo

**Total MRR: $400/mo**

From this picture, it’s pretty easy to see this is a concentrated customer base. Customer D is $250/$400 = 62.5% of the revenue. Now compare this with a perfectly even customer base, where each customer is $100/month.
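For reference, the "top N customers" share mentioned earlier can be sketched in a few lines of Python. The function name `top_n_share` is just something I made up for this illustration, and it assumes revenues are non-negative numbers.

```python
def top_n_share(revenues, n):
    """Fraction of total revenue contributed by the n largest customers."""
    total = sum(revenues)
    if total <= 0:
        raise ValueError("total revenue must be positive")
    largest = sorted(revenues, reverse=True)[:n]
    return sum(largest) / total


concentrated = [40, 50, 60, 250]   # the sample company above
even = [100, 100, 100, 100]        # the perfectly even comparison

print(top_n_share(concentrated, 1))  # 0.625
print(top_n_share(even, 1))          # 0.25
```

This works fine as a quick check, but the result depends on which N you pick, which is part of why a single whole-distribution number is a useful second lens.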

It’s pretty easy to see side-by-side which of these companies is more or less concentrated. But when customer bases grow into the hundreds or even thousands, it becomes a lot harder to just “eyeball it.” And that’s why it’s nice to use some simple analytics to derive a comparable concentration metric. So let’s give it a shot.

Back to the original company, line up the customers in order from smallest to largest, and then add them up cumulatively like this.
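If you'd rather do this step in code than in a spreadsheet, here is a minimal Python sketch of the sort-then-accumulate step. The helper name `cumulative_revenue` is hypothetical; the numbers are the sample company's.

```python
from itertools import accumulate


def cumulative_revenue(revenues):
    """Sort customers from smallest to largest, then take a running total."""
    return list(accumulate(sorted(revenues)))


# Input order doesn't matter because the function sorts first.
print(cumulative_revenue([250, 40, 60, 50]))  # [40, 90, 150, 400]
```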

Notice how it grows slowly with the small customers and then finally sprouts up with the last, big customer. Compare this with the perfectly even company:

If you overlay these, you’ll notice something interesting:

They both climb to $400/mo, the total MRR for the company, but the more concentrated customer base grows more slowly, only shooting up towards the end once the larger customers are included.
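To put numbers on the overlay, this small Python sketch lines up the two cumulative curves and computes the gap at each customer. The variable names are my own, chosen for illustration.

```python
from itertools import accumulate

company = [40, 50, 60, 250]
even = [100, 100, 100, 100]

company_curve = list(accumulate(sorted(company)))  # [40, 90, 150, 400]
even_curve = list(accumulate(sorted(even)))        # [100, 200, 300, 400]

# How far the concentrated company lags the even one at each step.
gaps = [e - c for e, c in zip(even_curve, company_curve)]
print(gaps)  # [60, 110, 150, 0]
```

The gap is always zero at the last customer, since both curves end at total MRR.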

Now let’s assume instead of 4 customers, we had a really big number of customers. Imagine there were so many customers that the columns were too tiny to even see anymore. You’d have continuous curves like this:

If the blue area fills the entire bottom-right triangle (no green shows through), it means the customer base is perfectly even. If the blue area bows out toward the bottom-right corner, it means that there are a lot of small customers on the left and a few really big ones on the right, so the curve grows really slowly and then shoots up at the end. Consider this example comparing two customer bases (Orange, Blue) against the perfectly even distribution (Green):

In the example above, the company in blue is more concentrated than the company in orange. This leads to a pretty simple idea: we can compare customer concentration by measuring how big the area under the company’s curve is relative to the area under the curve of a hypothetical, perfectly even customer base. Like this:
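In symbols, taking Area A to be the blue area under the company's curve and Area B to be the green gap between that curve and the even-distribution line (labels inferred from how the areas are described in the text), the ratio is:

$$
\text{Customer Concentration Factor} = \frac{\text{Area A}}{\text{Area A} + \text{Area B}}
$$

Area A + Area B is simply the full area under the even-distribution line, so the ratio can never exceed 1.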

If the green Area B = 0, it means the customer base matches the even distribution, so the Customer Concentration Factor = Area A / (Area A + 0) = 1.

As the customer base gets more concentrated, the green area grows and the blue area shrinks, resulting in a number that approaches 0 as it gets more concentrated.

Now how do we calculate these areas? Well, our data is discrete (broken down into individual customer columns), not continuous like the curves above. We can just assume each column has a width of 1 unit and calculate the areas, much like when calculating a Riemann sum in calculus.

We just need to make one adjustment, though, because we’re dealing with discrete data, so as not to double-count the last column. The last column always equals the total revenue and does not add any information to our analysis regarding concentration. Therefore we need to calculate this for N-1 customers, as follows (ignoring the gray area, and focusing on the area outlined in orange below).
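Here is a rough Python sketch of the column-area calculation, treating every column as 1 unit wide. The `drop_last` flag is a hypothetical switch I added purely to show what the N-1 adjustment changes; it isn't part of the method itself.

```python
from itertools import accumulate


def column_areas(revenues, drop_last=True):
    """Return (company_area, even_area) with each column 1 unit wide."""
    n = len(revenues)
    total = sum(revenues)
    company_curve = list(accumulate(sorted(revenues)))
    even_curve = [total * k / n for k in range(1, n + 1)]
    if drop_last:
        # The last column equals total revenue in both curves, so skip it.
        company_curve = company_curve[:-1]
        even_curve = even_curve[:-1]
    return sum(company_curve), sum(even_curve)


mrr = [40, 50, 60, 250]
print(column_areas(mrr))                   # (280, 600.0)
print(column_areas(mrr, drop_last=False))  # (680, 1000.0)
```

Keeping the last column adds the same $400 to both areas, which pulls the ratio toward 1 (680/1000 = 0.68 instead of 280/600 ≈ 0.467) without telling us anything about concentration.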

Here’s the formula for calculating the coefficient:
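One way to write this out, using notation introduced just for this sketch: let $r_1 \le r_2 \le \dots \le r_N$ be the customer revenues sorted from smallest to largest, $C_k$ the cumulative revenue of the $k$ smallest customers, and $T$ the total revenue. Then, summing over the first $N-1$ columns:

$$
\text{Concentration Factor} = \frac{\sum_{k=1}^{N-1} C_k}{\sum_{k=1}^{N-1} k \cdot \frac{T}{N}},
\qquad C_k = \sum_{i=1}^{k} r_i,
\qquad T = \sum_{i=1}^{N} r_i
$$

For the sample company this is $(40 + 90 + 150) / (100 + 200 + 300) = 280 / 600 \approx 0.467$.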

Note there’s a quick shortcut for calculating the denominator (Area A + Area B): the even-distribution columns rise in equal steps, so their sum is a triangular number scaled by the average revenue per customer.
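To spell the shortcut out, assume $N$ customers and total revenue $T$ (symbols introduced here for illustration). In the even case each customer contributes $T/N$, so the cumulative columns are $T/N, 2T/N, 3T/N, \dots$, and summing the first $N-1$ of them gives:

$$
\sum_{k=1}^{N-1} k \cdot \frac{T}{N} = \frac{T}{N} \cdot \frac{(N-1)\,N}{2} = \frac{T\,(N-1)}{2}
$$

For the sample data that is $400 \times 3 / 2 = 600$, which matches $100 + 200 + 300$.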

And here’s what it looks like in Excel for the above sample data:

So in the example above, the concentration factor is 0.467. Here are some more examples showing what it looks like for heavily concentrated and evenly distributed customers.
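For anyone working outside Excel, the whole calculation can be sketched in Python along these lines. The function name `concentration_factor` and the error handling are my own assumptions, and the third data set is a made-up extreme case rather than one of the examples pictured.

```python
from itertools import accumulate


def concentration_factor(revenues):
    """Customer concentration factor: near 0 = concentrated, 1 = even.

    Assumes non-negative revenues, one number per customer.
    """
    n = len(revenues)
    total = sum(revenues)
    if n < 2:
        raise ValueError("need at least two customers")
    if total <= 0:
        raise ValueError("total revenue must be positive")
    cumulative = list(accumulate(sorted(revenues)))
    # Area under the company's curve, first N-1 columns only.
    area_company = sum(cumulative[:-1])
    # Area under the even curve via the triangular-number shortcut.
    area_even = total * (n - 1) / 2
    return area_company / area_even


print(round(concentration_factor([40, 50, 60, 250]), 3))  # 0.467
print(concentration_factor([100, 100, 100, 100]))         # 1.0
print(round(concentration_factor([1, 1, 1, 397]), 3))     # 0.01
```

Because the function sorts its input, you can pass revenue by customer in whatever order your table happens to be in.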

So that’s it! If you have a table showing revenue by customer, you can use this pretty basic formula to calculate a customer concentration coefficient. You can then use this as a standalone measure of concentration, or to compare concentrations among companies. I don’t make any particular suggestions about what’s a good or bad coefficient, though I’d be interested in performing some benchmarking analysis to figure out where different types of companies fall.

The attached file below includes the visual demos shown above, as well as a second sheet for calculating the concentration factor for an arbitrarily large data set. This is meant merely as an aid, and no warranties are made — so use it at your own risk.

Concentration Example _ Published

**CAUTION**

As is the case with any model or calculation that reduces a complex set of data to a single number, beware that you lose a lot of information as you simplify (as we saw with the so-called “Formula that Killed Wall Street,” the Gaussian copula). It’s always important to visualize the data and apply a number of analytical tools before coming to a conclusion.