Have you ever been told “*the data is statistically significant, so you can rely on it,*” when you questioned market research? As the saying popularised by Mark Twain goes, there are “*lies, damned lies and statistics,*” meaning that arguments backed up with statistics shouldn’t be believed blindly.

The Net Promoter System arose out of market research, of which statistics has always been an essential part. For businesses, however, a heavy focus on statistics can at times lead into problematic territory.

Businesses, short on time and resources, can misunderstand statistical results and at times get them wrong. Interpreting statistics can be mathematically challenging, and even for highly numerate people, understanding correlations can take time.

For this reason businesses often rely on external resources such as market research companies and application providers to carry out the statistical component of research and present them with findings. The result is that business people may interpret data while being unaware of the statistical errors (or ignoring them), and this can lead to business decisions based less on data and more on “leaps of faith.”

*In the following we will explain three basic and common statistical endeavors and their pitfalls:*

**Statistical significance**


When surveying customers it is essentially impossible to get a response from one hundred percent of them, which means your Net Promoter Score is not a precise number: it is simply the score for a sample of your population.

However, when companies mention their NPS they hardly ever tell you the margin of error, or even whether their result is statistically significant.

Take for example a company that proudly states it has an NPS of +50%. This is indeed a good number, but if the margin of error is 25%, it is likely that if they repeat the survey their NPS will lie anywhere between +25% and +75%.

The term ‘likely’ here means that you can be 95% sure of this; 95% is known as the confidence level. So if you repeated the same survey 100 times, your NPS would fall outside this interval only about 5 times.

The conventional meaning of statistical significance is that your error margin is at most 5% or in other words that you can be rather certain that your true NPS is close to the measured NPS.

Some companies use confidence levels lower than 95%, which narrows the margin of error; but the lower the confidence level, the less likely repeat surveys are to fall within that margin.
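To make the relationship between sample size, confidence level and margin of error concrete, here is a minimal sketch. It treats each response as a score of +1 (promoter), 0 (passive) or −1 (detractor) and uses the standard normal approximation; the example counts are made up for illustration.

```python
import math

def nps_margin_of_error(promoters, passives, detractors, z=1.96):
    """Approximate margin of error for an NPS estimate.

    Each response scores +1 (promoter), 0 (passive) or -1 (detractor);
    NPS is the mean of those scores. The margin of error uses the
    normal approximation at the given z value (1.96 ~ 95% confidence).
    """
    n = promoters + passives + detractors
    p_p, p_d = promoters / n, detractors / n
    nps = p_p - p_d
    variance = p_p + p_d - nps ** 2  # variance of the +1/0/-1 score
    moe = z * math.sqrt(variance / n)
    return nps, moe

# Hypothetical survey: 100 responses in total
nps, moe = nps_margin_of_error(promoters=60, passives=25, detractors=15)
print(f"NPS = {nps * 100:+.0f} ± {moe * 100:.0f} points")
```

With 100 responses, even a clearly positive score carries a margin of error of roughly ±15 points at 95% confidence, which is exactly why quoting an NPS without it can mislead.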

**Issues with statistical significance**

Does statistical significance mean that your sample result represents the entire population? In The Ultimate Question 2.0, Fred Reichheld and Rob Markey note that non-responders often don’t look like responders. In fact, a common pattern is that detractors are so dissatisfied that they ignore the survey in larger numbers than passives or promoters do.

Statistical significance also quickly loses its meaning when researchers use it as a justification to analyze smaller segments within a company. For instance, managers presented with scores for their department or team face considerably larger margins of error, because the sample size from which the score is calculated shrinks.
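A quick sketch of how fast that margin grows as segments shrink. The score variance is an assumed, fixed value here purely so the 1/√n scaling is visible; real surveys would estimate it from the responses.

```python
import math

# Rough illustration: with the score variance held fixed, the margin
# of error grows as the sample shrinks (it scales with 1/sqrt(n)).
variance = 0.55  # assumed variance of the +1/0/-1 NPS score
moes = {}
for n in (1000, 200, 50, 20):
    moes[n] = 1.96 * math.sqrt(variance / n)
    print(f"n = {n:4d}: margin of error ≈ ±{moes[n] * 100:.0f} NPS points")
```

A company-wide survey of 1,000 responses may be precise to a few points, while a 20-person team score from the same survey can swing by over 30 points between runs without anything actually changing.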

So is statistical significance worthless? No: you should aim for it to know your company’s NPS with better accuracy, and to determine whether changes to your NPS are caused by sample noise or are genuinely significant (read more about this subject in this article). Just don’t use it as a blank cheque for any kind of conclusion.

**Correlation analysis**

To understand the drivers of a Net Promoter Score many companies do correlation analysis. Besides asking the ultimate question (“How likely are you to recommend …”) many also ask respondents to rate a number of perceived drivers, e.g. “Please rate your satisfaction with the *delivery* of your purchased item ….”

If the correlation factor is high, the perception is often that the driver has a high impact on the NPS, i.e. that the driver is important. By correlating each driver with the Net Promoter Score, a company can then decide which drivers have the highest impact and prioritize what to improve.
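A minimal sketch of such a driver analysis. The data is entirely synthetic (we generate ratings where recommendation depends strongly on delivery, weakly on support, and not at all on price), and the driver names are illustrative only.

```python
import numpy as np

# Hypothetical survey data: each respondent gives a 0-10
# likelihood-to-recommend score plus ratings for three drivers.
rng = np.random.default_rng(0)
n = 200
delivery = rng.integers(0, 11, n).astype(float)
support = rng.integers(0, 11, n).astype(float)
price = rng.integers(0, 11, n).astype(float)

# By construction, recommendation tracks delivery strongly and
# support weakly; price is pure noise.
recommend = np.clip(0.7 * delivery + 0.2 * support
                    + rng.normal(0, 1.5, n), 0, 10)

correlations = {}
for name, driver in [("delivery", delivery),
                     ("support", support),
                     ("price", price)]:
    correlations[name] = np.corrcoef(recommend, driver)[0, 1]
    print(f"{name:8s} r = {correlations[name]:+.2f}")
```

Ranking the Pearson coefficients recovers the intended ordering here because the synthetic relationship really is linear; as the rest of this section argues, real survey data offers no such guarantee.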

There are however a few common misconceptions regarding correlation analysis:

**Correlation and statistical significance**

Many people think that if their data sample is statistically significant, any correlation factor computed on a subset of it is statistically significant too. It isn’t: a subset is a smaller sample with its own, larger margin of error. While statistical significance may not be needed to draw conclusions in a business context, people should be aware that correlation factors should be used with caution.

**Correlation implies causation**

Causation is where a first event is understood to be responsible for a second event. And while at times correlation can be the result of direct cause and effect, a correlation between two variables does not automatically imply that one causes the other.

Just because a driver correlates highly with NPS, it doesn’t mean that the driver has a high impact on NPS. A correlation may be indirect, unknown or even coincidental.

We sometimes see that the interpretation of drivers varies between respondents, which makes driver analysis based purely on statistics uncertain. Of course you can sharpen your questionnaire, but this often leads to long and tedious surveys, a path you don’t want to take: research shows that the quality of answers declines when a questionnaire becomes too long.

Rather, keep it short and follow up with in-depth root-cause analysis, e.g. by sampling responses and interviewing respondents. You will most likely be surprised once you start discussing your drivers with your customers.

**Correlation and linearity**

The most commonly applied correlation analysis assumes linearity, i.e. that there is a linear correlation between NPS and a driver. The below image shows the scatterplots of Francis Anscombe’s quartet: four different sets of variables with the same linear correlation factor.

It is clear that a linear approximation is not the best fit for some of these data sets. Understanding whether a linear correlation can be applied often requires insight into the data set and a good deal of statistical expertise.
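Anscombe's point is easy to verify numerically: the four data sets below are his published quartet, and all four produce (essentially) the same Pearson correlation despite looking nothing alike when plotted.

```python
import numpy as np

# Anscombe's quartet: four data sets with nearly identical linear
# correlation factors but very different shapes.
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
datasets = {
    "I":   (x, np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                         7.24, 4.26, 10.84, 4.82, 5.68])),
    "II":  (x, np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                         6.13, 3.10, 9.13, 7.26, 4.74])),
    "III": (x, np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                         6.08, 5.39, 8.15, 6.42, 5.73])),
    "IV":  (np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float),
            np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
                      5.25, 12.50, 5.56, 7.91, 6.89])),
}

rs = {}
for name, (xs, ys) in datasets.items():
    rs[name] = np.corrcoef(xs, ys)[0, 1]
    print(f"set {name:3s}: r = {rs[name]:.3f}")
```

Every set comes out at r ≈ 0.816, which is exactly why a correlation factor alone, without looking at the scatterplot, can badly mislead a driver analysis.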

We have seen several examples of data sets where linearity cannot be assumed, yet the correlation still tells a lot. The example below shows, for each customer, the relationship between a relational NPS and a transactional NPS (in this case for Customer Support). While there is a positive linear correlation (as indicated by the green line), the interesting part is that it is almost impossible for a customer who is dissatisfied with support (transactional NPS, y-axis) to also be a promoter (relational NPS, x-axis).

In other words, linearity doesn’t explain the correlation, something that is supported by a recent HBR article: it is not enough to excel at individual touch points. You need to deliver a consistent experience across touch points, which of course includes delivering good experiences at every touch point.

Correlation analysis and correlation factors are of interest but what we often see is that closing the loop is the best way to learn more about a respondent’s score and driver choices.

**Regression analysis**

Some companies use regression analysis to establish a formula for how drivers contribute to the NPS. The most common method is Ordinary Least Squares Linear Regression (OLS), which tries to establish the dependent variable (NPS) as a linear function of independent variables (the drivers), e.g.

**NPS = 0.34 x Buying exp. + 0.27 x Delivery + 0.29 x Use + 0.1 x Support**

The purpose is to predict how your NPS will change if you improve some of the drivers. While regression analysis is a powerful tool, it must be used with care. There are of course other regression techniques, but many require considerable processing power and strong statistical competences.
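A minimal sketch of how such driver weights can be fitted with ordinary least squares. The survey data is synthetic and generated from known weights (matching the illustrative formula above) so that the recovered coefficients can be checked; real driver names and scales would differ.

```python
import numpy as np

# Synthetic survey: each respondent rates four drivers 0-10, and the
# recommendation score is a weighted sum of those ratings plus noise.
rng = np.random.default_rng(1)
n = 300
X = rng.integers(0, 11, size=(n, 4)).astype(float)
true_w = np.array([0.34, 0.27, 0.29, 0.10])  # buying, delivery, use, support
y = X @ true_w + rng.normal(0, 0.8, n)

# Ordinary least squares: minimises ||X w - y||^2
w, *_ = np.linalg.lstsq(X, y, rcond=None)
for name, coef in zip(["buying exp.", "delivery", "use", "support"], w):
    print(f"{name:12s} weight ≈ {coef:.2f}")
```

With 300 clean, genuinely linear responses the weights are recovered well; the caveats below are about what happens when those assumptions fail.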

To begin with, regression analysis runs into the same linearity problem mentioned above: it tries to establish a linear approximation of a relationship that may not be linear at all.

Ordinary Least Squares Linear Regression is also vulnerable to outliers: when certain points have excessively large or small values for the dependent variable compared to the rest of the data, they have a disproportionately large and erroneous effect on the resulting solution. For a more in-depth understanding of the pitfalls and problems of Ordinary Least Squares Linear Regression, read the following.
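The outlier problem is easy to demonstrate on made-up data: below, a single corrupted response drags the fitted slope far away from the true value of 2.

```python
import numpy as np

# Ten points lying exactly on y = 2x + 1, then the same data with
# one wildly large response added to the last point.
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
A = np.vstack([x, np.ones_like(x)]).T  # columns: slope, intercept

slope_clean, _ = np.linalg.lstsq(A, y, rcond=None)[0]

y_outlier = y.copy()
y_outlier[-1] += 100.0  # a single extreme outlier
slope_outlier, _ = np.linalg.lstsq(A, y_outlier, rcond=None)[0]

print(f"slope without outlier: {slope_clean:.2f}")
print(f"slope with outlier:    {slope_outlier:.2f}")
```

One bad point out of ten is enough to more than triple the fitted slope here, because least squares penalises the squared distance to every point and so bends heavily toward extremes.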

Regression analysis should be applied with great caution in the NPS industry: simply mirroring research methodologies in their application to NPS, or relying on software that automatically carries out the analysis, may not give you adequate insight into how your drivers affect your NPS.

**In summary**

Being statistically correct and accurate is important for NPS and it should be used to ensure that your NPS improves. However, we believe the most important thing is to keep things simple.

Keep your surveys short and ask in a simple manner how your drivers are contributing to your NPS, rather than carrying out extensive statistical analysis. Following up with customers about their feedback will give you far more accurate insights than attempting to look for answers in your data alone.

To understand the benefits of this, please read our next article where we will explain the benefits of short surveys and why short doesn’t necessarily mean less effective.

You can also click below to check out some benchmark statistics and best practices for NPS.