Master Power Law To Avoid Disasters

math

clt

Published

February 21, 2024

Modified

February 21, 2024

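import scipy.stats  # import the submodule explicitly; a bare "import scipy" does not load it in older SciPy versions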
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

Introduction

Remember the Long-Term Capital Management story? In essence, the fund relied on the Gaussian distribution to forecast market volatility. Under a Gaussian distribution, the probability of an event 10 standard deviations away from the mean is on the order of 10⁻²³. Under a power law with an exponent of 2, however, this probability rises to roughly 0.5%!

Real life proved them wrong: the fund collapsed in 1998.

This situation serves as a strong motivator to delve into more complex distributions than the Gaussian.
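The gap between the two tail probabilities quoted above can be checked numerically. The sketch below makes an assumption the text does not state: it models the "power law with an exponent of 2" as a Student's t distribution with 2 degrees of freedom, whose tail decays like 1/(2k²). That choice gives a one-sided tail close to 0.5% at a distance of 10 scale units; other parametrizations of a power law would give different figures.

```python
from scipy import stats

k = 10  # distance from the center, in scale units

# One-sided Gaussian tail probability P(Z > 10)
gauss_tail = stats.norm.sf(k)

# Assumption: Student's t with 2 degrees of freedom stands in for the
# power law; its survival function behaves like 1 / (2 * k**2) for large k
power_law_tail = stats.t.sf(k, df=2)

print(f"Gaussian tail:  {gauss_tail:.2e}")
print(f"Power law tail: {power_law_tail:.2%}")
print(f"Ratio:          {power_law_tail / gauss_tail:.1e}")
```

Whatever the exact parametrization, the two models disagree by roughly twenty orders of magnitude about how likely such an event is.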

I have been in the data analysis field for 15 years.

During this time, I’ve helped companies of all sizes navigate risks, developed predictive models, and run many A/B tests.

But do you want to know a secret?

I actually use the same 3 techniques every time:

Technique #1: Identifying Power Law Dynamics

Plot your data on a log-log scale. A straight line indicates a power law distribution.
Remember that the far right tail contains few points, which can make it tricky to draw definitive conclusions.
Use the slope of the line on your log-log plot to estimate the power law’s exponent.

Follow these steps, and you’ll be able to identify a power law distribution more reliably.
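The slope step can be sketched as follows. This is a minimal illustration, not a definitive estimator: the helper `estimate_tail_exponent` and its `drop_fraction` parameter are hypothetical names, and the sketch fits a line to the empirical survival function P(X > x) on a log-log scale, which is a different (though related) plot from the rank-versus-value plots used in the examples in this post. The sparse far-right tail is dropped before fitting, for the reason given in the second step.

```python
import numpy as np

def estimate_tail_exponent(samples, drop_fraction=0.1):
    """Estimate alpha in P(X > x) ~ x**(-alpha) from a log-log line fit.

    Hypothetical helper: drop_fraction is the share of the largest points
    excluded from the fit because the far tail is too sparse to trust.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    # Empirical survival function at each sorted point (never exactly 0)
    survival = 1.0 - np.arange(1, n + 1) / (n + 1)
    keep = int(n * (1.0 - drop_fraction))
    # Slope of the straight line on the log-log plot
    slope, _intercept = np.polyfit(np.log(x[:keep]), np.log(survival[:keep]), 1)
    return -slope

rng = np.random.default_rng(0)
alpha_true = 2.0
# Classical Pareto samples with minimum value 1: P(X > x) = x**(-alpha_true)
samples = rng.pareto(alpha_true, size=5000) + 1.0

alpha_hat = estimate_tail_exponent(samples)
print(f"true exponent: {alpha_true}, estimated: {alpha_hat:.2f}")
```

A least-squares fit on a log-log plot is easy to read but can be biased; maximum-likelihood estimators are a common alternative when the exponent itself matters.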

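Here is an example with an exponential distribution (the log-log plot on the right is not a straight line):

num_samples = 1000
# Note: the first positional argument of expon.rvs is loc, so these samples are 1.5 + Exp(1)
dat = np.sort(scipy.stats.expon.rvs(loc=1.5, size=num_samples))
# Ranks start at 1 so that log(rank) is finite
x = np.arange(1, len(dat) + 1)
fig, ax = plt.subplots(1, 2, figsize=(15, 7))
# histplot with kde=True replaces the deprecated distplot
g = sns.histplot(dat, kde=True, stat="density", ax=ax[0])
g.set_title("Original")
# Log-log plot of sorted values against their rank
g = sns.scatterplot(x=np.log(x), y=np.log(dat), ax=ax[1])
g.set_title("Log-log")
g.set(ylim=(min(np.log(dat)), None))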

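With a power law distribution, the log-log plot is a straight line:

# scipy's powerlaw has density a * x**(a - 1) on [0, 1]; here a = 0.1
dat = np.sort(scipy.stats.powerlaw.rvs(0.1, size=num_samples))
# Ranks start at 1 so that log(rank) is finite
x = np.arange(1, len(dat) + 1)
fig, ax = plt.subplots(1, 2, figsize=(15, 7))
# histplot with kde=True replaces the deprecated distplot
g = sns.histplot(dat, kde=True, stat="density", ax=ax[0])
g.set_title("Original")
# Log-log plot of sorted values against their rank
g = sns.scatterplot(x=np.log(x), y=np.log(dat), ax=ax[1])
g.set_title("Log-log")
g.set(ylim=(min(np.log(dat)), None))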

Technique #2: Acknowledging Real-World Limits

When you identify a power law but still need to run an A/B test, there are a few tricks available. One of them is to account for real-world constraints, such as market collapses or physical boundaries. In such cases, a truncated power law distribution may be more accurate, and with a proper bound the distribution of its sample mean can even converge to a normal distribution.

By considering these factors and incorporating them into your analysis, you’ll achieve more realistic assessments.

Here’s an example. Suppose we’re examining annual incomes. For decision-making purposes, we’re primarily interested in the majority and not particularly focused on individuals with exceptionally high incomes. Therefore, we set a cap:

N = 1000            # observations in each sample
num_samples = 2000  # number of samples, i.e. number of sample means
cap = 200           # upper bound applied to every observation
# Pareto with shape 0.8: the theoretical mean is infinite
dat = scipy.stats.pareto.rvs(.8, size=(N, num_samples))
means_uncapped = dat.mean(axis=0)
# Cap the values without modifying the original array
capped = np.minimum(dat, cap)
fig, ax = plt.subplots(1, 2, figsize=(12, 5))
g = sns.histplot(means_uncapped, ax=ax[0], bins=20)
g.set_title("Mean distribution without cap")
g = sns.histplot(capped.mean(axis=0), ax=ax[1], bins=20)
g.set_title("With cap")
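The effect of the cap can also be put into a number rather than read off the histograms. The sketch below repeats the same setup with a fixed seed and compares the sample skewness of the two sets of means; using skewness as the yardstick is an assumption made here for illustration, since a normal distribution has skewness 0.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N, num_samples, cap = 1000, 2000, 200

# Same Pareto setup as above, with a fixed seed for reproducibility
raw = stats.pareto.rvs(0.8, size=(N, num_samples), random_state=rng)
capped = np.minimum(raw, cap)

# Skewness of the sample means: close to 0 suggests an approximately
# symmetric, bell-shaped distribution
skew_raw = stats.skew(raw.mean(axis=0))
skew_capped = stats.skew(capped.mean(axis=0))

print(f"skewness of means without cap: {skew_raw:.2f}")
print(f"skewness of means with cap:    {skew_capped:.2f}")
```

Without the cap, a handful of enormous observations dominate the means and the skewness stays large; with the cap, the variance is finite and the means behave much more like a normal sample.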


Technique #3: Limitations of Historical Data

Realize that learning from history can be challenging if the data follows a power law.

Remember that more data often means more extreme values.

Always consider the possibility of more extreme events beyond what your current data shows.
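A small simulation illustrates why history is a weak guide here. The sketch below is an illustration under assumed parameters (a Pareto tail exponent of 1.5 and arbitrary sample sizes): it tracks the largest value seen so far as more data arrives, for a power law sample and for a Gaussian sample.

```python
import numpy as np

rng = np.random.default_rng(42)
alpha = 1.5  # assumed tail exponent, chosen for illustration
sizes = [100, 10_000, 1_000_000]

# One long "history" per distribution; prefixes mimic having less data
pareto_history = rng.pareto(alpha, size=max(sizes)) + 1.0
normal_history = np.abs(rng.normal(size=max(sizes)))

pareto_max = {n: pareto_history[:n].max() for n in sizes}
normal_max = {n: normal_history[:n].max() for n in sizes}

for n in sizes:
    print(f"n={n:>9,}  Pareto max: {pareto_max[n]:>12.1f}  "
          f"Gaussian max: {normal_max[n]:.2f}")
```

The Gaussian maximum creeps up slowly, while the power law maximum keeps jumping by orders of magnitude, so the worst event in the record so far says little about the worst event still to come.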

A power law is a type of heavy-tailed distribution, indicating that the likelihood of rare events is significantly higher compared to a Gaussian distribution. Depending on the circumstances, this detail may or may not be crucial. For instance, while analyzing A/B testing on button colors, it might be negligible. However, it becomes vital when modeling financial risks.

Happy coding!