Probability distribution function (PDF)
The tail of a PDF is linked to its kurtosis. Kurtosis measures how strongly the values are concentrated around the central value of the distribution and, consequently, how much weight lies in the extreme values, that is, far from the mean. In this exercise we compare the tail of an empirical probability distribution with that of a normal distribution.
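As a minimal sketch of the link between kurtosis and tail weight, the excess kurtosis can be estimated from the second and fourth central moments. The function name `excess_kurtosis` and the two synthetic samples are illustrative assumptions and are not part of the exercise data.

```python
import numpy as np

def excess_kurtosis(sample):
    """Return the excess kurtosis of a 1-D sample (0 for a normal law)."""
    sample = np.asarray(sample, dtype=float)
    centered = sample - sample.mean()
    m2 = np.mean(centered**2)  # second central moment (population variance)
    m4 = np.mean(centered**4)  # fourth central moment
    return m4 / m2**2 - 3.0

# Synthetic samples, for illustration only
rng = np.random.default_rng(0)
gaussian = rng.normal(size=100_000)
laplace = rng.laplace(size=100_000)  # heavier tails than the Gaussian

k_gauss = excess_kurtosis(gaussian)
k_laplace = excess_kurtosis(laplace)
print(k_gauss, k_laplace)
```

A value close to zero is expected for the Gaussian sample, while the heavier-tailed Laplace sample should give a clearly positive value.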
Question
Using the data set, calculate:
The mean value
The variance
The standard deviation
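A small sketch of these three quantities on a toy array (the values are made up for illustration). Note that `np.var` defaults to the population estimator (`ddof=0`); passing `ddof=1` gives the unbiased sample variance instead.

```python
import numpy as np

toy = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up values

mean = toy.mean()             # arithmetic mean
var_pop = toy.var()           # population variance, divides by N
var_sample = toy.var(ddof=1)  # sample variance, divides by N - 1
std_pop = np.sqrt(var_pop)    # standard deviation = square root of the variance

print(mean, var_pop, var_sample, std_pop)
```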
Question
Using the given data set, compare the empirical PDF with the normal distribution.
The empirical PDF is estimated from the ranked data using the Weibull plotting-position formula.
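A sketch of the Weibull plotting position, which assigns the cumulative probability i/(n+1) to the i-th smallest of n values. It is assumed here that this is the estimator the exercise intends; the helper name `weibull_plotting_position` and the four input values are hypothetical.

```python
import numpy as np

def weibull_plotting_position(sample):
    """Return the sorted sample and the probabilities i / (n + 1), i = 1..n."""
    ordered = np.sort(np.asarray(sample, dtype=float))
    n = ordered.size
    prob = np.arange(1, n + 1) / (n + 1)
    return ordered, prob

values, prob = weibull_plotting_position([0.3, -1.2, 0.8, -0.1])
print(values)
print(prob)
```

Because these probabilities never reach 0 or 1, their logarithm stays finite, which is convenient for a logarithmic plot.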
Python script
"""
@author: yacine.mezemate
"""
import numpy as np
import matplotlib.pyplot as plt
# Import data (whitespace-separated values)
data = np.genfromtxt("data.txt")
data = np.diff(data) # use the differences between consecutive values
# Statistics
mu = np.mean(data) # mean
Var = np.var(data) # variance
std = np.sqrt(Var) # standard deviation
# Calculate and plot the empirical PDF using the Weibull plotting position
neg = np.sort(data[data<0]) # sort negative values
pos = np.sort(data[data>0]) # sort positive values
# rank/(n+1) stays strictly between 0 and 1, so its logarithm is finite
P_neg = np.arange(1, len(neg)+1, dtype = np.double)/(len(neg)+1) # probability of negative values
P_pos = np.arange(1, len(pos)+1, dtype = np.double)/(len(pos)+1) # probability of positive values
plt.plot(neg, np.log(P_neg), marker='+', label ="Empirical Estimation") # plot PDF
# Calculate and plot Normal PDF
mn = np.min(data) # min value
mx = np.max(data) # max value
x = np.linspace(mn, 0, 100) # 100 evaluation points from the minimum to 0
# Log of the normal PDF written out with NumPy (mlab.normpdf was removed in Matplotlib 3.1)
Gauss = -0.5*((x-mu)/std)**2 - np.log(std*np.sqrt(2*np.pi)) # Normal distribution
plt.plot(x,Gauss, label="Normal distribution") # plot PDF
# Plot information
plt.legend(loc='lower left')
plt.title("Probability distribution", fontsize = 18)
plt.ylabel(r"$\log(Pr(\Delta v))$", fontsize=14)
plt.xlabel(r"$\Delta v$", fontsize=14)
plt.show()
The plot is logarithmic so as to emphasize the heavy tail of the distribution. It shows that the normal distribution does not fit the empirical one: in the tail, the empirical probabilities decay more slowly than the Gaussian curve. In complex systems, such as those studied in geophysics, extreme values therefore cannot be described adequately by a Gaussian distribution.
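To put a number on this mismatch, one can compare the observed frequency of large deviations with the corresponding Gaussian tail probability. The sketch below uses a synthetic Student-t sample as a stand-in for heavy-tailed data; the sample, the threshold of four standard deviations, and the function name `tail_probabilities` are assumptions made for illustration.

```python
import math
import numpy as np

def tail_probabilities(sample, k):
    """Compare P(|x - mean| > k*std): empirical frequency vs. Gaussian value."""
    sample = np.asarray(sample, dtype=float)
    z = np.abs(sample - sample.mean()) / sample.std()
    empirical = np.mean(z > k)              # observed fraction beyond k sigma
    gaussian = math.erfc(k / math.sqrt(2))  # two-sided normal tail probability
    return empirical, gaussian

rng = np.random.default_rng(1)
heavy = rng.standard_t(df=3, size=200_000)  # synthetic heavy-tailed sample

emp, gauss = tail_probabilities(heavy, k=4)
print(f"empirical: {emp:.2e}, Gaussian: {gauss:.2e}")
```

For a heavy-tailed sample the empirical fraction should come out well above the Gaussian value, which is the same effect the logarithmic plot makes visible.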