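Part of the "Statistics Materials for Lectures" series. This time, I made figures for explaining how a probability (density) distribution relates to probabilities.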
In [1]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st
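Discrete distribution
Given a discrete probability distribution $f_X(x_i)$ $(i=1,\cdots,n)$ (strictly speaking a probability mass function, not a density), the following probabilities can be computed directly from it.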
\[ P(X=x_i) = f_X(x_i) \]
\[ P(x_0 < X \leq x_1) = \sum_{x_0 < x_i \leq x_1} f_X(x_i) \]
In [2]:
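mu = 170      # mean
sigma = 10    # standard deviation
n = 5000      # number of samples
step = 5      # bin width
x = np.random.normal(mu, sigma, n)
bins = range(140, 210, step)
# density=True scales the heights so that sum(y) * step == 1
y, bins = np.histogram(x, bins, density=True)
# Indices of the bins that start at 160, 150, and 180
idx = np.where(bins == 160)[0][0]
idx0 = np.where(bins == 150)[0][0]
idx1 = np.where(bins == 180)[0][0]
plt.figure(figsize=(6,6))
plt.ylim(0, 0.05)
# All bins in the default color
plt.bar(bins[:-1], y, width=step*0.8)
# Bins starting at 150, ..., 175 in green: the probability of a range
plt.bar(bins[idx0:idx1], y[idx0:idx1], width=step*0.8, facecolor='limegreen')
# The single bin starting at 160 in red: the probability of one value
plt.bar(bins[idx:idx+1], y[idx:idx+1], width=step*0.8, color='red')
plt.savefig('plot_out.svg', transparent=True)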
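The two formulas of this section can also be checked numerically on the same kind of histogram. The following is a minimal sketch, not part of the original notebook: the fixed seed and the names `samples`, `edges`, `pmf`, `p_single`, and `p_range` are assumptions made for illustration. Because `density=True` returns heights whose total area is 1, each height is multiplied by the bin width to obtain a probability.

```python
import numpy as np

# Same parameters as the cell above
mu, sigma, n, step = 170, 10, 5000, 5

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility
samples = rng.normal(mu, sigma, n)
edges = np.arange(140, 210, step)
density, edges = np.histogram(samples, edges, density=True)

# Heights integrate to 1, so the probability of one bin is height * bin width
pmf = density * step

# P(X = x_i): the single bin that starts at 160 (the red bar)
i160 = np.searchsorted(edges, 160)
p_single = pmf[i160]

# Probability of a range: sum over the bins starting at 150, ..., 175 (the green bars)
i150 = np.searchsorted(edges, 150)
i180 = np.searchsorted(edges, 180)
p_range = pmf[i150:i180].sum()

print(p_single, p_range, pmf.sum())
```

With these parameters the single bin should come out at roughly 0.15 and the range at roughly 0.82, and the probabilities of all bins sum to 1. Which bins belong to the range depends on whether each bin is identified with its left or its right edge; the sketch follows the slicing used in the plot.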
Continuous distribution
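For comparison with the discrete case, the corresponding relations for a continuous random variable with density $f_X(x)$ are sketched here. They are the standard textbook definitions and are not spelled out in the original notebook: the sum becomes an integral, and the probability of any single point is zero.
\[ P(x_0 < X \leq x_1) = \int_{x_0}^{x_1} f_X(x)\,dx \]
\[ P(X = x) = 0 \]
This is presumably why the figure in this section highlights a narrow strip (from 160 to 162) in red rather than a single point: only an interval has a nonzero area under the curve.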
In [3]:
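x = np.arange(140, 210, 0.01)
y = st.norm.pdf(x, mu, sigma)  # normal density; mu and sigma come from In [2]
plt.figure(figsize=(6,6))
plt.ylim(0, 0.05)
plt.plot(x, y)
cmap = plt.get_cmap("tab10")
# Whole area under the plotted curve in the default blue
plt.fill_between(x, y, facecolor=cmap(0))
# Area between 150 and 180 in green: the probability of a range
idx0 = np.searchsorted(x, 150)
idx1 = np.searchsorted(x, 180)
plt.fill_between(x[idx0:idx1], y[idx0:idx1], facecolor='limegreen')
# Narrow strip between 160 and 162 in red
idx0 = np.searchsorted(x, 160)
idx1 = np.searchsorted(x, 162)
plt.fill_between(x[idx0:idx1], y[idx0:idx1], facecolor='red')
# Same filename as in In [2], so this overwrites the earlier figure
plt.savefig('plot_out.svg', transparent=True)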
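The shaded areas in this figure correspond to probabilities, which can be computed rather than only drawn. The following is a minimal sketch, not part of the original notebook: it obtains the green and red areas from the normal CDF via `st.norm.cdf` and cross-checks them against a trapezoidal integration of the density. The helper `area` and the names `p_range` and `p_strip` are hypothetical, introduced only for this illustration.

```python
import numpy as np
import scipy.stats as st

# Same parameters as in the notebook
mu, sigma = 170, 10

# Probability of an interval as a difference of CDF values
p_range = st.norm.cdf(180, mu, sigma) - st.norm.cdf(150, mu, sigma)  # green area
p_strip = st.norm.cdf(162, mu, sigma) - st.norm.cdf(160, mu, sigma)  # red strip


def area(a, b, num=2001):
    """Area under the normal pdf on [a, b] by the trapezoidal rule (hypothetical helper)."""
    x = np.linspace(a, b, num)
    y = st.norm.pdf(x, mu, sigma)
    return float(np.sum((y[:-1] + y[1:]) / 2 * np.diff(x)))


print(p_range, area(150, 180))
print(p_strip, area(160, 162))
```

The green area should come out at about 0.82 and the red strip at about 0.053, with the CDF-based and integrated values agreeing closely. Shrinking the strip toward a single point drives its probability toward zero, which is the key difference from the discrete case.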