The outcome of a single trial often exhibits randomness or contingency, but the results appearing in a large number of repeated trials often possess a certain regularity. Observations or experiments conducted to study this regularity are called random experiments.
Sample Space ($\Omega$): The set of all possible outcomes of a random experiment $E$.
Sample Point / Elementary Event ($\omega$): A single element of the sample space, representing a possible outcome of the random experiment.
Random Event ($A$, $B$, $C$, \dots): A subset of the sample space, representing a set of certain possible outcomes of the random experiment.
Elementary Event: An event containing only one sample point.
Certain Event ($\Omega$): An event containing all sample points in the sample space.
Impossible Event ($\varnothing$): An event containing no sample points.
Relationships Between Random Events
Inclusion Relation ($A \subset B$): The occurrence of event $A$ necessarily leads to the occurrence of event $B$.
Equality Relation ($A = B$): $A \subset B$ and $B \subset A$.
Sum/Union of Events ($A \cup B$): Event $A$ or event $B$ occurs (at least one of them occurs).
Product/Intersection of Events ($A \cap B$, also written $AB$): Events $A$ and $B$ both occur.
Difference of Events ($A - B$): Event $A$ occurs and event $B$ does not occur; $A - B = A\bar{B}$.
Mutually Exclusive / Incompatible Events: Events $A$ and $B$ are mutually exclusive, meaning $A$ and $B$ cannot occur simultaneously, i.e., $AB = \varnothing$.
Complementary Event ($\bar{A}$): Event $A$ does not occur, i.e., $\bar{A} = \Omega - A$, so that $A \cup \bar{A} = \Omega$ and $A\bar{A} = \varnothing$.
The sum of events $A_1, A_2, \dots, A_n$ is denoted $\bigcup_{i=1}^{n} A_i$, and the product is denoted $\bigcap_{i=1}^{n} A_i$. The sum of a countable sequence of events $A_1, A_2, \dots$ is denoted $\bigcup_{i=1}^{\infty} A_i$, and the product is denoted $\bigcap_{i=1}^{\infty} A_i$.
For events $A_1, A_2, \dots, A_n$ to be pairwise mutually exclusive means that any two of them are mutually exclusive, i.e., for $1 \le i < j \le n$ we have $A_i A_j = \varnothing$.
For a countable sequence of events $A_1, A_2, \dots$ to be pairwise mutually exclusive means that any two of them are mutually exclusive, i.e., for $i \ne j$ we have $A_i A_j = \varnothing$.
Frequency ($f_n(A)$): In $n$ repeated trials, if event $A$ occurs $n_A$ times, then $f_n(A) = \frac{n_A}{n}$ is called the frequency of event $A$. Frequency possesses:
Non-negativity: For any event $A$, we have $f_n(A) \ge 0$.
Normalization: For the certain event $\Omega$, we have $f_n(\Omega) = 1$.
Additivity: For mutually exclusive events $A$ and $B$, we have $f_n(A \cup B) = f_n(A) + f_n(B)$.
Stability: As the number of trials $n$ increases, the frequency $f_n(A)$ approaches a certain definite value.
Probability
Probability ($P(A)$): A measure of the likelihood of event $A$ occurring.
Statistical Definition of Probability: In $n$ repeated trials under identical conditions, if the frequency $f_n(A)$ of event $A$ stabilizes around a constant $p$, and this tendency strengthens as $n$ increases, then $p$ is called the probability of event $A$, denoted $P(A) = p$.
Axiomatic Definition of Probability: Let the sample space be $\Omega$. Define a probability measure $P$ on its $\sigma$-algebra $\mathcal{F}$. For any event $A \in \mathcal{F}$, we have:
Non-negativity: $P(A) \ge 0$;
Normalization: $P(\Omega) = 1$;
Countable Additivity: For pairwise mutually exclusive events $A_1, A_2, \dots$, we have $P\!\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$.
Properties of Probability
$P(\varnothing) = 0$; $0 \le P(A) \le 1$; $P(\bar{A}) = 1 - P(A)$; if $A \subset B$, then $P(B - A) = P(B) - P(A)$ and $P(A) \le P(B)$; in general, $P(A - B) = P(A) - P(AB)$.
Addition Theorem: For any events $A$, $B$, we have $P(A \cup B) = P(A) + P(B) - P(AB)$.
Generally, for any events $A_1, A_2, \dots, A_n$, we have the inclusion-exclusion formula $P\!\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i) - \sum_{1 \le i < j \le n} P(A_i A_j) + \sum_{1 \le i < j < k \le n} P(A_i A_j A_k) - \dots + (-1)^{n-1} P(A_1 A_2 \cdots A_n)$.
Probability Models
Classical Probability Model: If the sample space of a random experiment contains a finite number of sample points, and each sample point is equally likely to occur, then the random experiment is called a classical probability model. For any event $A$ in this model, let $k$ be the number of sample points contained in event $A$, and $n$ be the total number of sample points in the sample space $\Omega$. Its probability is $P(A) = \frac{k}{n}$.
Geometric Probability Model: If the sample space of a random experiment is a geometric region, and each sample point in it is equally likely to occur, then the random experiment is called a geometric probability model. For any event $A$ in this model, let $\mu(A)$ be the measure (length, area, volume, etc.) of the region corresponding to event $A$, and $\mu(\Omega)$ be the measure of the region corresponding to the sample space $\Omega$. Its probability is $P(A) = \frac{\mu(A)}{\mu(\Omega)}$.
Conditional Probability
Conditional Probability ($P(B \mid A)$): The probability of event $B$ occurring given that event $A$ has occurred. It is defined as $P(B \mid A) = \frac{P(AB)}{P(A)}$, where $P(A) > 0$.
Multiplication Formula: For any events $A$, $B$ with $P(A) > 0$, we have $P(AB) = P(A)P(B \mid A)$. Generally, for any events $A_1, A_2, \dots, A_n$ with $P(A_1 A_2 \cdots A_{n-1}) > 0$, we have $P(A_1 A_2 \cdots A_n) = P(A_1)P(A_2 \mid A_1)P(A_3 \mid A_1 A_2) \cdots P(A_n \mid A_1 A_2 \cdots A_{n-1})$.
Complete Set of Events: Let events $A_1, A_2, \dots, A_n$ satisfy $A_i A_j = \varnothing$ when $i \ne j$ and $\bigcup_{i=1}^{n} A_i = \Omega$. Then the event group $A_1, A_2, \dots, A_n$ is called a complete set of events or a partition of the sample space $\Omega$.
Law of Total Probability: Let events $A_1, A_2, \dots, A_n$ form a complete set of events for sample space $\Omega$, and for any $i$, we have $P(A_i) > 0$. Then for any event $B$, we have $P(B) = \sum_{i=1}^{n} P(A_i) P(B \mid A_i)$.
Bayes' Formula: Let events $A_1, A_2, \dots, A_n$ form a complete set of events for sample space $\Omega$, and for any $i$, we have $P(A_i) > 0$. Then for any event $B$ with $P(B) > 0$, we have $P(A_i \mid B) = \frac{P(A_i) P(B \mid A_i)}{\sum_{j=1}^{n} P(A_j) P(B \mid A_j)}$.
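A quick numerical sketch of the last two formulas (the priors and conditional probabilities below are made up for illustration): three sources supply 50%, 30%, and 20% of items, with defect rates 1%, 2%, and 3%; the law of total probability gives the overall defect probability, and Bayes' formula gives the posterior probability of each source given a defective item.

    # Hypothetical priors P(A_i) and conditional probabilities P(B | A_i)
    prior = [0.5, 0.3, 0.2]          # P(A1), P(A2), P(A3)
    likelihood = [0.01, 0.02, 0.03]  # P(B | A_i), where B = "item is defective"

    # Law of total probability: P(B) = sum_i P(A_i) P(B | A_i)
    p_b = sum(p * l for p, l in zip(prior, likelihood))

    # Bayes' formula: P(A_i | B) = P(A_i) P(B | A_i) / P(B)
    posterior = [p * l / p_b for p, l in zip(prior, likelihood)]

    print(p_b)        # 0.017
    print(posterior)  # approximately [0.294, 0.353, 0.353]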
Independence of Random Events
Independence of Events: For events $A$ and $B$, if $P(AB) = P(A)P(B)$ holds, then events $A$ and $B$ are said to be independent.
Equivalent forms: $P(B \mid A) = P(B)$ (provided $P(A) > 0$) or $P(A \mid B) = P(A)$ (provided $P(B) > 0$). Except in trivial cases (e.g., $P(A) = 0$ or $P(B) = 0$), two events cannot be both mutually exclusive and independent. Among the four pairs of events $(A, B)$, $(A, \bar{B})$, $(\bar{A}, B)$, $(\bar{A}, \bar{B})$, if one pair is independent, then the other three pairs are also independent.
Mutual Independence of Multiple Events: For events $A_1, A_2, \dots, A_n$, if for any subset $\{i_1, i_2, \dots, i_k\} \subset \{1, 2, \dots, n\}$ with $k \ge 2$, we have $P(A_{i_1} A_{i_2} \cdots A_{i_k}) = P(A_{i_1}) P(A_{i_2}) \cdots P(A_{i_k})$, then events $A_1, A_2, \dots, A_n$ are said to be mutually independent.
$A_1, A_2, \dots, A_n$ being mutually independent implies that they are pairwise independent; the converse is not true.
If we group mutually independent events into disjoint index sets, and within each group perform only operations (union, intersection, difference, complement, etc.) belonging to the $\sigma$-algebra generated by the events in that group, then the resulting new events from different groups remain mutually independent.
Random Variables and Their Distributions
Random Variables and Distribution Functions
Random Variable ($X$): A real-valued function $X = X(\omega)$ defined on the sample space $\Omega$, i.e., each sample point $\omega$ corresponds to a real number $X(\omega)$.
Distribution Function: For a random variable $X$, the function $F(x) = P(X \le x)$, $x \in \mathbb{R}$, is called the distribution function of the random variable $X$. The distribution function is non-decreasing and right-continuous, with $F(-\infty) = 0$ and $F(+\infty) = 1$.
Using the distribution function and its right-continuity, we have:
$P(X \le a) = F(a)$;
$P(X < a) = F(a^-)$;
$P(X = a) = F(a) - F(a^-)$;
$P(a < X \le b) = F(b) - F(a)$.
Discrete Random Variables
Discrete Random Variable: A random variable $X$ is called discrete if it takes a finite or countably infinite number of real values $x_1, x_2, \dots$, and for each value $x_k$ there is a determined probability $P(X = x_k) = p_k$, with $p_k \ge 0$ and $\sum_k p_k = 1$.
Geometric Distribution (Geom): Let random variable $X$ represent the number of independent repeated trials needed until event $A$ occurs for the first time, and the probability of event $A$ occurring in each trial is $p$. Then $X$ is said to follow a geometric distribution with parameter $p$, denoted $X \sim \mathrm{Geom}(p)$: $P(X = k) = (1-p)^{k-1} p$, taking values $k = 1, 2, \dots$.
Memoryless Property of the Geometric Distribution: For any positive integers $m, n$, we have $P(X > m + n \mid X > m) = P(X > n)$.
Bernoulli Distribution: Let random variable $X$ represent whether a particular event $A$ occurs (1 if it occurs, 0 otherwise), with probability $p$ of occurrence. Then $X$ is said to follow a Bernoulli (two-point) distribution, denoted $X \sim B(1, p)$ (can also be denoted $X \sim \mathrm{Bern}(p)$).
Binomial Distribution: Let random variable $X$ represent the number of times event $A$ occurs in $n$ independent repeated trials, with probability $p$ of event $A$ occurring in each trial. Then $X$ is said to follow a binomial distribution with parameters $n, p$, denoted $X \sim B(n, p)$: $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$, $k = 0, 1, \dots, n$.
When $(n+1)p$ is an integer, the most probable number of successes for the binomial distribution is $(n+1)p$ or $(n+1)p - 1$; otherwise, it is $\lfloor (n+1)p \rfloor$.
Pascal Distribution (Negative Binomial): Let random variable $X$ represent the number of independent repeated trials needed until event $A$ occurs for the $r$-th time, with probability $p$ of event $A$ occurring in each trial. Then $X$ is said to follow a negative binomial distribution with parameters $r, p$, denoted $X \sim NB(r, p)$: $P(X = k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}$, $k = r, r+1, \dots$.
Poisson Distribution: Let random variable $X$ represent the number of occurrences of a certain event within a specific time interval (or spatial region), and the average number of occurrences per unit time (or unit space) is $\lambda$. Then $X$ is said to follow a Poisson distribution with parameter $\lambda$, denoted $X \sim P(\lambda)$: $P(X = k) = \frac{\lambda^k}{k!} e^{-\lambda}$, $k = 0, 1, 2, \dots$.
Poisson Theorem: Let random variable $X_n$ follow a binomial distribution with parameters $n, p_n$. If $n p_n \to \lambda > 0$ as $n \to \infty$, then for any non-negative integer $k$, we have $\lim_{n \to \infty} \binom{n}{k} p_n^k (1-p_n)^{n-k} = \frac{\lambda^k}{k!} e^{-\lambda}$, i.e., the distribution of $X_n$ gradually approaches a Poisson distribution with parameter $\lambda$.
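A small numerical check of the Poisson theorem (a sketch; $\lambda$ and the values of $n$ and $k$ are chosen arbitrarily): compare the $B(n, \lambda/n)$ probabilities with the $P(\lambda)$ probabilities as $n$ grows.

    from math import comb, exp, factorial

    lam = 2.0
    for n in (10, 100, 1000):
        p = lam / n
        for k in range(4):
            # Binomial P(X = k) versus Poisson P(X = k)
            binom = comb(n, k) * p**k * (1 - p)**(n - k)
            poisson = lam**k / factorial(k) * exp(-lam)
            print(n, k, round(binom, 5), round(poisson, 5))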
Summary of common discrete distributions:
Geometric: $X \sim \mathrm{Geom}(p)$; values $k = 1, 2, \dots$; $P(X = k) = (1-p)^{k-1} p$; $E(X) = \frac{1}{p}$; $D(X) = \frac{1-p}{p^2}$; memoryless.
Binomial: $X \sim B(n, p)$; values $k = 0, 1, \dots, n$; $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$; $E(X) = np$; $D(X) = np(1-p)$; additive (the sum of independent binomials with the same $p$ is binomial).
Pascal: $X \sim NB(r, p)$; values $k = r, r+1, \dots$; $P(X = k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}$; $E(X) = \frac{r}{p}$; $D(X) = \frac{r(1-p)}{p^2}$; additive (the sum of independent negative binomials with the same $p$ is negative binomial).
Poisson: $X \sim P(\lambda)$; values $k = 0, 1, 2, \dots$; $P(X = k) = \frac{\lambda^k}{k!} e^{-\lambda}$; $E(X) = \lambda$; $D(X) = \lambda$; additive (for independent variables, the parameters add).
Continuous Random Variables
Probability Density Function of a Continuous Random Variable: Let the distribution function of random variable $X$ be $F(x)$. If there exists a non-negative function $f(x)$ such that for any real number $x$, we have $F(x) = \int_{-\infty}^{x} f(t)\,dt$, then $f(x)$ is called the probability density function of the random variable $X$.
The distribution function of a continuous random variable is continuous, and $P(X = a) = 0$ for any single point $a$.
The probability density function is not unique; it can be altered at a finite or countably infinite number of points without changing the distribution. The probability density function describes the probability of $X$ falling in a small interval near $x$: $P(x < X \le x + \Delta x) \approx f(x)\,\Delta x$.
Uniform Distribution: Let random variable $X$ follow a uniform distribution on the interval $[a, b]$, denoted $X \sim U(a, b)$: $f(x) = \frac{1}{b-a}$ for $a \le x \le b$, and $0$ otherwise.
Exponential Distribution: Let random variable $X$ represent the time interval between occurrences of a certain event, where the average rate of occurrence is $\lambda$. Then $X$ is said to follow an exponential distribution with parameter $\lambda$, denoted $X \sim \mathrm{Exp}(\lambda)$: $f(x) = \lambda e^{-\lambda x}$ for $x > 0$, and $0$ otherwise.
Memoryless Property of the Exponential Distribution: For any $s, t > 0$, we have $P(X > s + t \mid X > s) = P(X > t)$.
Normal Distribution: Let random variable $X$ follow a normal distribution with parameters $\mu, \sigma^2$, denoted $X \sim N(\mu, \sigma^2)$. Its probability density function is $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$.
Linear transformations of a normal random variable, and linear combinations of independent normal random variables, remain normal: if $X \sim N(\mu, \sigma^2)$, then $aX + b \sim N(a\mu + b, a^2\sigma^2)$ for $a \ne 0$.
Summary of common continuous distributions:
Uniform: $X \sim U(a, b)$; support $[a, b]$; $f(x) = \frac{1}{b-a}$; $F(x) = \frac{x-a}{b-a}$ for $a \le x \le b$; $E(X) = \frac{a+b}{2}$; $D(X) = \frac{(b-a)^2}{12}$.
Exponential: $X \sim \mathrm{Exp}(\lambda)$; support $x > 0$; $f(x) = \lambda e^{-\lambda x}$; $F(x) = 1 - e^{-\lambda x}$; $E(X) = \frac{1}{\lambda}$; $D(X) = \frac{1}{\lambda^2}$; memoryless.
Normal: $X \sim N(\mu, \sigma^2)$; support $\mathbb{R}$; $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$; $F(x) = \Phi\!\left(\frac{x-\mu}{\sigma}\right)$; $E(X) = \mu$; $D(X) = \sigma^2$; linear transformations and combinations remain normal.
There exist random variables that are neither discrete nor continuous; they are called mixed-type random variables.
Functions of Random Variables and Their Distributions
This section analyzes a specific type of problem: given the distribution of random variable $X$, find the distribution of random variable $Y = g(X)$. The general method is to transform events concerning $Y$ into events concerning $X$.
For discrete random variables, let random variable $X$ take values $x_1, x_2, \dots$ with corresponding probabilities $p_1, p_2, \dots$. Given a function $Y = g(X)$, then $P(Y = y_j) = \sum_{k:\, g(x_k) = y_j} p_k$.
For continuous random variables, if the function $y = g(x)$ is strictly monotonic and differentiable with inverse $x = h(y)$, then the corresponding probability density function is $f_Y(y) = f_X(h(y))\,\lvert h'(y) \rvert$ for $y$ in the range of $g$, and $0$ otherwise.
For piecewise strictly monotonic functions $g$, the corresponding probability density function is $f_Y(y) = \sum_i f_X(h_i(y))\,\lvert h_i'(y) \rvert$, where $h_i$ is the inverse function of $g$ on the $i$-th monotone segment.
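The piecewise-monotone formula can be checked by simulation (a sketch; the example $Y = X^2$ with $X \sim N(0, 1)$ is my own choice). Here $g(x) = x^2$ has two monotone branches with inverses $\pm\sqrt{y}$, so the formula gives $f_Y(y) = \frac{f_X(\sqrt{y}) + f_X(-\sqrt{y})}{2\sqrt{y}}$, which is the $\chi^2(1)$ density.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(1_000_000)
    y = x**2                          # Y = g(X) = X^2, two monotone branches

    def f_Y(t):
        # f_Y(y) = sum over branches of f_X(h_i(y)) * |h_i'(y)|, with h_i(y) = +/- sqrt(y)
        fx = lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)
        return (fx(np.sqrt(t)) + fx(-np.sqrt(t))) / (2 * np.sqrt(t))

    # Compare an empirical probability with the integral of f_Y over the same interval
    a, b = 0.5, 1.5
    empirical = np.mean((y > a) & (y <= b))
    grid = np.linspace(a, b, 10_001)
    mid = (grid[:-1] + grid[1:]) / 2
    theoretical = np.sum(f_Y(mid)) * (b - a) / (len(grid) - 1)  # midpoint rule
    print(empirical, theoretical)     # both close to 0.26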
Multidimensional Random Variables and Their Distributions
Two-dimensional Random Vector: Let random variables $X$ and $Y$ be defined on the same probability space. Then the pair $(X, Y)$ is called a two-dimensional random variable or two-dimensional random vector.
Joint Distribution Function: For a two-dimensional random variable $(X, Y)$, the function $F(x, y) = P(X \le x, Y \le y)$ is called the joint distribution function of $(X, Y)$. The joint distribution function is right-continuous with respect to both $x$ and $y$.
The joint distribution function of a two-dimensional random variable must satisfy $F(x, -\infty) = F(-\infty, y) = 0$ and $F(+\infty, +\infty) = 1$.
Marginal Distribution Function: Let the joint distribution function of two-dimensional random variable $(X, Y)$ be $F(x, y)$. Then $F_X(x) = F(x, +\infty)$ is defined as the marginal distribution function of $X$, and $F_Y(y) = F(+\infty, y)$ as the marginal distribution function of $Y$.
The joint distribution function determines the marginal distribution functions, but the converse is not true.
Joint Density Function: Let the joint distribution function of two-dimensional random variable $(X, Y)$ be $F(x, y)$. If there exists a non-negative function $f(x, y)$ such that for any real numbers $x$, $y$, we have $F(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f(u, v)\,dv\,du$, then $f(x, y)$ is called the joint probability density function of $(X, Y)$.
Marginal Density Function: Let the joint probability density function of two-dimensional random variable $(X, Y)$ be $f(x, y)$. Then $f_X(x) = \int_{-\infty}^{+\infty} f(x, y)\,dy$ is the marginal density function of $X$, and $f_Y(y) = \int_{-\infty}^{+\infty} f(x, y)\,dx$ is the marginal density function of $Y$.
Joint Distribution of Two-dimensional Discrete Random Variables: Let two-dimensional random variable $(X, Y)$ be discrete, taking values $(x_i, y_j)$. Then $p_{ij} = P(X = x_i, Y = y_j)$ is called the joint distribution (law) of $(X, Y)$. For two-dimensional discrete random variables, the distribution function and the distribution law mutually determine each other.
Marginal Distribution: Let the joint distribution of two-dimensional random variable $(X, Y)$ be $p_{ij}$. Then $p_{i\cdot} = P(X = x_i) = \sum_j p_{ij}$ is the marginal distribution of $X$, and $p_{\cdot j} = P(Y = y_j) = \sum_i p_{ij}$ is the marginal distribution of $Y$.
The joint distribution determines the marginal distributions, but the converse is not true.
Joint Distribution Function of Multidimensional Random Variables: Let random variables $X_1, X_2, \dots, X_n$ be defined on the same probability space. Then the group $(X_1, X_2, \dots, X_n)$ is called an $n$-dimensional random variable or $n$-dimensional random vector. The function $F(x_1, x_2, \dots, x_n) = P(X_1 \le x_1, X_2 \le x_2, \dots, X_n \le x_n)$ is called the joint distribution function of the $n$-dimensional random variable.
Marginal Distribution Function of Multidimensional Random Variables: Let the joint distribution function of the $n$-dimensional random variable be $F(x_1, \dots, x_n)$. Then $F_{X_i}(x_i) = F(+\infty, \dots, +\infty, x_i, +\infty, \dots, +\infty)$ is the marginal distribution function of random variable $X_i$.
Common Two-dimensional Random Variable Distributions
Uniform Distribution: Let two-dimensional random variable $(X, Y)$ follow a uniform distribution over region $D$, denoted $(X, Y) \sim U(D)$: $f(x, y) = \frac{1}{S_D}$ for $(x, y) \in D$ and $0$ otherwise, where $S_D$ is the area of $D$.
For a uniform distribution over a rectangle with sides parallel to the coordinate axes, the marginal distributions are also uniform.
Bivariate Normal Distribution: Let two-dimensional random variable $(X, Y)$ follow a bivariate normal distribution with parameters $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho$, denoted $(X, Y) \sim N(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$. Its joint probability density function is $f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\!\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2} + \frac{(y-\mu_2)^2}{\sigma_2^2}\right]\right\}$.
Define the matrix $\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$ as the covariance matrix of the bivariate normal distribution. Then the probability density function can be rewritten as $f(\mathbf{x}) = \frac{1}{2\pi\sqrt{\det\Sigma}} \exp\!\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\mathsf T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right\}$, where $\mathbf{x} = (x, y)^{\mathsf T}$ and $\boldsymbol{\mu} = (\mu_1, \mu_2)^{\mathsf T}$.
The marginal distributions of a bivariate normal distribution are still normal. For the above density function, $X \sim N(\mu_1, \sigma_1^2)$, $Y \sim N(\mu_2, \sigma_2^2)$.
The conditional distributions of a bivariate normal distribution are still normal. For the above density function, $Y \mid X = x \sim N\!\left(\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1),\ (1-\rho^2)\sigma_2^2\right)$, and symmetrically $X \mid Y = y \sim N\!\left(\mu_1 + \rho\frac{\sigma_1}{\sigma_2}(y - \mu_2),\ (1-\rho^2)\sigma_1^2\right)$.
A two-dimensional random variable whose marginal densities are one-dimensional normal is not necessarily bivariate normal: non-normal joint distributions can also have normal marginals.
Summary of common two-dimensional distributions:
Uniform: $(X, Y) \sim U(D)$; support $(x, y) \in D$; $f(x, y) = \frac{1}{S_D}$.
Bivariate Normal: $(X, Y) \sim N(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$; support $\mathbb{R}^2$; joint probability density function as given above.
Conditional Distribution of Two-dimensional Random Variables
For discrete two-dimensional random variables, define the conditional distribution of $X$ given $Y = y_j$ as $P(X = x_i \mid Y = y_j) = \frac{p_{ij}}{p_{\cdot j}}$ (requiring $p_{\cdot j} > 0$). Similarly, define the conditional distribution of $Y$ given $X = x_i$ as $P(Y = y_j \mid X = x_i) = \frac{p_{ij}}{p_{i\cdot}}$ (requiring $p_{i\cdot} > 0$).
For continuous two-dimensional random variables, define the conditional probability density function of $X$ given $Y = y$ as $f_{X \mid Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)}$ (requiring $f_Y(y) > 0$), and the conditional distribution function as $F_{X \mid Y}(x \mid y) = \int_{-\infty}^{x} f_{X \mid Y}(u \mid y)\,du$. Similarly, define the conditional probability density function of $Y$ given $X = x$ as $f_{Y \mid X}(y \mid x) = \frac{f(x, y)}{f_X(x)}$ and the corresponding conditional distribution function (requiring $f_X(x) > 0$).
Independence of Random Variables
Random variables $X$ and $Y$ are defined to be independent if and only if for any real numbers $x$, $y$, we have $P(X \le x, Y \le y) = P(X \le x)P(Y \le y)$, i.e., $F(x, y) = F_X(x)F_Y(y)$. In this case, the marginal and joint distribution functions mutually determine each other.
If random variables $X$ and $Y$ are independent, then $g(X)$ and $h(Y)$ remain independent for continuous functions $g$ and $h$.
Necessary and Sufficient Condition for Independence (Density Factorization): If $(X, Y)$ has a joint density $f(x, y)$, then $X$ and $Y$ are independent if and only if there exist non-negative integrable functions $g(x)$ and $h(y)$ such that $f(x, y) = g(x)h(y)$ almost everywhere. In that case $f_X(x) \propto g(x)$, $f_Y(y) \propto h(y)$, and thus $f(x, y) = f_X(x)f_Y(y)$.
Define $n$-dimensional random variables $X_1, X_2, \dots, X_n$ to be mutually independent if and only if for any real numbers $x_1, x_2, \dots, x_n$, we have $P(X_1 \le x_1, \dots, X_n \le x_n) = \prod_{i=1}^{n} P(X_i \le x_i)$, i.e., $F(x_1, \dots, x_n) = \prod_{i=1}^{n} F_{X_i}(x_i)$.
Distribution of Functions of Multidimensional Random Variables
The basic approach to solving the distribution problem for functions of multidimensional random variables is similar to that for one-dimensional random variable functions, both transforming events concerning the new random variable into events concerning the original random variables.
Distribution of the Sum of Continuous Random Variables: Let random variable $Z = X + Y$, where $(X, Y)$ has joint density $f(x, y)$. Then its probability density function is $f_Z(z) = \int_{-\infty}^{+\infty} f(x, z - x)\,dx = \int_{-\infty}^{+\infty} f(z - y, y)\,dy$; if $X$ and $Y$ are independent, this is the convolution $f_Z(z) = \int_{-\infty}^{+\infty} f_X(x) f_Y(z - x)\,dx$.
In particular, for independent normal random variables $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$, we have $X + Y \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$.
Distribution of the Quotient of Continuous Random Variables: Let random variable $Z = \frac{X}{Y}$. Then its probability density function is $f_Z(z) = \int_{-\infty}^{+\infty} \lvert y \rvert\, f(zy, y)\,dy$.
Distribution of the Sum of Squares of Continuous Random Variables: Let random variable $Z = X^2 + Y^2$. Then its probability density function is $f_Z(z) = \frac{1}{2}\int_{0}^{2\pi} f(\sqrt{z}\cos\theta, \sqrt{z}\sin\theta)\,d\theta$ for $z > 0$ (obtained by the polar coordinate transformation $x = r\cos\theta$, $y = r\sin\theta$, $z = r^2$).
Distribution of Extremes of Continuous Random Variables: Let random variable $M = \max(X, Y)$ with $X$ and $Y$ independent. Then its distribution function is $F_M(z) = F_X(z)F_Y(z)$. Let $N = \min(X, Y)$. Similarly, $F_N(z) = 1 - \left[1 - F_X(z)\right]\left[1 - F_Y(z)\right]$.
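A simulation sketch of the extreme-value formulas (the choice $X \sim \mathrm{Exp}(1)$, $Y \sim \mathrm{Exp}(2)$, independent, is my own), checking $F_{\max}(z) = F_X(z)F_Y(z)$ and $F_{\min}(z) = 1 - [1 - F_X(z)][1 - F_Y(z)]$ at one point; in particular $\min(X, Y) \sim \mathrm{Exp}(3)$.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.exponential(scale=1.0, size=1_000_000)   # Exp(1)
    y = rng.exponential(scale=0.5, size=1_000_000)   # Exp(2): scale = 1 / lambda

    z = 0.7
    F_x = 1 - np.exp(-1 * z)
    F_y = 1 - np.exp(-2 * z)

    print(np.mean(np.maximum(x, y) <= z), F_x * F_y)                  # F_max(z) = F_X F_Y
    print(np.mean(np.minimum(x, y) <= z), 1 - (1 - F_x) * (1 - F_y))  # = 1 - e^{-3z}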
Expectation
Expectation: Define the expectation of a discrete random variable $X$ as $E(X) = \sum_{k} x_k p_k$ (when the series converges absolutely). Define the expectation of a continuous random variable $X$ as $E(X) = \int_{-\infty}^{+\infty} x f(x)\,dx$ (when the integral converges absolutely).
Expectation of a Function of a Random Variable: For a random variable $Y = g(X)$, define the expectation in the discrete case as $E(Y) = \sum_{k} g(x_k) p_k$, and in the continuous case as $E(Y) = \int_{-\infty}^{+\infty} g(x) f(x)\,dx$.
Expectation of a Function of Two-dimensional Random Variables: For a two-dimensional random variable $(X, Y)$ and $Z = g(X, Y)$, define the expectation in the discrete case as $E(Z) = \sum_{i}\sum_{j} g(x_i, y_j) p_{ij}$, and in the continuous case as $E(Z) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} g(x, y) f(x, y)\,dx\,dy$.
Some properties of expectation:
$E(c) = c$, where $c$ is a constant;
$E(cX) = cE(X)$, where $c$ is a constant;
$E(X \pm Y) = E(X) \pm E(Y)$;
If random variables $X$ and $Y$ are independent, then $E(XY) = E(X)E(Y)$. The converse of this proposition is not true.
Variance
Variance: Define the variance of a random variable $X$ as $D(X) = E\!\left[(X - E(X))^2\right]$ (also written $\mathrm{Var}(X)$).
Calculation Formula: $D(X) = E(X^2) - [E(X)]^2$.
Some properties of variance:
$D(c) = 0$, where $c$ is a constant;
$D(aX + b) = a^2 D(X)$, where $a$, $b$ are constants;
$D(X + Y) = D(X) + D(Y) + 2\,\mathrm{Cov}(X, Y)$, $D(X - Y) = D(X) + D(Y) - 2\,\mathrm{Cov}(X, Y)$;
If random variables $X$ and $Y$ are independent, then $D(X \pm Y) = D(X) + D(Y)$. The converse of this proposition is not true.
Standardized Random Variable: Define the standardized random variable of $X$ as $X^* = \frac{X - E(X)}{\sqrt{D(X)}}$. Then $E(X^*) = 0$, $D(X^*) = 1$.
Covariance and Correlation Coefficient
Covariance: Define the covariance of random variables $X$ and $Y$ as $\mathrm{Cov}(X, Y) = E\!\left[(X - E(X))(Y - E(Y))\right]$. For continuous random variables, $\mathrm{Cov}(X, Y) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} (x - E(X))(y - E(Y)) f(x, y)\,dx\,dy$.
Calculation formula: $\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y)$.
Correlation Coefficient: Define the correlation coefficient of random variables $X$ and $Y$ as $\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{D(X)}\sqrt{D(Y)}}$.
$\lvert \rho_{XY} \rvert \le 1$. When $\rho_{XY} = 1$, $X$ and $Y$ are said to be completely positively correlated; when $\rho_{XY} = -1$, $X$ and $Y$ are said to be completely negatively correlated. When $\rho_{XY} = 0$, $X$ and $Y$ are said to be uncorrelated.
If random variables $X$ and $Y$ are independent, then $\rho_{XY} = 0$ ($X$ and $Y$ are uncorrelated). The converse of this proposition is not true.
Some properties of covariance:
$\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X)$, and $\mathrm{Cov}(X, X) = D(X)$;
$\mathrm{Cov}(X_1 + X_2, Y) = \mathrm{Cov}(X_1, Y) + \mathrm{Cov}(X_2, Y)$;
$\mathrm{Cov}(aX, bY) = ab\,\mathrm{Cov}(X, Y)$, where $a$, $b$ are constants;
Cauchy-Schwarz Inequality: For any random variables $X$ and $Y$, $[E(XY)]^2 \le E(X^2)E(Y^2)$, hence $[\mathrm{Cov}(X, Y)]^2 \le D(X)D(Y)$. Equality holds if and only if there exists a constant $t_0$ such that $P(Y = t_0 X) = 1$.
Equality in the covariance form implies that a linear relationship between $X$ and $Y$ holds with probability 1 (the case $\lvert \rho_{XY} \rvert = 1$).
Higher-order Moments of Random Variables
$k$-th Moment of a Random Variable: Define the $k$-th (origin) moment of random variable $X$ as $E(X^k)$.
$k$-th Central Moment of a Random Variable: Define the $k$-th central moment of random variable $X$ as $E\!\left[(X - E(X))^k\right]$.
$(k, l)$-th Mixed Origin Moment of Random Variables: Define the $(k, l)$-th mixed origin moment of two-dimensional random variable $(X, Y)$ as $E(X^k Y^l)$.
$(k, l)$-th Mixed Central Moment of Random Variables: Define the $(k, l)$-th mixed central moment of two-dimensional random variable $(X, Y)$ as $E\!\left[(X - E(X))^k (Y - E(Y))^l\right]$. Covariance is the second-order ($k = l = 1$) mixed central moment of $(X, Y)$.
Covariance Matrix: For an $n$-dimensional random variable $(X_1, X_2, \dots, X_n)$, define the $n \times n$ matrix $\Sigma = (c_{ij})$ as its covariance matrix, with elements $c_{ij} = \mathrm{Cov}(X_i, X_j)$, i.e., $c_{ij} = E\!\left[(X_i - E(X_i))(X_j - E(X_j))\right]$.
We can write the probability density function for an $n$-dimensional normal distribution: $f(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}(\det\Sigma)^{1/2}} \exp\!\left\{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{\mathsf T}\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})\right\}$.
Properties:
The covariance matrix is positive semi-definite, i.e., $\mathbf{a}^{\mathsf T}\Sigma\,\mathbf{a} \ge 0$ for any vector $\mathbf{a} \in \mathbb{R}^n$;
$\Sigma$ is positive definite if and only if there is no non-zero vector $\mathbf{a}$ such that $a_1 X_1 + \dots + a_n X_n$ is almost surely constant (i.e., no non-trivial almost-sure linear relationship).
For any real numbers $a_1, \dots, a_n$, we have $D\!\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j \mathrm{Cov}(X_i, X_j) = \mathbf{a}^{\mathsf T}\Sigma\,\mathbf{a}$.
If random variables $X_1, \dots, X_n$ are mutually independent, then their covariance matrix is diagonal.
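A small numpy sketch (on synthetic, correlated data of my own) illustrating the quadratic-form identity $D\!\left(\sum_i a_i X_i\right) = \mathbf{a}^{\mathsf T}\Sigma\,\mathbf{a}$ and the positive semi-definiteness of the sample covariance matrix.

    import numpy as np

    rng = np.random.default_rng(2)
    # Three correlated components: x3 is a noisy linear combination of x1 and x2
    x1 = rng.normal(size=100_000)
    x2 = rng.normal(size=100_000)
    x3 = 0.5 * x1 - 0.3 * x2 + 0.1 * rng.normal(size=100_000)
    data = np.stack([x1, x2, x3])        # shape (3, n), one row per variable

    sigma = np.cov(data)                 # sample covariance matrix (3 x 3)
    a = np.array([1.0, -2.0, 0.5])

    # Variance of the linear combination a^T X, computed two ways
    print(np.var(a @ data, ddof=1))      # direct sample variance
    print(a @ sigma @ a)                 # quadratic form a^T Sigma a (same value)

    # Positive semi-definite: all eigenvalues are >= 0 (up to rounding)
    print(np.linalg.eigvalsh(sigma))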
Laws of Large Numbers and Central Limit Theorem
Preliminary Knowledge
Chebyshev's Inequality: Let random variable $X$ have finite variance $D(X)$. Then for any $\varepsilon > 0$, we have $P\!\left(\lvert X - E(X) \rvert \ge \varepsilon\right) \le \frac{D(X)}{\varepsilon^2}$, or equivalently $P\!\left(\lvert X - E(X) \rvert < \varepsilon\right) \ge 1 - \frac{D(X)}{\varepsilon^2}$.
Convergence in Probability: Let a sequence of random variables $\{X_n\}$ and a random variable $X$ be defined on the same probability space. If for any $\varepsilon > 0$, we have $\lim_{n \to \infty} P\!\left(\lvert X_n - X \rvert \ge \varepsilon\right) = 0$ or equivalently $\lim_{n \to \infty} P\!\left(\lvert X_n - X \rvert < \varepsilon\right) = 1$, then the sequence $\{X_n\}$ is said to converge in probability to $X$, denoted $X_n \xrightarrow{P} X$.
Laws of Large Numbers
Law of Large Numbers: A sequence of random variables $\{X_n\}$ is said to obey the law of large numbers if, writing $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$, for any $\varepsilon > 0$ we have $\lim_{n \to \infty} P\!\left(\left\lvert \bar{X}_n - E(\bar{X}_n) \right\rvert < \varepsilon\right) = 1$, i.e., $\bar{X}_n - E(\bar{X}_n) \xrightarrow{P} 0$.
Bernoulli's Law of Large Numbers: Let $n_A$ be the number of times event $A$ occurs in $n$ independent repeated trials, and $p$ be the probability of $A$ occurring in each trial. Then for any $\varepsilon > 0$, we have $\lim_{n \to \infty} P\!\left(\left\lvert \frac{n_A}{n} - p \right\rvert < \varepsilon\right) = 1$ or $\lim_{n \to \infty} P\!\left(\left\lvert \frac{n_A}{n} - p \right\rvert \ge \varepsilon\right) = 0$, i.e., $\frac{n_A}{n} \xrightarrow{P} p$.
Bernoulli's law states that in a large number of repeated trials, the frequency of event $A$ approaches the probability of $A$.
Chebyshev's Law of Large Numbers: Let a sequence of random variables $\{X_n\}$ be pairwise uncorrelated, with finite variances that have a common upper bound. Then the sequence obeys the law of large numbers, i.e., for any $\varepsilon > 0$, $\lim_{n \to \infty} P\!\left(\left\lvert \frac{1}{n}\sum_{i=1}^{n} X_i - \frac{1}{n}\sum_{i=1}^{n} E(X_i) \right\rvert < \varepsilon\right) = 1$, i.e., $\frac{1}{n}\sum_{i=1}^{n} X_i - \frac{1}{n}\sum_{i=1}^{n} E(X_i) \xrightarrow{P} 0$.
The condition of pairwise uncorrelation can be replaced by the Markov condition: $\frac{1}{n^2} D\!\left(\sum_{i=1}^{n} X_i\right) \to 0$ as $n \to \infty$.
Khintchine's Law of Large Numbers: Let a sequence of random variables $\{X_n\}$ be independent and identically distributed (i.i.d.) with $E(X_i) = \mu$. Then the sequence obeys the law of large numbers, i.e., for any $\varepsilon > 0$, $\lim_{n \to \infty} P\!\left(\left\lvert \frac{1}{n}\sum_{i=1}^{n} X_i - \mu \right\rvert < \varepsilon\right) = 1$, i.e., $\bar{X}_n \xrightarrow{P} \mu$.
Khintchine's law states that the arithmetic mean converges in probability to the expectation; it requires only a finite expectation, not a finite variance.
Khintchine's law can be extended to the $k$-th moment: let $\{X_n\}$ be i.i.d. with $E(X_i^k) = \mu_k$ finite. Then for any $\varepsilon > 0$, $\lim_{n \to \infty} P\!\left(\left\lvert \frac{1}{n}\sum_{i=1}^{n} X_i^k - \mu_k \right\rvert < \varepsilon\right) = 1$, i.e., $\frac{1}{n}\sum_{i=1}^{n} X_i^k \xrightarrow{P} \mu_k$.
The Monte Carlo method can be used to compute integrals. Let function $g(x)$ be defined and bounded on interval $[a, b]$. If we take points $x_1, x_2, \dots, x_n$ independently and uniformly in $[a, b]$ and compute the average $\frac{1}{n}\sum_{i=1}^{n} g(x_i)$, then for sufficiently large $n$, $\int_{a}^{b} g(x)\,dx \approx \frac{b-a}{n}\sum_{i=1}^{n} g(x_i)$.
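A minimal Monte Carlo sketch (the integrand and interval are my own choice): estimate $\int_{0}^{\pi} \sin x\,dx = 2$ by averaging $g$ at uniform points and multiplying by $b - a$.

    import numpy as np

    rng = np.random.default_rng(3)
    a, b, n = 0.0, np.pi, 1_000_000
    x = rng.uniform(a, b, size=n)            # uniform points in [a, b]
    estimate = (b - a) * np.mean(np.sin(x))  # (b - a) * average of g(x_i)
    print(estimate)                          # close to the exact value 2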
Central Limit Theorem
Central Limit Theorem for i.i.d. Sequences: Let $\{X_n\}$ be an i.i.d. sequence with $E(X_i) = \mu$, $D(X_i) = \sigma^2 > 0$. Then for sufficiently large $n$, the distribution of the standardized sum $Y_n = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n}\,\sigma}$ is approximately standard normal $N(0, 1)$. That is, for any real $x$, $\lim_{n \to \infty} P(Y_n \le x) = \Phi(x)$, where $\Phi(x)$ is the standard normal distribution function. In short, the limiting distribution of the standardized sum is standard normal.
DeMoivre-Laplace Central Limit Theorem: Let random variable $X_n$ follow a binomial distribution $B(n, p)$. Then for sufficiently large $n$, the distribution of $\frac{X_n - np}{\sqrt{np(1-p)}}$ is approximately standard normal $N(0, 1)$. That is, for any real $x$, $\lim_{n \to \infty} P\!\left(\frac{X_n - np}{\sqrt{np(1-p)}} \le x\right) = \Phi(x)$. Consequently, for large $n$, the binomial distribution $B(n, p)$ can be approximated by the normal distribution $N(np, np(1-p))$.
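A numerical sketch of the normal approximation to the binomial (the parameters are my own): compare $P(X \le 55)$ for $X \sim B(100, 0.5)$ computed exactly with $\Phi$ evaluated at the standardized point; a continuity correction of $0.5$ (a common refinement not stated above) noticeably improves the approximation.

    from math import comb, erf, sqrt

    n, p, k = 100, 0.5, 55
    exact = sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

    def phi(x):                              # standard normal distribution function
        return 0.5 * (1 + erf(x / sqrt(2)))

    mu, sd = n * p, sqrt(n * p * (1 - p))
    approx = phi((k - mu) / sd)              # plain normal approximation
    approx_cc = phi((k + 0.5 - mu) / sd)     # with continuity correction

    print(exact, approx, approx_cc)          # about 0.864, 0.841, 0.864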
Fundamental Concepts of Mathematical Statistics
Basic Concepts
Population is the entire set of objects under study. A population is the totality of a certain (or some) quantitative indicator(s) of the objects, denoted $X$.
An Individual is a single element of the population. An individual can be viewed as a particular value of the population $X$.
A Sample is a group of individuals drawn from the population, denoted $(X_1, X_2, \dots, X_n)$, where $X_i$ is the $i$-th individual and $n$ is the sample size. The sequentially observed sample values are denoted $(x_1, x_2, \dots, x_n)$, called a sample observation of size $n$ from population $X$, or a realization of the sample, or simply the sample values.
Sample Space is the set of all possible results of the sample observations.
Simple Random Sample. A sample drawn randomly from the population, whose components $X_1, X_2, \dots, X_n$ are independent and each identically distributed with the population, is called a simple random sample.
Statistic. For a sample $(X_1, X_2, \dots, X_n)$, if a real-valued continuous function $T = T(X_1, X_2, \dots, X_n)$ depends only on the random variables $X_1, \dots, X_n$ and does not depend on unknown parameters of the population, then $T$ is called a statistic. A statistic is itself a random variable. Common statistics include:
Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$;
Sample variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$;
Sample standard deviation: $S = \sqrt{S^2}$;
Sample $k$-th origin moment: $A_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k$;
Sample $k$-th central moment: $B_k = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^k$.
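A short numpy sketch of these statistics on made-up data; note ddof=1 so that np.var and np.std match the sample variance $S^2$ defined above rather than the second central moment $B_2$.

    import numpy as np

    x = np.array([4.2, 5.1, 3.8, 6.0, 5.5])

    mean = np.mean(x)                 # sample mean
    s2 = np.var(x, ddof=1)            # sample variance (divides by n - 1)
    s = np.std(x, ddof=1)             # sample standard deviation
    a2 = np.mean(x**2)                # sample 2nd origin moment
    b2 = np.mean((x - mean)**2)       # sample 2nd central moment (divides by n)

    print(mean, s2, s, a2, b2)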
Order Statistics. For a sample $(X_1, X_2, \dots, X_n)$, arrange the random variables in increasing order: $X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$. Then $X_{(1)}, X_{(2)}, \dots, X_{(n)}$ are called the order statistics of the sample, where $X_{(1)}$ is the sample minimum, $X_{(n)}$ is the sample maximum, and $X_{(k)}$ is the $k$-th order statistic.
We can define:
Range: $R = X_{(n)} - X_{(1)}$;
Median: $X_{\left(\frac{n+1}{2}\right)}$ (when $n$ is odd) or $\frac{1}{2}\!\left[X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}\right]$ (when $n$ is even).
Upper Quantile. Let the distribution function of population $X$ be $F(x)$. If there exists a unique real number $x_\alpha$ such that $P(X > x_\alpha) = \alpha$, then $x_\alpha$ is called the upper $\alpha$ quantile of population $X$.
Distributions of Several Commonly Used Quantities
$\chi^2$ Distribution
Let random variables $X_1, X_2, \dots, X_n$ be independent, each following a standard normal distribution $N(0, 1)$. Then the random variable $\chi^2 = \sum_{i=1}^{n} X_i^2$ follows a chi-square distribution with $n$ degrees of freedom, denoted $\chi^2 \sim \chi^2(n)$.
The density function of the chi-square distribution is $f(x) = \frac{1}{2^{n/2}\Gamma(n/2)}\, x^{\frac{n}{2}-1} e^{-\frac{x}{2}}$ for $x > 0$, and $0$ otherwise;
$E(\chi^2) = n$;
$D(\chi^2) = 2n$;
The chi-square distribution is additive: if $X \sim \chi^2(m)$ and $Y \sim \chi^2(n)$ are independent, then $X + Y \sim \chi^2(m + n)$.
$t$ Distribution
Let random variable $X$ follow a standard normal distribution $N(0, 1)$, and random variable $Y$ follow a chi-square distribution with $n$ degrees of freedom, and let $X$ and $Y$ be independent. Then the random variable $T = \frac{X}{\sqrt{Y/n}}$ follows a $t$-distribution with $n$ degrees of freedom, denoted $T \sim t(n)$.
As $n \to \infty$, the $t$-distribution tends to the standard normal distribution. When $n = 1$, the $t$-distribution is the Cauchy distribution, whose expectation does not exist.
$F$ Distribution
Let random variable $X$ follow a chi-square distribution with $n_1$ degrees of freedom, and random variable $Y$ follow a chi-square distribution with $n_2$ degrees of freedom, and let $X$ and $Y$ be independent. Then the random variable $F = \frac{X/n_1}{Y/n_2}$ follows an $F$-distribution with degrees of freedom $(n_1, n_2)$, denoted $F \sim F(n_1, n_2)$. $n_1$, $n_2$ are the first and second degrees of freedom, respectively.
$F_\alpha(n_1, n_2)$ denotes the upper $\alpha$ quantile of the $F$-distribution; it satisfies $F_{1-\alpha}(n_1, n_2) = \frac{1}{F_\alpha(n_2, n_1)}$.
Sampling Distributions for Normal Populations
Let population $X$ follow a normal distribution $N(\mu, \sigma^2)$. Draw a simple random sample $X_1, X_2, \dots, X_n$ of size $n$, with sample mean $\bar{X}$ and sample variance $S^2$. Then:
The sample mean follows a normal distribution: $\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$.
The random variable $\frac{(n-1)S^2}{\sigma^2}$ follows a chi-square distribution with $n - 1$ degrees of freedom, and $\bar{X}$ and $S^2$ are independent.
The random variable $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ follows a $t$-distribution with $n - 1$ degrees of freedom.
The random variable $\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$ follows an $F$-distribution with degrees of freedom $(n_1 - 1, n_2 - 1)$, where $S_1^2$, $S_2^2$ are the sample variances of two independent samples of sizes $n_1$, $n_2$ drawn from normal populations $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively.
For two normal sampling populations $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$, with independent samples of sizes $n_1$ and $n_2$, sample means $\bar{X}$, $\bar{Y}$, and sample variances $S_1^2$, $S_2^2$:
$\bar{X}$, $\bar{Y}$, $S_1^2$, $S_2^2$ are mutually independent.
$\bar{X} - \bar{Y}$ follows a normal distribution $N\!\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$.
Assuming both populations have the same variance $\sigma^2$, the random variable $\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_w\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$ follows a $t$-distribution with $n_1 + n_2 - 2$ degrees of freedom, where $S_w^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1 + n_2 - 2}$.
The random variable $\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$ follows an $F$-distribution with degrees of freedom $(n_1 - 1, n_2 - 1)$.
Parameter Estimation
Drawing a sample from a population, we construct functions of the sample in some way to estimate unknown parameters of the population.
Point Estimation
Let the distribution of population $X$ be determined by an unknown parameter $\theta$. Let the sample be $(X_1, X_2, \dots, X_n)$. If a statistic $\hat{\theta} = \hat{\theta}(X_1, X_2, \dots, X_n)$ is used to estimate the population parameter $\theta$, then $\hat{\theta}$ is called a point estimator of parameter $\theta$, and its value $\hat{\theta}(x_1, x_2, \dots, x_n)$ is called the point estimate.
Method of Moments
Let the distribution of population $X$ be determined by unknown parameters $\theta_1, \dots, \theta_m$, and suppose the $k$-th moment $\mu_k = E(X^k)$ of the population exists. Let the sample be $(X_1, X_2, \dots, X_n)$. Then the $k$-th moment of the population can be estimated by the sample $k$-th origin moment $A_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k$.
By equating the first $m$ sample origin moments to the corresponding population moments, we solve for the estimators of the parameters $\theta_1, \dots, \theta_m$, called the method of moments estimators.
Generally, regardless of the population distribution, as long as the population expectation and variance exist, their method of moments estimators are $\hat{\mu} = \bar{X}$ and $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 = B_2$ (note the difference from the unbiased sample variance $S^2$, which divides by $n - 1$).
Method of moments estimators for lower-order moments are usually good; for higher-order moments, they are often not as good.
Maximum Likelihood Estimation
Let the distribution of population $X$ be determined by unknown parameter $\theta$. The joint probability density function (or probability mass function) of the sample, evaluated at the observations and viewed as a function of $\theta$, is $L(\theta) = L(x_1, x_2, \dots, x_n; \theta)$. Then $L(\theta)$ is called the likelihood function for the sample observations $x_1, x_2, \dots, x_n$.
By solving the equation $\frac{dL(\theta)}{d\theta} = 0$ (or equivalently $\frac{d \ln L(\theta)}{d\theta} = 0$), we obtain the estimator $\hat{\theta}$ of parameter $\theta$ that maximizes $L(\theta)$, called the maximum likelihood estimator.
For a continuous population, the likelihood function is $L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$, i.e., the product of the probability density functions evaluated at the sample values.
The idea of maximum likelihood estimation is to choose, among all possible parameter values, the one that maximizes the probability (likelihood) of the observed sample values.
Maximum likelihood estimators may not be unique.
Invariance Principle of Maximum Likelihood Estimation: Let $\hat{\theta}$ be the maximum likelihood estimator of parameter $\theta$, and let $g(\theta)$ be an invertible function of $\theta$. Then $g(\hat{\theta})$ is the maximum likelihood estimator of $g(\theta)$. However, the method of moments generally does not possess this invariance.
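A minimal maximum-likelihood sketch (exponential model with synthetic data; the parameter values are made up): for $X \sim \mathrm{Exp}(\lambda)$ the log-likelihood is $\ln L(\lambda) = n\ln\lambda - \lambda\sum_i x_i$, and setting its derivative to zero gives the closed form $\hat{\lambda} = \frac{1}{\bar{x}}$, which here coincides with the method of moments estimate; by invariance, the MLE of the mean $\frac{1}{\lambda}$ is $\bar{x}$.

    import numpy as np

    rng = np.random.default_rng(4)
    true_lambda = 2.5
    x = rng.exponential(scale=1 / true_lambda, size=10_000)

    # ln L(lambda) = n*ln(lambda) - lambda*sum(x); derivative n/lambda - sum(x) = 0
    lambda_mle = 1 / np.mean(x)       # closed-form maximizer
    mean_mle = np.mean(x)             # by invariance, MLE of E(X) = 1/lambda

    print(lambda_mle, mean_mle)       # close to 2.5 and 0.4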
Criteria for Evaluating Point Estimators
Let the distribution of population $X$ be determined by unknown parameter $\theta$, and let the sample be $(X_1, X_2, \dots, X_n)$.
Unbiasedness. If a point estimator $\hat{\theta}$ of parameter $\theta$ satisfies $E(\hat{\theta}) = \theta$, then $\hat{\theta}$ is called an unbiased estimator of $\theta$; if $E(\hat{\theta}) \ne \theta$, then $\hat{\theta}$ is called a biased estimator.
Under the i.i.d. assumption, the sample $k$-th origin moment $A_k$ is an unbiased estimator of the population $k$-th origin moment $E(X^k)$.
Efficiency. If there exist two unbiased estimators $\hat{\theta}_1$, $\hat{\theta}_2$ for $\theta$, then when $D(\hat{\theta}_1) < D(\hat{\theta}_2)$, estimator $\hat{\theta}_1$ is said to be more efficient than $\hat{\theta}_2$.
Rao-Cramér Inequality: Suppose the following regularity conditions hold:
For any possible value of $\theta$, the probability density function $f(x; \theta)$ exists and is differentiable with respect to $\theta$;
For any $\theta$, differentiation with respect to $\theta$ and integration over $x$ can be interchanged;
There exists a function $I(\theta)$ (the Fisher information) such that $I(\theta) = E\!\left[\left(\frac{\partial \ln f(X; \theta)}{\partial \theta}\right)^2\right] > 0$.
Let there exist an unbiased estimator $\hat{\theta} = \hat{\theta}(X_1, \dots, X_n)$ for $\theta$, with finite variance.
Then $D(\hat{\theta}) \ge \frac{1}{n I(\theta)}$ (for independent samples, the Fisher information accumulates linearly with sample size).
Consistency. If a point estimator $\hat{\theta}_n$ for $\theta$ satisfies $\hat{\theta}_n \xrightarrow{P} \theta$ as $n \to \infty$, then $\hat{\theta}_n$ is called a consistent estimator of $\theta$. Consistency does not require the estimator to be unbiased.
If a point estimator $\hat{\theta}_n$ for $\theta$ satisfies $\lim_{n \to \infty} E(\hat{\theta}_n) = \theta$ and $\lim_{n \to \infty} D(\hat{\theta}_n) = 0$, then $\hat{\theta}_n$ is a consistent estimator of $\theta$.
Interval Estimation
Let the distribution of population $X$ be determined by unknown parameter $\theta$, and let the sample be $(X_1, X_2, \dots, X_n)$.
Confidence Interval. If statistics $\hat{\theta}_1 = \hat{\theta}_1(X_1, \dots, X_n)$ and $\hat{\theta}_2 = \hat{\theta}_2(X_1, \dots, X_n)$ satisfy $P(\hat{\theta}_1 < \theta < \hat{\theta}_2) = 1 - \alpha$, then the interval $(\hat{\theta}_1, \hat{\theta}_2)$ is called a confidence interval for parameter $\theta$, where $1 - \alpha$ is the confidence level and $\alpha$ is the significance level.
Confidence intervals are not unique; typically we choose the one with the shortest length. Reliability is ensured first, then precision. For symmetric probability density curves, symmetric (equal-tail) confidence intervals are usually chosen.
Pivotal Quantity. A random variable $G = G(X_1, \dots, X_n; \theta)$, a function of the sample and the parameter $\theta$, is called a pivotal quantity for parameter $\theta$ if:
The distribution of $G$ does not depend on the unknown parameter $\theta$;
$G$ is strictly monotonic (increasing or decreasing) with respect to $\theta$.
In the following tables, $z_\alpha$ denotes the upper $\alpha$ quantile of the standard normal distribution (i.e., $P(Z > z_\alpha) = \alpha$); $t_\alpha(n)$ denotes the upper $\alpha$ quantile of the $t$-distribution with $n$ degrees of freedom; $\chi^2_\alpha(n)$ and $F_\alpha(n_1, n_2)$ denote the upper $\alpha$ quantiles of the respective chi-square and $F$ distributions.
Also, CI is used below as an abbreviation for confidence interval.
For a normal population $N(\mu, \sigma^2)$ with a simple random sample of size $n$, we have the following $1 - \alpha$ confidence intervals:
$\sigma^2$ known, CI for $\mu$: pivotal quantity $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$; CI $\left(\bar{X} - \frac{\sigma}{\sqrt{n}} z_{\alpha/2},\ \bar{X} + \frac{\sigma}{\sqrt{n}} z_{\alpha/2}\right)$.
$\sigma^2$ unknown, CI for $\mu$: pivotal quantity $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n-1)$; CI $\left(\bar{X} - \frac{S}{\sqrt{n}} t_{\alpha/2}(n-1),\ \bar{X} + \frac{S}{\sqrt{n}} t_{\alpha/2}(n-1)\right)$.
$\mu$ known, CI for $\sigma^2$: pivotal quantity $\frac{1}{\sigma^2}\sum_{i=1}^{n} (X_i - \mu)^2 \sim \chi^2(n)$; CI $\left(\frac{\sum_{i=1}^{n} (X_i - \mu)^2}{\chi^2_{\alpha/2}(n)},\ \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{\chi^2_{1-\alpha/2}(n)}\right)$.
$\mu$ unknown, CI for $\sigma^2$: pivotal quantity $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$; CI $\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2}(n-1)},\ \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}(n-1)}\right)$.
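A sketch of the second case above ($\sigma^2$ unknown, CI for $\mu$) on synthetic data; the sample and the population parameters are made up, and scipy is used only for the $t$ quantile.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    x = rng.normal(loc=10.0, scale=2.0, size=25)   # sample from N(10, 4)
    n, alpha = len(x), 0.05

    mean = np.mean(x)
    s = np.std(x, ddof=1)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # upper alpha/2 quantile of t(n-1)
    half_width = t_crit * s / np.sqrt(n)

    print((mean - half_width, mean + half_width))  # 95% CI for mu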
One-sided Confidence Interval. For a given $\alpha$, if there exists a statistic $\hat{\theta}_1$ such that $P(\theta > \hat{\theta}_1) = 1 - \alpha$, then the interval $(\hat{\theta}_1, +\infty)$ is called a lower one-sided confidence interval for $\theta$; if there exists a statistic $\hat{\theta}_2$ such that $P(\theta < \hat{\theta}_2) = 1 - \alpha$, then the interval $(-\infty, \hat{\theta}_2)$ is called an upper one-sided confidence interval for $\theta$.
Hypothesis Testing
Basic Concepts
Let the distribution of population $X$ be determined by unknown parameter $\theta$, and let the sample be $(X_1, X_2, \dots, X_n)$.
Hypothesis Testing. The process of testing a hypothesis about the population parameter based on the sample observations is called hypothesis testing.
Null Hypothesis and Alternative Hypothesis. In hypothesis testing, the hypothesis to be tested is usually called the null hypothesis, denoted $H_0$; the hypothesis opposite to the null is called the alternative hypothesis, denoted $H_1$.
Test Statistic. If a real-valued continuous function $T = T(X_1, \dots, X_n)$ depends only on the random variables $X_1, \dots, X_n$ and not on unknown parameters of the population, then $T$ is called a test statistic.
Level of Test. In hypothesis testing, if under the assumption that $H_0$ is true, the probability of rejecting $H_0$ is $\alpha$, then $\alpha$ is called the level of the test or significance level.
Rejection Region/Acceptance Region. In hypothesis testing, if we reject/accept the null hypothesis when the test statistic falls in a certain region $W$, then region $W$ is called the rejection/acceptance region.
Type I Error. In hypothesis testing, if the null hypothesis $H_0$ is true but we erroneously reject $H_0$, this error is called a Type I error (rejecting a true null).
Type II Error. In hypothesis testing, if the null hypothesis $H_0$ is false but we erroneously accept $H_0$, this error is called a Type II error (accepting a false null).
Hypothesis Testing for Parameters of a Single Normal Population
Z-test (known $\sigma^2$): When $\sigma^2$ is known, to test the mean $\mu$ of a normal population $N(\mu, \sigma^2)$, the test statistic is $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$, which under $H_0$ follows a standard normal distribution $N(0, 1)$. At significance level $\alpha$:
$H_0: \mu = \mu_0$ vs. $H_1: \mu \ne \mu_0$; rejection region $\lvert Z \rvert \ge z_{\alpha/2}$.
$H_0: \mu \le \mu_0$ vs. $H_1: \mu > \mu_0$; rejection region $Z \ge z_{\alpha}$.
$H_0: \mu \ge \mu_0$ vs. $H_1: \mu < \mu_0$; rejection region $Z \le -z_{\alpha}$.
T-test (unknown $\sigma^2$): When $\sigma^2$ is unknown, to test the mean $\mu$ of a normal population $N(\mu, \sigma^2)$, the test statistic is $T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$, which under $H_0$ follows a $t$-distribution with $n - 1$ degrees of freedom. At significance level $\alpha$ (a numerical sketch follows this list):
$H_0: \mu = \mu_0$ vs. $H_1: \mu \ne \mu_0$; rejection region $\lvert T \rvert \ge t_{\alpha/2}(n-1)$.
$H_0: \mu \le \mu_0$ vs. $H_1: \mu > \mu_0$; rejection region $T \ge t_{\alpha}(n-1)$.
$H_0: \mu \ge \mu_0$ vs. $H_1: \mu < \mu_0$; rejection region $T \le -t_{\alpha}(n-1)$.
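A worked sketch of the two-sided case on synthetic data (the sample, $\mu_0$, and $\alpha$ are made up); scipy's ttest_1samp is used only as a cross-check of the manual computation.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    x = rng.normal(loc=5.3, scale=1.0, size=20)
    mu0, alpha = 5.0, 0.05
    n = len(x)

    t_stat = (np.mean(x) - mu0) / (np.std(x, ddof=1) / np.sqrt(n))
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    reject = abs(t_stat) >= t_crit                 # two-sided rejection region

    p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df=n - 1))
    print(t_stat, t_crit, reject, p_value)

    # Cross-check with scipy
    print(stats.ttest_1samp(x, popmean=mu0))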
$\chi^2$-test: When $\mu$ is known, to test the variance $\sigma^2$ of a normal population $N(\mu, \sigma^2)$, the test statistic is $\chi^2 = \frac{1}{\sigma_0^2}\sum_{i=1}^{n} (X_i - \mu)^2$, which under $H_0: \sigma^2 = \sigma_0^2$ follows a chi-square distribution with $n$ degrees of freedom. At significance level $\alpha$:
$H_0: \sigma^2 = \sigma_0^2$ vs. $H_1: \sigma^2 \ne \sigma_0^2$; rejection region $\chi^2 \ge \chi^2_{\alpha/2}(n)$ or $\chi^2 \le \chi^2_{1-\alpha/2}(n)$.
$H_0: \sigma^2 \le \sigma_0^2$ vs. $H_1: \sigma^2 > \sigma_0^2$; rejection region $\chi^2 \ge \chi^2_{\alpha}(n)$.
$H_0: \sigma^2 \ge \sigma_0^2$ vs. $H_1: \sigma^2 < \sigma_0^2$; rejection region $\chi^2 \le \chi^2_{1-\alpha}(n)$.
When $\mu$ is unknown, to test the variance $\sigma^2$ of a normal population $N(\mu, \sigma^2)$, the test statistic is $\chi^2 = \frac{(n-1)S^2}{\sigma_0^2}$, which under $H_0: \sigma^2 = \sigma_0^2$ follows a chi-square distribution with $n - 1$ degrees of freedom. At significance level $\alpha$:
$H_0: \sigma^2 = \sigma_0^2$ vs. $H_1: \sigma^2 \ne \sigma_0^2$; rejection region $\chi^2 \ge \chi^2_{\alpha/2}(n-1)$ or $\chi^2 \le \chi^2_{1-\alpha/2}(n-1)$.
$H_0: \sigma^2 \le \sigma_0^2$ vs. $H_1: \sigma^2 > \sigma_0^2$; rejection region $\chi^2 \ge \chi^2_{\alpha}(n-1)$.
$H_0: \sigma^2 \ge \sigma_0^2$ vs. $H_1: \sigma^2 < \sigma_0^2$; rejection region $\chi^2 \le \chi^2_{1-\alpha}(n-1)$.
$p$-value Method: In hypothesis testing, if the observed value of test statistic $T$ is $t_0$, then the $p$-value is defined as the probability, under the assumption that $H_0$ is true, that $T$ takes a value as extreme as or more extreme than $t_0$. We reject $H_0$ when the $p$-value does not exceed the significance level $\alpha$.
Hypothesis Testing for Parameters of Two Normal Populations
Z-test (two population means, variances known): When $\sigma_1^2$, $\sigma_2^2$ are both known, to test the difference of means of two normal populations $N(\mu_1, \sigma_1^2)$, $N(\mu_2, \sigma_2^2)$, the test statistic is $Z = \frac{(\bar{X} - \bar{Y}) - \delta_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$ (often with $\delta_0 = 0$), which under $H_0$ follows a standard normal distribution $N(0, 1)$. At significance level $\alpha$:
$H_0: \mu_1 - \mu_2 = \delta_0$ vs. $H_1: \mu_1 - \mu_2 \ne \delta_0$; rejection region $\lvert Z \rvert \ge z_{\alpha/2}$.
$H_0: \mu_1 - \mu_2 \le \delta_0$ vs. $H_1: \mu_1 - \mu_2 > \delta_0$; rejection region $Z \ge z_{\alpha}$.
$H_0: \mu_1 - \mu_2 \ge \delta_0$ vs. $H_1: \mu_1 - \mu_2 < \delta_0$; rejection region $Z \le -z_{\alpha}$.
T-test (two population means, variances unknown but assumed equal): When $\sigma_1^2$, $\sigma_2^2$ are both unknown (and assumed equal), to test the difference of means of two normal populations, the test statistic is $T = \frac{(\bar{X} - \bar{Y}) - \delta_0}{S_w\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$ with $S_w^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1 + n_2 - 2}$, which under $H_0$ follows a $t$-distribution with $n_1 + n_2 - 2$ degrees of freedom. At significance level $\alpha$:
$H_0: \mu_1 - \mu_2 = \delta_0$ vs. $H_1: \mu_1 - \mu_2 \ne \delta_0$; rejection region $\lvert T \rvert \ge t_{\alpha/2}(n_1 + n_2 - 2)$.
$H_0: \mu_1 - \mu_2 \le \delta_0$ vs. $H_1: \mu_1 - \mu_2 > \delta_0$; rejection region $T \ge t_{\alpha}(n_1 + n_2 - 2)$.
$H_0: \mu_1 - \mu_2 \ge \delta_0$ vs. $H_1: \mu_1 - \mu_2 < \delta_0$; rejection region $T \le -t_{\alpha}(n_1 + n_2 - 2)$.
F-test: To test the variances $\sigma_1^2$, $\sigma_2^2$ of two normal populations $N(\mu_1, \sigma_1^2)$, $N(\mu_2, \sigma_2^2)$, the test statistic is $F = \frac{S_1^2}{S_2^2}$, which under $H_0: \sigma_1^2 = \sigma_2^2$ follows an $F$-distribution with degrees of freedom $(n_1 - 1, n_2 - 1)$. At significance level $\alpha$:
$H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_1: \sigma_1^2 \ne \sigma_2^2$; rejection region $F \ge F_{\alpha/2}(n_1 - 1, n_2 - 1)$ or $F \le F_{1-\alpha/2}(n_1 - 1, n_2 - 1)$.
$H_0: \sigma_1^2 \le \sigma_2^2$ vs. $H_1: \sigma_1^2 > \sigma_2^2$; rejection region $F \ge F_{\alpha}(n_1 - 1, n_2 - 1)$.
$H_0: \sigma_1^2 \ge \sigma_2^2$ vs. $H_1: \sigma_1^2 < \sigma_2^2$; rejection region $F \le F_{1-\alpha}(n_1 - 1, n_2 - 1)$.
Hypothesis Testing for Parameters of Non-normal Populations
For testing a population proportion $p$ (non-normal population, large sample size), we can treat the sample frequency as an approximately normal statistic: when $n$ is sufficiently large, by the DeMoivre-Laplace theorem the frequency $\hat{p} = \frac{n_A}{n}$ approximately follows $N\!\left(p, \frac{p(1-p)}{n}\right)$, so under $H_0: p = p_0$ the test statistic $Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$ approximately follows $N(0, 1)$.
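A sketch of this large-sample proportion test with made-up counts: test $H_0: p = 0.5$ against the two-sided alternative using the normal approximation above.

    from math import erf, sqrt

    n, successes, p0, alpha = 400, 223, 0.5, 0.05
    p_hat = successes / n

    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # approximately N(0, 1) under H0

    def phi(x):                                  # standard normal distribution function
        return 0.5 * (1 + erf(x / sqrt(2)))

    p_value = 2 * (1 - phi(abs(z)))
    print(z, p_value, p_value < alpha)           # z = 2.3, reject at the 5% level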