A cumulative distribution function (CDF) gives the probability that a random variable takes a value less than or equal to a given number.
Most statistics courses spend a lot of time on the bell curve and probability density functions. Then they introduce something called the cumulative distribution function, and it’s easy to wonder why you need both.
The CDF is actually a more versatile tool for answering many real-world questions. It directly tells you the chance of observing a value of X or less, which is exactly the kind of question quality control, risk analysis, and finance regularly ask. This article explains what a CDF is, how it differs from a PDF, and why it’s one of the most useful plots in statistics.
What Exactly Is a Cumulative Distribution Function?
The cumulative distribution function, or CDF, is a function defined for a random variable X. At any input value x, the CDF outputs the probability that X is less than or equal to x. Mathematically, it’s written as F(x) = P(X ≤ x).
This function has a few essential properties. It is non-decreasing: as x increases, the cumulative probability never drops. Its values always lie between 0 and 1. As x goes toward negative infinity, the CDF approaches 0; as x goes to positive infinity, it approaches 1. The CDF is also right-continuous, meaning that at every point the function equals its limit from the right. Where the CDF jumps, as it does for discrete variables, it takes the upper value at the jump point itself.
For a discrete variable like a fair six-sided die, the CDF at x = 3 gives the probability of rolling a 3 or less. That’s 3/6, or 0.5. This simple calculation makes the CDF intuitive once you see it in action.
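The die example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `die_cdf` is made up for illustration.

```python
from fractions import Fraction
import math

def die_cdf(x):
    """F(x) = P(X <= x) for a fair six-sided die (illustrative helper)."""
    # Count the faces 1..6 that are <= x, clamped to the range 0..6.
    faces_at_or_below = min(max(math.floor(x), 0), 6)
    return Fraction(faces_at_or_below, 6)

print(die_cdf(3))      # 1/2
print(die_cdf(2.999))  # 1/3  (just left of the jump at 3)
print(die_cdf(0))      # 0
print(die_cdf(10))     # 1
```

The values at 3 and at 2.999 show the jump: the CDF already includes the probability of rolling exactly 3 when evaluated at x = 3.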
Why You Need Both PDF and CDF
Many students first learn the probability density function (PDF), which shows the relative likelihood of different values. But the CDF provides information the PDF cannot directly give without integration. Here’s how they complement each other:
- PDF shows density; CDF shows accumulation. The PDF’s height indicates where data clusters; the CDF’s slope shows how quickly probabilities accumulate.
- CDF directly yields percentiles. To find the 90th percentile, locate 0.90 on the vertical axis and read the matching x-value from the CDF. With a PDF you would need to integrate.
- CDF makes distribution comparisons easier. When two CDFs are overlaid, the curve that sits further to the right belongs to the distribution that tends to produce larger values.
- CDF is a common plot in risk analysis. Alongside the histogram, risk analysts rely on CDF plots to evaluate exceedance probabilities and thresholds.
- For continuous variables, CDF gives probabilities without further calculus. The CDF is the integral of the PDF from negative infinity up to x, so once the CDF is known, reading P(X ≤ x) is a simple lookup.
Both plots are taught because they answer different types of questions. The PDF clarifies relative likelihood, while the CDF simplifies cumulative probability questions.
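The relationship between the two functions can be sketched numerically. The example below assumes a standard normal variable (chosen only for illustration) and uses `statistics.NormalDist` from the Python standard library; `integrate_pdf` is a hypothetical helper, not a library function.

```python
from statistics import NormalDist

# Assumption for illustration: a standard normal variable (mean 0, sd 1).
Z = NormalDist(mu=0.0, sigma=1.0)

def integrate_pdf(dist, lower, upper, steps=20_000):
    """Trapezoidal approximation of the area under dist.pdf on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (dist.pdf(lower) + dist.pdf(upper))
    for i in range(1, steps):
        total += dist.pdf(lower + i * width)
    return total * width

x = 1.0
area = integrate_pdf(Z, -10.0, x)  # the tail below -10 is negligible
lookup = Z.cdf(x)                  # the same probability as a direct lookup
print(round(area, 4), round(lookup, 4))  # 0.8413 0.8413

# Percentile: invert the CDF instead of integrating the PDF.
p90 = Z.inv_cdf(0.90)
print(round(p90, 3))  # 1.282
```

Integrating the PDF and evaluating the CDF give the same answer; the CDF simply has the integration already done.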
PDF vs CDF: What the Graphs Tell You
A probability density function (PDF) graph shows the relative frequency of various values in a data set, while a CDF graph shows the cumulative probability up to each value, a distinction explained in NOAA’s PDF vs CDF document. The PDF curve can rise and fall; the CDF never decreases and flattens as it approaches 1.
On a CDF plot, the horizontal axis displays x values, and the vertical axis shows cumulative probabilities from 0 to 1. For a normal distribution, the CDF takes a smooth S-shape. This reveals at a glance where most of the probability sits.
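The S-shape can be seen without a plotting library by tabulating the CDF at a few points. This sketch assumes a standard normal distribution and draws a crude text bar for each value.

```python
from statistics import NormalDist

Z = NormalDist(0.0, 1.0)  # assumption: standard normal, for illustration

# CDF values at x = -3, -2, ..., 3
rows = [(x, Z.cdf(x)) for x in range(-3, 4)]

for x, p in rows:
    bar = "#" * round(p * 40)  # crude text stand-in for the plotted curve
    print(f"{x:+d}  {p:.4f}  {bar}")
```

The bars grow slowly in the tails and quickly around 0, which is the flat-steep-flat pattern of the S-curve.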
The table below compares both functions across several dimensions.
| Feature | PDF (Probability Density Function) | CDF (Cumulative Distribution Function) |
|---|---|---|
| Definition | Density of probability at a value | Probability that X ≤ a value |
| Range of y-values | Non-negative; can exceed 1 (for continuous) | Always between 0 and 1 |
| Shape | Varies (bell, skewed, uniform) | Non-decreasing, S-curve typical |
| Area under curve | Total area = 1 | Not applicable (pointwise) |
| Direct interpretability | Height not a probability (for continuous) | y-value is a probability |
These differences matter when you’re choosing a graph. Use the PDF to show shape; use the CDF to answer “what proportion falls below this threshold?”
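A short check makes the "height is not a probability" point concrete. The sketch assumes a narrow normal distribution with standard deviation 0.1, picked only because its density peak is taller than 1.

```python
from statistics import NormalDist

# Assumption for illustration: a narrow normal distribution (sd = 0.1).
narrow = NormalDist(mu=0.0, sigma=0.1)

density_at_peak = narrow.pdf(0.0)  # a density, not a probability
prob_below_peak = narrow.cdf(0.0)  # a probability

print(round(density_at_peak, 3))  # 3.989
print(prob_below_peak)            # 0.5
```

The PDF value of about 3.99 cannot be a probability, while the CDF value of 0.5 reads directly as "half of the distribution lies at or below 0".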
Real-World Applications of the CDF
The CDF isn’t just a classroom concept — it is used across industries to answer probability questions. Here are some of the most common applications:
- Risk analysis and reliability. Engineers use CDF plots to model failure times; the probability of failure within a period is read directly from the CDF.
- Quality control. Manufacturers compute the probability of defect from the CDF of a process distribution.
- Comparing distributions. Overlaying CDFs reveals which group tends to produce larger values, useful in A/B testing.
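- Calculating p-values. A p-value is a tail probability computed from the CDF of the test statistic's distribution under the null hypothesis, for example 1 − F(t) for an upper-tailed test.
- Financial modeling. Value at Risk (VaR) inverts the CDF of the loss distribution to find the loss threshold that corresponds to a given probability.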
In each case, the CDF provides a direct answer to a question that would otherwise require integration or simulation. That efficiency is why it’s a staple in data science and statistics courses alike.
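Three of the applications above can be sketched in a few lines. All numbers below are made up for illustration, and each example assumes a normal distribution, which real data may not follow.

```python
from statistics import NormalDist

# p-value: upper-tail probability of a test statistic under the null.
# Assumption: the statistic is standard normal under the null hypothesis.
null = NormalDist(0.0, 1.0)
z_observed = 2.1                      # hypothetical observed statistic
p_upper = 1.0 - null.cdf(z_observed)  # P(Z > z_observed)
p_two_sided = 2.0 * p_upper
print(round(p_upper, 4), round(p_two_sided, 4))  # 0.0179 0.0357

# Value at Risk: the loss level not exceeded with 95% probability.
# Assumption: daily losses are normal with mean 0 and sd 1,000.
losses = NormalDist(mu=0.0, sigma=1_000.0)
var_95 = losses.inv_cdf(0.95)
print(round(var_95))  # 1645

# Quality control: share of parts outside hypothetical spec limits.
# Assumption: part length is normal with mean 10.0 and sd 0.02.
process = NormalDist(mu=10.0, sigma=0.02)
lower_spec, upper_spec = 9.95, 10.05
defect_rate = process.cdf(lower_spec) + (1.0 - process.cdf(upper_spec))
print(round(defect_rate, 4))  # 0.0124
```

The p-value and defect-rate examples evaluate the CDF directly, while VaR uses its inverse: it starts from a probability and returns a threshold.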
How to Read a CDF Plot
Reading a CDF plot is straightforward. Per the CDF plot axes guide from Statistics By Jim, the horizontal axis shows the possible values of the random variable, and the vertical axis shows the cumulative probability (from 0 to 1). To find the probability that X is ≤ some value, locate that value on the x-axis, go up to the curve, and read the y-coordinate.
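When the curve comes from data rather than a formula, the same "locate x, read y" lookup can be done on an empirical CDF. The sketch below goes slightly beyond the text: the helper `make_ecdf` and the sample values are hypothetical, and the empirical CDF is only an estimate of the true one.

```python
from bisect import bisect_right

def make_ecdf(sample):
    """Return a function x -> fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def ecdf(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return ecdf

# Hypothetical measurements, made up for this example.
wait_times = [2.1, 3.4, 3.4, 4.0, 5.2, 6.8, 7.5, 9.1]
F = make_ecdf(wait_times)

print(F(4.0))   # 0.5
print(F(1.0))   # 0.0
print(F(10.0))  # 1.0
```

Calling `F(4.0)` is the code equivalent of finding 4.0 on the horizontal axis and reading the height of the curve.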
For a discrete variable like a die roll, the CDF is a step function. The table below shows the CDF for a fair die. Notice that the CDF jumps at each integer value and stays constant between them.
| Roll (x) | Probability P(X = x) | CDF P(X ≤ x) |
|---|---|---|
| 1 | 1/6 ≈ 0.167 | 0.167 |
| 2 | 1/6 ≈ 0.167 | 0.333 |
| 3 | 1/6 ≈ 0.167 | 0.500 |
| 4 | 1/6 ≈ 0.167 | 0.667 |
| 5 | 1/6 ≈ 0.167 | 0.833 |
| 6 | 1/6 ≈ 0.167 | 1.000 |
For a continuous variable, the curve is smooth. The steepest section corresponds to the region where the density is highest; for a symmetric, single-peaked distribution such as the normal, that is around the mean. You can also compute the probability of an interval by subtracting CDF values: P(a < X ≤ b) = F(b) − F(a). Because this rule holds for any distribution, the CDF is a versatile tool.
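The fair-die table and the interval rule P(a < X ≤ b) = F(b) − F(a) can be reproduced with a short sketch. It uses exact fractions from the standard library; the variable names are illustrative.

```python
from fractions import Fraction

faces = range(1, 7)
pmf = {x: Fraction(1, 6) for x in faces}

# A running total of the PMF gives the CDF at each face.
cdf = {}
running = Fraction(0)
for x in faces:
    running += pmf[x]
    cdf[x] = running

# Print roll, P(X = x), and P(X <= x) rounded to three decimals.
for x in faces:
    print(x, pmf[x], f"{float(cdf[x]):.3f}")

# Interval rule: P(a < X <= b) = F(b) - F(a)
a, b = 2, 5
print(cdf[b] - cdf[a])  # 1/2  (faces 3, 4, and 5)
```

Subtracting F(2) from F(5) leaves exactly the probability of rolling 3, 4, or 5, which matches adding the three individual probabilities of 1/6.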
The Bottom Line
The cumulative distribution function is a core concept in statistics that answers the most common probability question: “What’s the chance of X or less?” It complements the PDF by providing cumulative probabilities directly, without integration. Whether you’re in a risk analysis meeting, studying for a statistics exam, or exploring data, the CDF plot is a practical tool for understanding distributions.
If you’re working through examples for a course, take a fair die or coin and compute its CDF by hand. That exercise builds intuition faster than any formula. Your instructor or textbook’s problem sets remain the best resource for mastering CDF calculations.
References & Sources
- NOAA. “B9d55824 5e99 Ad6c 11af167fbd” (PDF vs CDF document). A probability density function (PDF) graph shows the frequency of various values in a data set, while a CDF graph shows the cumulative probability up to each value.
- Statistics By Jim. “Cumulative Distribution Function (CDF).” On a CDF plot, the horizontal axis displays the x values, while the vertical axis displays cumulative probabilities or percentiles.