Percentiles. Discrete and Continuous Percentiles.
The p-th percentile is the value below which p percent of the data falls. For example, the 90th percentile is the value below which 90% of the observations lie.
To find a percentile, first sort the data in ascending order. Taken together, the percentiles divide the sorted data into 100 parts, each holding an equal share of the observations.
A percentile can be computed in two ways: as a discrete or as a continuous percentile.
→ Continuous Percentile
A continuous percentile can take any value within the data range, not only values that are present in the dataset. When the requested position falls between two existing data points, the result is interpolated between them instead of being snapped to one of them.
→ Discrete Percentile
A discrete percentile is always a value that is actually present in the dataset: the one whose position best corresponds to the requested percentage. No interpolation is used.
→ Finding the 90th Continuous Percentile for a dataset
Step 1. We need the 90th percentile, so P = 0.90 (the percentage written as a fraction).
Step 2. Original dataset: 2, 10, 1, 3, 40, 0
Step 3. Sorting the dataset: 0, 1, 2, 3, 10, 40
Step 4. Calculate the percentile position using the formula: [ position = P * (N - 1) + 1 = 0.90 * (6 - 1) + 1 = 5.5 ], where N = 6 is the number of observations. The + 1 makes the position 1-based instead of 0-based.
Step 5. Apply linear interpolation. Formula:
[ interpolated_value = a + (fraction * (b - a)) ],
where a and b are the values at the whole positions just below and just above the computed position, and fraction is the fractional part of that position.
The position 5.5 lies between the 5th and 6th points in the sorted list. The 5th value is ( a = 10 ), the 6th value is ( b = 40 ), and fraction = 0.5.
[ interpolated_value = 10 + (0.5 * (40 - 10)) = 25 ]
Step 6. Thus, the value at position 5.5, which is the 90th percentile for this dataset, is 25.
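Steps 1 to 6 can be sketched in Python as follows. This is a minimal illustration that assumes P is passed as a fraction between 0 and 1; the function name continuous_percentile and its argument names are made up for this example and do not come from any library.

```python
import math


def continuous_percentile(data, p):
    """Continuous percentile via linear interpolation (illustrative sketch).

    `p` is the percentile as a fraction, e.g. 0.90 for the 90th percentile.
    """
    if not data:
        raise ValueError("data must not be empty")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")

    values = sorted(data)              # Step 3: sort ascending
    n = len(values)
    position = p * (n - 1) + 1         # Step 4: 1-based position
    lower = math.floor(position)       # whole position just below
    fraction = position - lower        # fractional part of the position

    a = values[lower - 1]              # value at the lower position
    b = values[min(lower, n - 1)]      # value at the next position (clamped at the end)
    return a + fraction * (b - a)      # Step 5: linear interpolation


result = continuous_percentile([2, 10, 1, 3, 40, 0], 0.90)
print(result)  # 25.0
```

The clamp in the line that reads b only matters when p = 1, where the position equals N and there is no next value to interpolate toward.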
Important distinctions:
- The percentile itself: 90th
- The position of the 90th percentile in our dataset: 5.5
- The value of the 90th continuous percentile in our dataset: 25
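As a cross-check, the hand calculation can be compared with Python's standard library. The sketch below assumes Python 3.8 or newer, where statistics.quantiles is available; with method="inclusive" it interpolates over positions that span the data from the smallest to the largest value, which corresponds to the position formula used here. The variable names are chosen for this example only.

```python
from statistics import quantiles

data = [2, 10, 1, 3, 40, 0]

# n=100 asks for the 99 cut points that split the data into 100 parts;
# the list is 0-based, so index 89 holds the 90th percentile.
cut_points = quantiles(data, n=100, method="inclusive")
percentile_90 = cut_points[89]

print(percentile_90)  # 25.0
```

Other tools may use a different interpolation rule by default, so results for small datasets can differ from one tool to another.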
For the discrete percentile: since no interpolation is used, the position 5.5 is rounded up to the next whole position, 6, and the value found there is returned. In this case, it is the 6th value: [ percentile_90 = 40 ].
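The discrete rule described here (round the position up and take the value found there) can be sketched the same way. The function name discrete_percentile is hypothetical, and the rounding-up rule is the one used in this article; other tools may define the discrete percentile differently.

```python
import math


def discrete_percentile(data, p):
    """Discrete percentile: always returns a value present in the data.

    `p` is the percentile as a fraction, e.g. 0.90 for the 90th percentile.
    """
    if not data:
        raise ValueError("data must not be empty")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")

    values = sorted(data)
    n = len(values)
    position = p * (n - 1) + 1        # same 1-based position as before
    index = math.ceil(position)       # round up to the next whole position
    return values[index - 1]          # convert to a 0-based list index


result = discrete_percentile([2, 10, 1, 3, 40, 0], 0.90)
print(result)  # 40
```

Because the position is computed in floating point, a value that should land exactly on a whole position can occasionally come out slightly above it and be rounded up one step too far; a production version would need to guard against that.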