Percentiles. Discrete and Continuous Percentiles.

The p-th percentile is the value below which p percent of the data falls. For example, the 90th percentile is the value below which 90% of the observations lie.

To find a percentile, first sort the data in ascending order. The percentiles then split the sorted data into 100 parts, each holding roughly 1% of the observations.

Percentiles can be discrete or continuous.

→ Continuous Percentile

A continuous percentile can take any value within the data range, not just the values present in the dataset. When the requested percentage falls between two existing data points, the result is interpolated between them instead of being snapped to one of them.

→ Discrete Percentile

A discrete percentile is always a value taken from the dataset itself: the one whose position most closely matches the desired percentage.

→ Finding the 90th Continuous Percentile for a dataset

Step 1. We need the 90th percentile, so P = 0.90 (the percentage written as a fraction).

Step 2. Original dataset: 2, 10, 1, 3, 40, 0

Step 3. Sorting the dataset: 0, 1, 2, 3, 10, 40

Step 4. Calculate the percentile position using the formula position = P * (N - 1) + 1, where N is the number of values: 0.90 * (6 - 1) + 1 = 5.5. The + 1 makes positions count from 1 instead of 0.

Step 5. Apply linear interpolation. Formula:

interpolated_value = a + fraction * (b - a)

Position 5.5 lies between the 5th and 6th points in the sorted list. The 5th value is a = 10, the 6th value is b = 40, and fraction = 0.5 (the fractional part of position 5.5).

res = 10 + 0.5 * (40 - 10) = 25

Step 6. Thus, the value at position 5.5, which is the 90th percentile for this dataset, is 25.
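
The steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name continuous_percentile and its parameters are made up for this example, and p is assumed to be a fraction between 0 and 1.

```python
import math


def continuous_percentile(values, p):
    """Continuous percentile via linear interpolation.

    Hypothetical helper for illustration; p is a fraction in [0, 1].
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")

    data = sorted(values)              # Step 3: sort ascending
    n = len(data)
    position = p * (n - 1) + 1         # Step 4: 1-based position
    lower = math.floor(position)       # position of a
    fraction = position - lower        # fractional part of the position
    upper = min(lower + 1, n)          # position of b, clamped for p = 1

    a = data[lower - 1]                # convert 1-based position to 0-based index
    b = data[upper - 1]
    return a + fraction * (b - a)      # Step 5: linear interpolation


print(continuous_percentile([2, 10, 1, 3, 40, 0], 0.90))  # 25.0
```

The upper position is clamped so that P = 1 returns the largest value instead of reading past the end of the list.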

Important distinctions:

  • The percentile itself: 90th
  • The position of the 90th percentile in our dataset: 5.5
  • The value of the 90th continuous percentile in our dataset: 25
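
As a cross-check, the same value can be obtained from Python's standard library. The sketch below assumes Python 3.8 or newer, where statistics.quantiles is available. Its "inclusive" method interpolates between neighboring sorted values in a way that matches the position formula used here. The variable names are only illustrative.

```python
import statistics

data = [2, 10, 1, 3, 40, 0]

# n=100 asks for the 99 cut points that split the data into 100 parts;
# the function sorts the data internally.
cut_points = statistics.quantiles(data, n=100, method="inclusive")

# cut_points[0] is the 1st percentile, so index 89 is the 90th percentile.
p90 = cut_points[89]
print(p90)  # 25.0
```

Other methods (including the default "exclusive" one) use a different position formula and can return a different value for the same data.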

The discrete percentile does not use interpolation. Instead, we round position 5.5 up to the next whole position and take the value found there. In this case, it is the 6th value: percentile_90 = 40.
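
The discrete rule can be sketched in the same way. This follows the rule described in this article (round the position up and take the value there). Libraries and databases may define discrete percentiles slightly differently, so treat the hypothetical discrete_percentile helper below as an illustration only.

```python
import math


def discrete_percentile(values, p):
    """Discrete percentile: always returns a value from the dataset.

    Hypothetical helper for illustration; p is a fraction in [0, 1].
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")

    data = sorted(values)
    n = len(data)
    position = p * (n - 1) + 1          # same 1-based position as before
    # Round away tiny floating-point noise, then round the position up.
    index = math.ceil(round(position, 9))
    return data[index - 1]              # convert to 0-based index


print(discrete_percentile([2, 10, 1, 3, 40, 0], 0.90))  # 40
```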



