Last Updated : 11 Feb, 2024


The quantile-quantile plot (Q-Q plot) is a graphical method for determining whether a dataset follows a certain probability distribution or whether two samples of data came from the same population. Q-Q plots are particularly useful for assessing whether a dataset is normally distributed or follows some other known distribution. They are commonly used in statistics, data analysis, and quality control to check assumptions and identify departures from expected distributions.

Quantiles And Percentiles

Quantiles are points in a dataset that divide the data into intervals containing equal probabilities or proportions of the total distribution. They are often used to describe the spread or distribution of a dataset. The most common quantiles are:

  1. Median (50th percentile): The median is the middle value of a dataset when it is ordered from smallest to largest. It divides the dataset into two equal halves.
  2. Quartiles (25th, 50th, and 75th percentiles): Quartiles divide the dataset into four equal parts. The first quartile (Q1) is the value below which 25% of the data falls, the second quartile (Q2) is the median, and the third quartile (Q3) is the value below which 75% of the data falls.
  3. Percentiles: Percentiles are similar to quartiles but divide the dataset into 100 equal parts. For example, the 90th percentile is the value below which 90% of the data falls.
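
The quantiles listed above can be computed directly with NumPy. The following is a minimal sketch on a small made-up dataset (the values are an assumption chosen only for illustration). NumPy interpolates linearly between data points by default, so other tools may report slightly different values for the same data.

```python
import numpy as np

# Small hypothetical dataset, used only for illustration
data = np.array([2, 4, 4, 5, 7, 9, 10, 12, 15, 20])

# Median (50th percentile)
median = np.median(data)

# Quartiles: the 25th, 50th and 75th percentiles
q1, q2, q3 = np.percentile(data, [25, 50, 75])

# 90th percentile, written as a quantile with probability 0.9
p90 = np.quantile(data, 0.9)

print("Median:", median)
print("Q1, Q2, Q3:", q1, q2, q3)
print("90th percentile:", p90)
```

Note that `np.percentile` takes values between 0 and 100, while `np.quantile` takes probabilities between 0 and 1; both describe the same cut points.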

Note:

  • A Q-Q plot is a plot of the quantiles of the first dataset against the quantiles of the second dataset.
  • For reference purposes, a 45-degree line is also plotted. If the samples are from the same population, the points fall along this line.


Normal Distribution:

The normal distribution (also known as the Gaussian distribution or bell curve) is a continuous probability distribution that is symmetric about its mean: values near the mean are the most likely, and the probability falls off equally on both sides.

[Figure: Normal distribution with area under the curve]
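
To make the idea of area under the normal curve concrete, the sketch below uses `scipy.stats.norm` to compute the probability that a standard normal value lies within 1, 2, and 3 standard deviations of the mean. It also evaluates the inverse CDF (`norm.ppf`), the function that Q-Q plots rely on to obtain theoretical quantiles. The variable names are illustrative assumptions.

```python
from scipy.stats import norm

# Area under the standard normal curve within k standard deviations of the mean
areas = {}
for k in (1, 2, 3):
    areas[k] = norm.cdf(k) - norm.cdf(-k)
    print(f"P(-{k} < Z < {k}) = {areas[k]:.4f}")

# Inverse CDF (quantile function): the value below which 97.5% of the area lies
z_975 = norm.ppf(0.975)
print(f"97.5th percentile of the standard normal: {z_975:.3f}")
```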

How to Draw Q-Q plot?

To draw a Quantile-Quantile (Q-Q) plot, you can follow these steps:

  1. Collect the Data: Gather the dataset for which you want to create the Q-Q plot. Ensure that the data are numerical and represent a random sample from the population of interest.
  2. Sort the Data: Arrange the data in ascending order, so that the i-th smallest value can be paired with the i-th theoretical quantile.
  3. Choose a Theoretical Distribution: Determine the theoretical distribution against which you want to compare your dataset. Common choices include the normal distribution, exponential distribution, or any other distribution that fits your data well.
  4. Calculate Theoretical Quantiles: Compute the quantiles for the chosen theoretical distribution. For example, if you’re comparing against a normal distribution, you would use the inverse cumulative distribution function (CDF) of the normal distribution to find the expected quantiles.
  5. Plotting:
    • Plot the theoretical quantiles on the x-axis.
    • Plot the corresponding sorted dataset values on the y-axis.
    • Each data point (x, y) represents a pair of expected and observed values.
    • Add a straight reference line to visually inspect the relationship between the dataset and the theoretical distribution.
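
The steps above can be sketched by hand with NumPy and SciPy. This is a minimal illustration, not the only way to build a Q-Q plot. The sample is randomly generated, and the plotting positions (i - 0.5) / n are one common convention, assumed here; libraries may use slightly different positions. Theoretical quantiles are placed on the horizontal axis to match the axis labels used in the Python implementation later in this article; swapping the axes conveys the same information.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Step 1: hypothetical data (randomly generated for illustration)
rng = np.random.default_rng(0)
sample = rng.normal(loc=5, scale=2, size=200)

# Step 2: sort the data in ascending order
ordered = np.sort(sample)
n = ordered.size

# Steps 3-4: normal reference distribution; theoretical quantiles come from
# the inverse CDF evaluated at the plotting positions (i - 0.5) / n (assumed convention)
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical = norm.ppf(probs)

# Step 5: plot ordered values against theoretical quantiles
plt.scatter(theoretical, ordered, s=10, label="Data")

# Reference line: where the points would fall if the data were exactly normal
# with the sample's own mean and standard deviation
plt.plot(theoretical, ordered.mean() + ordered.std(ddof=1) * theoretical,
         color="red", label="Reference line")
plt.xlabel("Theoretical quantiles")
plt.ylabel("Ordered values")
plt.title("Manual normal Q-Q plot")
plt.legend()
plt.show()
```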


Interpretation of Q-Q plot

  • If the points on the plot fall approximately along a straight line, it suggests that your dataset follows the assumed distribution.
  • Deviations from the straight line indicate departures from the assumed distribution, requiring further investigation.
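
As a rough numerical companion to the visual check, `scipy.stats.probplot` also returns the correlation coefficient `r` of the least-squares line fitted through the Q-Q points. The sketch below compares a normal sample with a skewed one; both samples are randomly generated assumptions. A value of `r` close to 1 means the points are nearly straight, but it is only a summary and does not replace looking at the plot.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
normal_sample = rng.normal(size=500)       # should look straight on a normal Q-Q plot
skewed_sample = rng.exponential(size=500)  # should curve away from the line

# Without plot=..., probplot returns ((theoretical, ordered), (slope, intercept, r))
(_, _), (_, _, r_normal) = stats.probplot(normal_sample, dist="norm")
(_, _), (_, _, r_skewed) = stats.probplot(skewed_sample, dist="norm")

print(f"r for normal sample: {r_normal:.4f}")
print(f"r for skewed sample: {r_skewed:.4f}")
```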

Exploring Distribution Similarity with Q-Q Plots


Exploring distribution similarity using Q-Q plots is a fundamental task in statistics. Comparing two datasets to determine if they originate from the same distribution is vital for various analytical purposes. When the assumption of a common distribution holds, merging datasets can improve parameter estimation accuracy, such as for location and scale. Q-Q plots, short for quantile-quantile plots, offer a visual method for assessing distribution similarity. In these plots, quantiles from one dataset are plotted against quantiles from another. If the points closely align along a diagonal line, it suggests similarity between the distributions. Deviations from this diagonal line indicate differences in distribution characteristics.

While tests like the chi-square and Kolmogorov-Smirnov tests can evaluate overall distribution differences, Q-Q plots provide a nuanced perspective by directly comparing quantiles. This enables analysts to discern specific differences, such as shifts in location or changes in scale, which may not be evident from formal statistical tests alone.
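
A two-sample Q-Q plot can be sketched by evaluating both samples at the same set of probabilities, which also handles samples of different sizes. In the example below the two samples are randomly generated assumptions: the second is drawn from the same family but shifted by 0.5, so the points should run parallel to the 45-degree line rather than on it.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
sample_a = rng.normal(loc=0.0, scale=1.0, size=300)
sample_b = rng.normal(loc=0.5, scale=1.0, size=500)  # same shape, shifted location

# Evaluate both samples at the same probabilities (1%, 2%, ..., 99%)
probs = np.linspace(0.01, 0.99, 99)
q_a = np.quantile(sample_a, probs)
q_b = np.quantile(sample_b, probs)

plt.scatter(q_a, q_b, s=10, label="Quantile pairs")

# 45-degree reference line: points lie on it when the distributions match
lo = min(q_a.min(), q_b.min())
hi = max(q_a.max(), q_b.max())
plt.plot([lo, hi], [lo, hi], color="red", label="45-degree line")
plt.xlabel("Quantiles of sample A")
plt.ylabel("Quantiles of sample B")
plt.title("Two-sample Q-Q plot")
plt.legend()
plt.show()
```

A vertical offset from the reference line points to a shift in location, while a slope different from 1 points to a difference in scale.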

Python Implementation Of Q-Q Plot

Python3

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Generate example data
np.random.seed(0)
data = np.random.normal(loc=0, scale=1, size=1000)

# Create Q-Q plot
stats.probplot(data, dist="norm", plot=plt)
plt.title('Normal Q-Q plot')
plt.xlabel('Theoretical quantiles')
plt.ylabel('Ordered Values')
plt.grid(True)
plt.show()

Output:

[Figure: Q-Q plot]

Here, as the data points approximately follow a straight line in the Q-Q plot, it suggests that the dataset is consistent with the assumed theoretical distribution, which in this case we assumed to be the normal distribution.

Advantages of Q-Q plot

  1. Flexible Comparison: Q-Q plots can compare datasets of different sizes without requiring equal sample sizes.
  2. Dimensionless Analysis: A change of units shifts or rescales the quantiles but keeps the points on a straight line, making Q-Q plots suitable for comparing the shapes of datasets with different units or scales.
  3. Visual Interpretation: They provide a clear visual representation of how the data distribution compares with a theoretical distribution.
  4. Sensitive to Deviations: They readily reveal departures from assumed distributions, aiding in identifying data discrepancies.
  5. Diagnostic Tool: They help in assessing distributional assumptions, identifying outliers, and understanding data patterns.
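
The point about units can be checked numerically. In this sketch the same hypothetical temperature readings (randomly generated, an assumption for illustration) are expressed in Celsius and in Fahrenheit. The straightness of the Q-Q points, measured by the `r` value that `scipy.stats.probplot` returns, is unchanged; only the slope and intercept of the fitted line change.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
celsius = rng.normal(loc=20, scale=5, size=400)  # hypothetical readings
fahrenheit = celsius * 9 / 5 + 32                # same readings, different unit

(_, _), (slope_c, intercept_c, r_c) = stats.probplot(celsius, dist="norm")
(_, _), (slope_f, intercept_f, r_f) = stats.probplot(fahrenheit, dist="norm")

print(f"Celsius:    slope={slope_c:.3f}, intercept={intercept_c:.3f}, r={r_c:.5f}")
print(f"Fahrenheit: slope={slope_f:.3f}, intercept={intercept_f:.3f}, r={r_f:.5f}")
```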

Applications Of Quantile-Quantile Plot

The Quantile-Quantile plot is used for the following purposes:

  1. Assessing Distributional Assumptions: Q-Q plots are frequently used to visually inspect whether a dataset follows a specific probability distribution, such as the normal distribution. By comparing the quantiles of the observed data to the quantiles of the assumed distribution, deviations from the assumed distribution can be detected. This is crucial in many statistical analyses, where the validity of distributional assumptions impacts the accuracy of statistical inferences.
  2. Detecting Outliers: Outliers are data points that deviate significantly from the rest of the dataset. Q-Q plots can help identify outliers by revealing data points that fall far from the expected pattern of the distribution. Outliers may appear as points that deviate from the expected straight line in the plot.
  3. Comparing Distributions: Q-Q plots can be used to compare two datasets to see if they come from the same distribution. This is achieved by plotting the quantiles of one dataset against the quantiles of another dataset. If the points fall approximately along the 45-degree line, it suggests that the two datasets are drawn from the same distribution; a straight line with a different slope or intercept suggests distributions of the same shape that differ in scale or location.
  4. Assessing Normality: Q-Q plots are particularly useful for assessing the normality of a dataset. If the data points in the plot closely follow a straight line, it indicates that the dataset is approximately normally distributed. Deviations from the line suggest departures from normality, which may require further investigation or non-parametric statistical techniques.
  5. Model Validation: In fields like econometrics and machine learning, Q-Q plots are used to validate predictive models. By comparing the quantiles of observed responses with the quantiles predicted by a model, one can assess how well the model fits the data. Deviations from the expected pattern may indicate areas where the model needs improvement.
  6. Quality Control: Q-Q plots are employed in quality control processes to monitor the distribution of measured or observed values over time or across different batches. Departures from expected patterns in the plot may signal changes in the underlying processes, prompting further investigation.
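
As one illustration of the model validation use case, the sketch below fits a straight line to synthetic data and then checks whether the residuals look normal. The data, the true coefficients, and the noise level are all assumptions made up for the example; `np.polyfit` stands in for whatever model is being validated.

```python
import numpy as np
from scipy import stats

# Synthetic data: a linear trend plus normal noise (assumed for illustration)
rng = np.random.default_rng(4)
x = np.linspace(0, 10, 200)
y = 3.0 * x + 2.0 + rng.normal(scale=1.5, size=x.size)

# Fit a simple linear model and compute its residuals
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Q-Q data for the residuals against the normal distribution
(osm, osr), (qq_slope, qq_intercept, r) = stats.probplot(residuals, dist="norm")

print(f"Fitted model: y = {slope:.2f} * x + {intercept:.2f}")
print(f"Q-Q line slope (estimates residual spread): {qq_slope:.2f}")
print(f"Q-Q correlation r: {r:.4f}")
```

Passing `plot=plt` to `stats.probplot`, as in the implementation shown earlier in this article, draws the same points instead of only returning them.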

Types of Q-Q plots

There are several types of Q-Q plots commonly used in statistics and data analysis, each suited to different scenarios or purposes:

  1. Normal Distribution: A symmetric distribution where the Q-Q plot shows points approximately along a straight line when the data are compared against a normal distribution.
  2. Right-skewed Distribution: A distribution where the points curve away from the straight line at the upper end, where the observed quantiles are larger than the theoretical ones, indicating a longer tail on the right side.
  3. Left-skewed Distribution: A distribution where the points curve away from the straight line at the lower end, where the observed quantiles are smaller than the theoretical ones, indicating a longer tail on the left side.
  4. Under-dispersed Distribution: A distribution where the observed quantiles span a narrower range than the theoretical quantiles. With observed values on the vertical axis, the points follow a line flatter than the 45-degree reference line, suggesting lower variance.
  5. Over-dispersed Distribution: A distribution where the observed quantiles span a wider range than the theoretical quantiles. With observed values on the vertical axis, the points follow a line steeper than the 45-degree reference line, indicating higher variance or dispersion compared to the theoretical distribution.
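
These patterns can also be summarized numerically. The sketch below is an illustration under stated assumptions: every sample is randomly generated, each is compared against the standard normal distribution, and the under- and over-dispersed cases are modeled simply as normal samples with a smaller or larger standard deviation. The slope of the line that `scipy.stats.probplot` fits reflects the spread of the data, `r` reflects how straight the points are, and `scipy.stats.skew` gives the direction of the skew.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
samples = {
    "normal": rng.normal(loc=0, scale=1, size=1000),
    "right-skewed": rng.exponential(scale=1, size=1000),
    "left-skewed": -rng.exponential(scale=1, size=1000),
    "under-dispersed": rng.normal(loc=0, scale=0.5, size=1000),
    "over-dispersed": rng.normal(loc=0, scale=2, size=1000),
}

summary = {}
for name, values in samples.items():
    # Fit line through the Q-Q points against the standard normal distribution
    (_, _), (slope, _, r) = stats.probplot(values, dist="norm")
    summary[name] = {"slope": slope, "r": r, "skew": stats.skew(values)}
    print(f"{name:16s} slope={slope:5.2f}  r={r:.4f}  skew={summary[name]['skew']:+.2f}")
```

A slope below 1 corresponds to the flatter line of an under-dispersed sample, a slope above 1 to the steeper line of an over-dispersed one, and a lower `r` with a non-zero skew to the curved patterns of the skewed samples.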

Python3

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Fix the seed so the figure is reproducible
np.random.seed(0)

# Generate a random sample from a normal distribution
normal_data = np.random.normal(loc=0, scale=1, size=1000)

# Generate a random sample from a right-skewed distribution (exponential distribution)
right_skewed_data = np.random.exponential(scale=1, size=1000)

# Generate a random sample from a left-skewed distribution (negative exponential distribution)
left_skewed_data = -np.random.exponential(scale=1, size=1000)

# Generate a random sample from an under-dispersed distribution (truncated normal distribution)
under_dispersed_data = np.random.normal(loc=0, scale=0.5, size=1000)
under_dispersed_data = under_dispersed_data[(under_dispersed_data > -1) & (under_dispersed_data < 1)]  # Truncate

# Generate a random sample from an over-dispersed distribution (mixture of normals)
over_dispersed_data = np.concatenate((np.random.normal(loc=-2, scale=1, size=500),
                                      np.random.normal(loc=2, scale=1, size=500)))

# Create Q-Q plots; every sample is compared against the normal distribution
# so that departures from the straight line are visible
plt.figure(figsize=(15, 10))

plt.subplot(2, 3, 1)
stats.probplot(normal_data, dist="norm", plot=plt)
plt.title('Q-Q Plot - Normal Distribution')

plt.subplot(2, 3, 2)
stats.probplot(right_skewed_data, dist="norm", plot=plt)
plt.title('Q-Q Plot - Right-skewed Distribution')

plt.subplot(2, 3, 3)
stats.probplot(left_skewed_data, dist="norm", plot=plt)
plt.title('Q-Q Plot - Left-skewed Distribution')

plt.subplot(2, 3, 4)
stats.probplot(under_dispersed_data, dist="norm", plot=plt)
plt.title('Q-Q Plot - Under-dispersed Distribution')

plt.subplot(2, 3, 5)
stats.probplot(over_dispersed_data, dist="norm", plot=plt)
plt.title('Q-Q Plot - Over-dispersed Distribution')

plt.tight_layout()
plt.show()

Output:

[Figure: Q-Q plot for different distributions]




pawangfg
