How to create an X-bar R control charts with SQL

Guide Sep 24, 2020

In this tutorial, you will learn how to create an X-bar R charts using SQL commands. This is the first guide in a series that will show you how to create various control charts using SQL so you can easily visualize them in business intelligence (BI) tools like KensoBI/Grafana, QlikView, or PowerBI.

Queries are designed to be “variable-friendly”, meaning you will be able to add variables from your BI tool and dynamically control things like date ranges, sample size, or last number of measurements. The SQL queries below were built and tested on the PostgreSQL database but will work on any rational database with very small tweaks.

Before diving into the code though, it is crucial to understand what X-Bar and R charts are, when to use them and why. Understanding the fundamentals will allow you to build efficient, custom queries that can scale across large volumes of your measurement data.

What is X-bar chart?

The X-bar (x̄ or Xbar) chart is used to monitor the mean of a process where measurements are performed in subgroups on a continuous data flow. Chart generally consists of four lines:

  • UCL - Upper Control Limit
  • LCL - Lower Control Limit
  • X-Bar or x̄ - sample mean
  • X-double bar or x̄̄ - grand mean (mean of sample means)

Upper and lower control limits are calculated from the data set, as such, almost all of the data points on the chart will fall within these limits as long as the process remains in control. Data points are ordered by time or by production sequence.

What is R chart?

When working with X-Bar charts, we have to talk about R chart. R chart is used to monitor the variation of a process performed in subgroups on a continuous data flow. It is usually plotted together with the Xbar chart because it essential to correctly interpret the Xbar chart. If the R chart is out of control, then the control limits of the Xbar chart might be wrong.

When to use X-bar R charts?

Use X-bar R charts to assess the stability of a continuous process. Process is stable when both means and variances are constant over time. X-bar chart along R chart will help you identify if variations from subgroup to subgroup in your process come from a common or special cause.

Which control chart to use - flowchart

What are the requirements for the X-Bar chart?

  • Your data, temperature, weight, length, position, and time, are measured on a continuous basis.
  • Your data is normally distributed
  • Subgroup size is 2 or more but less than 11

How to interpret X-Bar chart?

If any point in your data set is outside of the calculated control limits (that is, the UCL and LCL), this may be an indication that the process variation is unstable or out of control. Before making any conclusions about the X-bar chart, we first need to look at the R chart and see if it is in control. If the R chart is in control, we can be sure that control limits on the X-bar chart are accurate and we are doing an assessment on the process center.

Sometimes, even if all points fall within X-bar control limits, there could still be a problem with your process. It usually happens when your data points do not look random and start forming a pattern. A good example is if the first five data points out of a set of 20 are above the grand mean, and the final five are below.

Initial SQL setup

Create a sample measurement table and populate it with some measurements.

CREATE TABLE measurements
(
    partid                          int,
    characteristicid                int,
    measurementvalue                float,
    measurementdate                 timestamp
)

We will also need a table with SPC constants. Here are the queries to populate it.

CREATE TABLE spc_constants
(
    subgroup int NOT NULL,
    c4_xbar_sigma float,
    d2_xbar_range float,
    d3_range_limit float,
    a2_xbar_limit_range float,
    a3_xbar_limit_sigma float,
    b3_sigma_lcl float,
    b4_sigma_ucl float,
    d3_range_lcl float,
    d4_range_ucl float,
    CONSTRAINT subgroup_pk PRIMARY KEY (subgroup)
)

Filter and create subgroups

The first thing we need to do is to reduce the number of data points. In this example, we will filter our measurements by part, characteristic, and measurement date. Next, we split the result into sub-groups using the window function. In an X-bar chart, our subgroup should be between 2 and 10. We will go with 5.

SELECT
	measurementValue,
    (row_number() OVER (ORDER BY measurementDate DESC) - 1) / 5 AS subgroup 
FROM 
	measurements  
WHERE
	partId = 1 
    AND characteristicId = 1 
    AND measurementDate BETWEEN '2020-09-09' AND '2020-09-20'
--ORDER BY measurementDate DESC    
--LIMIT 50

We could further reduce the result by limiting it to the last n number of measurements. Don’t forget to order it by date before applying a limit.

Xbar data points

Raw measurement values are now in groups. Let’s average them out to get Xbar values. It could be useful to limit the result set to the last number of samples.

SELECT 
	avg(measurementValue) AS subgroupMean, 
	(subgroup + 1) AS subgroupNumber
FROM (
	SELECT
		measurementValue,
		(row_number() OVER (ORDER BY measurementDate DESC) - 1) / 5 AS subgroup 
	FROM 
		measurements  
	WHERE
		partId = 1 
		AND characteristicId = 1 
		AND measurementDate BETWEEN '2020-09-09' AND '2020-09-20'
	--ORDER BY measurementDate DESC    
	--LIMIT 50
) subgroups 
GROUP BY subgroup
ORDER BY subgroup ASC
--LIMIT 10 -- limit to last number of samples

We can now plot our sample means.

X-bar values

We are missing 3 more lines – UCL, LCL, and grand mean (X-double bar).

Calculate X-double bar

X-double bar is simply an overall mean of all our sample means. Add another SELECT statement and take an average of your X-bar values. This will give us one value that we can plot as a horizontal line.

SELECT 
	avg(subgroupMean) as xDoubleBar
FROM
(
	SELECT 
		avg(measurementValue) AS subgroupMean, 
		(subgroup + 1) AS subgroupNumber
	FROM (
		SELECT
			measurementValue,
			(row_number() OVER (ORDER BY measurementDate DESC) - 1) / 5 AS subgroup 
		FROM 
			measurements  
		WHERE
			partId = 1 
			AND characteristicId = 1 
			AND measurementDate BETWEEN '2020-09-09' AND '2020-09-20'
		--ORDER BY measurementDate DESC    
		--LIMIT 50
	) subgroups 
	GROUP BY subgroup
	ORDER BY subgroup ASC
	--LIMIT 10 -- limit to last number of samples
) sampleMeans

Calculate Upper Control Limit (UCL) for Xbar chart

UCL is a line above the X-double bar and represents 3x sigma deviation from the X-double bar.

Modify the previous query and add a range calculation. On top of that, we need to add another SELECT statement to calculate the X-double bar and R-bar (average of ranges).

SELECT 
	avg(sampleMean) as xDoubleBar,
	avg(sampleRange) as rbar
FROM
(
	SELECT 
		avg(measurementValue) AS sampleMean, 
		(subgroupNumber + 1) AS subgroupNumber,
		max(measurementValue) - min(measurementValue) as sampleRange
	FROM (
		SELECT
			measurementValue,
			(row_number() OVER (ORDER BY measurementDate DESC) - 1) / 5 AS subgroupNumber 
		FROM 
			measurements  
		WHERE
			partId = 1 
			AND characteristicId = 1 
			AND measurementDate BETWEEN '2020-09-09' AND '2020-09-20'
		--ORDER BY measurementDate DESC    
		--LIMIT 50
	) subgroups 
	GROUP BY subgroupNumber
	ORDER BY subgroupNumber ASC
	--LIMIT 10 -- limit to last number of samples
) sampleMeans

With our data ready, it is time to apply the UCL formula that goes as follows:

UCL = X-double bar + A2 * R-bar

X-double bar is the mean of means. A2 is a constant. We will use our SPC constant reference table to pick A2value from the corresponding sample size and plug it into the formula.

We now have everything we need to calculate the upper control limit. Let’s add another SELECT to the above query.

SELECT 
	(xDoubleBar + ((SELECT a2_xbar_limit_range 
					FROM spc_constants 
                	WHERE subgroup = 5) * rBar)) AS UCL
FROM
(
	SELECT 
		avg(sampleMean) as xDoubleBar,
		avg(sampleRange) as rBar
	FROM
	(
		SELECT 
			avg(measurementValue) AS sampleMean, 
			(subgroupNumber + 1) AS subgroupNumber,
			max(measurementValue) - min(measurementValue) as sampleRange
		FROM (
			SELECT
				measurementValue,
				(row_number() OVER (ORDER BY measurementDate DESC) - 1 / 5)
                AS subgroupNumber 
			FROM 
				measurements  
			WHERE
				partId = 1 
				AND characteristicId = 1 
				AND measurementDate BETWEEN '2020-09-09' AND '2020-09-20'
			--ORDER BY measurementDate DESC    
			--LIMIT 50
		) subgroups 
		GROUP BY subgroupNumber
		ORDER BY subgroupNumber ASC
		--LIMIT 10 -- limit to last number of samples
	) sampleMeans
) doubleBar

Calculate Lower Control Limit (LCL) for Xbar chart

Lower Control Limit (LCL) is the same as UCL but with 3x sigma deviation pointing downward. It is represented by a horizontal line below the grand mean. The Lower Control Limit formula goes as follows:

LCL = X-double bar - A2 * R-bar

We can reuse UCL SQL code and just change signs in our formula from + to -.

SELECT 
	(xDoubleBar - ((SELECT a2_xbar_limit_range 
					FROM spc_constants 
                	WHERE subgroup = 5) * rBar)) AS LCL
FROM
(
	SELECT 
		avg(sampleMean) as xDoubleBar,
		avg(sampleRange) as rBar
	FROM
	(
		SELECT 
			avg(measurementValue) AS sampleMean, 
			(subgroupNumber + 1) AS subgroupNumber,
			max(measurementValue) - min(measurementValue) as sampleRange
		FROM (
			SELECT
				measurementValue,
				(row_number() OVER (ORDER BY measurementDate DESC) - 1 / 5)
                AS subgroupNumber 
			FROM 
				measurements  
			WHERE
				partId = 1 
				AND characteristicId = 1 
				AND measurementDate BETWEEN '2020-09-09' AND '2020-09-20'
			--ORDER BY measurementDate DESC    
			--LIMIT 50
		) subgroups 
		GROUP BY subgroupNumber
		ORDER BY subgroupNumber ASC
		--LIMIT 10 -- limit to last number of samples
	) sampleMeans
) doubleBar

We now have four queries. Each query represents a separate series on a chart. When you load it into the BI tool, your visualization should look like this:

Complete X-bar chart in KensoBI

Sample Range for R chart

With our Xbar chart out of the way, let’s focus on R chart. We will reuse most of the queries from Xbar chart and just change means to ranges.

SELECT 
	max(measurementValue)-min(measurementValue) as subgroupRange,
	(subgroup + 1) AS subgroupNumber
FROM (
	SELECT
		measurementValue,
		(row_number() OVER (ORDER BY measurementDate DESC) - 1) / 5 AS subgroup
	FROM 
		measurements  
	WHERE
		partId = 1 
		AND characteristicId = 1 
		AND measurementDate BETWEEN '2020-09-09' AND '2020-09-20'
	--ORDER BY measurementDate DESC    
	--LIMIT 50
) subgroups 
GROUP BY subgroup
ORDER BY subgroup ASC
--LIMIT 10 -- limit to last number of samples

Here is the plotted result:

R chart - ranges only

Calculate R-bar

R-bar is an overall mean of all our sample ranges. Let’s add another SELECT statement and take the average of our sample ranges. This will give use one value that we can plot as horizontal line.

SELECT
	avg(subgroupRange) as rBar
FROM
	(
	SELECT 
		max(measurementValue)-min(measurementValue) as subgroupRange,
		(subgroup + 1) AS subgroupNumber
	FROM (
		SELECT
			measurementValue,
			(row_number() OVER (ORDER BY measurementDate DESC) - 1) / 5 AS subgroup
		FROM 
			measurements  
		WHERE
			partId = 1 
			AND characteristicId = 1 
			AND measurementDate BETWEEN '2020-09-09' AND '2020-09-20'
		--ORDER BY measurementDate DESC    
		--LIMIT 50
	) subgroups 
	GROUP BY subgroup
	ORDER BY subgroup ASC
	--LIMIT 10 -- limit to last number of samples
) ranges

Calculate Upper Control Limit (UCL) for R chart

UCL in R chart is a line above the center line and represents 3x sigma deviation from R-bar. The formula is a little different compare to Xbar:

UCL = D4*R-bar

D4 is a constant. We will use our SPC constant reference table to pick D4 value from the corresponding sample size and plug it into the formula.

Let’s add another SELECT to the previous query.

SELECT  
  ((SELECT d4_range_ucl FROM spc_constants WHERE subgroup = 5) * rBar) as UCL
FROM
(
	SELECT
		avg(subgroupRange) as rBar
	FROM
		(
		SELECT 
			max(measurementValue)-min(measurementValue) as subgroupRange,
			(subgroup + 1) AS subgroupNumber
		FROM (
			SELECT
				measurementValue,
				(row_number() OVER (ORDER BY measurementDate DESC) - 1) / 5 AS subgroup
			FROM 
				measurements  
			WHERE
				partId = 1 
				AND characteristicId = 1 
				AND measurementDate BETWEEN '2020-09-09' AND '2020-09-20'
			--ORDER BY measurementDate DESC    
			--LIMIT 50
		) subgroups 
		GROUP BY subgroup
		ORDER BY subgroup ASC
		--LIMIT 10 -- limit to last number of samples
	) ranges
)rbar

Calculate Lower Control Limit (LCL) for R chart

LCL in R chart is a line below the center line and represents 3x sigma deviation from R-bar pointing downwards. Formula for LCL in R chart goes as follows:

LCL = D3*R-bar
SELECT  
  ((SELECT d3_range_lcl FROM spc_constants WHERE subgroup = 5) * rBar) as UCL
FROM
(
	SELECT
		avg(subgroupRange) as rBar
	FROM
		(
		SELECT 
			max(measurementValue)-min(measurementValue) as subgroupRange,
			(subgroup + 1) AS subgroupNumber
		FROM (
			SELECT
				measurementValue,
				(row_number() OVER (ORDER BY measurementDate DESC) - 1) / 5 AS subgroup
			FROM 
				measurements  
			WHERE
				partId = 1 
				AND characteristicId = 1 
				AND measurementDate BETWEEN '2020-09-09' AND '2020-09-20'
			--ORDER BY measurementDate DESC    
			--LIMIT 50
		) subgroups 
		GROUP BY subgroup
		ORDER BY subgroup ASC
		--LIMIT 10 -- limit to last number of samples
	) ranges
)rbar

Summary

This completes our X-Bar R guide. When you go and plot the queries it should look close to this:

X-Bar R chart in KensoBI

With rather simple SQL code, we were able to develop scripts that allow us to plot X-bar R control charts in your favorite BI tool.

I invite you to check out KensoBI and discover how you can visualize your process in the form of interactive 3D dashboards.

Tags

Tomasz Czerkas

Software Engineer and Founder @ KensoBI

Great! You've successfully subscribed.
Great! Next, complete checkout for full access.
Welcome back! You've successfully signed in.
Success! Your account is fully activated, you now have access to all content.