# How to create an X-bar R control charts with SQL

In this tutorial, you will learn how to create an X-bar R charts using SQL commands. This is the first guide in a series that will show you how to create various control charts using SQL so you can easily visualize them in business intelligence (BI) tools like KensoBI/Grafana, QlikView, or PowerBI.

Queries are designed to be “variable-friendly”, meaning you will be able to add variables from your BI tool and dynamically control things like date ranges, sample size, or last number of measurements. The SQL queries below were built and tested on the PostgreSQL database but will work on any rational database with very small tweaks.

Before diving into the code though, it is crucial to understand what X-Bar and R charts are, when to use them and why. Understanding the fundamentals will allow you to build efficient, custom queries that can scale across large volumes of your measurement data.

### What is X-bar chart?

The X-bar (x̄ or Xbar) chart is used to monitor the **mean **of a process where measurements are performed in subgroups on a continuous data flow. Chart generally consists of four lines:

**UCL**- Upper Control Limit**LCL**- Lower Control Limit**X-Bar or x̄**- sample mean**X**-**double bar or x̄̄ -**grand mean (mean of sample means)

Upper and lower control limits are calculated from the data set, as such, almost all of the data points on the chart will fall within these limits as long as the process remains in control. Data points are ordered by time or by production sequence.

### What is R chart?

When working with X-Bar charts, we have to talk about R chart. R chart is used to monitor the **variation** of a process performed in subgroups on a continuous data flow. It is usually plotted together with the Xbar chart because it essential to correctly interpret the Xbar chart. If the R chart is out of control, then the control limits of the Xbar chart might be wrong.

### When to use X-bar R charts?

Use X-bar R charts to assess the stability of a continuous process. Process is stable when both **means** and **variances** are constant over time. X-bar chart along R chart will help you identify if variations from subgroup to subgroup in your process come from a common or special cause.

### What are the requirements for the X-Bar chart?

- Your data, temperature, weight, length, position, and time, are measured on a continuous basis.
- Your data is normally distributed
- Subgroup size is 2 or more but less than 11

### How to interpret X-Bar chart?

If any point in your data set is outside of the calculated control limits (that is, the UCL and LCL), this may be an indication that the process variation is unstable or out of control. Before making any conclusions about the X-bar chart, we first need to look at the R chart and see if it is in control. If the R chart is in control, we can be sure that control limits on the X-bar chart are accurate and we are doing an assessment on the process center.

Sometimes, even if all points fall within X-bar control limits, there could still be a problem with your process. It usually happens when your data points do not look random and start forming a pattern. A good example is if the first five data points out of a set of 20 are above the grand mean, and the final five are below.

### Initial SQL setup

Create a sample measurement table and populate it with some __measurements__.

```
CREATE TABLE measurements
(
partid int,
characteristicid int,
measurementvalue float,
measurementdate timestamp
)
```

We will also need a table with SPC constants. __Here__ are the queries to populate it.

```
CREATE TABLE spc_constants
(
subgroup int NOT NULL,
c4_xbar_sigma float,
d2_xbar_range float,
d3_range_limit float,
a2_xbar_limit_range float,
a3_xbar_limit_sigma float,
b3_sigma_lcl float,
b4_sigma_ucl float,
d3_range_lcl float,
d4_range_ucl float,
CONSTRAINT subgroup_pk PRIMARY KEY (subgroup)
)
```

### Filter and create subgroups

The first thing we need to do is to reduce the number of data points. In this example, we will filter our measurements by part, characteristic, and measurement date. Next, we split the result into sub-groups using the window function. In an X-bar chart, our subgroup should be between 2 and 10. We will go with 5.

```
SELECT
measurementValue,
(row_number() OVER (ORDER BY measurementDate DESC) - 1) / 5 AS subgroup
FROM
measurements
WHERE
partId = 1
AND characteristicId = 1
AND measurementDate BETWEEN '2020-09-09' AND '2020-09-20'
--ORDER BY measurementDate DESC
--LIMIT 50
```

We could further reduce the result by limiting it to the last n number of measurements. Don’t forget to order it by date before applying a limit.

### Xbar data points

Raw measurement values are now in groups. Let’s average them out to get Xbar values. It could be useful to limit the result set to the last number of samples.

```
SELECT
avg(measurementValue) AS subgroupMean,
(subgroup + 1) AS subgroupNumber
FROM (
SELECT
measurementValue,
(row_number() OVER (ORDER BY measurementDate DESC) - 1) / 5 AS subgroup
FROM
measurements
WHERE
partId = 1
AND characteristicId = 1
AND measurementDate BETWEEN '2020-09-09' AND '2020-09-20'
--ORDER BY measurementDate DESC
--LIMIT 50
) subgroups
GROUP BY subgroup
ORDER BY subgroup ASC
--LIMIT 10 -- limit to last number of samples
```

We can now plot our sample means.

We are missing 3 more lines – UCL, LCL, and grand mean (X-double bar).

### Calculate X-double bar

X-double bar is simply an overall mean of all our sample means. Add another SELECT statement and take an average of your X-bar values. This will give us one value that we can plot as a horizontal line.

```
SELECT
avg(subgroupMean) as xDoubleBar
FROM
(
SELECT
avg(measurementValue) AS subgroupMean,
(subgroup + 1) AS subgroupNumber
FROM (
SELECT
measurementValue,
(row_number() OVER (ORDER BY measurementDate DESC) - 1) / 5 AS subgroup
FROM
measurements
WHERE
partId = 1
AND characteristicId = 1
AND measurementDate BETWEEN '2020-09-09' AND '2020-09-20'
--ORDER BY measurementDate DESC
--LIMIT 50
) subgroups
GROUP BY subgroup
ORDER BY subgroup ASC
--LIMIT 10 -- limit to last number of samples
) sampleMeans
```

### Calculate Upper Control Limit (UCL) **for Xbar chart**

UCL is a line above the X-double bar and represents 3x sigma deviation from the X-double bar.

Modify the previous query and add a range calculation. On top of that, we need to add another SELECT statement to calculate the X-double bar and R-bar (average of ranges).

```
SELECT
avg(sampleMean) as xDoubleBar,
avg(sampleRange) as rbar
FROM
(
SELECT
avg(measurementValue) AS sampleMean,
(subgroupNumber + 1) AS subgroupNumber,
max(measurementValue) - min(measurementValue) as sampleRange
FROM (
SELECT
measurementValue,
(row_number() OVER (ORDER BY measurementDate DESC) - 1) / 5 AS subgroupNumber
FROM
measurements
WHERE
partId = 1
AND characteristicId = 1
AND measurementDate BETWEEN '2020-09-09' AND '2020-09-20'
--ORDER BY measurementDate DESC
--LIMIT 50
) subgroups
GROUP BY subgroupNumber
ORDER BY subgroupNumber ASC
--LIMIT 10 -- limit to last number of samples
) sampleMeans
```

With our data ready, it is time to apply the UCL formula that goes as follows:

UCL = X-double bar + A_{2}* R-bar

X-double bar is the mean of means. A_{2} is a constant. We will use our SPC constant reference table to pick A_{2}value from the corresponding sample size and plug it into the formula.

We now have everything we need to calculate the upper control limit. Let’s add another SELECT to the above query.

```
SELECT
(xDoubleBar + ((SELECT a2_xbar_limit_range
FROM spc_constants
WHERE subgroup = 5) * rBar)) AS UCL
FROM
(
SELECT
avg(sampleMean) as xDoubleBar,
avg(sampleRange) as rBar
FROM
(
SELECT
avg(measurementValue) AS sampleMean,
(subgroupNumber + 1) AS subgroupNumber,
max(measurementValue) - min(measurementValue) as sampleRange
FROM (
SELECT
measurementValue,
(row_number() OVER (ORDER BY measurementDate DESC) - 1 / 5)
AS subgroupNumber
FROM
measurements
WHERE
partId = 1
AND characteristicId = 1
AND measurementDate BETWEEN '2020-09-09' AND '2020-09-20'
--ORDER BY measurementDate DESC
--LIMIT 50
) subgroups
GROUP BY subgroupNumber
ORDER BY subgroupNumber ASC
--LIMIT 10 -- limit to last number of samples
) sampleMeans
) doubleBar
```

### Calculate Lower Control Limit (LCL) **for Xbar chart**

Lower Control Limit (LCL) is the same as UCL but with 3x sigma deviation pointing downward. It is represented by a horizontal line below the grand mean. The Lower Control Limit formula goes as follows:

LCL = X-double bar - A_{2}* R-bar

We can reuse UCL SQL code and just change signs in our formula from + to -.

```
SELECT
(xDoubleBar - ((SELECT a2_xbar_limit_range
FROM spc_constants
WHERE subgroup = 5) * rBar)) AS LCL
FROM
(
SELECT
avg(sampleMean) as xDoubleBar,
avg(sampleRange) as rBar
FROM
(
SELECT
avg(measurementValue) AS sampleMean,
(subgroupNumber + 1) AS subgroupNumber,
max(measurementValue) - min(measurementValue) as sampleRange
FROM (
SELECT
measurementValue,
(row_number() OVER (ORDER BY measurementDate DESC) - 1 / 5)
AS subgroupNumber
FROM
measurements
WHERE
partId = 1
AND characteristicId = 1
AND measurementDate BETWEEN '2020-09-09' AND '2020-09-20'
--ORDER BY measurementDate DESC
--LIMIT 50
) subgroups
GROUP BY subgroupNumber
ORDER BY subgroupNumber ASC
--LIMIT 10 -- limit to last number of samples
) sampleMeans
) doubleBar
```

We now have four queries. Each query represents a separate series on a chart. When you load it into the BI tool, your visualization should look like this:

### Sample Range for R chart

With our Xbar chart out of the way, let’s focus on R chart. We will reuse most of the queries from Xbar chart and just change means to ranges.

```
SELECT
max(measurementValue)-min(measurementValue) as subgroupRange,
(subgroup + 1) AS subgroupNumber
FROM (
SELECT
measurementValue,
(row_number() OVER (ORDER BY measurementDate DESC) - 1) / 5 AS subgroup
FROM
measurements
WHERE
partId = 1
AND characteristicId = 1
AND measurementDate BETWEEN '2020-09-09' AND '2020-09-20'
--ORDER BY measurementDate DESC
--LIMIT 50
) subgroups
GROUP BY subgroup
ORDER BY subgroup ASC
--LIMIT 10 -- limit to last number of samples
```

Here is the plotted result:

### Calculate R-bar

R-bar is an overall mean of all our sample ranges. Let’s add another SELECT statement and take the average of our sample ranges. This will give use one value that we can plot as horizontal line.

```
SELECT
avg(subgroupRange) as rBar
FROM
(
SELECT
max(measurementValue)-min(measurementValue) as subgroupRange,
(subgroup + 1) AS subgroupNumber
FROM (
SELECT
measurementValue,
(row_number() OVER (ORDER BY measurementDate DESC) - 1) / 5 AS subgroup
FROM
measurements
WHERE
partId = 1
AND characteristicId = 1
AND measurementDate BETWEEN '2020-09-09' AND '2020-09-20'
--ORDER BY measurementDate DESC
--LIMIT 50
) subgroups
GROUP BY subgroup
ORDER BY subgroup ASC
--LIMIT 10 -- limit to last number of samples
) ranges
```

### Calculate Upper Control Limit (UCL) for R chart

UCL in R chart is a line above the center line and represents 3x sigma deviation from R-bar. The formula is a little different compare to Xbar:

UCL = D_{4}*R-bar

D_{4} is a constant. We will use our SPC constant reference table to pick D_{4} value from the corresponding sample size and plug it into the formula.

Let’s add another SELECT to the previous query.

```
SELECT
((SELECT d4_range_ucl FROM spc_constants WHERE subgroup = 5) * rBar) as UCL
FROM
(
SELECT
avg(subgroupRange) as rBar
FROM
(
SELECT
max(measurementValue)-min(measurementValue) as subgroupRange,
(subgroup + 1) AS subgroupNumber
FROM (
SELECT
measurementValue,
(row_number() OVER (ORDER BY measurementDate DESC) - 1) / 5 AS subgroup
FROM
measurements
WHERE
partId = 1
AND characteristicId = 1
AND measurementDate BETWEEN '2020-09-09' AND '2020-09-20'
--ORDER BY measurementDate DESC
--LIMIT 50
) subgroups
GROUP BY subgroup
ORDER BY subgroup ASC
--LIMIT 10 -- limit to last number of samples
) ranges
)rbar
```

### Calculate Lower Control Limit (LCL) for R chart

LCL in R chart is a line below the center line and represents 3x sigma deviation from R-bar pointing downwards. Formula for LCL in R chart goes as follows:

LCL = D_{3}*R-bar

```
SELECT
((SELECT d3_range_lcl FROM spc_constants WHERE subgroup = 5) * rBar) as UCL
FROM
(
SELECT
avg(subgroupRange) as rBar
FROM
(
SELECT
max(measurementValue)-min(measurementValue) as subgroupRange,
(subgroup + 1) AS subgroupNumber
FROM (
SELECT
measurementValue,
(row_number() OVER (ORDER BY measurementDate DESC) - 1) / 5 AS subgroup
FROM
measurements
WHERE
partId = 1
AND characteristicId = 1
AND measurementDate BETWEEN '2020-09-09' AND '2020-09-20'
--ORDER BY measurementDate DESC
--LIMIT 50
) subgroups
GROUP BY subgroup
ORDER BY subgroup ASC
--LIMIT 10 -- limit to last number of samples
) ranges
)rbar
```

### Summary

This completes our X-Bar R guide. When you go and plot the queries it should look close to this:

With rather simple SQL code, we were able to develop scripts that allow us to plot X-bar R control charts in your favorite BI tool.

I invite you to check out __KensoBI__ and discover how you can visualize your process in the form of interactive 3D dashboards.