Yaa Tabua Travel arrange a large number of holidays, and in some of these they make administrative mistakes. They are about to change their quality control procedures, and have done some experiments to see how the number of mistakes varies with the number of checks. The following table shows their findings:
Checks       0   1   2   3   4   5   6   7   8   9  10
Mistakes    92  86  81  72  67  59  53  43  32  24  12
(a) Draw a scatter plot of the data; identify the independent variable, x, and the dependent variable, y. [5 Marks]
(b) Determine the least squares estimates of the regression constant, a, and the regression coefficient, b, i.e. y = a + bx. [4 Marks]
(c) Comment on your estimates in (b) above. [2 Marks]
(d) Calculate the coefficient of determination. [6 Marks]
(e) Using your line in (b), predict how many mistakes Yaa Tabua would expect with 20 checks and comment on its reliability. [4 Marks]
(f) Use your estimate in (d) to comment on the suitability of your estimate in (e) above. [4 Marks]
Answer
(a) Independent variable, x: number of checks (the factor the company controls). Dependent variable, y: number of mistakes (the outcome). Scatter plot: the points (0,92), (1,86), …, (10,12) show a strong negative linear trend, with mistakes falling steadily and only minor deviations, which suggests that more checks reduce mistakes predictably. In banking, this is similar to plotting errors against audit frequency for compliance risk management under BoG directives.
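For readers without graph paper to hand, a rough text version of the scatter plot can be sketched in plain Python. This is an illustration only, not part of the original solution: the function name `ascii_scatter` and the band width `y_step` are hypothetical choices, and the sketch assumes non-negative y values and x values that are consecutive integers (as the checks 0 to 10 are), so each x gets one equally spaced column.

```python
# Data from the table: number of checks (x) and number of mistakes (y)
CHECKS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
MISTAKES = [92, 86, 81, 72, 67, 59, 53, 43, 32, 24, 12]


def ascii_scatter(xs, ys, y_step=10):
    """Return a rough text scatter plot: one row per y band of width y_step, one column per x."""
    top = (max(ys) // y_step) * y_step  # largest multiple of y_step not exceeding max(ys)
    lines = []
    for level in range(top, -1, -y_step):
        row = ""
        for y in ys:
            # '*' marks a point whose y value lies in the band [level, level + y_step)
            mark = "*" if level <= y < level + y_step else "."
            row += f"{mark:>3}"
        lines.append(f"{level:>3} |{row}")
    lines.append("    +" + "-" * (3 * len(xs)))
    lines.append("     " + "".join(f"{x:>3}" for x in xs))
    return lines


for line in ascii_scatter(CHECKS, MISTAKES):
    print(line)
```

The stars run from the top-left to the bottom-right of the grid, which is the downward linear pattern described above.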
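(b) Using least squares: a (intercept) = 95.86, b (slope) = -7.88, giving the equation y = 95.86 - 7.88x. Method: minimize the sum of squared residuals, which gives b = (nΣxy - ΣxΣy)/(nΣx² - (Σx)²) and a = ȳ - b x̄. From the data, n = 11, Σx = 55, Σy = 621, Σxy = 2238 and Σx² = 385, so b = (11 × 2238 - 55 × 621)/(11 × 385 - 55²) = -9537/1210 ≈ -7.88, and a = ȳ - b x̄ = 621/11 - (-7.8818)(5) = 56.45 + 39.41 ≈ 95.86.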
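The least squares arithmetic above can be checked with a short Python sketch that applies the same summation formulas directly to the table. The function name `least_squares` is a hypothetical helper introduced for illustration, not something from the question.

```python
CHECKS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
MISTAKES = [92, 86, 81, 72, 67, 59, 53, 43, 32, 24, 12]


def least_squares(xs, ys):
    """Return (a, b) for y = a + b*x using the summation formulas in part (b)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n  # a = y_bar - b * x_bar
    return a, b


a, b = least_squares(CHECKS, MISTAKES)
print(f"y = {a:.2f} + ({b:.2f})x")  # y = 95.86 + (-7.88)x
```

Keeping b unrounded when computing a avoids the small drift (95.85 instead of 95.86) that appears if -7.88 is used in a = ȳ - b x̄.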
(c) a ≈ 96 indicates about 96 mistakes when no checks are made, a realistic baseline (the observed value at zero checks is 92). b ≈ -7.88 means each additional check is associated with nearly 8 fewer mistakes on average, showing effective quality control, akin to how increased verification in treasury operations reduces transaction errors.
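As a rough cross-check of the slope interpretation (an informal addition, not part of the original solution), the sketch below takes the change in observed mistakes from each check count to the next. The average drop is exactly 8 per check, close to b ≈ -7.88; the last five steps (5 to 10 checks) total -47 against -33 for the first five, which is worth keeping in mind for the non-linearity point in (f).

```python
MISTAKES = [92, 86, 81, 72, 67, 59, 53, 43, 32, 24, 12]

# Change in mistakes for each additional check (x -> x + 1)
drops = [later - earlier for earlier, later in zip(MISTAKES, MISTAKES[1:])]
print(drops)                    # [-6, -5, -9, -5, -8, -6, -10, -11, -8, -12]
print(sum(drops) / len(drops))  # -8.0
```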
(d) Coefficient of determination R² = 0.988. Method: R² = 1 - (SS_res/SS_tot), where SS_res = Σ(y - ŷ)² ≈ 85.19 and SS_tot = Σ(y - ȳ)² = Σy² - (Σy)²/n = 41977 - 621²/11 ≈ 6918.73, so R² = 1 - (85.19/6918.73) ≈ 0.988. This means about 98.8% of the variation in mistakes is explained by the number of checks.
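The sums of squares in (d) can be reproduced with a minimal Python sketch. It fits the line in the equivalent deviation form (b = Sxy/Sxx, a = ȳ - b x̄), which gives the same a and b as part (b), then computes R² = 1 - SS_res/SS_tot.

```python
CHECKS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
MISTAKES = [92, 86, 81, 72, 67, 59, 53, 43, 32, 24, 12]

n = len(CHECKS)
x_bar = sum(CHECKS) / n
y_bar = sum(MISTAKES) / n

# Slope and intercept in deviation form (same values as part (b))
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(CHECKS, MISTAKES))
s_xx = sum((x - x_bar) ** 2 for x in CHECKS)
b = s_xy / s_xx
a = y_bar - b * x_bar

fitted = [a + b * x for x in CHECKS]
ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(MISTAKES, fitted))
ss_tot = sum((y - y_bar) ** 2 for y in MISTAKES)
r_squared = 1 - ss_res / ss_tot

print(f"SS_res = {ss_res:.2f}, SS_tot = {ss_tot:.2f}, R^2 = {r_squared:.3f}")
# SS_res = 85.19, SS_tot = 6918.73, R^2 = 0.988
```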
(e) For x = 20: y = 95.86 - 7.88 × 20 ≈ -61.77 (using the unrounded coefficients). A negative number of mistakes is impossible, so the prediction is unreliable: it extrapolates well beyond the observed range of 0 to 10 checks, and the linear relationship cannot hold indefinitely because mistakes cannot fall below zero.
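A minimal sketch of the prediction, assuming we want the code to flag extrapolation explicitly rather than rely on the reader noticing it. The helpers `fit_line` and `predict` are hypothetical names; the "extrapolated" flag simply reports whether x lies outside the observed range of checks.

```python
CHECKS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
MISTAKES = [92, 86, 81, 72, 67, 59, 53, 43, 32, 24, 12]


def fit_line(xs, ys):
    """Least squares fit of y = a + b*x (equivalent to the formulas in part (b))."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


def predict(x, xs, ys):
    """Return (y_hat, extrapolated); extrapolated is True when x is outside the observed x range."""
    a, b = fit_line(xs, ys)
    return a + b * x, not (min(xs) <= x <= max(xs))


y_hat, extrapolated = predict(20, CHECKS, MISTAKES)
print(f"{y_hat:.2f}", extrapolated)  # -61.77 True
```

The flag only detects that x = 20 is outside 0 to 10; it cannot say what the true relationship looks like there.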
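(f) The high R² of 0.988 shows a strong fit within the observed range of 0 to 10 checks, but it says nothing about how well the line describes the relationship at x = 20, far outside that range. The estimate in (e) is therefore unsuitable because of possible non-linearity or floor effects (mistakes cannot go below zero), which calls for caution when predicting outside the data, similar to extrapolating credit risk models beyond observed data in Ghanaian banking after the post-2019 cleanup.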
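One informal way to probe the non-linearity concern (an addition for illustration, not part of the original solution) is to look at the signs of the residuals from the line in (b). They are negative at both ends and mostly positive in the middle, a pattern that can indicate curvature even when R² is high.

```python
CHECKS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
MISTAKES = [92, 86, 81, 72, 67, 59, 53, 43, 32, 24, 12]

n = len(CHECKS)
x_bar = sum(CHECKS) / n
y_bar = sum(MISTAKES) / n
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(CHECKS, MISTAKES)) / sum(
    (x - x_bar) ** 2 for x in CHECKS
)
a = y_bar - b * x_bar

# Residual = observed - fitted; keep only its sign for each x
residuals = [y - (a + b * x) for x, y in zip(CHECKS, MISTAKES)]
signs = "".join("+" if r > 0 else "-" for r in residuals)
print(signs)  # --+-++++---
```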
QMDM – APR 2023 – L2 – Q3 – Regression Analysis for Mistakes and Checks
Analyze mistakes vs. checks data with scatter plot, regression, determination coefficient, and prediction.