Case 1:
Independent Samples & Equal Variances
Case 2:
Independent Samples & Unequal Variances
Case 3:
Dependent Samples: Matched/Paired Observations
Case 1:
Assumptions, Problem Description & Data, and
Discussion of the Results.
I. Underlying Assumptions
1. The samples (n1 and n2) from two normal populations are
independent
2. One or both sample sizes are less than 30
3. The appropriate sampling distribution of the test statistic is the t distribution
4. The unknown variances of the two populations are equal
II. Problem Description and
Data
Women who are union members earn $2.50 per hour more
than women who are not union members. (The Wall Street Journal, July 26, 1994).
Suppose independent samples of 15 unionized women and 20 nonunionized women in
manufacturing have been selected and the following hourly wage rates are found (Anderson,
et al., 1998, p. 394).
Union Workers (n1 = 15):
| 22.40 | 18.90 | 16.70 | 14.05 | 16.20 | 20.00 | 16.10 | 16.30 | 19.10 | 16.50 |
| 18.50 | 19.80 | 17.00 | 14.30 | 17.20 | | | | | |
Nonunion Workers (n2 = 20):
| 17.60 | 14.40 | 16.60 | 15.00 | 17.65 | 15.00 | 17.55 | 13.30 | 11.20 | 15.90 |
| 19.20 | 11.85 | 16.65 | 15.20 | 15.30 | 17.00 | 15.10 | 14.30 | 13.90 | 14.50 |
Question:
Does there appear to be any difference in the mean wage rate between these
groups?
Data Entry
Note that the HOURLY WAGES of female workers in
manufacturing is the variable of interest (X); it is a continuous variable. To enter the
values, double-click on var in column one; this action opens the Define Variable
window. Type wages in the Variable Name box. Open the Type
window and set Decimal Places to two (i.e., accept the default value of 2). Then
open the Labels window and type Hourly Wages of Females in the Variable
Label box. Click on the Continue option and then Okay to return to the data
entry screen. Next, define a nominal variable member and label it Union
Membership. In the Value box type yes and in the Value
Label box type union. Then click on Add; notice that yes
= "union" appears, meaning that the variable "member" takes
the value yes, labeled union, for the wages of unionized female workers.
Similarly, declare another value no = "nonunion" to identify the wages
of nonunionized female workers. The datasheet
should basically look like this.
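As a rough illustration of this layout outside SPSS/win, the sketch below builds the same two-column datasheet in Python: one row per worker, with the wage in one column and the grouping code in the other. The names `union`, `nonunion`, and `rows` are illustrative assumptions and are not part of the SPSS procedure.

```python
# Hourly wages from the problem description (Anderson et al., 1998, p. 394).
union = [22.40, 18.90, 16.70, 14.05, 16.20, 20.00, 16.10, 16.30, 19.10, 16.50,
         18.50, 19.80, 17.00, 14.30, 17.20]
nonunion = [17.60, 14.40, 16.60, 15.00, 17.65, 15.00, 17.55, 13.30, 11.20, 15.90,
            19.20, 11.85, 16.65, 15.20, 15.30, 17.00, 15.10, 14.30, 13.90, 14.50]

# "Long" layout: each row pairs a wage with its group code, mirroring the
# wages and member columns of the datasheet.
rows = [(w, "yes") for w in union] + [(w, "no") for w in nonunion]

# Show the first few rows as they would appear in the data editor.
for wages, member in rows[:3]:
    print(f"{wages:6.2f}  {member}")
```

The point of the layout is that the grouping variable, not the column position, tells the procedure which sample each wage belongs to.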
Notes:
(1) The assumption of equal population variances, \( \sigma^2 = \sigma_u^2 = \sigma_{non}^2 \),
means that a pooled sample variance \( S_p^2 \)
must be used to compute the standard error \( S^* \) of the sample mean difference.
(2) Unlike the one sample t-test,
there is no option for setting the test value before executing the procedure using the Command
Sequence stated earlier. SPSS/win automatically tests the null that there is no
difference in the average hourly wages of the two groups of female workers in
manufacturing against the alternative that they differ significantly.
(3) Because the data are quantitative,
the variable Type is automatically set to Numeric.
(4) In the lab, select FILE/PRINT or the
Printer Icon to send your output to the local printer.
III. Discussion of the Results
and Testing Procedure
A. The Outputs/Results
T-Test
Group Statistics
| | Union Membership | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| Hourly Wages of Female Workers | union | 15 | 17.5367 | 2.2403 | .5784 |
| | nonunion | 20 | 15.3600 | 1.9885 | .4446 |
Independent Samples Test (the F and Sig. columns belong to Levene's Test for Equality of Variances; the remaining columns belong to the t-test for Equality of Means)
| | | F | Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% Confidence Interval of the Difference: Lower | 95% Confidence Interval of the Difference: Upper |
|---|---|---|---|---|---|---|---|---|---|---|
| Hourly Wages of Female Workers | Equal variances assumed | .411 | .526 | 3.036 | 33 | .005 | 2.1767 | .7169 | .7180 | 3.6353 |
| | Equal variances not assumed | | | 2.983 | 28.183 | .006 | 2.1767 | .7296 | .6826 | 3.6707 |
The first output table contains
summary statistics for the two groups. The second table contains all the statistics that
are needed to perform the test. Notice that SPSS/win actually performs the test under two
alternative assumptions about the population variances: equal and unequal.
Because equal variances are assumed in this example (and Levene's test, with Sig. = .526,
gives no reason to doubt that assumption), we will use the results in that row for
the ensuing discussion.
B. Testing Procedure
I will summarize the testing procedure under Hypotheses,
Decision Rules, and Conclusions, with the understanding
that the six logical steps presented earlier in the one sample t-test are
carried out mentally in the process. The goal here is to determine whether there is a difference
between the mean wage of unionized women \( \mu_u \)
and that of nonunionized women \( \mu_{non} \),
where the subscript 'u' denotes Union
and 'non' denotes Nonunion.
Hypotheses
\( H_0: \mu_u = \mu_{non} \) or \( H_0: \mu_u - \mu_{non} = 0 \) (i.e., there is no
significant difference in the average hourly
wage rates of female workers in the two groups)
\( H_a: \mu_u \neq \mu_{non} \) or \( H_a: \mu_u - \mu_{non} \neq 0 \) (i.e., there is a
significant
difference in their average hourly wage rates)
Note that this is a t-test because the distribution of hourly wages in the populations of
unionized and nonunionized women is assumed to be normal and both \( n_1 \) and \( n_2 \) are small (i.e., < 30).
Thus, the test statistic \( \bar{X}_u - \bar{X}_{non} \) is transformed into a t-score
using the equation
\[ t_{ov} = \frac{(\bar{X}_u - \bar{X}_{non}) - (\mu_u - \mu_{non})}{S^*} \]
where \( S^* \) denotes the standard error of \( \bar{X}_u - \bar{X}_{non} \).
Because \( n_1 = 15 \) and \( n_2 = 20 \), it follows that \( v = n_1 + n_2 - 2 = 33 \).
Given alpha = .05, the critical t-value is \( t_{cv} = t_{.025,33} \approx t_{.025,30} = \pm 2.042 \).
Since v = 33 is not in the table, the nearest smaller tabulated value v = 30 is used; this gives a
slightly larger, and therefore conservative, critical value.
Decision Rules: Reject \( H_0 \) if \( |t_{ov}| > |t_{cv}| \); retain \( H_0 \) if \( |t_{ov}| < |t_{cv}| \).
From the "Group Statistics" table the pooled sample variance can be derived as
\[ S_p^2 = \frac{(n_1 - 1) S_u^2 + (n_2 - 1) S_{non}^2}{n_1 + n_2 - 2} = \frac{14(2.24)^2 + 19(1.99)^2}{15 + 20 - 2} = 4.41. \]
Although SPSS/win does not report
this result, it computes and uses it internally to determine the value of
\[ S^* = \left[ S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right) \right]^{1/2}. \]
From the "Independent Samples Test" table (along the Equal Variances Assumed row) we
obtain the following summary statistics: the sample mean
difference is \( \bar{X}_u - \bar{X}_{non} = 2.1767 \), its standard error is \( S^* = .7169 \), and
\[ t_{ov} = \frac{(\bar{X}_u - \bar{X}_{non}) - 0}{S^*} = 3.036. \]
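The pooled computation above can be reproduced from the raw wages with a short sketch. It uses only the Python standard library and the formulas just stated; the variable names are illustrative assumptions, and the printed values should agree with the Equal Variances Assumed row only up to rounding.

```python
from math import sqrt
from statistics import mean, variance

union = [22.40, 18.90, 16.70, 14.05, 16.20, 20.00, 16.10, 16.30, 19.10, 16.50,
         18.50, 19.80, 17.00, 14.30, 17.20]
nonunion = [17.60, 14.40, 16.60, 15.00, 17.65, 15.00, 17.55, 13.30, 11.20, 15.90,
            19.20, 11.85, 16.65, 15.20, 15.30, 17.00, 15.10, 14.30, 13.90, 14.50]

n1, n2 = len(union), len(nonunion)
# statistics.variance is the sample variance (divisor n - 1).
s2_u, s2_non = variance(union), variance(nonunion)

# Pooled sample variance under the equal-variance assumption.
sp2 = ((n1 - 1) * s2_u + (n2 - 1) * s2_non) / (n1 + n2 - 2)
# Standard error S* of the difference between the two sample means.
se = sqrt(sp2 * (1 / n1 + 1 / n2))
mean_diff = mean(union) - mean(nonunion)
# Observed t under H0: mu_u - mu_non = 0.
t_ov = (mean_diff - 0) / se

print(f"Sp^2 = {sp2:.2f}, S* = {se:.4f}, diff = {mean_diff:.4f}, t_ov = {t_ov:.3f}")
```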
Conclusions: Since \( t_{ov} = 3.036 \) exceeds \( t_{cv} = 2.042 \), it falls in the rejection
region (right tail); the sample data do not provide sufficient evidence to support \( H_0 \). Thus, we reject \( H_0 \) in favor of \( H_a \). Hence, union membership does make a
significant difference in the average hourly wages of women working in manufacturing. On
average, unionized women in the sample make $2.18 more per hour than nonunionized women: the
observed mean wage is \( \bar{X}_u \) = $17.54 per hour for unionized women versus
\( \bar{X}_{non} \) = $15.36 for nonunionized women, and a difference of this size is unlikely to be a chance
outcome at the 5% level.
Extension: A One-Tailed Test: Suppose the
goal of the test was to validate the claim that average hourly wages of unionized women in
manufacturing are generally higher than those of their nonunionized counterparts. In
this case, \( H_0 \) and \( H_a \)
would be stated as follows:
\( H_0: \mu_u \leq \mu_{non} \) or \( H_0: \mu_u - \mu_{non} \leq 0 \)
\( H_a: \mu_u > \mu_{non} \) or \( H_a: \mu_u - \mu_{non} > 0 \)
Because this is a one-tailed test (in the right tail of the t
distribution), alpha = .05 must not be divided by 2 when determining the value of \( t_{cv} \); indeed \( t_{cv} = t_{.05,30} = 1.697 \). However, the value \( t_{ov} = 3.036 \)
is still valid for the test. Since
\( t_{ov} = 3.036 \) exceeds \( t_{cv} = 1.697 \), it is in the
rejection region (right tail), and the sample data do not provide sufficient evidence to
support \( H_0 \). Hence, we conclude that unionized women working
in manufacturing make more in hourly wages, on average, than their nonunionized
counterparts.
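The two-tailed and one-tailed decision rules used in this case can be sketched as a small helper. The function name `decide` and its `tail` argument are hypothetical, introduced only for illustration; the critical values are the tabulated ones quoted in the text and are passed in as positive numbers.

```python
def decide(t_ov: float, t_cv: float, tail: str = "two") -> str:
    """Return 'reject' or 'retain' for H0, given a positive tabulated t_cv."""
    if tail == "two":
        reject = abs(t_ov) > t_cv      # rejection region in both tails
    elif tail == "right":
        reject = t_ov > t_cv           # rejection region in the right tail only
    elif tail == "left":
        reject = t_ov < -t_cv          # rejection region in the left tail only
    else:
        raise ValueError("tail must be 'two', 'right', or 'left'")
    return "reject" if reject else "retain"


# Two-tailed test at alpha = .05 with the tabulated t(.025, 30) = 2.042.
print(decide(3.036, 2.042, "two"))
# Right-tailed test at alpha = .05 with the tabulated t(.05, 30) = 1.697.
print(decide(3.036, 1.697, "right"))
```

Only the critical value and the tail change between the two tests; the observed t is the same in both.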
Case 2:
Assumptions, Problem Description & Data, and
Discussion of the Results.
I. Underlying
Assumptions
1. The samples (n1 and n2) from two normal populations are independent
2. One or both sample sizes are less than 30
3. The appropriate sampling distribution of the test statistic is the t distribution
4. The unknown variances of the two populations are not equal
II. Problem Description and
Data
Starting annual salaries for individuals entering the
public accounting and financial planning professions were presented in Fortune,
June 26, 1995. The starting salaries for a sample of 12 public accountants and 14
financial planners are shown below. Data are in thousands of dollars (ASW, 1998, p. 403).
Public Accountant (n1 = 12)
| 30.6 | 31.2 | 28.9 | 35.2 | 25.1 | 33.2 | 31.3 | 35.3 | 31.0 | 30.1 | 29.9 | 24.4 |
Financial Planner (n2 = 14)
| 31.6 | 26.6 | 25.5 | 25.0 | 25.9 | 32.9 | 26.9 | 25.8 | 27.5 | 29.6 | 23.9 | 26.9 | 24.4 | 25.5 |
Question:
Using alpha = .05, test for any difference between
the population mean starting annual salaries for the two professions. What is your
conclusion?
Data Entry
Note that the starting annual SALARY for persons
entering public accounting and financial planning professions is the variable of interest
(X); it is a continuous variable. To enter the values, double-click on var in
column one; this action opens the Define Variable window. Type salary
in the Variable Name box. Open the Type window and set Decimal
Places to one (since all the data have only one decimal place). Then
open the Labels window and type Starting Salary of Public Accountants &
Financial Planners in the Variable Label box. Click on Continue option
and then Okay to return to the data entry screen. Next, define a nominal variable
person and label it Profession of an Individual. In the Value
box type pa and in the Value Label box type public
accountant. Then click on Add; notice that pa = "public
accountant" appears, meaning that the variable "person" takes
the value pa for the salaries of individuals in the public
accounting profession. Similarly, declare another value fp = "financial
planner" to identify the salaries of individuals in the financial planning
profession. The datasheet should basically look
like this.
Notes:
(1) The assumption of unequal population variances (i.e., \( \sigma_{pa}^2 \neq \sigma_{fp}^2 \)) means that both
sample variances, \( S_{pa}^2 \) and \( S_{fp}^2 \),
must be used to compute the standard error \( S^* \) of the sample mean difference.
(2) Unlike the one sample t-test,
there is no option for setting the test value before executing the procedure using the Command
Sequence stated earlier. SPSS/win automatically tests the null that there is no
difference in the average salary of public accountants and financial planners against the
alternative of a significant difference.
(3) Because the data are quantitative,
the variable Type is automatically set to Numeric.
(4) In the lab, select FILE/PRINT or the
Printer Icon to send your output to the local printer.
III. Discussion of the Results
and Testing Procedure
A. The Outputs/Results
Group Statistics
| | profession | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| Starting Salary of Public Accountant & Financial Planner | Public Accountant | 12 | 30.517 | 3.347 | .966 |
| | Financial Planner | 14 | 27.000 | 2.641 | .706 |
Independent Samples Test (the F and Sig. columns belong to Levene's Test for Equality of Variances; the remaining columns belong to the t-test for Equality of Means)
| | | F | Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% Confidence Interval of the Difference: Lower | 95% Confidence Interval of the Difference: Upper |
|---|---|---|---|---|---|---|---|---|---|---|
| Starting Salary of Public Accountant & Financial Planner | Equal variances assumed | .292 | .594 | 2.994 | 24 | .006 | 3.517 | 1.175 | 1.093 | 5.941 |
| | Equal variances not assumed | | | 2.939 | 20.848 | .008 | 3.517 | 1.197 | 1.027 | 6.006 |
The first output table contains
summary statistics for the two groups. The second table contains all the statistics that
are needed to perform the test. As in Case 1 above, SPSS/win actually performs the
test under two alternative assumptions about the population variances: equal and unequal.
Because unequal variances are assumed in this example, we will use the
results in that row for the ensuing discussion.
B. Testing Procedure
Again, I will summarize the testing procedure
under Hypotheses, Decision Rules, and Conclusions,
with the understanding that the six logical steps presented earlier in the one sample t-test are
carried out mentally in the process. The goal here is to determine whether the average
annual starting salary in the public accounting profession \( \mu_{pa} \)
differs significantly from the average annual starting salary in the financial planning
profession \( \mu_{fp} \), where
the subscript 'pa' denotes Public Accounting and 'fp'
denotes Financial Planning professions, respectively.
Hypotheses
\( H_0: \mu_{pa} = \mu_{fp} \) or \( H_0: \mu_{pa} - \mu_{fp} = 0 \) (i.e., there is no
significant difference in the average annual
starting salary in the two professions)
\( H_a: \mu_{pa} \neq \mu_{fp} \) or \( H_a: \mu_{pa} - \mu_{fp} \neq 0 \) (i.e., a significant
difference does
exist)
Note that this is a t-test because the distribution of salaries in the populations of
public accounting and financial planning professions is assumed to be normal and both
\( n_1 \) and \( n_2 \) are small (i.e., < 30).
Thus, the test statistic \( \bar{X}_{pa} - \bar{X}_{fp} \) is transformed into a t-score
using the equation
\[ t_{ov} = \frac{(\bar{X}_{pa} - \bar{X}_{fp}) - (\mu_{pa} - \mu_{fp})}{S^*} \]
where \( S^* \) denotes the standard
error of \( \bar{X}_{pa} - \bar{X}_{fp} \).
Because \( \sigma_{pa}^2 \neq \sigma_{fp}^2 \),
the value \( v = n_1 + n_2 - 2 = 12 + 14 - 2 = 24 \) is not applicable for determining the critical
t-value \( t_{cv} \).
Indeed, the rule for determining the value of v
yields a value that is smaller than 24. This rule, which can be found in almost any
introductory statistics text, is built into SPSS/win and yields the value v = 20.848, or 21 when rounded. Thus, given alpha = .05,
\( t_{cv} = t_{.025,21} = \pm 2.080 \).
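The text does not write the rule out. A commonly used form is the Welch-Satterthwaite approximation, given here on the assumption that it is the rule SPSS/win applies; with the standard deviations from the "Group Statistics" table it reproduces the reported value up to rounding.

```latex
\[
v = \frac{\left( \dfrac{S_{pa}^2}{n_1} + \dfrac{S_{fp}^2}{n_2} \right)^2}
         {\dfrac{\left( S_{pa}^2 / n_1 \right)^2}{n_1 - 1} + \dfrac{\left( S_{fp}^2 / n_2 \right)^2}{n_2 - 1}}
  = \frac{\left( \dfrac{3.347^2}{12} + \dfrac{2.641^2}{14} \right)^2}
         {\dfrac{\left( 3.347^2 / 12 \right)^2}{11} + \dfrac{\left( 2.641^2 / 14 \right)^2}{13}}
  \approx 20.85
\]
```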
Decision Rules: Reject \( H_0 \) if \( |t_{ov}| > |t_{cv}| \); retain \( H_0 \) if \( |t_{ov}| < |t_{cv}| \).
From the "Group Statistics" table the sample standard deviation of the public accountants'
salaries is \( S_{pa} = 3.347 \), and that of the financial planners' salaries is \( S_{fp} = 2.641 \).
Their squares, the sample variances \( S_{pa}^2 \) and \( S_{fp}^2 \), are used to
determine the value of
\[ S^* = \left[ \frac{S_{pa}^2}{n_1} + \frac{S_{fp}^2}{n_2} \right]^{1/2}. \]
From the
"Independent Samples Test" table (along the Equal Variances Not
Assumed row) we obtain the following summary statistics: the sample mean difference is
\( \bar{X}_{pa} - \bar{X}_{fp} = 3.517 \), or $3,517; its standard error is \( S^* = 1.197 \); and
\[ t_{ov} = \frac{(\bar{X}_{pa} - \bar{X}_{fp}) - 0}{S^*} = 2.939. \]
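The unpooled computation can be checked from the raw salaries with the sketch below. It uses only the Python standard library; the variable names are illustrative, and the degrees-of-freedom line assumes the Welch-Satterthwaite approximation, which the text does not state explicitly. The results should match the Equal Variances Not Assumed row up to rounding.

```python
from math import sqrt
from statistics import mean, variance

# Starting salaries in thousands of dollars (ASW, 1998, p. 403).
pa = [30.6, 31.2, 28.9, 35.2, 25.1, 33.2, 31.3, 35.3, 31.0, 30.1, 29.9, 24.4]
fp = [31.6, 26.6, 25.5, 25.0, 25.9, 32.9, 26.9, 25.8, 27.5, 29.6, 23.9, 26.9,
      24.4, 25.5]

n1, n2 = len(pa), len(fp)
# Each group's contribution to the variance of the mean difference: S^2 / n.
a, b = variance(pa) / n1, variance(fp) / n2

# Standard error S* without pooling the two sample variances.
se = sqrt(a + b)
mean_diff = mean(pa) - mean(fp)
t_ov = (mean_diff - 0) / se
# Assumed rule for v: Welch-Satterthwaite approximation.
v = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

print(f"diff = {mean_diff:.3f}, S* = {se:.3f}, t_ov = {t_ov:.3f}, v = {v:.3f}")
```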
Conclusions: Since \( t_{ov} = 2.939 \) exceeds \( t_{cv} = 2.080 \), it is in the rejection region (right
tail), and the sample data do not provide sufficient evidence to support \( H_0 \). Instead, the evidence appears to support
the existence of a significant difference in the average salary of the two groups of
professionals. Thus, it can be concluded that the average annual starting salary in the public
accounting profession is generally higher, by an estimated $3,517
(point estimate of the difference between \( \mu_{pa} \) and \( \mu_{fp} \)),
than the average annual starting salary in the financial planning profession.
Extension: Interval Estimation: The value $3,517 is a point estimate of the difference between \( \mu_{pa} \) and \( \mu_{fp} \). From the output, we can
also obtain a 95% confidence interval estimate of the difference between the two
population means. Note that the rule for doing so is given as
\( (\bar{X}_{pa} - \bar{X}_{fp}) \pm t_{cv} \cdot S^* \). The lower
limit is reported as 1.027 and the upper limit is
reported as 6.006. Thus, we can conclude with 95% confidence that the
true mean difference in the annual starting salary in the two professions is between
$1,027 and $6,006.
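As a worked check of this rule, the tabulated critical value for v = 21 and the reported standard error give limits that agree with the output to within rounding. The small discrepancy in the last digit of the upper limit presumably arises because SPSS/win works with the unrounded v = 20.848; that explanation is an assumption, not something the output states.

```latex
\[
(\bar{X}_{pa} - \bar{X}_{fp}) \pm t_{cv} \cdot S^*
  = 3.517 \pm (2.080)(1.197)
  = 3.517 \pm 2.490
  \quad\Rightarrow\quad [\,1.027,\ 6.007\,]
\]
```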
Case 3:
Assumptions, Problem Description & Data, and
Discussion of the Results.
I. Underlying Assumptions
This case considers a research
situation in which two samples are not independent. This situation occurs when each
individual observation (i) within a sample is related (matched or paired) to an individual
observation in the second sample. The relatedness may be the result of the individual
observations in the two samples
1. representing before and after results (which is presented in this example),
2. having matching characteristics,
3. being matched by location, or
4. being matched by time.
If there are definite reasons for pairing (or matching) the
individual observations in the two samples, the two samples are dependent rather than
being independent. Generally, the precision from an analysis of dependent samples is
greater than that from the analysis of independent samples. Thus, if paired analysis is
appropriate, it is the preferred approach.
II. Problem Description and
Data
Figure Perfect, Inc., is a
women's figure salon that specializes in weight reduction programs. Weights for a sample
of clients before and after a six-week introductory program are shown below (ASW, p. 419).
| Client | Before | After |
|---|---|---|
| 1 | 140 | 132 |
| 2 | 160 | 158 |
| 3 | 210 | 195 |
| 4 | 148 | 152 |
| 5 | 190 | 180 |
| 6 | 170 | 164 |
Question:
Using alpha = .05, test to determine whether the introductory program provides
a statistically significant weight loss. What is
your conclusion?
Data Entry
Note that the Weight of the
clients is actually the variable of interest (X); it is a continuous variable. However, to
determine whether there is a significant reduction in the weight of the participants after
joining the program we will have to declare two pseudo variables, viz., Xbefore
and Xafter, for each client. To enter the before
weight values, double-click on var in column one; this action opens the Define
Variable window. Type Xbefore in the Variable Name box.
Next, open the Type window and set Decimal Places to zero (i.e., type 0 to
replace the default value of 2). Finally, open the Labels window and type Client's
Weight Before in the Variable Label box. Click on the Continue option and
then Okay to return to the data entry screen. Repeat similar steps to define the
Xafter
weight values; use Client's Weight After as the variable label. The datasheet should basically look like this.
Notes:
(1) The key to the analysis of the matched/paired
sample design is to realize that we consider only the column of differences \( d_i \),
where \( d_i = X_{after,i} - X_{before,i} \) for
each client. If the difference is defined instead as \( d_i = X_{before,i} - X_{after,i} \), the signs reverse, so a mean weight loss is reported as a
positive rather than a negative value.
(2) Unlike the one sample t-test,
there is no option for setting the test value before executing the procedure using the Command
Sequence stated earlier. SPSS/win automatically tests the null hypothesis that the population mean
difference \( \mu_d = \mu_{after} - \mu_{before} \)
in the weight of program participants is zero and reports a two-tailed significance level.
The same output serves all three forms of the null (i.e., \( H_0: \mu_d = 0 \), \( \mu_d \leq 0 \),
or \( \mu_d \geq 0 \))
against the corresponding alternative \( H_a \) that it differs significantly from zero, is significantly
greater than 0, or is significantly less than 0 (i.e., \( H_a: \mu_d \neq 0 \),
\( \mu_d > 0 \), or \( \mu_d < 0 \)).
(3) Because the data are quantitative,
the variable Type is automatically set to Numeric.
(4) In the lab, select FILE/PRINT or the
Printer Icon to send your output to the local printer.
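The sign convention in Note (1) can be seen directly from the data. The following is a minimal sketch in plain Python; the list names are illustrative assumptions.

```python
# Client weights from the problem description (ASW, p. 419).
before = [140, 160, 210, 148, 190, 170]
after = [132, 158, 195, 152, 180, 164]

# d_i = X_after,i - X_before,i: a weight loss shows up as a negative value.
d_after_minus_before = [a - b for a, b in zip(after, before)]
# d_i = X_before,i - X_after,i: the same loss shows up as a positive value.
d_before_minus_after = [b - a for a, b in zip(after, before)]

print(d_after_minus_before)   # [-8, -2, -15, 4, -10, -6]
print(d_before_minus_after)   # [8, 2, 15, -4, 10, 6]
print(sum(d_after_minus_before) / len(before))
```

Either definition leads to the same test as long as the hypotheses are stated with the matching sign.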
III. Discussion of the Results
and Testing Procedure
A. The Outputs/Results
Paired Samples Statistics
| | | Mean | N | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| Pair 1 | Client's Weight After | 163.50 | 6 | 22.00 | 8.98 |
| | Client's Weight Before | 169.67 | 6 | 26.39 | 10.78 |
Paired Samples Correlations
| | | N | Correlation | Sig. |
|---|---|---|---|---|
| Pair 1 | Client's Weight After & Client's Weight Before | 6 | .979 | .001 |
Paired Samples Test (the first five numeric columns are the Paired Differences)
| | | Mean | Std. Deviation | Std. Error Mean | 95% Confidence Interval of the Difference: Lower | 95% Confidence Interval of the Difference: Upper | t | df | Sig. (2-tailed) |
|---|---|---|---|---|---|---|---|---|---|
| Pair 1 | Client's Weight After - Client's Weight Before | -6.17 | 6.59 | 2.69 | -13.08 | .74 | -2.294 | 5 | .070 |
The first output table contains
summary statistics for the paired samples. The second table reports the Pearson
correlation coefficient for weight before and after the program. The value of .979
suggests the existence of a strong positive association between the clients' weights before
and after the program; the p-value of .001 indicates that the observed association is
statistically significant at the 5% level. The third table contains all the statistics that
are needed to perform the test.
B. Testing Procedure
Again, I will summarize the testing procedure
under Hypotheses, Decision Rules, and Conclusions,
with the understanding that the six logical steps presented earlier in the one sample t-test are
carried out mentally in the process.
Hypotheses
\( H_0: \mu_d \geq 0 \) (i.e.,
the average weight after is the same as the average weight before or, if anything,
greater than it)
\( H_a: \mu_d < 0 \) (i.e., the average weight after is strictly
less than the average weight before the introduction
of the program)
Note that this is a t-test because the distribution of weight in the population of
all possible clients
is assumed to be normal and the sample size n is small (i.e., n < 30). Thus, the test statistic \( \bar{d} \) is transformed into a t-score using
the equation
\[ t_{ov} = \frac{\bar{d} - \mu_d}{S_{\bar{d}}} \]
where \( S_{\bar{d}} \) denotes the standard error of \( \bar{d} \).
Because n = 6 it follows that \( v = n - 1 = 5 \), and given alpha
= .05 for this left-tailed test, the critical t-value is
\( t_{cv} = -t_{.05,5} = -2.015 \).
Decision Rules: Reject \( H_0 \) if \( t_{ov} < -2.015 \) (i.e., \( |t_{ov}| > |t_{cv}| \) with \( t_{ov} \) in the left tail);
retain \( H_0 \) otherwise.
From the "Paired Samples Test" table we obtain the following summary
statistics: the mean weight
change after introduction of the program is \( \bar{d} = -6.17 \)
(negative, indicating a loss, because the difference is defined as
\( d_i = X_{after,i} - X_{before,i} \));
the sample standard deviation of the differences is \( S_d = 6.59 \);
the standard error of \( \bar{d} \) is \( S_{\bar{d}} = S_d / n^{1/2} = 2.69 \); and
\( t_{ov} = (\bar{d} - \mu_d)/S_{\bar{d}} = -2.294 \).
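These paired statistics can be reproduced from the six before/after weights. The sketch below uses only the Python standard library and the formulas in the text; the variable names are illustrative assumptions, and the results should match the "Paired Samples Test" row up to rounding.

```python
from math import sqrt
from statistics import mean, stdev

before = [140, 160, 210, 148, 190, 170]
after = [132, 158, 195, 152, 180, 164]

# Differences d_i = X_after,i - X_before,i, one per client.
d = [a - b for a, b in zip(after, before)]
n = len(d)

d_bar = mean(d)            # sample mean of the differences
s_d = stdev(d)             # sample standard deviation (divisor n - 1)
se = s_d / sqrt(n)         # standard error of d_bar
t_ov = (d_bar - 0) / se    # observed t under H0: mu_d = 0

print(f"d_bar = {d_bar:.2f}, S_d = {s_d:.2f}, SE = {se:.2f}, t_ov = {t_ov:.3f}")
```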
Conclusions: Since, in absolute
terms, \( t_{ov} = -2.294 \) is
greater than \( t_{cv} = -2.015 \) (it is in the rejection
region of the left tail), the sample data do not provide sufficient evidence to retain \( H_0 \). Hence, it can be concluded that
the new program does provide a statistically significant weight loss at the 5% level.
Extension: Interval Estimation: From the
output, we can also obtain a 95% confidence interval estimate of the difference between
the two population means, \( \mu_{after} - \mu_{before} \),
where \( \mu_{after} \) and \( \mu_{before} \)
are the average weights of all clients after and before joining the program, respectively.
Note that the rule for doing so is given as \( \bar{d} \pm t_{cv} \cdot S_{\bar{d}} \),
where the interval uses the two-tailed value \( t_{cv} = t_{.025,5} = 2.571 \) rather than the one-tailed 2.015.
The lower limit is reported as -13.08 and the upper limit is
reported as .74. Their interpretation is as follows. Letting
\( \mu_{after} - \mu_{before} = -13.08 \) implies that
\( \mu_{after} = \mu_{before} - 13.08 \),
which means that an individual can lose as much as 13.08 lbs, on average, after joining the
program. Similarly, \( \mu_{after} - \mu_{before} = .74 \) implies that
\( \mu_{after} = \mu_{before} + .74 \),
which means that at the very worst an individual may gain no more than .74 lbs,
on average, after
joining the program.
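The reported limits can be checked with a short sketch. It assumes the tabulated two-tailed critical value 2.571 for 5 degrees of freedom (hard-coded as the hypothetical constant `T_CV`, since the standard library has no t table) and otherwise uses only the data above.

```python
from math import sqrt
from statistics import mean, stdev

before = [140, 160, 210, 148, 190, 170]
after = [132, 158, 195, 152, 180, 164]
d = [a - b for a, b in zip(after, before)]

# Assumption: tabulated t(.025, 5) from a standard t table.
T_CV = 2.571

# Half-width of the interval: t_cv times the standard error of d_bar.
half_width = T_CV * stdev(d) / sqrt(len(d))
lower, upper = mean(d) - half_width, mean(d) + half_width

print(f"95% CI for mu_after - mu_before: ({lower:.2f}, {upper:.2f})")
```

Because the interval is built from the two-tailed critical value, it is wider than the one-tailed rejection rule alone would suggest.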
Copyright© 1996, Ebenge Usip, all rights reserved.
Last revised: Sunday, August 05, 2001.