National Evaluation of Welfare-to-Work Strategies: 2-Year Full Impact Sample Files: Calibrating Scores on Two Tests of Adult Literacy


Calibrating Scores on Two Tests of Adult Literacy:
An Equating Study of the Test of
Adult Literacy Skills (TALS) Document Test and
the Comprehensive Adult Student Assessment System (CASAS)
GAIN Appraisal Reading Test (Form 2)

A report prepared for the Manpower Demonstration Research Corporation

by: Walter Haney
Larry Ludlow
Anastasia Raczek
Sonia Stryker,
and Ann Jones

Boston College
Program in Educational Research, Measurement and Evaluation
Campion Hall
Chestnut Hill, MA 02167

October 1994

[revised October 1996]

I.	Introduction
Amid widespread concern over the learning and skills of American workers,
the education of adults has been receiving increased attention in
national education policy.  In 1985, California established its Greater
Avenues for Independence (GAIN) Program which emphasized mandatory
participation in basic education for welfare recipients who were
considered to need it. Similar emphasis on education for welfare recipients
was embodied in the federal Family Support Act of 1988 and the Job
Opportunities and Basic Skills Training (JOBS) program established under
that Act.  And on March 31, 1994, President Clinton signed the Educate
America Act, which for the first time establishes national education
goals as a part of federal law.  The sixth of these eight national
education goals is that by the year 2000 "every adult American will be
literate and will possess the knowledge and skills necessary to compete
in a global economy and exercise the rights and responsibilities of
citizenship."

Such aspirations for a literate citizenry and workforce, and programs
aimed at helping attain them, have prompted renewed attention to the
problem of how to measure the literacy of adults.  As several different
tests of "adult literacy" are available and are used in connection with
various adult education programs, questions quickly arise about the
comparability of scores on the different tests of adult literacy.

The purpose of this report is to present the results of a study of two
relatively new tests of adult literacy, namely the Test of Adult Literacy
Skills (TALS) Document Test (Form B) and the Comprehensive Adult Student
Assessment System (CASAS) GAIN Appraisal Reading Test (Form 2), both used
in conjunction with a national evaluation of the Job Opportunities and
Basic Skills Training (JOBS) program.  The specific purpose of the study
was to

Calibrating Scores, 10/94, p. 2

equate the scores on both tests for a sample of GAIN registrants from
Riverside, California, using a variety of traditional and item response
theory equating methods.

In this introduction we provide some background on:

*	the JOBS program and its evaluation;

*	the two tests whose comparability we are investigating; and

*	the art of test equating.

The JOBS Program and Its Evaluation

The JOBS program began operation in 1989.  Its aim is to increase the
literacy, self-sufficiency and employment prospects of people receiving
Aid to Families with Dependent Children (AFDC), the nation's largest cash
welfare program, supported with federal and state funds.  Within AFDC,
single parents are enrolled in the family group (AFDC-FG) program, while
two-parent families are enrolled in the AFDC-U (Unemployed Parent) program.
Under the Family Support Act of 1988, the federal government defines
parameters and expectations for state JOBS programs; the states determine
the sequence and content of program services and decide how to target
various parts of the AFDC population.  JOBS program activities may include
adult education (adult basic education, preparation for taking the high
school equivalency General Educational Development or GED test, and
English instruction for speakers of other languages), post-secondary
education, job skills training, job search workshops, on-the-job training,
and unpaid work experience.  Based on the premise that welfare involves an
obligation on the part of those who receive income and services, the Family
Support Act requires all AFDC recipients whose children are at least
three years of age (age one at state option) to participate in JOBS to the
extent that resources permit, and AFDC payments
are to be reduced for people who do not cooperate with their assigned
program activities (exemptions may be granted to recipients who meet
specified criteria). Thus the JOBS program is broad in its coverage and
in its emphasis on developing human resources and employment prospects of
welfare recipients.

To analyze the effectiveness of the JOBS welfare-to-work program, a
longitudinal evaluation is being conducted by the Manpower Demonstration
Research Corporation (MDRC) in several communities, using random assignment
of welfare recipients to treatment and control conditions.  The study
is being funded by the U.S. Department of Health and Human Services (HHS)
and the U.S. Department of Education.  The study of one of the communities
in the evaluation, Riverside, California, is also being funded by the
California Department of Social Services (which in turn received funding
from the California Department of Education, the California State Job
Training Coordinating Council, HHS, and the Ford Foundation).

A key issue in the JOBS evaluation is whether JOBS has different impacts
on different types of welfare recipients.  A particularly important question
is whether the education, training and job search activities in
JOBS may have different effects on people who enter JOBS with lower literacy
than on people who enter with higher literacy.  To examine this hypothesis
regarding the impacts of JOBS on "subgroups" with high and low literacy,
it is necessary to measure the literacy of people in the evaluation before
they are randomly assigned. This has been done in four JOBS evaluation sites;
however, two different tests were used in these sites, making it difficult
to define literacy subgroups using a consistent measure. (Different tests
were used in different sites because in California and Oregon, state
require the use of CASAS tests; the TALS Document Literacy test was selected
for use in other evaluation sites by MDRC and the federal agencies funding
the evaluation.) Therefore, in order to examine one of the prime questions
about the impact of JOBS, it is necessary to understand the relationship
between welfare recipients' scores on the TALS and CASAS tests that they
took when they entered the JOBS program. An additional reason for
seeking to understand the equivalence of CASAS and TALS test scores is
to facilitate comparison of test-takers at different JOBS evaluation sites.
Finally, from a methodological perspective, we were interested in how
well a variety of both traditional and item response theory methods of test
equating worked to calibrate scores on these two tests of adult literacy.
For these reasons, MDRC asked us to undertake a study of the
comparability of scores on these two measures of adult literacy.  This
report is a product of our inquiry.

The Two Tests

The two tests that are the focus of our inquiry are, as mentioned, the
CASAS GAIN Appraisal Reading Test (Form 2) and the TALS Document Literacy
Test. Hence it is useful here to provide a brief introduction to
these instruments. More detail on the psychometric properties of these
instruments will be presented in chapter 2 of this report.

The CASAS GAIN Appraisal Reading Test (Form 2) was developed by a California
organization known
as the Comprehensive Adult Student Assessment System or CASAS. CASAS began
as a consortium of education providers in California in 1980 with the aim of
developing assessments with a functional as opposed to academic focus. In
addition to tests used in adult education programs, the CASAS system
encompasses a list of competencies developed from the recommendations of
adult basic education and English as
a second language educational program staff in the CASAS consortium, as
well as a corresponding curriculum index. In this report, however, we
focus on the CASAS reading test used in connection with the evaluation of
JOBS, namely, the CASAS GAIN Appraisal Reading Test (Form 2).

CASAS tests are constructed from [an] item bank of more than 5,000 test
items. Each test item has an established difficulty level based on extensive
field testing and analysis. The psychometric theory used to establish this
difficulty level is Item Response Theory (IRT) through which each item
is assigned a difficulty level on a common scale. ... CASAS tests are developed
to have established difficulty levels primarily for learners at or below
high school graduation level. ... The basic and functional context of CASAS
test items includes applied reading, math and listening in a variety of
situations. Most CASAS tests are group administered and untimed, but
generally take approximately 30 to 40 minutes to administer. (Stiles,
Rickard, Kharde, Posey & Martois, 1990, p. 5)
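
The quoted passage does not say which IRT model underlies the CASAS item
bank. Purely as an illustration of what it means for items to have
difficulty levels on a common scale, the following minimal Python sketch uses
the one-parameter (Rasch) model. The function name and the sample ability and
difficulty values are assumptions made for this example, not CASAS parameters.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct answer under the Rasch (one-parameter) model.

    Ability and difficulty are expressed on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Illustrative values only: one test-taker facing an easy and a hard item.
p_easy = rasch_probability(ability=0.0, difficulty=-1.0)
p_hard = rasch_probability(ability=0.0, difficulty=1.0)
```

When ability equals difficulty the probability is 0.5, and it rises as the
item becomes easier relative to the test-taker.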

In some JOBS sites, both CASAS reading and math tests are used to determine
whether AFDC registrants are lacking in basic skills and thus are in need
of education services. For example, in the GAIN program in California (GAIN
became California's version of JOBS after passage of the JOBS legislation in

Registrants who lack a high school diploma or a GED, score below 215 on
either the reading or mathematics basic skills test or are not proficient
in English are determined by GAIN regulations to be in need of basic
education. ...A score lower than 215 on the reading or mathematics test
is a criterion used by the GAIN program to determine that individuals are
in need of basic education. According to CASAS, those who function below
215 are at low literacy levels and have difficulty pursuing programs or
jobs other than those that require only minimal literacy skills. (Martinson
& Friedlander, 1994, p. 8)
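
The criteria quoted above amount to a simple decision rule. The following is
a minimal Python sketch of that rule as we read it from the quotation; the
function name, argument names, and constant name are hypothetical and are not
part of any GAIN or CASAS specification.

```python
# Hypothetical helper: all names here are illustrative assumptions.
GAIN_CUTOFF = 215  # scores below this value indicate need for basic education


def needs_basic_education(has_diploma_or_ged, reading_score, math_score,
                          proficient_in_english):
    """Return True if any one of the quoted GAIN criteria applies."""
    return (
        not has_diploma_or_ged
        or reading_score < GAIN_CUTOFF
        or math_score < GAIN_CUTOFF
        or not proficient_in_english
    )
```

Under this reading, a registrant with a diploma, English proficiency, and
scores of exactly 215 on both tests would not be flagged, since the criterion
is a score below 215.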

Though both the CASAS reading and math tests are used in some JOBS sites
(for example throughout California in the GAIN program), our study focuses
on the CASAS reading test titled the "GAIN Appraisal Reading Test
(Form 2)" (hereafter we refer to this instrument as the CASAS Reading test).1
This 15-page test contains 30 multiple-choice items and takes 30 minutes to
administer.  Items on the test require test-takers to answer questions
related to filling out a job application and an employee injury report;
interpreting a graph, a portion of an employee handbook, a performance
appraisal form and a picture of filing cabinets; applying for a social
security card number; and reading job ads, a table of contents, a work
experience record, and articles about job promotion and income tax.
According to the CASAS content specifications for this instrument, 23 of the
30 items deal with "employment" life skills competencies, three deal with
"government and law", two with "consumer economics" and two with "community
resources" life skills.2

The TALS Document Literacy Test was developed by the Educational Testing
Service (ETS). The TALS tests grew out of the 1986 survey of young adult
literacy conducted by ETS as part of the National Assessment of Educational
Progress. More recently TALS instruments have also been used in a national
study funded by the U.S. Department of Labor (Kirsch & Jungeblut, 1992) and
in the National Adult Literacy Survey (Kolstad, 1993).  Like the CASAS tests,
the TALS instruments have been developed using Item Response Theory, and
focus on a competency-based approach to assessment, rather than a grade-level
classification approach that had previously been used in many adult literacy
instruments. ETS tests of adult literacy have
covered three different aspects of literacy, namely those related to the
reading and interpretation of prose, of documents, and of text containing
quantitative information.  However, the TALS test used in the MDRC
evaluation of the JOBS program covered only document literacy.

1 We should note that this instrument has previously been referred to also as
the "GAIN 2 Appraisal Program Reading" test, but after checking with people
at MDRC and at CASAS, we have been told that "GAIN Appraisal Reading Test
(Form 2)" is the most appropriate full name for this instrument.
2 CASAS (n.d.), "Test Content by Item - Gain Reading and Math Appraisal,
Form 2."

The specific instrument employed is titled the "ETS Tests of Applied Literacy
Skills Form B Document Literacy" (hereafter referred to as the TALS Document
test).  This 15-page test has two parts, each taking 20 minutes to administer,
with 14 items in part 1 and 12 items in part 2, for a total of 26 items
administered in a total of 40 minutes.  The tasks embodied in the TALS
Document test involve reading and interpreting graphs, filling out a
savings bank withdrawal form, interpreting a page of telephone billing
information, and reading a map of a shuttle bus route.

In sum, the CASAS Reading test and the TALS Document test have many
similarities. Both were developed using IRT (which will be discussed more
fully in section 5), employ a competency-based approach to assessment,
involve similar kinds of "real-life" as opposed to academic tasks,
take approximately the same time to administer (30 versus 40 minutes),
and are of about the same length (each is 15 pages, but with the CASAS
test having 30 items and the TALS 26).  At the same time, these two
instruments have significant differences. While the CASAS Reading test is
entirely multiple-choice in format, the TALS Document test is entirely
short-answer fill-in in format. Also, while the CASAS test was created to
be used mainly with adults functioning below the level of high school
completion, the TALS test was developed to be used with a broader
population of adults.  As we explain in the next section, these differences,
though seemingly minor, have important
consequences for an effort such as ours to equate scores on the two

One Best Calibration Table?

Given these results, is it possible to derive one best calibration table
showing the relationship between CASAS raw scores and TALS scaled scores?
Our answer to this question is an equivocal yes.

The answer is equivocal for the simple reason that before deciding upon the
merits of one equating strategy versus others, one must consider not just
the abstract characteristics of statistical distributions such as those
discussed above, but the purpose of equating.  Our understanding of MDRC's
interest in the results of this equating study is that results are to be
used not to make decisions about individual examinees, but instead to
allow comparisons among groups of examinees tested with the two instruments
in different JOBS sites.  Given this purpose, it obviously would have been
preferable to have equating data from more than a single JOBS site,
but our comparison of the distribution of CASAS scores in the equating
study sample with scores of broader GAIN/JOBS samples in California at least
suggests the plausibility of generalizing our results to broader populations
of examinees in California.

Therefore we proceeded to construct a single calibration table as follows.
Each of the equating strategies employed has some strengths.  For some
purposes one approach might be preferred over others.  However, for
the broad analytical purposes that MDRC apparently has in mind for the
results of this study, we think that greater weight ought to be placed on
the convergence of results across the four equating methods employed.

7 SITES Riverside

  CASAS        CASAS       TALS           TALS               LOW LITERACY
  SCALED        RAW        SCALED
  SCORE        SCORE       SCORE          LEVEL                  SCORE
                        (NEAREST 10)
     .          .            .             .                       .
   174          1           120      LEVEL 1:120-225    1:LOW LITERACY SCORE
   182          2           120      LEVEL 1:120-225    1:LOW LITERACY SCORE
   187          3           130      LEVEL 1:120-225    1:LOW LITERACY SCORE
   191          4           140      LEVEL 1:120-225    1:LOW LITERACY SCORE
   193          5           160      LEVEL 1:120-225    1:LOW LITERACY SCORE
   194          5           160      LEVEL 1:120-225    1:LOW LITERACY SCORE
   196          6           160      LEVEL 1:120-225    1:LOW LITERACY SCORE
   198          7           170      LEVEL 1:120-225    1:LOW LITERACY SCORE
   199          7           170      LEVEL 1:120-225    1:LOW LITERACY SCORE
   201          8           180      LEVEL 1:120-225    1:LOW LITERACY SCORE
   203          9           180      LEVEL 1:120-225    1:LOW LITERACY SCORE
   205         10           190      LEVEL 1:120-225    1:LOW LITERACY SCORE
   207         11           200      LEVEL 1:120-225    1:LOW LITERACY SCORE
   209         12           210      LEVEL 1:120-225    1:LOW LITERACY SCORE
   210         13           210      LEVEL 1:120-225    1:LOW LITERACY SCORE
   211         13           210      LEVEL 1:120-225    1:LOW LITERACY SCORE
   212         14           220      LEVEL 1:120-225    1:LOW LITERACY SCORE
   214         15           230      LEVEL 2:226-275    1:LOW LITERACY SCORE
   216         16           230      LEVEL 2:226-275                       0
   217         17           230      LEVEL 2:226-275                       0
   218         17           230      LEVEL 2:226-275                       0
   219         18           240      LEVEL 2:226-275                       0
   220         18           240      LEVEL 2:226-275                       0
   221         19           250      LEVEL 2:226-275                       0
   223         20           260      LEVEL 2:226-275                       0
   225         21           260      LEVEL 2:226-275                       0
   227         22           270      LEVEL 2:226-275                       0
   229         23           270      LEVEL 2:226-275                       0
   231         24           280      LEVEL 3:276-325                       0
   232         24           280      LEVEL 3:276-325                       0
   234         25           290      LEVEL 3:276-325                       0
   237         26           300      LEVEL 3:276-325                       0
   240         27           310      LEVEL 3:276-325                       0
   241         27           310      LEVEL 3:276-325                       0
   245         28           330      LEVEL 4:326-375                       0
   246         28           330      LEVEL 4:326-375                       0
   253         29           350      LEVEL 4:326-375                       0
   254         30           370      LEVEL 4:326-375                       0

*Denotes two cases for which analytical and judgmental summaries yielded
different TALS scaled scores.
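
The table above can be used as a direct lookup. The following minimal Python
sketch transcribes it, on the assumption that the second column is the CASAS
raw score and the third column is the TALS scaled score to the nearest 10;
the level bands are those printed in the table. All identifiers are
hypothetical names chosen for this illustration.

```python
# Values transcribed from the table above. The column interpretation
# (CASAS raw score -> TALS scaled score, nearest 10) is an assumption.
CASAS_RAW_TO_TALS = {
    1: 120, 2: 120, 3: 130, 4: 140, 5: 160, 6: 160, 7: 170, 8: 180,
    9: 180, 10: 190, 11: 200, 12: 210, 13: 210, 14: 220, 15: 230,
    16: 230, 17: 230, 18: 240, 19: 250, 20: 260, 21: 260, 22: 270,
    23: 270, 24: 280, 25: 290, 26: 300, 27: 310, 28: 330, 29: 350,
    30: 370,
}

# Level bands as printed in the table: (level, low, high), inclusive.
TALS_LEVELS = [(1, 120, 225), (2, 226, 275), (3, 276, 325), (4, 326, 375)]


def tals_level(tals_score):
    """Return the level whose band contains tals_score, or None."""
    for level, low, high in TALS_LEVELS:
        if low <= tals_score <= high:
            return level
    return None


def calibrate(casas_raw):
    """Return (TALS scaled score, TALS level) for a CASAS raw score."""
    tals = CASAS_RAW_TO_TALS[casas_raw]
    return tals, tals_level(tals)
```

A raw score of 0 has no numeric entry in the table, so `calibrate(0)` raises
a `KeyError` in this sketch rather than guessing a value.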

figure 6.1 and tried to construct a final table calibrating CASAS raw scores
with TALS scaled scores.
One adopted an analytical approach, calculating the mean of results across
the four equating methods, for each CASAS raw score to the nearest 1.0, 5.0
and 10.0 points on the TALS scaled score scale.  The other member of
our team adopted a judgmental approach.  Starting with the presumption that
a final equating table ought to include values of TALS scores that are
actually reported (that is, only multiples of 10), and with the observation
that the Rasch results tended to be too high at the lower end of
the scale, this analyst derived the results shown in Table 6.4.
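
The analytical approach described above can be sketched in a few lines of
Python. This is an illustration under stated assumptions, not the procedure
actually used: the four input values are made up, the function names are
hypothetical, and ties are rounded upward because the text does not say how
ties were handled.

```python
import math


def round_to_step(value, step):
    """Round value to the nearest multiple of step (ties go up: an assumption)."""
    return step * math.floor(value / step + 0.5)


def analytical_summary(method_results, steps=(1, 5, 10)):
    """Average the TALS equivalents from several equating methods, then
    round the mean to each requested step on the TALS scale."""
    mean = sum(method_results) / len(method_results)
    return {step: round_to_step(mean, step) for step in steps}


# Made-up TALS equivalents from four methods for a single CASAS raw score.
example = analytical_summary([228.0, 233.0, 236.0, 241.0])
```

Here the mean is 234.5, which this sketch rounds to 235 at the nearest 1 and
nearest 5, and to 230 at the nearest 10.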

	Despite the differences in these two independent approaches to summarizing
our four different methods of equating CASAS and TALS (when the analytical
summary results are rounded to the nearest 10), the results shown in
Table 6.4 are remarkably similar.  For 31 possible CASAS raw scores
(0-30), the two approaches yield identical TALS scaled scores (assuming
rounding to the nearest 10) in all but two cases.  The two differences
occurred for CASAS raw scores of five and eight.  This outcome is an
indirect reflection of the point made previously, namely that the small
number of persons in the equating study sample scoring in the low end
of the CASAS raw score scale makes equating results highly sensitive to
the assumptions implicit in the different equating methods,
and hence to assumptions made about the merits of the different
equating methods.
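
The comparison of the two summaries is a simple tally. The minimal Python
sketch below shows the idea with toy values for three raw scores only; these
values are NOT the report's figures, and the function and variable names are
hypothetical. The agreement rate at the end uses only the counts stated in
the text (two differences among 31 possible raw scores).

```python
def disagreements(summary_a, summary_b):
    """Return the raw scores whose TALS equivalents differ between summaries."""
    return sorted(raw for raw in summary_a if summary_a[raw] != summary_b[raw])


# Toy values for illustration only; not taken from Table 6.4.
analytical = {4: 140, 5: 150, 6: 160}
judgmental = {4: 140, 5: 160, 6: 160}
differing = disagreements(analytical, judgmental)

# Agreement rate implied by the text: 29 of 31 raw scores, roughly 94 percent.
agreement_rate = (31 - 2) / 31
```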

	This leads us to three concluding points.  First, results shown in
Table 6.4 for the top two thirds of the CASAS raw score scale (above
raw scores of 12) are surely much more trustworthy than results for lower
CASAS raw scores.  Second, the fact that two of the authors -- who had
been working together for several months on this study -- came up with
slightly different summaries of four different sets of equating results,
amply illustrates the role
of qualitative judgment as opposed to simple quantitative analysis in the
art of test equating.  And finally, this result clearly indicates why this
inquiry, and hence the title of the report -- despite frequent reference to
methods of test equating -- is best thought of as an exercise in test
calibration.  In its general meaning, calibration means graduation of
a gauge while making allowances for irregularities.  Making allowance
for irregularities can never be reduced completely to rules. It requires
considerable judgment.  And our judgment is that while the CASAS and
TALS tests can be reasonably well calibrated, they cannot be directly