Frank M. Gresham, University of California-Riverside
Learning Disabilities Summit: Building a Foundation for the Future White Papers
The notion of alternative responsiveness to intervention is not a new concept in the fields of education and psychology. In his presidential address to the American Psychological Association, Cronbach (1957) called for the integration of correlational and experimental disciplines of scientific psychology by using the concept of aptitude × treatment interactions (ATIs). ATI research focuses on the measurement of valid aptitudes (characteristics or traits) and how these aptitudes interact with various treatments (instructional methods or types of therapy). ATI research originally attempted to provide a hybrid science spliced from the study of individual differences (aptitudes) and experimental psychology (treatments). Interactions occur when treatments or instructional methods have different effects on persons known to differ in measured aptitudes or characteristics.
Cronbach and Snow (1977) defined an aptitude as any characteristic of a person that predicts the probability of success under a particular treatment condition. These characteristics or aptitudes theoretically can be anything ranging from test-derived aptitudes (verbal-spatial, fluid-crystallized, field dependent-independent) to physical variables (right versus left hemispheric functioning, temporal versus frontal lobe damage). Treatments are defined as any manipulable variable such as instructional method, type of psychotherapy, classroom climate, and so on.
The fundamental logic of ATIs is the matching of instructional treatments to aptitudes. The basic rationale for this matching is based on the belief that learners having strengths in some aptitudes will respond better to treatments capitalizing on these aptitude strengths. Whereas Cronbach and Snow (1977) suggested that aptitudes and treatments could be matched in several ways (capitalization, compensation, and remediation), most ATI matching studies have been based on capitalization, which adapts instruction to the abilities of the student. For example, students high in verbal comprehension might be expected to learn more under verbal instruction rather than visual instruction.
At its most basic level, an ATI study must have at least two aptitudes and two treatments and thus four data points. For example, one could use scores from the Wechsler Intelligence Scale for Children III (WISC-III) Verbal (Verbal Comprehension) and Performance (Perceptual Organization) scales to define Verbal and Visual learners, respectively. These scores would represent two aptitudes. One could also use phonics and whole-word approaches to reading instruction to define two treatments. To demonstrate an ATI, one could show that Verbal learners respond better to phonics instruction than Visual learners and Visual learners respond better to whole-word instruction than Verbal learners. This example is a disordinal ATI and this logic is employed most frequently by school psychologists and special educators to make instructional recommendations based on cognitive ability or aptitude measures (Gresham & Witt, 1997; Reschly & Ysseldyke, 1995). In an ordinal ATI, there is a larger effect on one treatment for one aptitude, but no differences between the two aptitude groups for the other treatment. For instance, phonics may be more effective for Verbal learners with no differences between Verbal and Visual learners using the whole-word treatment.
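The 2 × 2 logic of ordinal and disordinal ATIs can be sketched in a few lines of Python. This is only a descriptive illustration on four cell means, not a significance test. The function name `classify_ati`, the tolerance `tol`, and all outcome values are hypothetical and are not drawn from any study cited here.

```python
def classify_ati(cell_means, tol=1e-9):
    """Describe the interaction pattern in a 2 x 2 aptitude-by-treatment design.

    cell_means maps (aptitude, treatment) -> mean outcome for that cell.
    Returns "none", "ordinal", or "disordinal".
    """
    aptitudes = sorted({a for a, _ in cell_means})
    treatments = sorted({t for _, t in cell_means})
    if len(aptitudes) != 2 or len(treatments) != 2 or len(cell_means) != 4:
        raise ValueError("expected exactly two aptitudes and two treatments")
    a1, a2 = aptitudes
    t1, t2 = treatments
    # Difference between the two aptitude groups within each treatment.
    gap_t1 = cell_means[(a1, t1)] - cell_means[(a2, t1)]
    gap_t2 = cell_means[(a1, t2)] - cell_means[(a2, t2)]
    # The interaction contrast is the difference between those differences.
    interaction = gap_t1 - gap_t2
    if abs(interaction) <= tol:
        return "none"
    # Disordinal: the ordering of the aptitude groups reverses across treatments.
    if gap_t1 * gap_t2 < 0:
        return "disordinal"
    # Ordinal: the gap changes size (possibly to zero) but never reverses.
    return "ordinal"


# Hypothetical mean reading outcomes, for illustration only.
disordinal_example = {
    ("verbal", "phonics"): 60, ("visual", "phonics"): 50,
    ("verbal", "whole_word"): 50, ("visual", "whole_word"): 60,
}
ordinal_example = {
    ("verbal", "phonics"): 60, ("visual", "phonics"): 50,
    ("verbal", "whole_word"): 50, ("visual", "whole_word"): 50,
}
parallel_example = {
    ("verbal", "phonics"): 60, ("visual", "phonics"): 50,
    ("verbal", "whole_word"): 55, ("visual", "whole_word"): 45,
}

print(classify_ati(disordinal_example))
print(classify_ati(ordinal_example))
print(classify_ati(parallel_example))
```

In an actual ATI study the interaction contrast would be tested against sampling error; the sketch only labels the pattern of means.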
From a logical perspective, we have every reason to expect that many ATIs exist in teaching students with LD. Ostensibly, "verbal" learners should learn more efficiently and effectively under verbal instruction and "visual" learners should learn more efficiently and effectively under visual instruction. Unfortunately, there is little empirical support for the differential prescription of treatments based on different abilities or aptitudes like these and others found in the literature. This lack of support continues to surprise many professionals who interpret test results and recommend treatments based on the presumption of largely mythical ATIs (Gresham & Witt, 1997; Reschly & Ysseldyke, 1995).
A comprehensive review of the ATI research literature is far beyond the scope of the current paper; however, a number of reviews of this literature point to the infeasibility of matching aptitudes to treatments for children with LD or other learning difficulties. Comprehensive reviews of the modality matching literature (Arter & Jenkins, 1979; Kavale & Forness, 1987, 1995; Ysseldyke & Mirkin, 1982) fail to consistently show significant ATIs. Studies and reviews conducted within the cognitive style/processing literature likewise fail to consistently demonstrate ATIs (Ayres & Cooley, 1986; Ayres, Cooley, & Severson, 1988; Das, 1995; Das, Naglieri, & Kirby, 1995; Good, Vollmer, Creek, Katz, & Chowdhri, 1993).
Finally, the use of a neuropsychological model within ATI research focuses on inferred brain strengths or functioning. For instance, a child having left hemispheric strength might be presumed to learn more efficiently using methods that capitalize on this strength (e.g., phonics, verbally presented material) whereas children with right hemispheric strengths might perform better using other methods (e.g., holistic, visually presented material). Despite the proliferation of this ATI logic in the neuropsychological literature (see D'Amato, Rothlisberg, & Work, 1999; Hynd, 1989; Reynolds & Fletcher-Janzen, 1989), I was unable to locate a single, methodologically sound empirical study demonstrating a significant ATI based on neuropsychological assessment, interpretation, and treatment with children having mild learning problems. In fact, reviews by Reschly and Gresham (1989) and Teeter (1987, 1989) question the entire enterprise of applying ATI logic in neuropsychological assessment practices to children with mild learning problems.
Considering the disappointing results of ATI studies using modality matching, cognitive style/processing, and neuropsychological assessment, there is little, if any, empirical support for prescribing different treatments based on the assessment of different aptitudes. Cronbach (1975) expressed his frustration with ATI research by stating: "Once we attend to interactions, we enter a hall of mirrors that extends to infinity" (p. 119). Abandoning the quest for ATIs, Cronbach (1975) suggested context-specific evaluation and short-run empiricism: "One monitors responses to treatment and adjusts it" (p. 126). The approach recommended by Cronbach forms the conceptual basis for responsiveness to treatment as the criterion in making LD eligibility determinations. Yet before describing specific research using this approach for students with LD, I provide a conceptual basis for responsiveness to intervention in the following section.
Responsiveness to intervention can be defined as the change in behavior or performance as a function of an intervention (Gresham, 1991). The concept of responsiveness to intervention uses a discrepancy-based approach; however, the discrepancy is between pre- and postintervention levels of performance rather than between ability and achievement scores. Given that a goal of all interventions is to produce a discrepancy between baseline and postintervention levels of performance, the failure to produce such a discrepancy within a reasonable period (an inadequate response to intervention) might be taken as partial evidence for the presence of an LD. Responsiveness to intervention has received a great deal of attention over the past 25 years in the experimental analysis of behavior literature (see Nevin, 1988, 1996 for comprehensive reviews).
In an analogy to Newtonian physics, Nevin (1988) used the term behavioral momentum to explain a behavior's resistance to change. That is, a moving body possesses both mass and velocity and will maintain constant velocity under constant conditions. The velocity of an object will change only in proportion to an external force and in inverse proportion to its mass. Considering the momentum metaphor, an effective intervention ("force") results in a high level of momentum ("responsiveness") for the behavior in question (e.g., learning to read).
For example, a reading intervention designed to produce oral reading fluency would be considered successful if it produced reading fluency rapidly and reliably during intervention and if reading fluency persisted after the intervention was withdrawn. In contrast, if oral reading fluency deteriorated after the intervention was withdrawn, teachers would not be satisfied with the rate of oral reading fluency no matter how well a student read during intervention. Also, if oral reading performances occurred at low rates with numerous errors (omissions, substitutions) during intervention, teachers would likely conclude that the student had not established automaticity in oral reading and would seek to extend, intensify, or change the reading instruction.
In the field of LD, the goal for all students is to facilitate the momentum of academic performances, primarily in reading. One can conceptualize response to intervention as being determined by response strength ("momentum") in relation to an intervention implemented to change behavior ("external force"). Most children at risk for LD exhibit poor performances in the area of reading (e.g., poor fluency, lack of phonological awareness). That is, their reading behavior has low velocity, which does not change when they are exposed to typical reading instruction in the general education classroom. A response to intervention approach to eligibility determination identifies students as having an LD if their academic performances in relevant areas do not change in response to a validated intervention implemented with integrity.
As we shall see later, much sound empirical work has been done on the idea of identifying treatment-adequate and -inadequate responders to intervention in the field of reading disabilities (Fuchs, Fuchs, & Hamlett, 1989a; Vellutino et al., 1996, 1998). The following section describes the concept of treatment validity and how it can be incorporated into the notion of responsiveness to intervention.
Treatment validity (sometimes referred to as treatment or instructional utility) is the degree to which any assessment procedure contributes to beneficial outcomes for individuals (Cone, 1989; Hayes, Nelson, & Jarrett, 1987). Although the concept of treatment validity evolved from the behavioral assessment camp, it shares several characteristics and concepts found in the traditional psychometric literature: (a) treatment validity contains an aspect of Sechrest's (1963) notion of incremental validity in that it requires an assessment procedure to improve prediction over and above existing procedures; (b) treatment validity contains the idea of utility and cost-benefit analysis that is common in the personnel selection literature (Mischel, 1968; Wiggins, 1973); and (c) treatment validity is related to Messick's (1995) evidential basis for test interpretation and use, particularly as it relates to construct validity, relevance/utility, and social consequences. It is possible for a particular test interpretation to have construct validity, but have little or no relevance or utility for a particular use of that test (e.g., recommendations for treatments based on the test). Finally, as previously noted, the whole idea behind ATI research is based on the notion of treatment validity, the matching of instructional treatments to aptitudes.
The ATI literature on modality matching, cognitive style/processing, and neuropsychological assessment provides little evidence that the information gathered about aptitudes results in "incremental advance information" that helps in recommending instructional interventions for students with learning difficulties. More than 15 years ago in a review in the Buros Mental Measurements Yearbook of the Wechsler Intelligence Scale for Children-Revised (WISC-R), Witt and Gresham (1985) wrote: "The WISC-R lacks treatment validity in that its use does not enhance remedial interventions for children who show specific academic skill deficiencies... For a test to have treatment validity, it must lead to better treatments (i.e., better educational programs, teaching strategies, etc.)" (p. 1717). This statement could be extended to all cognitive ability measures based primarily on their inability to inform or guide instructional interventions (Gresham & Witt, 1997; Reschly & Grimes, 1995). Voicing a similar sentiment regarding using IQ tests in the diagnosis of reading disability, Share, McGee, and Silva (1989) argued:
It may be timely to formulate a concept of reading disability that is independent of IQ. Unless it can be shown to have some predictive value for the nature of treatment outcomes, consideration of IQ should be discarded in discussions of reading difficulties. (p. 100, emphasis added)
In describing the value of using a treatment validity criterion in the field of LD, Fuchs and Fuchs (1998) suggested that this approach focuses on maximizing regular education's potential effectiveness for all students. Judgment about the need for special education is reserved until the effects of instructional adaptations have been assessed in the regular classroom and data verify that a special education program would enhance learning. One promising assessment approach that meets the treatment validity criterion and can be used to make eligibility decisions is curriculum-based measurement (CBM) (Fuchs & Fuchs, 1997, 1998; Reschly & Grimes, 1995; Shinn, 1995).
There is a great deal of empirical support for adopting a treatment validity approach rather than a discrepancy-based approach to defining LD (Clay, 1987; Foorman, Francis, Fletcher, Schatschneider, & Mehta, 1998; Fuchs & Fuchs, 1997, 1998; Torgesen et al., 2001; Vellutino, Scanlon, & Lyon, 2000; Vellutino et al., 1996, 1998). Vellutino et al. (1996) noted that the discrepancy approach to defining LD does not screen out those children whose reading difficulties might be due to either inadequate schooling or limited exposure to effective reading instruction. Instead, Vellutino et al. argued for using exposure to intensive reading instruction as a "first-cut" diagnostic aid in distinguishing between reading problems caused by cognitive deficits versus those caused by experiential deficits (poor or inadequate reading instruction).
Vellutino et al. (1996) conducted a longitudinal study of 183 kindergarten children composed of poor readers (n = 118) and normal reader controls (n = 65). Poor readers were selected on the basis of scoring below the 15th percentile on measures of word identification or letter-sound correspondences using nonsense words. Children in the poor reader group (a subsample of 74 children) were given daily one-to-one tutoring (30 minutes per day) for a total of 15 weeks over 70-80 sessions (35-40 hours of tutoring). Using hierarchical linear regression analyses, Vellutino et al. calculated growth rates for each child from kindergarten to second grade. Slopes from these analyses were rank-ordered and used to place children into 1 of 4 groups: Very Limited Growth (VLG), Limited Growth (LG), Good Growth (GG), and Very Good Growth (VGG). Approximately half of the sample showed VLG (26%) or LG (24%).
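The general idea of rank-ordering growth slopes into four groups can be illustrated with a short sketch. It is not a reproduction of the Vellutino et al. analysis: it substitutes an ordinary least-squares slope per child for their hierarchical models, and it assumes equal-sized rank groups, which the source does not specify. The function names and all scores are hypothetical.

```python
def ols_slope(times, scores):
    """Ordinary least-squares slope of scores regressed on times."""
    n = len(times)
    if n < 2 or n != len(scores):
        raise ValueError("need at least two paired observations")
    mean_t = sum(times) / n
    mean_s = sum(scores) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, scores))
    return sxy / sxx


def assign_growth_groups(slopes):
    """Rank children by slope and split the ranking into four groups.

    Equal-sized groups are an assumption made for this illustration.
    """
    labels = ["VLG", "LG", "GG", "VGG"]
    ordered = sorted(slopes, key=slopes.get)  # lowest growth first
    n = len(ordered)
    return {child: labels[rank * 4 // n] for rank, child in enumerate(ordered)}


# Hypothetical scores at four measurement occasions for eight children.
times = [0, 1, 2, 3]
children = {f"child{i}": [10 + (i + 1) * t for t in times] for i in range(8)}
slopes = {name: ols_slope(times, scores) for name, scores in children.items()}
groups = assign_growth_groups(slopes)

for name in sorted(groups):
    print(name, round(slopes[name], 2), groups[name])
```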
If one accepts the proposition that "difficult to remediate" children can be considered LD and easily remediated children are not LD, then the entire questionable process of calculating ability-achievement discrepancies can be summarily abandoned. Vellutino et al. (1996, 2000) showed that IQ-achievement discrepancy scores did not reliably distinguish between disabled and nondisabled readers, did not distinguish between difficult-to-remediate (VLG and LG) and readily-remediated (VGG and GG) students, and did not predict response to remediation. In short, IQ-achievement discrepancy scores did not have treatment validity.
Adopting a treatment validity approach to the identification of students with LD has several technical requirements. These requirements include (a) ability of measures to model academic growth (Burchinal, Bailey, & Snyder, 1994; Fuchs & Fuchs, 1997, 1998; Vellutino, Scanlon, & Tanzman, 1998; Vellutino et al., 1996), (b) availability of validated treatment protocols (Berninger & Abbott, 1994; Torgesen et al., 2001), (c) capability of distinguishing between ineffective instruction and unacceptable individual learning (Fuchs & Fuchs, 1997, 1998), (d) suitability in informing instructional decisions (Fuchs & Fuchs, 1997, 1998; Vellutino et al., 1996, 1998; Witt & Gresham, 1997), and (e) sensitivity to detection of treatment effects (Fuchs & Fuchs, 1997; Marston, Fuchs, & Deno, 1986; Marston, 1987-88; Vellutino et al., 1996, 1998). Each of these requirements for treatment validity will be described in the following sections.
All intervention investigations attempt to determine whether a change in a dependent variable is due to systematic and controlled changes in an independent (treatment) variable. Traditionally, this question has been addressed using a pretest/posttest design in which an experimental (treatment) and a control group are measured before and immediately after intervention. The effects of treatment in such designs are evaluated by comparing pretest and posttest scores using either repeated measures analysis of variance (ANOVA) or analysis of covariance (ANCOVA, using pretest scores as covariates), or by computing simple differences for groups between pretest and posttest scores (Kirk, 1994). Although these types of analyses can tell us whether or not a given treatment produced mean differences on a dependent variable relative to a control group, these analyses do not supply enough data to model individual change adequately (Burchinal et al., 1994).
A viable alternative to traditional pretest/posttest design comparisons is the use of growth curve analysis (GCA) using hierarchical linear models as a means of modeling academic growth. GCA is used to address three fundamental research questions (Bryk & Raudenbush, 1987; Burchinal et al., 1994). First, GCA is used to determine patterns of change for both individuals and groups. A common example is physicians charting height and weight of children to assess whether or not a child is displaying adequate growth compared to a matched reference group. Second, GCA is used to determine if certain groups show different patterns of change over time. For example, children exposed to a reading intervention emphasizing phonological awareness might be compared to a similar group of children receiving a reading program focusing on orthographic skills. Comparisons between these two groups would be expressed in terms of differences in rate (slopes) and level (intercepts) of change. Third, GCA is often used to study the correlates of change. For instance, a researcher might be interested in contrasting the patterns of change for LD and low-achieving (LA) groups who receive the same reading intervention. In addition, the researcher may want to assess whether background characteristics (e.g., gender, ethnicity, socioeconomic status, IQ) moderate these patterns of change over time.
Several assumptions must be met in using GCA to model academic growth (Bryk & Raudenbush, 1987; Burchinal et al., 1994): (1) growth parameters are assumed to be normally distributed and measured on either an interval or ratio scale; (2) dependent measures are expressed in the same units of measurement over time; (3) structure of the dependent variable does not change over time; (4) each group being compared has homogeneous variances (homogeneity of variance); and (5) an adequate model of change, whether it be linear, quadratic, or cubic, has been selected and fit to the data to model patterns of growth. It should be noted that GCA does not require the same data collection design for each participant in a study; that is, some individuals may be measured 4 times, others 6 times, and still others 8 times. Moreover, spacing between data collection points for each individual does not need to be equal. In short, GCA allows for a broader representation of the effects of an intervention on growth and is extremely flexible with respect to the number and timing of observations across research participants (Bryk & Raudenbush, 1987; Burchinal et al., 1994).
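The flexibility in the number and timing of observations can be illustrated with a simplified two-stage sketch: a separate straight line is fit to each participant's own measurement schedule, and the individual intercepts and slopes are then averaged. This is a rough stand-in for a hierarchical linear model, not an implementation of one, since it ignores differences in the precision of each individual estimate. It assumes a linear model of change, and the names and data are hypothetical.

```python
def fit_line(times, scores):
    """Least-squares (intercept, slope) for one participant's observations."""
    n = len(times)
    if n < 2 or n != len(scores):
        raise ValueError("need at least two paired observations")
    mean_t = sum(times) / n
    mean_s = sum(scores) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    slope = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, scores)) / sxx
    return mean_s - slope * mean_t, slope


# Hypothetical participants measured 4, 6, and 8 times at unequal intervals
# (time in weeks since baseline, score in arbitrary units).
records = {
    "a": ([0, 2, 5, 9], [20.0, 23.0, 27.5, 33.5]),
    "b": ([0, 1, 3, 4, 7, 10], [15, 17, 21, 23, 29, 35]),
    "c": ([0, 1, 2, 4, 5, 6, 8, 10], [18, 19, 20, 22, 23, 24, 26, 28]),
}

# Stage 1: one growth line per participant, each on its own schedule.
fits = {pid: fit_line(t, s) for pid, (t, s) in records.items()}

# Stage 2: summarize level (intercept) and rate (slope) across participants.
mean_intercept = sum(b0 for b0, _ in fits.values()) / len(fits)
mean_slope = sum(b1 for _, b1 in fits.values()) / len(fits)

print(fits)
print(round(mean_intercept, 2), round(mean_slope, 2))
```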
Fuchs and Fuchs (1998) describe the use of CBM as a promising measurement tool for modeling academic growth within the special education eligibility determination process. CBM meets many of the assumptions of GCA in that it provides equal scaling of the dependent variable for individuals over time, it measures the dependent variable on an interval scale, and the structure of the dependent variable remains constant over time. Use of the CBM model in LD eligibility determination will be described in detail in a subsequent section of this paper.
In order to adopt a responsiveness-to-intervention approach, validated treatment protocols must be implemented for students who might be considered learning disabled. Within both the general education and special education classroom, this may be a daunting task. For example, general education teachers often are not prepared to deal with the normal variation among students in the acquisition of reading and writing skills (Berninger, Hart, Abbott, & Karovsky, 1992). Moreover, a survey of state departments of education revealed that only 29 states require elementary teachers to take academic coursework in reading and no states require coursework in writing (Nolen, McCutchen, & Berninger, 1990). Many students classified as LD may fail to acquire basic academic skills not because of some underlying processing disorder, but rather because they have not been given adequate opportunities to learn. There is ample reason to believe that most reading difficulties (and children subsequently labeled as LD) are caused by woefully inadequate preliteracy experience, inadequate instruction, or some combination of both (Vellutino et al., 1996, 1998).
A number of validated treatment protocols can be used to differentiate adequate from inadequate treatment responders. Recently, Torgesen et al. (2001) compared two carefully designed instructional approaches to facilitate academic growth in reading for 8- to 10-year-old children. One intervention was the Auditory Discrimination in Depth (ADD) program that emphasized discriminations among phonemes, monitoring/representation of sound sequences in spoken syllables, and self-monitoring skills (Lindamood & Lindamood, 1998). The second intervention was Embedded Phonics (EP), which provided direct, explicit instruction in word-level reading skills and extensive opportunities to read and write meaningful text (Torgesen et al., 2001). The ADD and EP programs differed in depth and extent of instruction in phonemic awareness and phonemic decoding skills. Both the ADD and EP programs were provided to students on a 1:1 basis, in two 50-minute sessions, 5 days per week for 8-9 weeks, and students were assessed at 1- and 2-year follow-ups. Hours of intensive reading instruction for the ADD and EP groups totaled 67.5. Following training, all students received 8 weeks of generalization training consisting of a single 50-minute session each week.
The results of the Torgesen et al. study showed that the ADD and EP programs were equally effective in remediating reading difficulties based on the Woodcock-Johnson Broad Reading Cluster score (slope effect sizes = 4.4 and 3.9, respectively). In fact, these interventions "normalized" the reading skills of approximately one half to two thirds of the students, depending on the outcome measure used. Scores on reading comprehension (Woodcock-Johnson Passage Comprehension) were even better with 80-85% of students performing in the average range. About 40% of the students in this investigation were returned full-time to the general education classroom and were no longer considered in need of special education. Torgesen et al. concluded:
...the similarities in growth rate of the ADD and EP conditions in our study suggest that given the right level of intensity and teacher skill, it is possible to obtain these rates of growth via a variety of approaches to direct instruction in reading. We might even suggest that these rates could serve as a benchmark for "reasonable progress" in reading for students receiving remedial instruction in both public and private settings... [T]hey are clearly much higher than is typically achieved in most current special education settings. (p. 52)
The Torgesen et al. investigation provides insight into how we might define inadequate responders based on the responsiveness-to-intervention concept. Approximately 25% of students in this investigation were nonresponders to the intensive reading interventions with mean standard scores of about 70 on Word Attack, Word Identification, and Comprehension. Similarly, the Vellutino et al. (1996) study described earlier suggested that approximately 25% of students exposed to an intensive reading intervention of 37.5 hours showed VLG on measures of word identification and phonological skills. In using this resistance-to-intervention notion to diagnose reading disabilities, Vellutino et al. stated:
...to render a diagnosis of specific reading disabilities in the absence of early and labor-intensive remedial reading that has been tailored to the child's individual needs is, at best, a hazardous and dubious enterprise, given all of the stereotypes attached to this diagnosis... [O]ne can increase the probability of validating the diagnosis if one combines impressions and outcomes derived from early, labor-intensive, and individualized remediation with results of relevant psychological and educational testing in evaluating the etiology of a child's difficulties in learning to read. (p. 632)
Additional information on what constitutes a validated treatment protocol can be found in a recent meta-analysis by Swanson and Hoskyn (1999) who summarized 180 intervention studies for students with LD. Interventions were classified into one of four categories: (a) Direct Instruction (DI), (b) Strategy Instruction (SI), (c) Combined DI+SI, and (d) non-DI/non-SI. Swanson and Hoskyn (1999) defined DI as interventions that used fast-paced instruction in small groups; presented well-sequenced, highly focused lessons; provided numerous opportunities to respond; gave frequent performance feedback on accuracy and responses; and used frequent on-topic questions regarding academic material (Engelmann & Carnine, 1991; Kame'enui, Jitendra, & Darch, 1995; Lovett, Borden, DeLuca, Lacerenza, Benson, & Brackstone, 1994; Slavin, 1987).
Studies were categorized as SI if they met the following three criteria: (a) They provided elaborate explanations of material (e.g., explanations, elaborations, and plans directing task performance), (b) they used modeling from teachers which included verbal modeling, questioning, and demonstration, and (c) they incorporated prompts or reminders or multiprocess instructions and dialogue between teachers and students (Borkowski & Turner, 1990; Graham & Harris, 1996; Levin, 1986; Pressley & Ghatala, 1990; Rosenshine, 1995). Finally, studies meeting both DI and SI criteria were categorized as Combined DI+SI and studies meeting neither of these criteria were classified as non-DI/non-SI.
On the basis of these 180 studies, a total of 1,537 effect sizes were calculated comparing LD students in the treatment groups with LD students in control groups. Overall, the mean effect size was 0.79 (SD = 0.52). Swanson and Hoskyn (1999) described the typical intervention study as including 22.47 minutes of daily instruction, 3.58 times per week, over 35.72 sessions. On average, students received 80 minutes per week over almost 10 weeks of intervention, or approximately 13.3 hours of instruction. With respect to the type of intervention, the Combined DI+SI group had greater effect sizes (M = 0.81) than the DI alone (M = 0.77), SI alone (M = 0.67), and non-DI/non-SI (M = 0.62) interventions. There were no significant differences among these latter three intervention groups. Interestingly, studies producing the largest effect sizes reported only minimal discrepancies between IQ and reading achievement (M = 0.95), which is consistent with the questionable value of the IQ-achievement discrepancy in predicting responsiveness to intervention described by Vellutino et al. (1998). Also, interventions were less effective with students having reading scores slightly higher than their IQ scores (reading scores > 90 and IQ 85-90).
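The arithmetic behind the typical intervention dose can be checked directly from the three reported averages. The sketch below is a simple verification; the function name `instruction_dose` is hypothetical, and the inputs are the figures quoted above.

```python
def instruction_dose(minutes_per_session, sessions_per_week, total_sessions):
    """Return (minutes per week, weeks of intervention, total hours)."""
    minutes_per_week = minutes_per_session * sessions_per_week
    weeks = total_sessions / sessions_per_week
    total_hours = minutes_per_session * total_sessions / 60
    return minutes_per_week, weeks, total_hours


# Averages reported for the typical study in the meta-analysis.
minutes_per_week, weeks, total_hours = instruction_dose(22.47, 3.58, 35.72)

print(round(minutes_per_week, 1))  # about 80 minutes per week
print(round(weeks, 1))             # almost 10 weeks
print(round(total_hours, 1))       # a little over 13 hours
```

The unrounded inputs give roughly 13.4 hours; rounding to 80 minutes per week over 10 weeks gives the 13.3 hours stated in the text.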
Swanson and Hoskyn's (1999) meta-analysis suggests that there are several validated intervention approaches in reading for students with LD with effect sizes from 0.58 to 0.81. The Combined DI+SI interventions produced a large effect size (0.81), which, assuming normally distributed scores, indicates that the average student in the intervention groups scored at or above roughly 80% of students in the control groups. This effect size, however, is substantially lower than those reported by Torgesen et al. (2001) and Vellutino et al. (1996). The lower effect sizes reported by Swanson and Hoskyn may be due, in part, to differences in the intensity of treatment. Torgesen et al. provided 67.5 hours of instruction over 8 weeks and Vellutino et al. provided 35-40 hours of instruction over 15 weeks. The prototypical intervention in the Swanson and Hoskyn meta-analysis provided only 13.3 hours of instruction over approximately 10 weeks. Regardless of these effect size differences, a substantial body of empirical research supports the validity of treatment protocols for remediating reading deficiencies of students with LD.
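One common way to translate a standardized mean difference into a percentage is Cohen's U3: the proportion of the control distribution that falls below the mean of the treatment group. The sketch below assumes normally distributed scores with equal variances in both groups, an assumption the source does not state. The function name `u3` is hypothetical.

```python
from statistics import NormalDist


def u3(effect_size):
    """Proportion of the control group scoring below the treatment-group mean.

    Assumes normal distributions with equal variances in both groups.
    """
    return NormalDist().cdf(effect_size)


# Mean effect sizes reported for the four intervention categories.
for label, d in [("DI+SI", 0.81), ("DI", 0.77), ("SI", 0.67), ("non-DI/non-SI", 0.62)]:
    print(f"{label}: d = {d:.2f}, U3 = {u3(d):.1%}")
```

Under these assumptions an effect size of 0.81 places the average treated student at about the 79th percentile of the control distribution.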
An important decision in using a responsiveness-to-intervention approach to defining LD is the differentiation of skill (acquisition) deficits from performance (motivational) deficits. Skill deficits refer to the absence of an academic skill in a student's repertoire ("can't do" problems) and performance deficits describe a lack of motivation to perform a given academic skill ("won't do" problems). Skill deficits most often result from inadequate, insufficient, or inappropriate instruction whereas performance deficits result from inadequate, insufficient, or inappropriate arrangement for contingencies for academic performance (Gresham, 1986; Lentz, 1988).
To determine which type of deficit a particular child has, Noell and Witt (1999) have suggested a straightforward process. First, a "test" for a performance deficit is conducted using CBM reading probes (i.e., 100-200-word passages) selected from a child's basal reader as well as two basal readers that immediately precede the current reader. The reading probes are administered under standard (nonreinforced) conditions and under conditions where a preferred reinforcer is given for reading above a prespecified criterion. If performance increases markedly under the reinforcement conditions, then the student is assumed to have a performance deficit rather than a skill deficit. If reinforcement does not markedly improve performance, the student is assumed to have a skill deficit because even under conditions of high motivation, the student still cannot perform the requisite reading skills.
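The decision rule in this process can be expressed as a small sketch. The source does not quantify what counts as a marked increase, so the 20% threshold used here (`min_gain`) is purely an assumption for illustration, as are the function name and the probe scores.

```python
def classify_deficit(standard_scores, reinforced_scores, min_gain=0.20):
    """Label a deficit as 'performance' or 'skill' from two sets of probe scores.

    min_gain is a hypothetical cutoff for a "marked" improvement, expressed
    as a proportion of the mean score under standard conditions.
    """
    if not standard_scores or not reinforced_scores:
        raise ValueError("both conditions need at least one probe score")
    standard_mean = sum(standard_scores) / len(standard_scores)
    reinforced_mean = sum(reinforced_scores) / len(reinforced_scores)
    if standard_mean <= 0:
        raise ValueError("mean score under standard conditions must be positive")
    gain = (reinforced_mean - standard_mean) / standard_mean
    # A marked gain under reinforcement suggests the skill is present
    # but is not performed without added motivation.
    return "performance" if gain >= min_gain else "skill"


# Hypothetical probe scores under standard and reinforced conditions.
print(classify_deficit([40, 42, 38], [60, 58, 62]))  # gain of 50%
print(classify_deficit([40, 42, 38], [41, 43, 39]))  # gain of 2.5%
```

In practice the cutoff would be set by the examiner or by local norms rather than fixed in advance.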
A number of examples in the applied behavior analysis literature have addressed the issue of skill versus performance deficits (Ayllon & Roberts, 1974; Daly & Martens, 1994; Daly, Martens, Dool, & Hintze, 1998; Daly, Martens, Hamler, Dool, & Eckert, 1999; Lovitt, Eaton, Kirkwood, & Pelander, 1971). For instance, Lovitt et al. (1971) gave incentives to improve students' oral reading fluency and to encourage them to read faster. A similar procedure was used by Daly et al. (1998, 1999). Another approach to assess academic performance deficits is to offer students a choice among reading materials or a choice in the order in which they will complete assignments (silent reading first, followed by vocabulary drill) (Daly, Witt, Martens, & Dool, 1997; Dunlap et al., 1994; Kern, Childs, Dunlap, Clarke, & Falk, 1994). If performance improves dramatically under choice conditions relative to no-choice conditions, then one can assume the student has a performance rather than a skill deficit.