

Responsiveness to Intervention: An Alternative Approach to the Identification of Learning Disabilities

Frank M. Gresham, University of California-Riverside
Learning Disabilities Summit: Building a Foundation for the Future White Papers



MODELS OF RESPONSIVENESS TO INTERVENTION

Several models of intervention might be considered in adopting the responsiveness-to-intervention approach to defining LD. These models include (a) predictor-criterion models, which target and teach the skills that best predict reading competence; (b) a dual-discrepancy model based on children's failure to respond to well-planned and well-implemented general education interventions; and (c) applied behavior analytic models, which manipulate antecedent and consequent environmental events to improve reading competence.

Predictor-Criterion Models

These models of intervention focus on the component skills or processes that best predict success in learning to read. Berninger and Abbott (1994) suggested that oral language skills (e.g., phonemic awareness, phonetic segmentation, rime) and orthographic skills (e.g., letter coding, letter-cluster coding, word recognition) are among the best predictors of reading. Criteria used to evaluate reading competence include reading accuracy, reading rate, and reading comprehension. Similarly, direct instruction models (e.g., Engelmann & Carnine, 1992; Kame'enui et al., 1995) and strategy training models (e.g., Graham & Harris, 1996; Levin, 1986; Pressley & Ghatala, 1990) focus on teaching the skills and strategies that best predict reading performance.

As reviewed previously, the reading intervention programs with the most empirical support combine direct instruction and strategy training (Swanson & Hoskyn, 1999). In addition, Torgesen et al. (2001) found strong and comparable effects for programs focusing primarily on phonemic awareness and phonemic decoding and for programs emphasizing the application of these skills in reading meaningful text. The intensity of this treatment may also have influenced outcomes: recall that these interventions were implemented for 67.5 hours over 8 weeks. Vellutino et al. (1996) used a similar intervention program that included a large strategy-training component; it lasted 30-40 hours over 15 weeks. By comparison, Swanson and Hoskyn's (1999) meta-analysis showed that the prototypical reading intervention lasted 13.3 hours over approximately 7 weeks.
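The argument about intensity rests on simple arithmetic. The sketch below converts each reported duration into average hours per week. It assumes that hours were spread evenly across weeks, which the studies may not have done, and the 7-week figure for the prototypical intervention is approximate.

```python
# Average weekly intensity of the interventions described above.
# Assumption: hours are spread evenly across weeks; the original studies
# may have scheduled sessions unevenly. The 7-week figure is approximate.
programs = {
    "Torgesen et al. (2001)": (67.5, 67.5, 8),
    "Vellutino et al. (1996)": (30.0, 40.0, 15),  # reported as 30-40 hours
    "Swanson & Hoskyn (1999) prototype": (13.3, 13.3, 7),
}


def hours_per_week(low_hours, high_hours, weeks):
    """Return the (low, high) average instructional hours per week."""
    return low_hours / weeks, high_hours / weeks


for name, (low, high, weeks) in programs.items():
    lo_rate, hi_rate = hours_per_week(low, high, weeks)
    if lo_rate == hi_rate:
        print(f"{name}: about {lo_rate:.1f} hours/week")
    else:
        print(f"{name}: about {lo_rate:.1f}-{hi_rate:.1f} hours/week")
```

On these figures, the Torgesen et al. program (about 8.4 hours per week) ran at more than three times the weekly intensity of either comparison (about 2.0-2.7 and 1.9 hours per week).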

Clearly, these models of intervention have produced rather strong effects with disabled readers. However, a key unresolved question is how they might be adopted within the LD eligibility process. The purpose of LD identification is to identify students who respond inadequately to a validated intervention after a reasonable period, not to remediate or "normalize" reading skills. It must therefore be determined what constitutes a "reasonable period" and how inadequate responsiveness should be judged. These issues are addressed in the final section of this paper.

Dual-Discrepancy Model

Fuchs and Fuchs (1997, 1998) have suggested using a CBM approach that measures a student's responsiveness (or lack thereof) to intervention delivered in the general education classroom. The logic behind using CBM to measure responsiveness to intervention is similar to that used in endocrinology, in which a child's growth over time is compared with that of a same-age group (Fuchs, 1995). A child who shows a large discrepancy between his or her height and that of a normative comparison group may be considered a candidate for certain types of medical intervention. In education, if a child shows a discrepancy between his or her current level of academic performance and that of same-age peers, then that child may be a candidate for special education. It should be noted, however, that a low-performing child whose growth rate is similar to that of peers in the same classroom would not be a candidate for special education, because the child is deriving similar educational benefits (low though they may be) from that classroom (Fuchs, 1995).

Fuchs and Fuchs (1998) proposed a reconceptualization of the LD identification process based on a treatment validity notion. In this approach, students are not classified as LD unless and until it has been demonstrated empirically that they are not benefiting from the general education curriculum. Unlike traditional LD assessment, which assesses a student's status on ability and achievement measures at one point in time, the treatment validity approach repeatedly assesses the student's progress in the general education curriculum using CBM. Fuchs and Fuchs indicate that special education should be considered only when a child's performance shows a dual discrepancy--that is, the student both performs below the level evidenced by classroom peers and shows a learning rate substantially below that of classroom peers.

Fuchs and Fuchs (1998) state that the dual-discrepancy model rests on three related propositions. First, because student ability varies widely, different students will experience different educational outcomes. Second, low academic performance is relative to the classroom in which the student is placed. If a student's growth rate is similar to that of classroom peers, the student would not be considered discrepant from peers' learning rates and would not be a candidate for special education placement. Conversely, a student whose growth rate is low relative to classroom peers would be a candidate for either an alternative intervention or special education placement. Third, if most students in a general education classroom show inadequate growth relative to local or national norms, the educational program for the entire classroom should be enhanced before any individual student is judged unresponsive to intervention.

Use of this CBM dual-discrepancy approach to determine eligibility is a two-stage process: problem identification and problem certification (Fuchs & Fuchs, 1997; Marston & Magnusson, 1988; Shinn, 1989). Problem identification attempts to determine if a student's academic performance is sufficiently deficient to justify further assessment. Shinn (1989) recommended that three to five CBM tests in each academic area of concern be administered on consecutive days using the student's curriculum materials. On the basis of these brief assessments, the student's median score is used as an estimate of performance level. This performance level is then compared to the same assessment data collected from typical peers in the same classroom.

Fuchs and Fuchs (1997) note that procedures for sampling "typical peers" vary in completeness, elaboration, and time. Some districts routinely collect local CBM normative data and use this information to gauge progress in the curriculum and/or to determine special education eligibility (Shinn, 1989, 1995; Shinn, Tindal, & Stein, 1988). For districts not collecting normative CBM data, one can assess three same-gender peers randomly selected from students whom the teacher nominates as having adequate academic achievement. With large-scale normative data, a referred student would be identified for further assessment if his or her median score fell at or below the 10th percentile or between 1 and 2 standard deviations below the mean. With data available only at the classroom level, the discrepancy between expected and actual performance is calculated by dividing the expected performance (the mean CBM performance of the selected peers) by the referred student's median CBM score. A ratio of 2.0 or greater would suggest that further assessment is needed.
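The two screening rules for problem identification can be sketched as follows. This is a minimal illustration under stated assumptions, not a district procedure. The student's level is the median of 3-5 probes. The percentile-rank convention (percentage of norm scores at or below the student's) is an assumption, since several conventions exist. The "1 to 2 standard deviations" variant of the norm rule is omitted. All scores are hypothetical.

```python
from statistics import mean, median


def percentile_rank(score, norm_scores):
    """Percentage of norm-group scores at or below `score` (assumed convention)."""
    return 100.0 * sum(s <= score for s in norm_scores) / len(norm_scores)


def screen_with_norms(student_probes, norm_scores, cutoff_pct=10.0):
    """Large-scale-norm rule: flag if the student's median probe score
    falls at or below the 10th percentile of the norm group."""
    return percentile_rank(median(student_probes), norm_scores) <= cutoff_pct


def screen_with_peers(student_probes, peer_scores, ratio_cutoff=2.0):
    """Classroom rule: expected performance (mean of sampled peers) divided
    by the student's median probe score. Returns (ratio, flagged)."""
    if not 3 <= len(student_probes) <= 5:
        raise ValueError("Shinn (1989) recommends 3-5 probes per area")
    student_median = median(student_probes)
    expected = mean(peer_scores)
    ratio = float("inf") if student_median == 0 else expected / student_median
    return ratio, ratio >= ratio_cutoff


# Hypothetical words-read-correctly scores for a referred student and three peers.
ratio, flagged = screen_with_peers([22, 18, 25], peer_scores=[52, 47, 50])
print(f"ratio = {ratio:.2f}, refer for further assessment: {flagged}")
```

In this example, the student's median of 22 against a peer mean of about 49.7 gives a ratio of roughly 2.26. That exceeds 2.0, so the student would move on to problem certification.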

The problem certification phase determines whether the magnitude and severity of the student's academic deficiencies justify special education and related services (Shinn, 1995). To make this determination, three CBM probes are administered at successively lower levels of the student's curriculum. The highest level at which the student performs successfully is taken as that student's grade placement. Fuchs and Fuchs (1997) suggest that "success" can be operationalized in two ways. First, if a large CBM normative database is unavailable, success can be defined relative to fixed standards, such as 40-60 words read correctly per minute in second-grade text. Second, if a large CBM database is available, success is defined by percentile ranks relative to the student's grade placement: if a student's median score falls between the 25th and 75th percentiles for typical students at that grade level, the student is demonstrating successful performance (Fuchs & Fuchs, 1997).
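The survey-level logic of problem certification can be sketched as a search for the highest passing level. Only the grade-2 standard comes from the text; it uses 40, the low end of the 40-60 words-per-minute range. The grade-1 and grade-3 cut points are hypothetical placeholders. Treating scores above the range as successful is also an assumption.

```python
# Survey-level sketch of problem certification.
# Fixed standards in words read correctly per minute (WRC/min).
# Only the grade-2 value (low end of 40-60) comes from the text; the
# grade-1 and grade-3 values are hypothetical placeholders.
FIXED_MIN_WRC = {1: 20, 2: 40, 3: 70}


def meets_fixed_standard(level, median_score):
    # Assumption: scoring above the range also counts as success.
    return median_score >= FIXED_MIN_WRC[level]


def grade_placement(median_by_level, success=meets_fixed_standard):
    """Highest curriculum level at which the student's median probe score
    counts as successful; None if no level qualifies."""
    passing = [lvl for lvl, score in median_by_level.items() if success(lvl, score)]
    return max(passing) if passing else None


# Hypothetical third grader tested at grade 3, then at grades 2 and 1.
print(grade_placement({3: 38, 2: 52, 1: 71}))  # 2
```

A percentile-based criterion can be swapped in through the `success` argument when a large normative database is available.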

The longstanding and impressive research program using CBM by Lynn and Doug Fuchs of Peabody College at Vanderbilt University provides empirical support for the dual-discrepancy approach as a decision-making guide in LD eligibility determination (Fuchs, 1995; Fuchs et al., 1989a; Fuchs, Fuchs, & Fernstrom, 1993; Fuchs, Fuchs, Hamlett, Phillips, & Karns, 1995). Similarly, Douglas Marston of Minneapolis Public Schools has successfully used CBM to make eligibility determinations for students with LD (Marston et al., 1986; Marston & Magnusson, 1988; Marston, Mirkin, & Deno, 1984).

A recent investigation by Speece and Case (in press) provided additional data supporting the dual-discrepancy approach to defining LD. These authors identified children as at risk for reading failure if their mean performance on CBM reading probes placed them in the lowest quartile of their class. A contrast group of five students from each classroom was also identified, based on scores at the median (2 students) and at the 30th, 75th, and 90th percentiles (1 student at each level). At-risk children were placed into one of three groups: CBM dual discrepancy (CBM-DD), regression-based IQ-reading achievement discrepancy (IQ-DS), and low achievement (LA). Ten CBM oral reading probes were administered across the school year to determine CBM-DD status. Slopes (based on ordinary least squares regression) were calculated for each child and classroom, and each student's performance level was the mean of the last two data points. Children were placed in the CBM-DD group (n = 47) if both their slope across the year and their level of performance at the end of the year were more than 1 standard deviation below those of classmates. Students were placed in the IQ-DS group (n = 17) if their IQ-reading achievement discrepancy was 1.5 or more standard errors of prediction (approximately a 20-point discrepancy). Children were placed in the LA group (n = 28) if their total reading score was below 90.
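The CBM-DD classification rule can be sketched from the description above. Each student gets an ordinary least squares slope, and level is the mean of the last two probes. A student is flagged only when both values fall more than 1 SD below the class mean. Two assumptions apply, and neither is claimed to be Speece and Case's exact procedure: probes are equally spaced in time, and the comparison group is the whole class. The class data are hypothetical.

```python
from statistics import mean, stdev


def ols_slope(scores):
    """Ordinary least squares slope of scores against probe occasion 0..n-1.
    Assumption: probes are equally spaced in time."""
    n = len(scores)
    x_bar = (n - 1) / 2
    y_bar = mean(scores)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(scores))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def cbm_dual_discrepancy(class_scores, sd_cut=1.0):
    """IDs of students whose slope AND end-of-year level are more than
    `sd_cut` SDs below the class mean.

    class_scores: student id -> CBM oral reading scores in time order.
    Level = mean of the last two data points.
    Assumption: the comparison group is the whole class, student included.
    """
    ids = list(class_scores)
    slopes = [ols_slope(class_scores[s]) for s in ids]
    levels = [mean(class_scores[s][-2:]) for s in ids]
    slope_cut = mean(slopes) - sd_cut * stdev(slopes)
    level_cut = mean(levels) - sd_cut * stdev(levels)
    return [s for s, b, lv in zip(ids, slopes, levels) if b < slope_cut and lv < level_cut]


# Hypothetical class: 10 probes per student, linear growth for simplicity.
start_and_growth = {"A": (40, 2), "B": (45, 2), "C": (38, 3),
                    "D": (50, 1.5), "E": (35, 2.5), "F": (20, 0.5)}
class_scores = {s: [a + b * t for t in range(10)] for s, (a, b) in start_and_growth.items()}
print(cbm_dual_discrepancy(class_scores))  # ['F']
```

Student D grows more slowly than most classmates but still clears the slope cut. Only F is low on both level and slope, which is the pattern the dual-discrepancy rule is designed to isolate.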

Results of this investigation showed that the CBM-DD group was more deficient on measures of phonological processing and was rated by teachers as having lower academic competence, poorer social skills, and more problem behaviors than the IQ-DS and LA groups. However, the CBM-DD and IQ-DS groups did not differ on a standardized measure of reading achievement, demonstrating the specificity of the CBM-DD model. These data provide additional support for using the CBM-DD model to identify students with LD, specifically those with a phonological deficit. In summarizing their findings, Speece and Case (in press) suggested:

Most research on reading disability proceeds from the assumption of failure to learn despite adequate instruction, a tenet of most definitions of learning disability, but this assumption is rarely tested. The dual discrepancy method does not reject the importance of individual differences to reading disability, but, in our view, expands the conceptualization to include the importance of instruction in the expression of the disability. (p. 36)

Fuchs and Fuchs (1997) proposed a three-phase model for determining LD eligibility using the CBM-DD approach. Phase I involves documenting adequate classroom instruction and dual discrepancies. It begins with weekly CBM assessments of all students in each school. An assessment team composed of a principal, school psychologist, special education teacher, and social worker reviews these data after 6 weeks to reach two decisions. First, the team decides whether overall classroom performance is adequate relative to other classrooms and district norms. Second, if classroom performance is acceptable, the team reviews individual student data to determine which students meet the dual-discrepancy criteria, defined as (a) a difference of 1 standard deviation between the student's median CBM score and that of classmates and (b) a difference of 1 standard deviation between the student's CBM slope of improvement (growth) and that of classmates. Students who meet these criteria and do not have accompanying low-incidence conditions (e.g., mental retardation, sensory disabilities, autism) proceed to Phase II of the process.
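The two Phase I team decisions can be sketched as a gate followed by a filter. Several pieces here are assumptions rather than anything specified by Fuchs and Fuchs: the classroom check compares the class's mean slope with a hypothetical district benchmark, "a difference of 1 standard deviation" is read as "at least 1 SD below the class mean," and the level and slope values are assumed to be already computed. The data are hypothetical.

```python
from statistics import mean, stdev


def phase_one_review(levels, slopes, district_slope, low_incidence=frozenset(), sd_cut=1.0):
    """Sketch of the two Phase I team decisions.

    levels: student -> median CBM score over the first 6 weeks.
    slopes: student -> CBM slope (growth per week) over the same weeks.
    district_slope: hypothetical benchmark for adequate classroom growth.
    low_incidence: students with low-incidence conditions, routed elsewhere.
    """
    lv, sl = list(levels.values()), list(slopes.values())
    # Decision 1: is the classroom as a whole growing adequately?
    if mean(sl) < district_slope:
        return "enhance classroom instruction", []
    # Decision 2: who is at least sd_cut SDs below classmates on BOTH measures?
    level_cut = mean(lv) - sd_cut * stdev(lv)
    slope_cut = mean(sl) - sd_cut * stdev(sl)
    flagged = [s for s in levels
               if levels[s] <= level_cut and slopes[s] <= slope_cut
               and s not in low_incidence]
    return "proceed to Phase II", flagged


levels = {"A": 60, "B": 55, "C": 58, "D": 62, "E": 30}
slopes = {"A": 1.5, "B": 1.2, "C": 1.4, "D": 1.6, "E": 0.3}
print(phase_one_review(levels, slopes, district_slope=1.0))
```

Raising the hypothetical benchmark above the class's mean slope (1.2) switches the outcome to "enhance classroom instruction." This mirrors the third proposition of the model: the whole classroom is examined before any individual student.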

Phase II involves a prereferral intervention in which one member of the assessment team works with the general education teacher to design an intervention to remediate the student's dual discrepancy. CBM data are collected to judge the effectiveness of the intervention with the provision that the teacher implement a minimum of two interventions over a 6-week period. If students do not show adequate progress, they enter Phase III of the process.

Phase III involves the design and implementation of an extended intervention plan. Essentially, this phase is a special education diagnostic trial in which the student's responsiveness to a more intensive intervention is measured. It lasts approximately 8 weeks, after which the team reconvenes and makes decisions concerning the child's placement. If the team decides that the intervention was successful, an individualized education program (IEP) would be developed and the plan continued. If instead the team decides that the intervention was unsuccessful in eliminating the dual discrepancy, it would consider alternatives such as changing the nature and intensity of the intervention, collecting additional assessment information, considering a more restrictive placement, or moving the student to a school with additional resources that better address the student's needs.

In summary, Fuchs and Fuchs (1997) propose that a student must pass a three-pronged test to qualify for special education: (a) a dual discrepancy (1 standard deviation each) between the student's performance level and growth and those of peers must be documented, (b) the student's rate of learning with adaptations made in the general education classroom must be shown to be inadequate, and (c) the provision of special education must be shown to improve growth.
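The three-pronged test reduces to a conjunction of three conditions. The sketch below is hypothetical in two places. "Inadequate" growth in prong (b) is judged against a benchmark slope that the user supplies. "Improved growth" in prong (c) is operationalized as a Phase III trial slope greater than the Phase II slope. The text does not specify either threshold.

```python
def three_pronged_test(dual_discrepancy, slope_general_ed, adequate_slope, slope_special_ed):
    """Return (eligible, prongs) under the three-pronged test.

    dual_discrepancy: prong (a), e.g. the outcome of the Phase I review.
    slope_general_ed: growth during Phase II general-education adaptations.
    adequate_slope: hypothetical benchmark for adequate growth.
    slope_special_ed: growth during the Phase III diagnostic trial.
    Assumption: "improved growth" means the trial slope exceeds the Phase II slope.
    """
    prongs = {
        "a_dual_discrepancy": bool(dual_discrepancy),
        "b_inadequate_in_general_ed": slope_general_ed < adequate_slope,
        "c_special_ed_improves_growth": slope_special_ed > slope_general_ed,
    }
    return all(prongs.values()), prongs


eligible, prongs = three_pronged_test(True, slope_general_ed=0.4,
                                      adequate_slope=1.0, slope_special_ed=1.1)
print(eligible, prongs)
```

Returning the individual prongs alongside the overall decision lets a team see which condition failed when a student does not qualify.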

Functional Assessment Models

Another approach to identifying students on the basis of responsiveness to intervention comes from the applied behavior analysis (ABA) camp (Daly, Lentz, & Boyer, 1996; Daly & Martens, 1994; Daly et al., 1997; Haring, Lovitt, Eaton, & Hansen, 1978; Howell, Fox, & Morehead, 1993). This approach attempts to offer a functional rather than a structural explanation for children's academic difficulties. I also include within the ABA approach the Direct Instruction (Engelmann & Carnine, 1991; Gersten et al., 1986) and Precision Teaching (Lindsley, 1991) models of intervention. The field of LD has traditionally offered structural explanations in the form of labels or traits to explain academic problems (e.g., LD, dyslexia, processing disorders). Structural explanations are not particularly useful from an intervention perspective, because student traits (inferred from performance) cannot be directly manipulated and because such explanations do not identify environmental factors that might be contributing to academic failure (Daly et al., 1997).

Alternatively, a functional approach to understanding academic failure attempts to relate academic performance to environmental events that precede and follow student performance (e.g., opportunities to respond, reinforcement for accurate responding, time allocated for instruction, modeling and feedback of academic behaviors). From a functional perspective, the job of the interventionist is to analyze those factors that may explain poor performance and implement an instructional intervention to improve academic responding. In a functional approach, academic responding is operationalized using curriculum-based measures of oral reading, mathematics computation, written expression, and spelling such as those recommended in the dual-discrepancy approach of Fuchs and Fuchs (1997, 1998).

Daly et al. (1997) identified five common reasons why students fail and provided straightforward methods for testing each of these hypotheses quickly and efficiently so that the results lead directly to interventions. The reasons are as follows: (a) they do not want to do it ("won't do" problems), (b) they have not spent enough time doing it (lack of practice and feedback), (c) they have not had enough help to do it (insufficient prompting or poor fluency), (d) they have not had to do it that way before (instructional demands do not promote mastery), and (e) it is too hard (poor match between student skill level and instructional materials).

An extremely important concept in a functional approach to remediating academic difficulties is the instructional hierarchy (Haring et al., 1978), which describes the relationship between intervention components and stages of skill mastery. In the instructional hierarchy, students move through stages of acquisition, fluency, generalization, and adaptation. Strategies that use modeling, prompting, and error correction can be expected to improve acquisition (accuracy), and strategies that include practice and reinforcement are expected to improve fluency. Generalization training involves discrimination training across stimuli and maintenance activities over time (Daly et al., 1996; Martens, Witt, Daly, & Vollmer, 1999).

There is an extensive research base supporting the ABA model for improving academic performance (Daly et al., 1997, 1999; Elliott, Busse, & Shapiro, 1999; Engelmann & Carnine, 1991; Greenwood, 1991; Skinner, 1998). Swanson and Sachs-Lee (2000) summarized 85 single-subject design studies across the academic domains of reading, mathematics, writing, and language that used direct instruction (DI), strategy training (SI), combined DI+SI, and non-DI/non-SI interventions, as described earlier in this paper (see Swanson & Hoskyn, 1999). Based on an analysis of 793 effect sizes, the mean effect size was 0.87 (SD = 0.32), suggesting a strong effect. The average participant was almost 11 years old, and participants' mean IQ and achievement scores were 95 and 77, respectively (on a scale with M = 100, SD = 15). Results of this meta-analysis showed that DI and SI were effective in remediating academic deficits (except in handwriting) and that, in reading, all interventions were more effective with lower-IQ students than with higher-IQ students.

The use of the ABA approach for eligibility determination creates some measurement challenges because this model relies almost exclusively on single-case experimental design data. Both the predictor-criterion and CBM-DD models use well-established and straightforward quantitative approaches to determine treatment nonresponders. An unresolved issue in the ABA approach concerns the most appropriate way of quantifying the effects of intervention. Gresham and Lambros (1998) identified several methods for quantifying the effects of interventions using single-case experimental design data that are described below. Time-series analysis is not included here because fitting these regression models with relatively few data points often yields inaccurate results and it is often impossible to meet the statistical assumptions of these models in educational practice (Kazdin, 1984).

Visual Inspection

Visual inspection of graphed data is by far the most common way of analyzing data from single-case designs (Johnston & Pennypacker, 1993). Treatment effects are detected by comparing baseline levels of performance with postintervention levels of performance. Unlike statistical analyses, this method relies on the "interocular" test of significance. A considerable body of research, however, suggests that even highly trained behavior analysts cannot reach consensus when evaluating single-case data by visual inspection (Center, Skiba, & Casey, 1985-86; DeProspero & Cohen, 1979; Knapp, 1983; Matyas & Greenwood, 1990, 1991; Ottenbacher, 1990). Visual inspection of graphed data therefore often appears to yield erroneous conclusions about the presence or absence of treatment effects, particularly when the data points are serially dependent (autocorrelated).
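Serial dependence can be checked before a graph is read. The sketch below computes the conventional lag-1 autocorrelation estimate for a single phase. This is a generic statistic, not a procedure proposed in the cited studies, and the data are hypothetical. With phases of only a few points, the estimate is itself imprecise.

```python
from statistics import mean


def lag1_autocorrelation(series):
    """Conventional lag-1 autocorrelation estimate:
    r1 = sum((x_t - m) * (x_{t+1} - m)) / sum((x_t - m) ** 2)."""
    m = mean(series)
    den = sum((x - m) ** 2 for x in series)
    if den == 0:
        raise ValueError("series has no variability")
    num = sum((a - m) * (b - m) for a, b in zip(series, series[1:]))
    return num / den


# Hypothetical baseline phases (words read correctly per minute).
trending = [10, 12, 13, 15, 16, 18]
alternating = [10, 14, 10, 14, 10, 14]
print(round(lag1_autocorrelation(trending), 3))     # 0.452
print(round(lag1_autocorrelation(alternating), 3))  # -0.833
```

A clearly positive value, as in the trending baseline, means each session resembles the previous one. That is the kind of dependence associated with the inflated Type I error rates of visual analysis.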

A study by Matyas and Greenwood (1990) showed that Type I error rates ranged from 16 to 84% for autocorrelated data, suggesting that researchers often judge the presence of treatment effects where none exist. Given the interpretative problems with graphed data in determining treatment effects and unacceptably high Type I error rates, other procedures should be used to supplement or corroborate interpretation of graphed data (Fisch, 1998). These are described in the following sections.

Reliable Changes in Behavior

Another method of quantifying effects in single-case designs is to calculate the extent to which changes in academic performance are reliable. Nunnally and Kotsche (1983) first proposed a reliable change index (RCI) to determine the effectiveness of an intervention for individuals. The RCI is defined as the difference between a posttest score and a pretest score divided by the standard error of the difference between posttest and pretest scores (Christensen & Mendoza, 1986; Jacobson, Follette, & Revenstorf, 1984). The standard error of the difference describes the spread of the distribution of change scores that would be expected if no actual change had occurred. An RCI whose absolute value is 1.96 or greater (p < .05) would be considered a reliable change in behavior.
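The text defines the RCI conceptually. The sketch below uses one common concrete formulation. The standard error of measurement is `SD * sqrt(1 - reliability)`, and the standard error of the difference is `sqrt(2)` times that. This formula is an assumption here and is not necessarily the one used in the studies cited above. The scores are hypothetical.

```python
import math


def reliable_change_index(pre, post, sd_pre, reliability):
    """RCI = (post - pre) / SE_diff.

    Assumption: SE_diff = sqrt(2 * SEM**2) with SEM = sd_pre * sqrt(1 - reliability),
    i.e. the standard error of the difference between two scores on a measure
    with the given reliability.
    """
    if not 0 <= reliability < 1:
        raise ValueError("reliability must be in [0, 1)")
    sem = sd_pre * math.sqrt(1 - reliability)
    se_diff = math.sqrt(2 * sem ** 2)
    return (post - pre) / se_diff


def is_reliable(rci, critical=1.96):
    """|RCI| >= 1.96 corresponds to p < .05, two-tailed."""
    return abs(rci) >= critical


# The same 10-point gain (SD = 15) on a high- vs a low-reliability measure.
for r in (0.95, 0.70):
    rci = reliable_change_index(pre=80, post=90, sd_pre=15, reliability=r)
    print(f"reliability {r:.2f}: RCI = {rci:.2f}, reliable = {is_reliable(rci)}")
```

The same gain yields an RCI of about 2.11 with a reliability of .95 but only about 0.86 with a reliability of .70. The reliability of the measure alone changes the verdict.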

With single-case data, the pretest and posttest scores for the RCI must be derived from the baseline and intervention phases of the design. For example, in an ABAB withdrawal design, pretest scores would be calculated from the initial baseline (A) and posttest scores from the mean of the two intervention phases (B1 + B2). Similarly, in a multiple baseline design, pretest scores would be calculated from the baseline of each subject (setting or behavior) and posttest scores from the means of the respective intervention phases. The standard error of the difference would be based on the autocorrelation and variation of the baseline and intervention phases. Although the RCI approach can detect reliable changes in academic performance (relative to baseline) for a single student, it does not provide specific decision rules for making an LD eligibility determination. Moreover, RCIs are influenced by the reliability of the dependent measures used. If a measure is highly reliable (0.90 or higher), then small changes in behavior could be considered statistically reliable. Conversely, if a measure has low reliability, then large changes in behavior might not be statistically reliable but could still be important.
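Deriving RCI inputs from phase data can be sketched as follows. Pooling the B1 and B2 observations into one mean is an assumption; averaging the two phase means would differ when the phases have unequal lengths. The text does not give a formula for a standard error of the difference based on autocorrelation, so the sketch takes it as a supplied value. All numbers are hypothetical.

```python
from statistics import mean


def abab_pre_post(a1, b1, b2):
    """Pretest = mean of the initial baseline (A1); posttest = mean of the
    two intervention phases pooled (B1 + B2). The return baseline (A2) is
    not used."""
    return mean(a1), mean(b1 + b2)


def multiple_baseline_pre_post(tiers):
    """tiers: tier name -> (baseline scores, intervention scores).
    Returns tier name -> (pretest mean, posttest mean)."""
    return {t: (mean(base), mean(treat)) for t, (base, treat) in tiers.items()}


def rci(pre, post, se_diff):
    # se_diff is supplied by the user; deriving it from phase autocorrelation
    # and variability is not specified here.
    return (post - pre) / se_diff


pre, post = abab_pre_post(a1=[20, 22, 21], b1=[30, 32, 34], b2=[33, 35, 34])
print(f"pre = {pre}, post = {post}, RCI = {rci(pre, post, se_diff=4.0):.2f}")
```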

Effect sizes. Another way of quantifying single-case data is through effect sizes. Although effect sizes typically are used to integrate group-design research studies, Busk and Serlin (1992) have proposed two methods for calculating effect sizes in single-case studies. The first approach makes no distributional assumptions and calculates the effect size by subtracting the baseline mean from the treatment mean and dividing by the standard deviation of the baseline phase. The second approach, based on the homogeneity-of-variance assumption, is the same except that it uses the pooled within-phase variance as the error term. Effect sizes calculated in this way are interpreted in the same way as traditional effect-size estimates. They can be used to estimate the effects of one or more treatments for an individual or to summarize a body of single-case intervention research.
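The two Busk and Serlin variants differ only in their denominators. The sketch below computes both. It assumes that higher scores mean improvement; for behaviors meant to decrease, a negative value would indicate improvement. The pooled SD uses the usual degrees-of-freedom weighting. The data are hypothetical.

```python
import math
from statistics import mean, stdev, variance


def es_baseline_sd(baseline, treatment):
    """Approach 1 (no distributional assumptions):
    (treatment mean - baseline mean) / baseline-phase SD."""
    return (mean(treatment) - mean(baseline)) / stdev(baseline)


def es_pooled_sd(baseline, treatment):
    """Approach 2 (assumes homogeneous variances): same numerator,
    divided by the pooled within-phase SD."""
    na, nb = len(baseline), len(treatment)
    pooled_var = ((na - 1) * variance(baseline) + (nb - 1) * variance(treatment)) / (na + nb - 2)
    return (mean(treatment) - mean(baseline)) / math.sqrt(pooled_var)


baseline = [10, 12, 11, 13, 14]
treatment = [18, 22, 19, 21, 25]
print(round(es_baseline_sd(baseline, treatment), 2), round(es_pooled_sd(baseline, treatment), 2))
```

Here the treatment phase is more variable than the baseline, so the pooled estimate is noticeably smaller than the baseline-SD estimate.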

Swanson and Sachs-Lee (2000) used an alternative approach to calculating effect size, computing the baseline and treatment means from only the last three data points in each phase. This difference was then divided by the correlation between baseline and treatment data points, taking into account the average standard deviation for repeated measures. These authors argue that effect sizes based on all sessions may be inflated or deflated by the number of sessions and are subject to fluctuations in the dependent variable that do not result from the treatment (cyclicity).

Effect sizes also can be calculated by computing the percentage of nonoverlapping data points (PNOL) between baseline and treatment phases (Mastropieri & Scruggs, 1985-86). PNOL is computed by counting the treatment data points that exceed the highest baseline data point and dividing by the total number of data points in the treatment phase. For example, if 8 of 10 treatment data points exceed the highest baseline data point, then PNOL is 80%. This method makes quantitative synthesis of single-case data relatively easy. However, it would be inappropriate in some situations, including unusual baseline trends, floor and ceiling effects, and students in the initial stages of skill acquisition (Strain, Kohler, & Gresham, 1998).
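The PNOL computation described above, including its 8-of-10 example, can be sketched as follows. The sketch assumes that higher scores mean improvement, and the data are hypothetical.

```python
def pnol(baseline, treatment):
    """Percentage of treatment points exceeding the highest baseline point.
    Assumes higher scores mean improvement."""
    highest = max(baseline)
    exceeding = sum(1 for x in treatment if x > highest)
    return 100.0 * exceeding / len(treatment)


baseline = [12, 15, 14]
treatment = [13, 16, 18, 20, 19, 21, 22, 17, 15, 23]
print(pnol(baseline, treatment))  # 80.0
```

Because only the single highest baseline point matters, one unusually high baseline session can drive PNOL toward zero. This is one reason the method is unsuitable with unusual baseline trends.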

Yet another approach in quantifying the effects of interventions in single-subject designs is to analyze trends over time by using time-structured Markov chains (Fisch, 1998). Markov chains involve the analysis of two-dimensional matrices containing the probabilities of changing from one set of conditions (e.g., preintervention performances) to another set of conditions (postintervention performances). Haccou and Meelis (1992) indicate that Markov chains are used frequently in naturalistic settings to assess changes in "states" of behavior from one time period to the next.
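The matrices described here can be estimated by counting transitions between coded states. The sketch below estimates first-order transition probabilities from a sequence of sessions. It does not implement Fisch's (1998) specific time-structured procedure. The two-state coding ("below" versus "at" criterion) and the session sequences are hypothetical.

```python
from collections import Counter


def transition_matrix(states, labels):
    """Estimate first-order transition probabilities P(next = j | current = i)
    from a sequence of coded sessions. Rows with no observations are all zeros."""
    counts = Counter(zip(states, states[1:]))
    matrix = {}
    for i in labels:
        row_total = sum(counts[(i, j)] for j in labels)
        matrix[i] = {j: (counts[(i, j)] / row_total if row_total else 0.0) for j in labels}
    return matrix


labels = ("below", "at")  # hypothetical coding: session below / at criterion
pre = ["below", "below", "below", "at", "below", "below"]
post = ["below", "at", "at", "at", "below", "at", "at"]
print(transition_matrix(pre, labels))
print(transition_matrix(post, labels))
```

Before intervention, a "below" session is followed by another "below" session 75% of the time. After intervention, an "at criterion" session is followed by another 75% of the time. Comparing the two matrices expresses the change in behavioral "states."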

Social Validation

Social validity deals with three fundamental questions faced by professionals in the field of LD: What should we change? How should we change it? How will we know it was effective? There are sometimes disagreements among professionals as well as between professionals and consumers on these three fundamental questions. Wolf (1978) described the social validation process as the assessment of the social significance of the goals of intervention, the social acceptability of the intervention procedures to attain these goals, and the social importance of the effects of the intervention. This last component of the social validation process is most relevant to quantifying a student's responsiveness to intervention in the LD eligibility determination process.

The social importance of the effects produced by an intervention establishes the practical or educational significance of changes in academic performance. Do the quantity and quality of the change in academic performance make a difference in the student's academic functioning? Does the change in academic performance have habilitative validity (Hawkins, 1991)? Is the student's academic performance now in the "functional" range? All of these questions capture the essence of establishing the social importance of intervention effects.

One means of establishing the social importance of intervention effects is to conceptualize academic functioning as belonging to either a functional or dysfunctional distribution. For example, we could socially validate a reading intervention by demonstrating that a student moved from a dysfunctional to a functional range of reading performance. This result could be established by calculating the probability that the student's reading score belonged to a functional rather than a dysfunctional distribution. We could base these calculations on norm-referenced achievement tests or locally normed CBM measures.
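One way to compute the probability described above is Bayes' rule with two normal distributions. The normality assumption, the equal prior probabilities, and the means and SDs in the example are all hypothetical choices. They would need to come from norm-referenced tests or local CBM norms.

```python
from statistics import NormalDist


def prob_functional(score, functional, dysfunctional, prior_functional=0.5):
    """Posterior probability that `score` comes from the functional rather than
    the dysfunctional distribution (Bayes' rule, normal likelihoods assumed)."""
    f = functional.pdf(score) * prior_functional
    d = dysfunctional.pdf(score) * (1 - prior_functional)
    if f + d == 0:
        raise ValueError("score is too extreme for both distributions")
    return f / (f + d)


# Hypothetical standard-score distributions for typical and disabled readers.
functional = NormalDist(mu=100, sigma=15)
dysfunctional = NormalDist(mu=75, sigma=10)
for score in (72, 88, 98):
    print(score, round(prob_functional(score, functional, dysfunctional), 2))
```

A reading intervention could be described as socially validated, in this sense, when a student's score moves from a point where this probability is low to one where it is high.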

Fawcett (1991) suggested that in evaluating the social importance of effects, we should specify various levels of performance: for example, ideal (the best performance available), normative (typical or commonly occurring performance), or deficient (the worst performance available). Interventions that move a student from a deficient level of performance to a normative or ideal level could be considered socially important.

