Psychology and Cognitive Sciences

Open journal

ISSN 2380-727X

Expected Agreement Coefficient for Norm-Referenced Tests With Classical Test Theory

Rashid S. Almehrizi*

Article Information

Rashid Saif Almehrizi, PhD

Associate Professor, Educational Measurement and Statistics, Department of Psychology, Assessment and Technical Support Unit, Director, College of Education Sultan Qaboos University, P.O. Box: 32, PC: 123 Al-Khoudh, Sultanate of Oman; Tel. +968 2414 1613; E-mail: mehrzi@squ.edu.om

Main Text

INTRODUCTION

Psychological tests can follow two frameworks for the interpretation and use of their results: norm-referenced and criterion-referenced. With norm-referenced interpretation, the investigator’s interest focuses on the relative ordering of examinees with respect to the performance of the norm group with which the examinee is associated.¹ In the generalizability theory framework, relative error score variance is defined as the expected squared difference between an examinee’s observed deviation score and the examinee’s true deviation score, where both deviations are taken from the corresponding mean of the associated group. Criterion-referenced interpretation, on the other hand, focuses on absolute interpretations of scores and on absolute error score variance.¹,²,³ Absolute error score variance is defined as the expected squared difference between an examinee’s observed score and the examinee’s true score.⁴

Since the first distinction between norm-referenced and criterion-referenced interpretations of test results, many researchers, including Glaser and Nitko⁵ and Popham and Husek,⁶ have argued that reliability coefficients in classical test theory are appropriate for norm-referenced tests. These coefficients (such as KR-20⁷ and coefficient alpha⁸) depend on the relative standing of an examinee in a norm group.⁹,¹⁰

Kane and Brennan¹¹ introduced a general agreement function that summarizes different existing agreement coefficients for different uses and interpretations of test scores. Using this general agreement function, Kane and Brennan¹¹ defined the expected agreement coefficient for norm-referenced tests (called the generalizability coefficient) within the generalizability theory framework. Using the general linear model for the crossed design (all examinees take the same set of items) in generalizability theory for an examinee’s observed score on each item, X_pi, on a sample of items, Brennan and Kane derived the agreement coefficient for norm-referenced interpretation and showed that the estimator of this coefficient is equal to coefficient alpha developed by Cronbach.⁸

The concept of expected agreement and its derivation method are very useful for understanding test results and enhancing their interpretation and use.¹² It helps to differentiate examinees’ error scores and, accordingly, examinees’ true scores and test score reliability. Brennan¹ explained that the norm-referenced agreement coefficient is associated with relative error scores, whereas the criterion-referenced agreement coefficient is associated with absolute error scores. The two types of error scores differ in their definition and in their implications for estimating and interpreting test score reliability.

The current application of expected agreement is limited to the generalizability theory framework. However, generalizability theory involves both theoretical and practical complexities.¹ It is based on a mixture of concepts from variance components in analysis of variance and concepts from classical test theory. Similarly, the estimation of the expected agreement coefficient requires the estimation of mean squares.¹,¹³

On the other hand, classical test theory is based on simpler concepts and estimation methods that are appreciated by many practitioners.⁴ The advantages and applications of expected agreement have not yet been introduced within classical test theory. One possible reason for the delayed use of the expected agreement coefficient in classical test theory may be its conventional definition of equivalent test forms.

This paper introduces the expected agreement for norm-referenced interpretations of test scores within the classical test theory framework. It presents the context and assumptions of randomly equivalent test forms that are necessary to develop the expected agreement coefficients. It derives the expected agreement/reliability coefficient for norm-referenced tests using the general agreement coefficients pioneered by Kane and Brennan,¹¹ and it outlines the estimator of this expected agreement coefficient.

METHOD

Procedure

The paper uses the procedure outlined by Kane and Brennan¹¹ for deriving the expected agreement between two randomly selected instances of a testing procedure. The procedure assumes that the instances or tests are randomly selected from a universe of possible instances, which supports the assumption that the expected distribution of outcomes for the population is the same for each administration of the testing procedure. The agreement function, a(S_pi,S_pj), defines the degree of agreement between any two scores of an examinee on two testing procedures, S_pi and S_pj. This agreement function can take any form as long as it satisfies three conditions:

(1) a(S_pi,S_pj)≥0,

(2) a(S_pi,S_pj)=a(S_pj,S_pi), and

(3) a(S_pi,S_pi)+a(S_pj,S_pj)≥2a(S_pi,S_pj).
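The three conditions can be checked numerically for any candidate agreement function. The sketch below is only an illustration and is not part of Kane and Brennan's procedure: the helper `satisfies_conditions`, the example function `same_classification` (agreement is 1 when two scores fall on the same side of a hypothetical cut score) and the counterexample `one_sided_difference` are all assumptions introduced here.

```python
from itertools import product

def satisfies_conditions(a, scores):
    """Check the three agreement-function conditions on every pair of scores."""
    for s, t in product(scores, repeat=2):
        nonnegative = a(s, t) >= 0                   # condition (1)
        symmetric = a(s, t) == a(t, s)               # condition (2)
        bounded = a(s, s) + a(t, t) >= 2 * a(s, t)   # condition (3)
        if not (nonnegative and symmetric and bounded):
            return False
    return True

CUT_SCORE = 5  # hypothetical cut score, chosen only for illustration

def same_classification(s, t):
    """Illustrative agreement: 1 if both scores fall on the same side of the cut."""
    return 1 if (s >= CUT_SCORE) == (t >= CUT_SCORE) else 0

def one_sided_difference(s, t):
    """Counterexample: not symmetric, so it violates condition (2)."""
    return max(s - t, 0)

print(satisfies_conditions(same_classification, range(11)))   # True
print(satisfies_conditions(one_sided_difference, range(11)))  # False
```

The check is exhaustive only over the score values supplied, so it can reject a candidate function but cannot prove that the conditions hold for all possible scores.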

Two general agreement indices for instances of the testing procedure are defined: one is corrected for chance and the other is not. The index of agreement that is not corrected for chance is

$$\theta = \frac{A}{A_m}.$$

The term A is the expected agreement given by A=E_p,I,J a(S_pI,S_pJ), where the expectation is taken over the population of examinees and over pairs of tests that are independently sampled from the universe of tests and administered to the same population of examinees. The term A_m is the expected agreement between an instance of the testing procedure and itself, A_m=E_p,I a(S_pI,S_pI), where A_m represents the maximum value of A. A is equal to A_m when each examinee in the population has the same score on every test. Kane and Brennan noted that A_m corrects the problem of the dependence of A on the scale of a(S_pI,S_pJ).

The index of agreement that is corrected for chance is

$$\theta_c = \frac{A - A_c}{A_m - A_c},$$

where the term A_c quantifies the agreement between the two instances of the testing procedure that is due solely to chance. It is defined as the expected agreement between the score, S_pI, for a randomly selected examinee p on one test and the score, S_qJ, for another independently sampled examinee q on another independently sampled test. That is,

A_c=E_p,q,I,J a(S_pI,S_qJ)=E_p,I a(S_pI)E_q,J a(S_qJ).

Also, Kane and Brennan¹¹ define the expected disagreement or loss as the difference between the maximum expected agreement and the expected agreement,

σ² (ϵ) = L = A_m−A.

This expected loss gives the error score variance associated with the expected agreement function.
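The relations among these quantities can be sketched in a few lines of code. The example assumes that the uncorrected index is the ratio A/A_m and that the chance-corrected index is (A − A_c)/(A_m − A_c), as in Kane and Brennan's general agreement function; the function name `agreement_indices` and the numeric inputs are hypothetical.

```python
def agreement_indices(a, a_max, a_chance):
    """Return (theta, theta_c, loss) from the expected agreement A,
    the maximum agreement A_m and the chance agreement A_c."""
    if a_max <= 0 or a_max == a_chance:
        raise ValueError("A_m must be positive and different from A_c")
    theta = a / a_max                                # not corrected for chance
    theta_c = (a - a_chance) / (a_max - a_chance)    # corrected for chance
    loss = a_max - a                                 # expected loss, sigma^2(eps)
    return theta, theta_c, loss

# Hypothetical values, used only to show the arithmetic.
theta, theta_c, loss = agreement_indices(a=3.0, a_max=4.0, a_chance=1.0)
print(theta, loss)  # 0.75 1.0

# When the chance agreement is zero, the two indices coincide.
theta0, theta_c0, _ = agreement_indices(a=3.0, a_max=4.0, a_chance=0.0)
print(theta0 == theta_c0)  # True
```

The second call anticipates the norm-referenced case, where the chance term vanishes and the correction leaves the coefficient unchanged.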

RESULTS

In order to derive the expected agreement coefficient within the context of classical test theory, we first need to introduce the concept of randomly equivalent test forms in place of the classical equivalent test forms. Randomly equivalent test forms arise when the test developer is able to build a very large or infinite number of different test forms from a large pool of items measuring the psychological construct. Hence, test forms of equal size are considered randomly equivalent if each is sampled randomly and independently from the large pool of items. These test forms are not expected to have equal mean scores or equal variances. However, examinees’ error scores on these randomly equivalent test forms are expected to be uncorrelated. Moreover, it is assumed that any test form is administered to a large sample of examinees who are randomly selected from the population of examinees.

In order to derive the expected agreement/reliability of test scores on a test form (say form X), we need to hypothesize that this test form and another hypothesized form (say form Y) are randomly equivalent test forms with different items but equal size (form X with item set I and form Y with item set J, each containing n items). Let us refer to form X as the reference test form and to form Y as the hypothesized test form. These two forms are then administered to the same sample of N examinees.
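The construction of two randomly equivalent forms can be sketched as a random draw from an item pool. This is a minimal sketch under stated assumptions: the pool of 500 item labels, the form size of 20 and the function `draw_two_forms` are hypothetical, and drawing 2n distinct items and splitting them is used here as a finite-pool stand-in for two independent draws with different items.

```python
import random

def draw_two_forms(item_pool, n, seed=None):
    """Draw two non-overlapping forms of n items each at random from the pool."""
    rng = random.Random(seed)
    drawn = rng.sample(item_pool, 2 * n)   # 2n distinct items in random order
    return drawn[:n], drawn[n:]            # form X (reference), form Y (hypothesized)

pool = [f"item_{k:03d}" for k in range(500)]   # hypothetical item pool
form_x, form_y = draw_two_forms(pool, n=20, seed=1)

print(len(form_x), len(form_y))            # 20 20
print(set(form_x).isdisjoint(form_y))      # True
```

Nothing in the draw forces the two forms to have equal means or variances, which matches the description of randomly equivalent forms given above.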

For a norm-referenced test, where the decision is based on the position of examinees relative to their peers, the agreement function is defined as the expected product, over all examinees, of the deviations of the observed average scores (X̅_p and Y̅_p) on two randomly equivalent test forms from the associated mean score for the items on each test form (T_I and T_J).

$$A(r) = E_{P,I,J}(\bar{X}_p - T_I)(\bar{Y}_p - T_J) = E_{P,I,J}\,\frac{1}{n^2}\sum_i\sum_j (X_{pi} - T_i)(Y_{pj} - T_j)$$

$$= E_{P,I,J}(X_{pi} - T_i)(Y_{pj} - T_j)$$

where the expectation is taken over an infinite number of randomly equivalent test forms X and Y, each with an equal number of items from the domain, and over an infinite number of independent random samples of N examinees from the population. The term E_P,I,J(X_pi−T_i)(Y_pj−T_j) is the expected mean pairwise covariance of items on X with items on Y, relative to their individual item mean scores.

For the reference test form X, E_P,I(X_pi−T_i)(X_pi’−T_i’) represents the expected mean pairwise covariance of distinct items on X (i≠i’) relative to their mean scores. Similarly, let E_P,J(Y_pj−T_j)(Y_pj’−T_j’) have the same definition for items on test form Y. Because the test forms are randomly equivalent,

E_P,I,J(X_pi−T_i)(Y_pj−T_j) = E_P,I(X_pi−T_i )(X_pi’-T_i’) = E_P,J(Y_pj−T_j)(Y_pj’−T_j’)

Hence, the expected agreement function, A(r) , becomes

$$A(r) = E_{P,I}(X_{pi} - T_i)(X_{pi'} - T_{i'}) = E_{P,I}\,\frac{1}{n(n-1)}\sum\sum_{i \neq i'} (X_{pi} - T_i)(X_{pi'} - T_{i'})$$

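By simple algebra, A(r) becomes

$$A(r) = \frac{1}{n(n-1)}\left[n^2 E_{P,I}(\bar{X}_p - T_I)^2 - E_I\sum_i E_P (X_{pi} - T_i)^2\right],$$

where T_I = (1/n)∑_i T_i.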

This expected agreement function gives the true score variance for norm-referenced tests, σ²(T_r). The maximum expected agreement for norm-referenced testing is,

A_m(r) = E_P,I(X̅_p−T_I )(X̅_p−T_I) = E_P,I(X̅_p−T_I)²

The expected agreement for norm-referenced testing due to chance is,

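$$A_c(r) = E_{P,Q,I,J}(\bar{X}_p - T_I)(\bar{Y}_q - T_J) = E_{P,I}(\bar{X}_p - T_I)\,E_{Q,J}(\bar{Y}_q - T_J)$$

$$= E_{P,I}\left(\frac{1}{n}\sum_i X_{pi} - \frac{1}{n}\sum_i T_i\right) E_{Q,J}\left(\frac{1}{n}\sum_j Y_{qj} - \frac{1}{n}\sum_j T_j\right) = 0,$$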

because E_P (X_pi) = T_i and E_Q (Y_qj) = T_j. Hence, the norm-referenced agreement coefficient is,

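$$\theta(r) = \theta_c(r) = \frac{A(r)}{A_m(r)} = \frac{E_{P,I}(X_{pi} - T_i)(X_{pi'} - T_{i'})}{E_{P,I}(\bar{X}_p - T_I)^2},$$

or

$$\theta(r) = \theta_c(r) = \frac{n^2 E_{P,I}(\bar{X}_p - T_I)^2 - E_I\sum_i E_P (X_{pi} - T_i)^2}{n(n-1)\,E_{P,I}(\bar{X}_p - T_I)^2}.$$

This coefficient can also be written as

$$\theta(r) = \theta_c(r) = \frac{n}{n-1}\left[1 - \frac{E_I\sum_i E_P (X_{pi} - T_i)^2}{n^2 E_{P,I}(\bar{X}_p - T_I)^2}\right].$$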

This result shows that the correction for chance agreement has no effect on the norm-referenced agreement coefficient.

The expected loss associated with the norm-referenced agreement coefficient is,

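$$L(r) = A_m(r) - A(r) = \frac{1}{n(n-1)}\left[E_I\sum_i E_P (X_{pi} - T_i)^2 - n E_{P,I}(\bar{X}_p - T_I)^2\right]$$

$$= \frac{1}{n(n-1)}\,E_{P,I}\sum_i \left((X_{pi} - T_i) - (\bar{X}_p - T_I)\right)^2$$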

which equals the appropriate error score variance for norm-referenced tests, σ²(ϵ_r).

This error score variance is similar to the relative error score variance identified by Brennan and Kane² using generalizability theory. It quantifies the expected squared difference between each examinee’s observed deviation score from the test average score and the deviation of the examinee’s true score from the test average score on the domain of items.

ESTIMATION

The components of all expressions of the expected agreement/reliability coefficients have the form of expected value of some terms over different random sets of items from the domain of items and over different random samples of examinees from the population of examinees. The sample counterparts of these terms can be used to estimate these expected values.

The expected norm-referenced agreement/reliability coefficient can be estimated by collecting data from administering one test form of n items to a representative sample of N examinees. If we substitute X̅_p, T_i and T_I by their sample counterparts, x̅_p = (1/n)∑_i x_pi, x̅_i = (1/N)∑_p x_pi, and x̅ = (1/n)∑_i x̅_i = (1/N)∑_p x̅_p, respectively, the estimator of the expected agreement coefficient for a norm-referenced test is

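$$\hat{\theta}(r) = \frac{\frac{1}{n(n-1)}\sum\sum_{i \neq i'}\sigma_{ii'}}{\sigma^2(\bar{x}_p)} = 1 - \frac{\sum_i \sigma^2(x_{pi}) - n\,\sigma^2(\bar{x}_p)}{n(n-1)\,\sigma^2(\bar{x}_p)}$$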

The associated loss is,

$$\hat{L}(r) = \frac{1}{n(n-1)}\left[\sum_i \sigma^2(x_{pi}) - n\,\sigma^2(\bar{x}_p)\right],$$

which gives the estimator of the relative error score variance for norm-referenced tests.

In these equations,

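$$\sigma_{ii'} = \frac{1}{N-1}\sum_p (x_{pi} - \bar{x}_i)(x_{pi'} - \bar{x}_{i'}),$$

$$\sigma^2(x_{pi}) = \frac{1}{N-1}\sum_p (x_{pi} - \bar{x}_i)^2,$$

$$\sigma^2(\bar{x}_p) = \frac{1}{N-1}\sum_p (\bar{x}_p - \bar{x})^2.$$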
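The estimation step can be illustrated with a small data set. The sketch below assumes that the estimator is the mean inter-item covariance divided by the variance of examinee mean scores, with unbiased (divisor N − 1) sample moments, and compares it with the usual computational form of coefficient alpha. The score matrix and all function names are hypothetical.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)

def covariance(u, v):
    """Unbiased sample covariance (divisor N - 1); covariance(u, u) is the variance."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def norm_referenced_estimates(scores):
    """scores[p][i] is the score of examinee p on item i (N rows, n columns)."""
    n = len(scores[0])
    items = list(zip(*scores))                                  # one column per item
    person_means = [mean(row) for row in scores]                # x-bar_p
    var_person_mean = covariance(person_means, person_means)    # sigma^2(x-bar_p)
    sum_item_var = sum(covariance(col, col) for col in items)   # sum_i sigma^2(x_pi)
    mean_cov = sum(covariance(items[i], items[k])
                   for i in range(n) for k in range(n) if i != k) / (n * (n - 1))
    loss = (sum_item_var - n * var_person_mean) / (n * (n - 1))
    theta = mean_cov / var_person_mean
    return theta, loss, mean_cov, var_person_mean

def coefficient_alpha(scores):
    """Conventional computational form of coefficient alpha on total scores."""
    n = len(scores[0])
    totals = [sum(row) for row in scores]
    sum_item_var = sum(covariance(col, col) for col in zip(*scores))
    return n / (n - 1) * (1 - sum_item_var / covariance(totals, totals))

# Hypothetical responses: 6 examinees by 4 dichotomous items.
scores = [[1, 1, 1, 0], [1, 0, 1, 1], [0, 0, 1, 0],
          [1, 1, 1, 1], [0, 0, 0, 0], [1, 0, 0, 0]]

theta, loss, mean_cov, var_person_mean = norm_referenced_estimates(scores)
print(abs(theta - coefficient_alpha(scores)) < 1e-12)      # True
print(abs(var_person_mean - (mean_cov + loss)) < 1e-12)    # True
```

The second check shows the decomposition of the observed variance of mean scores into the estimated true score variance plus the estimated relative error variance.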

DISCUSSION AND CONCLUSION

The paper derived the expected agreement coefficient for norm-referenced tests within the classical test theory framework under the assumption of randomly equivalent test forms, as a replacement for the conventional equivalent test forms. The estimator of the resulting coefficient proved to be equal to Cronbach’s coefficient alpha,⁸ which was derived under the different assumption of essentially tau-equivalent test forms.

This result supports the argument of Glaser and Nitko⁵ and Popham and Husek⁶ that reliability coefficients in classical test theory, such as coefficient alpha and KR-20, are appropriate for norm-referenced tests. The error score variance associated with coefficient alpha is the relative error score variance, which is defined in terms of the difference between an individual examinee’s performance and the performance of his/her peers who took the test.

The estimation of the expected agreement coefficient for norm-referenced tests can use either unbiased or biased estimators of its terms. It can easily be shown that biased and unbiased estimators of the terms in the above equations give identical estimates of the expected agreement coefficient for norm-referenced tests. However, the estimates of the error score variance and the true score variance are affected by whether unbiased or biased sample variances are used (the unbiased estimators are preferred).
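The effect of the divisor can be checked directly. The sketch below assumes that the coefficient is computed as one minus the ratio of the estimated loss to the variance of examinee mean scores, and it switches between divisor N − 1 (unbiased) and divisor N (biased). The function `theta_and_loss` and the score matrix are hypothetical.

```python
def theta_and_loss(scores, unbiased=True):
    """Return (theta, loss) using divisor N - 1 (unbiased) or N (biased)."""
    N, n = len(scores), len(scores[0])
    divisor = N - 1 if unbiased else N

    def var(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / divisor

    person_means = [sum(row) / n for row in scores]          # x-bar_p
    sum_item_var = sum(var(col) for col in zip(*scores))     # sum_i sigma^2(x_pi)
    loss = (sum_item_var - n * var(person_means)) / (n * (n - 1))
    theta = 1 - loss / var(person_means)
    return theta, loss

# Hypothetical responses: 6 examinees by 4 dichotomous items.
scores = [[1, 1, 1, 0], [1, 0, 1, 1], [0, 0, 1, 0],
          [1, 1, 1, 1], [0, 0, 0, 0], [1, 0, 0, 0]]

theta_u, loss_u = theta_and_loss(scores, unbiased=True)
theta_b, loss_b = theta_and_loss(scores, unbiased=False)

print(abs(theta_u - theta_b) < 1e-12)   # True: the divisor cancels in the ratio
print(loss_b < loss_u)                  # True: the variance estimates themselves differ
```

Because every variance in the ratio carries the same divisor, the coefficient is unchanged, while the loss estimate shrinks by the factor (N − 1)/N under the biased divisor.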

References

1. Brennan RL. Generalizability theory and classical test theory. Applied Measurement in Education. 2010; 24(1): 1-21. doi: 10.1080/08957347.2011.532417

2. Brennan RL, Kane MT. An index of dependability for mastery tests. J Educ Meas. 1977; 14(3): 277-289. doi: 10.1111/j.1745-3984.1977.tb00045.x

3. Brennan RL, Kane MT. Signal/noise ratios for domain-referenced tests. Psychometrika. 1977; 42(4): 609-625. doi: 10.1007/BF02295983

4. Gao X, Brennan R, Guo F. Modeling measurement facets and assessing generalizability in a large-scale writing assessment. GMAC Research Report. 2015.

5. Glaser R, Nitko AJ. Measurement in learning and instruction. In: Thorndike RL, ed. Educational measurement. Washington DC, USA: American Council on Education; 1971.

6. Popham WJ, Husek TR. Implications of criterion-referenced measurement. J Educ Meas. 1969; 6(1): 1-9. doi: 10.1111/j.1745-3984.1969.tb00654.x

7. Kuder GF, Richardson MW. The theory of the estimation of test reliability. Psychometrika. 1937; 2(3): 151-160. doi: 10.1007/BF02288391

8. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika. 1951; 16(3): 297-334. doi: 10.1007/BF02310555

9. Almehrizi RS. Coefficient alpha and reliability of scale scores. Applied Psychological Measurement. 2013; 37(6): 438-459. doi: 10.1177/0146621613484983

10. Cronbach LJ, Shavelson RJ. My Current thoughts on coefficient alpha and successor procedures. Educ Psychol Meas. 2004; 64(3): 391-418. doi: 10.1177/0013164404266386

11. Kane MT, Brennan RL. Agreement coefficients as indices of dependability for criterion-referenced tests. Applied Psychological Measurement. 1980; 4(1): 105-126. doi: 10.1177/014662168000400111

12. Almehrizi R. Normalization of mean squared differences to measure agreement for continuous data. Stat Methods Med Res. 2013. doi: 10.1177/0962280213507506

13. AlKharusi H. Generalizability theory: An analysis of variance approach to measurement problems in educational assessment. J Studies Educ. 2012; 2(1): 184-196. doi: 10.5296/jse.v2i1.1227
