REFORMULATION AND GENERALISATION OF THE COHEN AND FLEISS KAPPAS

Authors

  • Zheng Xie, School of Engineering, University of Central Lancashire, Preston, UK
  • Chaitanya Gadepalli, University Department of Otolaryngology, Central Manchester University Hospitals Foundation Trust and University of Manchester Academic Health Science Centre, Manchester, UK
  • Barry M.G. Cheetham, School of Computer Science, University of Manchester, Manchester, UK

DOI:

https://doi.org/10.20319/lijhls.2017.33.115

Keywords:

Assessment of Consistency and Content Validity, Fleiss Kappa, Cohen Kappa, ICC, Gwet's AC1 Coefficient, Multi-Rater Assessments, CVI.

Abstract

The assessment of consistency in the categorical or ordinal decisions made by observers or raters is an important problem, especially in the medical field. The Fleiss Kappa, Cohen Kappa and Intra-class Correlation (ICC), as commonly used for this purpose, are compared and a generalised approach to these measurements is presented. Differences between the Fleiss Kappa and multi-rater versions of the Cohen Kappa are explained, and it is shown how both may be applied to ordinal scoring with linear, quadratic or other weighting. The relationship between the quadratically weighted Fleiss and Cohen Kappas and the pair-wise ICC is clarified and generalised to multi-rater assessments. Gwet's AC1 coefficient is considered as an alternative measure of consistency, and the relevance of the Kappas and AC1 to measuring content validity is explored.
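
As a rough illustration of the coefficients named in the abstract, the Python sketch below computes the standard two-rater Cohen Kappa (nominal and quadratically weighted) and the multi-rater Fleiss Kappa from their textbook definitions. The function names, the toy ratings and the 3-point scale are illustrative assumptions only; the code does not reproduce the authors' generalised reformulation.

    # Minimal sketch of the textbook Cohen and Fleiss Kappa formulas
    # (illustrative only; not the authors' generalised reformulation).
    from collections import Counter
    from itertools import product

    def cohen_kappa(r1, r2, categories, weight=None):
        """Cohen Kappa for two raters. weight=None gives the nominal Kappa,
        weight='quadratic' the quadratically weighted version."""
        n, k = len(r1), len(categories)
        idx = {c: i for i, c in enumerate(categories)}
        # Disagreement weights: 0/1 for nominal, (i - j)^2 for quadratic.
        if weight == "quadratic":
            w = [[(i - j) ** 2 for j in range(k)] for i in range(k)]
        else:
            w = [[0 if i == j else 1 for j in range(k)] for i in range(k)]
        # Observed and chance-expected proportions in each cell of the k x k table.
        obs = [[0.0] * k for _ in range(k)]
        for a, b in zip(r1, r2):
            obs[idx[a]][idx[b]] += 1.0 / n
        m1, m2 = Counter(r1), Counter(r2)
        exp = [[m1[categories[i]] * m2[categories[j]] / n ** 2
                for j in range(k)] for i in range(k)]
        d_obs = sum(w[i][j] * obs[i][j] for i, j in product(range(k), repeat=2))
        d_exp = sum(w[i][j] * exp[i][j] for i, j in product(range(k), repeat=2))
        return 1.0 - d_obs / d_exp

    def fleiss_kappa(counts):
        """Fleiss Kappa. counts[i][j] = number of raters placing subject i in
        category j, with every subject rated by the same number of raters."""
        N, k = len(counts), len(counts[0])
        n = sum(counts[0])  # raters per subject
        p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
        P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
        P_bar = sum(P_i) / N
        P_e = sum(p * p for p in p_j)
        return (P_bar - P_e) / (1 - P_e)

    # Toy example on a 3-point ordinal scale: two raters, then four raters per subject.
    print(cohen_kappa([1, 2, 2, 3, 1], [1, 2, 3, 3, 2], categories=[1, 2, 3]))
    print(cohen_kappa([1, 2, 2, 3, 1], [1, 2, 3, 3, 2], categories=[1, 2, 3],
                      weight="quadratic"))
    print(fleiss_kappa([[4, 0, 0], [1, 3, 0], [0, 2, 2], [0, 0, 4]]))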

References

Banerjee M., Capozzoli M., McSweeney L. & Sinha D. (1999). Beyond Kappa: A review of interrater agreement measures. The Canadian Journal of Statistics, 27(1), 3-23. https://doi.org/10.2307/3315487

Cohen J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37-46. https://doi.org/10.1177/001316446002000104

Cohen J. (1968). Weighted Kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213. https://doi.org/10.1037/h0026256

Conger A.J. (1980). Integration and Generalisation of Kappas for Multiple Raters. Psychological Bulletin, 88(2), 322-328. https://doi.org/10.1037/0033-2909.88.2.322

Fleiss J.L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378-382.

Fleiss J.L. & Cohen J. (1973). The equivalence of weighted Kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33, 613-619. https://doi.org/10.1177/001316447303300309

Fleiss J.L. (2011). Design and analysis of clinical experiments, John Wiley & Sons.

Gwet K. L. (2014). Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters. Advanced Analytics, LLC.

Hubert L. (1977). Kappa Revisited. Psychological Bulletin, 84, 289-297. https://doi.org/10.1037/0033-2909.84.2.289

Jalalinajafabadi F. (2016). Computerised assessment of voice quality, PhD thesis, University of Manchester.

Kitreerawutiwo N. & Mekrungrongwong S. (2015). Health Behavior and Health Need Assessment among Elderly in Rural Community of Thailand: Sequential explanatory mixed methods study. LIFE: International Journal of Health and Life-Sciences, 1(2), 62-69. https://doi.org/10.20319/lijhls.2015.12.6269

Koch G.G. (1982). Intraclass correlation coefficient. Encyclopedia of Statistical Sciences.

Light R.J. (1971). Measures of response agreement for qualitative data: Some generalisations and alternatives. Psychological Bulletin, 76, 365-377. https://doi.org/10.1037/h0031643

Lee Rodgers J. & Nicewander W.A. (1988). Thirteen ways to look at the correlation coefficient. The American Statistician, 42(1), 59-66. https://doi.org/10.1080/00031305.1988.10475524

Müller R. & Büttner P.A. (1994). A critical discussion of intraclass correlation coefficients. Statistics in Medicine, 13(23-24), 2465-2476. https://doi.org/10.1002/sim.4780132310

Polit D.F. & Beck C.T. (2006). The Content Validity Index: Are you sure you know what's being reported? Critique and recommendations. Research in Nursing & Health, 29, 489-497. https://doi.org/10.1002/nur.20147

Rödel E. (1971). Fisher R.A., Statistical Methods for Research Workers, 14th ed., Oliver & Boyd, Edinburgh, London, XIII, 362 pp., 12 figures, 74 tables, 40s. [book review]. Biometrical Journal, 13(6), 429-430. https://doi.org/10.1002/bimj.19710130623

Sukron & Phutthikhamin N. (2016). The Development of Caregivers' Knowledge About Stroke and Stroke Caregiving Skills Tools for Stroke Caregivers in Indonesia. LIFE: International Journal of Health and Life-Sciences, 2(2), 35-47.

Viera A.J. & Garrett J.M. (2005). Understanding interobserver agreement: The Kappa statistic. Family Medicine, 37(5), 360-363.

Warrens M.J. (2010). Inequalities between Multi-Rater Kappas. Advances in Data Analysis and Classification, 4(4), 271-286. https://doi.org/10.1007/s11634-010-0073-4

Published

2017-11-16

How to Cite

Xie, Z., Gadepalli, C., & Cheetham, B. M. G. (2017). REFORMULATION AND GENERALISATION OF THE COHEN AND FLEISS KAPPAS. LIFE: International Journal of Health and Life-Sciences, 3(3), 01–15. https://doi.org/10.20319/lijhls.2017.33.115