REFORMULATION AND GENERALISATION OF THE COHEN AND FLEISS KAPPAS
Keywords: Assessment of Consistency and Content Validity, Fleiss Kappa, Cohen Kappa, ICC, Gwet's AC1 Coefficient, Multi-Rater Assessments, CVI.
Assessing the consistency of categorical or ordinal decisions made by observers or raters is an important problem, especially in the medical field. The Fleiss Kappa, Cohen Kappa and Intra-class Correlation (ICC), commonly used for this purpose, are compared and a generalised approach to these measures is presented. Differences between the Fleiss Kappa and multi-rater versions of the Cohen Kappa are explained, and it is shown how both may be applied to ordinal scoring with linear, quadratic or other weighting. The relationship between the quadratically weighted Fleiss and Cohen Kappas and the pairwise ICC is clarified and generalised to multi-rater assessments. Gwet's AC1 coefficient is considered as an alternative measure of consistency, and the relevance of the Kappas and AC1 to measuring content validity is explored.
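To make the two chance-corrected agreement measures named above concrete, the following is a minimal sketch (not the paper's own implementation) of the standard unweighted Cohen Kappa for two raters and the Fleiss Kappa for multiple raters per subject:

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same items on a nominal scale."""
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    # Observed proportion of agreement
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the two raters' marginal category proportions
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

def fleiss_kappa(table):
    """Fleiss' kappa from an N x k table, where table[i][j] is the number of
    raters assigning subject i to category j (same rater count per subject)."""
    N = len(table)
    n = sum(table[0])  # raters per subject
    k = len(table[0])
    # Mean per-subject agreement P_bar
    p_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1))
                for row in table) / N
    # Chance agreement from overall category proportions
    p_j = [sum(row[j] for row in table) / (N * n) for j in range(k)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)
```

Both functions return 1 for perfect agreement and 0 when agreement equals chance; the weighted and generalised variants discussed in the paper extend these same observed-versus-chance quantities.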
Banerjee M., Capozzoli M., McSweeney L. & Sinha D. (1999). Beyond Kappa: A review of interrater agreement measures. The Canadian Journal of Statistics, 27(1), 3-23.
Cohen J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37-46. https://doi.org/10.1177/001316446002000104
Cohen J. (1968). Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213-220. https://doi.org/10.1037/h0026256
Conger A.J. (1980). Integration and generalisation of kappas for multiple raters. Psychological Bulletin, 88(2), 322-328. https://doi.org/10.1037/0033-2909.88.2.322
Fleiss J.L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378-382.
Fleiss J.L. & Cohen J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33(3), 613-619. https://doi.org/10.1177/001316447303300309
Fleiss J.L. (2011). Design and Analysis of Clinical Experiments. John Wiley & Sons.
Gwet K. L. (2014). Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters. Advanced Analytics, LLC.
Hubert L. (1977). Kappa revisited. Psychological Bulletin, 84(2), 289-297. https://doi.org/10.1037/0033-2909.84.2.289
Jalalinajafabadi F. (2016). Computerised Assessment of Voice Quality. PhD thesis, University of Manchester.
Kitreerawutiwo N. & Mekrungrongwong S. (2015). Health behavior and health need assessment among elderly in rural community of Thailand: Sequential explanatory mixed methods study. LIFE: International Journal of Health and Life-Sciences, 1(2), 62-69. https://doi.org/10.20319/lijhls.2015.12.6269
Koch G.G. (1982). Intraclass correlation coefficient. Encyclopedia of Statistical Sciences.
Light R.J. (1971). Measures of response agreement for qualitative data: Some generalisations and alternatives. Psychological Bulletin, 76(5), 365-377. https://doi.org/10.1037/h0031643
Lee Rodgers J. & Nicewander W.A. (1988). Thirteen ways to look at the correlation coefficient. The American Statistician, 42(1), 59-66. https://doi.org/10.1080/00031305.1988.10475524
Müller R. & Büttner P.A. (1994). A critical discussion of intraclass correlation coefficients. Statistics in Medicine, 13(23-24), 2465-2476. https://doi.org/10.1002/sim.4780132310
Polit D.F. & Beck C.T. (2006). The Content Validity Index: Are you sure you know what's being reported? Critique and recommendations. Research in Nursing & Health, 29, 489-497. https://doi.org/10.1002/nur.20147
Rödel E. (1971). Review of: Fisher R.A., Statistical Methods for Research Workers, 14th ed., Oliver & Boyd, Edinburgh/London. Biometrical Journal, 13(6), 429-430. https://doi.org/10.1002/bimj.19710130623
Sukron & Phutthikhamin N. (2016). The development of caregivers' knowledge about stroke and stroke caregiving skills tools for stroke caregivers in Indonesia. LIFE: International Journal of Health and Life-Sciences, 2(2), 35-47.
Viera A.J. & Garrett J.M. (2005). Understanding interobserver agreement: The kappa statistic. Family Medicine, 37(5), 360-363.
Warrens M.J. (2010). Inequalities between multi-rater kappas. Advances in Data Analysis and Classification, 4(4), 271-286. https://doi.org/10.1007/s11634-010-0073-4
Copyright (c) 2017 Authors
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.