• Zheng Xie, School of Engineering, University of Central Lancashire, Preston, UK
  • Chaitanya Gadepalli, University Department of Otolaryngology, Central Manchester University Hospitals Foundation Trust and University of Manchester Academic Health Science Centre, Manchester, UK
  • Barry M.G. Cheetham, School of Computer Science, University of Manchester, Manchester, UK



Keywords: Assessment of Consistency and Content Validity, Fleiss Kappa, Cohen Kappa, ICC, Gwet's AC1 Coefficient, Multi-Rater Assessments, CVI.


The assessment of consistency in the categorical or ordinal decisions made by observers or raters is an important problem, especially in the medical field. The Fleiss Kappa, Cohen Kappa and Intraclass Correlation Coefficient (ICC), as commonly used for this purpose, are compared and a generalised approach to these measures is presented. Differences between the Fleiss Kappa and multi-rater versions of the Cohen Kappa are explained, and it is shown how both may be applied to ordinal scoring with linear, quadratic or other weighting. The relationship between the quadratically weighted Fleiss and Cohen Kappas and the pair-wise ICC is clarified and generalised to multi-rater assessments. Gwet's AC1 coefficient is considered as an alternative measure of consistency, and the relevance of the Kappas and AC1 to measuring content validity is explored.
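As a concrete illustration of the unweighted Fleiss Kappa discussed in the abstract, the following minimal Python sketch computes it from a subjects-by-categories count matrix. The function name `fleiss_kappa` and the toy data are our own illustration under the standard formulation (Fleiss, 1971), not code from the paper:

```python
def fleiss_kappa(counts):
    """Unweighted Fleiss Kappa.

    counts[i][j] is the number of raters assigning subject i to
    category j; every row must sum to the same rater count m.
    """
    n = len(counts)            # number of subjects
    m = sum(counts[0])         # raters per subject
    k = len(counts[0])         # number of categories

    # Mean observed agreement: P_i = (sum_j n_ij^2 - m) / (m(m - 1))
    p_bar = sum((sum(c * c for c in row) - m) / (m * (m - 1))
                for row in counts) / n

    # Chance agreement: P_e = sum_j p_j^2, p_j = overall category proportion
    p_e = sum((sum(row[j] for row in counts) / (n * m)) ** 2
              for j in range(k))

    return (p_bar - p_e) / (1 - p_e)


# Three subjects, two raters, two categories: full agreement on the
# first two subjects, an even split on the third.
print(fleiss_kappa([[2, 0], [0, 2], [1, 1]]))  # ≈ 0.333
```

Here the mean observed agreement is 2/3 and the chance agreement is 1/2, giving Kappa = (2/3 − 1/2)/(1 − 1/2) = 1/3; with perfect agreement on every subject the function returns 1.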


Banerjee M., Capozzoli M., McSweeney L. & Sinha D. (1999). Beyond kappa: A review of interrater agreement measures. The Canadian Journal of Statistics, 27(1), 3-23.

Cohen J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37-46.

Cohen J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213-220.

Conger A.J. (1980). Integration and generalization of kappas for multiple raters. Psychological Bulletin, 88(2), 322-328.

Fleiss J.L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378-382.

Fleiss J.L. & Cohen J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33, 613-619.

Fleiss J.L. (2011). Design and Analysis of Clinical Experiments. John Wiley & Sons.

Gwet K. L. (2014). Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters. Advanced Analytics, LLC.

Hubert L. (1977). Kappa revisited. Psychological Bulletin, 84, 289-297.

Jalalinajafabadi F. (2016). Computerised Assessment of Voice Quality. PhD thesis, University of Manchester.

Kitreerawutiwo N. & Mekrungrongwong S. (2015). Health behavior and health need assessment among elderly in rural community of Thailand: Sequential explanatory mixed methods study. LIFE: International Journal of Health and Life-Sciences, 1(2), 62-69.

Koch G.G. (1982). Intraclass correlation coefficient. In Encyclopedia of Statistical Sciences. John Wiley & Sons.

Light R.J. (1971). Measures of response agreement for qualitative data: Some generalizations and alternatives. Psychological Bulletin, 76(5), 365-377.

Rodgers J.L. & Nicewander W.A. (1988). Thirteen ways to look at the correlation coefficient. The American Statistician, 42(1), 59-66.

Müller R. & Büttner P.A. (1994). A critical discussion of intraclass correlation coefficients. Statistics in Medicine, 13(23-24), 2465-2476.

Polit D.F. & Beck C.T. (2006). The Content Validity Index: Are you sure you know what's being reported? Critique and recommendations. Research in Nursing & Health, 29(5), 489-497.

Rödel E. (1971). Review of: Fisher R.A., Statistical Methods for Research Workers, 14th ed., Oliver & Boyd, Edinburgh/London. Biometrical Journal, 13(6), 429-430.

Sukron & Phutthikhamin N. (2016). The development of caregivers' knowledge about stroke and stroke caregiving skills tools for stroke caregivers in Indonesia. LIFE: International Journal of Health and Life-Sciences, 2(2), 35-47.

Viera A.J. & Garrett J.M. (2005). Understanding interobserver agreement: The kappa statistic. Family Medicine, 37(5), 360-363.

Warrens M.J. (2010). Inequalities between multi-rater kappas. Advances in Data Analysis and Classification, 4(4), 271-286.




How to Cite

Xie, Z., Gadepalli, C., & Cheetham, B. M. G. (2017). REFORMULATION AND GENERALISATION OF THE COHEN AND FLEISS KAPPAS. LIFE: International Journal of Health and Life-Sciences, 3(3), 01–15.