Relationships between measures of phraseological complexity and writing quality in a CEFR assessment context




Keywords: bigrams, collocation complexity, ELF assessment, assessment models


This study contributes to our understanding of the relationships between bigram and collocation complexity and writing quality by analysing a corpus of student placement tests in the UAE. At the heart of the study lies the need to understand the relationships between bigram and collocation diversity and sophistication. In developing such an understanding, the study extends work in this area by examining an underrepresented CEFR grading context. Correlation analysis indicates that several bigram and collocation measures correlate with essay grades, some positively and some negatively. In regression modelling, three measures emerge as significant predictors of grade variation: the number of bigram types, the mean MI of bigram types, and the number of non-collocation noun + noun bigram types. The implications of these findings for assessment practices in ELF contexts are discussed.


Author Biography

Lee McCallum, University of Exeter

Lee McCallum is an EdD candidate at the University of Exeter. She has extensive EAP teaching experience in the Middle East, Europe and China. Her research interests include language assessment and writing instruction, with a focus on how corpus-based methods can enhance these areas. Her most recent work, forthcoming in 2020, is a co-authored book titled Understanding Development and Proficiency in Writing: Quantitative Corpus Linguistics Approaches, which will be published by Cambridge University Press.







How to Cite

McCallum, L. (2020). Relationships between measures of phraseological complexity and writing quality in a CEFR assessment context. Arab Journal of Applied Linguistics, 5(1).


