Previous studies have used psycholinguistic techniques such as eye-tracking and self-paced reading to investigate the psychological validity of corpus-derived collocations (e.g. Conklin & Schmitt 2008; McDonald & Shillcock 2003a; 2003b; Underwood et al. 2004; Huang et al. 2012). The results of these studies reveal that sequences of words which form collocations are read more quickly and receive fewer fixations than sequences of words which do not form collocations. However, behavioural data and eye-tracking data can only ever provide an indirect measure of what is going on in the brain during language processing. In this thesis, I therefore investigate the psychological validity of corpus-derived collocations using a direct measure of neural activity, namely electroencephalography (EEG). More specifically, I use the event-related potential (ERP) technique for analysing brainwave data.
Very few ERP studies focus on collocation, and those that do conceptualize and operationalize the notion differently from this thesis, and indeed from most corpus linguistics work. For example, although Molinaro and Carreiras (2010:179-180) use corpus-derived collocations for an ERP study, they explicitly state that they only extract collocations which are “idioms or clichés”. By contrast, in this thesis, collocation is conceptualized as a more fluid phenomenon: compositional or non-compositional word pairs in which the words have a high probability of occurring together.
In Experiment 1, the first of four ERP experiments presented in this thesis, I aim to pilot a procedure for determining whether or not there is a neurophysiological difference in the way that the native speaker brain processes collocational adjective-noun bigrams compared to non-collocational adjective-noun bigrams. In Experiment 2, I aim to replicate the results of the pilot study using another group of native English speakers. In Experiment 3, I aim to investigate the processing of collocational and non-collocational adjective-noun bigrams in non-native speakers of English (specifically, native speakers of Mandarin Chinese). In Experiment 4, the final experiment of this thesis, I aim to investigate the gradience of the ERP response as well as the psychological validity of different association measures, namely transition probability, mutual information, log-likelihood, z-score, t-score, Dice coefficient, MI3, and raw frequency.
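For readers unfamiliar with the association measures listed above, the following is a minimal sketch of how they are commonly computed from corpus counts. It assumes the standard textbook definitions based on observed and expected bigram frequencies, and it takes transition probability to be the forward probability of the noun given the adjective; the exact formulations used in this thesis may differ. The function name, the dictionary keys, and the counts in the usage lines are hypothetical and serve only as illustration.

```python
import math


def association_measures(f12, f1, f2, n):
    """Common association measures for a bigram (word1, word2).

    f12: frequency of the bigram, f1: frequency of word1,
    f2: frequency of word2, n: corpus size (hypothetical inputs).
    Formulas follow widely used textbook definitions (an assumption).
    """
    if f12 <= 0:
        raise ValueError("the bigram must occur at least once")

    # Observed 2x2 contingency table
    o11 = f12                 # word1 followed by word2
    o12 = f1 - f12            # word1 followed by another word
    o21 = f2 - f12            # another word followed by word2
    o22 = n - f1 - f2 + f12   # neither word

    # Expected frequencies under independence
    e11 = f1 * f2 / n
    e12 = f1 * (n - f2) / n
    e21 = (n - f1) * f2 / n
    e22 = (n - f1) * (n - f2) / n

    def ll_term(o, e):
        # A cell with zero observations contributes nothing
        return o * math.log(o / e) if o > 0 else 0.0

    log_likelihood = 2 * (
        ll_term(o11, e11) + ll_term(o12, e12)
        + ll_term(o21, e21) + ll_term(o22, e22)
    )

    return {
        "raw_frequency": o11,
        "transition_probability": o11 / f1,  # forward: P(word2 | word1)
        "mi": math.log2(o11 / e11),
        "mi3": math.log2(o11 ** 3 / e11),
        "t_score": (o11 - e11) / math.sqrt(o11),
        "z_score": (o11 - e11) / math.sqrt(e11),
        "dice": 2 * o11 / (f1 + f2),
        "log_likelihood": log_likelihood,
    }


# Hypothetical counts, for illustration only
measures = association_measures(f12=30, f1=1000, f2=500, n=1_000_000)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

All of these measures are derived from the same four counts, so they differ only in how they weigh the observed bigram frequency against the frequency expected by chance.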
The results of these studies reveal that there is a neurophysiological difference in the way that the brain processes corpus-derived collocational bigrams compared to matched non-collocational bigrams, suggesting that the phenomenon of collocation can be seen as having psychological validity. An important finding of this thesis is the discovery of the ‘Collocational N400’: an ERP component reflecting the increase in cognitive load associated with reading a collocational violation. This increase in cognitive load is greater for non-native speakers than for native speakers, as non-native speakers have less flexibility than native speakers in their use of (non-)collocational patterns. Moreover, while there is a strong correlation between the amplitude of the collocational N400 and all of the measures of collocation strength that I investigate in Experiment 4, the strongest correlations exist between amplitude and the hybrid association measures, including z-score, MI3, and Dice coefficient. This suggests that mutual information and log-likelihood, which are two of the most commonly used association measures in corpus linguistics (Gries 2014a:37), are not necessarily the optimal choice. I discuss these results in relation to prior literature from the fields of corpus linguistics and cognitive neuroscience.