Exploring word order in learner corpora: The Woslac Project

Lancaster University

Electronic data

CORPUS_RESEARCH_SEMINAR-ultima.ppt
2.65 MB, application/vnd.ms-powerpoint

Keywords

learner corpora, word order, unaccusativity, heaviness, topic and focus


Research output: Contribution to conference - Without ISBN/ISSN › Conference paper

Unpublished

Standard

Exploring word order in learner corpora: The Woslac Project. / Mendikoetxea, Amaya.
2006. Paper presented at Corpus Research Group, Lancaster.


Harvard

Mendikoetxea, A 2006, 'Exploring word order in learner corpora: The Woslac Project', Paper presented at Corpus Research Group, Lancaster, 20/11/06.

APA

Mendikoetxea, A. (2006). Exploring word order in learner corpora: The Woslac Project. Paper presented at Corpus Research Group, Lancaster.

Vancouver

Mendikoetxea A. Exploring word order in learner corpora: The Woslac Project. 2006. Paper presented at Corpus Research Group, Lancaster.

Author

Mendikoetxea, Amaya. / Exploring word order in learner corpora: The Woslac Project. Paper presented at Corpus Research Group, Lancaster. 44 p.

Bibtex

@conference{2a4a52a33cfc41e69e0d50e6c1bd61c4,

title = "Exploring word order in learner corpora: The Woslac Project",

abstract = "This presentation reports on work in progress under the framework of a research project investigating word order in Second Language Acquisition (WOSLAC), based on two written learner corpora: WriCLE (L1 Spanish - L2 English) and CEDEL2 (L1 English - L2 Spanish). In the first part of the presentation I will discuss (i) the motivation and objectives of the project, (ii) data collection, (iii) query software and (iv) data analysis. In the second part, I will briefly present the results of a preliminary study on the production of postverbal subjects by Spanish learners of English. The purpose of this three-year project is to determine the properties which constrain word order in the interlanguage of L2 learners of English (with L1 Spanish) and L2 learners of Spanish (with L1 English). We examine both lexicon-syntax and syntax-discourse properties. Word order in English and Spanish differs significantly: English word order is often said to be `fixed', while Spanish allows for what is often referred to as `free order'. The two languages differ in the devices they employ to order constituents in the sentence. In languages with free word order, information structure and discourse properties in general play a crucial role in determining the position of constituents in the sentence, while lexico-syntactic properties mostly determine the ordering of constituents in fixed word order languages. An in-depth investigation into word order in advanced learners of L2 English and L2 Spanish will thus offer answers to questions regarding the relative difficulty of acquiring lexicon-syntax and syntax-discourse properties, as well as general issues related to L1 transfer and the occurrence of constructions which can be attributed neither to the L1 nor to the target language. Learner corpora are an invaluable tool to explore these issues. Our target is for WriCLE and CEDEL2 to reach 1 million words by the end of the three-year period.
The corpora will be annotated using UAM CorpusTool, which has been adapted for this study. The tool allows an analyst to select a text from the corpus and annotate it in various ways. The analyst can highlight a segment (e.g., an it-cleft) and then assign features to that segment. The tool produces an XML-encoded version of the text file, including the features assigned to the segments. Because hand-annotation is slow, the tool will allow the analyst to associate lexico-syntactic patterns with each feature, so that instances of a pattern can be detected automatically. For instance, a pattern like `it be\# NP that' would match corpus sentences like `It was John that we saw' and tentatively mark them with the feature it-cleft. The tool would then ask the user to eliminate false matches. This approach saves much of the manual annotation effort. In the second part of the talk I will briefly present the results of Lozano \& Mendikoetxea (in press), a preliminary study characterising the production of postverbal subjects in the Italian and Spanish subcorpora of ICLE (Granger et al. 2002). Our approach seeks to identify the conditions under which learners produce inverted subjects, regardless of grammaticalisation problems. Our findings reveal that Spanish and Italian learners of L2 English produce postverbal subjects in the same contexts in which these are found in native English, though they show persistent grammaticalisation errors. That is, postverbal subjects are found when (H1) the verb is unaccusative, (H2) the subject is long or `heavy', and (H3) the subject is new (or relatively new) information, or `focus'.",

keywords = "learner corpora, word order, unaccusativity, heaviness, topic and focus",

author = "Amaya Mendikoetxea",

year = "2006",

month = nov,

day = "20",

language = "English",

note = "Corpus Research Group ; Conference date: 20-11-2006",

}

RIS

TY - CONF

T1 - Exploring word order in learner corpora: The Woslac Project

AU - Mendikoetxea, Amaya

PY - 2006/11/20

Y1 - 2006/11/20

N2 - This presentation reports on work in progress under the framework of a research project investigating word order in Second Language Acquisition (WOSLAC), based on two written learner corpora: WriCLE (L1 Spanish - L2 English) and CEDEL2 (L1 English - L2 Spanish). In the first part of the presentation I will discuss (i) the motivation and objectives of the project, (ii) data collection, (iii) query software and (iv) data analysis. In the second part, I will briefly present the results of a preliminary study on the production of postverbal subjects by Spanish learners of English. The purpose of this three-year project is to determine the properties which constrain word order in the interlanguage of L2 learners of English (with L1 Spanish) and L2 learners of Spanish (with L1 English). We examine both lexicon-syntax and syntax-discourse properties. Word order in English and Spanish differs significantly: English word order is often said to be ‘fixed’, while Spanish allows for what is often referred to as ‘free order’. The two languages differ in the devices they employ to order constituents in the sentence. In languages with free word order, information structure and discourse properties in general play a crucial role in determining the position of constituents in the sentence, while lexico-syntactic properties mostly determine the ordering of constituents in fixed word order languages. An in-depth investigation into word order in advanced learners of L2 English and L2 Spanish will thus offer answers to questions regarding the relative difficulty of acquiring lexicon-syntax and syntax-discourse properties, as well as general issues related to L1 transfer and the occurrence of constructions which can be attributed neither to the L1 nor to the target language. Learner corpora are an invaluable tool to explore these issues. Our target is for WriCLE and CEDEL2 to reach 1 million words by the end of the three-year period.
The corpora will be annotated using UAM CorpusTool, which has been adapted for this study. The tool allows an analyst to select a text from the corpus and annotate it in various ways. The analyst can highlight a segment (e.g., an it-cleft) and then assign features to that segment. The tool produces an XML-encoded version of the text file, including the features assigned to the segments. Because hand-annotation is slow, the tool will allow the analyst to associate lexico-syntactic patterns with each feature, so that instances of a pattern can be detected automatically. For instance, a pattern like ‘it be# NP that’ would match corpus sentences like ‘It was John that we saw’ and tentatively mark them with the feature it-cleft. The tool would then ask the user to eliminate false matches. This approach saves much of the manual annotation effort. In the second part of the talk I will briefly present the results of Lozano & Mendikoetxea (in press), a preliminary study characterising the production of postverbal subjects in the Italian and Spanish subcorpora of ICLE (Granger et al. 2002). Our approach seeks to identify the conditions under which learners produce inverted subjects, regardless of grammaticalisation problems. Our findings reveal that Spanish and Italian learners of L2 English produce postverbal subjects in the same contexts in which these are found in native English, though they show persistent grammaticalisation errors. That is, postverbal subjects are found when (H1) the verb is unaccusative, (H2) the subject is long or ‘heavy’, and (H3) the subject is new (or relatively new) information, or ‘focus’.

AB - This presentation reports on work in progress under the framework of a research project investigating word order in Second Language Acquisition (WOSLAC), based on two written learner corpora: WriCLE (L1 Spanish - L2 English) and CEDEL2 (L1 English - L2 Spanish). In the first part of the presentation I will discuss (i) the motivation and objectives of the project, (ii) data collection, (iii) query software and (iv) data analysis. In the second part, I will briefly present the results of a preliminary study on the production of postverbal subjects by Spanish learners of English. The purpose of this three-year project is to determine the properties which constrain word order in the interlanguage of L2 learners of English (with L1 Spanish) and L2 learners of Spanish (with L1 English). We examine both lexicon-syntax and syntax-discourse properties. Word order in English and Spanish differs significantly: English word order is often said to be ‘fixed’, while Spanish allows for what is often referred to as ‘free order’. The two languages differ in the devices they employ to order constituents in the sentence. In languages with free word order, information structure and discourse properties in general play a crucial role in determining the position of constituents in the sentence, while lexico-syntactic properties mostly determine the ordering of constituents in fixed word order languages. An in-depth investigation into word order in advanced learners of L2 English and L2 Spanish will thus offer answers to questions regarding the relative difficulty of acquiring lexicon-syntax and syntax-discourse properties, as well as general issues related to L1 transfer and the occurrence of constructions which can be attributed neither to the L1 nor to the target language. Learner corpora are an invaluable tool to explore these issues. Our target is for WriCLE and CEDEL2 to reach 1 million words by the end of the three-year period.
The corpora will be annotated using UAM CorpusTool, which has been adapted for this study. The tool allows an analyst to select a text from the corpus and annotate it in various ways. The analyst can highlight a segment (e.g., an it-cleft) and then assign features to that segment. The tool produces an XML-encoded version of the text file, including the features assigned to the segments. Because hand-annotation is slow, the tool will allow the analyst to associate lexico-syntactic patterns with each feature, so that instances of a pattern can be detected automatically. For instance, a pattern like ‘it be# NP that’ would match corpus sentences like ‘It was John that we saw’ and tentatively mark them with the feature it-cleft. The tool would then ask the user to eliminate false matches. This approach saves much of the manual annotation effort. In the second part of the talk I will briefly present the results of Lozano & Mendikoetxea (in press), a preliminary study characterising the production of postverbal subjects in the Italian and Spanish subcorpora of ICLE (Granger et al. 2002). Our approach seeks to identify the conditions under which learners produce inverted subjects, regardless of grammaticalisation problems. Our findings reveal that Spanish and Italian learners of L2 English produce postverbal subjects in the same contexts in which these are found in native English, though they show persistent grammaticalisation errors. That is, postverbal subjects are found when (H1) the verb is unaccusative, (H2) the subject is long or ‘heavy’, and (H3) the subject is new (or relatively new) information, or ‘focus’.

KW - learner corpora

KW - word order

KW - unaccusativity

KW - heaviness

KW - topic and focus

M3 - Conference paper

T2 - Corpus Research Group

Y2 - 20 November 2006

ER -
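The abstract describes a semi-automatic annotation workflow: a lexico-syntactic pattern such as ‘it be# NP that’ proposes candidate it-clefts, the analyst removes false matches, and the result is stored as XML. The following is a minimal sketch of that idea under stated assumptions, not the UAM CorpusTool implementation. The reading of "be#" as any form of BE, the crude one-to-four-word stand-in for NP, the functions find_candidates and to_xml, and the text/segment XML layout are all hypothetical choices made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical reading of the pattern "it be# NP that": "be#" is taken to mean
# any form of BE, and the NP is crudely approximated as one to four words.
# A real tool would use a parser or chunker to identify the NP.
BE_FORMS = r"(?:is|was|are|were|be|been)"
IT_CLEFT = re.compile(
    rf"\bit\s+{BE_FORMS}\s+(?P<np>(?:\w+\s+){{0,3}}?\w+)\s+that\b",
    re.IGNORECASE,
)


def find_candidates(text):
    """Return (start, end, np) for every span that matches the cleft pattern."""
    return [(m.start(), m.end(), m.group("np")) for m in IT_CLEFT.finditer(text)]


def to_xml(text, candidates, rejected=frozenset()):
    """Tentatively tag candidates as it-clefts, skipping spans the analyst rejected.

    The <text>/<segment> layout is an illustrative stand-in, not the
    UAM CorpusTool file format.
    """
    root = ET.Element("text")
    for start, end, _np in candidates:
        if (start, end) in rejected:
            continue  # false match removed during manual review
        seg = ET.SubElement(
            root, "segment", start=str(start), end=str(end), feature="it-cleft"
        )
        seg.text = text[start:end]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "It was John that we saw. I think it is clear that the plan failed."
    candidates = find_candidates(sample)
    print(candidates)  # [(0, 16, 'John'), (33, 49, 'clear')]
    # The second hit is an extraposed clause, not a cleft, so the analyst rejects it.
    print(to_xml(sample, candidates, rejected={(33, 49)}))
```

The second candidate, ‘it is clear that’, is a typical false positive for a surface pattern of this kind, which illustrates why the workflow in the abstract keeps a manual step in which the user eliminates false matches.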
