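Long-Term Effects of Corpus-Aided Language Learning in the English Grammar Classroom: Taiwanese EFL Students' Grammar Proficiency, Self-Efficacy, Learning Motivation, and Learning Experience as Research Factors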
Effects of Corpus-Aided Language Learning in the EFL Grammar Classroom: A Longitudinal Study on Taiwanese Students' Grammar Proficiency, Self-Efficacy, Learning Motivation, and Learning Perspectives/Perceptions
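Data-driven learning (DDL) has become a popular research topic in foreign language teaching in recent years, and many scholars strongly endorse its use in the language classroom. In English as a Foreign Language (EFL) courses in particular, its potential to strengthen students' grammar learning has drawn support from many studies and theories. However, before the approach can be promoted across more EFL grammar classrooms, its long-term effects in different educational and cultural contexts need more research evidence, and its practical teaching benefits need examination from multiple dimensions, especially in settings where the traditional deductive approach (TDA) is relatively prevalent, such as many English grammar classrooms in Taiwan. This study therefore proposes a longitudinal comparative empirical study (a DDL experimental group and a TDA control group) that examines, across multiple variables, how the effects of DDL and TDA differ in EFL grammar classrooms in Taiwanese higher education, so as to present objectively the suitability of DDL for grammar classrooms in Taiwan. The project will take two groups of first-year English majors as its subjects, with 30 participants expected in each group (60 in total), and will run a teaching experiment over one academic year (36 weeks), examining both groups' grammar proficiency, self-efficacy in grammar learning, and motivation to learn grammar. In addition, the study will use qualitative interviews to explore both groups' experiences of learning English grammar, supplemented by long-term in-class observation notes, to compare and present in depth the nature of Taiwanese DDL grammar students' learning experience and to assess, both qualitatively and quantitatively, the effects of DDL-aided English grammar teaching.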
In recent years, data-driven learning (DDL) has become a popular research topic in the field of foreign language teaching. Many scholars strongly endorse its pedagogical applications in language education. It has also received theoretical support for its potential effects, particularly in teaching grammar to speakers of English as a second/foreign language (ESL/EFL). However, before DDL can be safely adopted across English grammar classes, its pedagogical efficacy must also be tested in a wider range of educational and cultural contexts, specifically those where grammar tends to be taught with the traditional deductive approach (TDA), as in Taiwan. Given this, the current research project entails a long-term (36-week) comparative experiment on two groups of first-year English majors (30 students each) who will be taught grammar with either a DDL approach or TDA. The differences (if any) found between the groups will serve as relatively objective evidence for judging the pedagogical suitability of DDL-centered treatments for Taiwanese EFL grammar students. Specifically, three critical factors will be investigated quantitatively: participants' grammar abilities, self-efficacy in grammar learning, and motivation towards learning grammar. One-on-one interviews, complemented by longitudinal in-class observation notes, will also be conducted to explore in depth the nature of both groups’ grammar learning experience. It is anticipated that the scale of this comparative research project will yield results that reflect more fully the efficacy of DDL in Taiwan’s EFL grammar classrooms.
https://www.grb.gov.tw/search/planDetail?id=11921542
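Effects of Data-Driven Learning on the Grammar Development of Learners with Different Learning Styles: Taiwanese EFL Students' Grammar Achievement, Learning Motivation, Self-Efficacy, and Qualitative Learning Processes as Research Variables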
Effects of Data-Driven Learning on Grammar Students with Different Styles of Learning: An Experimental Assessment of Taiwanese EFL Students' Grammar Performance, Learning Attitudes, and Qualitative Learning Experiences
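The need for research on data-driven learning (DDL) is growing, and DDL has become one of the main research topics in second and foreign language learning. The approach is rooted in several learning theories; most scholars affirm its feasibility in the language classroom and have tested its teaching effectiveness in a range of experimental settings. However, before DDL can be extended to more English classrooms, its teaching benefits for students with different learning styles still need more empirical evidence. This line of research is important and has been raised by scholars in their recommendations for further research, but relevant reports remain scarce, leaving it one of the open questions in establishing the overall benefits of DDL. The investigator therefore proposes an empirical study, based on a 24-week teaching experiment, that examines the benefits of DDL for Taiwanese grammar students with different learning styles and compares them with students taught by the traditional deductive approach (TDA). The participants will be four classes of first-year non-English majors, 75 students per class; two classes will form the DDL experimental group and the other two the TDA control group, and the performance of participants with different learning styles in each group will be examined across multiple variables. The main quantitative variables are learning style, grammar achievement, learning motivation, self-efficacy, and ratings of instructional suitability. The investigator will also use group interviews to explore the nature of the learning experience, during the experimental treatment, of students with different learning styles in each group. With this mixed qualitative and quantitative design, the study is expected to present in depth the learning outcomes of DDL for grammar students with different learning styles in Taiwan.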
The data-driven learning (DDL) approach has become one of the most popular topics of recent research in the field of second language acquisition. The approach draws on various kinds of theoretical support, and many researchers and scholars have strongly endorsed its pedagogical feasibility in language classrooms. Its potential effects have also been examined at a number of experimental sites, particularly in contexts where the teaching of grammar to speakers of English as a second/foreign language (ESL/EFL) is the main focus of investigation. The research has mostly been consistent in finding that DDL is pedagogically effective with ESL/EFL grammar students. However, before DDL can be more widely introduced to English grammar students, its pedagogical effects on students with different learning styles should be tested. This line of inquiry is important because it has been proposed frequently but seldom undertaken, even though scholars have deemed it one of the major concerns to verify before endorsing the efficacy of the method. Looking into this issue is particularly meaningful in educational contexts such as Taiwan’s, where grammar tends to be taught with the traditional deductive approach (TDA). Given this, the principal investigator aims to conduct a research project that entails a long-term (24-week) comparative experiment on four classes of first-year non-English majors (75 students in each class). Two classes will learn grammar with the DDL approach while the other two will be taught grammar with TDA. The differences (if any) found between the DDL and TDA classes are to serve as relatively objective evidence in judging the pedagogical suitability of DDL-centered treatments for Taiwanese EFL grammar students who have different learning styles. Both quantitative and qualitative approaches will be employed for this purpose. To be precise, five major quantitative variables will be assessed: participants’ learning styles, grammar performance, motivation to learn grammar, self-efficacy in learning grammar, and judgments of the instructional suitability of the treatments they receive. Interviews will then be conducted with groups of learners who have the same learning preferences. The interview accounts will generate evidence for comparing in depth the learning experiences of students who have different learning styles. It is expected that the scale of this comparative research project will yield results that reflect more fully the overall efficacy of DDL in Taiwan’s EFL grammar classrooms.
https://www.grb.gov.tw/search/planDetail?id=13131193
Data-Driven Learning: Applying an Integrated Learning Mode in EFL Classes, with Conditional Grammar as an Example
Learning Conditional Clauses with Data-Driven Learning: Integrated Learning Mode Implemented in EFL Classes in Taiwan
王睿賢, Master's thesis. Advisor: 林銘輝
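Over the past few decades, many empirical studies of data-driven learning (DDL) have confirmed its effectiveness. However, most existing research has examined DDL for teaching relatively fixed linguistic structures, and little is known about its efficacy in teaching relatively abstract grammar such as conditionals. To close this gap and assess DDL more fully, this study carried out an empirical comparison of two groups of first-year university EFL students in Taiwan. The groups were placed in classes taught with either DDL or the Grammar-Translation Method (GTM) and learned the four types of English conditional sentences through face-to-face classes combined with distance learning activities (a partially remote mode). Both groups' grammar performance was measured with a pre-test, a post-test, and a delayed post-test, then analyzed in SPSS with paired-samples and independent-samples t-tests. The results show that both groups significantly improved their ability to use English conditionals. DDL outperformed GTM, but the DDL group's achievement scores fell further than the GTM group's in the delayed post-test, which suggests that DDL can help learners raise their grammar ability while GTM supports longer-term performance. For learning motivation and self-efficacy, the GTM group showed a statistical increase after the intervention while the DDL group stayed level, which reflects that students in Taiwan are, to some extent, still accustomed to traditional deductive methods such as GTM. Educators are advised to weigh the strengths and weaknesses of both approaches when teaching with DDL or GTM.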
Over the past decades, many empirical studies have provided evidence in support of the general efficacy of data-driven learning (DDL). However, most investigations to date have examined the effects of DDL on relatively fixed linguistic patterns, and little is known about its efficacy with relatively abstract grammatical rules such as conditional clauses. To fill this research gap and assess the effects of DDL more fully, this study carried out an empirical comparison between two groups of first-year Taiwanese EFL students who were taught four types of conditional clauses with either DDL or the Grammar-Translation Method (GTM), integrated with distance learning activities. Both groups’ grammar performance was examined with pre-, post-, and delayed post-tests and then analyzed by means of paired-sample and independent-sample t-tests in SPSS. The results indicated that both groups significantly improved their use of English conditional clauses. The DDL group outperformed the GTM group, but its scores regressed more than the GTM group’s in the delayed post-test, which suggests that DDL can help learners improve their grammar, whereas GTM can help with long-term performance. In terms of learners’ motivation and self-efficacy, the GTM group made statistical gains after the treatment, whereas the DDL group stayed at the same level, which suggests that Taiwanese students are, to some extent, still accustomed to conventional deductive approaches such as the GTM used in this study. Educators may wish to weigh the pros and cons of both methods when using either of them.
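The two tests named in the abstract are standard, and their statistics can be sketched in a few lines. The Python below is an illustration only: the study itself used SPSS, the function names are hypothetical, the scores are invented toy numbers rather than data from the study, and the independent-samples version assumes equal variances (the pooled form).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """t statistic for a paired-samples test (same learners, two time points)."""
    diffs = [a - b for a, b in zip(after, before)]
    # Mean gain divided by the standard error of the gains.
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def independent_t(group_x, group_y):
    """t statistic for an independent-samples test, assuming equal variances."""
    nx, ny = len(group_x), len(group_y)
    # Pooled variance: the two sample variances weighted by degrees of freedom.
    pooled = ((nx - 1) * stdev(group_x) ** 2 + (ny - 1) * stdev(group_y) ** 2) / (nx + ny - 2)
    return (mean(group_x) - mean(group_y)) / sqrt(pooled * (1 / nx + 1 / ny))


# Invented toy scores, for illustration only (not from the study).
toy_pre = [1, 2, 3, 4]
toy_post = [2, 4, 5, 5]
toy_other_group_post = [1, 3, 4, 4]

print("paired t (pre vs post):", round(paired_t(toy_pre, toy_post), 3))
print("independent t (group vs group):", round(independent_t(toy_post, toy_other_group_post), 3))
```

A paired test suits the pre-test versus post-test comparison within one group, while the independent test suits the comparison between the DDL and GTM groups at a single time point. Turning either statistic into a p-value needs the t distribution, which a statistics package would supply.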
Cutting out the middleman in Data-Driven Learning

Importance is placed on empirical data when taking a corpus-informed and data-driven approach to language learning and teaching. Moving away from subjective conclusions about language based on an individual’s internalized cognitive perception of language and the influence of generic language education resources, empirical data enable language teachers and learners to reach objective conclusions about specific language usage based on corpus analyses. Tim Johns coined the term Data-Driven Learning (DDL) in 1991 with reference to the use of corpus data and the application of corpus-based practices in language learning and teaching (Johns, 1991). The practice of DDL in language education was appropriated from computer science where language is treated as empirical data and where “every student is Sherlock Holmes”, investigating the uses of language to assist with their acquisition of the target language (Johns, 2002:108).
A review of the literature indicates that the practice of using corpora in language teaching and learning pre-dates the term DDL, with work carried out by Peter Roe at Aston University in 1969 (McEnery & Wilson, 1997, p.12). Johns is also credited with having come up with the term English for Academic Purposes (Hyland, 2006). Johns’ oft-quoted words about cutting out the middleman tell us more about his DDL vision for language learning: teacher intuitions about language were to be put aside in favor of powerful text analysis tools that would give learners direct access to some of the most extensive language corpora available, the same corpora that lexicographers draw on for making dictionaries, so that learners could discover for themselves how the target language is used across a variety of authentic communication contexts. As with many brilliant visions for impactful educational change, however, his also appears to have come before its time.
This post will argue that the original middleman in Johns’ DDL metaphor took on new forms beyond that of teachers getting in the way of learners having direct access to language as data. It will claim that the applied corpus linguistics research and development community introduced new and additional barriers to the widespread adoption of DDL in mainstream language education. However well intentioned, and no doubt shaped by restrictions in research and development practices along the way, the proponents of DDL paradoxically perpetuated new middlemen, making theirs an exclusive rather than a popular sport with language learners and teachers (Tribble, 2012). And with each new wave of research and development in applied corpus linguistics, new and puzzling restrictions confronted the language teaching and learning community.
The middleman in DDL has presented himself as a sophisticated corpus authority in the form of research and development outputs, including text analysis software designed by, and for, the expert corpus user, with complex options for search refinement that befuddle the non-expert corpus user, namely language teachers and learners. Replicating these research methods to obtain the same or similar results for language teaching and learning has often depended on securing access to the exact same software and the know-how for manipulating and querying linguistic data successfully.
Which language are you speaking?
He has been known to speak in programming languages, with his interfaces often requiring specialist trainers to communicate even his simplest functions. Even his most widely known interface for presenting linguistic data, KWIC (Key Word In Context), with search terms embedded in truncated snippets of language context, remains foreign-looking to the mostly uninitiated in language teaching and learning. In many cases he has not come cheap either: costly subscriptions to, and upgrades of, his proprietary software have been the norm, especially in the earlier days.
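For readers who have not met one, a KWIC display is less mysterious than it looks. Here is a minimal Python sketch, assuming a plain-text string stands in for the corpus; the function name `kwic` and the sample sentences are invented for illustration and are not taken from any of the tools discussed in this post.

```python
import re


def kwic(text, keyword, width=30):
    """Return one line per hit, with `width` characters of context on each side."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Pad both sides so every keyword lines up in the same column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


# Invented sample text, used here only to show the layout.
sample = ("If I had known about the corpus, I would have used it. "
          "If she had asked, we would have helped her.")

for line in kwic(sample, "would", width=20):
    print(line)
```

The truncated context on either side is what makes the display look odd at first, but lining the keyword up in one column is what lets a learner scan down the page and spot the patterns around it.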
In particular, with reference to English Language Teaching (ELT), he has criticized many widely used ELT course book publications and their language offerings for ignoring his research findings based on evidence for how the English language is actually used across different contexts of use. In response, a few ELT course book publishers have clamored around him to help him get his words out for a price but in so doing have rendered his corpus analyses invisible, in turn creating even more of a dependency on course books rather than stimulating autonomy among language teachers and learners in the use of corpora and text analysis tools for DDL. And, because publishers were primarily confining him to the course book and sometimes CD-ROM format there were only so many language examples from the target corpora that could possibly fit between the covers of a book and only the most frequent language items made it onto the compact disc.
The Oxford Collocations Dictionary for Students of English (2nd edition, Oxford University Press, 2009), based on the British National Corpus (BNC), is one example: high-frequency collocations for very basic words like any and new predominate, and licensing restrictions permit only one computer installation per CD-ROM. The use of closed corpora in leading corpus-derived ELT books compounds the openness issue. The Cambridge University Press (CUP) publication From Corpus to Classroom (O’Keeffe, McCarthy & Carter, 2007) might have been more aptly entitled From Corpus to Book: it draws heavily on the closed Cambridge and Nottingham Corpus of Discourse in English (CANCODE) from Cambridge University Press and Nottingham University, and it recommends the proprietary concordancing programs WordSmith Tools and MonoConc Pro, which puts replication of its analyses out of reach for its readers.
Mainstream language teacher training bodies continue to sidestep the DDL middleman in the development of their core training curricula (for example, the Cambridge ESOL exams) due to the problems he poses with accessibility in terms of cost and complexity. Instead, English language teacher training remains steadily focused on how to select and exploit corpus-derived dictionaries, training learners to identify, for example, definitions, derivatives, parts of speech, frequency, collocations and sample sentences. In the same way that corpus-derived course books do not render corpus analyses transparent to their users, training in dictionary use does not bring teachers and their learners any closer to the corpora the dictionaries are derived from.
Cambridge English Corpus
Michael McCarthy presented ‘Corpora and the advanced level: problems and prospects’ at IATEFL Liverpool 2013. One of the key take-away messages from his talk was that learners of more advanced English receive little return on investment once the highest-frequency items of English vocabulary have been acquired (he referred to the top 2000 words from the first wordlist of the British National Corpus, which make up about 80% of standard English use). With each subsequent wordlist of 2000 words, the share of usage covered drops considerably, so the time and money you might end up spending on yet more English language classes may not be affordable or feasible. This has particular implications for learning English for Specific Purposes (ESP), including English for Academic Purposes (EAP), which many would argue is always concerned with developing specific academic English language knowledge and usage within specific academic discourse communities.
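The diminishing return McCarthy described can be checked on any text you have to hand. The Python sketch below is a rough illustration only: the function name `coverage_by_band` is hypothetical, the tokenizer is deliberately crude, and the one-line sample is invented, so it will not reproduce the BNC figures quoted in the talk.

```python
import re
from collections import Counter


def coverage_by_band(text, band_size):
    """Share of all running words covered by each successive frequency band."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Word counts ranked from most to least frequent.
    ranked_counts = [count for _, count in Counter(tokens).most_common()]
    total = len(tokens)
    bands = []
    for start in range(0, len(ranked_counts), band_size):
        bands.append(sum(ranked_counts[start:start + band_size]) / total)
    return bands


# Invented sample; a real check would read in a large corpus file
# and use a band size of 2000 to mirror the wordlists mentioned above.
sample = "the cat sat on the mat and the dog sat on the log"

for rank, share in enumerate(coverage_by_band(sample, band_size=2), start=1):
    print(f"band {rank}: {share:.0%} of running words")
```

Because the words are ranked by frequency before they are cut into bands, each band can never cover more of the text than the one before it, which is the pattern behind the drop in return after the first 2000 words.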
When I caught Michael McCarthy on the way out of the presentation theatre, he kindly agreed to walk and talk while rushing to catch his train out of Liverpool. Would the Cambridge English Corpus be made available anytime soon for non-commercial educational research and materials development purposes, I asked? I hastened to mention the possibilities and the real-world need for promoting corpus-based resources and practices in open and distance online education as well as in traditional classroom-based language education. He agreed that the technology had become a lot better for finally realising DDL within mainstream language teaching and learning and within materials development. Taking concordance line printouts into ELT classrooms had never really taken off in his estimation, and I would have to agree with him on that point. He indicated, however, that the corpus was unlikely to become openly available in the foreseeable future, because of the large amount of private investment in its development, with access restricted to the stakeholders participating in the project.
But what would the real risk be in opening up this corpus to further educational research and development for non-commercial purposes, with derivative resources made freely available online? Wouldn’t this give the corpus added sustainability, with new lives and further opportunities for exploitation that could advance our shared understanding of how English works across different contexts, using current and high-quality examples of language in context? More importantly, wouldn’t this give more software developers the chance to build more interfaces using the latest technology, and give more ELT materials developers, including language teachers, the chance to show different derivative resource possibilities for effectively using the corpus in language teaching and learning?
A stipulation restricting use to non-commercial educational purposes could apply in all of the above resource development scenarios. Indeed, the resulting resources could all be linked back to the Cambridge English Corpus project website as evidence of the wider social and educational impact of the initial investment. This is what will be happening with most of the publicly funded research projects in the UK following recommendations from the Finch report, which come into effect in April 2014. It follows that Open Educational Resources (OER) and Open Educational Practices (OEP) will allow for expertise to be readily available when Open Access research publishing is compulsory for all RCUK and EPSRC funding grants for the development of research-driven open teaching and learning derivatives. Privately funded research projects like this one from CUP could also be leading in this area of open access.
Corpora such as the British National Corpus (BNC), the British Academic Written English (BAWE) corpus, Wikipedia, and Google linguistic data are some of the many valuable resources that have been developed into language learning and teaching resources openly available on the web. In the following sections, I will refer to applied corpus linguistics research and development outputs from leading researchers who have been making their wares freely available, if not openly re-purposeable to other developers as in the example of the FLAX language project’s Open Source Software (OSS). Hopefully, these corpus-based resources are also getting easier for the non-expert corpus user to access.
“For the time being” CUP are providing free access to the English Vocabulary Profile website of resources based on the Cambridge English Corpus (formerly known as the Cambridge International Corpus), “the British National Corpus and the Cambridge Learner Corpus, together with other sources, including the Cambridge ESOL vocabulary lists and classroom materials.” Below is a training video resource from CUP available on YouTube, which highlights some of the uses for these freely available resources in language learning, teaching and materials development. This is a very useful step for CUP to be taking with making corpus-based resources and practices more accessible to the mainstream ELT community.
Open practices in applied corpus linguistics
Enter those applied corpus linguistics researchers and developers who have made some if not all of their text analysis tools and Part-Of-Speech-tagged corpora freely accessible via the Web to anyone who is interested in exploring how to use them in their research, teaching or independent language learning. Well-known web-based projects include Tom Cobb’s resource-rich Lextutor site, Mark Davies’ BYU-BNC (Brigham Young University – British National Corpus) concordancer interface and the Corpus of Contemporary American English (COCA) with WordandPhrase (with WordandPhrase training video resources on YouTube) for general English and English for Academic Purposes (EAP), Laurence Anthony’s AntConc concordancing freeware for Do-It-Yourself (DIY) corpus building (with AntConc training video resources on YouTube), and the Sketch Engine by Lexical Computing, which offers some open resources for DDL. The Lextutor and AntConc project developers issue open invitations for input on the design, development and evaluation of existing and proposed project tools and resources by way of social networking sites: the Lextutor Facebook group and the AntConc Google Groups discussion list. Responses usually come from a steady number of DDL ‘geeks’, however, namely those who have reached a level of competence and confidence in discussing the tools and resources therein. And most of those actively participating in these social networking sites are also engaging in corpus-based research.
Data-Driven Learning for the masses?
My own presentation at IATEFL Liverpool was based on my most recent project with the University of Oxford IT Services for providing and promoting OSS interfaces from the FLAX language project for increasing access to the BNC and BAWE corpora, both managed by Oxford. In addition to this, the same OSS developed by FLAX has been simplified with the development of easy-to-use interfaces for enabling language teachers to build their own open language collections for the web. Such collections using OER from Oxford lecture podcasts, which have been licensed as creative commons content, have also been demonstrated by the TOETOE International project (Fitzgerald, 2013).
The following two videos from the FLAX language collections show their OSS for using corpus-based resources in ELT that are accessible both in terms of simplicity and in terms of openness. The first training video demonstrates the Web as corpus and how this resource has been effectively mined and linked to the BNC for enhancement of both corpora for uses in DDL. The second training video demonstrates how to build your own Do-It-Yourself corpora using the FLAX OSS and Oxford OER. With open corpus-based resources the reality of DIY corpora is becoming increasingly possible in DDL research and teaching and learning practice (Charles, 2012; Fitzgerald, in press).
So, go ahead, and cut out the middleman in data-driven learning.
FLAX Web Collections (derived from Google linguistic data):
The Web Phrases and Web Collocations collections in FLAX are based on another extensive corpus of English derived from Google linguistic data. In particular, the Web Phrases collection allows you to identify problematic phrasing in writing by fine-tuning words that precede and follow phrases that you would like to use in your writing by drawing on this large database of English from Google. This allows you to substitute any awkward phrasing with naturally occurring phrases from the collection to improve the structure and the fluency of writing.
FLAX Do-It-Yourself Podcast Corpora – Part One:
Learn how to build powerful open language collections through this training video demonstration, which features audio and video podcast corpora built with the FLAX language tools and open educational resources (OER) from the OpenSpires project at the University of Oxford and from TED Talks.
References
Anthony, L. (n.d.). Laurence Anthony’s Website: AntConc. Retrieved from http://www.antlab.sci.waseda.ac.jp/software.html
Charles, M. (2012). ‘Proper vocabulary and juicy collocations’: EAP students evaluate do-it-yourself corpus-building. English for Specific Purposes, 31: 93-102.
Cobb, T. (n.d.). Compleat Lexical Tutor. Retrieved from http://www.lextutor.ca/
Davies, M. (1991-present). The Corpus of Contemporary American English (COCA). Retrieved from http://corpus.byu.edu/coca/
Davies, M. & Gardner, D. (n.d.). WordandPhrase. Retrieved from http://www.wordandphrase.info
Fitzgerald, A. (2013). TOETOE International: FLAX Weaving with Oxford Open Educational Resources. Open Educational Resources International Case Study. Commissioned by the Higher Education Academy (HEA), United Kingdom. Retrieved from http://www.heacademy.ac.uk/projects/detail/oer/OER_int_006_Ox%282%29
Fitzgerald, A. (In Press). Openness in English for Academic Purposes. Open Educational Resources Case Study based at Durham University: Pedagogical development from OER practice. Commissioned by the Higher Education Academy (HEA) and the Joint Information Systems Committee (JISC), United Kingdom.
FLAX. (n.d.). The “Flexible Language Acquisition Project”. Retrieved from http://flax.nzdl.org/
Hyland, K. (2006). English for Academic Purposes: An Advanced Handbook. London: Routledge.
Johns, T. (1991). From printout to handout: grammar and vocabulary teaching in the context of data-driven learning. In: T. Johns & P. King (Eds.), Classroom Concordancing. English Language Research Journal, 4: 27-45.
Johns, T. (2002). Data-driven learning: the perpetual challenge. In: B. Kettemann & G. Marko (Eds.), Teaching and Learning by Doing Corpus Analysis. Amsterdam: Rodopi. 107-117.
McEnery, T. & A. Wilson. (1997). Teaching and language corpora. ReCALL, 9 (1): 5-14.
O’Keeffe, A., McCarthy, M., & Carter R. (2007). From Corpus to Classroom: language use and language teaching. Cambridge: Cambridge University Press.
Oxford Collocations Dictionary for Students of English (2nd Edition). (2009). Oxford: Oxford University Press.
Tribble, C. (2012). Teaching and Language Corpora Survey. Retrieved from http://www.surveyconsole.com/console/TakeSurvey?id=742964
























