Using Corpus Analysis Software to Analyse Specialised Texts
1. What is a corpus?
In corpus linguistics, a corpus can be generally defined as ‘a collection of naturally-occurring texts in a computer-readable format which can be retrieved and analyzed using corpus analysis software’ (Kennedy, 1998; McEnery & Wilson, 2001; O’Keeffe, McCarthy, & Carter, 2007; Teubert & Cermakova, 2007).
2. Sources of language corpora
- http://www.natcorp.ox.ac.uk/
- http://corpus.leeds.ac.uk/protected/query.html
- http://corpus.byu.edu/
- http://lextutor.ca/conc/eng/
- ‘AntConc’ (http://www.antlab.sci.waseda.ac.jp/software.html)
- ‘WordSmith Tools’ (http://www.lexically.net/wordsmith/)
- ‘Paraconc’ (http://www.athel.com/para.html)
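Concordancers such as those listed above typically show search results as keyword-in-context (KWIC) lines. The following is only a minimal sketch of that idea, not how any of these tools is implemented; the function name `kwic`, the `width` parameter, and the sample sentence are assumptions made for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of `keyword`.

    Hypothetical helper: `width` is the number of characters of context
    shown on each side of the keyword.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines

# Invented sample text, used only to demonstrate the output format.
sample = ("A corpus is a collection of texts. A specialized corpus "
          "focuses on one subject field.")

for line in kwic(sample, "corpus"):
    print(line)
```

Aligning the keyword in a central column makes recurring patterns to its left and right easier to spot when reading down the lines.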
3. Designing a specialized corpus
Corpus size
- There are no fixed rules; size depends on research purposes, availability of data, and time.
- Large, general corpora may be less useful than small, focused corpora if searches are made on context-specific terms.
- Corpora that are ‘too small’ have limitations, e.g. they may not contain enough instances of the concepts, terms, or patterns under investigation.
- It is preferable to create a ‘monitor’ or ‘open’ corpus, one that can be added to over time, because specialized words and usage are dynamic.
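One rough way to check whether a corpus is ‘too small’ for a given purpose is to count how often the terms under investigation actually occur. The sketch below assumes single-word terms and a simple regular-expression tokenizer; the names `size_report` and `min_hits`, the threshold value, and the sample sentences are all hypothetical, and what counts as ‘enough’ hits depends on the research question.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())

def size_report(texts, target_terms, min_hits=5):
    """Summarize corpus size and flag target terms with fewer than `min_hits` hits."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return {
        "tokens": sum(counts.values()),   # running words
        "types": len(counts),             # distinct word forms
        "under_represented": [
            term for term in target_terms if counts[term.lower()] < min_hits
        ],
    }

# Invented two-text corpus, for illustration only.
texts = [
    "The enzyme binds the substrate.",
    "Each enzyme has an active site.",
]
print(size_report(texts, ["enzyme", "substrate"], min_hits=2))
```

Because a monitor corpus keeps growing, a report like this can simply be rerun whenever new texts are added.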
Text extracts vs. full texts
- Depends on the aim of corpus compilation.
- Whole texts offer more coverage because the words or terms of interest may be distributed unpredictably throughout a text.
- Specific sections may be helpful if we are looking for words or phrases in particular content areas or want to create purposeful sub-corpora.
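The point about distribution can be checked on a full text by counting where a term falls. This is a minimal sketch under stated assumptions: the text is cut into equal-sized segments by token position, and the function name `dispersion`, the `n_segments` parameter, and the sample string are invented for illustration.

```python
import re

def dispersion(text, term, n_segments=4):
    """Count occurrences of a single-word `term` in each equal-sized segment of `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Ceiling division, so that every token falls into one of the segments.
    seg_len = max(1, -(-len(tokens) // n_segments))
    counts = [0] * n_segments
    for position, token in enumerate(tokens):
        if token == term.lower():
            counts[min(position // seg_len, n_segments - 1)] += 1
    return counts

# Invented sample: the term appears only near the start and the end.
sample = "alpha term beta gamma delta epsilon zeta term"
print(dispersion(sample, "term"))  # [1, 0, 0, 1]
```

An extract taken only from the middle of this sample would contain no instance of the term, which is the risk of sampling sections rather than whole texts.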
Number of texts
- A choice can be made between collecting a few large texts and collecting a larger number of smaller texts.
- Choices can also be made between selecting texts written by one or two key writers or sources, or texts retrieved from different sources or written by different authors.
- Depends on your research focus, e.g. studying overall language use versus studying the idiosyncrasies or linguistic choices of particular writers.
Medium
- Can be spoken or written texts or mixed.
- Depends on research questions.
- Some practical factors should also be considered, e.g. compiling spoken corpora can be time-consuming and requires special types of tagging.
Subject and text type
- Should mainly focus on the specialized text under investigation, although this is less clear-cut in multidisciplinary subjects.
- Texts may come from different subjects if the research focus is on particular language features rather than term extraction.
- Text types within a specialized subject field may vary from ‘expert-to-expert’ texts to ‘expert-to-non-expert’ texts, or in other words, from technical to popular texts.
Other considerations
- Authorship: Texts written by experts in a field tend to present more reliable and authentic examples of specialized language.
- Language: Specialized texts can be stored and retrieved in the form of monolingual, comparable, or parallel corpora.
- Publication date: Texts should come from recent publications unless queries are made in relation to particular periods of time.
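The design criteria above can be recorded as metadata for each text so that sub-corpora can be selected later. The sketch below is one possible layout, not a standard format; the class `CorpusText`, its field names, the `select` function, and the sample records are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CorpusText:
    """Metadata for one text in the corpus (all field names are assumptions)."""
    title: str
    author_is_expert: bool
    language: str    # e.g. "en"
    year: int        # publication year
    medium: str      # "written" or "spoken"
    text_type: str   # e.g. "expert-to-expert" or "expert-to-non-expert"

def select(texts, language="en", min_year=None, experts_only=False):
    """Return the texts that match the given design criteria."""
    result = []
    for text in texts:
        if text.language != language:
            continue
        if min_year is not None and text.year < min_year:
            continue
        if experts_only and not text.author_is_expert:
            continue
        result.append(text)
    return result

# Invented records, for illustration only.
records = [
    CorpusText("Journal article", True, "en", 2010, "written", "expert-to-expert"),
    CorpusText("Magazine feature", False, "en", 2011, "written", "expert-to-non-expert"),
    CorpusText("Older textbook", True, "en", 1985, "written", "expert-to-non-expert"),
]

recent_expert = select(records, min_year=2000, experts_only=True)
print([text.title for text in recent_expert])  # ['Journal article']
```

Keeping the criteria as explicit fields makes it easy to relax one of them, for example dropping `min_year` when the queries concern a particular earlier period.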