Chapter 4      Lexis: From Collocation to Colligation


4.1 Introduction


This thesis is primarily concerned with two main interrelated concepts: business and lexis. The last chapter surveyed the literature concerning business language research and also noted the limitation of lexical studies into business. Unlike business language, however, research into lexis has both a long tradition and a broad and comprehensive literature. As lexis is central to this research, this chapter will review work that has been carried out on it in relation to the three main aspects of lexis that form the basis for the analysis of Business English in this thesis: collocation, semantic prosody and colligation. A further element important to this work - prefabricated language - will also be reviewed here, leading to the final section of the chapter where the pedagogical implications of studies into lexis will be discussed in relation to the lexical approach (Lewis 1993, 1997).   The chapter is formed from three main parts.


·    The first part of the chapter briefly covers the history of lexical research and thought, reviewing the major movements of nineteenth and twentieth century vocabulary research. 

·    The second part of the chapter looks in detail at the notion of collocation, from its first definitions to more recent definitions and usage. It will be seen that whilst there is some general consensus on what collocation is, the concept has been defined and used differently by its researchers. Aspects of collocation important for the lexical analysis of Business English will be discussed and Sinclair’s (1987, 1991) notion of the idiom principle will be looked at in detail.

·    Discussion on collocation leads logically to a closer look at semantic prosody which was briefly defined in Chapter 2. This shows how words not only typically collocate with certain other words, but also typically collocate with semantic sets.

·    After semantic prosody, colligation will be discussed. Collocation and semantic prosody are both concerned with the typical lexical patterning of words. Colligation is concerned with the typical grammatical patterning of words (or word classes). It will be seen that collocation, semantic prosody and colligation are not totally separate concepts, but are, rather, interdependent and together create a network of meaning.

·    The section ends with a working definition of collocation, semantic prosody and colligation for the purpose of this research. It further defines how all these concepts are used in the analysis of Business English in this thesis.

·    The final, third section, reviews work on prefabricated language - notably multi-word  items (MWIs). MWIs - longer chunks or clusters of language - form part of the analysis of Business English in this research, thus necessitating a review of the key concepts related to them. Research into MWIs is both expansive and, due to the inability of researchers to agree on uniform terminology, complex and confusing. A brief chronological overview of the development of thought on MWIs will be presented, and the central tenets on which research has rested will be discussed. The pedagogical implications of MWI research will then be discussed by relating it to the lexical approach to language teaching as put forward by Lewis (1993, 1997). This section will end with a definition of how MWIs have been studied in this thesis with regard to Business English.


4.2 Vocabulary and pedagogy: a brief history


When looking back over the history of vocabulary research it soon becomes obvious that many of the ‘new’ ideas of today were indeed thought of long ago. Defoe’s stress of the importance of business language mentioned in Chapter 2 is matched by the insights of writers from both the nineteenth and early twentieth centuries. Howatt (1984) describes how, as early as the mid-1800s, Thomas Prendergast (1806-1886), a British civil servant who spent time working in India, noted that children learn not just words, but ‘chunks’ of language and utilise these fluently in their speech. In another example, Howatt notes how the linguistic descriptions of Harold Palmer (1887-1949) sound distinctly Chomskyan fifty years before Chomsky. Palmer presented the idea of known units - a database of language that learners need to acquire (similar to Chomsky’s notion of competence),  and from which can be generated an infinite set of sentences - secondary matter (similar to Chomsky’s notion of performance) (Howatt 1984:237). A further idea originating in the nineteenth century, through the work of Pitman, was to shape thinking on the nature and learning of lexis and its pedagogical application, lasting until the mid-twentieth century. This was the notion of vocabulary control.


The basic idea behind this movement was one of lexical choice - the most important words of a language should be given to students to learn first and these words should be limited. Another linguist of note - Henry Sweet (1845-1912) - declared that vocabulary should be controlled and that ‘3,000 common words would probably suffice for all except specialist purposes’ (Howatt 1984:187). Interest in vocabulary control grew after Sweet, continuing through the first half of the twentieth century, notably through the work of Thorndike in the USA and British linguists Harold Palmer, C.K. Ogden, I.A. Richards, and Michael West. Thorndike (1921) used a corpus of 4.5 million words to create a word count that was ‘designed to lead to better pedagogical materials for teaching native speakers of English in the United States to read’ (Kennedy 1992:336). This word count ‘helped provide a foundation for the vocabulary control movement’ (Kennedy 1992:336).


Palmer, Ogden and West were all working at around the same time (1920s-1950s) in an area that was basically the same - vocabulary control - but a severe enmity grew up, especially between Ogden and West. The reasons were firstly academic, as both were espousing different systems of the limitation of vocabulary to learners, and secondly commercial - whoever got the upper hand would, it was assumed, reap the financial rewards that success and acceptance brought with it.


Ogden, and later Richards, worked on what was termed Basic English (the term ‘Basic’ here standing for British American Scientific International Commercial). Basic English consisted of 850 words that, although not purporting to be full English, attempted to be not un-English. Carter & McCarthy (1988) note several problems with Basic English: despite the fact that there were only 850 words, there were potentially over 12,000 meanings attached to them not covered by Ogden - thus polysemy was not taken into account. Many normally-used verbs were missing e.g. smoke and walk, and more damagingly, many everyday phrases such as goodbye and thank you were not included in the list.


A controlled vocabulary was also advocated by Harold Palmer, mentioned briefly above, in the 1920s and 1930s. He proposed a list of 3,000 words which would consist of a ‘dartboard’ approach with a minimum vocabulary of 1,000 words and an outer ring of another two thousand. Palmer worked with A.S. Hornby - working together on what later became Thousand Word English (Palmer & Hornby 1937) - and was, more famously, to work with Michael West in work that led to the General Service List  in 1953.


Michael West[1] (1888-1973) worked with Bengali children in India and attempted to reduce what he saw as the gross wastage of the then prevalent educational system of ‘filtering’ out only the top students (Howatt 1984:245). He concentrated on reading skills, and found that by substituting the more archaic words in the children’s readers, and lowering the amount of new words encountered by readers in a text, he was able to increase their learning dramatically. Both Palmer and West collaborated in the Interim Report on Vocabulary Selection (1935) or the Carnegie Report as it became known, following meetings of top linguists, including Thorndike, in New York in 1934 and London in 1935.[2] Subsequently, the first General Service List (GSL) was published in 1936. The report from these two conferences


... clearly outlined the principle that items with a likely high frequency of occurrence in texts should be learned first to avoid memory overload and confusion and to lighten the learning burden.  (Kennedy 1992:337)


The final General Service List of 1953 stated that the main criterion for selection of items for learning should be that of the frequency of each word in written English and also that ‘information should be provided about the relative prominences of the various meanings and uses of a word form’ (Carter & McCarthy 1988:7). Likewise, Jeffery, in the Foreword to the 1953 version of the GSL, stated that the main aim was to


... find the minimum number of words that could operate together in constructions capable of entering into the greatest variety of contexts.

                                                                                                Jeffery (1953/67:v)


The GSL was created by a mix of intuition, experience and hard data - the fact that Thorndike took part in the meetings gave access to corpora of empirical value. However, like Basic English, the GSL was not without problems: there was no collocational information at all, the concept of coverage was not fully developed, and the mere fact that a word is frequent does not necessarily make it useful for learners to know.


Despite these criticisms,[3] the influence of the GSL has continued up to the present day and Howatt (1984:258) mentions the Council of Europe’s Threshold level of 1975 as being influenced by it.  Carter & McCarthy (1988:9) term the GSL as ‘one of the most innovative examples of foreign language pedagogy and lexiometric research this century’. The publication of both graded readers for students and also many dictionaries has been brought about largely by this pioneering work.


It should also be noted at this point that Michael West was also aware of ‘specialist’ lexis. Together with W.E. Flood, he produced a supplementary section to the GSL containing ‘scientific and technical vocabulary’ (West 1953). This had originated in the 1936 version of the GSL and a fuller version is found in the 1952 collaborative work An Elementary Scientific and Technical Dictionary (Flood & West 1952). Flood was later to go further into the examination of scientific vocabulary in his 1957 work The Problem of Vocabulary in the Popularisation of Science. Discussion in Chapter 9 of this thesis refers back to the vocabulary control movement when presenting a potential core lexis of Business English.

4.2.1 The 1950s to the present day


After the flourishing of vocabulary research and application in the first half of the twentieth century, there followed a period of ‘limbo’: ‘In summary, it can be said that the period 1945-1970 was a limbo for vocabulary as an aspect of language teaching in its own right’ (Carter & McCarthy 1988:41). Carter & McCarthy note that vocabulary was relegated to a lowly place in the order of things, mainly due to the influence of American structural linguistics ‘with its emphasis on phonology and syntactic patterning’ (1988:40). The emphasis on transformational grammar brought about by Chomsky did little to remedy this situation.  However, important developments concerning the study of text-linguistics had taken place in the 1950s and 1960s that were to profoundly affect later work and the ‘re-birth’ of vocabulary. These developments join together the two key themes under discussion in this thesis - corpus linguistics and lexis.


The first factor in the re-emergence of lexis was the influence of British linguist J.R. Firth. His interest in collocation, which he defined in articles in the 1950s, engendered two key articles in 1966, one by John Sinclair, the other by M.A.K. Halliday.[4] These articles showed remarkable foresight in espousing the use of computer corpora and stressing the importance of collocation in the study of lexis. In this early work by Sinclair the origins of later lexical work including the COBUILD project can be seen. The second factor was the register analysis movement, which began in the 1960s and brought vocabulary into focus, as was discussed in the previous chapter (Barber 1962, Herbert 1965, Cowan 1974, Martin 1976, Friel 1978). The third factor accounting for the revived interest in lexis coincided with a renewed interest in corpora. The ability of the computer to give access to large amounts of data facilitated corpus-based studies in lexis never before thought possible.


The next section will, therefore, go on to look at more recent developments in the study of lexis that would have been largely impossible without the use of both corpora and computers:  collocation, semantic prosody and colligation.


4.3 Collocation


It would be desirable, when beginning a section on such an important concept to this thesis, to begin with a clear and unambiguous definition of collocation. Even a brief glance at the vast literature on the subject, however, reveals that forming a precise definition is difficult. There are both conflicting definitions and conflicting terminologies: ‘Regrettably, collocation is a term which is used and understood in many different ways’ (Bahns 1993:57). Yet, despite these variations, a general and workable definition of collocation can be reached and in doing this, certain key factors regarding collocation that are central to later analysis will be considered:


·    a preliminary definition of collocation

·    the development of the concept of collocation including grammatical and lexical descriptions of collocation

·    key elements of collocation:

       - the notion of upward and downward collocation

       - the strength of the collocation - from strong to weak or from fixed to free-form

       - the notion of collocational span

       - collocation as an embodiment of the idiom principle (Sinclair 1987, 1991)

       - collocations, the idiom principle and Business English


4.3.1 A preliminary definition of  collocation


The following quotations offer a varied view on the concept of collocation, variously defining it as a lexical, grammatical or research phenomenon, but all containing a focus on the co-occurrence of words:

You shall know a word by the company it keeps. (Firth 1957:179)


We may use the term node to refer to an item whose collocations we are studying, and we may define a span as the number of lexical items on each side of a node that we consider relevant to that node. Items in the environment set by the span we will call collocates.     (Sinclair 1966:415)


... the study of lexical patterns ...                                   (Brown 1974:1)


... a sequence of words that occurs more than once in identical form .... and which is grammatically well structured.     (Kjellmer 1987:133)


... the meaning of a word has a great deal to do with the words with which it commonly associates.     (Nattinger 1988:68)


... a recurrent co-occurrence of words.                         (Clear 1993:277)


... the way individual words co-occur with others.

                                                                                       (Lewis 1993:93)


Collocates are the words which occur in the neighbourhood of your search word.                                     (Scott 1999 WordSmith Help File)


... the way in which words occur together in predictable ways.

                                                                             (Lewis & Hill 1998:1)



Thus, whilst it can be seen that the definitions of collocation are somewhat varied, there is still a common core of agreement to be found. An initial working definition for this thesis thus could be that collocations refer to words that keep company with one another. This definition of course will be refined as the chapter continues, and a final definition of collocation for the purpose of this thesis will be presented at the end of this part of the chapter.
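Sinclair’s (1966) terms node, span and collocate, quoted above, lend themselves to a simple computational illustration. The following is a minimal sketch only, not Sinclair’s own procedure: the function name, the sample sentence and the span of two are invented for illustration, and every token (rather than every lexical item, as in Sinclair’s definition) is counted towards the span.

```python
from collections import Counter

def collocates(tokens, node, span=4):
    """Count the words found within `span` tokens on each side of every occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - span):i]      # up to `span` tokens before the node
        right = tokens[i + 1:i + 1 + span]     # up to `span` tokens after the node
        counts.update(left + right)
    return counts

# Invented sample sentence, for illustration only.
text = "you pay the money at the desk and they pay you the money back later"
tokens = text.lower().split()

table = collocates(tokens, "money", span=2)
for word, freq in table.most_common():         # collocates in descending order of frequency
    print(word, freq)
```

Listing the collocates in order of frequency in this way gives a rough idea of the kind of table that corpus software produces for a chosen search word.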


Collocation is a complex subject and, before more detailed discussion can take place, two important characteristics of collocation need to be presented. The first characteristic is that collocation operates on the syntagmatic rather than on the paradigmatic plane. The second is that collocation is not necessarily reciprocal.

Syntagmatic/paradigmatic relations


Collocation refers to lexical relations on the syntagmatic or horizontal plane, as opposed to relations on a paradigmatic, or vertical plane. This is shown in Fig. 19 below, taken from Walker (1996). Walker explains that ‘On the syntagmatic dimension we can see the relationship between words’. Therefore, looking at Fig. 19 we can see a syntagmatic relationship between writhed, ground, excruciating and pain. Conversely, ‘The paradigmatic dimension looks at the way in which one word can be replaced with another’ (Walker 1996). This is shown in the diagram, where four separate paradigmatic choices are presented, e.g. auntie could be replaced by uncle, cousin, mother or milkman. 



            Syntagmatic: Horizontal relationships


                        It writhed on the ground in excruciating pain.


                                    Syntagmatic sequence


            Paradigmatic (Substitution): Vertical dimension


            My      auntie has       bought             a          red                   automobile


                        uncle                sold                             green               car

                        cousin              purchased                   black               Ford

                        mother             hired                                                    bike


                        Paradigm 1      Paradigm 2                  Paradigm 3      Paradigm 4


Fig. 19 Syntagmatic/paradigmatic relationships adapted from Walker (1996). In the example four paradigmatic choices of lexis are exemplified.

Reciprocal/non-reciprocal collocation



The second central characteristic of collocations is that they are often non-reciprocal - the strength of collocation between words is not equal on both sides. As an example, the words blonde and hair can be seen to be in different relationships. Blonde will only collocate with a very limited number of words - hair (or words that in this instance in some way relate back to hair, e.g. girl, woman), but hair will collocate with many words, e.g. brown, long, short and mousy. Thus, the strength of the bond between words is not equal. Other examples show that the bond between words can be unilateral, for example, in the phrase vested interest. Vested only ever collocates with interest, but interest collocates with many other words.[5]
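The asymmetry described above can be expressed numerically by comparing, for each word of a pair, the proportion of its occurrences that fall next to the other word. The sketch below is one possible way of doing this and is an assumption of this illustration rather than a method taken from the literature cited; the counts are invented and are not corpus data.

```python
def directional_strength(pair_count, count_a, count_b):
    """Return (share of a's occurrences that are with b, share of b's occurrences that are with a)."""
    return pair_count / count_a, pair_count / count_b

# Invented toy counts, for illustration only (not corpus data).
blonde_total = 50     # occurrences of "blonde"
hair_total = 1000     # occurrences of "hair"
blonde_hair = 40      # occurrences of the pair "blonde hair"

blonde_to_hair, hair_to_blonde = directional_strength(blonde_hair, blonde_total, hair_total)
print(f"share of 'blonde' occurring with 'hair': {blonde_to_hair:.2f}")
print(f"share of 'hair' occurring with 'blonde': {hair_to_blonde:.2f}")
```

With these invented figures most occurrences of blonde are accounted for by hair, whereas only a small share of the occurrences of hair involve blonde, which is the sense in which the bond is unequal.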


Now that these points have been raised, it is time to look at collocation in more detail, and the next section reviews how the notion of collocation has developed since its first major explication by Firth (1951/7).


4.3.2 Development of the concept of collocation


The term ‘collocation’ has been used since the 18th century[6] (Carter & McCarthy 1988:32), but as a fully formed concept it is firmly grounded only in the 20th century. Harold Palmer, mentioned earlier in this chapter, was perhaps the first to pay attention to collocation and was keen to include in his teaching materials word partnerships such as tomorrow morning that he thought should be taught as one linguistic item (Howatt 1984:238). Kennedy (1992) reveals how Palmer made a list of over 6,000 collocations, believing them to advance the then current definitions of vocabulary (Kennedy 1992:336-337). Palmer also understood the importance of longer phrases and collocations, terming them polylogs.


However, the father of collocation is widely regarded to be J.R. Firth, and Firth is central to the lexical composition approach - the first of three schools of thought on collocation discussed by Gitsaki (1996). The two later approaches to collocation she termed the semantic approach and the structural approach. Each approach will now be discussed in turn.


1. The lexical composition approach: Methodologically, this approach ‘is based on the assumption that words receive their meaning from the words they co-occur with’ (Gitsaki 1996:10). It thus sees lexis as independent of grammar. The Neo-Firthians, as they were called (represented by Halliday and Sinclair), also kept grammar and lexis separate, though they did not try to devalue grammar in any way. In his often quoted paper Modes of Meaning in 1951, Firth provided a more detailed explanation:


Meaning by collocation is an abstraction at the syntagmatic level and is not directly concerned with the conceptual or idea approach to the meaning of words. One of the meanings of night is its collocability with dark, and of  dark, of course, collocation with night.            

                                                                              (Firth 1951/1957:196)         


Thus part of the meaning of a word is the fact that it collocates with another word. The other words with which it collocates, however, are often strictly limited. Firth gave the example of the word ass, saying that ‘There are only limited possibilities with preceding adjectives, amongst which the commonest are you silly, obstinate, stupid’ (Firth 1951/1957:195).[7] The revolutionary part of Firth’s thinking was to look at lexical relationships at a syntagmatic rather than paradigmatic level, whereas previous grammars had considered only structural relations at the paradigmatic level (Gitsaki 1996:1). 


Firth’s ideas were taken up by Halliday and Sinclair (1966)[8] in articles that have since been regarded as landmarks. Halliday (1966) reiterated Firth’s idea that part of the meaning of a word is the fact that it collocates with others: ‘it is part of the meaning of ‘past’ that it contrasts with ‘present’, and it is part of the meaning of strong that it collocates with tea’ (Halliday 1966:160). He also noted that collocation cuts across grammatical boundaries, giving the example of he argued strongly / the strength of his argument, where the collocation between strong and argument survives the grammatical change in the sentence (Halliday 1966:150-151).


Collocation was also covered in depth by Sinclair in his article in the same volume. Although over thirty years old at time of writing, this article still seems very modern in its approach. Sinclair defined such terms as node and span as they are used today and analysed the collocates of the words money, pay and ticket, producing a very modern-looking frequency list of collocates. He termed the list a Total Environment Table which showed all the collocates in the order of their frequency. Sinclair’s and Halliday’s articles also stressed the need for computer-based corpus linguistics and, in doing so, were way ahead of their time (their articles will be returned to later in this chapter). The next approach to collocational analysis tried to go beyond purely observing collocation, to saying why it occurred as it did.


2. The semantic approach: This is an approach where ‘linguists attempted to investigate collocations on the basis of a semantic framework, also separate from grammar’ (Gitsaki 1996:13). The crux of this approach was to try and find out not just that certain words collocate with each other, but why they collocate: why we can say blonde hair but not  blonde car. The inability to say why words collocate had been a failing with the lexical composition approach and it still represents a challenge today, though research has been done on this, for example, Gitsaki cites work done by Mel’cuk (1988).


3. The structural approach: The third approach to collocation says that ‘collocation is influenced by structure, and collocations occur in patterns. Therefore...the study of collocations should include grammar’ (Gitsaki 1996:17). Thus, in contrast to the two previous approaches, grammar is seen as a central factor that cannot be separated from lexis.[9] Lexical and grammatical collocation thus represent two different but related aspects of the same phenomenon, and Bahns (1993) defines the difference between them as follows:


            Examples of grammatical collocations include: account for, advantage over, adjacent to, by accident, to be afraid that... They consist of a noun, an adjective, or a verb, plus a preposition or grammatical structure such as an infinitive or clause. Lexical collocations on the other hand, do not contain prepositions, infinitives or clauses, but consist of various combinations of nouns, adjectives, verbs and adverbs. (Bahns 1993:57)



There is general agreement in the literature on the division of collocates into lexical or grammatical categories, though less agreement on their relative importance. Lexical collocation is defined by Lewis & Hill (1998) as having five main categories: adjective/noun, verb/noun, noun/verb, adverb/adjective and verb/adverb. Gitsaki  (1996:23) is able to define 37 categories of collocation, eight of which could be considered as lexical collocation and 29 grammatical (she thus largely accepts the structural view of collocation). Benson, Benson & Ilson (1997) in the BBI Dictionary of English Word Combinations differentiate between lexical and grammatical collocation as Bahns above, and designate eight main kinds of grammatical collocation and seven kinds of lexical (1986/1997: xv-xxxiii).


Collocational study in this structural category has looked both at the collocation of grammatical classes of words (Kjellmer 1987, 1990), and, more significantly for this research, also how grammar is integrated into lexis and vice versa through collocation and patterning (Sinclair 1991, Renouf & Sinclair 1991, Hunston et al. 1997, Hunston & Francis 1998,  Hoey 1997, 2000).


Grammatical word classes: In terms of looking at classes of words, Kjellmer, using a tagged corpus (the Brown Corpus) set out to establish ‘to what extent individual word-classes are ‘collocational’ or ‘non-collocational’ in character’ (Kjellmer 1990:164). The results of his study showed that articles, prepositions, singular and mass nouns and the base forms of verbs were collocational in nature, whereas adjectives, single proper nouns and adverbs were not (1990:185). It is important to remember here that collocation is defined in a grammatical sense, thus the base forms of verbs, for example, are seen as ‘collocable’ because they occur often in the infinitive form and so must collocate with ‘to’ or a modal auxiliary. Kjellmer concludes his article discussing the gradation of collocation:


There is a continuum in English words .... from those whose contextual company is entirely predictable (Angeles, Fidel)[10] to those whose contextual company is entirely unpredictable (therefore), but the evidence indicates that most words are to be found towards the Angeles end of the scale.[11]                                       (Kjellmer 1990:172)



Lexical and grammatical integration: Perhaps a better term for a lot of the work on collocations that considers both grammatical and lexical elements might be the  integrated approach. Sinclair, the leading exponent of this view, did not always see grammar and lexis as inseparable and it is interesting to note that his views have changed since the 1966 article, where he still kept grammar and lexis apart.[12] However, his later work integrates grammar and lexis and examines the generative power of grammatical words. For example, Renouf & Sinclair (1991) examined the generative power of collocational frameworks such as a + ? + of and be + ? + to and found them to be a key part of language creation.[13]
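The idea of a collocational framework such as a + ? + of can be illustrated by collecting the words that fill the middle slot. The following is a minimal sketch under stated assumptions: the function name and the sample sentences are invented, only single-word fillers are matched, and it is not a reconstruction of the procedure used by Renouf & Sinclair (1991).

```python
import re
from collections import Counter

def framework_fillers(text, first, last):
    """Count the single words filling the slot in the framework `first + ? + last`."""
    pattern = re.compile(
        rf"\b{re.escape(first)}\s+(\w+)\s+{re.escape(last)}\b",
        flags=re.IGNORECASE,
    )
    return Counter(m.group(1).lower() for m in pattern.finditer(text))

# Invented sample text, for illustration only.
sample = "A lot of people had a couple of ideas, and a number of them led to a kind of plan."
fillers = framework_fillers(sample, "a", "of")
print(fillers.most_common())

# The same function applied to the framework be + ? + to.
be_to = framework_fillers("He seems to be able to go", "be", "to")
print(be_to.most_common())
```

Run over a large corpus, a count of this kind would show which words a given framework most typically encloses.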


Other writers, too, have noted the interrelationship of grammar and lexis (Hunston et al. 1997, Hunston & Francis 1998). Working with the COBUILD 250 million word corpus, Hunston et al. found a distinct correlation between grammatical patterning and lexical meaning. They say that their work  ‘does not rely on a distinction between grammar and vocabulary, but provides connections between the two’ (1997:208). They go on to elucidate: ‘There are two main points about patterns to be made: firstly, that all words can be described in terms of patterns; secondly, that words which share patterns also share meanings’ (Hunston et al.1997:209). They continue:


Although some senses of some words have several patterns, some senses have only one pattern and are identified by it. This means that a word only means a particular thing when it is used with a particular word.                                                               (Hunston et al.1997:209)


The relationship between lexis and grammar found by the work of Hunston and Francis forms a central part of the analysis of business lexis in this thesis. Lexis is not regarded as separate from grammar and vice versa - rather, the thesis attempts to see how they interact in the business lexical environment.


The work of Hunston et al. is re-inforced by Hoey (1997, 2000), who found that certain senses of a word will have their own grammatical patterning or colligation. These last aspects of the relationship between grammar and lexis will be returned to in discussing both the concept of colligation and longer collocations or multi-word units later in this chapter. However, before that, further elements of collocation that are important to this thesis are briefly reviewed. Whilst each section considers a separate element, it must be stressed that all these parts go to make the whole that is collocation.


4.3.3 Key elements of collocation


The key elements of collocation presented here relate to the following: the notion of upward and downward collocation; the strength of collocation - from strong to weak and from fixed to free-form; the notion of collocational span; collocation as an embodiment of the idiom principle (Sinclair 1987, 1991) and, finally, the relationship of collocations, the idiom principle and Business English. The review begins with a further connection between grammar, lexical (semantic-based) words and collocation - the notion of upward and downward collocation.

The notion of upward and downward collocation


Firth (1951/7) saw that the possible collocates for words are limited, sometimes even very strictly limited and this chapter has shown that these collocates can be both grammatical and lexical. A further distinction that needs to be made in relation to lexical and grammatical aspects of collocation is put forward by Sinclair (1991:115-116) - upward and downward collocation.


Upward collocation: This concept basically states that words will habitually collocate with other words that are more frequently used than they are themselves in the English language. For example, Sinclair notes that the word back collocates with at, down, from, into, on and then, all of which are more frequent words than back.

Downward collocation: Similarly, words will also habitually collocate with words that are less frequent than they are. Again, Sinclair uses the example of the word back giving arrive, bring and climbed as examples of less frequently occurring words that collocate with back.


There is, however, a difference in the grammatical nature of the collocates these two types of collocation attract:


There appears to be a systematic difference between upward and downward collocation. Upward collocation, of course, is the weaker pattern in statistical terms, and the words tend to be elements of grammatical frames, or superordinates. Downward collocation by contrast gives us a semantic analysis of a word.      (Sinclair 1991:116)


In terms of grammatical classes, therefore, Sinclair notes the collocates of back: upward collocates are, for example, prepositions, adverbs, conjunctions and pronouns, whilst downward collocates consist of a large number of nouns and verbs.
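The distinction between upward and downward collocation rests on a comparison of frequencies, which can be sketched as follows. This is an illustrative simplification: the function name is hypothetical, the frequencies are invented solely to respect the ordering described above, and a collocate exactly as frequent as the node is placed in neither group.

```python
def split_by_direction(node, collocate_words, corpus_freq):
    """Partition collocates into upward (more frequent than the node) and downward (less frequent)."""
    node_freq = corpus_freq[node]
    upward = [w for w in collocate_words if corpus_freq[w] > node_freq]
    downward = [w for w in collocate_words if corpus_freq[w] < node_freq]
    return upward, downward

# Invented frequencies, for illustration only (not taken from any corpus).
corpus_freq = {
    "back": 1000,
    "at": 9000, "from": 8000, "on": 9500,
    "arrive": 60, "bring": 120, "climbed": 40,
}

upward, downward = split_by_direction(
    "back", ["at", "from", "on", "arrive", "bring", "climbed"], corpus_freq
)
print("upward:", upward)
print("downward:", downward)
```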


Clear (1993), in his research on the word taste using MI and t-score statistics, confirms Sinclair’s views on the grammatical classes of words found in upward and downward collocation. However, he does not go as far as Sinclair in saying that downward collocation gives a semantic analysis of a word. He found that some upward collocates can also help in the semantic analysis of a word (here the word taste):


Many of the pairs identified by the t-score are upward collocations, and several of these are indicators of the different senses of taste which one would expect to find discriminated in a dictionary.    (Clear 1993:281)
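
The two statistics mentioned above can be sketched as follows. This is a simplified illustration using the formulations commonly given in the corpus-linguistics literature (MI as the base-2 logarithm of observed over expected co-occurrence, and the t-score as the difference between observed and expected divided by the square root of observed); the exact variants used by Clear may differ. The function names and all figures are hypothetical.

```python
import math


def expected_cooccurrences(f_node, f_collocate, corpus_size, span_size):
    """Co-occurrences expected by chance if the collocate were spread
    evenly through the corpus (span_size = total words in the window)."""
    return f_node * f_collocate * span_size / corpus_size


def mi_score(observed, expected):
    """Mutual information: how much more often than chance the pair occurs."""
    return math.log2(observed / expected)


def t_score(observed, expected):
    """t-score: how confident one can be that the association is not chance."""
    return (observed - expected) / math.sqrt(observed)


# Invented figures: a node occurring 100 times, a collocate occurring
# 1,250 times, a 4:4 span (8 positions) and a corpus of one million words.
expected = expected_cooccurrences(100, 1250, 1_000_000, 8)
print(expected)                # 1.0
print(mi_score(16, expected))  # 4.0
print(t_score(16, expected))   # 3.75
```

Because MI is a ratio, it tends to favour rare (typically downward) collocates, whereas the t-score rewards sheer frequency of co-occurrence, which is consistent with Clear's observation that many t-score pairs are upward collocations.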



The phenomenon of upward and downward collocation was also noted in this research into Business English and will be briefly discussed again (see Key word 1.9  boss in Appendix 6 in Vol. II). At the beginning of this section on collocation, the distinction between reciprocal and non-reciprocal collocation was made. This now needs to be considered in more detail and forms the next key element of collocation discussed here.


The strength of collocations


It can be deduced from the work of Kjellmer (1990), discussed above in Section 4.3.2, that there is a continuum of collocability, from words that always collocate with a given other word (e.g. vested interest, moot point) to words that are more free and unpredictable in their partners (e.g. therefore) - a point also discussed by Widdowson (1989:133). Kjellmer’s ideas of collocation may be represented thus:


            Free choice      ---------->------------>------------->      Closed classes

            word classes                                                    fixed collocation


Fig. 20 Kjellmer’s (1990) ideas on collocational fixedness


Whilst Kjellmer was concerned with grammatical classes, the same phenomenon is equally relevant to studies of lexical collocations. Study of lexical collocation can also produce a sliding scale of collocability, although here the collocates are not tied to particular grammatical classes and the focus is instead on the level of semantics. The cline can be represented as follows:[14]


            Weak   ------->--------->-----> Medium strength ----->-----> Strong

            collocation                               collocation                               collocation      


Fig. 21  A sliding scale of collocability


It is important to note here two things: firstly collocations in this sense are not idioms - the meaning of the partnership of words is transparent and can be deduced from its constituent parts, whereas this cannot be said of idioms.[15] Secondly, most collocations lie in the middle ground of the cline: there are relatively few very strong collocations (Lewis & Hill 1998:2-3). Lewis & Hill determine three grades of collocation: strong collocations (avid reader, budding author), common words that collocate widely (fast car, have dinner, a bit tired) and medium strength collocations (magnificent house, significantly different, relatively strong) which they say make up the majority of collocations necessary for language learners to know. Hill (1999) adds one more to these categories: that of unique collocations such as foot the bill, shrug one’s shoulders. Thus, whilst the collocation of words is on a sliding scale of fixedness from totally free to totally fixed, the majority of collocates, it is suggested, can be found somewhere in the middle of this spectrum. 


All the examples of collocation given so far in this chapter show collocates of words that immediately precede or follow each other. This close proximity of collocating words presents, perhaps, the most accessible definition of collocation. Yet there has been a long-standing discussion in the literature on what is known as collocational span - the notion that words can be considered collocates even if they are two, four or even eight words away from the head word under examination. The next section, therefore, looks at this discussion.


The notion of collocational span - what makes a collocation?


The quotations which this section on collocation started with refer to the co-occurrence of words, but there is little specification of what this actually means in practice. Scott (1999) notes that


The literature on collocation has never distinguished very satisfactorily between collocates which we think of as ‘associated’ with a word (letter - stamp) on the one hand, and on the other, the words which do actually co-occur with the word (letter - my, this, a, etc.). We could call the first type ‘coherence collocates’ and the second ‘neighbourhood collocates’ or ‘horizon collocates’.        (Scott 1999 WordSmith 3 Help File)

Scott’s observation is indeed shown clearly in the literature. In 1966, Sinclair and Halliday had set no limit on collocational span. Sinclair already then realised that years of study would be needed in order to set the optimal span and stated, therefore, that ‘we reject, for the moment, the suggestion that degree of proximity within the chosen boundaries of collocation should be considered of primary importance’ (Sinclair 1966:414). Sinclair gave examples of the words  post, letter and pillar box (Sinclair 1966:413), noting the inherent connection between these words: where there is one, it is expected to find the other. However, in 1966, there was no way of statistically determining the relationship between them. Thus, Sinclair chose a 3:3 span to study the collocates of money, pay and ticket in the same article. By 1991, Sinclair was operating with a 4:4 span and mentions that he was still engaged in research to see what would be the optimal setting for searching out collocates (1991:106). Clear (1993) stated that


Intuitively, one would expect that a given node word would associate more strongly with immediately adjacent words, and that the associative link would be weak or non-existent the further removed are the collocating words. (Clear 1993:276)

Thus both Clear and Sinclair (in practice) are concerned with the ‘neighbourhood’ collocates of Scott. In contrast, Scott (1997) suggests a concept of associates which ties in closely in definition with Sinclair’s early consideration of examples such as post, letter and pillar box. Unlike the pre-computer days of 1966, Scott’s (1999) computerised lexical analysis software[16] is able to generate the associates of words - that is, words that co-occur with the head word within the same text or texts to a pre-determined statistical significance. These associate words may or may not occur in close proximity of the node word - habitual occurrence is enough (Scott 1999). Scott (1997) remarks that this idea is very close to the original 1950s - 1960s definition of collocation and notes that ‘co-associates are not the same as Firthian collocates, but they represent a level of lexical patterning which inherits from Firth’s traditions’ (1997:240). Sinclair, too, had clearly recognised the possibility of this kind of lexical relationship, as was mentioned above, but saw no way of actually achieving an analytical framework for it.


This thesis adopts an approach that takes into account both aspects of collocation: both the ‘neighbourhood’ collocates and associates ‘which are a pointer to coherence collocates’ (Scott 1999) are studied. Scott’s associates are computed for key business lexis in order to gain a clearer picture of the lexical environment of Business English. However, the main focus of the research has been on neighbourhood collocates.[17]
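
The idea of neighbourhood collocates within a fixed span can be made concrete with a short sketch. It assumes a text that has already been split into word tokens and simply counts every word falling within a given number of positions to the left and right of each occurrence of the node. The function name and the example sentence are hypothetical, and the sketch is not a description of how WordSmith or any other package computes collocates or associates.

```python
from collections import Counter


def neighbourhood_collocates(tokens, node, left=4, right=4):
    """Count the words occurring within `left` words before and `right`
    words after each occurrence of `node` (a 4:4 span by default)."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - left)  # do not run off the start of the text
        window = tokens[start:i] + tokens[i + 1:i + 1 + right]
        counts.update(window)
    return counts


# Invented sentence, for illustration only.
tokens = "please pay the money into the bank and keep the ticket".split()

narrow = neighbourhood_collocates(tokens, "money", left=1, right=1)
wide = neighbourhood_collocates(tokens, "money")  # 4:4 span
print(sorted(narrow))  # ['into', 'the']
print(wide["the"])     # 2
print(wide["ticket"])  # 0
```

Widening the span from 1:1 to 4:4 brings in more candidate collocates, but a word such as ticket, which lies outside the window, would only be recovered by a text-level measure of the kind that Scott's associates represent.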


The discussion on collocation has thus far centred on more technical elements of its nature. Discussion now turns to a more general, but perhaps more important level. The following sections link collocation to work done by Sinclair (1987, 1991) on the idiom principle and show how research has combined aspects of the idiom principle, collocation and Business English.


Collocation as an embodiment of the ‘idiom principle’


It has been noted in the literature that the study of collocation has presented certain theoretical problems in that it does not fit, or has not fitted, into accepted models of linguistic description. Clear notes:


The study of collocation as a linguistic phenomenon has not found a central place in theoretical linguistics, perhaps because its proper province is the rather ill-defined area of linguistic patterning that is neither clearly syntactic nor clearly semantic.             (Clear 1993:271)


However, a clear methodological grounding can be offered for collocation by viewing it as a natural part of the idiom principle. This notion of the idiom principle was first put forward by Sinclair in 1987 and again in Corpus, Concordance, Collocation in 1991,[18] as well as in other articles, for example, Renouf & Sinclair (1991).


Renouf & Sinclair (1991) examined the generative power of collocational frameworks such as a + ? + of and be + ? + to. They found that some of the triplets were commonly subsumed as part of a larger lexical unit. They went on to explain this phenomenon by saying that it can be seen as ‘a series of collocational units flowing into each other - that in fact, the last element or elements of one frame form the beginning of the next’ (Renouf & Sinclair 1991:140). These collocational units are then available for the language user:


The principle of idiom is that a language user has available to him or her a large number of semi-preconstructed phrases that constitute single choices, even though they might appear to be analysable into segments.                                                                   (Sinclair 1991:110)


This, obviously, is very different to the traditional ‘slot and filler’ grammatical approach to linguistic description where grammatical structure is seen as paramount. Sinclair elaborates on this theme (Sinclair 1991:110-115) and contrasts the idiom principle with what he calls the open-choice principle. The open-choice principle represents the traditional way of viewing language where text is seen ‘as the result of a large number of complex choices. At each point where a unit is completed (a word or a phrase or a clause), a large choice opens up and the only restraint is grammaticalness’ (1991:109). Sinclair rejects this model as a single explanation of language: on its own it is not enough. The open-choice principle does not account for the fact that linguistic choice is not random in the language:


It is clear that words do not occur at random in a text, and that the open-choice principle does not provide for substantial enough restraints on consecutive choices.                             (Sinclair 1991:110)


He goes on to say


Once a register choice is made, and these are normally social choices, then all the slot-by-slot choices are massively reduced in scope or even, in some cases, pre-empted.                                          (Sinclair 1991:110)



Even allowing for register, there is still too much choice available in the open-choice model.

Sinclair stresses the fact that most language use is made up of idiom principle usage with occasional ‘switches’ to the open-choice principle. The two definitions of language are thus ‘diametrically opposed’ and there is no shading of one to the other (1991:114).


The idiom principle is a rejection of Chomsky’s dualist approach to language analysis, and the principle’s focus on collocation and prefabricated language provides a firm methodological basis for this work. This methodological base has been more recently analysed in work that brings together the concepts of collocation, the idiom principle and Business English.


Collocation, the idiom principle and Business English


The idea of limited collocational choice, central to the idiom principle discussed above, can be traced back to the writings of J.R. Firth. Firth had noted in 1957 that collocations ‘will often be found to be characteristic and help justify the restriction of the field’ (1957:180). It could therefore be expected that collocational limitation would be found in Business English. This limitation is noted by Alejo & McGinity (1997), who suggest that when choosing vocabulary we


... use the idiom principle, that is, we severely limit the choice of what comes next. This tendency is very important where business English is concerned, for in this discipline concordance and collocation are considerably limited.  (Alejo & McGinity 1997:216)



For Alejo & McGinity, then, one of the possible characteristics of the semantic field of Business English may be the limitation imposed on choice of collocations. They give examples of such collocations, e.g. domestic consumption, capital equipment, framework agreement. They also specify a group of words which almost always appear together, for example, input/output, supply/demand, gross/net and import/export (1997:226).[19]


The importance of collocation in Business English is reiterated by Conzett (2000:81) when she says that ‘Perhaps the very nature of ... business training - the pragmatic and functional notions ... puts collocations at the forefront of its language work; certainly the relevant books and training materials emphasize lexical phrases as a matter of course’. Yet the abundance of business-related collocations in Business English is called into question by Berber Sardinha (1994a, b, c) in work that combines aspects of collocation, the idiom principle and Business English.


In a series of three articles in 1994, Berber Sardinha looked at collocation and the idiom principle in relation to the Business English environment. It is worth looking in detail at these short studies because, although confirming the idiom principle, they do point to certain problems with it. Berber Sardinha (1994a) considered the lexical patterns for the word year in Annual Business Reports. Berber Sardinha believed that ‘users that share the same discourse community make choices in terms of constantly selecting ‘semi-reconstructed phrases’, they shape up a text type’ (1994a:2). Thus certain key linguistic choices can be expected to be made in genre-specific texts - in this case annual business reports, because


... each word is to be considered a node exerting influence within a given radius around it; and the words which fall within that radius are also nodes with other areas of influence around them, and so on. The organization of language as envisioned by the idiom principle is really a recursive set of overlapping territories having as their centers lexical items which constrain each others’ territories

                                                                      (Berber Sardinha 1994a:2)


Berber Sardinha used 17 annual reports from a larger corpus of 74. The word year was the most common lexical item in the reports and he was able to find recurring patterns for its use. He concluded that ‘these patterns are made up in such a way that each of their constituents selects which word or, sometimes, which grammatical category will accompany them. Therefore, these lexical patterns in the text analyzed do not seem to fit into the traditional structural descriptions which prescribe that as far as lexis goes, each choice is free’ (1994a:4-5). This short piece of research, therefore, confirms the use of the idiom principle in this environment.


This theme was continued in the second article (Berber Sardinha 1994b), where he considered pragmatic problems in determining the accuracy of the idiom principle. These problems related to the bidirectionality of collocates, the notion of lemma and the difficulty in delimiting the end of one collocate and the beginning of another.


Bidirectionality: If it can be assumed that words engender collocates, there arises the problem of bidirectionality in that it may be difficult to determine the ‘exact domain of the collocation, since collocations can be seen to form in both directions of the node’ (1994b:2).

Lemmas: Additionally, problems are presented when deciding whether members of the same lemma[20] group should be treated only as part of the lemma, or whether they should be treated as separate and analysed as independent collocates.

Delimitation: ‘collocates also nest inside each other, making the delimitation of each one rather difficult’ (1994b:2).


Despite presenting these problems, Berber Sardinha does not really give any answer to them in his study which takes, rather, a short news story from a Brazilian newspaper for analysis. The news story was analysed by identifying the flow and limits of collocations in the text:


The frequencies were counted for the first collocation as it began with the first word of the text, and followed until a frequency of 1 was found, signalling a ‘free choice’. At that point another collocation was considered to have begun and the process of finding the frequencies from left to right resumed.                            (Berber Sardinha 1994b:5)
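
The procedure quoted above can be sketched in code, though only as one possible reading of it rather than a reconstruction of Berber Sardinha's actual method. The sketch assumes that the text under analysis is itself part of the reference corpus, so that a frequency of 1 means the sequence occurs nowhere else. Each chain is extended word by word for as long as the growing sequence recurs in the corpus; when the frequency falls to 1, the chain is closed and a new one begins. The function names and the miniature corpus are hypothetical.

```python
def count_ngram(corpus, ngram):
    """Number of times the word sequence `ngram` occurs in `corpus`."""
    n = len(ngram)
    return sum(1 for i in range(len(corpus) - n + 1)
               if corpus[i:i + n] == ngram)


def segment_into_chains(text, corpus):
    """Split `text` (a list of words) into successive collocational chains.

    Assumption: `text` is contained in `corpus`, so every sequence taken
    from the text has a corpus frequency of at least 1.
    """
    chains = []
    start = 0
    while start < len(text):
        end = start + 1
        # Keep extending while the longer sequence occurs more than once.
        while end < len(text) and count_ngram(corpus, text[start:end + 1]) > 1:
            end += 1
        chains.append(text[start:end])  # frequency fell to 1: a 'free choice'
        start = end
    return chains


# Invented miniature corpus; its second sentence is the text analysed.
corpus = ("the board of directors met . "
          "the board of directors agreed . "
          "profits rose sharply").split()
text = "the board of directors agreed".split()

chains = segment_into_chains(text, corpus)
print(chains)  # [['the', 'board', 'of', 'directors'], ['agreed']]
```

In this toy case the recurring sequence the board of directors forms one chain, and agreed, which follows it only once in the corpus, marks the point at which a new chain starts.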



The results of the study showed that ‘there are both long and short collocations and a consistent sub-patterning. One can also identify free choices that mark the end of several collocations’ (1994b:8). Predictably, short collocations were found to be much more frequent than long ones. A further interesting finding was that some words act as ‘flow barriers’ (1994b:8) demarcating the span of one collocational chain on one side of it and the start of another chain on the other. This ‘discourse’ collocational function would certainly warrant further detailed analysis. Berber Sardinha’s study again supports the idiom principle view: ‘‘Free choices’, or word sequences that have not occurred elsewhere in the corpus are very few’ (1994b:11) and ‘connected discourse seems to flow by means of several interlocking semi-fabricated word sequences’ (1994b:11).


In the final article, which examined introduction sections in annual reports, two additional points of interest can be found. Firstly, ‘business language’, at least that found in the text under discussion, was mostly made up of collocations common to general English. On top of this were sprinkled ‘choices that are specific to the phraseology which one would by experience associate with annual business reports’ (1994c:10). Secondly, Berber Sardinha noted the role of intertextuality in that ‘texts are shaped by prior texts, by repetitions or by being oriented to routines and conventions’ (1994c:12).


Berber Sardinha’s point, that most collocations found in the Business English environment can be considered to belong to general English, is also borne out in this study. However, it is argued in this thesis that the importance of the subject-specific collocations that do occur outweighs their frequency. It will be seen that as this research is based on unusual frequency, as opposed to actual frequency, the collocates of business, though not necessarily of high overall frequency, do in fact occur significantly more often in Business English than in general English and are, therefore, important for learners.


Collocation and beyond


This last section has been concerned with defining collocation as it has been found in the literature and how it has been placed at the centre of the idiom principle. At the end of this second part of the chapter, a working definition of collocation for the purposes of this thesis will be presented.  However, before this can be done, it is important to look ‘beyond collocation’ (Hoey 2000) and see how collocates form into what are known as semantic prosodies.


4.4  Semantic Prosody


Semantic prosody, put simply, refers to the fact that words, as well as collocating with a given other word (e.g. as when arms collocates with akimbo), can also collocate with  semantic classes of words that are often either positive or negative in meaning.[21] Semantic prosody is a concept only recently ‘discovered’ as the advent of computerised studies into language was needed before these semantic patterns could be noticed. Semantic prosody has been defined variously by Sinclair (1991) - though not named as such, Louw (1993), Stubbs (1995), Tribble (1998) and Hoey (1997, 2000). Each definition is basically the same, but the scope of semantic prosody has been expanded by each new definition.


Sinclair (1991) noted the fact that certain words seemed to collocate with semantic classes of other words that were decidedly either positive or negative.


Many uses of words and phrases show a tendency to occur in a certain semantic environment. For example, the verb happen is associated with unpleasant things - accidents and the like.

                                                                                  (Sinclair 1991:112)


However, Sinclair never came out publicly with the term semantic prosody and it was not until 1993 that it was first discussed in any detail by Louw as a concept in its own right.[22] Louw states that semantic prosody is the ‘consistent aura of meaning with which a form is imbued by its collocates’ (Louw 1993:157). In his article, Louw concentrated on the use of irony in poetry and showed how the use of semantic prosody creates a mood or atmosphere within the poem that adds to its meaning and its effectiveness. It is the very fact that the prosodies are so strong and fixed in the language, for example the negative prosody of the word utterly used by poet Philip Larkin, that they can be used to such dramatic effect. Louw also noted a correlation between semantic prosody and grammar:


... prosodies based on very frequent forms can bifurcate into ‘good’ and ‘bad’, using a grammatical principle like transitivity  in order to do so. For example, where build up is used transitively, with a human subject, the form of the prosody is uniformly good....Where things or forces, such as cholesterol, toxins, and armaments build up intransitively, of their own accord, they are uniformly bad. 

                                                                                     (Louw 1993:171)


Just as in the idiom principle, language is seen to flow on from one ‘chunk’ to the next. Louw concludes his article by noting:


First, it is clear that in many cases semantic prosodies ‘hunt in packs’ and potentiate and bolster one another in rather the same way that images are forced to cluster in poetry in order to prevent full ‘intuitive’ meaning from ebbing away into delexical meaning.

                                                                                 (Louw 1993:172)


The notion of semantic prosody was taken up and expanded by Stubbs (1995) who suggested that as well as collocating with purely positive or negative semantic groupings of words, words can also collocate with semantic sets:


Semantic prosodies may be of a very general kind: such as the shared semantic feature ‘unpleasant’. Alternatively, one may be able to predict that a node will most likely co-occur with collocates from a  restricted lexical set: for example, from the semantic field of ‘care’.  

                                                                                    (Stubbs 1995:249)
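
The kind of assignment described here can be illustrated with a small sketch, assuming that the collocates of a node have already been counted and that an analyst has labelled each collocate by hand as belonging to a positive, negative or other semantic grouping. The function name, the counts and the labels are all hypothetical; the sketch does not reproduce the procedure of Stubbs or of any other study.

```python
from collections import Counter


def prosody_profile(collocate_counts, labels):
    """Return the share of collocate tokens falling under each label.

    Collocates that have no entry in `labels` are grouped together
    under 'unlabelled'.
    """
    totals = Counter()
    for word, count in collocate_counts.items():
        totals[labels.get(word, "unlabelled")] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {label: count / grand_total for label, count in totals.items()}


# Invented collocate counts and hand-assigned labels, for illustration only.
counts = {"accident": 6, "disaster": 3, "miracle": 1}
labels = {"accident": "negative", "disaster": "negative",
          "miracle": "positive"}

profile = prosody_profile(counts, labels)
print(profile)  # {'negative': 0.9, 'positive': 0.1}
```

The labelling step is where the analyst's judgement enters; the arithmetic only summarises how strongly the collocates lean one way or the other.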


Stubbs studied several words using two separate corpora and was able to assign their collocates to a positive grouping, a negative grouping or a lexical set. For example, he found the word job to be both positive and negative whilst career was only positive (1995:253), and the word unemployment was found to collocate with the semantic set of statistics (1995:254). Both Louw and Stubbs also realised the theoretical problems that a concept like this poses - at present there is no linguistic theory that adequately explains it:


It is a purely lexical, yet syntagmatic, relation, of a type which cannot be captured by current descriptive theory. Indeed it undermines conventional views on the relation between syntagmatic and paradigmatic. In addition the statements which I have given are probabilistic. Again, conventional linguistic description usually assumes categorical relations between units, and has no theory of typicality.                                                                  (Stubbs 1995:255)


Despite the difficulties at the level of linguistic theory, Stubbs does offer a use for semantic prosody at a more pragmatic level, saying that ‘one might start to use such descriptive methods for integrating semantic, pragmatic, and cultural information into language teaching materials’ (1995:256).[23]


This call for a practical usage of semantic prosody was answered by Tribble (1998), who argued that it can play an important role in the teaching of written genres. He proposed that the semantic prosodies of a given word are both universal and local, that is, there may be a global semantic prosody for a word in relation to the whole language, but that in a given context or genre, there may be a local semantic prosody that is only to be found there. Thus, it is ‘proposed that words in certain genres may establish local semantic prosodies which only occur in these genres, or analogues of these genres’ (Tribble 1998:11). As an example, Tribble analysed the use of the word experience in a corpus of European Union Phare proposals and was able to show that there is a local meaning used in this genre of experience as ‘professional capital’. This definition of the word is not to be found in any dictionary.


Hoey, in two articles (1997, 2000), largely concurs with former definitions of semantic prosody, but criticises both Sinclair and Louw in terms of semantic prosody:


Louw’s term is potentially a helpful one but we need to broaden the category a little. Sinclair (1991), who does not use the term, discusses the phenomenon under the heading of the idiom principle, but this is not a satisfactory categorisation, since there is no requirement that a semantic-prosodic association should be in the case of any particular item a regular association. When a new disease is found, it can immediately be added, for example, to the list of things that can be caused; we do not have to wait until it has become a common enough disease for it to figure in calculations of collocations.     (Hoey 1997:2)


This is an interesting point. Louw had suggested that the fact that prosodies are built up over time gave them their power, for example, to use them to display irony. However, Hoey’s point is not at odds with this if it is considered at the level of lexical set or category, rather than at the level of word and individual collocates. Thus, whilst an individual word may be new to a language, as in the case of a disease, it falls into the category of ‘disease’ and therefore joins the prosodic group of ‘diseases’ with little trouble. The sentence She came down with a bad case of flu can easily be extended to She came down with a bad case of disease X. [24] This aspect, Hoey says, is why semantic prosody is so important:  ‘what makes the notion so useful and important is that it cannot be subsumed by its collocations’ (Hoey 1997:2). He goes on to say:


I am suggesting that semantic prosody is the label that we might generally give to answers to the question ‘Does the word regularly associate with other meanings?’  (Hoey 1997:2)


In Hoey (2000) a teaching-orientated, pragmatic approach is taken towards the use of semantic prosody. His article criticises present EFL vocabulary textbooks for presenting language that is not typical of actual use. He analyses the word chilly and suggests that ‘if a learner wants to learn chilly they would do best to learn that it occurs in certain kinds of context rather than all contexts’ (Hoey 2000:232-233).  Hoey does not stop at semantic prosody though. The survey of collocation in this chapter has shown that lexis and grammar are interrelated. Hoey realises that semantic prosody alone is not enough to account for the typical patterning of words, and that the concept of colligation is also required. A review of this concept forms the next section of the chapter and is divided into discussion on the more technical aspects of colligation and then its pedagogical implications.






4.5  Colligation   


4.5.1 Technical aspects of colligation


Colligation was another major idea first put forward by Firth (1957), and Hoey provides a straightforward definition: ‘Colligation can be defined as ‘the grammatical company a word keeps and the positions it prefers’; in other words, a word’s colligations describe what it typically does grammatically’ (Hoey 2000:234 - Hoey’s use of bold). Thus, colligation is a similar idea to collocation, but with a different emphasis. For example, Hargreaves (2000:213) compares colligation to collocation: ‘verb + to infinitive is a colligation, dread + think a collocation which exemplifies the colligation’.


Colligation is concerned with the relationship between grammatical classes, whereas collocation is concerned with the words that belong to these grammatical classes. Hoey (1997) further divided colligation itself into two main classes:


Textual position: The notion that a lexical item may have a strong tendency to occur in a certain textual position rather than others, e.g. at the beginning or end of a text.

Grammatical context: A lexical item will tend to ‘co-occur with a particular grammatical category of items’ (1997:4). The implication of this is that when a word has more than one sense, each sense is found in a different grammatical context, with sense and a specific grammatical context in a direct relationship.


Looking at the second of these two categories, Hoey examined the word reason and its relationship to specific deictics (e.g. this, that, whichever, his, her) and to non-specific deictics (e.g. each, every, some, any). He found that, for example, when reason was used in the ‘cause’ sense of the word - i.e. the reason for something  - it occurred with demonstrative, but not possessive deictics. Further, the interrelationship between colligation and semantic prosody was also noted:


...colligational and semantic prosody statements come together in some, in that in the structure for some <adj> reason, there is a strong prosodic tendency for the adjective to express the strangeness of the reason. Out of 104 adjectives occurring between some and reason, 87 expressed the oddness, the inexplicability or the craziness of the reason.                                                                              (Hoey 1997:5)
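
A frame of this kind (some + adjective + reason) lends itself to a simple illustration. The sketch below assumes a tokenised text and collects whichever single word occurs between the two fixed words of the frame; it does not check that the filler is in fact an adjective, which would require part-of-speech tagging. The function name and the example sentence are hypothetical. The final line merely restates Hoey's reported figures as a proportion.

```python
def frame_fillers(tokens, first, last):
    """Return every single word found between `first` and `last`."""
    return [tokens[i + 1]
            for i in range(len(tokens) - 2)
            if tokens[i] == first and tokens[i + 2] == last]


# Invented sentence, for illustration only.
tokens = ("for some strange reason he left and "
          "for some good reason she stayed").split()

fillers = frame_fillers(tokens, "some", "reason")
print(fillers)  # ['strange', 'good']

# Hoey's reported figures: 87 of 104 adjectives expressed oddness.
share = round(87 / 104, 3)
print(share)  # 0.837
```

On Hoey's figures, then, roughly five in every six adjectives in this frame carry the prosody of strangeness.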


Hoey then presented what he termed the ‘Drinking Problem’ hypotheses:


a) Where it can be shown that a common sense of a word favours common colligations, then the rare sense of that word will avoid those colligations.

b) Where two senses of a word are approximately as common (or as rare) as each other then both will avoid colligational patterns of the other.

c) Where either (a) or (b) do not apply, the effect will be humour, ambiguity (momentary or permanent), or a new meaning combining the two senses.                                                        (Hoey 1997:6)


Hoey analysed the word cause in order to test these hypotheses (the first two only) and found them to be correct. His analysis further showed that prosodic and colligational elements intertwine to the extent that they cannot be separated. For example, the word cause is in a colligational relationship with for and together they link up to form idioms, e.g. cause for concern, which is part of the negative semantic prosody: ‘cause + for + something negative’ (e.g. cause for alarm, cause for concern).


4.5.2 Pedagogy and colligation


Hoey continued the theme of colligation in Hoey (2000), but this time from a more pedagogical perspective. His criticisms of teaching materials in this article were noted earlier, where he suggested that learners be presented with words as they naturally occur. He continued the theme with a colligational analysis of different professions: accountant, actor, actress, architect and carpenter. Despite the similarity of category, Hoey found that all these lexical items behave differently in terms of the grammatical company they keep:


The word carpenter has a much higher likelihood of occurring with an indefinite article or in parenthesis ... than does, say, architect. The  word accountant is much more likely to occur with a classifier ... and actress is more likely to appear in apposition.          (Hoey 2000:235)



Hoey suggests that this kind of information needs to be relayed to students and presents ideas related to teaching colligation through concordancing (Hoey 2000:238-242). This pedagogical emphasis is shared by Hargreaves in the same volume when he notes that in the relationship between collocation and colligation: ‘knowledge of a collocation, if it is to be used appropriately, necessarily involves knowledge of the patterns or colligations in which that collocation can occur acceptably’ (2000:214).


Other pedagogical work has also been done. Comprehensive cataloguing of the relationship between meaning and grammatical patterning has been carried out by the COBUILD team (Francis et al. 1996a, 1996b), where the grammatical patterning of verbs and nouns has been matched to related meaning. These works take the form of student reference books so that previous theoretical work can find its way into the classroom, and they have been utilised in this thesis in the analysis of lexico-grammatical patterning in Business English.


In summary, it is suggested here that word sense, meaning and grammatical patterning are all interrelated and this interrelation is important for learners to grasp in order to be able to produce fluent and appropriate English.


4.6  A final view of collocation, colligation and semantic prosody


Firth (1957)  suggested that ‘In the study of selected words, compounds and phrases in a restricted language for which there are restricted texts, an exhaustive collection of collocations must first be made. It will then be found that meaning by collocation will suggest a small number of groups of collocations for each word studied’ (Firth 1957:181). This process of investigating collocation and collocational groups is central to this thesis.

The previous discussion of collocation, semantic prosody and colligation now allows the following working definitions and applications to be made:


1. Collocation is regarded in this study as the habitual co-occurrence of words with each other on the syntagmatic level. This co-occurrence can be both reciprocal and unilateral.

2. The study features both lexical and grammatical collocation.

3. The definition of collocation presented here takes in the ‘traditional’ view of nodal span, with the study using a 5:5 span.[25] However, the associates of Scott (1997, 1999) will also be featured in the analysis.

4. Collocates are organised by the wider factor of semantic prosody. Semantic prosody is defined as occurring when ‘a word associates with a particular set of meanings’ (Hoey 2000:232). Therefore, not only negative or positive prosodies, but also words’ associations with lexical and semantic sets form a central part of the research.

5. Collocation is seen within the wider theoretical setting of the idiom principle.

6. Colligation is considered as the typical grammatical patterning that a word is found in. Thus rather than looking at pure word class colligation, this study looks at the typical grammatical patternings of key Business English words and the relationships these patterns form with specific meanings.
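
The nodal span referred to in point 3 can be illustrated computationally. The following is a minimal sketch in Python and not the procedure of any software cited in this thesis: it assumes a text already split into lower-case tokens on white space, and the function name span_collocates and the sample sentence are invented purely for illustration. It counts every word found up to five places to the left and five places to the right of each occurrence of a node word.

```python
from collections import Counter


def span_collocates(tokens, node, left=5, right=5):
    """Count the words found within `left` tokens before and `right`
    tokens after every occurrence of `node` (a 5:5 span by default)."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - left)  # do not run off the start of the text
        window = tokens[start:i] + tokens[i + 1:i + 1 + right]
        counts.update(window)
    return counts


# Invented sample sentence (an assumption, not corpus data).
text = "the market price rose as the share price fell and the price of oil stayed flat"
tokens = text.lower().split()
collocates = span_collocates(tokens, "price")
print(collocates.most_common(5))
```

Raw counts of this kind are only a starting point: they show which words fall inside the span, but say nothing in themselves about whether a co-occurrence is statistically significant.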


This last section has focused on language at the level of individual collocates, where one word collocates with another. Yet another aspect of language central to the idiom principle is the notion of even longer ‘chunks’ of language which have been variously known as prefabricated phrases, multi-word items and word clusters. The next part of the chapter, therefore, focuses on this aspect of language.





4.7  Multi-word items, prefabrication and the lexical approach


4.7.1  Introduction


The idea that language is learned in a series of pre-fabricated blocks or chunks is commonly associated with post-1970s language methodology. However, like several other ideas related to lexis this, too, is not a new idea. Thomas Prendergast (1806-1886)[26] noted that children learn not just words, but ‘chunks’ of language and can make use of them fluently in their speech.  ‘They [these pre-fabricated chunks] seemed so well-learnt that the only explanation he could offer was that they had been memorized as complete units’ (Howatt 1984:157). Howatt goes on to quote from Prendergast as follows:


When they (i.e. the children) utter complete idiomatical sentences with fluency, with accurate pronunciation, and with decision, while they are still incapable of understanding any of the principles according to which they unconsciously combine their words in grammatical form, it is obvious that they must have learnt, retained, and reproduced them by dint of imitation and reiteration.

                                    (Prendergast 1864:11 cited in Howatt 1984:157)


Despite this remarkably modern-sounding description of first language acquisition, it was to be another hundred years[27] before it was to become an accepted part of language research and teaching methodology, where it has been subsumed as an integral part of what has become known as the Lexical Approach to language teaching. This section of the thesis will deal in some detail with the concept of pre-fabricated language and chunks, and it may be useful at this point to specify exactly what will be discussed and what will not. There is a wealth of literature in this area and so this section will concentrate only on the following issues:


·    an overview of definitions and types of multi-word items (MWIs) as found in the literature

·    characteristics of MWIs: fixedness and non-fixedness, form and function, competence and performance

·    pedagogical application of the above in the form of the Lexical Approach


Other important issues, those of first language acquisition, and the development of collocational and MWI knowledge in second language learners will only be dealt with in relation to the issues listed above. This is not to belittle their importance, but this thesis is less concerned with how people learn, and more with what they need to learn.[28]


4.7.2 What are multi-word items?


The definitions and terminology around multi-word items are disparate and confusing. Moon notes ‘There are many different forms of multi-word item, and the fields of lexicology and idiomatology have generated an unruly collection of names for them, with confusing results’ (1997:43). She then attempts a definition:


A multi-word item is a vocabulary item which consists of a sequence of two or more words... This sequence of words semantically and/or syntactically forms a meaningful and inseparable unit. Multi-word items are the result of  lexical (and semantic) processes of fossilisation and word-formation, rather than the results of the operation of grammatical rules.                                                                                   (Moon 1997:43)


In suggesting a definition for MWIs, Moon is drawing on a long tradition of creating different terminology for describing essentially the same phenomenon.[29] The more recent notion of the importance of prefabricated language came from studies into first language acquisition (e.g. Peters 1983, Krashen & Scarcella  1978). The idea was also discussed by Corder (1973) who used the term holophrase to describe the phenomenon where ‘idioms, clichés, and non-canonical forms are stored as patterns’ (Nattinger 1980:338). Since then, definitions have included gambits (Keller 1979), conventionalised language forms (Yorio 1980), prefabricated language (Pawley & Syder 1983), fixed expressions (Alexander 1984 and Cowie 1988), lexical phrases (Nattinger 1980, 1988, Nattinger & DeCarrico 1992), ideational, interpersonal and relational expressions (Fernando 1996) and word combinations (Howarth 1998).  Other definitions of MWIs that will be examined in this section are given by Kjellmer (1991), Lewis (1993, 1997), Weinert (1995), Moon (1997) and Williams (1998). The next section reviews this varied literature on MWIs, taking an essentially chronological perspective.


4.7.3 Gambits


One of the earlier works of note was that of Keller (1979) on gambits. Gambits were described essentially as phrases used by speakers to introduce what they are about to say. More specifically, Keller defined them as


... a certain set of signals in the conversationalist’s speech, used to introduce level shifts within the conversation, or to prepare listeners for the next turn in the logical argument.  (Keller 1979:220)


Keller had gathered authentic data of everyday Canadian speech and analysed the transcripts to arrive at the concept of gambits. He stressed that gambits are not idioms, but function at the ‘psychological level’ of discourse in four main ways:


·    Firstly as semantic framing whereby the speaker ‘delimits the type of discourse he is choosing: questions, answers, exposition, digression etc.’ (1979:225). An example would be when initiating a speech turn, a person would say This reminds me or Speaking of.

·    The second category signals social context, that is they ‘signal the speaker’s special social role status, or his claim to such a status’ (1979:226). These include turn-taking signals such as Wait a second, or May I interrupt you for a moment. 

·    The third category, state of consciousness signals, ‘indicates a person’s state of consciousness concerning information, opinions or emotions’ (1979:228). Thus a person can indicate they are ready to receive information by saying I’d like to know more about.

·    The final category is that of communication control signals that ‘serve to assure that the listener is in a state of consciousness permitting the reception of the message’ (1979:229). Examples of this would be Are you following me? or Is that clear?

Keller noted that gambits force a top-down approach to looking at the whole of discourse and that more research was needed.[30]


The definitions of aspects of MWIs that followed Keller’s are many and, as noted above, confusing. They are laid out here in a more or less chronological order, though there is some overlap. Each author is given a separate section. This leads to a discussion in the second part of this section where the most important underlying factors involved in the discussion on MWIs will be elucidated.


4.7.4  Other definitions of MWIs


Yorio: Yorio (1980) took a much broader view of MWIs than Keller had done, and proposed the term conventionalised language forms to cover two areas of language: idioms and routine formulas. Yorio saw language as arbitrary, but suggested that ‘a certain form can be said to be conventional when it is predictable and expected by the members of a speech community in a certain situation’ (1980:434). Routine formulas were then broken down further into five categories, which include the gambits of Keller and euphemisms. A key factor with regard to prefabricated language is also mentioned here - that of the increased efficiency of language processing afforded by use of these blocks of language:

Conventionalized forms make communication more orderly because they are regulatory in nature. They organize reactions and facilitate choices, thus reducing the complexity of communicative changes.[31]                                                                                      (Yorio 1980:438)


Pawley & Syder: The point that chunks of language speed up processing was likewise made by Peters (1983), who noted that children took advantage of chunks to cope with an imbalance of memory capacity and processing speed. Pawley & Syder (1983) noted it too, concluding that this does not detract from the creativity of spoken discourse but rather that ‘Freed from the task of composing such sequences word-by-word, so to speak, the speaker can channel his energies into other activities’ (Pawley & Syder 1983:208). In their 1983 article, Pawley & Syder were trying to understand two key abilities of native speakers of a language: nativelike selection, the native speaker’s ability to produce not just grammatical but also ‘nativelike’ sentences, and nativelike fluency, the native speaker’s ability ‘to produce fluent stretches of spontaneous connected discourse’ (Pawley & Syder 1983:191). Their answer was that native speakers utilise knowledge of a vast number of pre-stored sentence stems which are institutionalised or lexicalised. These are divided into:

·    memorized sentences: for example, Can I come in?, There’s no pleasing some people, which are part of a speaker’s ‘performance’; and

·    lexicalised stems: these belong to the realm of ‘competence’. Lexicalised stems, unlike the memorized sentences, are capable of expansion and substitution. Pawley & Syder give the example of a base form (NP tell - TENSE the truth) and how it can be expanded: Tell the truth / Jo seldom tells the truth (1983:211).


Pawley & Syder continued by giving three criteria by which a sentence or sentence stem is lexicalised:


1. it denotes a culturally accepted concept

2. it is recognized as a standard expression for the meaning in question

3. it is a linguistically arbitrary choice  (1983:211).


A lexicalised sentence or sentence stem can be a complete sentence or, more often, an expression that is smaller than a complete sentence (1983:210). They can be, but are not usually, true idioms. Rather they are more commonly literal expressions (i.e. their meaning is clear from the words themselves). Pawley & Syder identify two main ways by which native speakers integrate these lexicalised sentences into speech: a clause-integrating strategy, where the speaker takes account of previous and forthcoming grammatical constructions, and a clause-chaining style, where clauses are ‘strung’ together relatively independently of each other.


Alexander: Whilst Pawley & Syder saw both fixedness and flexibility within their lexical phrases, other writers have wanted to stress the fixed aspect of MWIs. Alexander (1984), in an article concerned with reference books and teaching with regard to fixed expressions in English, described five broad categories of fixed expressions ‘ranging from lexically-oriented idioms and their many subcategories, through discourse-structuring devices, such as gambits and proverbs and proverbial idioms, to the more encyclopedia-oriented expressions such as catchphrases and quotations’ (Alexander 1984:128).


Cowie: Cowie (1988) also stressed the fixed nature of many expressions.[32] He identified two major groups of word combinations: pragmatically specialized combinations, such as the greeting how are you?, and semantically specialized combinations, such as kick one’s heels and pass the buck, which ‘have developed more or less unitary referential meanings by virtue of their use as invariable units in grammatical constructions’ (Cowie 1988:133).


Nattinger & DeCarrico: Two of the most notable proponents of a lexical approach to language research and teaching have been Nattinger & DeCarrico. As early as 1980 Nattinger had noted that ‘we need to pay more attention to the importance of prefabricated speech routines in language behaviour’ (1980:337). In Nattinger (1980, 1988) and Nattinger & DeCarrico (1992), the various kinds of lexical phrases, as they term them, were elucidated. In the 1980/88 articles Nattinger placed lexical phrases into the six categories shown here:


1. Polywords: short fixed phrases, whose meaning is often not analysable by the regular rules of syntax.  They can substitute for single words e.g. kick the bucket, powder room, put up with.

2. Phrasal constraints: short, relatively fixed phrases with slots that permit some variation.

3. Deictic locutions: short to medium length phrases of low variability to monitor conversation, e.g.  as far as I know, if I were you.

4. Sentence builders: phrases of up to sentence length - highly variable phrases containing slots, e.g. not only X but Y.

5. Situational utterances: usually complete sentences e.g. I’ll see you next week.

6. Verbatim texts: e.g. numbers, alphabet, days of week, aphorisms and proverbs.


By 1992 Nattinger & DeCarrico proposed a more refined definition. They firstly defined lexical phrases in relation to collocation, distinguishing lexical phrases from collocations by saying that


Prefabricated phrases are collocations if they are chunked sets of lexical items with no particular pragmatic functions; they are lexical phrases if they have such pragmatic function.  

                                                           (Nattinger & DeCarrico 1992:37)



Thus a key aspect of MWIs for Nattinger & DeCarrico is their pragmatic function.[33]

Nattinger & DeCarrico (1992) continued by suggesting that lexical phrases can be discerned according to four structural criteria:


·    Firstly, length and grammatical status: how long they are and if they operate at word or sentence level.

·    Secondly, canonical/non-canonical shape: if the phrases conform to regular rules of syntax in their formation or if they are deviant in some way. For example, the phrase off with his head can be considered deviant (there is no verb) and therefore non-canonical in shape.

·    Thirdly, variable or fixed: if any variation is allowed in the lexical phrase or not. For example, in the phrase hold your horses no variation is allowed, but in the phrase I wouldn’t touch that with a bargepole quite a lot of variation is allowed, whilst still retaining its recognisable character.

·    Finally, if a phrase is continuous or discontinuous: if there is an unbroken sequence of words or if the phrase is interrupted by lexical fillers.


Based on these four criteria, lexical phrases are then divided into four categories as opposed to the six earlier. These are summarised in the figure below (Nattinger & DeCarrico 1992:45):





Type of phrase              Grammatical level   Canonical shape   Variability         Continuity

Polywords                   word level
Institutionalised phrases   sentence level
Phrasal constraints         word level                            somewhat variable   mostly continuous
Sentence builders           sentence level                        highly variable     often discontinuous


For the sake of clarity these categories are now further exemplified:

1. Polywords: ‘short phrases which function very much like individual lexical items’ (1992:38). They operate at the word as opposed to sentence level. Examples would be for the most part, in a nutshell.

2. Institutionalised phrases: ‘lexical phrases of sentence length, usually functioning as separate utterances’ (1992:39). Examples would be how do you do?, there you go, long time no see.

3. Phrasal constraints: ‘short- to medium-length phrases associated with a wide variety of functions’. Examples would be a ____ ago, as I was ______, see you _______.

4. Sentence builders: ‘lexical phrases that provide the framework for whole sentences’ (1992:42). Examples would be I think (that) X _________, My point is that __________, It’s only in X that Y ....
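
The slot-and-frame character of phrasal constraints lends itself to a simple computational illustration. The sketch below, in Python, is offered only as an aid to understanding and is not a method used by Nattinger & DeCarrico: the frame notation (three underscores for a slot), the function names frame_to_regex and find_fillers, and the sample text are all assumptions made for this example, and each slot is restricted to a single word, which is narrower than the variation real phrasal constraints allow.

```python
import re

# Hypothetical frame inventory; "___" marks one open slot.
FRAMES = ["a ___ ago", "see you ___", "as I was ___"]


def frame_to_regex(frame):
    """Turn a frame such as 'a ___ ago' into a compiled pattern in which
    the slot captures exactly one word (a simplifying assumption)."""
    parts = [r"(\w+)" if part == "___" else re.escape(part)
             for part in frame.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def find_fillers(text, frame):
    """Return the words that fill the slot of `frame` in `text`."""
    return [match.group(1) for match in frame_to_regex(frame).finditer(text)]


# Invented sample text (an assumption, not corpus data).
sample = "I met her a year ago. See you tomorrow! A month ago we spoke, so see you soon."
for frame in FRAMES:
    print(frame, find_fillers(sample, frame))
```

The fixed words of the frame stay constant while the list of fillers shows the variation that the slot permits, which is the sense in which such phrases are ‘somewhat variable’.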


There are two points of note to be made here. Firstly, these categories are interrelated: institutionalised phrases are sentence-level versions of polywords, and sentence builders are sentence-level versions of phrasal constraints. Secondly, there are no sharp boundaries between these groups: ‘the differences between them are frequently ones of degree rather than kind’ (1992:46).


Lewis: Lewis (1993:92-95) built on earlier work in the field using similar terminology to Nattinger & DeCarrico. He sees both a pragmatic purpose and message-oriented purpose for MWIs and suggests the following three categories of multi-word item.


1. Polywords: Short, 2- or 3-word compounds ranging from opaque to totally transparent meaning. Examples would be taxi rank, record player, put off, of course.

2. Collocations: These range from free collocations (red car) to totally fixed collocations (vested interest) - the latter category being one kind of polyword. Collocations are non-reciprocal[34] (as was noted by Kjellmer (1991) earlier) and are not pragmatically tied and thus differ from the next category of institutionalised phrases.

3. Institutionalised expressions: These are pragmatic in character and ensure efficient processing in speech and writing. They fall into three categories: firstly, short utterances (Not yet, certainly not); secondly, sentence heads or frames (Sorry to interrupt, but can I just say ....); and finally, full sentences with a readily identifiable pragmatic meaning.


Weinert: Most of the definitions have so far been quite complex and this perhaps led Weinert (1995) to take a more simplistic view of categorisation:[35]


A variety of labels have been used to describe formulaic language: formulas, prefabricated or ready-made language, chunks, unanalysed language or wholes, etc. I will use these terms interchangeably.                                   (Weinert 1995:182)


Fernando: A similar, more straightforward categorisation is presented by Fernando (1996). Her study of idioms and idiomaticity presented three broad Hallidayan categories that expressions can be placed in: ideational, interpersonal and relational. Ideational expressions ‘contribute to the content of a discourse’ (1996:185). They function as ‘impressionistic packages of information’ (1996:188) and examples would be red herring, spill the beans, walk on air and make up. Interpersonal expressions function as organizers of interaction between language users, with greetings and farewells being the most common. Relational expressions have a discourse function and ‘make explicit the semantic unity of a discourse’ (1996:185). Examples would be phrases like in a jiffy, round the clock, not only x but also y.


Moon: Simplicity of categorisation, however, is criticised by Moon (1997).  Moon herself identified three main criteria by which MWIs can be recognised:

1. Institutionalisation: the degree to which an MWI is conventionalised in the language.

2. Fixedness: the degree to which it is frozen as a sequence of words.

3. Non-compositionality: ‘the degree to which an MWI cannot be interpreted on a word-by-word basis, but has a specialised unitary meaning’ (Moon 1997:44). This is usually semantic, as in kick the bucket, but can also be grammatical and pragmatic, as in of course (of course is ungrammatical).


Moon uses MWI as a superordinate term to cover several kinds of linguistic item, which she then lists:


1. Compounds: The largest and most tangible group, but the least interesting. In this category it is possible to see the movement of language at work, for example in the case of car park - the words are slowly being pushed together to form a compound: car park, car-park, carpark (1997:45). Moon continues by saying that ‘Compounds are generally fixed but their institutionalisation can vary as widely as any other lexical items. The degree to which they are compositional varies too’ (1997:45).

2. Phrasal verbs: combinations of verbs and particles: bring out, send back, head off.

3. Idioms: ‘multi-word items that are not the sum of their parts: they have holistic meanings which cannot be retrieved from the individual meanings of the component words’ (1997:46), for example, spill the beans. However ‘idioms are by no means as fixed as conventional accounts suggest’ (1997:47). Idioms will be returned to later in the chapter.

4. Fixed phrases: MWIs that fall outside the previous categories. Grammatical and discourse items e.g. of course, at least, by far can be included here. Also similes: white as a sheet and greetings - how do you do. Many are strongly institutionalised, have a high frequency and are strongly fixed (1997:47).

5. Prefabs:


Prefabs are preconstructed phrases, phraseological chunks, stereotyped collocations, or semi-fixed strings which are tied to discoursal functions and which form structuring devices.                                   (Moon 1997:47)


Moon shows here the confusion in terminology in that these ‘prefabs’ were called ‘sentence stems’ by Pawley & Syder (1983) and lexical phrases by Nattinger & DeCarrico (1992) ‘although they use this as a superordinate term to encompass other kinds of multi-word item’ (1997:47). Inevitably, therefore, there are overlaps between the categories.


Williams: It was noted previously that Nattinger & DeCarrico took a pragmatic approach to the study of MWIs and were thus presenting a ‘functional model’ (Moon 1997) to their study. Williams (1998) also takes a pragmatic stance in terms of prefabricated language. Her study of negotiating language from a 117,000 word corpus of 24 simulated case study negotiations utilised three categories of prefabricated chunks, which were then matched to ten ‘golden rules’ of negotiating.  Her pragmatic stance also extended to her main focus - that of getting her research results into the classroom. Williams’ three categories of prefabricated language were functional stems, purely lexical chunks and semi-lexical chunks. These are defined below:


1. Functional stems: These are ‘recognisable components of what are presented as functional exponents in published materials’ (1998:48). Examples would be I don’t think and Can I just with a subject and would have to, don’t want to without a subject.

2. Purely lexical chunks: These are described as ‘fully lexical items which are of little interest because they are linked  to a specific context’ (1998:51), e.g. the new system, though Williams says two types are of interest:

a. Chunks used for commenting on the negotiation: we can come to

b. Chunks with pragmatic meaning: at the moment/at this moment

3. Semi-lexical chunks: Williams notes that ‘the traditional division of lexis into content and function words is over-simplistic. Many lexical items are delexicalised or semi-lexical and find their reference and meaning in their context’ (1998:54), e.g. in terms of, on the basis, on the basis of. Many are used with a pragmatic function, e.g. one of the is multi-functional.



4.7.5  Discussion


The above descriptions of MWIs and their various definitions lead to unavoidable confusion. This confusion can be good for neither researchers nor students. Williams (1998) noted quite succinctly that ‘there is little agreement on the categories used; moreover, many of the distinctions, particularly those relating to form, seem rather subtle’ (Williams 1998:24). The categories can be seen to be perhaps too many (Moon, Nattinger) or too few (Weinert). The table below gives a brief outline of some of the key definitions of MWIs noted in this review.





Categories of Multi-Word Items

Prendergast  1864

Chunks and idiomatic sentences taken in and then used by children

Palmer 1917

Polylogs: collocations and longer fixed phrases

Keller 1979

Gambits: Phrases that ‘serve to introduce what a speaker is about to say’ (1979:220). Gambits have four main functions as:

1. semantic introducers; 2. signalling social context; 3. signalling a speaker’s state of consciousness; and 4. communication control function.

Nattinger 1980

1. Polywords: short fixed phrases, the meanings of which are often not analysable by the regular rules of syntax.  Can substitute for single words e.g. kick the bucket, powder room, put up with. Euphemisms, slang, 2-3 part verbs and idioms.

2. Phrasal constraints: short, relatively fixed phrases with slots that permit some variation.

3. Deictic locutions: short to medium length phrases of low variability to monitor conversation - as far as I know, if I were you etc.

4. Sentence builders: phrases of up to sentence length - highly variable containing slots - not only X but Y.

5. Situational utterances: usually complete sentences e.g. I’ll see you next week. The appropriate thing to say in certain circumstances.

6. Verbatim texts: numbers, alphabet, days of week, aphorisms, proverbs etc.

Yorio 1980

Idioms and routine formulas. Routine formulas defined by:

1. Situation formulas - associated with a specific situation - this hurts me more than it hurts you. 

2. Stylistic formulas - used when a certain register or style is used.

3. Ceremonial formulas.

4. Gambits - as in Keller above.

5. Euphemisms - avoidance formulas

Pawley & Syder 1983

Lexicalized sentence stems

Alexander 1984

Five categories: 1. idioms  2. discourse-structuring devices 3. proverbs 4. catchphrases  5. quotations/allusions

Cowie 1988

Two major groups:

1. Pragmatically specialized: good morning, how are you

2. Semantically specialized/idiomatic: kick one’s heels, pass the buck 

Nattinger 1988

As Nattinger 1980

Kjellmer 1991

Set Expressions: fossilized,  semi-fossilized and  variable phrases.

Nattinger & DeCarrico 1992

polywords, institutionalised phrases, phrasal constraints, sentence builders

Lewis 1993

1. Polywords: short, 2 or 3 words ranging from opaque to totally transparent meaning.

2. Collocations: fixed collocations are one kind of polyword. They are non-reciprocal. Collocations are not pragmatically tied and so differ from institutionalised phrases. Words and collocations are more concerned with the content of what the language user expresses than with what the language user is doing (1993:94).

3. Institutionalised expressions: pragmatic in character - ensure efficient processing in speech and writing:

short utterances: Not yet, certainly not 

sentence heads or frames: Sorry to interrupt, but can I just say ....

full sentences: readily identifiable pragmatic meaning

Weinert 1995

Formulaic language

Fernando 1996

Ideational, interpersonal and relational expressions

Moon 1997

Multi-word items:

compounds, phrasal verbs, idioms, fixed phrases, prefabs

Howarth 1998

Word combinations: functional expressions (idiomatic and non-idiomatic) / composite units (grammatical & lexical composites)

Williams 1998

Prefabricated chunks: functional stems, purely lexical chunks, semi-lexical chunks



The confusion of terminology generated in the literature is further exemplified in the table below, where the same lexical phenomena are referred to by different researchers using different terminology:






What one calls ...                                          The other calls ...

Keller: semantic framing gambits                      >>    Nattinger: deictic locutions
Keller: subject expansion semantic framing gambits    >>    Nattinger & DeCarrico: sentence builders
Lewis: institutionalised phrases                      >>    Nattinger & DeCarrico: phrasal constraints/sentence builders
Moon: fixed phrases                                   >>    Nattinger & DeCarrico: institutionalised phrases


The studies can be criticised for other reasons, too. With the exception of Moon (1997) and Williams (1998), all of this work is based on intuition and introspection. Keller (1979) did use a corpus, but a) it was rather small - only 131,536 words, b) it came from a very limited area - boardroom and media discussions, and c) the final sorting of gambits for inclusion was based not on frequency but on the native speaker intuition of a panel of three ‘judges’. The use of intuition is also very clear in the diverse terminology used. However, despite the conflicting terminology, analysis of this category of language forms an important part of this thesis, and thus it is useful that the central features of MWIs are pulled out of the chaos of terminology and given more systematic treatment.
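
The contrast between selection by frequency and selection by intuition can be made concrete with a small example. The following Python sketch is an illustration only and does not reproduce the procedure of any study reviewed here: the function name recurrent_clusters, the threshold values and the sample utterance are assumptions introduced for this example. It lists every uninterrupted sequence of n words that recurs at least a given number of times in a tokenised text.

```python
from collections import Counter


def recurrent_clusters(tokens, n=3, min_freq=2):
    """Return (cluster, frequency) pairs for every contiguous n-word
    sequence that occurs at least `min_freq` times, most frequent first."""
    grams = Counter(tuple(tokens[i:i + n])
                    for i in range(len(tokens) - n + 1))
    return [(" ".join(gram), count)
            for gram, count in grams.most_common()
            if count >= min_freq]


# Invented sample utterance (an assumption, not corpus data).
tokens = ("can i just say that is fine and can i just ask "
          "if we can come to terms").split()
print(recurrent_clusters(tokens, n=3))
```

A list produced in this way is driven entirely by recurrence in the text. Deciding which of the recurrent sequences count as meaningful MWIs still requires the kind of criteria discussed in the following section.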


4.7.6  Characteristics of MWIs: making sense of the definitions


The following section will look at the underlying characteristics of the definitions presented in the previous section. In doing this three separate dichotomies will be used to discuss the methodological implications of adopting this view of language:


·    the fixedness and non-fixedness of lexical categories

·    the relation of form to function

·    competence and performance - is this dichotomy needed, and where does lexical methodology stand in this debate?


Following on from this, the implications of these discussions for teachers and language learners will be discussed in reference to the lexical approach (Lewis 1993, 1997). An explanation of why knowledge of pre-fabricated language is important and, therefore, why it is important to include MWIs in this study will then be presented, leading to a definition of how prefabricated language is investigated in this thesis.


Fixed and non-fixed: points on a continuum


It has been seen throughout this thesis that certain aspects of language can be viewed as forming continua. In the last chapter it was seen that Business English language could be considered to be layered, going from transparent to opaque meanings. It was also seen earlier in this chapter that collocations go from the unique and totally fixed, to free combinations of words. The same phenomenon has also been pointed out by several writers with regard to multi-word units. The concept of continua, therefore, affects not only words, but also classes[36] of words and strings of words that go to form the MWIs.


Pawley & Syder (1983) constantly stressed the fact that their categories should not be seen as independent units but that ‘in seeking discrete classes we are in danger of misrepresenting the nature of the native speakers’ knowledge’ (1983:212). Nattinger & DeCarrico, after reviewing different kinds of MWIs, noted that many linguists had previously seen idioms and other ‘frozen forms’ (1992:34) as separate from mainstream language, considering them as part of less creative language. They, however, take a different approach. They say that ‘it is more likely that what constitutes a pattern and what does not is relative, a matter of degree instead of kind, for one usually finds a continuum in the amount of variation involved, from more invariable and frozen forms (such as idioms and clichés) to less invariable (non-canonical) forms’ (1992:34).


Howarth (1998) also sees what he calls ‘formulaic language’ as being on a continuum, shown in Table VII below. Categorisation of the phrases is possible by creating a continuum  ‘derived from the application of such criteria as restricted collocability, semantic specialization, and idiomacity, each of which is gradable’ (Howarth 1998:28).




                          free                restricted        figurative               pure
                          combinations        collocations      idioms                   idioms

lexical composites        blow a trumpet      blow a fuse       blow your own trumpet    blow the gaff
(verb + noun)

grammatical composites    under the table     under attack      under the microscope     under the weather
(preposition + noun)

                                                                                                            (Howarth 1998:28)


This gradability is represented in the table above. Howarth places ‘pure’ idioms at one end of the scale, sharing a similar view of them to Nattinger & DeCarrico: that of fixedness.


However, this traditional view of idioms is challenged by Moon (1997). She initially gives a quite traditional definition of idioms, saying that they are:


 ... multi-word items that are not the sum of their parts: they have holistic meanings which cannot be retrieved from the individual meanings of the component words.                               (Moon 1997:46) 



Examples of these idiomatic MWIs would be kick the bucket and rain cats and dogs. She then goes on to issue a warning (noted earlier) by saying that ‘idioms are by no means as fixed as conventional accounts suggest’ (1997:47). Using the COBUILD corpus to analyse occurrences of idioms and fixed phrases in English, she found that forty per cent of the occurrences of idioms ‘regularly varied and were unstable in form’ (1997:52). She gives the example of not touch something with a bargepole and shows how considerable variation is allowed. She then says that it might be better to have a notion of ‘preference of form’ or ‘preferred lexical realisation’ rather than ‘fixedness of form’, and better to build on the fact that there is a complex relationship between deep semantics and surface lexis, rather than it all being a simple case of individual anomalous strings with non-compositional meanings (Moon 1997:53).


This is also pure ‘Sinclairian’ thinking. As early as 1991 Sinclair had stated:


One is struck first by the fixity and regularity of phrases, then by their flexibility and variability, then by the characteristically creative extensions and adaptations which occur, sometimes more than the ‘ordinary’ form.                                                       (Sinclair 1991:104)



Williams (1998), after Fernando (1996), also sees the variability of these phrases and places MWIs on two further continua. She notes that


            ... prefabricated chunks straddle the lexis/grammar divide and that two fundamental continua are involved:


            ideational chunks --->----->---->------chunks with pragmatic meaning


            fixed expressions ----->----->----->--- semi-fixed expressions

                                                                                    (Williams 1998:24-25)


Thus MWIs can be placed along a series of continua:


            fixed                            >>>>>>>>>>>>>                  non-fixed

            continuous                  >>>>>>>>>>>>>                  discontinuous

            compositional             >>>>>>>>>>>>>                  non-compositional

            ideational                     >>>>>>>>>>>>>                  pragmatic



The last of these continua - ideational to pragmatic meaning - will be returned to when discussing the competence/performance question. Before doing that, however, another point raised by Williams (1998) should be addressed - that of the relationship between linguistic form and function.


The relationship of form and function


It was another Williams, Marion, who in 1988 pointed out the lack of correlation between the language used in meetings and the language used to teach them in Business English books of the time. The materials she studied worked on the assumption that there is a one-to-one relationship between a linguistic form or expression and the function for which it would be used. Williams was also quick to criticise teaching materials for using examples of functional exponents that were too long and overly polite.[37] This criticism was taken up by Anne Williams (1998). Anne Williams noted the contradiction whereby the old functional approach has been somewhat discredited by these criticisms, but at the same time, the ‘lexical approach’ that advocates very similar large-ish chunks of language for language learners is very much in vogue. Criticism is thus focused on how far there can be seen to be a relationship between linguistic form and language function. The literature shows diversity of thought on this matter. Keller (1979) seemingly treated form and function as compatible, as his taxonomies represent a one-to-one relationship between a gambit and its pragmatic usage. Cowie (1988) criticises this in Keller’s work, saying that many of the gambits ‘lack the fixity of form which is a precondition of complete specialization in a given discourse function’ (Cowie 1988:133). Pragmatic specialisation for Cowie is a matter of degree, but he does stress the large amount of language that is totally fixed.


Williams (1998) largely rejects the link between form and function, at least as it was seen in the traditional sense in language teaching materials. She prefers to link a variety of language chunks to ten ‘golden rules of negotiating’ rather than tie them to individual functions,[38] as she found, especially with the shorter chunks that


... not only were they used at various stages of the negotiation to express a variety of functions; a number of chunks can be linked to more than one of the golden rules.                            (Williams 1998:78)


Williams does note that there are some cases of a one-to-one relationship, especially within her class of ‘functional stems’ (she gives the example of Could you repeat that? and asking for repetition). Overall, however, she finds in this category and in her other two (‘purely lexical chunks’ and ‘semi-lexical chunks’) that chunks are far more likely to perform a variety of roles and thus be multi-functional.

In taking this view Williams is somewhat at odds with Nattinger & DeCarrico (1992). Their view of lexical phrases was that they are ‘form/function composites’, i.e. the phrases perform certain pragmatic functions within the language. They see lexical phrases as having three main kinds of ‘function’: social interactions,[39] necessary topics and discourse devices. These three functions are then used as umbrella terms under which the three formal categories (polywords, phrasal constraints and sentence builders) are used to further categorise the lexical phrases. However, although they lean heavily towards the view that form and function can be joined, they also admit that ‘the pairing of form and function remains to some extent arbitrary’ (1992:54). Henry (1996) concurs with Nattinger & DeCarrico, but also suggests that ‘there is likely to be a very close link between certain phrases and certain ‘genres’’ (1996:297). Henry gives the example of the relationship of banking transactions and the phrase how would you like the cash? It is indeed hard to imagine another situation where this phrase would typically occur.


There thus seems to be a range of thinking in the literature on the form/function debate, from a view that places certain phrases firmly within a given context or pragmatic purpose to a view that phrases and chunks are multi-functional/situational. The view taken in this thesis incorporates both ends of the spectrum. The evidence of Williams (1998) cannot be refuted and it is thus taken as fact that many chunks of language, especially shorter chunks, are indeed multi-functional. However, it is also held to be true that certain blocks of language are tied to a given genre as presented by Henry above, and consequently perform only one or a limited number of linguistic/pragmatic functions. It may be speculated at this point that shorter chunks tend to be multi-functional and longer ones tend to be genre- or function-specific. These points are returned to in Chapter 9.


Competence, performance, the idiom principle and multi-word items


It was noted above that Fernando (1996) and Williams (1998) see chunks of language as ranging on a scale from ideational to pragmatic meaning. The ability to use this kind of language, that is, the ability to choose the appropriate chunk for a given pragmatic situation, has also given rise to some discussion in the literature. The debate has centred on a definition of what kind of competence is needed in these situations, and how this competence can be related to the traditional Chomskyan competence/performance divide. Latterly, it has led to a complete rejection of Chomsky’s ideas by some writers.


In an influential article in 1989, Widdowson discussed the differences between Chomsky’s ideas and Hymes’ (1972) definition of communicative competence. He concluded that ‘for Hymes linguistics is about language and for Chomsky it is not’ (Widdowson 1989:129). He continued ‘For Chomsky, then, competence is grammatical knowledge as a deep-seated mental state below the level of language...For Hymes, on the other hand, competence is the ability to do something: to use language’ (1989:129). Hymes, therefore, believes knowledge of a language is not enough: there also has to be the ability to use it. Widdowson then presents a lexical view of language by saying that ‘there is a great deal that the native speaker knows of his language which takes the form less of analysed grammatical rules than adaptable lexical chunks’ (1989:132). If this lexical approach is adopted then


... communicative competence is not a matter of knowing rules for the composition of sentences and being able to employ such rules to assemble expressions from scratch as and when occasion requires. It is much more a matter of knowing a stock of partially pre-assembled patterns, formulaic frameworks, and a kit of rules, so to speak, and being able to apply the rules to make whatever adjustments are necessary according to contextual demands.                          (Widdowson 1989:135)


Nattinger & DeCarrico call this kind of competence ‘pragmatic competence’. They do not reject Chomsky’s divisions and see them as still ‘valid’ (1992:7). However, they add pragmatic competence to the picture. This pragmatic competence, though, they see as separate from traditional views of competence and place it somewhere on a line between ‘strict grammatical competence on the one hand, and performance factors such as processing, memory limitations, false starts etc. on the other’ (1992:8).


A much firmer stance against Chomsky, however, is taken by other writers. Sinclair, in putting forward the idiom principle discussed in the previous section, completely rejects the competence/performance distinction. He argues that the distinctions made by Chomsky and Saussure before him were theoretical abstractions that helped organise the seeming chaos of language. However, with the advent of the computer they are unnecessary as the ‘chaos’ of language can be ordered by evaluating typical instances and selecting the most typical (Sinclair 1991:103). There has thus been a shift in emphasis from hypothetical language to real data (Lewis 1993:12).[40]


It can be seen therefore that the latest thinking tends to reject Chomsky’s ideas and points the way to a new kind of ‘competence’ - that of being able to know, understand and marshal the use of prefabricated blocks of language to generate fluent discourse. This has been the starting point of the Lexical Approach to language teaching, which has suggested that former teaching practices need to be re-thought and old grammar-based syllabuses need to be replaced by ones that place lexis at the forefront. The next section will look at this approach and why it is so important for language learning. This will also provide a justification for including the study of prefabricated language in this thesis.


4.8  The Lexical Approach


The work of Nattinger & DeCarrico (1992) has played a part in the wider movement popularising a lexical approach towards language teaching, but their work has been criticised for not going far enough. Henry (1996) criticised them on four counts:


1. Learners are only presented with chunks - they do not discover them themselves.

2. Phrases are learned but how can teachers create situations in which to practise them?

3. It is difficult for teachers to ensure students remember the right slot-fillers.

4. Not enough attention is paid to aspects of speech: tone, rhythm, timing etc.


It was primarily to address some of these issues that Willis (1990), Lewis (1993, 1997, 2000) and others (Hill 1999, Morgan Lewis, Hill, Conzett, Woolard 2000) have advocated a lexical approach to language and language teaching. Lewis retrospectively defined the lexical approach in 1997 as follows:


... the Lexical Approach places communication of meaning at the heart of language and language learning. This leads to an emphasis on the main carrier of meaning, vocabulary. The concept of a large vocabulary is extended from words to lexis, but the essential idea is that fluency is based on the acquisition of a large store of fixed and semi-fixed pre-fabricated items, which are available as the foundation for any linguistic novelty or creativity.                         (Lewis 1997:15)


Lewis’s writings are broadly based and draw as much on philosophy as they do on linguistic literature. However, it is possible to summarise the lexical approach using the following four points:


·    language rests on a series of continua

·    language consists of grammaticalised lexis, not lexicalised grammar

·    collocations are central to language production and should be more actively taught

·    ‘used language’ should be stressed: probable language rather than possible language.


Each of these four areas will now be looked at briefly in turn.


a)  Language rests on a series of continua


Lewis believes that concepts of language should not be polarised and presented as separate independent units. Instead their constituent parts should be seen as points on a series of continua. These continua he calls ‘spectra’ (1993:37) and he presents seven spectra in order to put his views forward. Lewis thus concurs with the discussion held previously on the fluid nature of language and the range of lexical items from the fixed to the non-fixed (Pawley & Syder 1983, Nattinger & DeCarrico 1992, Howarth 1998).


1. Spectrum of generative power: Lewis believes in the generative power of some words as opposed to the structuralist view that it is grammar that generates meaning, and that words are fixed blocks to add onto this structure. Lexis for Lewis ranges from grammar words that generate phrases, to unique and precise terms of vocabulary that are totally fixed. In stressing the generative power of words, Lewis is concurring with Sinclair & Renouf (1988) who, in an article concerning the lexical syllabus, stress the importance of de-lexicalised verbs and their power to generate a multitude of meanings in combination with other words.[41]

2. Spectrum of generalisability: Language teaching has largely presented language in terms of fixed grammatical rules which may have exceptions. Lewis suggests that rather than think of rules and exceptions we should think of the generalisability of the statements about language. Thus, students should be informed that certain items are fixed, but most are on a sliding scale of generalisability.

3. Spectrum of communicative power: Not all words are equally useful - verbs must play a central role, but in a lexical rather than in a grammatical role. Lewis notes that ‘Language teachers, usually accidentally, see vocabulary largely in terms of nouns, and the teaching of verbs has largely been confined to work on their structure’ (Lewis 1993:39). Analysis of the BEC and PMC has produced findings which bear out Lewis’s views on this matter. Distinct differences were found between nominal and verbal usage in real-life business and published Business English materials. These differences will be discussed in detail in Chapter 9.

4. Spectrum of likelihood: Teachers should present language to students in terms of how likely language is, rather than seeing it in terms of simply being correct or non-standard. Lewis gives the example of the word weather, which is usually taught as a non-count noun. However, the phrase out in all weathers goes against this rule, so absolute statements regarding this noun, and language in general, should be avoided.

5. Spectrum of acceptability: Lewis suggests the development for teachers of a spectrum of acceptability of language used by students, as opposed to the simple right/wrong method in use today. This, he realises, is a contentious issue.

6. Spectrum of conventionality: Here Lewis discusses the fact that language is arbitrary - a dog is called a dog for no apparent reason. Therefore, language is a matter of convention and he notes that ‘some language is much more a matter of convention than other language (linguistic not social convention)’ (1993:41). In this he is referring to written language, which he regards as more conventionalised than spoken. Language should thus be seen on a scale of conventionality.

7. Spectrum of categorisation: Lewis suggests we should not pre-categorise words too readily:


Pedagogically, if students learn words as belonging to a particular category, they may well not see, and be unwilling to experiment with, the kind of flexible categorisation which maximises communicative power.                                                                          (Lewis 1993:42)


Thus for Lewis language, and the categories by which it can be defined, are not fixed. This, of course, is quite different from traditional structuralist views of language, where the definition of grammatical class was the basis for the teaching syllabus. Different too, therefore, is his approach to grammar and lexis.


b)  Language consists of grammaticalised lexis, not lexicalised grammar


Traditionally, grammar has held pride of place in teaching syllabuses, with lexis or vocabulary at best a poor second. The lexical approach, in contrast, puts lexis at the forefront as an organising principle. Lewis states simply in the introduction to Implementing the Lexical Approach that ‘language consists not of traditional grammar and vocabulary but often of multi-word prefabricated chunks’ (1997:3). However, Lewis is careful to add that this approach still holds grammar in high regard and recognises the generative element of grammar without which ‘novelty and innovation - possible language - become impossible’ (1997:14). Grammar thus facilitates language use when speakers need to create something new, but it is only needed ‘when we are unable to find what we want ready-made in our mental lexicons’ (Morgan Lewis[42] 2000:15). The lexical approach here echoes the idiom and open-choice principles of Sinclair (1991).


Although Lewis’s ideas are not based on any empirical work of his own, there has been enough work done by others noted in this chapter to fully justify his beliefs. It was noted previously in the section on grammatical and lexical collocations that writers now believe that lexis and grammar are fully integrated systems. Hunston & Francis (1998), for example, refer back to the work of Lewis in their article on creating a pedagogic grammar from the COBUILD corpus. Their concept of pattern noted earlier shows that ‘lexical items have describable patterns’ (Hunston & Francis 1998:69) and that lexis and grammar are part of an interrelated system where ‘patterns...bridge the gap between lexicalizations and rules’ (1998:63).


c)  Collocations


Collocation is central to the lexical approach. Hill (1999) goes so far as to suggest a possible extension to Hymes’ communicative competence by saying that ‘We are familiar with the concept of communicative competence, but perhaps we should add the concept of collocational competence to our thinking’ (1999:5). Collocational competence is, therefore, considered a key factor in the lexical approach for the learning of language. Studies by Bahns (1993) and Bahns & Eldaw (1993) point to the problems students have when they are unable to successfully collocate. Their views are echoed by Hill (2000) who observes that non-native speakers have problems ‘not because of faulty grammar but a lack of collocations’ (Hill 2000:49).


Powell (1998), Williams (1998) and Morgan Lewis (2000), additionally, all point to the relationship of knowledge of collocation and grammaticalisation. The fewer ready-made chunks of language a speaker has to use, the more they have to grammaticalise what they are trying to communicate. The more students have to grammaticalise, the more chance there is of making language mistakes.[43]


d)  ‘Used language’


The lexical approach stresses the importance of input and this input should be language that is ‘used’[44] in the terms of David Brazil (1995). This means it should be probable language and not just possible language. Language input should be authentic - or at least close to authentic, and teaching materials should get away from the ‘la plume de ma tante’-type hypothetical sentences so popular within the structuralist approach. It is thought within the lexical approach that ‘Good quality input should lead to good quality retrieval. Impoverished input will lead to impoverished retrieval’ (Hill 2000:54).


The discussion here on the pedagogical implications of adopting a lexical approach to language teaching will be continued in Chapter 9, where the pedagogical implications of this research are discussed, especially in relation to materials development. Now though, the manner in which MWIs are studied in this research needs to be presented.


4.9 Multi-word items in this thesis


The previous sections have shown that it is difficult to use even superordinate terms to cover the concept of multi-word items, as so many terms have been used over the years by different researchers. What cannot be in any doubt, however, is their centrality in language use and reception, and consequently their importance to learners of English as a second language. The vast store of lexical chunks and collocations stored in the mental lexicon enables quick retrieval and both speeds and eases communication. This knowledge is shared between native speakers who can recognise the chunks used by each other, thus aiding both production and reception of language. It can also be argued that knowledge of chunks typical to a given genre or discourse community enables the interlocutors to process language in a similar way - each sharing the knowledge of the field. It has already been seen in the previous chapter that this is the case at the level of single words. It only seems logical to assume that the same can be said of multi-word items.


For these reasons it is important to include MWIs in any study of business language. Additionally, as one aim of this research is that the results of this thesis be utilised in the classroom, it is important to take into account the methodology presented by Lewis (1993, 1997). This thesis studies MWIs in the following ways:


1. It computes the most frequent multi-word items in the BEC, from two-word up to eight-word chunks.

2. It also computes the most key multi-word items - that is, those MWIs that occur unusually frequently in the BEC using the BNC corpus of general English as a reference point.

3. MWIs are analysed in terms of their semantic prosody.

4. MWIs are analysed in terms of their colligational patterning.

5. When analysing MWIs, account is taken of Business English language as a whole and the individual macro-genres that go to make it up.

6. The length, frequency and functionality of the MWIs are considered - thus referring to the work of Williams (1998), who noted that shorter MWIs are more frequent in the language and perform a greater variety of functions than longer ones.

7. The genre-specificity of MWIs (Henry 1996) is also considered and discussed.

8. The study focuses only on continuous MWIs.

9. No account is taken of compositionality or grammatical well-formedness (Moon 1997).

10. Finally, a selection of MWIs are analysed in terms of the knowing-acting axis of Pickett (1988) discussed in Chapter 3. Thus, sample MWIs are categorised in terms of whether they are used more for talking about business - knowing - or for doing business - acting. Additionally, these same MWIs are categorised along a written-spoken axis, indicating which are more used in written or spoken macro-genres.
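
Points 1 and 2 above can be illustrated with a minimal sketch. It is not the procedure used in this thesis, whose counts were produced with corpus software; it only shows, under stated assumptions, what computing frequent and key continuous chunks involves. The function names (mwi_frequencies, log_likelihood, key_mwis) are hypothetical, the choice of log-likelihood as the keyness statistic is an assumption, and the two short token lists merely stand in for the BEC and the BNC.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return every continuous n-word sequence in a list of tokens."""
    return (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def mwi_frequencies(tokens, min_n=2, max_n=8):
    """Count continuous chunks for each length from min_n to max_n words."""
    return {n: Counter(ngrams(tokens, n)) for n in range(min_n, max_n + 1)}


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood score for one item in a study and a reference corpus.

    Assumption: keyness is measured with log-likelihood; other statistics
    (for example chi-square) could be substituted.
    """
    total = size_study + size_ref
    expected_study = size_study * (freq_study + freq_ref) / total
    expected_ref = size_ref * (freq_study + freq_ref) / total
    score = 0.0
    if freq_study:
        score += freq_study * math.log(freq_study / expected_study)
    if freq_ref:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score


def key_mwis(study_counts, ref_counts):
    """Rank chunks that are relatively more frequent in the study corpus."""
    size_study = sum(study_counts.values())
    size_ref = sum(ref_counts.values())
    if size_study == 0 or size_ref == 0:
        return []
    scored = []
    for chunk, freq in study_counts.items():
        ref_freq = ref_counts.get(chunk, 0)
        # Keep only positively key items: relative frequency is higher
        # in the study corpus than in the reference corpus.
        if freq / size_study > ref_freq / size_ref:
            score = log_likelihood(freq, size_study, ref_freq, size_ref)
            scored.append((chunk, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data standing in for a business corpus and a general reference corpus.
study = "thank you for your letter thank you for your order".split()
reference = "thank you for the tea and thank you for coming".split()

study_counts = mwi_frequencies(study)
reference_counts = mwi_frequencies(reference)

print(study_counts[4].most_common(1))                     # most frequent four-word chunk
print(key_mwis(study_counts[2], reference_counts[2])[:3])  # most key two-word chunks
```

Only continuous chunks are counted, in line with point 8, and no filtering for compositionality or grammatical well-formedness is applied, in line with point 9. In the toy data thank you is frequent in both lists and is therefore not key, whereas for your occurs only in the study list and is.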


4.10  The next chapter


If it can be said that Business English and lexis are the what of this work (they represent what is studied in this thesis), then the use of corpora and corpus linguistic methodology represent the how: corpora provide the means by which the research is carried out. The next chapter will, therefore, give a reasoning and justification for the use of corpora in this research. In doing this, a general methodological background will be provided that places this thesis within the bounds of British traditions of text analysis.














[1] West has already been mentioned in Chapter 3 in relation to another of the key concepts he created - needs analysis. An evaluation of West’s (often neglected) contribution to ELT can be found in Tickoo (1988).

[2] Ogden had been invited but refused to go.

[3] For a more detailed critique of the GSL see Richards’ (1975) article Word Lists: Problems and Prospects, where he lists several flaws with both the GSL and the vocabulary control movement in general. He concludes the article by offering a solution to some of the problems of word frequency lists, thus avoiding the exclusion of common pragmatic words that was the problem with the GSL and, amongst others, Ogden’s Basic list.

[4] Both the work of Firth and these articles will be discussed later in this chapter in more detail.

[5] This point is discussed in some detail in Kjellmer (1991) where he distinguishes three basic kinds of collocations. Firstly, ‘right and left predictive’ collocations such as Anno Domini and aurora borealis where each word equally suggests the other. The second kind he terms ‘right predictive’, for example, wellington boots, morse code, i.e. where a word suggests the word appearing on the right of it but not vice versa, e.g. morse suggests code and wellington suggests boots. Finally, there are ‘left predictive’ collocations such as open sesame, arms akimbo, where one word suggests the word preceding it, but again, not vice versa, e.g. sesame suggests open (Kjellmer 1991:112-113). Thus, most collocations can be seen to be unidirectional in one way or the other.

[6] There is some disagreement on this as Gitsaki (1996:1) says that Firth was the first to introduce the term. She does say, however, that the concept of collocation, though not named as such, was known to and described by the ancient Greeks 2,300 years ago (1996:13).

[7] Aston & Burnard (1998:13) actually checked this out using the BNC corpus and found that ‘silly ass’ occurred 8 times but did not occur once preceded by ‘you’. So much has English changed since the 1950s.

[8] For a good overview of these articles see Carter & McCarthy (1988:33-36).

[9] The importance of grammatical collocation was discussed by Greenbaum (1970). Interestingly, and perhaps oddly by today’s criteria, Greenbaum rejected corpus linguistic approaches to the study of collocation, using instead native speaker informants and a variety of tests to judge collocational awareness. He also restricted his study to verb-intensifier collocations.

[10] Kjellmer had referred earlier in the article to Los Angeles and Fidel Castro and how the presence of one part of each of these words presupposes the other.

[11] The concept of strong or loose collocation has also been the concern of studies into lexical collocation and will be returned to shortly.

[12] See Owen (1993) for discussion on this.

[13] This article will be discussed in more detail later in this chapter.

[14] Conzett (2000:74) gives a similar diagram but with actual examples, going from old car at the weak end of the scale to Stars and Stripes at the other.

[15] This is dealt with in much more detail in the section on multi-word items.

[16] WordSmith Tools 3 (1999), Oxford: Oxford University Press.

[17] Care needs to be taken here for grammatical and syntactical reasons. Collocating words can be separated by syntactical intervention yet still be clearly seen to collocate. An example would be: We had to adjourn what had been a very long, dull and tortuous meeting. The quite acceptable verb/noun collocation of adjourn a meeting is here separated by nine words, both grammatical and lexical. A definition of collocation that rigidly sticks to a 2:2 or even 5:5 span of words - that is 2 or 5 words either side of the node - will miss occurrences of this nature. Thus larger spans are also used in this research to gain neighbourhood collocates.

[18] The quotations given here are from the 1991 book, but could just as well be from the 1987 article as much of the text is the same.

[19] They suggest that as this group appear commonly in pairs, students can more easily acquire them in pairs.

[20] Lemma is a term that will be dealt with in more detail later in this thesis. In this instance a lemma is taken to be a head word to which is added grammatical derivations that are found within the same word class. For example, the lemma GO, when lemmatised, will have attached to it went and gone.

[21] An example of this might be the verb to commit. The word commit is followed most commonly by words that refer to a crime or a negative act of some kind: commit a foul, commit a transgression, commit a sin. Thus, the focus here is on the class of words following the word commit, rather than on a particular collocate (see Lewis 2000:137 for more on this).

[22] Louw was originally given the term by John Sinclair in a private communication as early as 1988.

[23] This research has taken up Stubbs’ suggestion and has used semantic prosody in the development of Business English materials.

[24] Here again, of course, one runs into collocative limitations, as one can say, for example, I came down with a bad case of flu, but not I came down with a bad case of cancer. The number of types of disease that can be used with the expression I came down with is, therefore, limited.

[25] Attention will be paid here to collocates of a wider span in relation to the syntactic separation of collocates noted earlier.

[26] Quoted in Howatt (1984:157).

[27] Harold Palmer in the 1930s did discuss the idea of what he termed polylogs, mentioned briefly earlier. These were collocations and longer phrases that he regarded as fixed in the language. For more on this see Howatt (1984:237) and Kennedy (1992).

[28] It must be admitted that the two aspects can hardly be seen as separate, finite categories; the distinction is merely a matter of focus on one aspect rather than on the other.

[29] For an overview see Table V (p.195) later in this section.

[30] Keller later transferred this survey into teaching materials, first in Canada and then in the UK under the title Conversation Gambits (Keller & Warner 1988).

[31] The way in which conventionalised forms speed language processing is a central part of the lexical approach discussed later in this chapter.

[32] Cowie does not believe that all MWIs are fixed, but sees the fixedness of a large number of them as having great potential in teaching.

[33] In this they fall into the third of three schools of research identified by Moon (1997). Moon distinguishes three different approaches that have been taken with regard to MWIs. Firstly, there have been semantic-based models that look at MWIs in terms of their degree of compositionality. Secondly, there have been syntax-based models that look at MWIs in terms of grammatical well-formedness. Finally, there have been the functional models as exemplified by Pawley & Syder (1983) and, as mentioned above, Nattinger & DeCarrico (1992). Moon (1997:50) notes: ‘Here, multi-word items are integrated into the vocabulary in terms of their pragmatics. This leads to a more practical approach where multi-word items can be integrated into a dynamic model of language-in-use, rather than language-as-artefact, and seen as enabling devices.’

[34] Lewis notes the pedagogical importance of this: ‘if we wish to use words as pattern generating items, it will be important to identify those which most helpfully predict collocates’ (Lewis 1993:93).

[35] Kjellmer (1991) also presented a relatively simple model of ‘types of set expressions’: fossilized phrases, semi-fossilized and what he called ‘variable’ phrases. For more on this see Kjellmer (1991:112-114).

[36] See also McCarthy (1990:7-8) on the gradability of idioms.

[37] See Chapter 3, Section 3.8.5 for a fuller discussion on this.

[38] Williams, in a personal communication (1999), was at pains to point out that although she has done this, she is not suggesting a link between functions and ‘underlying positive behaviours’.

[39] These are very similar to what Keller identified as gambits.

[40] Lewis also noted that, by definition, the Chomskyan concept of competence could not be empirically investigated and is thus invalid (1993:12).

[41] Note here also Lewis’s stress on verbs in the spectra of communicative power.

[42] The full name is used here to avoid confusion with Michael Lewis (no relation).

[43] Morgan Lewis (2000:16) gives the example of the phrase major turning point. He writes that if a student does not know this phrase they would have to paraphrase, e.g. a very important moment when things changed completely, with the increased likelihood of mistakes.

[44] Brazil defines used language as ‘language which has occurred under circumstances in which the speaker was known to be doing something more than demonstrate the way the system works’ (Brazil 1995:24).