Chapter 7  Hypotheses, Research Questions and Method

 

 

7.1 Introduction

 

In this chapter the two main hypotheses and consequent research questions addressed in this thesis are presented. These are laid out in turn, along with the methods by which the hypotheses have been tested.

 

7.2 Hypothesis One

 

Hypothesis One: The lexis used in Business English is significantly different from general English.

Main Research Question: Is there a lexis specific to Business English and, if there is, what is it?

 

This section of the research established lexis specific to the environment of Business English, and further investigated Business English in its own right. Thus, in testing the hypothesis, the research also performed a descriptive function[1] by studying in detail the lexical environment of Business English. This descriptive function led to additional research questions.

 

Additional research questions:

·    Can the concept of semantic prosody be found in Business English and if so are there business-specific prosodies?

·    What colligational patterns can be found in Business English and can grammatical patterning typical to Business English be identified?

·    Can the layering of Business English lexis be found as described by Pickett (1988, 1989)?

·    How are words distributed across Business English macro-genres and can they be divided along the ‘knowing’ - ‘acting’ axis of Pickett (1988)?

·    What kinds of clusters can be found in Business English and do business-specific clusters exist?

·    How do words associate with each other in Business English?

 

The analysis was divided into six main steps which are now discussed in turn.

 

Step 1: Creation of frequency lists

 

a) A frequency list of the words found in the BEC was computed using WordSmith 3 (Scott 1999). Similarly, a frequency list of words from the BNC Sampler corpus was also created.

 

b) Both frequency lists were then lemmatised using a modified version of Someya’s (1998) lemma list.

 

c) The 1,000 most frequent words of the BEC were identified, and the resulting lemma list was manually edited to remove the code words used in the corpus and individual letters from abbreviations.

 

d) The 1,000 most frequent lemmas of the BEC were then identified and stored.[2] 
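The procedure in Step 1 can be illustrated with a minimal Python sketch. It is not the WordSmith 3 implementation: the tokenisation rule, the function names and the three-entry lemma map are assumptions made purely for illustration, and the sample text is invented.

```python
from collections import Counter
import re

def frequency_list(text):
    """Count lower-cased word tokens in a text (simple, assumed tokenisation)."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens)

def lemmatise(freqs, lemma_map):
    """Merge the counts of word forms under their lemma (headword)."""
    lemma_freqs = Counter()
    for word, count in freqs.items():
        # Forms missing from the lemma list are kept as their own lemma.
        lemma_freqs[lemma_map.get(word, word)] += count
    return lemma_freqs

# Hypothetical toy lemma list (word form -> lemma); a real list is far larger.
LEMMA_MAP = {"sells": "sell", "sold": "sell", "customers": "customer"}

text = "We sell to customers. The customer sold it. She sells."
lemmas = lemmatise(frequency_list(text), LEMMA_MAP)
top = lemmas.most_common(2)  # the two most frequent lemmas
```

In the thesis the same idea is applied at scale: `most_common(1000)` would correspond to the stored list of the 1,000 most frequent lemmas.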

 

 

 

 

 

Step 2: Identification of key words in the BEC

 

a) Using WordSmith 3’s key word function, a list was then obtained of those words in the BEC whose frequency of occurrence differs to a statistically significant degree from their frequency in the general English of the BNC Sampler corpus (1999). Key words were calculated using the Log Likelihood statistic of Dunning (1993), which is considered more accurate than the chi-square test for ‘contrasting long texts or whole genre against [a] reference corpus’ (Scott 1999). The significance threshold was set at p = 0.000001. Key words were calculated as follows:

 

·    The frequency of each word in the BEC was calculated.

·    The number of total running words in the BEC was counted.

·    These two statistics (word frequency/size of corpus) for each word in the BEC were then matched against each word’s frequency in the BNC reference corpus, and the total number of running words in the reference corpus.

·    The result was then cross-tabulated and the log likelihood test carried out.

·    This process was repeated for all the words in the BEC.

·    WordSmith then produced a key word list showing the words in order of keyness - that is, the most significant word first, followed by the next most significant and so on. 

 

b) The key word list produced by WordSmith included positive key words and negative key words. Scott defined these terms as follows: ‘A word which is positively key occurs more often than would be expected by chance in comparison with the reference corpus. A word which is negatively key occurs less often than would be expected by chance in comparison with the reference corpus’ (1999: WordSmith Help File).

 

The key words, therefore, represent those words that are ‘special’ to Business English in two senses: positive key words occur statistically more frequently in Business English than in general English, and, conversely, negative key words occur less frequently in Business English than would be statistically expected given their frequency in general English.
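The key word calculation and the positive/negative distinction described above can be sketched in Python. This is an illustration under stated assumptions rather than a reproduction of WordSmith 3: it uses the general two-by-two form of the log-likelihood (G2) statistic, tests it against a chi-square distribution with one degree of freedom, and all function names and counts are invented.

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """G2 for the 2x2 table: this word vs. all other words, study vs. reference corpus."""
    observed = [
        [freq_study, size_study - freq_study],
        [freq_ref, size_ref - freq_ref],
    ]
    total = size_study + size_ref
    row_totals = [size_study, size_ref]
    col_totals = [freq_study + freq_ref, total - freq_study - freq_ref]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = row_totals[i] * col_totals[j] / total  # expected count
            if o > 0:  # an empty cell contributes nothing
                g2 += o * math.log(o / e)
    return 2.0 * g2

def p_value(g2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(g2, 0.0) / 2.0))

def keyness(freq_study, size_study, freq_ref, size_ref, p_max=0.000001):
    """Return ('positive' | 'negative', G2) if the word is key, else None."""
    g2 = log_likelihood(freq_study, size_study, freq_ref, size_ref)
    if p_value(g2) > p_max:
        return None
    more_frequent = freq_study / size_study > freq_ref / size_ref
    return ("positive" if more_frequent else "negative", g2)

# Invented counts: 5,000 hits in a 1,000,000-word study corpus,
# 1,000 hits in a 2,000,000-word reference corpus.
example = keyness(5000, 1_000_000, 1000, 2_000_000)
```

Sorting all words by the returned G2 value, highest first, would give a list ordered by keyness in the sense used in this step.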

c) The positive key words were then manually edited to exclude: company names, dates, months, odd letters that were part of abbreviations, code words used in the making of the corpus, e.g. companyname, personname, and also place names, product names and the names of people. Also excluded were words that were the result of large input from one individual source.[3]

 

d) The positive and negative key words of the BEC were then separately categorised grammatically, using the categories of Ljung (1990): noun, verb, adjective, noun/verb, noun/adjective, verb/adjective, noun/verb/adjective and -ly adverb.

 

e) The four largest of these grammatical categories in both positive and negative key words - nouns, verbs, adjectives and noun/verbs - were then further individually analysed and each word in each grammatical category was placed into a semantic group. The semantic groups were identified based on a qualitative analysis of the key word lists.

 

The identification of semantic groups and the division of key words into positive and negative allowed the lexical demarcation of the business world to be made, answering in part the first research question - is there a lexis specific to Business English?

 

Step 3: Identification of Key Key-Words and Associates

 

This part of the analysis identified words that were key in many texts - key key-words - and identified associate words. Associate words are defined by Scott (1999):

 

An ‘associate’ of key-word X is another key-word (Y) which co-occurs with X in a number of texts. It may or may not co-occur in proximity to key-word X. (A collocate would have to occur within a given distance of it, whereas an associate is ‘associated’ by being key in the same text.)  (Scott 1999 WordSmith Help File)

 

Associates, therefore, are not collocates in the commonly known sense, as they need not occur within a given span of words around the node word. Rather, they are words that are themselves statistically key and that are found to be key in the same texts as the key word in question. These words were discussed in detail in Chapter 4. WordSmith calculates all the associates of key key-words and places them in order of frequency. Thus, for example, the word company is found to associate with the word business in 27 texts and the word customer in 20 texts. The calculation of associates facilitated analysis that is described later in Step 5 (g).

 

Key key-words and associates were calculated thus:

 

a) Every text in the corpus was formed into a separate file using the WordSmith 3 Splitter tool. This created 1,102 files.

 

b) The files were then formed into one large wordlist database and a key words stop list was used to exclude those words noted in Step 2 (c) above.

 

c) This wordlist database was then statistically compared to the BNC Sampler using Log Likelihood, with a p value of p = 0.000001. This created a key key-word database. The key key-word database shows ‘words which are key in a large number of texts of a given type’ (Scott 1997:237). Thus, this list computes not only the key words, but also takes note of which texts the words are key in. The number of texts in which each word is key is then added up, and the words are placed in order of that number in the key key-word list. Thus, for example, the highest key key-word, business, was found to be key in 111 files, and the next highest word, company, was key in 81. A list of the top key key-words was stored and presented.
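The counting behind key key-words and associates can be sketched as follows. The sketch assumes that a set of key words has already been computed for each text; the file names, the key word sets and the function names are hypothetical and serve only to show the logic.

```python
from collections import Counter

def key_key_words(keywords_by_text):
    """Count, for each word, the number of texts in which it is key."""
    counts = Counter()
    for keywords in keywords_by_text.values():
        counts.update(set(keywords))  # each text counts at most once per word
    return counts

def associates(keywords_by_text, node):
    """Count how many texts each other key word shares with `node`."""
    shared = Counter()
    for keywords in keywords_by_text.values():
        kws = set(keywords)
        if node in kws:
            shared.update(kws - {node})
    return shared

# Hypothetical per-text key word sets (invented data).
KEYWORDS_BY_TEXT = {
    "text_001": {"business", "company", "customer"},
    "text_002": {"business", "company"},
    "text_003": {"company", "market"},
    "text_004": {"business"},
}

kkw = key_key_words(KEYWORDS_BY_TEXT)
assoc = associates(KEYWORDS_BY_TEXT, "company")
```

Note that proximity plays no part here: two words count as associates simply because they are key in the same text, which is the distinction from collocation drawn in the definition above.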

 

 

Step 4: Choice of key words for further analysis

 

The key words supplied an empirical basis for further lexical examination of Business English, as they represented language that is used significantly more (or less) in Business English than in general English.  Consequently, a selection of key words was then submitted to a more detailed analysis. This further analysis was necessary for two reasons. Firstly, the descriptive mandate of the thesis required a more detailed examination of the lexis of business to determine the way in which the key words typically behave in the Business English environment. Secondly, a more specific examination of the lexis could determine whether the words behave differently in the Business English environment compared to their behaviour in general English.

 

There were almost one thousand positive key words and analysis of all of these was not feasible. Therefore, fifty positive key words were chosen for further analysis and placed in five semantic categories - shown in the box below. The criteria for choice were both pragmatic - those key words that were more frequent were given priority - and pedagogical. Previous analysis, in Step 2 (e) above, had categorised the key words into semantic categories. Five of these categories were then chosen to represent the four largest word classes found in the key words of the BEC: nouns, verbs, adjectives and noun/verbs. In this way a reasonable spread of language was made part of the analysis. In addition, by placing the lexis into semantic rather than purely grammatical categories, the research was arguably made more accessible and easier to transfer to the classroom. Ten key words were placed in each category.

 

People in Business: customer, manager, supplier, distributor, shareholder, employee, staff, partner, boss, management

 

Business Activities: business, investment, delivery, payment, development, production, communication, competition, takeover, distribution

 

Business Actions: sell, manage, receive, confirm, provide, send, develop, discuss, achieve, improve

 

Business Descriptions: high, big, low, global, international, local, competitive, corporate, strategic, financial

 

Business Events & Entities: sale, merger, trade, package, export, service, market, earnings, performance, product

 

Fig. 30  Key words analysed in the thesis shown by semantic category

 

Each of the 50 words was then subjected to an eight-part analysis, using, in part, Hoey’s (1997:1)[4] questions asked of any concordance line.  This eight-part analysis is now laid out below.

 

Step 5: More detailed analysis of key words[5]

 

a) Keyness: The significance of each word computed by WordSmith 3 was noted. This statistic showed how significantly different the frequency of the word was in the BEC compared to its use in the BNC. This was done by comparing the BEC to the BNC Sampler corpus using Log Likelihood at p = 0.000001.

 

b) Collocation: Hoey asked ‘What lexical patterns is the word part of?’ referring to the typical collocates of words. The most statistically significant collocates were determined for the key words chosen using WordSmith 3 and the Mutual Information statistic. The results were noted and presented. Due to the potential skewing of results caused by the MI statistic, collocation was dealt with in much more depth in terms of semantic prosody.
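The Mutual Information score, and the skewing it can introduce, may be illustrated with a short Python sketch. The basic pointwise form is used here (log base 2 of observed over expected co-occurrence); the exact variant applied by WordSmith 3, including any adjustment for span, is not specified in this chapter and is an assumption. All counts are invented.

```python
import math

def mutual_information(joint, freq_node, freq_collocate, corpus_size):
    """Pointwise MI: log2 of observed co-occurrence over chance expectation."""
    expected = freq_node * freq_collocate / corpus_size
    return math.log2(joint / expected)

CORPUS_SIZE = 1_000_000   # invented corpus size
NODE_FREQ = 1_000         # invented frequency of the node word

# A rare collocate that occurs only twice, both times next to the node.
mi_rare = mutual_information(2, NODE_FREQ, 2, CORPUS_SIZE)

# A frequent collocate that co-occurs with the node 200 times.
mi_frequent = mutual_information(200, NODE_FREQ, 5_000, CORPUS_SIZE)
```

In this invented case the rare collocate receives the higher score even though it is attested only twice, which is the kind of skewing that motivates treating collocation in more depth through semantic prosody.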

 

c) Semantic Prosody: Hoey asked ‘Does the word regularly associate with particular other meanings?’ This refers primarily to semantic prosody. The collocates of each key word were analysed to discern whether they fell into distinct semantic groups or sets. A collocational span of 5:5 was generally used, though on occasion a span of 10:10 was needed to determine semantic prosodic relations. Semantic prosody was determined manually, as there is at present no automatic method of identification. The lexical sets formed by the collocates were noted, counted and presented in a table along with their percentage value relative to the total number of instances of the word (as done by Hoey 2000:232). Analysis in this section - in connection with analysis described in Step 6 - was able to answer the second research question - can the concept of semantic prosody be found in Business English and, if so, are there business-specific semantic prosodies?
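The two mechanical parts of this step, collecting collocates within a span and expressing hand-assigned lexical sets as percentages, can be sketched in Python. The assignment of collocates to sets remains a manual judgement, so the set labels below are hypothetical inputs, and the function names and data are invented for illustration.

```python
from collections import Counter

def collocates_in_span(tokens, node, left=5, right=5):
    """Collect the words within `left`/`right` tokens of each occurrence of `node`."""
    found = []
    for i, tok in enumerate(tokens):
        if tok == node:
            found.extend(tokens[max(0, i - left):i])
            found.extend(tokens[i + 1:i + 1 + right])
    return found

def set_percentages(set_labels, total_instances):
    """Express each manually assigned lexical set as a percentage of all instances."""
    counts = Counter(set_labels)
    return {label: round(100 * n / total_instances, 1) for label, n in counts.items()}

# A 1:1 span on an invented sentence, to keep the example readable.
tokens = "the big customer paid the bill".split()
nearby = collocates_in_span(tokens, "customer", left=1, right=1)

# Hypothetical hand-coded set labels for 13 of 20 instances of a key word.
labels = ["money"] * 8 + ["software"] * 5
pct = set_percentages(labels, total_instances=20)
```

Passing `left=10, right=10` would correspond to the wider 10:10 span mentioned above.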

 

d) Colligation and patterning:  Hoey’s questions three and four were combined, thus the questions asked of the key words were ‘What structure(s) does it appear in?’ and ‘Is there any correlation between the word’s uses/meanings and the structures in which it participates?’

 

For the colligational analysis, the BEC was POS tagged by the automatic tagger Autasys (Fang 1998), using the LOB tag set. The key words were then analysed manually utilising the tagged corpus, and the most common grammatical patterning and meanings of the chosen key words were compared to the patterning/meaning relations found in general English as shown in the COBUILD Dictionary (Sinclair et al. 1995) and COBUILD Grammar Patterns Verbs, Nouns and Adjectives (Francis et al. 1998a,b). Additional typical grammatical patternings of the key words were noted. The presence of sub-technical language was also analysed at this point, and the phenomena of reduced and altered meaning of words in a specific lexical environment were noted and recorded. The analysis carried out in this section - in connection with analysis described in Step 6 - was able to answer the third research question - what colligational patterns can be found in Business English and can grammatical patterning typical to Business English be found?

 

Also in this section a short analysis was carried out in line with Pickett (1988, 1989), where Business English was defined in terms of its lexical layering as discussed in Chapter 3. One text from a specific field of business - sportswear sale and distribution - was examined. Key words for the text were computed using Log Likelihood with p = 0.000001. The resulting key word list was then categorised by use of concordancing into three categories of words:

 

·    Specialist business words - words specific to a field of business.

·    Sub-technical lexis - words with a specific meaning in the business area under investigation and possessing a different common meaning in general English.

·    General English - a general English base of words.

 

This sub-section answered the fourth research question - can the layering of business lexis be found as described by Pickett (1988, 1989)?

 

e) Macro-generic distribution: Macro-genres are defined for the purpose of this research as an umbrella category embracing several sub-genres - each macro-genre encompasses several smaller genres, but can still be identified as an overall category. For example, the macro-genre business letters can contain several sub-genres, but still be recognised as an overall category.

 

The range of usage of each key word across the macro-genres of the BEC was presented in a distribution chart using the Dispersion Plot function in WordSmith Tools 3. The Dispersion Plot chart showed both the ranked frequency of occurrence of the word in each macro-genre and, in graphical format, the distribution of usage of the word across each macro-genre.

 

Further, the macro-genres where each key word occurred most frequently were calculated and each word was placed on a four-way scale showing its most typical area of use. The scale showed both the distribution of the key words’ typical use along a spoken/written genre axis, and also distribution across a doing business-talking about business axis, using Pickett’s (1988) definitions of knowing and acting. This section answered the fifth research question - how are words distributed across Business English macro-genres and can they be divided along the ‘knowing’ - ‘acting’ axis of Pickett (1988)?
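The ranking of macro-genres for a given key word can be sketched as below. Normalising to hits per 1,000 words is an assumption made so that macro-genres of different sizes can be compared; apart from business letters, the genre name, the counts and the function name are invented.

```python
def rank_by_macro_genre(hits, sizes, per=1000):
    """Rank macro-genres by the rate of occurrence of a word (hits per `per` words)."""
    rates = {genre: hits.get(genre, 0) * per / sizes[genre] for genre in sizes}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)

# Invented hit counts for one key word and invented macro-genre sizes in words.
hits = {"business letters": 30, "meetings": 10}
sizes = {"business letters": 10_000, "meetings": 20_000}

ranked = rank_by_macro_genre(hits, sizes)
most_typical = ranked[0][0]  # macro-genre where the word is proportionally most frequent
```

The macro-genre at the top of such a ranking is the kind of information that would place a word towards one end of the spoken/written or knowing/acting scale.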

 

f) 3-word clusters: 3-word clusters containing each key word were computed. A minimum of four occurrences (n = 4) of a cluster in the BEC was used as the cut-off point for inclusion in the analysis.
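The cluster extraction with its cut-off can be illustrated with a brief Python sketch. It treats a cluster as any run of three consecutive tokens that includes the key word, which is an assumption about how the clusters were formed; the function name and the token sequence are invented.

```python
from collections import Counter

def clusters_with_word(tokens, word, n=3, min_freq=4):
    """Count n-word clusters containing `word`, keeping those at or above `min_freq`."""
    counts = Counter(
        tuple(tokens[i:i + n])
        for i in range(len(tokens) - n + 1)
        if word in tokens[i:i + n]
    )
    return {" ".join(cluster): freq for cluster, freq in counts.items() if freq >= min_freq}

# Invented token sequence in which one cluster reaches the cut-off of four.
tokens = ("thank you for your order " * 4 + "your order today").split()
frequent = clusters_with_word(tokens, "order")
```

Clusters such as "your order thank", which occur fewer than four times in this invented sequence, fall below the cut-off and are dropped.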

 

In addition to analysing the 3-word clusters formed from the key words, further analysis was carried out on the word clusters in the BEC at both macro- and micro-level. 

 

Macro-level: Firstly, the most frequent 2- to 8-word clusters were computed from the BEC. Then, the key word procedure was run to create key cluster lists of the most key 2- to 8-word clusters using the Log Likelihood procedure with p = 0.000001. These were noted and recorded. These combined lists provided a clear macro-level picture of the clusters in the BEC.

 

Micro-level: Ten clusters were chosen for closer examination from the macro-level analysis above: five 2-word and five 3-word clusters. Three of the five 3-word clusters chosen were picked on the grounds of high frequency, as the most key 3-word clusters were of very low frequency. The 2-word clusters were all key, and were chosen as being intuitively ‘business’ phrases. The clusters were then subjected to the same analysis as the key words in Step 5 (a-f) above. This section - in connection with analysis described in Step 6 - answered the sixth research question - what kinds of clusters can be found in Business English and do business-specific clusters exist?

 

g) Associates: These were defined in Step 3. Associates of each key word under analysis were noted and presented. The relationship of key key-words to their associates was analysed and ten high-frequency key key-words were chosen for closer associative analysis. This section of the research answered the final research question - how do words associate with each other in Business English?

h) Positioning: Hoey asked ‘Is the word associated with (any position in any) textual organisation?’ This question was not dealt with in the analysis, but the syntactic positioning of some key words inside sentences was noted where appropriate.

 

Step 6: Further comparison

 

To reinforce the answers to the above research questions, five of the fifty key words chosen for analysis in the BEC were further analysed in order to see their typical behaviour in general English. To this end, the five words chosen - send, manage, big, global and package - were analysed in the BNC Sampler corpus. As these words belonged to the fifty key words under analysis in the BEC, they facilitated a further comparison between typical use in the BEC (Business English) and typical use in the BNC (general English). These words were subjected to the same analysis as in Step 5 (a-h) and the results noted and recorded.

 

The analyses carried out above (Steps 1-6), when combined, answered the first main research question - is there a lexis specific to Business English and, if there is, what is it?

 

7.3 Hypothesis Two

 

Hypothesis Two: The lexis of Business English used in the real world differs significantly from the lexis used in published Business English teaching materials.

 

Research Question: Can significant lexical differences be found between the language used in published Business English materials and the language actually used in business?

 

Additional research questions:

·    How do the PMC key words define the lexical world of business and how does this definition compare to that shown in the BEC?

·    What semantic prosodies were found in the PMC and do they match or differ from those found in the BEC?

·    What colligational and grammar/meaning patterns were identified in the PMC and how do they compare to those found in the BEC?

·    How did the clusters found in the PMC compare to those in the BEC?

 

It should be stated that the main focus of this thesis has been on the Business English of the BEC. Analysis of the PMC, therefore, is more limited than that carried out on the BEC. The process of analysis of the PMC - also featuring a descriptive element - is laid out below:

 

a) A frequency list of the PMC was created using WordSmith 3.

 

b) This list was then lemmatised using a modified version of Someya’s (1998) lemma list.

The lemmatised frequency list was manually edited to remove individual letters and the names of  people, places, products, currencies and companies.

 

The next stage was to prepare two separate key word lists.

 

c) A key word list was created for the PMC using the Log Likelihood statistic with a p value of p = 0.000001 with the lemmatised BNC as the reference corpus. The key word list was manually edited to remove individual letters and the names of people, places, products, currencies and companies.

 

d) A key word list was created for the PMC using the Log Likelihood statistic with a p value of p = 0.000001 with the lemmatised BEC as the reference corpus. The key word list was manually edited to remove individual letters and the names of people, places, products, currencies and companies.

 

The two measures above (c and d) created two different key word lists for the PMC:

i) In (c), those words that are key to the PMC using the BNC (general English) as a reference corpus; and

ii) In (d), those words that are key to the PMC using the BEC (Business English) as a reference corpus.

 

The two key word lists represent two different aspects of the study. The PMC (BNC) key word list shows words in the PMC that occur significantly more in published Business English materials than in general English. The PMC (BEC) key word list shows words that occur significantly more in published Business English materials than in the Business English of the BEC. Thus, the PMC (BNC) list shows differences between published materials and general English, whilst the PMC (BEC) list shows differences between two ‘Business English’ corpora.

 

e) The PMC (BNC) key word list was grammatically categorised using the categories of Ljung (1990): noun, verb, adjective, noun/verb, noun/adjective, verb/adjective, noun/verb/adjective and -ly adverb.

 

f) The PMC (BEC) key word list was grammatically categorised using the categories of Ljung (1990): noun, verb, adjective, noun/verb, noun/adjective, verb/adjective, noun/verb/adjective and -ly adverb.

 

g) The four largest grammatical categories found in the PMC (BNC) key word list - nouns, verbs, adjectives and noun/verbs - were analysed and the words placed in semantic categories.

 

h) The four largest grammatical categories found in the PMC (BEC) key word list - nouns, verbs, adjectives and noun/verbs - were analysed and the words placed in semantic categories.

 

The above stages (a-h) allowed analysis of the way in which the lexical world of published Business English materials is demarcated from general English and, additionally, of how this demarcation differs from the demarcation between the Business English of the BEC and general English. This section thus answered the research question - how do the PMC key words define the lexical world of business and how does this definition compare to that shown in the BEC?

 

i) Five of the most significant positive key words in the PMC (BNC) key word list were analysed along the same lines as the key words in the BEC, as shown in Step 5 in Hypothesis 1.[6] Analysis was more limited, however, and only keyness, semantic prosody, colligation and word clusters related to the word were analysed. The words chosen were manager, customer, product, market and business. Comparison of usage of the words in the BEC and the PMC was thus facilitated. This section answered the research questions - what semantic prosodies were found in the PMC and do they match or differ from those found in the BEC?, what colligational and grammar/meaning patterns were identified in the PMC and how did they compare to those found in the BEC? and how did the clusters found in the PMC compare to those in the BEC?

 

j) Multi-word units: Further work on clusters in the PMC was also carried out. A 3-word cluster frequency list was computed for the PMC and a 3-word cluster key word list was computed using the 3-word cluster list of the BEC as reference. Keyness was calculated using the Log Likelihood statistic with a p value of p = 0.000001. This facilitated comparison between 3-word clusters in the PMC and the BEC. This final section gave further insight into the research question - how did the clusters found in the PMC compare to those in the BEC?

All the above analyses (a-j), therefore, answered the second main research question - can significant lexical differences be found between the language used in published Business English materials and the language actually used in business?

 

 

7.4 The next chapter

 

The next chapter presents an overview of the results gained from the analyses described in this chapter. This will be followed by a more detailed presentation and discussion of results in Chapter 9. 

 

 

 

 

 

 

 

 

 

 

 

 

 

 



[1] This is in line with Firth (1957:202), who stated that ‘The business of linguistics is to describe languages’.

[2] The list of the top 1,000 lemmas can be found in Appendix 1 in Vol. II.

[3] Lyne (1985:17), in a corpus-based study of French business letters, discussed this problem and excluded words from his final corpus list that were present owing to large input of text from one individual source. In this study, therefore, a small number of words were excluded on the basis that they were key only because they came almost entirely from a single data source. For example, the words extinguisher and extinguishers were found to be key because the words appeared commonly in data gained from a company that sells fire-fighting equipment. These words were excluded from the final key word list.

[4] Hoey (1997:1) asks five questions of any concordance line of text:

1) What lexical patterns is the word part of? (Collocation)

2) Does the word regularly associate with particular other meanings? (Semantic Prosody)

3) What structure(s) does it appear in? (Colligation)

4) Is there any correlation between the word’s uses/meanings and the structures in which it participates?

5) Is the word associated with (any position in any) textual organisation?

 

[5] For a full example of the analysis of key words described in Step 5, see Chapter 8 Section 8.2.10.

[6] The reasons for the choice of words are discussed in Chapter 9, Section 9.3.2.