ROLLS Film Club: The Wild Child 2 December

Join us for a screening of François Truffaut's film The Wild Child, a telling of the story of Victor, the 'wild child of Aveyron', a case that is often cited in debates about the Critical Period Hypothesis (the proposal that language acquisition is only possible if it happens at/by a key point in childhood development). 

Wednesday 2 December

13.00  Arts C233
free popcorn! 

ROLLS: Lynne Murphy on 'please' 25 Nov

Please join us for the final ROLLS talk in the Autumn 2015 'Variation' Series

Separated by a common politeness formula: please in American and British English

M. Lynne Murphy, University of Sussex

Wed, 25 November, 13.00 in Fulton 214

Several studies have observed that please is heard about twice as often in Britain as in America. What we don't know is whether that's just because Americans feel less need to 'act polite' or if American and British Englishes have different uses for please. Metalinguistic commentary by non-linguists gives some inkling of different uses; for example, American blog commenters have mentioned that adding please to a request sounds 'bossy' rather than polite.

This presentation summarises two recent studies on please: one speech-act driven and one lexically driven. The first (with Rachele De Felice, UCL) looks at requests in American and British email corpora to ask: in which requests does please occur or not occur in AmE and BrE? The second uses an internet corpus to look at all occurrences of please to ask: in which types of situations and with which meanings does please occur on US and UK websites?

These investigations reveal a number of different tendencies in the use of please. From a theoretical perspective, the findings give support for approaches to politeness that give centre-stage to conventions and conventionalisation (most recently, Terkourafi 2015) rather than indirectness. They also raise some questions in areas of applied linguistics, notably: what should English learners be taught about please?

Terkourafi, Marina (2015) Conventionalization: a new agenda for im/politeness research. Journal of Pragmatics 86, 11-18.

ROLLS: Peter French on Forensic Linguistics 28 Oct

Research on Language and Linguistics at Sussex (ROLLS): Forensic speech science
This week Prof. Peter French (University of York) will be talking about 'The Tarnished Silver Tongue: casework and research in forensic speech science'. Prof. French has provided evidence in thousands of cases across the whole range of forensic speech and audio. His evidence has been for UK Crown Courts and courts across the world, including: Australia, Canada, Ireland, Germany, Ghana, Gibraltar, Hong Kong, Mauritius, The Netherlands, New Zealand, Singapore and the United States of America. He has worked on cases heard by the International War Crimes Tribunal and the Bloody Sunday Inquiry.

Wednesday October 28, 13.00-14.00

Junior Research Associates: guest post by a former JRA

We're pleased to publish this guest post by Sarah FitzGerald, a current third-year BA English Language and Linguistics student who took part in the Junior Research Associate programme in Summer 2015.


Six months ago I went to see Dr Melanie Green. I wanted to discuss third year modules with her. I mentioned as I was about to leave that I had heard about the Junior Research Associate (JRA) scheme and thought it would be interesting to apply but had no idea what I might study. That passing remark ended up leading to the most interesting summer of work I’ve ever done and to so many opportunities which I would never otherwise have been offered.

Melanie is currently building a corpus of Cameroon Pidgin English (CPE) along with Gabriel Ozón at the University of Sheffield and Miriam Ayafor at the University of Yaoundé in Cameroon. She suggested that I apply for JRA funding for a project to create and test a system towards tagging CPE for parts of speech (POS). Just in case you are not nodding sagely at this information, and I certainly wasn’t when Melanie first suggested it, what follows is a brief rundown on pidgin languages, the context of CPE, linguistic corpora and POS tagging.

Pidgins evolve due to language contact, often through colonialism, and many modern pidgins developed due to the slave trade. People with no common language have to communicate as best they can when forced to work together. As a result there are many pidgins which are based on the vocabulary of European languages, particularly Dutch, English, French and Portuguese, but which have very different systems of grammar from these languages. Pidgins start to develop their own grammar systems as the children of the original speakers grow up speaking the pidgin as a first language – developing them into creoles. This natural development makes pidgins and creoles useful and interesting to study, particularly as their grammatical systems often have much in common with one another, even when thousands of miles apart.

Cameroon has two official languages, English and French, but more than 200 languages are spoken there. This means that people need a common language (or lingua franca) to communicate effectively. CPE, which is spoken by more than 50% of the population of Cameroon, fills this role. While it is called Pidgin English, CPE would more accurately be described as a creole language, as it has its own grammar system and is spoken as a first language by a subset of its speakers. Radio talk show hosts often use CPE, and people speak it for trade and use it on social media, but CPE is thought of as uneducated and its use is highly stigmatised. As a result it is rarely written and lacks the standardised spelling and the reference books, such as grammars and dictionaries, which might help to destigmatise a language. Which is where a corpus comes in.

In linguistics a corpus is a collection of texts, most commonly an electronic collection which can be searched and used to identify patterns and frequency information about a language that are not otherwise apparent, even to native speakers. It is possible to learn new information about a language by gathering a selection of texts and using existing software to search for frequency or collocation information (collocations are words which occur together), but we can learn much more if the corpus is tagged for parts of speech. POS tagging involves attaching a tag to each word which identifies it as a noun, verb, adjective etc. This allows the corpus to be searched for patterns which would otherwise be hard to spot.
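To make the idea of frequency and collocation searches concrete, here is a minimal sketch in Python. It is not the software used on the CPE project: the three sentences are invented English examples, the function names are made up for illustration, and counting adjacent word pairs is only a crude stand-in for a proper collocation measure.

```python
from collections import Counter

# Toy "corpus": invented English sentences, not data from the CPE project.
texts = [
    "the dog chased the cat",
    "the cat chased the mouse",
    "a dog barked at the cat",
]


def tokenize(text):
    """Split a text into lowercase word tokens on whitespace."""
    return text.lower().split()


def word_frequencies(texts):
    """Count how often each word occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def bigram_frequencies(texts):
    """Count adjacent word pairs, a rough proxy for collocation."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        counts.update(zip(tokens, tokens[1:]))
    return counts


freqs = word_frequencies(texts)
bigrams = bigram_frequencies(texts)

print(freqs.most_common(3))    # the most frequent words
print(bigrams.most_common(3))  # the most frequent adjacent pairs
```

Real corpus tools go further, for example by using statistical association measures rather than raw pair counts, but the underlying idea is the same: count what occurs, and what occurs together.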

The first question most people have asked me at this point is how you make a corpus of a language which is not written down. Fortunately for me the wider project aims to transcribe 240,000 words of spoken CPE and they are well on their way to doing so, which means that texts were readily available. My task was to work out what tags the language needed and to create a tag set for it. In this I was aided by the grammar of CPE which Melanie and Miriam have recently written (Cameroon Pidgin English available 2016, I highly recommend it!). I then tagged 6,500 words manually and used this data to train learning software to tag CPE automatically. Day to day this meant sitting at my dining table staring intently at a computer screen. It was painstaking, absorbing and left me unable to string sentences together in English by the end of each day, but it was also very rewarding. I got to see something that I had created achieve a 90% success rate in the automatic tagging stage.
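For readers curious about what "training software to tag automatically" can look like, the sketch below shows one of the simplest possible approaches: a tagger that learns each word's most frequent tag from hand-tagged data and is then scored against a held-out sample. This is an assumption-laden illustration, not the software or the tag set used in the project; the words are invented English examples and the tag names (DET, NOUN, VERB) are placeholders.

```python
from collections import Counter, defaultdict

# Invented hand-tagged tokens as (word, tag) pairs. The tag names are
# illustrative only and are not the tag set designed for CPE.
train = [
    ("the", "DET"), ("dog", "NOUN"), ("runs", "VERB"),
    ("the", "DET"), ("run", "NOUN"), ("ends", "VERB"),
    ("dogs", "NOUN"), ("run", "VERB"), ("run", "VERB"),
]
# Held-out sample with gold-standard tags, used only for scoring.
gold = [("the", "DET"), ("dog", "NOUN"), ("run", "VERB"), ("cat", "NOUN")]


def train_tagger(tagged_tokens):
    """Learn each word's most frequent tag and the commonest tag overall."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for word, tag in tagged_tokens:
        per_word[word][tag] += 1
        overall[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    default_tag = overall.most_common(1)[0][0]
    return lexicon, default_tag


def tag_words(words, lexicon, default_tag):
    """Tag each word; unseen words fall back to the default tag."""
    return [(w, lexicon.get(w, default_tag)) for w in words]


def accuracy(predicted, gold):
    """Proportion of tokens whose predicted tag matches the gold tag."""
    correct = sum(p == g for (_, p), (_, g) in zip(predicted, gold))
    return correct / len(gold)


lexicon, default_tag = train_tagger(train)
predicted = tag_words([w for w, _ in gold], lexicon, default_tag)
score = accuracy(predicted, gold)
print(predicted)
print(f"accuracy: {score:.0%}")
```

In this toy run the unseen word "cat" receives the fallback tag and is scored as an error, which is the kind of case where more hand-tagged training data, or a tagger that looks at neighbouring words, would help.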

Nobody had tagged CPE before, which meant that there wasn’t an instruction manual available when I started (I had to write myself one!), and that was a bit terrifying at first. What I did have was encouragement and support from Melanie, as well as from her colleague Gabriel. They were both willing to throw as much of their combined expertise at me as they could, and some of it must have stuck, as I managed to achieve the aims of my project. This support gave me a safety net which allowed me to try to work things out for myself wherever possible. I have gained so much from this project in terms of skills, experience and confidence in my abilities. I have also gained a greater appreciation of the limitations of my knowledge and understanding. My JRA project is over but I am still learning from it: this month I had the chance to present my work at the JRA poster exhibition; I also got the opportunity to write about my work as part of an article for publication in World Englishes; and I have been able to continue tagging the corpus, as I have been hired to work on the project as a research assistant this year. I am so glad that I spoke to Melanie when I did.

For any students in their second year I cannot recommend the JRA experience enough. If you are interested but don’t know what you want to study then talk to a tutor whose research interests you. They are likely to have plenty of ideas and working on a project in Melanie’s area of expertise has broadened my understanding of linguistics in ways that I doubt an idea of my own devising would have done. It may seem early (applications are in the spring) but it is worth thinking about now – research proposals are hard work!

 ---Sarah FitzGerald

Research news: Spring-Summer 2015

Here's our semi-annual summary of research activity among our group. As you can see, we've been busy!

Among the staff:

(in reverse alphabetical order)
Charlotte Taylor's article Beyond sarcasm: The metalanguage and structures of mock politeness has been published in the Journal of Pragmatics. She has also presented her work on impoliteness at two conferences this summer: Mock politeness & culture: Perceptions & practice at the Im/Politeness Conference 2015 (1-3 July, Athens), and Why are women so bitchy? Mock politeness & gender at Corpus Linguistics 2015 (20-24 July, Lancaster). In early September, she and colleagues introduced the Discourse Keywords of Migration project at a research meeting at Université Paris Descartes, where Charlotte talked about the keywords integration, multicultural, and community. She has also written a piece for The Conversation: Migrant or refugee? Why it matters which word you choose.

Justyna Robinson has started work with the Linguistic DNA project, speaking about it at the European Network for E-Lexicography in August (with Iona Hine, Sheffield) and organising a methodology training workshop at Sussex in September. In September she also presented a paper on Semantic Change across the Lifespan at the 10th UK Language Variation and Change conference at York. Justyna continues to be Associate Editor for English Today and has helped see three issues to press since our last update.

Roberta Piazza and her co-editors Louann Haarman (Bologna) and Anne Caborn (The Content Lab) have a new book: Values and Choices in Television Discourse: A View from Both Sides of the Screen. This includes a chapter by Roberta: The representation of travellers in television documentaries: dispelling stigma while dealing with infotainment demands. The book also contains interviews with Jon Snow, Cathy Newman and other television insiders. Roberta presented Nomadic people in British TV documentaries: between factual entertainment and journalistic investigation at the panel on Telecinematic discourse at the International Pragmatics Association conference (Antwerp, July) and When cinema borrows from stage: theatrical artifice through explicitness in The Cook, the Thief, His Wife and Her Lover and Dogville at the Poetics and Linguistics Association conference (Canterbury, July). She also presented her work on women travellers at the School of Education in spring, and worked with the Higher Education Internationalisation and Mobility ROMA project in summer.

Lynne Murphy has won two grants to support her work on British and American English. The National Endowment for the Humanities Public Scholar Program grant will support her writing a popular audience book (tentatively called How America saved the English language) in 2016-17, and a Leverhulme/British Academy small grant will support trips to dictionary archives in Oxford and the US to research American and British dictionary cultures. In July, Lynne presented Separated by a common politeness marker: please in American and British English at the International Pragmatics Association conference (Antwerp) and with Rachele De Felice (UCL) she presented The politics of please in British and American English: a corpus pragmatics approach at Corpus Linguistics 2015 (Lancaster). She also gave talks at De Montfort University, the Bedford Culture Club (Horsham), and Sunday Assembly Brighton, had articles published in The Skeptic and Lingo magazines, and was the featured speaker on an Odditorium podcast about the word the. Her paper with Steven Jones (Manchester) and Anu Koskela (De Montfort), Signals of contrastiveness: but, oppositeness and formal similarity in parallel contexts has been published in the Journal of English Linguistics.

Melanie Green continues to work on her British Academy-funded project to develop a corpus of spoken Cameroon Pidgin English. As part of the project, she has hosted Miriam Ayafor (University of Yaoundé I) at Sussex for training and project planning and supervised a Junior Research Associate (see below) project on part-of-speech tagging for the corpus. About half the data for the corpus is now recorded and transcribed. Melanie and her co-investigators Gabriel Ozón (Sheffield) and Miriam Ayafor presented a paper about the project at the ICAME conference in Trier in May. 


Lynne Cahill organised our extremely successful conference for A-level teachers in English Language in June. She has accepted invitations to serve on the executive board of the Association for Written Language and Literacy and the editorial board of the Journal of Written Language and Literacy.

Student research

We congratulate Imed Louhichi, who received his PhD at the summer graduation. Imed's thesis was Thinking-for-speaking in second language acquisition: a contrastive study of motion events in English and Tunisian Arabic, supervised by Lynne Murphy and Jules Winchester (Sussex Centre for Language Study).

PhD student Margarita Yagudaeva presented her work on the Semantic Stability of English Idioms at the 10th Newcastle-upon-Tyne Postgraduate Conference in Linguistics (March), the Biennial Conference on the Diachrony of English at Troyes, France (July), and at EUROPHRAS 2015 in Malaga, Spain (June).

PhD student Alexandra Reynolds has published Emotions et apprentissage de l'anglais dans l’enseignement supérieur: une approche visuelle in Voix Plurielles (Quebec). She has also recently presented two conference papers: ‘To be good at science you need to be good at English’: A study of language attitudes in French higher education at the conference on Le plurilinguisme, le pluriculturalisme et l’anglais dans la mondialisation in Angers, and The impact of language policy on higher education in France at the iMean conference at Warwick University.

Undergraduate Sarah FitzGerald received the Junior Research Associate fellowship this past summer. Her project was to work with Melanie Green toward developing part-of-speech tagging for the Corpus of Spoken Cameroon Pidgin English. Sarah presented her work at the JRA conference on 2 October and has since been hired on as a research assistant on the corpus project.

PhD student Barzan Ali presented a poster on Exploring the linear order principle: evidence from Farsi-English code switching at the 4th Barcelona Summer School on Bilingualism and Multilingualism at Pompeu Fabra University.

PhD student Rukayah Alhedayani presented her work Investigating antonymy in an Arabic corpus at the Forum for Arabic Linguistics at Essex in July. 

ROLLS 14 Oct: Sandra Jansen on an innovative angloversal

Wednesday, 14 October 2015, 13.00
Fulton 214, University of Sussex 

goose-fronting as innovative angloversal

Sandra Jansen, University of Brighton

In this presentation, I demonstrate that goose-fronting has been identified as a phenomenon in varieties of English around the world and provide detailed information about this change in Carlisle, a city in the north-west of England. The results show that similarly strong linguistic constraints are found in this variety as in other varieties. However, the change cannot be characterised as Vernacular Universals (Chambers 2004, 2012) or global innovation (Buchstaller 2008). Hence I argue that we need to introduce the new category of innovating angloversals, a group of features that are arising independently in varieties of English due to language internal motivations rather than dialect contact.
A second point of discussion is the dynamics between goose and other back vowels, i.e. goat and foot. I argue that in order to understand goose-fronting completely, we also need to study the most adjacent back vowels. The data stem from interviews conducted in Carlisle between 2007 and 2010 and show that while goose is fronting across apparent-time, for goat and foot a change in progress is not observable. These dynamics seem to be geographically restricted to the north-west of England, which leaves us with two conclusions. Either the apparent chain shift which is often referred to in the goose-fronting context has not set in yet or a chain shift is not a necessary consequence of goose-fronting. In both cases, goat and foot do not belong to the group of innovating angloversals.

ROLLS 7 Oct: Justyna Robinson

Wednesday, 7 October 2015, 13.00
Fulton 214, University of Sussex 

What happens to our language as we grow older?

Justyna Robinson, University of Sussex

The study of linguistic usage across the lifespan has been increasingly attracting the attention of sociolinguists. Such a research perspective makes it possible to answer a number of intriguing questions, including those which concern possible trajectories of language change at the level of an individual, together with questions concerning the interaction between an individual and a community. Most of the available insights into longitudinal change are based on studies of grammar or phonology. When it comes to the individual usage of lexis, linguists occasionally refer to anecdotal evidence, such as that older speakers keep on using older expressions, but without providing empirical insights. In this context, I provide new insights into lifespan change by focussing on the usage of words. More specifically, I trace the use of fifteen polysemous adjectives, such as awesome, skinny, and gay, between 2005 and 2015. The analysis of individual words is supplemented with insights into the histories of individual speakers. A number of observations based on this study allow me to conclude by mapping out the most fruitful lines of enquiry for future investigations of lifespan change.

ROLLS 30 Sept: Tom Devlin: Coalmining vocabulary & Durham English

Please join us at the Research on Language & Linguistics at Sussex seminar series:

Wednesday, 30 September 2015, 13.00
Fulton 214, University of Sussex 

The influence of coalmining vocabulary on variant usage in Durham English

Tom Devlin, University of Sussex
This research investigates the influence of coalmining vocabulary on variant usage by testing the claim that mining communities preserve distinctive and conservative phonological patterns (Wales 2006: 124). The study explores the degree of advancement of vowels belonging to the START lexical set (Wells 1982) in mining and non-mining words in the speech of sixteen older male speakers from former colliery villages in East Durham in the North East of England. The results show that regardless of the speaker's relationship to coalmining, START vowels are shifted to significantly backer realisations in mining words than in non-mining vocabulary, close to traditional pronunciations noted in historical dialect literature. This outcome is upheld even in identical lexical items with different meanings in mining and non-mining speech.

Wales, K. 2006. Northern English: A Social and Cultural History. Cambridge: CUP.
Wells, J. 1982. The Accents of English. Vol. 2: The British Isles. Cambridge: CUP.

Modelling semantic change workshop


Linguistic DNA of Modern Western Thought:

Modelling concepts and semantic change in English 1500–1800





Workshop 1: Computer-Assisted Language Processing

Friday 18th September 2015
University of Sussex, Jubilee Building, Room G22



Registration & coffee (9.00-9.30)

Session 1 (9.30-10.45)
Susan Fitzmaurice: Introduction to Linguistic DNA
Research Associates & HRI: Resources, progress, problems and queries

Coffee break (10.45-11.15)

Session 2 (11.15-12.30)
Diana McCarthy (University of Cambridge): Inducing and contrasting word meanings from different sources
Kathryn Allan (UCL): ‘Degrees of lexicalization’ in the Historical Thesaurus of the OED

Lunch (12.30-1.30)

Session 3 (1.30-3.15)
Gabriel Egan (De Montfort University): Instructive failures in authorship attribution by shared phrases in large textual corpora
Dirk Geeraerts (KU Leuven): Quantitative corpus approaches to lexical and conceptual variation I
Dirk Speelman (KU Leuven): Quantitative corpus approaches to lexical and conceptual variation II

Coffee break (3.15-3.30)

Session 4 (3.30-5.00)
Panel discussion: Dawn Archer (Manchester Metropolitan University), Scott Gibbens (Jisc Historical Texts), David Weir (University of Sussex), Pip Willcox (Bodleian Libraries)

Close (5.00)




‘“Degrees of lexicalization” in the Historical Thesaurus of the OED’ by Kathryn Allan

One of the most intriguing issues raised by the Historical Thesaurus of the Oxford English Dictionary (HTOED), which will be addressed by the ‘Linguistic DNA of Modern Western Thought’ project, is the significance of vocabulary size. Why are some semantic fields very densely populated in comparison to others, and why are concepts lexicalised to differing degrees across time? For some concepts, such as those in fields such as Food and Colour, there are obvious answers. There are no terms for ‘potato’ attested earlier than the late sixteenth century because it is not native to Britain and was only brought to the country then, and many terms from the late eighteenth onwards show the increasing numbers of varieties that have become familiar to speakers in modern times. Similarly, the rise in non-basic colour terms from the early Modern English period onwards corresponds to the technological changes that enabled the production of dyes, leading to sophisticated methods of creating and recreating precisely differentiated shades (discussed by Carole Biggam and Laura Wright, for example). This example seems to provide fairly clear evidence to support the view suggested in the preface of HTOED that in some cases the ‘degree of lexicalization [of a category] reflect[s] its considerable degree of importance to speakers of the language’. However, in other cases, including many abstract categories, the relationship between semantic field and conceptual domain is much less straightforward, and the emergence of a high number of new terms has no obvious external-world trigger. For example, HTOED records a spike of new terms for ‘sweet (in taste)’ between 1400 and 1700, including several variant forms with a common derivation such as douce, dulcet, dulce, dulcid, dulcorous and dulceous. 
Some of these are only attested a small number of times, and none replace the basic term sweet; their appearance is most readily explained as the result of shifts in stylistic norms combined with greater receptivity to Latinate vocabulary in this period. This paper considers some of the difficulties that emerge when considering the degree of lexicalisation of different concepts, and especially the complications that emerge from the data itself.

‘Instructive failures in authorship attribution by shared phrases in large textual corpora’ by Gabriel Egan

We can learn much from the mistakes made in recent authorship attribution endeavours that hunt for phrases shared between a work of unknown or contested authorship and the works in large textual corpora available to us digitally. Investigators have long known to suspect our intuitive conviction that a series of apparently unusual phrases cannot be shared between two works merely by chance--in fact they can--and have long acknowledged that a series of ‘negative checks’ are needed to be sure that linguistic constructions that seem rare really are rare. Despite knowing of these pitfalls, spectacular errors have been made recently because i) the methods of searching corpora are fallible in ways unforeseen by the investigator, ii) textual corpora are not necessarily as complete as investigators believe, iii) it is shockingly easy to introduce methodological bias into the experiments, and iv) it is easy to misunderstand and/or misrepresent the statistical significance of particular findings. This talk will discuss what went wrong in a series of recent investigations and draw lessons from them. Chiefly, the finding is that the principles that are supposed to prevail in scientific experiments should also govern work in our field. Our datasets and software source code should be made publicly available so that anyone may replicate our investigations. And our statistical methods should be subject to proper critique by professional statisticians. These rather dry findings will, it is hoped, be leavened by the telling of some amusing stories about what happens when authorship attribution goes wrong.

‘Inducing and Contrasting Word Meanings from Different Sources’ by Diana McCarthy

In Computational Linguistics, work on representing lexical meaning previously focused on manually created inventories. There is, however, now a large body of work that builds models of word meaning directly from corpus data. This has the advantage that one does not rely on advance knowledge of the relevant meanings; instead the knowledge emerges from the data. This provides more scope for contrasting the meanings induced from different sources where the sources could differ with respect to, for example, textual domain, genre or time. In this talk I will outline some approaches for inducing word meanings and describe work, conducted with collaborators at the University of Melbourne, to induce and compare word meanings from different sources using topic models. I'll particularly focus on our work using these models to detect novel word senses in diachronic corpora. One key drawback with automatic word sense induction is the requirement for a large amount of data for training the models. Since corpora of a sufficient size have only been available in the last twenty years or so, this limits application of these techniques to word meaning change attested within that period and for the types of corpora available. I will also therefore describe some corpus linguistics work, conducted with Sketch Engine, for the National Ecosystem Assessment. In this work we used Sketch Engine to contrast usages of lexemes pertaining to the environment from different sources (academic, government and public). I'll discuss the pros and cons of the different approaches which can, of course, be complementary to one another.

‘Quantitative corpus approaches to lexical and conceptual variation’ by Dirk Geeraerts and Dirk Speelman

In this talk, we intend to present an overview of various types of corpus-based variation studies that we have been conducting in our research team Quantitative Lexicology and Variationist Linguistics and that we believe could be interesting for the 'Linguistic DNA' project. Specifically, we will introduce the distinction between formal and conceptual onomasiological variation, with a further distinction between direct and indirect approaches to the latter, and suggest that a formal onomasiological and an indirect conceptual onomasiological perspective could be the most relevant ones for the 'Linguistic DNA' project. We will illustrate these perspectives, with a methodological focus on the diagnostic concept of ‘onomasiological profile’ and the use of semantic vector spaces.