Open Science

Frequency patterns of semantic change: corpus-based evidence of a near-critical dynamics in language change

Published: 07 June 2017

It is generally believed that when a linguistic item acquires a new meaning, its overall frequency of use rises with time with an S-shaped growth curve. Yet, this claim has only been supported by a limited number of case studies. In this paper, we provide the first corpus-based large-scale confirmation of the S-curve in language change. Moreover, we uncover another generic pattern, a latency phase preceding the S-growth, during which the frequency remains close to constant. We propose a usage-based model which predicts both phases, the latency and the S-growth. The driving mechanism is a random walk in the space of frequency of use. The underlying deterministic dynamics highlights the role of a control parameter which tunes the system at the vicinity of a saddle-node bifurcation. In the neighbourhood of the critical point, the latency phase corresponds to the diffusion time over the critical region, and the S-growth to the fast convergence that follows. The durations of the two phases are computed as specific first-passage times, leading to distributions that fit well the ones extracted from our dataset. We argue that our results are not specific to the studied corpus, but apply to semantic change in general.

1. Introduction

Language can be approached through three different, complementary perspectives. Ultimately, it exists in the mind of language users, so that it is a cognitive entity, rooted in a neuropsychological basis. But language exists only because people interact with each other: It corresponds to a convention among a community of speakers, and answers to their communicative needs. Thirdly, language can be seen as something in itself: An autonomous, emergent entity, obeying its own inner logic. If it was not for this third Dasein of language, it would be less obvious to speak of language change as such.

The social and cognitive nature of language informs and constrains this inner consistency. Zipf’s Law, for instance, may be seen as resulting from a trade-off between the ease of producing the utterance, and the ease of processing it [1]. It relies thus both on the cognitive grounding of the language, and on its communicative nature. Those two external facets of language, cognitive and sociological, are similarly expected to channel the regularities of linguistic change. Modelling attempts (see [2] for an overview) have explored both how sociolinguistic factors can shape the process of this change [3,4] and how this change arises through language learning by new generations of users [5,6]. Some models also consider mutations of language itself, without providing further details on the social or cognitive mechanisms of change [7]. In this paper, we adopt the view that language change is initiated by language use, which is the repeated call to one’s linguistic resources in order to express oneself or to make sense of the linguistic productions of others. This approach is in line with exemplar models [8] and related works, such as the Utterance Selection Model [9] or the model proposed by Victorri [10], which describes an out-of-equilibrium shaping of semantic structure through repeated events of communication.

Leaving aside sociolinguistic factors, we focus on a cognitive approach to linguistic change, more precisely to semantic expansion. Semantic expansion occurs when a new meaning is gained by a word or a construction (we will henceforth refer more vaguely to a linguistic ‘form’, so as to remain as general as possible). For instance, way, in the construction way too, has come to serve as an intensifier (e.g. ‘The only other newspaper in the history of Neopia is the Ugga Ugg Times, which, of course, is way too prehistoric to read.’ [11]). The fact that polysemy is pervasive in any language [12] suggests that semantic expansion is a common process of language change and happens constantly throughout the history of a language. Grammaticalization [13], a process by which forms acquire a (more) grammatical status, like the example of way too above, and other interesting phenomena of language change [14,15] fall within the scope of semantic expansion.

Semantic change is known to be associated with an increase in frequency of use of the form whose meaning expands. This increase is indeed expected: as the form comes to carry more meanings, it is used in a broader range of contexts, hence more often. This implies that any instance of semantic change should have its empirical counterpart in a rise in the frequency of use of the form. This rise is furthermore believed to follow an S-curve. The main reference on this phenomenon remains undisputedly the work of Kroch [16], who unfortunately grounds his claim on a handful of examples only. It has nonetheless become an established fact in the literature of language change [17]. The origin of this pattern remained largely undiscussed until recently: Blythe & Croft [18], in addition to an up-to-date aggregate survey of attested S-curve patterns in the literature (totalling about 40 cases of language change), proposed a modelling account of the S-curve. However, they show that, in their framework, the novelty can rise only if it is deemed better than the old variant, a claim which clearly does not hold in all instances of language change. Their attempt also suffers, like most modelling work on the S-curve, from what is known as the Threshold Problem: a novelty will fail to take over an entire community of speakers, because of the isolated status of an exceptional deviation [19], unless a significant fraction of spontaneous adopters supports it initially.

On the other hand, the S-curve is not a universal pattern of frequency change in language. From a recent survey of the frequency evolution of 14 words relating to climate science [20], it appears that the S-curve could not account for most of the frequency changes, and that a more general Bass curve would be appropriate instead. Along the same lines, Ghanbarnejad et al. [21] investigated 30 instances of language change: 10 regarding the regularization of tense in English verbs (e.g. cleave, clove, cloven > cleave, cleaved, cleaved), 12 relating to the transliteration of Russian names in English (e.g. Stroganoff > Stroganov) and eight to spelling changes in German words (ss>ß>ss) following two different orthographic reforms (in 1901 and 1996). They showed that the S-curve is not universal and that, in some cases, the trajectory of change rather follows an exponential. This would be due to the preponderance of an external driving impetus over the other mechanisms of change, among them social imitation. The non-universality of the S-curve contrasts with the survey in [18], and is probably due to the specific nature of the investigated changes (which, for the spelling ones, relate mostly to academic conventions and affect the language system very little). This hypothesis tends to be confirmed by the observation that, for the regularization of tense marking, an S-curve is observed most of the time (7 out of 10). It must also be stressed that none of these changes are semantic changes.

In this paper, we provide a broad corpus-based investigation of the frequency patterns associated with about 400 semantic expansions (about 10 times as many cases as in the aggregate survey of Blythe & Croft [18]). It turns out that the S-curve pattern is corroborated, but must be completed by a preceding latency part, in which the frequency of the form does not significantly increase, even if the new meaning is already present in the language. This statistical survey also allows us to obtain statistical distributions for the relevant quantities describing the S-curve pattern (the rate, width and length of the preceding latency part).

Apart from this data foraging, we provide a usage-based model of the process of semantic expansion, implementing basic cognitive hypotheses regarding language use. By means of our model, we relate the microprocess of language use at the individual scale, to the observed macro-phenomenon of a recurring frequency pattern occurring in semantic expansion. The merit of this model is to provide a unified theoretical picture of both the latency and the S-curve, which are understood in relation with Cognitive Linguistics notions such as inference and semantic organization. It also predicts that the statistical distributions for the latency time and for the growth time should be of the same family as the inverse Gaussian distribution, a claim which is in line with our data survey.

2. Quantifying change from corpus data

We worked on the French textual database Frantext [22], to our knowledge the only textual database allowing for a reliable study covering several centuries (see Material and methods and electronic supplementary material, SIII). We studied changes in frequency of use for 408 forms which have undergone one or several semantic expansions, over a time range going from 1321 up to the present day. We chose forms so as to focus on semantic expansions leading to a functional meaning, such as discursive, prepositional or procedural meanings. Semantic expansions whose outcome remains in the lexical realm (such as the one undergone by sentence, whose meaning evolved from ‘verdict, judgment’ to ‘meaningful string of words’) have been left out. Functional meanings indeed present several advantages: they are often accompanied by a change of syntagmatic context, which allows the semantic expansion to be tracked more accurately (e.g. way in way too + adj.); they are also less sensitive to sociocultural and historical influences; finally, they are less dependent on the specific content of a text, be it literary or academic.

The profiles of frequency of use extracted from the database are illustrated in figure 1 for nine forms. We find that 295 cases (more than 70% of the total) display at least one sigmoidal increase of frequency in the course of their evolution, significant at the p = 0.05 level compared to a random growth. We provide a small selection of the observed frequency patterns (figure 2), whose associated logit transforms (figure 3) follow a linear behaviour, indicative of the sigmoidal nature of the growth (see Material and methods). We thus find a robust statistical validation of the sigmoidal pattern, confirming the general claim made in the literature.

Figure 2. Extracted pattern of frequency rise for nine selected forms. The latency period and the S-growth are separated by a red vertical line.

Figure 3. Logit transforms of the S-growth part of the preceding curves. Red dots correspond to data points and the green line to the linear fit of this set of points. The r² coefficient of the linear fit is also displayed.
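The logit test can be illustrated with a minimal Python sketch. It is not the procedure of Material and methods: the rescaling of the frequencies to (0, 1), the 1% padding that keeps the end points finite, and the names `logit_fit` and `synthetic` are assumptions made for illustration, and the data are synthetic rather than taken from Frantext.

```python
import numpy as np

def logit_fit(freqs):
    """Fit a straight line to the logit of a rescaled frequency series.

    Returns (slope, intercept, r2). The rescaling convention is an
    assumption of this sketch, not the paper's procedure.
    """
    y = np.asarray(freqs, dtype=float)
    t = np.arange(len(y), dtype=float)
    # Rescale to the open interval (0, 1); the padding avoids log(0).
    pad = 0.01 * (y.max() - y.min())
    lo, hi = y.min() - pad, y.max() + pad
    z = (y - lo) / (hi - lo)
    logit = np.log(z / (1.0 - z))
    # Ordinary least-squares line through the logit-transformed points.
    slope, intercept = np.polyfit(t, logit, 1)
    residuals = logit - (slope * t + intercept)
    r2 = 1.0 - residuals.var() / logit.var()
    return slope, intercept, r2

# Synthetic example: a noiseless sigmoid sampled over 12 time windows.
t = np.arange(12)
synthetic = 2e-5 + 8e-5 / (1.0 + np.exp(-0.9 * (t - 5.5)))
slope, intercept, r2 = logit_fit(synthetic)
```

An r² close to 1 indicates that the growth is well described by a sigmoid; the slope of the line gives the rate of the S-curve in units of inverse time windows.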

Furthermore, we find two major phenomena besides this sigmoidal pattern. The first one is that, in most cases, the final plateau towards which the frequency is expected to stabilize after its sigmoidal rise is not to be found: The frequency immediately starts to decrease after having reached a maximum (figure 1). However, such a decrease process is not symmetrical with the increase, in contrast with other cases of fashion-driven evolution in language, e.g. first names distribution [23]. Though this decrease may be, in a few handfuls of cases, imputable to the disappearance of a form (e.g. après ce, replaced in Modern French by après quoi), in most cases it is more likely to be the sign of a narrowing of its uses (equivalent, then, to a semantic depletion).

The second feature is that the fast growth is most often (in 69% of cases) preceded by a long latency up to several centuries, during which the new form is used, but with a comparatively low and rather stable frequency (figure 2). How the latency time is extracted from data is explained in Material and methods. One should note that the latency times may be underestimated: If the average frequency is very low during the latency part, the word may not show up at all in the corpus, especially in decades for which the available texts are sparse. The pattern of frequency increase is thus better conceived of as a latency followed by a growth, as exemplified by de toute façon (figure 4)—best translated by anyway in English, because the present meanings of these two terms are very close, and remarkably, despite quite different origins, the two have followed parallel paths of change.

Figure 4. Overall evolution of the frequency of use of *de toute façon* (main panel), with focus on the S-shape increase (left inner panel), whose logit transformation follows a linear fit (right inner panel) with an r² of 0.996. Preceding the S-growth, one observes a long period of very low frequency (up to 35 decades).

To our knowledge, this latency feature has not been documented before, even though a number of specific cases of sporadic use of the novelty before the fast growth have been noticed. For instance, it has been remarked in the case of just because that the fast increase is only one stage in the evolution [24]. Other examples have been mentioned [25], but the phenomenon was described there as the slow start of the sigmoid. On the other hand, the absence of a stable plateau has been observed and theorized as a ‘reversible change’ [26] or a ‘change reversal’ [27], and was seen as an occasional deviation from the usual S-curve, not as a pervasive feature of the evolution. We rather interpret it as an effect of the constant interplay of forms in language, resulting in ever-changing boundaries for most of their respective semantic dominions.

In the following, we propose a model describing both the latency and the S-growth periods. The study of this decrease of frequency following the S-growth is left for future work.

3. Model

3.1. A cognitive scenario

To account for the specific frequency pattern evidenced by our data analysis, we propose a scenario focusing on cognitive aspects of language use, leaving all sociolinguistic effects backgrounded by making use of a representative agent, mean-field type, approach. We limit ourselves to the case of a competition between two linguistic variants, given that most cases of semantic expansion can be understood as such, even if the two competing variants cannot always be explicitly identified. Indeed, the variants need not be individual forms, and can be schematic constructions, paradigms of forms or abstract patterns. Furthermore, the competition is more likely to be local, and to involve a specific and limited region of the semantic territory. If the invaded form occupies a large semantic dominion, then losing a competition on its border will only affect its meaning marginally, so that the competition can fail to be perceptible from the point of view of the established form.

The idealized picture is therefore as follows: initially, in some concept or context of use C₁, one of the two variants, henceforth denoted Y, is systematically chosen, so that it conventionally expresses this concept. The question we address is thus how a new variant, say X, can come to be used in this context and eventually evict the old variant Y.

The main hypothesis we propose is that the new variant almost never is a brand new merging of phonemes whose meaning would pop out of nowhere. As Haspelmath highlights [28], a new variant is almost always a periphrastic construction, i.e. actual parts of language, put together in a new, meaningful way. Furthermore, such a construction, though it may be exapted to a new use, may have shown up from time to time in the time course of the language history, in an entirely compositional way; this is the case for par ailleurs, which incidentally appears as early as the fourteenth century in our corpus, but arises as a construction in its own right during the first part of the nineteenth century only. In other words, the use of a linguistic form X in a context C₁ may be entirely new, but the form X was most probably already there in another context of use C₀, or equivalently, with another meaning.

We make use of the well-grounded idea [29] that there exist links between concepts due to the intrinsic polysemy of language: There are no isolated meanings, as each concept is interwoven with many others, in a complicated tapestry. These links between concepts are asymmetrical, and they can express both universal mappings between concepts [30,31] and cultural ones (e.g. entrenched metaphors [32]). As the conceptual texture of language is a complex network of living relations rather than a collection of isolated and self-sufficient monads, semantic change is expected to happen as the natural course of language evolution and to occur repetitively throughout its history, so that at any point of time, there are always several parts of language which are undergoing changes. The simplest layout accounting for this network structure in a competitive situation consists then in two sites, such that one is influencing the other through a cognitive connexion of some sort.

3.2. Model formalism

We now provide details on the modelling of a competition between two variants X and Y for a given context of use, or concept, C₁, also considering the effect exerted by the related context or concept C₀ on this evolution.

— Each concept C_i, i=0,1, is represented by a set of exemplars of the different linguistic forms. We denote by Nⁱ_μ(t) the number at time t of encoded exemplars (or occurrences) of form μ∈{X,Y}, in context C_i, in the memory of the representative agent.
— The memory capacity of an individual being finite, the population of exemplars attached to each concept C_i has a finite size M_i. For simplicity we assume that all memory sizes are equal (M₀=M₁=M). As we consider only two forms X and Y, for each i the relation Nⁱ_X(t)+Nⁱ_Y(t)=M always holds. We can therefore focus on one of the two forms, here X, and drop the form subscript, granted that all quantities refer to X.
— The absolute frequency of form X at time t in context C_i—the fraction of ‘balls’ of type X in the bag attached to C_i—is thus given by the ratio Nⁱ(t)/M. In the initial situation, X and Y are assumed to be established conventions for the expression of C₀ and C₁, respectively, so that we start with N⁰(t=0)=M and N¹(t=0)=0.
— Finally, C₀ exerts an influence on context C₁, but this influence is assumed to be unilateral. Consequently, the content of C₀ will not change in the course of the evolution and we can focus on C₁. An absence of explicit indication of context is thus to be understood as referring to C₁.

The dynamics of the system runs as follows. At each time t, one of the two linguistic forms is chosen to express concept C₁. The form X is uttered with some probability P(t), to be specified below, and Y with probability 1−P(t). To keep constant the memory size of the population of occurrences in C₁, a past occurrence is randomly chosen (with a uniform distribution) and the new occurrence takes its place. This dynamics is then repeated a large number of times. Note that this model focuses on a speaker perspective (for alternative variants, see electronic supplementary material, SIIA).

We want to make explicit the way P(t) depends on x(t), the absolute frequency of X in this context at time t. The simplest choice would be P(t)=x(t). However, we wish to take into account several facts. As context C₀ exerts an influence on context C₁, denoting by γ the strength of this influence (see electronic supplementary material, SIIB for an extended discussion on this parameter), we assume the probability P to rather depend on an effective frequency f(t) (figure 5a),

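\[ f(t) = \frac{N^{1}(t) + \gamma N^{0}(t)}{M + \gamma M} = \frac{x(t) + \gamma}{1 + \gamma} \tag{3.1} \]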

We now specify the probability P(f) to select X at time t as a function of f=f(t). First, P(f) must be nonlinear. Otherwise, the change would occur with certainty as soon as the effective frequency f of the novelty is non-zero: That is, insofar as two meanings are related, the form expressing the former will also be recruited to express the latter. This change would also start quite abruptly, while sudden, instantaneous takeovers are not known to happen in language change. Second, one should preserve the symmetry between the two forms, that is, P(f)=1−P(1−f), as well as verify P(0)=0 and P(1)=1. Note that this symmetry is stated in terms of the effective frequency f instead of the actual frequency x, as production in one context always accounts for the contents of neighbouring ones.
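The need for nonlinearity can be made concrete with a short worked computation. It assumes the effective frequency takes the form f = (x+γ)/(1+γ), as suggested by the description of figure 5a, and it uses the replacement rule stated above: an X occurrence is gained when X is produced and a Y occurrence is erased, and lost in the opposite case. With the linear choice P(f) = f, the mean change of x in one iteration is

```latex
% Linear choice P(f) = f, with f = (x + \gamma)/(1 + \gamma) (assumed form of f).
% Gain of one X occurrence: probability (1 - x) P; loss: probability x (1 - P).
\begin{align*}
\mathbb{E}[\Delta x]
  &= \frac{1}{M}\Big[(1 - x)\,P - x\,(1 - P)\Big]
   = \frac{1}{M}\,(P - x) \\
  &= \frac{1}{M}\left(\frac{x + \gamma}{1 + \gamma} - x\right)
   = \frac{1}{M}\,\frac{\gamma\,(1 - x)}{1 + \gamma} \;>\; 0
   \quad \text{for } \gamma > 0,\ x < 1 .
\end{align*}
```

The mean step is strictly positive for every x < 1 as soon as γ is non-zero, so in the linear case the new variant is always pushed towards takeover. A nonlinear P(f) is what allows this mean step to become small, or to change sign, over part of the interval.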

Figure 5. Schematic of model mechanisms. (a) Difference between absolute frequency x and effective frequency f in context C₁. Absolute frequency x is given by the ratio of X occurrences encoded in C₁. Effective frequency f also takes into account the M occurrences contained in the influential context C₀, with a weight γ standing for the strength of this influence. (b) Schematic view of the process. At each iteration, either X or Y is chosen to be produced and thus encoded in memory, with respective probability P_γ(x) and 1−P_γ(x); the produced occurrence is represented here in the purple capsule. Another occurrence, already encoded in the memory, is uniformly chosen to be erased (red circle) so as to keep the population size constant. Hence the number of X occurrences, N_X, either increases by 1 if X is produced and Y is erased, decreases by 1 if Y is produced and X is erased, or remains constant if the erased occurrence is of the same form as the one produced.

For the numerical simulations, we made the following specific choice which satisfies these constraints:

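\[ P(f) = \frac{1}{2}\left\{ 1 + \tanh\left( \beta\, \frac{f - (1 - f)}{\sqrt{f\,(1 - f)}} \right) \right\} \tag{3.2} \]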

where β is a parameter governing the nonlinearity of the curve. Replacing f in terms of x, the probability to choose X is thus a function P_γ(x) of the current absolute frequency x:

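\[ P_{\gamma}(x) = \frac{1}{2}\left\{ 1 + \tanh\left( \beta\, \frac{2x - 1 + \gamma}{\sqrt{(x + \gamma)(1 - x)}} \right) \right\} \tag{3.3} \]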
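A production probability of this kind can be sketched in Python as follows. The sketch assumes an effective frequency f = (x+γ)/(1+γ) and a tanh-shaped P(f); both forms, the function names and the parameter values used when calling them are illustrative assumptions, chosen so that the stated constraints (P(0)=0, P(1)=1, P(f)=1−P(1−f)) hold.

```python
import math

def effective_frequency(x, gamma):
    """Effective frequency f seen in C1 (assumed form): the M occurrences
    of X stored in C0 are counted with weight gamma."""
    return (x + gamma) / (1.0 + gamma)

def production_probability(f, beta):
    """Assumed tanh-shaped probability of producing X at effective frequency f.

    beta tunes the nonlinearity; the end points are handled explicitly
    because the argument of tanh diverges there.
    """
    if f <= 0.0:
        return 0.0
    if f >= 1.0:
        return 1.0
    arg = beta * (f - (1.0 - f)) / math.sqrt(f * (1.0 - f))
    return 0.5 * (1.0 + math.tanh(arg))

def p_gamma(x, gamma, beta):
    """Probability of producing X as a function of the absolute frequency x."""
    return production_probability(effective_frequency(x, gamma), beta)
```

For instance, `production_probability(f, beta) + production_probability(1 - f, beta)` equals 1 up to rounding for any f in (0, 1), which is the symmetry between the two forms, while `p_gamma(0.0, gamma, beta)` is strictly positive whenever γ > 0: the influence of C₀ is what lets X be produced in C₁ before any X occurrence is stored there.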

3.3. Analysis: bifurcation and latency time

The dynamics outlined above (figure 5b) is equivalent to a random walk on the segment [0, 1] with a reflecting boundary at 0 and an absorbing one at 1, and with steps of size 1/M. The probability of going forwards at site x is equal to (1−x)P_γ(x), and the probability of going backwards is equal to x(1−P_γ(x)).
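This random walk can be simulated directly. The sketch below is illustrative only: the tanh form of P_γ(x), the function names and all parameter values are assumptions, not the settings used for the results of the paper. It also exposes the mean displacement per iteration, [(1−x)P_γ(x) − x(1−P_γ(x))]/M = (P_γ(x) − x)/M, which follows from the two step probabilities given above.

```python
import math
import random

def p_gamma(x, gamma, beta):
    """Assumed tanh-shaped probability of producing X at absolute frequency x."""
    f = (x + gamma) / (1.0 + gamma)  # effective frequency (assumed form)
    if f >= 1.0:
        return 1.0
    if f <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.tanh(beta * (2.0 * f - 1.0) / math.sqrt(f * (1.0 - f))))

def mean_step(x, gamma, beta, M):
    """Expected change of x in one iteration: (P_gamma(x) - x) / M."""
    return (p_gamma(x, gamma, beta) - x) / M

def simulate(M, gamma, beta, max_steps, seed=0):
    """Random walk of the number n of X occurrences among M memory slots.

    Starts at n = 0 (Y fully established) and stops early if n reaches M,
    which plays the role of the absorbing boundary at x = 1.
    """
    rng = random.Random(seed)
    n = 0
    trajectory = [0.0]
    for _ in range(max_steps):
        x = n / M
        produce_x = rng.random() < p_gamma(x, gamma, beta)  # new occurrence is X
        erase_x = rng.random() < x                          # erased occurrence is X
        if produce_x and not erase_x:
            n += 1  # forward step, probability (1 - x) * P_gamma(x)
        elif erase_x and not produce_x:
            n -= 1  # backward step, probability x * (1 - P_gamma(x))
        trajectory.append(n / M)
        if n == M:
            break
    return trajectory

# Hypothetical parameter values, for illustration only.
trajectory = simulate(M=1000, gamma=0.3, beta=0.8, max_steps=50_000)
```

Scanning `mean_step` over x for a given pair (γ, β) shows whether the deterministic part of the walk pushes x upwards everywhere or nearly vanishes over some region; in the latter case the walk has to diffuse across that region before the fast growth can start.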