Neurocognition of Language/Evolution of Language

Introduction
Most animals are able to communicate with their conspecifics, at least to some extent. For example, bees perform specific kinds of dances that signal the location of flowers relative to their hive. Apes can utter warning calls to inform other members of their group about an approaching predator, and wolves are able to show different social signals, like dominance or submission, thanks to a shared set of facial expressions and gestures.

However, the ability to use a complex, fully developed language seems to be an innate trait unique to humans, as it has not so far been observed in any other known species (e.g., Fisher & Marcus, 2006). Most animal communication systems consist of a limited set of utterances or behaviors, each with a very specific and narrow meaning. Humans, on the other hand, can associate abstract connotations with words; they can express complex chains of thought using an in principle unlimited lexicon and, thanks to the combinatorial power of syntax, build intricate sentences that express specific temporal or spatial relations between objects. Language at such a high level is thus both unique to and universal within the human species.

While studying the neurocognition of language, one might wonder about its origins. How, and under which circumstances, did language develop? Did it evolve through Darwinian natural selection? If so, why do we observe such a large number of different spoken languages today, given that natural pressures are unlikely to have produced all of them? Investigating the emergence of language is not an easy task. Researchers face such serious difficulties that the quest for the origins of language has provocatively been labeled “the hardest problem in science” (Kirby & Simon, 2003).

The first hindrance is definitional: what is language, and which qualities must a communication system possess to be classified as a language? Even if researchers agreed on a general definition, they would still face an even bigger issue: the lack of concrete empirical evidence. When studying the evolution of specific anatomical features or relatively simple motor tasks (e.g., walking), substantial evidence exists in the form of fossil and archaeological findings from different stages of development. By contrast, language combines complex cognitive and motor abilities, and its emergence cannot be traced back to a specific point in time. As a consequence, researchers are left with indirect measures such as comparative (physiological) studies of humans and apes, the examination of language development in children and, to some extent, fossil evidence.

Unfortunately, using these kinds of methods to make inferences about the evolution of language requires a considerable amount of conjecture. In fact, an early surge of interest in this field after the publication of Darwin’s “The Origin of Species” was extinguished by the Linguistic Society of Paris in 1866: most papers of that time apparently relied excessively on pure speculation about situations that early humans might have faced in the distant past. As a result, the Society banned the publication of any study on language evolution. A century later, thanks to more sophisticated methods and a better understanding of evolution, this area of research became attractive again, drawing in various disciplines such as psychology, linguistics and archaeology (Jackendoff, 2002).

To date, research on language evolution lacks generally accepted theories. In fact, most researchers offer their own take on the problem, which only increases the number of competing hypotheses (Kirby & Simon, 2003). The aim of this chapter is to outline some aspects of language evolution that enjoy a relatively high degree of acceptance within the scientific community. Building on these aspects, a relatively recent theory of language evolution, the Mirror System Hypothesis (Arbib, 2012), will then be presented.

Three rather uncontroversial characteristics of language evolution
Although few hard facts or generally accepted theories on language evolution are available so far, some specific hypotheses on this topic can be regarded as relatively well accepted. The following section presents three of these assumptions. First, it addresses the question of whether language has been a direct target of natural selection or rather emerged as a byproduct of the evolution of some other (cognitive) mechanism. Second, it introduces the concept of protolanguages as intermediate steps in evolution. Finally, it briefly discusses the significance of gestures in the evolution of language.

Language and Natural Selection
As language presumably provides a beneficial adaptation to many different external threats (e.g., by allowing group members to communicate while hunting dangerous animals), it may seem plausible to assume that humans’ ability to create and acquire languages was shaped by natural selection. However, many researchers have favored the side-effect theory, which holds that language is an evolutionary byproduct, probably an epiphenomenon of the brain’s increase in size and efficiency (e.g., Chomsky, 1988; Hauser, Chomsky & Fitch, 2002; Piattelli-Palmarini, 1989). Chomsky himself wrote the following about the evolutionary approach:

“''We know very little about what happens when 10¹⁰ neurons are crammed into something the size of a basketball, with further conditions imposed by the specific manner in which this system developed over time. It would be a serious error to suppose that all properties, or the interesting properties of the structures that evolved, can be ‘explained’ in terms of natural selection.''” (Chomsky, 1975, as cited in Jackendoff, 2002)

Supporters of the side-effect theory argue that complex abilities like language could not have evolved in the relatively short time span since humans diverged from other primates. Moreover, according to this theory, grammar in itself has no adaptive value. Grammar is also said to be incapable of gradual evolution; rather, it is a dichotomous trait that either exists in a species or does not (Harley, 2008). Nowadays, however, most researchers favor the natural-selection theory, because (i) there apparently was enough time for grammar to evolve and (ii) a full language is now widely believed to bring adaptive advantages. For example, warning other members of one’s group about threatening events that might occur in the future is only possible with a certain degree of grammar (Carroll, 2003; Jackendoff & Pinker, 2005; Pinker, 2003; Pinker & Jackendoff, 2005). Some researchers even go so far as to call theories that explain the emergence of language solely through increasing brain size, or as a fortunate accident, a “retreat to mysticism” (Jackendoff, 2002).

Patel (2008) approaches this ongoing discussion more cautiously and warns against jumping to premature conclusions: it is tempting to attribute the emergence of universal and uniquely human abilities like language to natural selection, but that is often unreasonable from a biological point of view. For example, no species other than Homo sapiens can create and control fire, which makes it a unique and universal skill. Yet it seems much more plausible that the mastery of fire developed as a consequence of humans’ increasing brainpower than that it was a direct target of natural selection. Taken to the extreme, nobody would seriously argue that the ability to play computer games arose through natural selection. Patel (2008) therefore proposes, as a research strategy, the null hypothesis that language has not been a direct target of natural selection. The appropriate scientific approach is then to gather evidence for the natural-selection theory that is strong enough to reject this null hypothesis. The following list briefly summarizes the evidence provided by Patel (2008):

•	Babbling: Healthy babies start to utter repetitive syllables at around 7 months of age (Locke, 1993), even if they are deaf (Oller & Eilers, 1988). If exposed to sign language, babies “babble” with their hands (Petitto et al., 2004). This can be taken as evidence for a training mechanism shaped by natural selection: human babies (and songbirds, too) are thus able to “learn the relationship between oral movements and auditory outcomes, in other words, to tune the perceptual-motor skills they will use in acquiring their species’ communication system” (Patel, 2008). Correspondingly, deaf babies exposed to sign language learn the relationship between gestural movements and visual input.

•	Anatomy of the human vocal tract: The human larynx is positioned much lower than in other primate species (e.g., apes) or in Homo neanderthalensis, which increases the range of sounds we can produce but also increases the risk of choking to death (Lieberman, 1984). This anatomical difference might be the result of natural selection shaping the body, ultimately providing a biological basis for modern speech.

•	Vocal learning: Humans are the only primates that make extensive use of vocal learning (Egnor & Hauser, 2004), i.e., listening to auditory input and trying to imitate the sounds, which suggests that this capacity was shaped by natural selection to support babies in their language acquisition.

•	Critical periods for language acquisition: Humans need to be exposed to their native language during a critical period (i.e., childhood, see also chapter 8) in order to master it (Lenneberg, 1967). This may be a mechanism that allows humans to acquire socially important competencies very early.

•	Different modalities of language: Speech is the main modality of human language, but true sign languages such as British Sign Language (BSL) are in no way inferior. In fact, they are just as complex as speech, sharing most components (e.g., phonology, syntax, morphology) and neural substrates in the left hemisphere with spoken language (Emmorey, 2002; see also chapter 9). The human drive to acquire language thus seems extremely powerful, since different modalities can be used interchangeably, which suggests that it was shaped by natural selection.

•	Mental predisposition for learning and creating languages: In 1977, a school for deaf children was founded in Nicaragua, where adults tried to teach the children lip reading and spoken Spanish. Before that, the children had been isolated from one another and unable to communicate. Interestingly, they started to develop their own sign language at school, completely without help from teachers or other adults. Younger students learned the language from older cohorts and added new grammatical elements, increasing its complexity from year to year (Senghas & Coppola, 2001). Over the course of approximately 25 years, this development led to the fully elaborated Nicaraguan Sign Language (ISN), demonstrating strong mental predispositions in humans to create languages.

•	Fixed language-related genes: Damage to the FOXP2 gene leads to severe impairments in language-related abilities, such as deficits in oral motor activity or weaker performance in grammatical and lexical judgments (Marcus & Fisher, 2003). Nonverbal abilities do not seem to be impaired as strongly as verbal abilities (Alcock et al., 2000a; 2000b). FOXP2 shows almost no variation within humans but differs from the versions found in other species. It seems to have become fixed within the past 200,000 years after having been a target of selection (Enard et al., 2002).

•	Biological cost of failure to acquire language: If language has been a target of natural selection, it must confer adaptive advantages. Accordingly, a human who failed to acquire language would most likely suffer a large drop in adaptive fitness, reducing his or her chances of reproduction (and thereby removing people without the ability for language from the gene pool).

Combining these different pieces of evidence, Patel (2008) concludes that rejecting the null hypothesis that language has not been a direct target of natural selection is justified, and he thus accepts the natural-selection hypothesis of language evolution. Fitch (2005) agrees and acknowledges natural selection as the main mechanism behind the emergence of language. However, he points out that it cannot account for every aspect of language, especially not for more recent ones like phonology and semantics. Although natural selection is clearly responsible for adaptive individual behavior, it seems unlikely to have shaped components of language that have to be culturally shared and socially accreted over generations. Fitch (2005) therefore adds a social selective force to the discussion, namely kin selection. Because this mechanism favors behavior that increases not only the fitness of the individual and its offspring but also the fitness of its kin, individuals whose communication systems benefit their kin will be selected. Furthermore, members of long-lived species accumulate many experiences and complex concepts over their lifetimes. Communicating this knowledge to their young increases the inclusive fitness of the kin.

Intermediate steps of language
The idea that language went from an animal-like communication system to a rich, modern language in a single step does not seem very credible (Fitch, 2005). Bickerton (1990) therefore introduced the term “protolanguage”, proposing that language evolved in two stages. In his view, protolanguage first arose with the emergence of Homo erectus about 1.6 million years ago. Like modern languages, protolanguage attached vocal labels to different concepts, but it differed from modern language in lacking syntax. Bickerton (1990) controversially hypothesized a very quick and spontaneous step from protolanguage to language about 50,000 years ago, after the appearance of Homo sapiens. Interestingly, some evidence supports his claims, as protolanguage can still be observed today, e.g., in pidgin languages (Bickerton, 1981), in language-deprived children (“Genie”; Curtiss, 1977, see also chapter 8.6.3), in the early language of infants, and in apes that have been taught language (Savage-Rumbaugh, Shanker & Taylor, 1998).

Two types of theories on protolanguage can be distinguished, namely synthetic and analytic models (Hurford, 2000; Fitch, 2005). In synthetic models, protolanguage consists of single words with specific meanings attached to them, as in Bickerton’s (1990) theory. Because the lexicon (i.e., the vocabulary) is assumed to have come first, the ingredient needed to reach the level of a modern language is a proper syntax for combining those single words into complex sentences. In analytic theories, by contrast, protolanguage is made up of holophrases. A holophrase is an indecomposable word with a very complex meaning (in fact representing a whole proposition) that can even change with context. For example, imagine a (hypothetical) word like “krumblak” signaling something like “beware of the lion sitting in the grass to your left”. An increasing amount of evidence favors analytic models, as holophrases still exist today (Wray, 2002): one-word utterances of infants seem to have rather complex connotations (e.g., “ebab” may represent something like “please give me some apple juice”), and even adults use them (e.g., “more” may mean “add even more meat to the stew” in the context of cooking). After reviewing data from several comparative studies of humans and animals, Fitch (2005) suggests “that it is easier to evolve complex learned structure in a vocal communications system (as evidenced by the multiple parallel evolution of such structure in birds and whales) than to evolve a system of combinatoric meaning”. He concludes that analytic theories seem to be gaining the upper hand in research (see also: Arbib, 2003; Bickerton, 2003).

Taken together, protolanguage is a very plausible concept supported by a relatively wide range of evidence. Many researchers agree that language went through different intermediate forms (e.g., Jackendoff, 2002; Fitch, 2005), though most likely more of them than Bickerton (1990) initially suggested. Bickerton himself adopted a more gradualist stance on protolanguage in his later work (Calvin & Bickerton, 2000), distancing himself from the idea of a sudden, huge leap from protolanguage to modern language.

Gestures as a starting point
Given the multimodality of language, it is very plausible to assume that manual gestures and signs, in addition to speech, played a prominent role in the evolution of language. This proposal is far from new; it was already made over 80 years ago (Paget, 1930). More recently, Corballis (2003; 2004) proposed that the first primate communication systems consisted almost entirely of gestures and that speech evolved from that point, freeing our hands from communication and thereby enabling us to wield tools while talking to conspecifics. McNeill (2012) agrees about the importance of gestures but states that gesture-first theories cannot be correct. In his view, speech and gesture are so closely bound together that they are equally important. Gesticulation accompanies most communication, expressing the same ideas as speech, so that both modalities form a single unit. Since a theory of language evolution needs to explain the nature of language, McNeill (2012) argues that gesture-first theories cannot be accurate: they do not account for the observed speech-gesture unity, but rather predict a clear dominance of speech and a minor, insignificant role for gesticulation.

Comparative studies do provide some evidence for the significance of gestures in the evolution of language, as the ape brain has been shown to resemble the human brain rather closely in terms of specialization (Cantalupo & Hopkins, 2001). Critically, Brodmann’s area 44, which probably participates in the orchestration of manual gestures in these species, is enlarged in the left hemisphere of chimpanzees and gorillas and corresponds to Broca’s area in humans, one of the classic areas involved in speech. Mirror neurons, which fire both when an animal performs an action and when it perceives another individual executing the same action, have been found in the homologous region of the monkey brain (Rizzolatti, Fadiga, Gallese & Fogassi, 1996). Brain imaging studies showed increased activation in corresponding areas of the human brain when participants executed or observed a grasping task, but not when they simply looked at objects (Grafton, Arbib, Fadiga & Rizzolatti, 1996). Some researchers argue that mirror neurons served as a ‘kick starter’ for language evolution, enabling imitation and therefore the development of the first manual signs (Stamenov & Gallese, 2002; Arbib, 2005).

Taking all this evidence into account, it appears very plausible that language either evolved starting from gestures, or that spoken language and gestural communication evolved together in a dynamic manner. The following section presents a modern theory of language evolution that builds on these findings.

Introducing the main assumptions of the Mirror System Hypothesis
The previous section suggested that language most likely evolved through natural selection and passed through several intermediate forms, and that gestures probably played an important role during this transition. The Mirror System Hypothesis, as proposed by Michael Arbib (2012), offers an intriguing explanation for the emergence of language that is consistent with all of these points. His main argument is that natural selection shaped the human body and brain, leading to the language-ready early Homo sapiens about 100,000 – 200,000 years ago. From this period on, humans were able to use the first protolanguages. Modern languages then developed over the following 50,000 – 100,000 years via cultural evolution. Arbib (2012) formulates three key hypotheses:

Hypothesis 1. There is no innate Universal Grammar: Arbib (2012), distancing himself from Chomsky’s claim of a genetically coded Universal Grammar, argues that the human brain adapted via natural selection for protolanguage rather than for modern languages. The human genome provides the readiness to learn and create languages but does not encode any syntactic knowledge.

Hypothesis 2. Language-readiness is multimodal: Sign language is in no way inferior to spoken language. Even facial expressions carry a great deal of information during conversation, suggesting a close link between speech and gestures. Therefore, it is assumed that the different modalities, speech and gesture, evolved together, eventually providing language-readiness.

Hypothesis 3. The Mirror System Hypothesis: The mechanism responsible for the rise of language evolved atop the mirror neuron system for grasping, thus providing the evolutionary basis for language parity (denoting that an utterance or gesture has the same meaning for both communication partners).

Language had to pass through several evolutionary stages in order to gain the properties required of either a protolanguage or a modern language. Table 1 summarizes these crucial stages as postulated by the Mirror System Hypothesis (Arbib, 2012), and the following section presents them in more detail.

Crucial stages in language evolution
As briefly mentioned above, Arbib (2012) postulates the existence of a mirror system for grasping as the starting point of language evolution. Such a system is present in monkeys and apes as well as in humans. In the human brain, it is located in Broca’s area (among other regions). Although it was not originally “intended” for communication, the mirror system provides the evolutionary basis for language: because it supports both performing and recognizing an action, it permits an extremely important property, namely language parity. Language parity means that a signal (i.e., a gesture or an utterance) means the same for both sender and perceiver. Importantly, the presence of a mirror system on its own is not sufficient for the emergence of language, as the case of monkeys and apes shows. In humans, natural selection shaped the brain so that the mirror system expanded its role.

One of these expansions is the ability to imitate. Both apes and humans are able to observe the behavior of other individuals and copy it. In simple tasks, like grasping and peeling a banana, this poses little difficulty. Acquiring a complex action plan with several subtasks, on the other hand, is much harder. Through long periods of observation and much trial and error, apes can still manage to acquire such a plan; this is the capacity for simple imitation. Humans, in contrast, can usually imitate even complex behavior after observing it briefly and after only a few trials. They are capable of complex imitation because they have the capacity for complex action analysis. When witnessing novel, complex behavior, we understand the actor’s intentions and may even recognize familiar sub-actions. By reproducing the known actions and practicing the novel ones, we can improve our skill and imitate the overall act in a rather short time. This ability, of course, has great adaptive value, as it enables humans to share important skills within the community. The connectivity of the mirror system with various other brain regions appears to have expanded vastly in humans compared to monkeys and apes, since complex imitation requires cognitive skills like joint attention that draw heavily on distributed systems of the brain.

With the capacity for complex imitation comes the capacity for pantomime, i.e., the ability “to use reduced forms of actions to convey aspects of other actions, objects, emotions, or feelings – the artless sketching of an action to indicate either the action itself or something associated with it” (Arbib, 2012). For example, by miming the act of eating, I can express that I am very hungry without actually eating or even having food in my hands. The obvious advantage of pantomime is that the object or situation it refers to need not be present, enabling a community to communicate about a range of topics without actually witnessing the specific situations. Unfortunately, pantomime is often very ambiguous. For example, moving one’s hands in a wavelike motion could suggest many things, like “water”, “snake”, or “swimming”. This limitation may have led to the development and use of a more abstract, open repertoire of signs with very specific meanings, i.e., protosign.

Protosign, the manual component of protolanguage, is claimed to provide the basis for protospeech (i.e., the vocal component of protolanguage). Importantly, protosign did not reach the state of a full language before protospeech first emerged. Rather, according to Arbib (2012), both modalities evolved together in an expanding spiral: whenever manual gestures advanced, vocal gestures profited from them (and vice versa). Together, the two components formed an analytic protolanguage based on holophrases, providing the “language-ready brain” of the first Homo sapiens.

At this point, biological evolution ends and the cultural evolution of language takes over, since natural selection appears rather unlikely to be responsible for the huge range of different languages today. For instance, why should German bear a greater adaptive value in response to environmental pressures in Germany than French would? Arbib (2012) proposes fractionation as one of the factors responsible for the cultural evolution of language. He provides the following simplified example: a tribe using protolanguage may have had two holophrases concerning “fire” that coincidentally share a sound sequence, e.g., “reboofalik” meaning “the fire burns” and “falikiwert” meaning “the fire cooks meat” (hypothetical examples). The tribe may therefore agree (probably implicitly) that “falik” denotes “fire”. The community eventually regularizes the term “reboo” for “burn” and “iwert” for “cooks meat”. Because the holophrases arose rather arbitrarily, “falik” comes after the word for “burn” but before the word for “cooks meat”. Some kind of convention about word order is needed to maintain the meaning, especially when new protowords are added. The tribe may therefore settle on “reboo falik” and “iwert falik”, thus creating a first syntactic rule. In this manner, though in reality through much more complex sequences, a modern language will gradually be established.
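The fractionation example above can be mimicked with a toy string-matching procedure. The following Python sketch is purely illustrative and rests on a simplifying assumption that is not part of Arbib’s (2012) account: that fractionation amounts to spotting the longest chunk two holophrases share, treating that chunk as a word of its own, and treating the leftovers as new words. The function names `shared_chunk`, `fractionate` and `regularize` are hypothetical, and the “action word first, then noun” ordering is simply the convention the tribe in the example happens to adopt.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def shared_chunk(a: str, b: str) -> str:
    """Return the longest substring that two holophrases have in common."""
    match = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


def fractionate(h1: str, h2: str) -> tuple[str, str, str]:
    """Split two holophrases into a shared chunk and their leftover parts.

    Toy assumption: the shared chunk is read as a word of its own (here the
    word for "fire"), and whatever remains of each holophrase becomes a new
    word for the rest of its meaning.
    """
    chunk = shared_chunk(h1, h2)
    if not chunk:
        raise ValueError("the holophrases share no material to fractionate")
    rest1 = h1.replace(chunk, "", 1)  # remove only the first occurrence
    rest2 = h2.replace(chunk, "", 1)
    return chunk, rest1, rest2


def regularize(action: str, noun: str) -> str:
    """Apply one fixed word-order convention: action word first, then noun."""
    return f"{action} {noun}"


if __name__ == "__main__":
    fire, burns, cooks_meat = fractionate("reboofalik", "falikiwert")
    print(fire, burns, cooks_meat)       # falik reboo iwert
    print(regularize(burns, fire))       # reboo falik
    print(regularize(cooks_meat, fire))  # iwert falik
```

Using `difflib.SequenceMatcher` here is only a convenience for locating a shared substring. In the hypothesis itself, fractionation is a gradual and probably implicit social process spread over many speakers and generations, not a deterministic algorithm applied to a pair of words.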

Summary
This chapter dealt with the evolution of language, a research topic with a vast array of different theories and very few “hard facts”, owing to serious issues like the lack of a definition of language and, more importantly, the lack of direct empirical evidence. Nonetheless, some aspects of language evolution have relatively good support within the scientific community, and three of them have been presented in this chapter. First, most researchers agree that natural selection was the driving force behind the emergence of language, rather than language being a mere epiphenomenon of the brain’s increase in size. Patel (2008) provides a non-exhaustive but rather comprehensive list of evidence for this claim. Second, language most likely went through several intermediate forms, probably including a protolanguage made up of holophrases, before becoming a modern, sophisticated language. Finally, given the multimodality of language, many researchers suppose that language either started from gestures or that speech and gesture evolved together.

The Mirror System Hypothesis as proposed by Arbib (2012) integrates all these aspects, thus providing an intriguing, modern approach to language evolution. Arbib (2012) assumes that the mirror system for grasping served as “scaffolding” for this process by offering the property of language parity. According to this model, the mechanisms supporting language evolved atop the mirror system during a phase of biological evolution (natural selection) and passed through several stages. With the emergence of protolanguage, the brain of early Homo sapiens can be seen as “language-ready”, possessing all the properties necessary to develop a more sophisticated, i.e., modern, language. The vast array of different languages then emerged through the cultural evolution of protolanguages. Despite taking recent research into account, however, the Mirror System Hypothesis is still exactly that: an unproven hypothesis, like every account of language evolution put forward so far.