Linguistics/Orthography

Introduction
Quick: take a look around yourself and see how many examples of writing are within view.

While languages have been in use for tens of millennia, writing has had a comparatively short lifespan.

An individual unit of writing is known as a glyph. By analogy with the phoneme in phonology and the morpheme in morphology, we can speak of graphemes and allographs in orthography. A grapheme is a fundamental unit of written language, and an allograph is any acceptable instantiation of the grapheme in writing. It is standard linguistic practice to enclose graphemic representations in angle brackets ⟨⟩.

Directionality
The most common directions for written language are horizontal rows of left-to-right (ltr) or right-to-left (rtl) text. Examples of languages written left-to-right include English, Hindi, and Thai. Examples of right-to-left writing include Arabic, Hebrew, and N'ko.
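As a rough illustration, the Unicode character database records a horizontal direction class for every character, and Python's standard unicodedata module exposes it. The sketch below checks one letter from each of the scripts mentioned above. The helper name base_direction and the choice of sample letters are assumptions made for this example, and the classes say nothing about vertical layout.

```python
import unicodedata

def base_direction(ch):
    """Map a character's Unicode bidirectional class to a coarse label."""
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("R", "AL"):   # right-to-left letter, Arabic letter
        return "rtl"
    if bidi == "L":           # left-to-right letter
        return "ltr"
    return "neutral"          # digits, punctuation, spaces, and so on

# One sample letter per script (an arbitrary choice for illustration).
samples = {
    "English": "A",
    "Hindi": "\u0915",    # Devanagari letter ka
    "Thai": "\u0E01",     # Thai character ko kai
    "Arabic": "\u0628",   # Arabic letter beh
    "Hebrew": "\u05D6",   # Hebrew letter zayin
    "N'ko": "\u07CA",     # N'ko letter a
}

directions = {name: base_direction(ch) for name, ch in samples.items()}
for name, direction in directions.items():
    print(name, direction)
```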

Vertical writing systems are also found, usually running from top to bottom (though bottom-to-top is attested). The vertical columns may be ordered either left-to-right or right-to-left. The traditional Mongolian script (largely replaced by Cyrillic in Mongolia itself) is a vertical script whose columns are ordered left-to-right, while Japanese may be written either horizontally from left to right or in vertical columns ordered from right to left. Bottom-to-top writing includes some instances of the Berber script Tifinagh.

In rare cases writing systems may use combinations of these directions. The term boustrophedon (meaning 'ox-turning' in Greek) generally refers to horizontal scripts in which successive lines alternate in direction; the characters themselves may also be mirrored on the reversed lines.
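The alternation of line direction can be sketched in a few lines of Python. This is only a toy model: the function name boustrophedon is made up for this example, it assumes plain letters without combining marks, and it reverses the order of characters without mirroring their shapes.

```python
def boustrophedon(lines):
    """Return the lines with every second line reversed.

    Lines at even positions (0, 2, ...) keep their order; lines at odd
    positions are reversed, so the reading direction alternates.
    """
    return [line if i % 2 == 0 else line[::-1]
            for i, line in enumerate(lines)]

for row in boustrophedon(["ABC", "DEF", "GHI"]):
    print(row)
```

A reader following the output would read the first line left-to-right, the second right-to-left, and the third left-to-right again.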

Types of spelling systems
Phonemic, morpho-phonemic, defective, complex

Alphabets


Given that you are reading this book, you are probably already familiar with at least one alphabet. The concept underlying alphabets is essentially that they represent the sounds of a language rather than the meaning, and that consonants and vowels are treated in the same way, i.e. the letters ⟨a,e,i,o,u⟩ are not different in any fundamental way from the letters ⟨b,c,d,f,g...⟩.

One might say that idealized alphabets assign one letter per segment, whether consonant or vowel. In reality many alphabets deviate from this somewhat – for instance, in English we use the digraph (two-letter combination) ⟨sh⟩ to represent the single sound /ʃ/.

Abjads
An abjad (also known as a consonantary or a consonantal alphabet) is a writing system where words are generally written only with characters for consonants. The Modern Hebrew writing system employs an abjad; for example, the Modern Hebrew word /zanav/ 'tail' is written ⟨זנב⟩ (right-to-left), which would correspond to ⟨znv⟩ in the Latin alphabet. Characters for certain vowels may also be used, and a system of vowel marking (or vocalization) may be available but is not usually employed.
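The ⟨זנב⟩ example can be inspected with Python's standard unicodedata module. The sketch below is illustrative only: the three-letter romanization table and the helper consonant_skeleton are assumptions made for this example and cover just this word, not Hebrew orthography in general.

```python
import unicodedata

# The word is stored in reading order: zayin, nun, bet.
word = "\u05D6\u05E0\u05D1"
names = [unicodedata.name(ch) for ch in word]

# Hypothetical romanization table covering only these three letters.
ROMAN = {"\u05D6": "z", "\u05E0": "n", "\u05D1": "v"}
written = "".join(ROMAN[ch] for ch in word)

def consonant_skeleton(phonemic, vowels="aeiou"):
    """Drop vowel segments, keeping only what an abjad would write."""
    return "".join(seg for seg in phonemic if seg not in vowels)

print(names)
print(written, consonant_skeleton("zanav"))
```

Both routes arrive at the same three consonants: the letters actually written, and the phonemic form /zanav/ with its vowels removed.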

Abjads are a less common form of writing system today; however, most modern alphabets as well as many other writing systems have descended from the Proto-Canaanite script, a consonantal script used in the late Bronze Age. The most prominent abjads still in use today are the Arabic and Hebrew writing systems.

Abugidas
An abugida (or alphasyllabary) is similar to an abjad in which vocalization is obligatory – consonants are the basic graphemic units, but they are modified (usually with diacritics) depending on what vowel comes after them. Vowels without preceding consonants may be written either with separate independent graphemes, or by using the usual vowel diacritics on a null consonant glyph. Examples of languages which use abugidas include Hindi, Thai, and Amharic.

In many abugidas consonants have a "default" or inherent vowel, which is assumed to follow them if they are not marked for any other vowel. A consonant may be marked with a special symbol (known as a virama, from Sanskrit) or otherwise modified to suppress the inherent vowel. For example, in the Devanagari script, which is used to write a number of Indic languages including Hindi and Marathi, any consonantal grapheme in isolation is presumed to be followed by the vowel /a/. If a consonant is followed by another consonant without an intervening vowel, the cluster is written as a ligature (a glyph composed of multiple graphemes combined together).
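The inherent-vowel rule lends itself to a small worked example. The following sketch romanizes a handful of Devanagari sequences. The tables cover only three consonants and three vowel signs, the function name romanize is an assumption made for this example, and language-specific details such as the dropping of a word-final inherent vowel in Hindi are deliberately ignored.

```python
VIRAMA = "\u094D"  # Devanagari sign virama: suppresses the inherent vowel

# Tiny illustrative tables, not a full transliteration scheme.
CONSONANTS = {"\u0915": "k", "\u0924": "t", "\u092E": "m"}
VOWEL_SIGNS = {"\u093E": "aa", "\u093F": "i", "\u0941": "u"}

def romanize(text):
    """Romanize a string made only of the consonants and signs above."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch not in CONSONANTS:
            raise ValueError(f"unsupported character: {ch!r}")
        out.append(CONSONANTS[ch])
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if nxt == VIRAMA:
            i += 2                      # vowel suppressed: bare consonant
        elif nxt in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[nxt])
            i += 2                      # explicit dependent vowel sign
        else:
            out.append("a")             # no mark: inherent vowel /a/
            i += 1
    return "".join(out)

print(romanize("\u0915"))              # consonant alone
print(romanize("\u0915\u093F"))        # consonant + vowel sign i
print(romanize("\u0915\u094D\u0924"))  # consonant + virama + consonant
```

In the stored text the cluster is encoded as consonant, virama, consonant; rendering it as a ligature is left to the font.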

Syllabaries
A syllabary assigns one grapheme to each syllable. Languages that use syllabic writing include Japanese (in one of its scripts), Cherokee, and Yi. Languages with syllabaries tend to have simpler phonotactics, since a larger number of possible syllables translates into a larger inventory of graphemes.
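The link between phonotactics and inventory size can be made concrete with some simple counting. The sketch below uses invented figures (10 consonants and 5 vowels) and a hypothetical helper, syllable_inventory. It also assumes that every combination of sounds is a possible syllable, which real languages rarely allow.

```python
def syllable_inventory(n_consonants, n_vowels, allow_coda=False):
    """Count possible syllables of shape (C)V, or (C)V(C) with codas."""
    onsets = n_consonants + 1                       # any consonant, or none
    codas = n_consonants + 1 if allow_coda else 1   # optional final consonant
    return onsets * n_vowels * codas

open_only = syllable_inventory(10, 5)                    # (C)V syllables
with_codas = syllable_inventory(10, 5, allow_coda=True)  # (C)V(C) syllables
print(open_only, with_codas)
```

Under these assumptions, allowing a final consonant raises the number of graphemes needed from 55 to 605, which suggests why a pure syllabary is more practical for languages with simple syllable structure.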

Syllabaries are distinct from abugidas in that similar syllables (e.g. ga and gi) are not necessarily related to each other systematically.

Pictographies and Ideographies
Pictographies and ideographies both represent ideas using picture-like symbols. They are different in that pictograms are more accurate graphical representations of the objects they symbolise, whereas ideograms tend to be more abstract. A pictogram can evolve into an ideogram as it reduces in complexity over time.

For example, the Chinese character for 'tiger' was originally pictographic in oracle script (left). Over time, it evolved into the modern character, an ideogram (right):

Logographies


A logography (or logographic script) is a writing system in which graphemes generally represent words or morphemes rather than sounds. Individual characters in a logographic script are known as logograms.

Logographic scripts are not necessarily ideographic or pictographic. For instance, Chinese characters are not always ideographic, since they may sometimes be used purely for their phonetic content, and they usually have an opaque derivation rather than being transparently pictographic.

The term hieroglyphs or hieroglyphics may be used to refer to logograms, but it is more often used to refer specifically to Ancient Egyptian.

Isolated logograms may be used in non-logographic scripts. For instance, the numerals and mathematical symbols used in English writing are logograms: 1 "one", 2 "two", + "plus", = "equals", and so on. In English, the ampersand & is used for "and" and "et" (as in &c for "et cetera"), % for "percent", $ for "dollar", # for "number", € for "euro", £ for "pound", etc. Note that logograms such as 1 are rarely used for phonetic value: one would not write something like "The team 1 the soccer game." Also note that 1 may have a different phonetic value depending on context: cf. 1 "one" and 1st "first".
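The point that a logogram stands for a word rather than a fixed sound can be sketched as a lookup table from symbols to readings. The tables and the function read_aloud below are assumptions made for this example and cover only a few symbols. Ordinals are looked up first, which is how the same numeral 1 ends up with two different readings.

```python
# Hypothetical reading tables for a few logograms used in English writing.
READINGS = {"1": "one", "2": "two", "+": "plus", "=": "equals",
            "&": "and", "%": "percent"}
ORDINALS = {"1st": "first", "2nd": "second"}

def read_aloud(text):
    """Replace each space-separated logogram with its English reading."""
    words = []
    for token in text.split():
        if token in ORDINALS:        # context changes the reading of 1 and 2
            words.append(ORDINALS[token])
        else:
            words.append(READINGS.get(token, token))
    return " ".join(words)

print(read_aloud("1 + 1 = 2"))
print(read_aloud("the 1st & the 2nd"))
```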

Mixed scripts
Some languages use multiple types of script in writing. Such a form of writing may be known as a mixed script. Examples of mixed scripts include Egyptian Hieroglyphs and Japanese writing.

Unwritten language and new orthographies
In the past many non-native linguists created defective orthographies for previously unwritten languages, failing to mark important features such as tone and vowel length which they could not distinguish themselves, and sometimes marking unimportant allophonic detail.

Many contemporary new orthographies are modeled after the IPA. For instance, many languages in parts of Africa have the 7-vowel system /a ɛ e i ɔ o u/, and their recently introduced orthographies commonly make use of the graphemes ⟨ɛ, ɔ⟩.

Exercise 1: Nēhinawēwin
Swampy Cree (Nēhinawēwin) is a dialect of Cree, an Algonquian language spoken in Manitoba and Ontario. It is one of a number of Canadian languages written in one of the related orthographies known collectively as Canadian Aboriginal syllabics (or just syllabics). The following is a Swampy Cree inscription from Winnipeg, written in syllabics. Using the transcription of the text in the Latin alphabet, given below, determine what rules govern the characters in Cree syllabics, and what type of script you think it should be considered.



Êwako oma asiniwi mênikan kiminawak

ininiwak manitopa kaayacik. Êwakwanik oki

kanocihtacik asiniwiatoskiininiw kakiminihcik

omêniw. Akwani mitahtomitanaw askiy asay

êatoskêcik ota manitopa.