Urdu Script

Lesson 1: Introduction
The Urdu alphabet comprises 38 letters, mostly derived from the Arabic and Persian writing systems. Like those languages, Urdu is written and read from right to left, and letters within a word are joined by means of ligatures. Learning the 38 letters, which may vary in size and shape depending on their position in a word and on the letters preceding or following them, requires significant practice.

In this lesson, we shall introduce the first two Urdu letters. As it happens, with just these two letters we shall be able to explore many of the basic rules of Urdu writing.

ا (Alif)
Alif is the first letter of the Urdu alphabet. It is often accompanied by a vowel marker (which we shall study in the next lesson), and so it can represent several different vowel sounds. On its own, it is usually pronounced like the 'a' in the English 'car'. We shall indicate long vowels by a macron in this text (e.g., long "a" shall be written "ā").

The table above shows Alif (connected with Be, the other letter in this lesson - see below). In its isolated position, it is a simple line, similar to the lower case English 'l'. Like 'l', Alif also represents the height of a "line" of Urdu writing, though, as we shall see, certain letters extend above or below the line.

Exercise 1.1: Locate the Alif in each cell of the table.

ب (Be)
Be is the second letter of the Urdu alphabet. It sounds like the English 'b'.

We shall study several letters similar to Be in subsequent lessons. These letters are collectively known as "bowl letters". Bowl letters are generally about half the height of the Alif and sit on the same line as the Alif. Be is distinguished by the single diacritic mark underneath its bowl. In the table above, note that the width of the Be changes significantly depending on its location in a word. When it is in the initial or medial position, it is compressed to about half its normal width. We will discuss bowl letters and their various permutations in more depth in a subsequent lesson.

Exercise 1.2: Locate all the Be letters in the table.

Exercise 1.3: Review the tables for both ا and ب, paying particular attention to when letters are connected and when they are not. Describe the pattern.

Terminal and Connecting Letters
Certain letters in Urdu cannot connect to the letter coming after them (i.e., the letter to their left), even if that letter is part of the same word. We shall refer to these letters as "terminal" (though they do not necessarily terminate a word). Such letters are also sometimes referred to as "unfriendly" letters.

Most letters, on the other hand, are able to make connections to letters coming before and after them. However, even if the last letter of a word is a "friendly" letter, it does not connect to the first letter of the next word in the sentence.

Alif is a terminal letter. It can be connected to letters preceding it, but cannot be connected to letters following it. Hence:

ب + ا = با

but

ا + ب = اب

Or, a combined example:

ب + ا + ب = باب

By contrast, Be has no difficulty connecting to prior or subsequent letters:

ب + ب + ب + ب = بببب

Exercise 1.4: Print out the letter forms worksheet and write each letter or combination of letters at least five times.

Exercise 1.5: Connect the following letter sequences. Recall that the width of Be changes depending on its position in the word.


 * 1) ا + ب + ب
 * 2) ا + ب + ا + ب
 * 3) ب + ب + ا
 * 4) ب + ا
 * 5) ا + ب + ا
 * 6) ب + ب + ا + ب + ا
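
To check answers to Exercise 1.5, the joining rule can also be modelled in a few lines of Python. This is a minimal sketch, not a rendering engine: the function name `positional_forms` is hypothetical, and it assumes the word is typed as plain Unicode letters without vowel markers and that Alif is the only terminal letter (the only one introduced so far). It reports which of the four positional forms each letter takes; drawing the forms is left to the font.

```python
# Minimal sketch: which positional form does each letter take?
# Assumption: only Alif is a terminal letter (the only one introduced so far),
# and the word is typed without vowel markers.
TERMINALS = {"\u0627"}  # ا Alif


def positional_forms(word: str) -> list[tuple[str, str]]:
    """Return (letter, form) pairs, where form is isolated, initial, medial or final."""
    forms = []
    for i, letter in enumerate(word):
        # A letter joins to the one before it unless that letter is a terminal.
        joins_before = i > 0 and word[i - 1] not in TERMINALS
        # A letter joins to the one after it unless it is itself a terminal.
        joins_after = i < len(word) - 1 and letter not in TERMINALS
        if joins_before and joins_after:
            form = "medial"
        elif joins_after:
            form = "initial"
        elif joins_before:
            form = "final"
        else:
            form = "isolated"
        forms.append((letter, form))
    return forms


for letter, form in positional_forms("\u0628\u0627\u0628"):  # باب
    print(letter, form)
```

For باب it reports the first Be as initial, the Alif as final, and the last Be as isolated, because Alif cannot join to the letter after it.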

Lesson 2: Vowel Markers
Vowel markers play a crucial role in learning Urdu. Everyday Urdu writing generally omits them, but learners' materials use them extensively until sufficient familiarity with Urdu vocabulary renders them superfluous.

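In the last lesson, we described the sound of Alif as similar to the 'a' in 'car', that is, the long vowel "ā", as opposed to the short vowel "a", which is closer to the 'u' in 'but' than to the 'a' in 'cat'. Urdu uses three markers to indicate short vowels:

 * Zabar (َ), a short stroke written above the letter: short "a"
 * Zer (ِ), a short stroke written below the letter: short "i", as in "bit"
 * Pesh (ُ), a small comma-like mark written above the letter: short "u", as in "put"
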
Positioning Vowel Markers
Vowel markers are written above or below the letter preceding the vowel sound. In other words, to write "bub" the Pesh (ُ) would be written above the first ب as such: بُب.

Another way to think about this is to view the vowel markers as modifying the sound of the consonant they are attached to. A consonant cannot be pronounced without some vowel added to it, and the vowel marker merely indicates a particular consonant-vowel combination.

When a letter has its own diacritic marks, a vowel marker is generally written beyond them: Zer goes beneath the dot of Be, so "bi" is written بِ. Vowel markers can also be told apart from diacritic marks by their shape: vowel markers are strokes (or, in the case of Pesh, a comma-like mark), whereas diacritics are dots.

Exercise 2.1: Write the following in Urdu:
 * 1) "bib"
 * 2) "bab"
 * 3) "ba"

Default Pronunciation
The default vowel sound separating two consonants is Zabar (َ). If you do not know how to pronounce a word and have no vowel markers to guide you, assume Zabar is the vowel. Thus, by default بب would be read "bab", not "bib" or "bub". You will rely on this less and less as you learn to recognize words without the aid of vowel markers.

Short Vowels with Alif
When Alif is combined with Zabar, Zer, or Pesh, a short "a", "i", or "u" is produced. At the beginning of a word, in the absence of any vowel marker, Alif is assumed to be the short "a". In medial or final positions, an unmarked Alif is generally assumed to be a long "ā".

Thus, "ab" can be written اَب or اب.

To write "ba", we should not use Alif at all. با would be read as Be followed by the long vowel "ā", or "bā". Writing باَ would seem to indicate a Be followed by a short vowel "a", but this is not how the sound is written in Urdu. Instead, "ba" should be written بَ. The Alif is completely unnecessary and, moreover, incorrect.

Exercise 2.2: Write the following in Urdu:
 * 1) "ib"
 * 2) "bābi"
 * 3) "babu"
 * 4) "abu"

Alif Madda (آ)
Of course, a word sometimes needs to begin with "ā". For this, Urdu has a special vowel marker, used only with Alif, known as the Madda (آ). Recall that Alif in the medial and final positions already has the sound "ā", so the Madda should not be used in those positions (though on rare occasions it is used in the medial position).

Exercise 2.3: Copy the attached forms at least five times each.

Exercise 2.4: Write the following in Urdu:
 * 1) "āb"
 * 2) "bābā"
 * 3) "babā"
 * 4) "ābiba"
 * 5) "bubāba"
 * 6) "ub"
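
The spelling rules of this lesson are regular enough to sketch in code. The hypothetical `write_urdu` function below is a minimal illustration, assuming the romanized input contains only "b", the short vowels "a", "i", "u" and the long vowel "ā", and that every short vowel is marked explicitly. In Unicode a vowel marker is stored after the letter it sits on, which matches the rule that the marker is written on the letter preceding the vowel sound.

```python
# Minimal sketch of the Lesson 2 spelling rules for words built from b and vowels.
# Assumptions: input uses only "b", "a", "i", "u" and "ā" (U+0101),
# and short vowels are always written with their markers.
BE, ALIF, ALIF_MADDA = "\u0628", "\u0627", "\u0622"
SHORT_VOWELS = {"a": "\u064E", "i": "\u0650", "u": "\u064F"}  # Zabar, Zer, Pesh
LONG_A = "\u0101"  # ā


def write_urdu(roman: str) -> str:
    out = []
    for i, sound in enumerate(roman):
        if sound == "b":
            out.append(BE)
        elif sound in SHORT_VOWELS:
            if i == 0:
                out.append(ALIF)  # a word-initial short vowel is carried by Alif
            # The marker follows the letter it is written on
            # (Alif, or the preceding consonant).
            out.append(SHORT_VOWELS[sound])
        elif sound == LONG_A:
            # Alif Madda begins a word with "ā"; elsewhere a plain Alif already reads "ā".
            out.append(ALIF_MADDA if i == 0 else ALIF)
        else:
            raise ValueError(f"unsupported sound: {sound!r}")
    return "".join(out)


print(write_urdu("b\u0101b\u0101"))  # "bābā", prints بابا
```

With these rules, "ābiba" comes out as آبِبَ and "ub" as اُب; compare these with your own answers to Exercise 2.4.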

Exercise 2.5: Transliterate the following from Urdu to English:
 * 1) بَبا
 * 2) ابا
 * 3) ببُ
 * 4) بِباب
 * 5) بابُ
 * 6) آب
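
The reading direction can be sketched the same way. The hypothetical `read_aloud` function below is a minimal sketch that assumes words are spelled only with ا, آ, ب and the three short-vowel markers, and it applies the default-Zabar rule only between two consecutive Be letters with no marker between them.

```python
# Minimal sketch of the Lesson 2 reading rules for words spelled with ا, آ and ب.
# Assumptions: only these letters plus Zabar, Zer and Pesh appear, and an
# unmarked Be followed directly by another Be takes the default Zabar.
ALIF, ALIF_MADDA, BE = "\u0627", "\u0622", "\u0628"
MARKERS = {"\u064E": "a", "\u0650": "i", "\u064F": "u"}  # Zabar, Zer, Pesh
LONG_A = "\u0101"  # ā


def read_aloud(word: str) -> str:
    out = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in MARKERS:
            out.append(MARKERS[ch])  # a written short vowel
        elif ch == ALIF_MADDA:
            out.append(LONG_A)
        elif ch == ALIF:
            if i > 0:
                out.append(LONG_A)  # medial or final Alif: long "ā"
            elif nxt not in MARKERS:
                out.append("a")  # unmarked initial Alif: short "a"
            # A marked initial Alif is read from its marker on the next step.
        elif ch == BE:
            out.append("b")
            if nxt == BE:
                out.append("a")  # default Zabar between two consonants
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return "".join(out)


print(read_aloud("\u0628\u064E\u0628\u0627"))  # بَبا, prints "babā"
```

Running it on the six words of Exercise 2.5 gives babā, abā, babu, bibāb, bābu and āb.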

Lesson 3: Bowl Letters
The "bowl letters" are a group of connecting (or "friendly") letters that, for the most part, look similar to Be.

Identifying a Bowl Letter
When connected, bowl letters generally do not retain both sides of their bowl structure. We saw this earlier when connecting ب and ا as با. Note that the right side of the bowl is visible, but the left side is nowhere in evidence. The vertical portion of the right side of the bowl is sometimes referred to as its "tooth". The tooth of a bowl letter must always be visible.

For example, when parsing ببا ("babā"), note that the initial Be and the second Be do not have left sides to their bowls, but each retains its tooth. Recognizing the tooth is sufficient to tell you that you are looking at a bowl letter.

Once the letter has been identified as a bowl letter, its diacritics must be located and counted. Most bowl letters differ principally in the number (one, two, or three) and position (above or below) of their diacritics.

Recall from our earlier discussion of Be that bowls can be significantly compressed in the initial and medial positions, and are generally allowed their full width in the final position.

ب (Be)
Be sounds like the English 'b'. It is identifiable by its tooth and a single diacritic below.

پ (Pe)
Pe sounds like the English 'p'. It is identifiable by its tooth and three diacritics below in the shape of a triangle pointing downward.

ت (Te)
The sound of Te is somewhere between the English "t" and the "th" in "the". It is sometimes called the "dental t". It is identifiable by a pair of diacritics side by side positioned above the bowl.

ٹ (Ṭe)
Ṭe is a hard "t" sound, more emphatic than the hard "t" of the English "tack". It is sometimes called the "retroflex t". We shall render this sound "ṭ" in our transliterations. The letter is easily identifiable by its distinctive diacritic mark, a small ط written above the bowl.

ث (Se)
Se is one of a few Urdu letters making the sound of the English "s". It is relatively uncommon. It is identifiable by the three diacritics forming a triangle above the letter (contrast this with Pe, where the diacritics are below the letter and pointing down).

ن (Nūn)
Nūn makes the sound of the English "n". It is distinguished by a bowl that is rounder and deeper than those of the other bowl letters, and by a single diacritic placed above or inside the bowl. Nūn's characteristic bowl shape is visible in the isolated and final positions. In the initial and medial positions it looks like any other bowl letter.

ی (Ye)
Ye, like the English "y", can act as both a consonant and a vowel. Thus, it sometimes sounds like the "y" in "yard" and at other times like the "ee" in "tree" (we will render this sound as "ī"). Ye differs significantly from the other bowl letters in its isolated and final forms, which have no diacritic marks and dip below the line of writing. In its initial and medial forms, however, Ye behaves like a typical bowl letter and is distinguished by two diacritic marks below the bowl.

Exercise 3.1: Write the following in Urdu:
 * 1) "saban"
 * 2) "bas"
 * 3) "nīn"
 * 4) "yābāt"
 * 5) "tabīyāt"
 * 6) "ṭap"
 * 7) "pān"
 * 8) "nān"
 * 9) "tusī"

Exercise 3.2: Transliterate the following into English:
 * 1) باپ
 * 2) پاٹ
 * 3) بتا
 * 4) آثان
 * 5) ٹیپُ
 * 6) بیبی
 * 7) ثِبٹ

Exercise 3.3: Identify the bowl letters in the following texts. Remember, look for the tooth and then the diacritics and ignore any forms we have not learned:
 * 1) اگر قلم میں تلوار سے زیادہ طاقت ہے، تو اردو ویکیپیڈیا پر یہ طاقت آپ کے ہاتھ میں ہے۔
 * 2) کینیڈا رقبے کے لحاظ سے دنیا کا دوسرا سب سے بڑا ملک ہے۔
 * 3) ویکیپیڈیا تدوینی آموختار میں خوش آمدید۔
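
Because Unicode assigns each bowl letter its own code point, a program can list candidate bowl letters in a text, which is one way to check Exercise 3.3. The hypothetical `find_bowl_letters` helper below is a minimal sketch covering only the letters from this lesson. It works on characters rather than shapes, so it cannot look for the tooth: it also lists the isolated and final Ye, which has no bowl, and it skips letters not yet covered, such as the ے at the end of ہے.

```python
# Minimal sketch: list the bowl letters from this lesson that occur in a text.
# Assumption: the text uses these exact Unicode code points (e.g. U+06CC for Ye).
BOWL_LETTERS = {
    "\u0628": ("Be", "one dot below"),
    "\u067E": ("Pe", "three dots below"),
    "\u062A": ("Te", "two dots above"),
    "\u0679": ("Ṭe", "small ط above"),
    "\u062B": ("Se", "three dots above"),
    "\u0646": ("Nūn", "one dot above"),
    "\u06CC": ("Ye", "two dots below (initial and medial forms)"),
}


def find_bowl_letters(text: str) -> list[tuple[int, str, str]]:
    """Return (index, letter, name) for each bowl letter; all other characters are skipped."""
    return [(i, ch, BOWL_LETTERS[ch][0]) for i, ch in enumerate(text) if ch in BOWL_LETTERS]


sentence = "کینیڈا رقبے کے لحاظ سے دنیا کا دوسرا سب سے بڑا ملک ہے۔"
for index, letter, name in find_bowl_letters(sentence):
    print(index, letter, name, "-", BOWL_LETTERS[letter][1])
```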

Lesson 4: Triangle Letters
The "triangle" letters are a group of four connecting (or "friendly") letters that share the same basic shape and are distinguished only by their diacritics.



Identifying Triangle Letters
As you can see, the "triangle" letters are so named because they retain the triangular portion of their bodies in all of their forms. Note that the semi-circular portion of the letter swoops below the line in the isolated and final positions.

Once the triangular portion of the letter has been identified, you need only find and count the diacritics in order to identify the letter.



A Note on Nastaliq
Identification can be made more or less difficult by the particular font or writing style a text employs. In the Urdu/Persian style known as Nastaliq, words do not always sit on a flat line, and this becomes apparent once you start dealing with triangle letters. In Nastaliq, words containing triangle letters take on a diagonal aspect which, while certainly beautiful, can make letter identification harder for beginners. Nastaliq is, however, very common in Urdu.

The image to the left shows the Persian and Urdu words for "five", پنج and پانچ respectively, being parsed letter by letter. Note that the Nastaliq forms of the words differ significantly from the flatter "Naskh" forms in this text.

ج (Jīm)
Jīm sounds like the English letter "j", as in "judge". It is identified by its single diacritic mark that appears below the letter in its initial and medial forms, or inside the letter in its isolated and final forms. ج must be carefully distinguished from ب, especially in its initial and medial positions, when it has a single diacritic mark below. See, for example, با vs. جا.

چ (Che)
Che sounds like "ch" in English, as in the borrowed word "chai". It is identified by its three diacritic marks forming a triangle pointing downward. These diacritics appear below the letter in its initial and medial forms. Care must be taken to distinguish چ from پ, which has the same diacritics. In particular, چ in its medial form is often confused for پ.

ح (He)
He in Urdu is pronounced like the English "h". It is not the only "h" sound in Urdu, and is less common than the Choṭī He (literally, "small He") we will encounter later. This triangle letter has no diacritic marks.

خ (Xe)
Xe represents a sound that has no direct equivalent in English: a "kh" sound pronounced at the back of the mouth, similar to the "ch" in the Scottish "loch". Xe is the only triangle letter with a diacritic above it. خ must be distinguished from ن, which also has a single diacritic above. See, for example, بخا vs. بنا.
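
The warnings about ب vs. ج, پ vs. چ and ن vs. خ all come from the same fact: the dots alone do not identify a letter until the body shape is known. A minimal sketch of that two-step lookup follows. The table and the `identify` function are hypothetical, the positions describe the initial and medial forms only (Jīm's dot moves inside the letter in its isolated and final forms), and Ṭe is left out because its mark is ط rather than dots.

```python
# Minimal sketch: identify a letter from its body shape first, then its dots.
# Assumption: positions describe initial and medial forms only.
from typing import Optional

LETTERS = {
    ("bowl", 1, "below"): "ب Be",
    ("bowl", 3, "below"): "پ Pe",
    ("bowl", 2, "above"): "ت Te",
    ("bowl", 3, "above"): "ث Se",
    ("bowl", 1, "above"): "ن Nūn",
    ("bowl", 2, "below"): "ی Ye",
    ("triangle", 0, None): "ح He",
    ("triangle", 1, "below"): "ج Jīm",
    ("triangle", 3, "below"): "چ Che",
    ("triangle", 1, "above"): "خ Xe",
}


def identify(body: str, dots: int, position: Optional[str] = None) -> str:
    """Look up a letter from its body ("bowl" or "triangle"), dot count and dot position."""
    if dots == 0:
        position = None  # with no dots, position is irrelevant
    return LETTERS.get((body, dots, position), "not covered yet")


# The same dot pattern names different letters depending on the body.
print(identify("bowl", 1, "below"), "/", identify("triangle", 1, "below"))  # ب Be / ج Jīm
```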

Exercise 4.1: Transliterate the following into English:
 * 1) بجا
 * 2) جپی
 * 3) خانا
 * 4) چِنات
 * 5) حثاب
 * 6) پچیچ

Exercise 4.2: Transliterate the following into Urdu:
 * 1) "sachī"
 * 2) "bachā"
 * 3) "Xhāb"
 * 4) "buXhṭī"
 * 5) "hijāb"

Exercise 4.3: Transliterate the Nastaliq forms in the figure to the right into English and rewrite them in the flat Naskh style.

Lesson 5: The Terminals
We have covered one terminal, or "unfriendly" letter, the Alif (ا). In this lesson, we will cover the remainder of the unfriendly letters in Urdu. While the list of letters below may seem long, there are in fact only three basic forms presented in this chapter: variety is introduced by the diacritics.

These eight letters, together with Alif, make up all of Urdu's terminals. Each of them can connect to the letter preceding it, but not to the letter following it.

All other letters in Urdu can connect to preceding and following letters. Note, however, that letters at the end of a word do not connect to the first letter of the next word, regardless of whether they are terminal.
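
The rule that terminals break a word into separately written pieces, and that words never join across a space, can be sketched as follows. The function names are hypothetical, and the terminal set is an assumption: it contains Alif, Alif Madda, the seven Dāl- and Rē-series letters described below, and Wāo (و), which is also a terminal in Urdu.

```python
# Minimal sketch: split text into words, then each word into the runs of
# letters that are actually joined on the page.
# Assumption: the terminal set below (Alif, Alif Madda, Dāl and Rē series, Wāo).
TERMINALS = set("\u0627\u0622\u062F\u0688\u0630\u0631\u0691\u0632\u0698\u0648")


def connected_groups(word: str) -> list[str]:
    groups, current = [], ""
    for letter in word:
        current += letter
        if letter in TERMINALS:  # a terminal closes the current run
            groups.append(current)
            current = ""
    if current:
        groups.append(current)
    return groups


def sentence_groups(sentence: str) -> list[list[str]]:
    # Words never join across a space, whatever their last letter is.
    return [connected_groups(word) for word in sentence.split()]


print(sentence_groups("\u0628\u062F \u0628\u0627\u0628"))  # input: بد باب
```

Here بد stays a single joined run because the terminal comes last, while باب splits after its Alif.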

د (Dāl)
Dāl makes a sound similar to the English "d", but is softer. To some it sounds like a cross between the "th" in the English "the" and "d". Dāl lacks diacritics and its form is the basis for the next two letters (often collectively referred to as the "Dāl series").



When written in its isolated form, د sits on the line. Its form changes only slightly when connected to a preceding letter. Just as we saw with the "tooth" of the bowl letters, the line of writing must rise to indicate a Dāl.

Thus, with ب + پ we observe a tooth indicating the beginning of the پ and with ب + د we see something very similar:

بپ vs. بد

Note that the Dāl's left side does not rise. If it did, Dāl would look too much like a bowl letter. Similarly, consider what would happen if Dāl were not a terminal letter: a medial Dāl (or other Dāl-series letter) would then look exactly like a connected bowl.

ڈ (Ḍāl)
Ḍāl makes a retroflex "d" sound that does not exist in English. We shall render this sound "ḍ" in this text. The symbol ط above the letter marks it as retroflex; we saw the same symbol above ٹ, which is also retroflex. Aside from this diacritic, the form of Ḍāl is exactly like that of Dāl.

ذ (Zāl)
Zāl sounds like the English "z". It is one of several "z" sounds at Urdu's disposal, and it is less common than another "z" we shall encounter in this chapter. Zāl mimics the form of Dāl and can be distinguished by a single diacritic above the letter.

ذ can easily be confused with ن, because their diacritics are similar. In the medial position, the two can be distinguished by the fact that ن connects to the next letter while ذ does not. In other positions, check whether the letter has a complete bowl, as ن would. Note also that Zāl, like Dāl, stays above the line, whereas ن hangs below it.

Exercise 5.1: Identify all of the Dāl series letters in the following text. Remember that Dāl series letters sit above the line.

ذاتی شمارِندہ ایک ایسا شمارندہ ہے جس کی قیمت، جسامت اور اہلیت اِس کو افراد کے لئے مفید بناتی ہے، اور یہ خصوصاً صارف کے براہ راست اِستعمال کے لئے مقصود ہوتا ہے، جبکہ شمارندہ اور صارف کے درمیان کوئی عامل حائل نہ ہو۔

ر (Rē)
Rē sounds like a rolled "r", as in the English interjection "brr" or the Scottish pronunciation of "girl". Rē, like Dāl, has no diacritics. Its form is the basis of the "Rē series".

Unlike Dāl, Rē extends below the line and is generally written as a larger letter, with a more open angle than the fairly sharp angle of Dāl. Compare:

د vs. ر

Rē can look a bit indistinct in the final or medial position. There, the preceding letter it connects to slopes downward below the line, and this downward slope indicates the Rē. Thus:

بر، جر as opposed to Dāl's بد، جد

ڑ (Ṛē)
Ṛē represents a retroflex hard "r" sound that is not found in English. Ṛē is the third and final Urdu retroflex letter, and has the same retroflex marker we've seen before: ط. Apart from its diacritic marker, Ṛē looks exactly like Rē.

ز (Zē)
Zē sounds like the English "z". It is the most common "z" letter in Urdu. Zē is distinguished by a single diacritic above, and is otherwise identical to Rē.

ژ (Zhē)
Zhē produces a "zh" sound akin to the "si" in the English "vision". It is a rarely used letter in Urdu, distinguished by the three diacritic marks above its Rē-like body.

Exercise 5.2: Identify the Rē and Dāl series letters in the text below and specify which series each belongs to.

ڈاکہ ایسا جرم ہے جس میں کوئی قدر رکھنے والی چیز ہتھیا لی جائے یا ہتھیانے کی کوشش کی جائے، طاقت کے استعمال، یا طاقت کے استعمال کی دھمکی، یا ہدف کو خوف میں مبتلا کر کے۔ ڈاکہ چوری سے یوں مختلف ہے کہ اس میں دہشت یا ڈرانے دھمکانے کا استعمال کیا جاتا ہے۔
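
To check Exercises 5.1 and 5.2, the two series can also be told apart by code point. The hypothetical `classify_terminals` helper below is a minimal sketch that reports each Dāl- or Rē-series letter with its series; unlike a human reader, it relies on Unicode identity rather than on whether the letter descends below the line.

```python
# Minimal sketch: report each Dāl-series or Rē-series letter in a text.
# Assumption: the text uses the standard precomposed code points for these letters.
DAL_SERIES = {"\u062F": "Dāl", "\u0688": "Ḍāl", "\u0630": "Zāl"}
RE_SERIES = {"\u0631": "Rē", "\u0691": "Ṛē", "\u0632": "Zē", "\u0698": "Zhē"}


def classify_terminals(text: str) -> list[tuple[int, str, str, str]]:
    """Return (index, letter, name, series) for each Dāl- or Rē-series letter."""
    found = []
    for i, ch in enumerate(text):
        if ch in DAL_SERIES:
            found.append((i, ch, DAL_SERIES[ch], "Dāl series"))
        elif ch in RE_SERIES:
            found.append((i, ch, RE_SERIES[ch], "Rē series"))
    return found


text = "ڈاکہ چوری سے یوں مختلف ہے"
for index, letter, name, series in classify_terminals(text):
    print(index, letter, name, series)
```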