The death of the Urdu script

Thom Holwerda 2014-06-25 Office 36 Comments

Way back in 2009, I wrote about a few specific cases in which computers led to (subtle) changes in the Dutch language. While the changes highlighted in that article were subtle and not particularly substantial, there are cases around the world where computing threatens much more than a few subtle, barely noticeable features of a language.

This article is a bit too politicised for my taste, but if you set that aside and focus on its linguistic and technological aspects, it’s quite, quite fascinating.

Urdu is traditionally written in a Perso-Arabic script called nastaliq, a flowy and ornate and hanging script. But when rendered on the web and on smartphones and the entire gamut of digital devices at our disposal, Urdu is getting depicted in naskh, an angular and rather stodgy script that comes from Arabic. And those that don’t like it can go write in Western letters.

It’d be fantastic if Microsoft, Google, and Apple could include proper support for nastaliq in their products. It’s one thing to see Dutch embrace a new method of displaying direct quotes under the influence of computers, but to see an entire form of script threatened is another.


36 Comments

2014-06-25 4:06 pm

mail4asim
Wow, never thought I would read an article about Urdu font on OSAlert. Thank you Thom for sharing this.

I have to agree that Urdu fonts on most of the websites, including Facebook are horrible. But that’s all we have and some people, like my mother, are able to work with it.

I cringe every time I get a message in Roman Urdu but that’s the easiest way to communicate for many of us without an Urdu Keyboard.

Learning Urdu script helped me in reading Arabic and Farsi. But on the other hand Roman Urdu has helped a lot of people learn English and be able to communicate globally.

Edited 2014-06-25 16:07 UTC
2014-06-25 4:21 pm

mutter
Roman Urdu is the bigger threat by far. Everyone uses it and it is truly disgusting. Urdu script is so badly rendered that people find it difficult to read it off a computer screen. Part of the problem is rampant software piracy in South Asia. There is little incentive for software makers to cater to consumers here since there are few paying consumers here and fewer still who demand this.

Edited 2014-06-25 16:22 UTC

2014-06-25 10:34 pm

chithanh
There is little incentive for software makers to cater to consumers here since there are few paying consumers here and fewer still who demand this.

In the age of free and open source software, I can no longer consider this argument valid.

One no longer needs money, but only to motivate a group of sufficiently qualified volunteers. If an entire people’s cultural heritage is threatened, then that should not be too difficult.

2014-06-26 1:58 am

bassbeast
If the FOSS argument were valid then there would be a FOSS alternative to almost any commercial software of note….hint, not even close.

FOSS has shown time and time again if you are not a PROGRAMMER and your primary job is not PROGRAMMING then FOSS frankly isn’t for you. This is why you have a billion text editors but no decent medical transcription software, half a dozen shells but no drop-in replacement for Quickbooks/Quicken. It’s because FOSS is written BY programmers and FOR programmers. Not a programmer? Move on, please.

The only way anything would be done about this is if somebody were to use Kickstarter to just outright buy an Urdu script to give to the world but frankly that need not be FOSS, it could be public domain, I just haven’t seen much of squat happening in FOSS as of late that isn’t geared for programmers or server admins, sorry.

2014-06-26 1:04 pm

james_gnz
If the FOSS argument were valid then there would be a FOSS alternative to almost any commercial software of note….hint, not even close.

FOSS has shown time and time again if you are not a PROGRAMMER and your primary job is not PROGRAMMING then FOSS frankly isn’t for you. This is why you have a billion text editors but no decent medical transcription software, half a dozen shells but no drop-in replacement for Quickbooks/Quicken. It’s because FOSS is written BY programmers and FOR programmers. Not a programmer? Move on, please.

…

I think that’s a bit of an over-generalisation. I’m given to understand that there are non-techie types that are able to use Ubuntu, and find it to do what they need.

That said, you’re right that much restricted software doesn’t have free equivalents, but I rather suspect it’s often because the restricted software exists that equivalent free software doesn’t.

Necessity is the mother of (at least some) invention(s). The desire to help others and/or for the fame that may come from doing so, or simply doing something new out of interest, are others. These motivators are all reduced if something comparable already exists.

If someone can license software that does what they need, then they no longer need to write free software to do it. Further, if sharing data is important (as it often is), then it will no longer be enough to just write the software, it also has to be made compatible with whatever else already exists (which is likely to be a moving target). Without this compatibility, it’s less likely to be as helpful to others, which means there’s less chance of widespread adoption, and less chance of fame, and writing something similar to something else that already exists may simply be less interesting.

Copyright law is good at producing good-enough new software fast. When hardware advances make a new application for software possible, copyright law provides the funds to quickly throw together a team and ship a first version (whereas free software projects tend to be relatively slow to gather momentum).

No doubt this is good in the short term, but I’m not convinced it’s a nett positive in the long term.

2014-06-27 4:58 am

mutter
Pakistanis don’t use open source software because they prefer pirated Microsoft software. OSS doesn’t stand a chance when you have mass piracy.

The future of the language is not threatened. Only the elite can afford computers and internet access so this only affects them. In future when more people can afford it demand will increase and the market will move to meet that demand. But as I said right now there is little demand.

2014-06-25 5:20 pm

ncafferkey
Technology was killing scripts even before computers arrived. As far as I know, the Irish Gaelic script was abandoned in the early 20th Century in favour of Roman script due to the difficulty of reproducing it with typesetting and typewriters.

2014-06-25 10:20 pm

hhas
This. Middle- and Far-East scripts are beautiful to look at and a right royal PITA to support due to their much more complex layout rules. Chinese, Japanese, Korean, and Thai require special rules to determine where line wraps may or may not occur. Arabic script uses complex ligatures, so each character requires different glyphs when appearing at the start, middle, or end of a word. Thai doesn’t even have punctuation. And so on.

All this makes it difficult and time-consuming (i.e. costly) to implement such scripts correctly, so can’t entirely blame vendors for being reluctant to undertake such work when it’s not going to pay for itself due to piracy, poverty, etc. If the article author wishes to do more than tilt at windmills, he needs to address these economic and technical challenges as well.

Edited 2014-06-25 22:21 UTC
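
To make the point about start, middle, and end glyphs concrete, here is a minimal Python sketch. It uses only the standard unicodedata module to list the legacy "presentation form" characters that Unicode keeps for one Arabic letter (beh). The helper name positional_forms is made up for illustration. Real text normally stores only the base letter, and it is the font and shaping engine that pick the right shape, which is exactly the work vendors have to do.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the single code point stored in text


def positional_forms(letter):
    """Map form name -> presentation-form character for one Arabic letter.

    Hypothetical helper: it scans the Arabic Presentation Forms-B block
    (U+FE70..U+FEFF) for entries whose compatibility decomposition is
    exactly the given letter, e.g. "<initial> 0628".
    """
    target = f"{ord(letter):04X}"
    forms = {}
    for cp in range(0xFE70, 0xFF00):
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 2 and parts[1] == target:
            forms[parts[0].strip("<>")] = chr(cp)
    return forms


forms = positional_forms(BEH)
for form, ch in forms.items():
    print(form, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Every one of those shapes folds back to the same base letter under NFKC normalization, so the difference between them is a rendering matter, not a spelling one.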

2014-06-25 5:21 pm

chithanh
There are actually two reasons why Latin/Roman script is becoming popular in Urdu.

One thing is that the use of the Latin alphabet and Western Arabic numerals is considered modern. So if you want to show the world that you are a modern person or organization, you use Latin script. This can also be observed in other regions with Arabic script like Afghanistan or Iran, but also countries like Greece.

The second reason is that the Latin script really is superior for writing Urdu. The Arabic script was adopted due to cultural dominance, not because it was optimal. Similar things have happened in the past elsewhere, e.g. in the Vietnamese language which formerly used a writing system based on Han ideographs due to influence of the Chinese culture. While Han may be optimal for writing Chinese, it is not for Vietnamese. It was replaced a century ago with a much easier to learn Latin script (with diacritics).
2014-06-25 5:48 pm

tidux
The author has all the classic symptoms of being a hipster twat:

* He’s an American going on about “muh heritage” on the Internet.

* He’s an American whining about “cultural appropriation” without a hint of irony.

* He’s an Apple fanboy.

* He’s obsessed with outdated and generally useless writing technology. This is more commonly typewriters or fountain pens, but he’s chosen to go for a crossover with whining about “cultural appropriation” and obsess over writing on wood with shitty brushes.

If you strip out the politics and manufactured outrage, this boils down to language patterns being modified to fit the computer, which is barely news at this point. Even in English we have txt speak, and the rules of quotation mark placement seem to be changing to better match how they work for string parsing in programming languages.

2014-06-25 6:31 pm

ngaio
No. The author loves an aspect of his literary culture and not surprisingly he doesn’t want it lost for mundane reasons. I wish him all the best in his efforts. I’m glad to see some in the IT industry are on board too.
2014-06-25 10:15 pm

chithanh
Calling the author a hipster twat was maybe a bit harsh.

I do agree that the author is (perhaps irrationally) attached to a script that is poorly suited for the information age.

I also agree that he does not make any sound arguments why someone should prefer nastaliq over naskh or Latin script.

But he does raise a valid point: Many people who wish to use nastaliq cannot do so, because of the limitations of their computer software. And that is something that needs to change.
2014-06-25 11:08 pm

oskeladden
If you strip out the politics and manufactured outrage, this boils down to language patterns being modified to fit the computer, which is barely news at this point.

It’s a lot more than that. How would you feel if you were told that you had to read and write in Fraktur from now on, because that’s all the computer would support? I’m not a native speaker of Urdu, but I do read it as a second (or, more precisely, fourth) language, and yes, the difference between naskh and nastaliq really is that big.

2014-06-26 8:57 am

mkowalik

It’s a lot more than that. How would you feel if you were told that you had to read and write in Fraktur from now on, because that’s all the computer would support? I’m not a native speaker of Urdu, but I do read it as a second (or, more precisely, fourth) language, and yes, the difference between naskh and nastaliq really is that big.

Actually, I think the proper question would be “How would you feel if you were told that you had to read and write English in Cyrillic from now on?”. Using a different script is a bit more than using a different font.

2014-06-26 11:55 am

dnebdal

Actually, I think the proper question would be “How would you feel if you were told that you had to read and write English in Cyrillic from now on?”. Using a different script is a bit more than using a different font.

I get the impression this is about as fruitful as the biological debates about where on the family tree you separate out a variant vs. species vs. a species complex vs. a family.

For writing Arabic, the different styles have the same relationship as Fraktur to Latin – same sounds, same letters, mutated shapes. (There is Arabic written in a nastaliq style.)

For writing Urdu, naskh and nastaliq are, as I understand it, more like Cyrillic to Latin – there’s a definite overlap, but there are also sounds/letters unique to each. I guess you could compare it to writing Icelandic in native-alphabet Latin vs. English-alphabet Fraktur (a subset of the sounds, with odd shape variants).

Oh, and I’d say blackletter is a bit further off than what’s typically implied by “font” or “typeface” alone. The ligatures, shapes, and general shaping rules can make it look quite alien – especially in some of the handwritten variants (ref http://commons.wikimedia.org/wiki/File:Calligraphy.malmesbury.bible… ). The nicest fraktur variants probably qualify, though – and they’re a bit more readable (e.g. http://de.wikipedia.org/wiki/Fraktur_%28Schrift%29#mediavie… ).

Edited 2014-06-26 12:15 UTC

2014-06-30 9:56 pm

zima
* He’s obsessed with outdated and generally useless writing technology. This is more commonly typewriters or fountain pens

What’s bad about fountain pens? (and my portable Kolibri typewriter will be useful during the inevitably upcoming apocalypse)

2014-06-25 6:12 pm

teco.sb
I wouldn’t be so quick to point the finger at Google/Apple/Microsoft. This seems to me like a Unicode Standard problem, much like the CJK ideograph problem (http://www.unicode.org/faq/han_cjk.html or http://en.wikipedia.org/wiki/CJK_Unified_Ideographs).

Granted Google, Apple and Microsoft are all members of the Unicode Consortium and could request an amendment to the standard, but Unicode has made it clear with the unified Han character encoding that this is the way forward for them.

How is this issue different from the unified CJK characters?

By the way, simply including a special typography for these scripts is not generally considered a solution by the native speakers of these languages. China, for example, still uses the GB18030 because they do not like Unicode for their language (see http://en.wikipedia.org/wiki/Chinese_character_encoding).

Edited 2014-06-25 18:30 UTC

2014-06-25 9:07 pm

sakeniwefu
You are right.

The only thing you could blame the companies for would be not refusing to implement the Unicode standard, deriding it in public and coming up with a sane alternative.

The Unicode standard is inconsistent and awful design through and through.

Combined and pre-combined glyphs, random width glyphs, random CJK unification (not that consistent unification would be any better), backward-compatibility exact duplicates, emoji, UTF-16, control characters and who knows what else.

The only part of Unicode that deserves salvaging would be the UTF-8 encoding, and even that has been infected by other sections of the standard.

Sadly, Unicode does fulfill most companies and developers’ needs most of the time. Some very influential people need to be very pissed for anything to change for the better.
2014-06-26 9:29 am

james_gnz
… How is this issue different from the unified CJK characters? …

Not claiming to be an expert, but I am given to understand that CJK correspondences can be rather nuanced, whereas I get the impression that here the correspondences are straightforward one-to-one, and the problem is with complex font rendering. Without taking a position on Han unification, it seems to me that unification here makes perfect sense.

2014-06-26 6:48 pm

teco.sb
My understanding of the article is that the point the author was trying to make is the naskh script is being used in place of the nastaliq for the Urdu language. The point I was trying to make is that, unless the Unicode standard has separate code points for these two different scripts, which it probably doesn’t, there is nothing that an application can do. How will the application know that a particular code point is to be displayed as nastaliq or naskh script? This is a rhetorical question, and the answer is: it doesn’t.

To me this is the same exact problem posed by Han unification, where multiple scripts (Traditional Chinese, Simplified Chinese, Korean and Japanese) are all represented by the same code points in the standard. Which glyph is used to represent a particular code point is entirely up to the application. This actually means that to a Japanese person, a Chinese text would be rendered with Japanese glyphs. Yet, in China, the same text would be rendered with Chinese glyphs. The same applies here, the naskh glyphs are being used to represent nastaliq glyphs and the author is complaining it looks bad.

2014-06-27 12:06 am

james_gnz
My understanding of the article is that the point the author was trying to make is the naskh script is being used in place of the nastaliq for the Urdu language.

Yes, but I think this is because a Nastaʿlīq…

The point I was trying to make is that, unless the Unicode standard has separate code points for these two different scripts, which it probably doesn’t, there is nothing that an application can do. How will the application know that a particular code point is to be displayed as nastaliq or naskh script? This is a rhetorical question, and the answer is: it doesn’t.

My understanding is that the scripts are unified (not separate code points), and the choice of script depends on font selection (either in the document, or the user’s defaults). While font selection might not be perfect, I think it’s better than the alternative, where if the desired script isn’t available, nothing useful gets rendered.

To me this is the same exact problem posed by Han unification, where multiple scripts (Traditional Chinese, Simplified Chinese, Korean and Japanese) are all represented by the same code points in the standard. …

My understanding is that with Han unification there may be nuanced differences in meaning, not just appearance, that may make it difficult to draw black-and-white one-to-one correspondences.
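
A small Python sketch of the "unified code points" point, using only the standard unicodedata module: the word "Urdu" written in Perso-Arabic letters is just four code points from the Arabic block, and nothing in them says naskh or nastaliq. The variable names are illustrative only. Which style appears on screen depends on the font that ends up being used, as described above.

```python
import unicodedata

# "Urdu" in Perso-Arabic letters: alef, reh, dal, waw
word = "\u0627\u0631\u062F\u0648"

# Collect the formal Unicode name of each stored code point
names = [unicodedata.name(ch) for ch in word]

for ch, name in zip(word, names):
    # The name identifies the letter only; it carries no calligraphic style
    print(f"U+{ord(ch):04X}", name)
```

The same four code points would be handed to a naskh font or a nastaliq font unchanged, so the stored text is identical either way.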

2014-06-25 8:36 pm

NaGERST
At the risk of sounding somewhat political and uneducated on the issue (as I do not speak the language in question), I would suggest they could use the characters used for Hindi, as Hindi and Urdu are related and in many cases mutually intelligible. Before the partition of India, I am told, they did.

Perhaps it is a bad idea, seeing how some Indians and some Pakistanis do not seem to like each other very much, but at least it is an option.
2014-06-26 7:00 am

siraf72
My initial reaction is that this is just a whinge. Naskh performs the function just fine. But if you’re missing 10 letters that’s a different story. I didn’t realise there were that many more letters in Urdu.

Not sure I agree that Arabic fonts are “stodgy”. Some cultural bias there methinks!

On the note of using Roman letters: Arabic speakers typing Arabic using English letters have appropriated the numbers 2, 3, 5, 6, 7, and 9 to make up for the lack of equivalent letters. Their shapes (very roughly) resemble the Arabic letters required.

For example the word Naskh becomes Nas5.

Edited 2014-06-26 07:10 UTC
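
The digit trick can be sketched as a small lookup table in Python. Only the 5 for "kh" pairing comes from the example above. The other pairings are commonly cited conventions that vary by region, so treat them as assumptions, and the names DIGIT_TO_LETTER and explain_digits are invented for this illustration.

```python
import unicodedata

# Digit -> Arabic letter it commonly stands in for (assumed conventions,
# apart from "5", which matches the Nas5 example)
DIGIT_TO_LETTER = {
    "2": "\u0621",  # hamza
    "3": "\u0639",  # ain
    "5": "\u062E",  # khah
    "6": "\u0637",  # tah
    "7": "\u062D",  # hah
    "9": "\u0635",  # sad (some regions use 9 for a different letter)
}


def explain_digits(chat_word):
    """List (digit, Unicode letter name) for each digit used as a letter."""
    return [
        (ch, unicodedata.name(DIGIT_TO_LETTER[ch]))
        for ch in chat_word
        if ch in DIGIT_TO_LETTER
    ]


result = explain_digits("Nas5")
print(result)
```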
2014-06-26 8:30 am

safiuddinkhan
Urdu script is one of the most difficult scripts to automate via typesetting, and it wasn’t until the 80s, with the rise of computers and the invention of the Jamil Noori Nastaliq system, that people were able to write Urdu properly using the nastaliq script. The article is correct to some extent, but it’s not completely true. Microsoft has proper Urdu nastaliq fonts preinstalled with Windows 8. For Linux and Mac, a freely available version of Noori Nastaliq in OpenType (TTF) format is available, which is used both to display and to write proper Urdu. The OpenType Noori Nastaliq also works well in LibreOffice and Microsoft Office, and I mostly use it to write Urdu text. Most Pakistani Urdu news sites use Noori Nastaliq to display text.
2014-06-26 10:12 am

Soulbender
…I demand support for Runes in modern operating systems and smart phones.

Edited 2014-06-26 10:12 UTC

2014-06-26 11:07 am

james_gnz
…I demand support for Runes in modern operating systems and smart phones.

< http://en.wikipedia.org/wiki/Runes#Unicode >

Renders fine on my computer except for the last 8 characters (added in Unicode 7.0). I suspect runes are relatively easy to render compared to what’s being asked here.

Edited 2014-06-26 11:08 UTC

2014-06-26 1:31 pm

Soulbender
Yeah, but can you easily input using runes?

2014-06-26 7:02 pm

oskeladden
Yeah, but can you easily input using runes?

The irony is that the Germanic Runes are actually better supported in Android than Nastaliq is. You can, without too much difficulty, create a custom keyboard that lets you type in characters from the Runic block in Unicode. As of last August (I give that date because I haven’t checked since), it was impossible to get Android to display Urdu in Nastaliq rather than Naskh. Android uses the same system font for Arabic, Persian, Urdu (and other languages using the Perso-Arabic script). That font uses Naskh, and there’s no obvious way to change it.

2014-06-26 12:25 pm

henderson101
Ah – but Futhark or Futhork?

2014-06-26 1:21 pm

james_gnz
Ah – but Futhark or Futhork?

No problem, Unicode has unified them.

2014-06-26 4:29 pm

henderson101
That doesn’t exactly work though does it? They are quite different.

2014-06-27 1:30 am

james_gnz
That doesn’t exactly work though does it? They are quite different.

I’m not entirely sure what you mean, but a brief Internet search hasn’t shown me anything to suggest that there’s any dispute about what the correspondences are between the different Runic scripts.
2014-06-29 10:47 pm

henderson101
The Futhorc used for Old English and Frisian had a significant number of different and extra variations of characters. Especially around the A.
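
For anyone who wants to check how the Runic block handles this, a minimal Python sketch with the standard unicodedata module prints the formal names of the first rune and of the four code points U+16A8 to U+16AB, which sit around the "A" runes. The variable names are illustrative, and the output depends on the Unicode version bundled with your Python.

```python
import unicodedata

# First code point of the Runic block; its name lists several traditional names
first = unicodedata.name("\u16A0")
print("U+16A0", first)

# The four code points around the "A" runes (U+16A8..U+16AB)
a_runes = {
    f"U+{cp:04X}": unicodedata.name(chr(cp))
    for cp in range(0x16A8, 0x16AC)
}
for label, name in a_runes.items():
    print(label, name)
```

The first name shows one code point shared across runic traditions, while the printed list lets you see which of the A-type letters were given their own code points.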

2014-06-27 9:42 am

msoni
Disclaimer: I am from and living in India, and we have a history of 3 wars with Pakistan in the past 60 years, though we do share a 1000-year history of common culture with them. I don’t know and have never learnt how to write Urdu.

I can empathise with the writer because Indic languages have had similar problems in rendering. It has been agonisingly painful to watch no Hindi alternative emerge for a long time except to use ISCII-based code (http://hindi-fonts.com/fonts/devlys-010), which gets butchered on an unsupported PC.

Indians too have started using romanized Hindi for most of their applications. To be precise, the first mainstream devices to support Hindi out of the box were Android phones. Microsoft had it way before, but it was complex and required knowing English to start with.

The bigger thing I can empathise with is the writer’s difficulty in trying to convey the concept of script-based calligraphy to a person unfamiliar with the concept. Sanskrit (which is incidentally the most computer easy language to code) supports space based compression in writing with samas and sandhi (see http://en.wikipedia.org/wiki/Sanskrit_compound ).

Devanagari script requires skillful typography coding to emulate this. Only a native reader can see bad Hindi/Sanskrit examples and understand the cognitive dissonance they cause. For example (this is from a 2014 bug): https://code.google.com/p/chromium/issues/detail?id=355445

See bad:

https://chromium.googlecode.com/issues/attachment?aid=3554450000000&…

See OK

https://chromium.googlecode.com/issues/attachment?aid=3554450000000&…

It is very upsetting to see your own name getting butchered. For someone else, too, it causes significant cognitive dissonance. It is like going down a beautiful road of flowers and suddenly encountering a pile of shit.

So Thom, I can empathise with the writer, and I can assure you he is not being political, only honest.

2014-06-27 8:17 pm

oskeladden

See bad:

https://chromium.googlecode.com/issues/attachment?aid=3554450000000&…

See OK

https://chromium.googlecode.com/issues/attachment?aid=3554450000000&…

That looks to me like an issue with the particular font Chrome is using – it looks like the font doesn’t have a ligature for the A> + r combination. Have you tried setting Chrome to use a different font for Hindi?