NTA Unicore reply

Santhosh Thottingal

Thanks, Martin Hosken, for bringing up this topic. I agree with your proposal, and I have summarized my response below.

  1. The majority of Malayalam fonts follow NA + Virama + RRA for /nta/ and Chillu N + RRA for /nra/, so defining alternate sequences for these is unwarranted. Children are taught in schools to use NA + Virama + RRA for /nta/, and the government's Malayalam computing education programs use the same sequence.
  2. An alternate encoding for a heavily used sequence does not help Malayalam at all. /nta/ is required to write എന്റെ (meaning "my"). A person named Antony (ആന്റണി) does not want the misery of having his name written in multiple ways. The contradicting sequence suggested by Unicode is very confusing and needs to be corrected as Martin Hosken proposed.
  3. Input tools by Google, such as the one in Google Docs and Google Input Tools, produce the NA + Virama + RRA sequence for /nta/. What Cibu described is incorrect; you can verify this yourself (start a Google Doc, select Malayalam as the language, choose input tools, and write nta using the first tool listed for Malayalam). I recorded a short video of this: https://thottingal.in/tmp/nta-googledoc.mp4 I have yet to see an input method that produces the sequence given in Chapter 12, except one-to-one keyboards like Inscript (but nobody has taught this sequence).
  4. A sequence like "അവൻെറ (/..nte/; ...chillu-n, sign-e, rra)", as Cibu explained, does not exist. Such a rendering can be obtained from a font with a non-stacking ന്റ ligature. A similar case is ള്ളെ (LLLLE): here you can use the same trick Cibu explained to get a similar visual appearance, that is, ള + െ + ള = ളെള (LLELLA). But that does not make it the intended sequence. (ICANN IDN rules specifically prevent a vowel sign after a chillu.) No font has a non-stacking ന്റ because it could not be visually differentiated from Chillu N + RRA, read as /nra/ as in Henry, Enroll, etc.
  5. Back in 2013 I started drafting an editorial correction proposal for Chapter 9 (now Chapter 12) explaining these issues. Maybe I need to update it for the current Chapter 12 and submit it to Unicode: https://thottingal.in/documents/Unicode-Chapter09-Proposed-Corrections.pdf There are more inconsistent statements and tables in the same chapter. I also have some notes on this issue, written in 2011, outlining the timeline and the discussions that happened at that time: https://thottingal.in/documents/Malayalam-NTA.pdf
  6. It is very unfortunate that typeface engineers have to support several sequences for a single ligature, as Liang Hai illustrated. The typefaces designed and maintained by Swathanthra Malayalam Computing do not do this. SMC follows one and only one sequence for nta: NA + Virama + RRA.
  7. The definition in Chapter 12 has not been a problem so far because no input tool supported it. But if an input tool developer refers to it and implements one, the problems start. Text in which people write nta using that tool will not match existing content, breaking search and other language processing applications. Hence it is very important to correct it.
  8. About support in Microsoft Windows: here is a screenshot from Windows 10 with the Nirmala UI font: https://i.imgur.com/7Lumzql.png As you can see, it does not render the Chapter 12 sequence. It uses NA+VIRAMA+ZWJ+RRA for /nta/, which is wrong. I am not sure whether there is a newer version of this font with different behavior.
  9. The NA + Virama + RRA sequence is correct for Malayalam because it follows the nasal + virama + voiceless unaspirated plosive pattern of conjunct formation, as in ങ്ക, ഞ്ച, ന്ത, ണ്ട, മ്പ.
  10. As of now, most of these issues can be fixed if Microsoft fonts follow the NA+VIRAMA+RRA sequence. There is no need to introduce a new encoding.
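
The search problem in point 7 and the conjunct pattern in point 9 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the post does not spell out the Chapter 12 sequence, so Chillu N + Virama + RRA below is my reading of it, and the names NTA_COMMON, NTA_ZWJ, NTA_CHAPTER12, describe, nfc and follows_nasal_pattern are hypothetical helpers, not part of any tool mentioned above.

```python
import unicodedata

# Code points discussed in the post.
NA, RRA = "\u0d28", "\u0d31"
VIRAMA, ZWJ = "\u0d4d", "\u200d"
CHILLU_N = "\u0d7b"

# Candidate encodings for /nta/.
NTA_COMMON = NA + VIRAMA + RRA           # sequence the post argues for
NTA_ZWJ = NA + VIRAMA + ZWJ + RRA        # sequence reported for Nirmala UI (point 8)
NTA_CHAPTER12 = CHILLU_N + VIRAMA + RRA  # ASSUMPTION: my reading of Chapter 12

# The word "ente" (my), typed with the common sequence: E + nta + vowel sign E.
ENTE = "\u0d0e" + NTA_COMMON + "\u0d46"

# Nasal + virama + plosive conjuncts listed in point 9, built from code points.
CONJUNCTS = {
    "nga+ka": "\u0d19" + VIRAMA + "\u0d15",
    "nya+ca": "\u0d1e" + VIRAMA + "\u0d1a",
    "na+ta": NA + VIRAMA + "\u0d24",
    "nna+tta": "\u0d23" + VIRAMA + "\u0d1f",
    "ma+pa": "\u0d2e" + VIRAMA + "\u0d2a",
}
NASALS = {"\u0d19", "\u0d1e", "\u0d23", NA, "\u0d2e"}


def describe(s):
    """List each code point of s with its Unicode character name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]


def nfc(s):
    """Apply Unicode canonical composition (NFC)."""
    return unicodedata.normalize("NFC", s)


def follows_nasal_pattern(s):
    """True if s is exactly nasal letter + virama + one more code point.

    The third code point is not checked for being a plosive.
    """
    return len(s) == 3 and s[0] in NASALS and s[1] == VIRAMA


for label, seq in [("common", NTA_COMMON), ("zwj", NTA_ZWJ),
                   ("chapter12", NTA_CHAPTER12)]:
    print(label, describe(seq))
    print("  found in existing text:", seq in ENTE)
    print("  equal to common after NFC:", nfc(seq) == nfc(NTA_COMMON))
    print("  nasal + virama + X shape:", follows_nasal_pattern(seq))
```

Since the three sequences stay distinct under NFC, a plain substring search for one does not find text typed with another; any matching would need custom folding that the sketch does not attempt.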




