The fascinating evolution of typing Chinese characters

This story first appeared in China Report, MIT Technology Review’s newsletter about technology developments in China. Sign up to receive it in your inbox every Tuesday.

The idea of downloading a third-party keyboard to your phone may seem unnecessary to most people, but in China it’s the norm.

Chinese is the only modern language that’s logographic, meaning that the way a character is written can be completely separate from its pronunciation (Japanese, Korean, and Vietnamese have their own versions of Chinese characters). Because of that, relying on a default keyboard would be extremely difficult. So today, 800 million people in China use smart keyboard software that predicts what a user wants to type.

But heavy reliance on this technology also presents a security risk: most keyboard apps transmit keystrokes to the cloud to enable better text prediction, creating an opportunity for the content to be intercepted if the apps don’t use strong enough encryption protocols.

This week, I reported on one such encryption loophole found in Sogou, one of China’s most popular third-party keyboard apps. A group of researchers at the Citizen Lab, a University of Toronto–affiliated research group, managed to intercept almost everything they typed into Sogou by deploying a two-decade-old exploit.

Not only can this kind of software endanger people’s personal and financial information, but, perhaps more important, it can compromise otherwise encrypted messages in apps like Signal and allow them to be caught by police or malicious actors.

For more on this particular loophole and its broader implications, you can read my full story.

But for the newsletter, I want to take you all on a geeky journey into the history of keyboard apps, or input method editors (IMEs), as they’re formally called. IMEs are so ubiquitous and fundamental today that it’s easy to forget how much hard work went into their creation. And they’re a fascinating example of how innovations can bridge the gap between the digital world and the real world.

In the ’80s, there was no way to process Chinese characters on the personal computers on the market. Even after the laborious process of digitizing Chinese characters so they could be displayed on computer screens, a big question remained: How do you type these characters? In particular, how do you map the tens of thousands of Chinese characters onto the 26 letters of a QWERTY keyboard?

The first attempt was vastly different from today’s keyboard apps and focused on how Chinese characters are written.

In August 1983, exactly 40 years ago, a Chinese engineer named Wang Yongmin developed the first widely used way to input Chinese characters into a computer: Wubi. He did it by breaking a Chinese character down into its strokes and shapes and assigning several of these to each letter on the QWERTY keyboard.

[Figure: A diagram of how Wubi uses the QWERTY keyboard. Each key is matched with three to 12 character components; the text at the bottom consists of mnemonic poems that help users remember the combinations.]

For example, the Chinese character for dog, 犬, contains several shapes: 犬, 一, 丿, and 丶. These shapes were matched with the keys D, G, T, and Y, respectively. So when a user typed “DGTY,” Wubi input software would match that to the character 犬.

[Figure: On the left, the Chinese character 犬 and its phonetic spelling; on the right, a guide showing how the character should be typed into Wubi software.]

Wubi was able to match every Chinese character with a keystroke combination of at most four QWERTY keys. It’s considered one of the fastest ways to type Chinese, but the downside is also quite obvious: users need to memorize which keys correspond to which strokes, so the learning curve is steep. (One way people have remembered the key assignments? Jingles!)
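The lookup that Wubi software performs can be sketched in a few lines of Python. This is a minimal illustration, not Wang Yongmin’s implementation: the shape table holds only the four assignments from the 犬 example, the one-entry code dictionary is hypothetical, and real Wubi software also applies rules for which components to type and in what order.

```python
# Minimal sketch of shape-based (Wubi-style) input. Only the four
# shape-to-key assignments from the 犬 example are included; a real
# Wubi table maps many components to each letter key.
SHAPE_TO_KEY = {"犬": "D", "一": "G", "丿": "T", "丶": "Y"}

# Hypothetical one-entry dictionary mapping a key code to its character.
CODE_TO_CHAR = {"DGTY": "犬"}

MAX_KEYS = 4  # Wubi codes use at most four keys


def encode(shapes):
    """Join the key assigned to each shape, in writing order, into a code."""
    code = "".join(SHAPE_TO_KEY[shape] for shape in shapes)
    if len(code) > MAX_KEYS:
        raise ValueError(f"code {code!r} is longer than {MAX_KEYS} keys")
    return code


def decode(code):
    """Return the character for a typed code, or None if it is unassigned."""
    return CODE_TO_CHAR.get(code.upper())


code = encode(["犬", "一", "丿", "丶"])
print(code, decode(code))  # prints: DGTY 犬
```

In this sketch each code points to exactly one character, so nothing is left to choose once the keys are typed; the price is memorizing the shape table.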

The next step in the evolution of Chinese IMEs was the invention of typing by phonetic spelling.

It may be hard to believe, but pinyin, the modern way of spelling each Chinese word in a standardized Latin alphabet, was only created in the 1950s. In the ’80s and ’90s, China started to experiment with teaching children pinyin in school before teaching them how to write Chinese characters. One result was that pinyin became an easier and more widely accepted way to match Chinese characters to the Latin letters on a keyboard.

To stick with the example of the character 犬 (dog): its pronunciation was standardized as quǎn, so typing Q, U, A, N on a standard keyboard would get you this character on your screen.

A number of pinyin-based IMEs were invented in the ’90s. The most prominent was Zhineng ABC, developed in 1993 by Zhu Shoutao, a computer science professor at Peking University. After Microsoft included Zhineng ABC as one of the default IMEs on Windows PCs, it became the most widely used one in the country.

But typing by pinyin has its own problems: dozens or even hundreds of Chinese characters can share the same phonetic spelling. If you type QUAN, the computer has no way to tell which of 81 characters is the one you want.

[Figure: A list of all Chinese characters spelled quan. There are at least 81 of them.]
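Pinyin input turns the problem around: the keys are easy to remember, but one spelling maps to many characters. The sketch below assumes a tiny hypothetical dictionary holding only ten of the characters spelled quan (tones are ignored, as they are when typing); a real IME dictionary would list all of them.

```python
# Hypothetical pinyin dictionary: toneless spelling -> candidate characters.
# Only ten of the characters spelled "quan" are listed, for illustration.
PINYIN_TO_CHARS = {
    "quan": ["全", "权", "泉", "劝", "圈", "拳", "券", "犬", "醛", "诠"],
}


def candidates(keys):
    """Return every character whose toneless pinyin matches the typed keys."""
    return PINYIN_TO_CHARS.get(keys.lower(), [])


matches = candidates("QUAN")
# The keys alone cannot say which character is meant, so the user must pick.
print(len(matches), "candidates:", "".join(matches))  # prints: 10 candidates: 全权泉劝圈拳券犬醛诠
```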

So every time you typed a word in Zhineng ABC, you still needed to select the correct character from a long list of possible candidates.

[Figure: A screenshot of the Zhineng ABC candidate window while typing the words “Zhineng ABC,” showing how the program displayed words for users to pick from.]

Luckily, the candidates were always displayed in the same order, meaning you’d start to remember where the characters you used often appeared in the little window.

I can confirm this, as I learned to type with Zhineng ABC. The last character in my name is 毅, spelled yi, and yi happens to be the sound with the most possible matches in Chinese, with hundreds of characters spelled the same way (thanks, Mom and Dad). It was etched in my mind that when I wanted to type 毅 in Zhineng ABC, I needed to scroll to the fourth page and pick the sixth option.
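A fixed order is what made that kind of memorization possible: a character’s page and slot follow directly from its position in the list. The sketch below assumes a hypothetical page size of nine candidates and a made-up list in which 毅 is the 33rd entry; Zhineng ABC’s actual page size and ordering are not stated here.

```python
PAGE_SIZE = 9  # hypothetical; not a documented Zhineng ABC setting


def page_and_slot(candidates, target, page_size=PAGE_SIZE):
    """Return the 1-based (page, option) of target in a fixed candidate list."""
    index = candidates.index(target)  # raises ValueError if target is absent
    return index // page_size + 1, index % page_size + 1


# Made-up fixed list for "yi": 32 placeholder entries, then 毅.
yi_candidates = [f"placeholder{i}" for i in range(32)] + ["毅"]
print(page_and_slot(yi_candidates, "毅"))  # prints: (4, 6)
```

Because the list never changes, the answer never changes either, which is why positions could be memorized and also why low-ranked characters stayed slow to reach.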

Obviously, that’s not efficient. In fact, typing in Zhineng ABC is slower than typing in Wubi. But the next generation of keyboard apps quickly surpassed their predecessors.

Sogou launched in 2006, essentially combining the foundation of pinyin typing with the technology of a search engine. Just as search engines recommend the content closest to what people are asking about, keyboard software can predict what users might want to type.

With Sogou, the candidate characters and words are no longer displayed in a fixed order; the order changes based on a user’s typing history and what’s in the news. For example, now that I’ve typed 毅 several times in this newsletter already, Sogou remembers that and puts it at the top whenever I type yi.
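One simple way to get that behavior is to reorder candidates by how often the user has picked each one. The sketch below is an assumption about the general idea, not Sogou’s algorithm: it uses only the user’s own selection history and ignores any other signals, and the five-character yi list is hypothetical and far from complete.

```python
from collections import Counter


class AdaptiveCandidates:
    """Rank candidates by how often this user has chosen them.

    A frequency heuristic for illustration only; real IMEs combine
    many more signals.
    """

    def __init__(self, base_order):
        self.base_order = base_order  # pinyin -> default candidate order
        self.picks = Counter()        # character -> times chosen

    def rank(self, keys):
        default = self.base_order.get(keys.lower(), [])
        # sorted() is stable, so candidates with equal counts (including
        # never-chosen ones) keep their default relative order.
        return sorted(default, key=lambda ch: -self.picks[ch])

    def choose(self, ch):
        self.picks[ch] += 1


ime = AdaptiveCandidates({"yi": ["一", "以", "已", "意", "毅"]})
print(ime.rank("yi"))  # prints: ['一', '以', '已', '意', '毅']
for _ in range(3):
    ime.choose("毅")
print(ime.rank("yi"))  # prints: ['毅', '一', '以', '已', '意']
```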

Many other innovative IMEs were invented around the same time as Sogou. Some tried to combine shape-based methods with spelling-based ones. Others let users write a Chinese character directly on the device, as trackpads and touch screens were coming into use.

But over time, these methods were gradually abandoned in favor of the much more efficient typing in smart keyboard apps like Sogou, which became the foundation of how Chinese people interact with technology and with each other.

They became a necessity in people’s everyday lives, but unfortunately this also exposed everyone to greater security risks. Even if more people knew about these vulnerabilities, it’s hard to imagine Chinese users would ever ditch the apps; instead, maybe it’s time users started demanding better security practices and more transparency from these companies.

(There are many more fascinating aspects to the historical relationship between the Chinese language and technology. For example, people in Taiwan and Hong Kong have developed their own ways of typing Chinese characters. For a great introduction, I’d recommend the book Kingdom of Characters by Jing Tsu, a professor of East Asian languages and literature at Yale.)

What else do you want to know about Chinese keyboard apps? Ask me any questions at

Catch up with China

1. A landmark agreement between the US and China to cooperate on science and technology is set to expire on August 27 after being in effect for 44 years. Its end would deal a heavy blow to the future of scientific research. (Wall Street Journal $)

2. Xiong’an, the Chinese city near Beijing that’s being built as a flagship smart city, has been hit by particularly devastating rain this summer, leading some people to wonder whether the choice of location was a mistake. (CNN)

  • Just how bad was the rain in and around Beijing? One county recorded 1.6 years’ worth of rain in just three days. (Reuters $)

3. Huawei will provide surveillance systems for the Taliban to install across Afghanistan. (Kabul Now)

4. To balance the growing demand for burial space against the shrinking supply of land, Beijing is turning its cemeteries vertical and digital. (Bloomberg $)

5. A Chinese artist is re-creating the old houses demolished during the country’s modernization, one miniature at a time. (New York Times $)

6. One Chinese AI-powered chatbot let users create a perfect companion to talk to every day. When the app went out of business, its users were heartbroken. (Rest of World)

7. Dozens of Chinese companies are developing their own versions of “miracle” weight-loss drugs like Wegovy, which are popular in the West. (Financial Times $)

8. American intelligence agencies issued a warning that their Chinese and Russian counterparts are now targeting space companies and their employees. (New York Times $)

Lost in translation

During the peak of the pandemic, almost every Chinese province was building 方舱 (fangcang), makeshift hospitals where covid patients were quarantined. So what happened to them? Reporters at the Chinese publication Southern Weekly combed through hundreds of government procurement reports across the country and found that local governments are spending millions of dollars to dismantle or repurpose them or, in some cases, to build more of them.

At least four makeshift hospitals are being shut down and the land returned to its original use, and construction of five new ones has been halted. Equipment and construction materials from these hospitals are now being resold online at low prices. Meanwhile, 24 existing hospitals are being transformed into permanent medical or disease prevention centers. But 10 new hospitals are still being built, with a total budget of $17 million. One possible explanation is that local governments’ annual budgets had already been set at the beginning of this year to cover the construction of fangcang.

One more thing

How smart can, and should, a public restroom be? At Shanghai’s Hongqiao railway station, a big screen displays real-time information about which stalls and urinals are occupied and which aren’t. I understand the idea is to guide passengers to an empty spot faster, but hear me out: maybe not everything needs to be “smartified.”

[Figure: A big blue screen at Shanghai Hongqiao railway station showing which restroom stalls and urinals are currently available.]
