Tracking modern Chinese language with LIVAC

Which individuals in the Chinese-speaking communities of Hong Kong, Taiwan, and Beijing have had the most media exposure over the last two weeks? Which words were used most frequently? You may think these are questions with no definite answers, only subjective guesses. In fact, precise, statistics-based answers to these and other questions are only a click away in the Synchronous Linguistic Variation in Chinese Speech Communities (LIVAC) Corpus (www.rcl.cityu.edu.hk/livac/sample), developed by the Language Information Sciences Research Centre (LISRC), a CityU University Research Centre.

Three key LISRC indices, the "Celebrity Roster", the "Place Name Rank", and the "Common Word List", are compiled from the Synchronous LIVAC Corpus. First launched in 1994 by LISRC Director and Chair Professor of Linguistics and Asian Languages, Professor Benjamin T'sou, the LIVAC Corpus is one of the projects funded by Competitive Earmarked Research Grants from Hong Kong's Research Grants Council.
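The article does not describe how these indices are computed. As a rough illustration only, a frequency-ranked list such as the "Common Word List" could be produced by counting already-segmented words for one community over a trailing time window, such as the two weeks mentioned above. The sketch assumes a hypothetical input of `(date, community, word)` triples; the function name `top_words`, that data format, and the sample words are illustrative inventions, not LIVAC's actual design or data.

```python
from collections import Counter
from datetime import date, timedelta


def top_words(tokens, community, end, days=14, n=10):
    """Rank words by raw frequency for one community over a trailing window.

    tokens: iterable of (date, community, word) triples. This input format
    is a hypothetical assumption for the sketch, not LIVAC's storage format.
    The window covers the `days` days ending on `end`, inclusive.
    """
    start = end - timedelta(days=days)
    counts = Counter(
        word
        for day, comm, word in tokens
        if comm == community and start < day <= end
    )
    # most_common() orders by count, highest first
    return counts.most_common(n)


# Invented sample data for illustration only
sample = [
    (date(2001, 3, 1), "HK", "政府"),
    (date(2001, 3, 2), "HK", "政府"),
    (date(2001, 3, 2), "HK", "經濟"),
    (date(2001, 3, 3), "TW", "政府"),  # different community: ignored
    (date(2001, 1, 5), "HK", "經濟"),  # outside the window: ignored
]
print(top_words(sample, "HK", date(2001, 3, 14), n=2))  # [('政府', 2), ('經濟', 1)]
```

Raw counts are the simplest possible ranking; a real index might also normalize by the amount of text collected per community, which this sketch does not attempt.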

A ten-year research project

Since July 1995, the LIVAC database has been regularly updated with linguistic data from the major newspapers and electronic media of six Chinese-speaking communities: Hong Kong, Taiwan, Beijing, Shanghai, Macau, and Singapore. Words and phrases are first extracted automatically by computer and then manually proofread and categorized. The result is a database organized by linguistic level: character, phrase, sentence, and text. It is a valuable resource for linguists and anyone interested in exploring linguistic phenomena, social organization, culture, and other developments in Chinese communities.
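The article does not say which method LIVAC uses for its automatic first pass. One common textbook technique for splitting unspaced Chinese text into words is forward maximum matching against a lexicon, sketched below under the assumption of a small in-memory word list. The function name `forward_max_match`, the `max_len` default, and the toy lexicon are hypothetical choices for illustration. Characters not covered by the lexicon fall back to single-character tokens, which is the kind of output a manual proofreading step would need to correct.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right word segmentation (forward maximum matching).

    At each position, take the longest lexicon entry (up to max_len
    characters) that matches; if none matches, emit a single character.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy lexicon, for illustration only
LEXICON = {"香港", "城市大學", "大學", "語言"}

print(forward_max_match("香港城市大學", LEXICON))  # ['香港', '城市大學']
print(forward_max_match("語言資訊", LEXICON))      # ['語言', '資', '訊']
```

Greedy matching is fast but can mis-split ambiguous strings, which is one reason automatic segmentation is usually followed by human review.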

In early 2001, the corpus exceeded 70 million characters and 400,000 phrases, and it continues to grow. The portion currently available on the web comprises approximately 16 million characters and 190,000 phrases, drawn mainly from data compiled between July 1995 and June 1997. Under the LISRC schedule, the database will continue to be expanded and updated until June 2005, by which time it is expected to contain about 100 million characters and 600,000 phrases.

A Chinese language time capsule

"The corpus is like a time capsule, capturing the social, cultural, and linguistic developments of the six Chinese-speaking communities over a decade," Professor T'sou explained. "This provides valuable primary research material for linguists and those interested in studying Chinese societies." One of the corpus's key objectives is to explore in depth how modern Chinese vocabulary develops. This includes examining the origins and later forms of words for new concepts, the development of word meanings, the transference of old phrases, and expressions with local colour.

How many common Chinese translations of the term "Internet" would you guess are used across the six communities? According to LIVAC records from 1995 to 2000, there are at least 13, and the most frequently used translation varies from community to community. For instance, Hong Kong often uses "互聯網" (pronounced hu lian wang in Putonghua); Taiwan, "網際網路" (wang ji wang lu); Singapore, "网际网络" (wang ji wang luo); Macau, "互聯網絡" (hu lian wang luo); and Shanghai and Beijing, "因特网" (yin te wang).
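At its simplest, finding the dominant translation in each community is a per-community frequency count. The sketch below assumes a hypothetical list of `(community, variant)` pairs, one per occurrence in the texts; the function name `dominant_variants` and the sample pairs are invented for illustration and are not LIVAC figures.

```python
from collections import Counter


def dominant_variants(observations):
    """Return {community: most frequent variant} from (community, variant) pairs.

    Each pair records one occurrence of a variant in a community's texts.
    Ties are broken by whichever variant was seen first.
    """
    by_community = {}
    for community, variant in observations:
        by_community.setdefault(community, Counter())[variant] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in by_community.items()}


# Invented occurrences for illustration; not LIVAC data
obs = [
    ("Hong Kong", "互聯網"),
    ("Hong Kong", "互聯網"),
    ("Hong Kong", "因特網"),
    ("Taiwan", "網際網路"),
    ("Taiwan", "網際網路"),
    ("Taiwan", "網路"),
]
print(dominant_variants(obs))    # {'Hong Kong': '互聯網', 'Taiwan': '網際網路'}
print(len({v for _, v in obs}))  # 4 distinct variants overall
```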

Professor T'sou said, "The Chinese language is diverse, not a single entity. It carries different local colour in different communities. People often criticize the Chinese written language used by young people in Hong Kong as being mingled with Cantonese colloquial expressions. This is in fact a value judgment. The same language of the same locale develops differences over the passage of time. Language never stops evolving. The corpus lets us see the developments and variations of modern Chinese language in different Chinese communities over the last 10 years."

Unlimited application potential

The process of building the database is long, laborious and tedious, similar to "cultivating a barren continent" or "moving a huge mountain", Professor T'sou said. "However, when the task is completed and the result is a 'feast' to be shared by all who are interested, we forget about the hardship and feel rewarded."

Beyond academic research, a database built on such a large corpus, with built-in search and statistical functions, has enormous potential for practical application. Cantonese is increasingly used in Hong Kong's law courts, and the Synchronous LIVAC Corpus can be used in recording court proceedings. Mobile phones designed for Chinese input also need the support of a large linguistic database. In fact, as Professor T'sou pointed out, several network and IT product development companies, including Japanese telecom giant NTT, leading Hong Kong web content provider tom.com, and a subsidiary of AOL, have already started using the LIVAC database.

Contact Information

Communications and Institutional Research Office
