Natural Language

 

Arabic Dialect Modeling Project. Natural Language Processing builds bridges between people and provides the tools to communicate, not fight. Much as German divides into High and Low varieties, each Arabic-speaking country has developed a local dialect whose grammatical structures, accents, and even words and meanings differ markedly from traditional Classical Arabic. The regional dialects can be difficult to understand, even for native Arabic speakers from other regions. The CCLS Natural Language Processing team is harnessing the power of machine learning and Arabic computational linguistics to build tools and resources for dialectal Arabic, using multilingual computational lexical semantics.

The Arabic language is actually a collection of dialects from different geographic regions with important phonological, morphological, lexical, and syntactic differences. These spoken dialects have no official written form. Throughout the Arab world, however, only Modern Standard Arabic (MSA) is used for written and official spoken communication in government and the media. MSA is based on Classical Arabic and is itself not a native spoken language. This situation has important negative consequences for Arabic automatic speech recognition (ASR) and natural language processing (NLP). Experience has shown that language models built from MSA text are ineffective for understanding or improving dialect ASR.

The Center’s Arabic Dialect Modeling project has the potential to improve the quality of ASR for Arabic dialects and, more generally, to increase the understanding of how closely related languages can be modeled formally. All work on the Arabic Dialect Modeling project is open to the public, and the NLP tools developed for Arabic dialects will be made available to the research community. As the tools are refined, the project may provide future benefits not only to corporations doing business with partners throughout the Middle East, but also to publishers, universities, libraries and cultural organizations that want to include regional contributions, and foster pride in those contributions, as they strengthen awareness of the classical Arabic poetry, philosophy and scientific texts that were high points in human civilization.

Other Applications of NLP. Although the language and security applications for NLP are obvious, the analytic techniques used in Natural Language Processing also have unexpected applications to projects that at first do not seem to involve language or linguistics. For example:

  • Financial markets depend on financial analysts accurately understanding opinions from around the world, which may not always be expressed in standard language.
  • Medical applications include the facilitation of clear, accurate translation between English-speaking doctors and patients who speak a colloquial dialect of Arabic, particularly when the patient may be a child who cannot speak formal standard Arabic.
  • NLP algorithms can be used to deal with alternative abbreviations and geographical coordinates. For example, the Center’s work with a large public utility includes thousands of entries listing street names or locations of manholes. This information is a vital part of understanding and predicting which electrical components under which manholes have failed or, depending on location, weather, age and other conditions, are most likely to fail, but old data archives may record the same address in different forms. Natural language processing algorithms are needed to ensure that 210 West Main St, 210 W. Main, Corner of West Main and Oak Street, and West Main – 210 are all recognized as the same location.
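As a rough illustration of the address-matching problem in the last item, the sketch below reduces several spellings of a street address to one canonical key. It is not the Center's method: the function name `normalize_address`, the abbreviation table, and the suffix list are all assumptions made for this example, and a real archive would need far larger tables.

```python
import re

# Hypothetical lookup tables, invented for this sketch only.
ABBREVIATIONS = {"w": "west", "e": "east", "n": "north", "s": "south"}
STREET_SUFFIXES = {"st", "street", "ave", "avenue", "rd", "road"}


def normalize_address(raw: str) -> str:
    """Return a canonical 'number direction name' key for a street address."""
    # Lowercase, then turn every run of punctuation (commas, periods,
    # dashes) into a single space before splitting into tokens.
    tokens = re.sub(r"[^a-z0-9]+", " ", raw.lower()).split()
    # Expand single-letter compass abbreviations such as "w" -> "west".
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    # Drop street-type suffixes, which archives often omit.
    tokens = [t for t in tokens if t not in STREET_SUFFIXES]
    # Put house numbers first so "West Main – 210" and "210 West Main" agree.
    numbers = [t for t in tokens if t.isdigit()]
    words = [t for t in tokens if not t.isdigit()]
    return " ".join(numbers + words)


variants = ["210 West Main St", "210 W. Main", "West Main – 210"]
keys = {normalize_address(v) for v in variants}
print(keys)  # all three variants collapse to the single key "210 west main"
```

Rule-based cleanup of this kind only handles spelling and ordering differences. A description such as "Corner of West Main and Oak Street" names an intersection rather than a house number, so matching it to 210 West Main would additionally require geographic data (for instance a table of intersections and nearby addresses), which this sketch does not attempt.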

We invite you to contact us to learn about our work in more detail or to discuss your challenges. All our research is open to the public and we are always interested in considering new challenges. Our team is already thinking about ways to apply their work to Spanish, Chinese, and other languages.

Senior Team Members: Owen Rambow, research scientist, and associate research scientists Nizar Habash, Ph.D., Computer Science, and Mona Diab, Ph.D., Computational Linguistics. To meet other members of Columbia’s Arabic Dialect Modeling group (“CADIM”), go to http://ccls.columbia.edu/cadim.

 
