
Method of Globalized Emoji Segmentation SaaS Service

IP.com Disclosure Number: IPCOM000248729D
Publication Date: 2016-Dec-30
Document File: 5 page(s) / 100K

Publishing Venue

The IP.com Prior Art Database

Abstract

The core idea defines a method and a framework for supporting a globalized Emoji segmentation SaaS service. The method includes an Emoji segmentation service (a SaaS API, Software as a Service Application Programming Interface) integrated with NLP (Natural Language Processing) segmentation modules to segment an input text document in a national language into minimum meaningful units according to different locales (language, region, and codeset).
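The abstract ties segmentation to a locale made of language, region, and codeset, but it does not say how the locale is passed to the service. Assuming POSIX-style locale names such as ja_JP.UTF-8 (an assumption, not something the disclosure specifies), a minimal sketch of extracting those three components, for example to pick a language-specific segmentation module, could look like this. Locale and parse_locale are hypothetical names.

```python
from typing import NamedTuple, Optional


class Locale(NamedTuple):
    """The three locale components named in the abstract (hypothetical type)."""
    language: str
    region: Optional[str]
    codeset: Optional[str]


def parse_locale(name: str) -> Locale:
    """Split a POSIX-style name such as 'ja_JP.UTF-8' into its components.

    Assumes the 'language[_REGION][.codeset][@modifier]' form; any
    '@modifier' suffix is discarded because the abstract does not use it.
    """
    name = name.split("@", 1)[0]
    base, _, codeset = name.partition(".")
    language, _, region = base.partition("_")
    return Locale(language, region or None, codeset or None)


print(parse_locale("ja_JP.UTF-8"))
print(parse_locale("zh_TW"))
```

Missing parts come back as None, so a caller can fall back to a language-only segmentation module when no region or codeset is given.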

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 53% of the total text.



Emoji (絵文字 [えもじ]) are popular because Emoji phrases, strings of picture characters, have been widely used by 6 billion mobile users worldwide on social media and mobile devices. For instance, in 2013 more than 80% of short messages in the USA contained Emoji phrases. In addition, 800 Emoji characters have been defined in Unicode.

There are some Emoji-related studies, works, and application features. Many information search service providers support a variety of cloud-based search APIs (such as search engines, deep question answering, personal information managers, email, and social media).

Most operating systems support Emoji, and the Unicode Consortium adds new Emoji characters every year.

Understanding the meaning of Emoji is a new task for linguists and for globalization support. In addition, most machine learning algorithms and methods need to obtain the correct meaning of social media content that mixes Emoji characters and strings with text. Cloud search APIs are also widely used to let applications match search queries against real-time data streams in areas such as social media, big data analytics, deep Q&A, and machine translation.

In cloud-based NLP SaaS services, text segmentation is a foundation of Natural Language Processing (NLP). In East Asian languages (Chinese, Japanese, Korean [CJK]) in particular, segmentation is the first step for most NLP services (such as text-to-speech, semantic/syntactic analysis, and sentiment analysis). In current NLP, the widespread and rapidly growing use of Emoji degrades segmentation, because there is no reliable Emoji segmentation method, for the following reasons:

Emoji characters are ignored or filtered out, because segmenting a sentence that mixes Emoji with text may produce misleading, misinterpreted, or even incorrect results due to Emoji interference.

The same Emoji may have different meanings in different languages.

The same Emoji may have different pronunciations in different languages.

Emoji strings may mix characters used for their meaning with characters used for their pronunciation.

The same Emoji character or Emoji string in the same language may have different meanings for different user groups, or at different times, locations, or circumstances.
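Rather than filtering Emoji out, a service could first split mixed input into Emoji runs and text runs, so that each Emoji string is kept as one unit and only the text runs are passed to a locale's NLP segmenter. The disclosure does not specify this step, so the sketch below is an illustration under stated assumptions: the code point ranges are a hypothetical subset of Unicode blocks rather than the full Unicode Emoji property data, and is_emoji and split_emoji_runs are hypothetical names.

```python
from typing import List, Tuple

# Hypothetical subset of Unicode blocks that hold most pictographic Emoji.
# A real service would use the Emoji properties published by Unicode
# (emoji-data.txt) rather than whole-block ranges.
EMOJI_RANGES = [
    (0x2600, 0x26FF),    # Miscellaneous Symbols
    (0x2700, 0x27BF),    # Dingbats
    (0x1F300, 0x1F5FF),  # Misc Symbols and Pictographs (incl. skin-tone modifiers)
    (0x1F600, 0x1F64F),  # Emoticons
    (0x1F680, 0x1F6FF),  # Transport and Map Symbols
    (0x1F900, 0x1F9FF),  # Supplemental Symbols and Pictographs
]
# Zero-width joiner and emoji presentation selector: they glue Emoji together.
JOINERS = {0x200D, 0xFE0F}


def is_emoji(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


def split_emoji_runs(text: str) -> List[Tuple[str, str]]:
    """Split text into ("emoji", ...) and ("text", ...) runs without dropping Emoji.

    A joiner or selector is attached to the preceding run only when that run
    is an Emoji run; otherwise it is treated as ordinary text.
    """
    runs: List[Tuple[str, str]] = []
    for ch in text:
        continues_emoji = ord(ch) in JOINERS and bool(runs) and runs[-1][0] == "emoji"
        kind = "emoji" if is_emoji(ch) or continues_emoji else "text"
        if runs and runs[-1][0] == kind:
            runs[-1] = (kind, runs[-1][1] + ch)  # extend the current run
        else:
            runs.append((kind, ch))              # start a new run
    return runs


# Japanese text mixed with a sun (U+2600 + U+FE0F) and a smiling face (U+1F60A).
for kind, chunk in split_emoji_runs("今日は\u2600\ufe0fで嬉しい\U0001F60A"):
    print(kind, repr(chunk))
```

For the sample string, the two Japanese text runs would go to a Japanese segmenter, while the sun (with its presentation selector) and the smiling face stay as separate Emoji units instead of being discarded.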

The main current problem is that mixed Emoji phrase content is hard to retrieve and analyze in data mining and big data analytics. It is also hard to build and maintain a static Emoji Phrase Dictionary to support the NLP segmentation function. Therefore, it is necessary to define a method for a globalized Emoji segmentation SaaS service that provides an Emoji segmentation engine.
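The disclosure argues that a static Emoji Phrase Dictionary is hard to build and maintain, and that the same Emoji string can carry different meanings in different languages. One way a service could address both, offered here only as an assumption and not as the disclosure's method, is a per-language dictionary that accepts updates at runtime, combined with forward maximum matching (a common dictionary-based technique for CJK segmentation) to break an Emoji string into known phrases. EmojiPhraseDictionary, segment_emoji_run, and the sample entries are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class EmojiPhraseDictionary:
    """Hypothetical per-language Emoji phrase dictionary.

    Entries can be added while the service is running, instead of shipping
    a static dictionary file that must be rebuilt for every change.
    """

    def __init__(self) -> None:
        # language code -> {Emoji phrase -> meaning in that language}
        self._entries: Dict[str, Dict[str, str]] = {}

    def add(self, language: str, phrase: str, meaning: str) -> None:
        self._entries.setdefault(language, {})[phrase] = meaning

    def lookup(self, language: str, phrase: str) -> Optional[str]:
        return self._entries.get(language, {}).get(phrase)

    def max_len(self, language: str) -> int:
        # Length (in code points) of the longest phrase known for this language.
        return max((len(p) for p in self._entries.get(language, {})), default=1)


def segment_emoji_run(run: str, language: str,
                      dictionary: EmojiPhraseDictionary) -> List[Tuple[str, Optional[str]]]:
    """Forward maximum matching over an Emoji run.

    At each position, take the longest phrase the dictionary knows for this
    language; if nothing matches, emit a single code point with no meaning.
    """
    units: List[Tuple[str, Optional[str]]] = []
    limit = max(1, dictionary.max_len(language))
    i = 0
    while i < len(run):
        for size in range(min(limit, len(run) - i), 0, -1):
            candidate = run[i:i + size]
            meaning = dictionary.lookup(language, candidate)
            if meaning is not None or size == 1:
                units.append((candidate, meaning))
                i += size
                break
    return units


# Hypothetical entries: the same Emoji string glossed differently per language.
d = EmojiPhraseDictionary()
d.add("en", "\U0001F37A\U0001F37A", "cheers")
d.add("de", "\U0001F37A\U0001F37A", "Prost")
run = "\U0001F37A\U0001F37A\U0001F355"  # beer mug, beer mug, pizza
print(segment_emoji_run(run, "en", d))
print(segment_emoji_run(run, "de", d))
```

With these entries, the run splits into the two-mug phrase (glossed "cheers" for en, "Prost" for de) followed by the pizza code point with no meaning; a language with no entries falls back to one unit per code point. Because entries are added through add, new phrases can take effect without rebuilding a static dictionary.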

In this paper, the core idea defines a method and a framework fo...