A Method and System for Detecting Clickbait Article by using Informality Features in the Article

IP.com Disclosure Number: IPCOM000249791D
Publication Date: 2017-Apr-05
Document File: 3 page(s) / 19K

Publishing Venue

The IP.com Prior Art Database

Related People

Prakhar Biyani: INVENTOR [+3]

Abstract

A method and system are disclosed for detecting clickbait articles by using informality features of the article. The method and system train a machine-learned model that uses an article's informality features to predict how likely the article is to be clickbait.

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 53% of the total text.

Description

Disclosed is a method and system for detecting clickbait articles by using informality features of the article. The method and system train a machine-learned model that uses an article's informality features to predict how likely the article is to be clickbait.
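The training step can be illustrated with a minimal sketch. The disclosure does not name the model type, so logistic regression fitted by batch gradient descent is an assumption here, and the feature rows and labels are made-up toy values used only to show the flow from informality features to a likelihood score.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(w, b, x):
    """Estimated likelihood that feature vector x belongs to a clickbait article."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy rows: [slang count, swear count]; label 1 = clickbait.
X = [[3.0, 1.0], [4.0, 2.0], [5.0, 1.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)
```

Calling predict_proba(w, b, features) on a new article's feature vector then yields a score between 0 and 1 that can be thresholded.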

To identify clickbait articles, the method and system first sample a large set of webpages. A sampled webpage can be annotated as clickbait if the informality features of its article match the predefined definition of one of the types of clickbait.

The types of clickbait may include, but are not limited to, exaggeration, teasing, inflammatory, formatting, graphic, bait-and-switch, ambiguous, factually wrong, and the like.

Here, exaggeration clickbait is when a title exaggerates the content on the article’s landing page.  Similarly, teasing clickbait is when details are omitted from an article’s title to build suspense.  Inflammatory clickbait is when an article includes inappropriate or vulgar words.  Formatting clickbait is when an article overuses capitalization or punctuation.

Further, graphic clickbait is when an article has salacious, disturbing, or unbelievable subject matter.  Bait-and-switch clickbait is when content promised or implied by an article’s title is not on the article’s landing page.  Ambiguous clickbait is when an article’s title is unclear or confusing, and factually wrong clickbait is when an article is plainly incorrect.
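One way to hold these definitions during annotation is a small enumeration plus a per-page record, sketched below. The names ClickbaitType and Annotation, and the rule that a page counts as clickbait when at least one type is assigned, are illustrative assumptions and not part of the disclosure.

```python
from dataclasses import dataclass, field
from enum import Enum

class ClickbaitType(Enum):
    """The eight clickbait types, each with a short definition."""
    EXAGGERATION = "title exaggerates the landing-page content"
    TEASING = "title omits details to build suspense"
    INFLAMMATORY = "article includes inappropriate or vulgar words"
    FORMATTING = "overuse of capitalization or punctuation"
    GRAPHIC = "salacious, disturbing, or unbelievable subject matter"
    BAIT_AND_SWITCH = "content promised by the title is not on the landing page"
    AMBIGUOUS = "title is unclear or confusing"
    FACTUALLY_WRONG = "article is plainly incorrect"

@dataclass
class Annotation:
    """Hypothetical annotation record for one sampled webpage."""
    url: str
    types: set = field(default_factory=set)

    @property
    def is_clickbait(self):
        # Assumption: any assigned type marks the page as clickbait.
        return bool(self.types)
```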

Subsequently, the method and system extract the headline and the readable part of the body of each webpage in the set, and compute informality features, content features and similarity features to identify clickbait articles.
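The extraction step might look like the following sketch, which uses only the Python standard library. The disclosure does not say how the headline or the readable part is located, so taking the headline from the title element and the readable part from paragraph elements is an assumption, as are the names ArticleExtractor and extract.

```python
from html.parser import HTMLParser

class ArticleExtractor(HTMLParser):
    """Collects <title> text as the headline and <p> text as the readable part."""

    def __init__(self):
        super().__init__()
        self._stack = []          # currently open <title>/<p> tags
        self.headline_parts = []
        self.body_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not self._stack or not text:
            return  # text outside <title>/<p> (menus, footers) is skipped
        if self._stack[-1] == "title":
            self.headline_parts.append(text)
        else:
            self.body_parts.append(text)

def extract(html):
    """Return (headline, readable_text) for one webpage's HTML."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.headline_parts), " ".join(parser.body_parts)
```

This simplification expects paragraph tags to be closed; real pages with unclosed tags or heavy scripting would need a more robust readability extractor.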

The informality features are computed from the readable part using the Coleman-Liau score, the Formality measure, the LIX index, the RIX index, the number of slang words and the number of swear words.

Here, the Coleman-Liau score is computed as 0.0588 * L - 0.296 * S - 15.8, where L is the average number of letters per 100 words and S is the average number of sentences per 100 words.
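The Coleman-Liau score and the two count features can be sketched as follows. The sketch assumes the commonly published form of the index, 0.0588 * L - 0.296 * S - 15.8, with L the letters per 100 words and S the sentences per 100 words. The regex tokenizer, the punctuation-based sentence splitter, and whatever slang or swear lexicon is passed in are hypothetical simplifications.

```python
import re

WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def coleman_liau(text):
    """Coleman-Liau index of text, using a naive tokenizer and sentence splitter."""
    words = WORD_RE.findall(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    letters = sum(ch.isalpha() for word in words for ch in word)
    L = letters / len(words) * 100         # letters per 100 words
    S = len(sentences) / len(words) * 100  # sentences per 100 words
    return 0.0588 * L - 0.296 * S - 15.8

def count_lexicon_hits(text, lexicon):
    """Number of tokens in text that appear in lexicon (a set of lowercase words)."""
    return sum(1 for word in WORD_RE.findall(text.lower()) if word in lexicon)
```

The same count_lexicon_hits helper serves both the slang count and the swear-word count, given a separate lexicon for each.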

Similarly, the Formality measure is computed as

The LIX index is computed as, where W is the number of words, LW...