Publication Date: 2017-Mar-03
Disclosed is a system or method that enhances the available metadata of audio and video content, in the form of tags and transcripts, so that search engines can consume it in an optimized way and return more accurate results.

Enhanced search engine optimization technique for audio video content

Data on the internet comprises text, video and audio content, but content is mostly searched using text-based search. Searching for audio video content across the internet relies on the textual metadata associated with that content.

With advances in technology, high network bandwidth is available to the general public, allowing people to publish content using audio and video media. While audio and video media are increasingly used for content creation across the web, search still relies on the textual metadata associated with the audio video content.

The easiest and most obvious way to provide this textual metadata is to apply speech-to-text analysis and add the transcript as metadata to the audio video content. This is a good way to feed search engines relevant information with which to rank and find audio and video content.
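
A minimal sketch of this step in Python, assuming the transcript has already been produced by some speech-to-text engine (not shown). The function name `build_transcript_metadata`, the record fields (`id`, `transcript`, `tags`) and the small stop-word list are hypothetical; deriving tags from the most frequent words is only one simple illustrative choice.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; a real system would use a fuller list).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
              "this", "that", "we", "on", "for", "with"}


def build_transcript_metadata(content_id, transcript, max_keywords=5):
    """Attach a speech-to-text transcript to a metadata record (hypothetical helper).

    `transcript` is assumed to come from an external speech-to-text engine.
    Tags are the most frequent non-stop-words; ties keep first-seen order.
    """
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    tags = [word for word, _ in counts.most_common(max_keywords)]
    return {"id": content_id, "transcript": transcript, "tags": tags}


meta = build_transcript_metadata(
    "video-42",
    "Today we review the new phone. The phone camera is great and the battery lasts long.",
)
print(meta["tags"])  # ['phone', 'today', 'review', 'new', 'camera']
```

The full transcript is kept alongside the derived tags so that a search engine can index either.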

In addition, search tags are also associated with audio video content on the web to improve its ranking in search results.
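
To illustrate how tags can influence ranking, here is a hypothetical scoring sketch in which a query term found in a record's tags counts more than the same term found in its transcript. The `relevance` and `search` functions and the weights 3.0 and 1.0 are assumptions for illustration only; real search engines use far more elaborate ranking.

```python
# Hypothetical weights: a query term found in the tags counts more than one in the transcript.
TAG_WEIGHT = 3.0
TRANSCRIPT_WEIGHT = 1.0


def relevance(query, record):
    """Score one metadata record against a whitespace-separated query."""
    tags = {t.lower() for t in record.get("tags", [])}
    transcript_words = record.get("transcript", "").lower().split()
    score = 0.0
    for term in query.lower().split():
        if term in tags:
            score += TAG_WEIGHT
        score += TRANSCRIPT_WEIGHT * transcript_words.count(term)
    return score


def search(query, records):
    """Return ids of matching records, highest score first."""
    scored = [(relevance(query, r), r["id"]) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rid for score, rid in scored if score > 0]


records = [
    {"id": "a", "tags": ["cooking"], "transcript": "today we bake bread"},
    {"id": "b", "tags": ["baking", "bread"], "transcript": "bread recipe for beginners"},
    {"id": "c", "tags": ["travel"], "transcript": "a trip to rome"},
]
print(search("bread", records))  # ['b', 'a']
```

Record `b` ranks first because "bread" appears both as a tag and in its transcript, while `a` only mentions it in the transcript.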

That said, audio and video content has a tremendous advantage over text content: it carries more meaning than its textual counterpart. It carries implicit meaning that cannot be captured merely through speech-to-text conversion. This aspect of AV content has not yet been explored enough to enhance metadata. Utilizing it can increase the relevance of search for audio video content.

With advances in audio and video analysis, extracting characteristics such as audio tone, facial expressions and body language gestures has become possible.

To utilize the enhanced meaning carried by audio video content, audio video analysis can be done to extract emotion (from audio tone, facial expression and body language), and the extracted emotion can then be used to enhance the metadata associated with the content for improved search of audio video content.
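
One way the three emotion signals could be combined is weighted late fusion: each analyser (audio tone, facial expression, body language) is assumed to output a probability per emotion label, the scores are averaged with per-modality weights, and labels above a threshold become emotion tags. The label set, weights, threshold and the `fuse_emotions` function are all hypothetical, and the analysers themselves are not shown.

```python
EMOTIONS = ("happy", "sad", "angry", "neutral")  # hypothetical label set


def fuse_emotions(modality_scores, weights, threshold=0.4):
    """Weighted late fusion of per-modality emotion probabilities.

    modality_scores maps a modality name to {emotion: probability}, as some
    analyser (not shown) might output. Returns (tags, fused_scores), where tags
    are emotions whose fused score is at least `threshold`, highest first.
    """
    total_weight = sum(weights[m] for m in modality_scores)
    fused = {
        emotion: sum(weights[m] * scores.get(emotion, 0.0)
                     for m, scores in modality_scores.items()) / total_weight
        for emotion in EMOTIONS
    }
    ranked = sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
    tags = [emotion for emotion, score in ranked if score >= threshold]
    return tags, fused


scores = {
    "audio_tone": {"happy": 0.7, "neutral": 0.3},
    "facial_expression": {"happy": 0.6, "sad": 0.1, "neutral": 0.3},
    "body_language": {"happy": 0.3, "neutral": 0.7},
}
weights = {"audio_tone": 0.4, "facial_expression": 0.4, "body_language": 0.2}
tags, fused = fuse_emotions(scores, weights)
print(tags)  # ['happy']
```

Here the fused score for "happy" is 0.58 and for "neutral" 0.38, so only "happy" clears the 0.4 threshold. Late fusion is chosen only because it keeps the analysers independent; other fusion schemes would serve equally well for the purpose described.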


How does it work -

The proposal is to enhance the relevance of search metadata of audio video content using audio video content analysis.
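
A minimal sketch of the enrichment step, assuming emotion tags (for example, `happy`) have already been extracted: the tags are merged into the content's existing metadata so that a query for that emotion can now match content whose original tags did not mention it. `enrich_metadata`, `matches` and the record layout are hypothetical.

```python
def enrich_metadata(record, emotion_tags):
    """Return a copy of `record` with emotion tags stored and merged into its tags."""
    existing = list(record.get("tags", []))
    enriched = dict(record)
    enriched["emotions"] = list(emotion_tags)
    enriched["tags"] = existing + [t for t in emotion_tags if t not in existing]
    return enriched


def matches(query, record):
    """True if any query term appears among the record's tags."""
    terms = set(query.lower().split())
    return bool(terms & {t.lower() for t in record.get("tags", [])})


original = {"id": "video-7", "tags": ["birthday", "party"],
            "transcript": "let us cut the cake"}
enriched = enrich_metadata(original, ["happy"])  # tags assumed from emotion analysis

print(matches("happy", original))  # False
print(matches("happy", enriched))  # True
print(enriched["tags"])            # ['birthday', 'party', 'happy']
```

The original record is left unchanged, so the enriched metadata can be published alongside or in place of it.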

