
System and method to generate a movie file from a story telling adapting message feelings to target profile

IP.com Disclosure Number: IPCOM000257032D
Publication Date: 2019-Jan-11
Document File: 6 page(s) / 87K

Publishing Venue

The IP.com Prior Art Database

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 28% of the total text.

Statement – What is the problem, how does it work today.

Sometimes a person wants to describe an idea or an experience in a way that the audience can fully appreciate it. The speaker can visualize it mentally and explain it by voice, but a video would often convey it better than speech alone.

For example, suppose I want to describe my latest seaside holiday and my audience is unable to hear: it would be helpful if my speech could be rendered as a movie, perhaps with a transcript. The ambience could vary according to my feelings at that moment (e.g. if I am very happy and excited, the weather shown might be a clear sky; if I am sad, there would be clouds), but sometimes that is not enough. What is relaxing for me, such as a solitary seashore, may not be relaxing for my audience; they might prefer something like swimming or playing golf to relax, so a video representing only my own feelings could be misunderstood.

Another example is software development. The inventor of a new piece of software or feature could try to explain it by voice to some stakeholders who, given their different backgrounds, might understand something different from the original idea and start to like the idea based on their own understanding. If the same person describes the idea using a video, misunderstanding could be reduced. This would help avoid the well-known "tire swing" problem.


A third example is the possibility of transforming a police call into video: I witness a crime and call the police, and from my speech (this time with emotions removed) the police could see a video representation of what I am seeing.

Most people do not know how to create a video, since it requires a lot of specific knowledge, but they could describe the idea by voice, like a "storyboard".

Solution - Essence of what is proposed - The “What”

What is proposed here is a method, and a system implementing it, to dynamically analyze a speech input (e.g. collected by a microphone), find common entities and known patterns that describe motion (i.e. animation) and emotion, and use these patterns to generate a video file that converts the source's feelings into the audience's feelings, as analyzed through data crawling (e.g. of social media).
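The feeling-conversion step above can be sketched as follows. This is an illustrative toy only: the scene associations and the audience profile are hard-coded here, whereas the proposed system would derive the audience profile from social-media data crawling; all names and mappings are hypothetical.

```python
# Speaker's own association between a feeling and a scene
# (from the seaside example: "relaxed" -> solitary seashore).
SPEAKER_SCENES = {"relaxed": "solitary seashore"}

# Audience profile; in the proposed system this would be learned
# by crawling the audience's social-media data (hard-coded here).
AUDIENCE_PROFILE = {"relaxed": "playing golf"}

def adapt_scene(feeling, audience_profile):
    """Pick the scene that conveys `feeling` to this audience,
    falling back to the speaker's own association."""
    return audience_profile.get(feeling, SPEAKER_SCENES.get(feeling))

# The speaker's solitary seashore is replaced by what relaxes the audience.
print(adapt_scene("relaxed", AUDIENCE_PROFILE))
```

This mirrors the seaside example in the problem statement: the same feeling ("relaxed") is rendered differently depending on the target profile.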

Our solution exploits existing speech-analysis technology and machine-learning algorithms to parse the input signal and find animation patterns, personality insights, and feelings.
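As a minimal sketch of the analysis step, the fragment below extracts emotions and motion terms from a transcript and derives a scene parameter. In the real system this would be performed by trained cognitive services (e.g. Watson Natural Language Understanding and Tone Analyzer); the keyword tables and the emotion-to-weather mapping here are hypothetical stand-ins.

```python
# Hypothetical keyword tables standing in for trained emotion/entity models.
EMOTION_KEYWORDS = {
    "happy": ["happy", "excited", "great", "wonderful"],
    "sad": ["sad", "lonely", "gloomy"],
}
MOTION_KEYWORDS = ["swimming", "walking", "running", "playing"]

# Scene mapping following the example in the text:
# happy -> clear sky, sad -> clouds.
EMOTION_TO_WEATHER = {"happy": "clear sky", "sad": "clouds"}

def analyze_transcript(transcript):
    """Return detected emotions, motion terms, and a derived weather setting."""
    words = transcript.lower().split()
    emotions = [e for e, kws in EMOTION_KEYWORDS.items()
                if any(k in words for k in kws)]
    motions = [m for m in MOTION_KEYWORDS if m in words]
    weather = EMOTION_TO_WEATHER.get(emotions[0], "neutral") if emotions else "neutral"
    return {"emotions": emotions, "motions": motions, "weather": weather}

print(analyze_transcript("I was so happy swimming at the seaside"))
```

The output of this step (entities, motions, emotions, scene parameters) is what the downstream video-generation component would consume.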

Overview – “How” it works: detailed view and advantages of choices made

The system we propose is based on following components:

• A speech transducer, used to collect the voice signal

• A GPS chip

• A real-time speech to text converter (e.g. Watson speech to text service)

• A real-time cognitive text-analyzer system, trained to identify specific terms that refer to entities, motion, emotions, and tone (e.g. a combination of Watson Natural Language Understanding and Tone Analyzer).

• A layer-identifier system that, given an image, identifies key layers and attaches to the corr...