By Muralidhar Sridhar
Global Head of Product Management, PFT
Much of today's spotlight rightly falls on how GenAI can leverage multimodal AI analysis to meticulously dissect video content, detecting subtleties that range from emotional undercurrents to thematic elements such as romance and action sequences. It can also identify complex components such as actor entries, musical themes, and landmark dialogues, and convert them into textual metadata. GenAI models can apply retrieval augmented generation (RAG) and chain-of-thought reasoning to craft context-aware synopses, ensuring that each scene is understood not only in isolation but also in relation to its narrative arc.
Beyond mere analysis, GenAI can generate profound insights and trivia, offer creative suggestions for social media engagement, and suggest appropriate hashtags. This empowers content creators and marketers to produce compelling promotional content such as trailers and highlight reels, enhancing content visibility and audience engagement.
In recent POCs with industry giants and sports bodies, we witnessed significant benefits from GenAI enrichment of content metadata that was hitherto very challenging to achieve. The ability to get insights, suggestions, recommendations, and analysis has greatly benefited our work. We can generate new content, clips, cutdowns, highlights, marketing posts, and more, powering higher creative enablement, greater efficiencies, and new monetization opportunities.
If you are struggling to find the right content, repurposing or monetizing it effectively becomes even more challenging. It is more important than ever to find, use, and re-use both the content already in the library and content arriving fresh into it, across various downstream use cases.
To find the right content, we must first address the limitations of traditional approaches:
Traditional methods of content discovery have relied on manual metadata tagging, which may not be deep enough on a per-clip or per-shot basis to cover complete multimodal capture, from visuals to the audio transcript.
Discovery and search have been restricted to typical search engines, which are not equipped with modern AI technologies capable of multi-dimensional semantic search.
The solution involves the following key components:
Multimodal and Generative AI’s potential to revolutionize content discovery
A comprehensive multimodal discovery of content that tags metadata deeply at every frame, shot, clip, and scene level builds a solid foundation for discovery. Drawing on all audio and visual facets, and other inferences in context, it enables the use of PFT's patented Machine Wisdom to summarize findings across the dimension of time and the union of facets discovered.
Discovery facets encompass a range of elements such as named people, objects, keywords, transcripts, emotions, on-screen text, and more. Higher-level discoveries include Compilations, Key Moments (patented by PFT), Key Dialogues (patent pending for PFT), Romance, Action, Medical, Landscapes, and others. Key thumbnails/stills and their descriptions are also extracted and captured.
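To make the idea concrete, a per-clip discovery record could be shaped roughly as below. This is a minimal sketch; the field names are assumptions for illustration, not PFT's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class ClipFacets:
    """Hypothetical per-clip record holding the multimodal discovery facets."""
    clip_id: str
    start_sec: float
    end_sec: float
    people: list[str] = field(default_factory=list)        # named people on screen
    objects: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)
    transcript: str = ""                                   # audio transcript for the clip
    emotions: list[str] = field(default_factory=list)
    on_screen_text: list[str] = field(default_factory=list)
    key_moments: list[str] = field(default_factory=list)   # e.g. "Romance", "Action"
    thumbnail_desc: str = ""                               # description of the key still
```

A record like this, captured at frame, shot, clip, and scene granularity, is what the downstream vectorization and search stages would consume.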
Using GenAI, we can auto-create synopses at the clip, scene, and asset level. This involves chaining discovered content and insights on a per-frame and per-shot basis into clips, summarizing them, and then rolling the results up to scenes and finally to the whole asset. At each clip or scene level, summarization needs to be done in the context of the previously summarized clips together with the current clip, as a sliding-window analysis across the length of the video.
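The sliding-window roll-up can be sketched as follows. Here `summarize` is a stand-in for an LLM call; the stub just concatenates and truncates so the sketch runs without any external service.

```python
def summarize(context: str, current: str, max_words: int = 30) -> str:
    """Hypothetical LLM summarizer: condense prior context plus the current text."""
    words = (context + " " + current).split()
    return " ".join(words[:max_words])

def rollup_synopsis(clip_summaries: list[str], window: int = 2) -> str:
    """Summarize each clip in the context of the preceding `window` summaries,
    then roll the scene-level results up into one asset-level synopsis."""
    scene_summaries: list[str] = []
    for i, clip in enumerate(clip_summaries):
        # Sliding window: only the last `window` summaries form the context.
        context = " ".join(scene_summaries[max(0, i - window):i])
        scene_summaries.append(summarize(context, clip))
    # Final pass: condense all scene summaries into a single asset synopsis.
    return summarize("", " ".join(scene_summaries))
```

With a real LLM in place of the stub, each clip is summarized with awareness of what came before it, which is what keeps the rolled-up synopsis coherent with the narrative arc.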
Propose scripts for trailers and promos:
Propose storylines for social media posts:
The intelligently captured metadata is vectorized using LLMs and stored in the index of a smart, vector-enabled AI search engine. At every clip level, based on the multimodal identification of content, a document is created that can be vectorized along several dimensions. This per-clip document is then vectorized and stored in a vector search engine that can compute semantic similarity between the query and the stored data.
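The indexing step can be sketched in miniature. This is an illustrative toy, not a production engine: `embed` replaces the LLM embedding with a deterministic hashed bag-of-words vector, and `ClipIndex` stands in for a real vector search engine.

```python
import hashlib
import math
from collections import Counter

def embed(text: str, dim: int = 512) -> list[float]:
    """Toy stand-in for an LLM embedding: deterministic hashed bag-of-words."""
    vec = [0.0] * dim
    for word, count in Counter(text.lower().split()).items():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class ClipIndex:
    """Minimal in-memory stand-in for a vector-enabled search engine."""
    def __init__(self) -> None:
        self.docs: dict[str, tuple[str, list[float]]] = {}

    def add_clip(self, clip_id: str, facets: dict) -> None:
        # Flatten the multimodal facets into one per-clip document, then vectorize it.
        doc = " ".join(str(v) for v in facets.values())
        self.docs[clip_id] = (doc, embed(doc))

    def search(self, query: str, k: int = 3) -> list[str]:
        # Vectors are unit-normalized, so cosine similarity is a plain dot product.
        q = embed(query)
        scored = sorted(((sum(a * b for a, b in zip(q, v)), cid)
                         for cid, (_, v) in self.docs.items()), reverse=True)
        return [cid for _, cid in scored[:k]]
```

In a real deployment the embedding comes from an LLM and the index from an engine built for approximate nearest-neighbour search, but the shape of the flow, one vectorized document per clip, queried by similarity, is the same.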
Conversational search is the most appropriate use case to enable on top of this rich metadata, and it is enabled using RAG frameworks. The search query is first classified and analyzed to determine which part of the query is most relevant for search. That part is then vectorized and looked up in the vector index using KNN and similar similarity-search algorithms.
After retrieving the search results, the context of the search, along with any preceding context stored in the thread, is analyzed using LLMs. The results are then re-ranked and reformatted, and a response based on the identified results is composed and shown to the user.
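The classify-retrieve-rerank flow above can be sketched as below. `classify_query`, `vector_search`, and the re-rank step are simplified stand-ins with hypothetical names for the LLM and vector-engine calls; no particular framework is assumed.

```python
def classify_query(query: str) -> str:
    """Keep only the search-relevant part of the query (stub: drop filler words).
    A real system would use an LLM classifier here."""
    filler = {"please", "show", "me", "find", "all", "the", "clips", "of"}
    kept = [w for w in query.split() if w.lower() not in filler]
    return " ".join(kept) or query

def vector_search(search_text: str, index: dict, k: int = 5) -> list[str]:
    """Stub KNN lookup: rank stored clip documents by word overlap."""
    q = set(search_text.lower().split())
    scored = sorted(index.items(),
                    key=lambda kv: len(q & set(kv[1].lower().split())),
                    reverse=True)
    return [clip_id for clip_id, _ in scored[:k]]

def answer(query: str, thread_context: list[str], index: dict) -> dict:
    search_text = classify_query(query)
    hits = vector_search(search_text, index)
    # A real system would re-rank `hits` with an LLM using `thread_context`;
    # the stub keeps retrieval order and just reports the context length.
    return {"search_text": search_text,
            "clips": hits,
            "context_turns": len(thread_context)}
```

The response-building step, translating the matched metadata back into playable clips with catalogue links and actions, would sit on top of the `clips` list this sketch returns.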
While the identified results may be just metadata, the engine translates them into video clips and presents a set of clips along with the key metadata that matched, links to the catalogue entry for each clip, and the actions that can be taken on them.
During a conversational search, the user may ask for information that is not readily available through a search alone. For example, the user may want the system to describe a piece of content in 20 lines, or to perform an action such as translating the content into another language using Generative AI.
The solution applies RAG to handle such requests: the relevant metadata is retrieved as grounding context and passed to a generative step that performs the requested action.
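One way this retrieve-then-generate routing could look is sketched below, under stated assumptions: `generate` is a placeholder for an LLM call, and the action keywords and store layout are invented for illustration.

```python
def generate(instruction: str, context: str) -> str:
    """Stub for an LLM generation call: echoes the instruction with its grounding."""
    return f"{instruction} | grounded on: {context}"

# Hypothetical set of action-style requests the router recognizes.
ACTIONS = ("describe", "translate", "summarize")

def handle_request(request: str, metadata_store: dict) -> str:
    action = next((a for a in ACTIONS if a in request.lower()), None)
    if action is None:
        return "route to plain search"  # ordinary retrieval path
    # Retrieval step: pull the stored metadata for the clip named in the request.
    clip_id = next((cid for cid in metadata_store if cid in request), None)
    context = metadata_store.get(clip_id, "")
    # Augmented generation: the LLM performs the action using the retrieved facts,
    # rather than answering from its parametric knowledge alone.
    return generate(request, context)
```

The point of the pattern is that the generative step never acts on the video blindly; it is always grounded in the metadata the discovery stage already captured.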
The applications of GenAI-enabled discovery and its downstream use cases can be classified into three key types of benefits across three key verticals.
The key areas where these use cases are most relevant:
Content creation: Re-use content by searching clips and content in the library for re-usable clips for new content creation.
An example would be: show all the clips where someone says "Yes" or "I do" at a wedding, or show clips of all skylines and landscapes.
Social media monetization: Create scene lifts and cut-downs, such as "show all the romantic clips of this movie." Find clips, pick out key moments, assemble them, and speed up social media distribution. Get social-media-driven recommendations on clips in the library.
FAST: Create interesting compilations for monetization, along with automatic highlights or stories from the content.
Content marketing: Discover and get recommended highlights, storylines, promo briefs, and ideas. Search and pick clips to build social media promotions. Create GenAI-assisted promos and trailers for FAST channels at scale.
Content monetization: GenAI-enabled discovery of automatic ad slots and contextual ad keywords for those slots.
GenAI-discovery-enabled social media monetization of content clips, such as scene lifts
This approach to content metadata enrichment delivers discoverability, searchability, convenience, speed, and scale, enabling faster and better content creation, management, marketing, and monetization. Enterprises can use it to gain a significant leap in creative enablement, operational efficiencies, and monetization.