Blog

Generative AI Enhanced Content Discovery

By Muralidhar Sridhar
Global Head of Product Management, PFT

August 26, 2024

The spotlight today is rightly on how GenAI can apply multimodal AI analysis to meticulously dissect video content, detecting subtleties from emotional undercurrents to thematic elements such as romance and action sequences. It can also identify and convert complex components such as actor entries, musical themes, and landmark dialogues into textual metadata. GenAI models can apply retrieval-augmented generation (RAG) and chain-of-thought reasoning to craft context-aware synopses, ensuring that each scene is understood not only in isolation but also in relation to its narrative arc.


Beyond mere analysis, GenAI can generate profound insights and trivia, offer creative suggestions for social media engagement, and suggest appropriate hashtags. This empowers content creators and marketers to produce compelling promotional content such as trailers and highlight reels, enhancing content visibility and audience engagement.


In recent POCs with industry giants and sports bodies, we witnessed significant benefits from GenAI enrichment of content metadata that was hitherto very challenging to achieve. The ability to get insights, suggestions, recommendations, and analysis has greatly benefited our work. We can generate new content such as clips, cutdowns, highlights, and marketing posts, powering higher creative enablement, greater efficiencies, and more monetization opportunities.


Importance of content discovery in the digital age


If you are struggling to find the right content, repurposing or monetizing it effectively becomes even harder. It is more important than ever to find, use, and re-use both the content already in the library and content arriving fresh into it, across various downstream use cases.


To find the right content, we need three key capabilities:


  • To tag the metadata of the content correctly and comprehensively
  • To have the right technologies that can search for/discover deep metadata
  • To have the appropriate tools and technologies to use this discovered content in downstream use cases.

Challenges in traditional content discovery mechanisms


Traditional methods of content discovery have relied on manual metadata tagging, which is often not deep enough on a per-clip or per-shot basis to cover complete multimodal capture, from visuals to audio transcripts.


  • Typical manual cataloguing has focused on titles, synopses, and keyword matrices
  • Complete multimodal tagging requires large manual operations that are time-consuming, costly, and unscalable

Discovery and search have been restricted to conventional search engines that are not equipped with modern AI technologies capable of multi-dimensional semantic search.


  • Searching content deeply inside an archive was limited by text-based search engines
  • Semantic search was limited by the available metadata
  • Post discovery, getting to the right clip to use in downstream use cases was not easy

The solution involves the following key components:


  • Multi-modal, GenAI-enabled discovery and machine wisdom
  • Vectorization for search
  • Semantic search
  • RAG-based conversational search and actions

Multimodal and Generative AI’s potential to revolutionize content discovery


A comprehensive multimodal discovery of content that tags metadata deeply at every frame, shot, clip, and scene level builds a solid foundation for discovery. Drawing on all audio and visual facets and other contextual inferences, it enables the use of PFT’s patented Machine Wisdom to summarize findings in the context of the dimension of time and the union of facets discovered.


Discovery facets encompass a range of elements such as named people, objects, keywords, transcripts, emotions, on-screen text, and more. Higher-level discoveries include Compilations, Key Moments (patented by PFT), Key Dialogues (PFT patent pending), Romance, Action, Medical, Landscapes, and others. Key thumbnails/stills are also extracted and captured along with their descriptions.
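To make the facet model above concrete, here is a minimal sketch of what a per-clip metadata document might look like. The field names and values are purely illustrative assumptions, not PFT's actual schema.

```python
# Hypothetical per-clip metadata document capturing the discovery facets
# described above (people, objects, keywords, emotions, on-screen text,
# transcript, higher-level discoveries, and a key thumbnail).
clip_metadata = {
    "clip_id": "ep01_clip_042",
    "time_range": {"start": "00:12:04.500", "end": "00:12:19.200"},
    "people": ["Lead Actor"],
    "objects": ["car", "bridge"],
    "keywords": ["chase", "night"],
    "emotions": ["tension"],
    "on_screen_text": ["Downtown, 11 PM"],
    "transcript": "We have to move. Now.",
    "higher_level": ["Action", "Key Moment"],
    "thumbnail": {
        "frame": 18112,
        "description": "Car speeding across a lit bridge at night",
    },
}
```

A document like this, one per clip, is what later gets vectorized for semantic search.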


Deep insights using GenAI


Using GenAI, we can auto-create synopses at the clip, scene, and asset level. This involves chaining discovered content and insights on a per-frame and per-shot basis into clips, summarizing them, and then rolling them up to scenes and finally the whole asset. At each clip or scene level, summarization must be done in the context of the previously summarized clips together with the current clip, as a sliding-window analysis running the full length of the video.
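The sliding-window roll-up can be sketched as follows. This is a minimal illustration, assuming a `summarize()` callable backed by an LLM; here it is stubbed with simple string concatenation so the control flow is visible.

```python
# Sketch of sliding-window roll-up summarization. summarize() stands in
# for an LLM call that condenses current_text in light of prior_context.
def summarize(prior_context: str, current_text: str) -> str:
    # Stub: a real implementation would prompt an LLM here.
    return (prior_context + " | " + current_text).strip(" |")

def rollup_synopsis(clip_texts, window=3):
    """Summarize each clip in the context of the last `window` clip
    summaries, then roll all clip summaries up into one asset synopsis."""
    clip_summaries = []
    for text in clip_texts:
        context = " ".join(clip_summaries[-window:])  # sliding window
        clip_summaries.append(summarize(context, text))
    asset_synopsis = summarize("", " ".join(clip_summaries))
    return clip_summaries, asset_synopsis
```

The same pattern extends one level up, rolling clip summaries into scene summaries before the final asset-level pass.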


Propose scripts for trailers and promos:


  • Based on the summarization, GenAI LLMs are used to draft scripts for trailers and promos
  • This may combine visual and other multimodal analysis within the Machine Wisdom framework to arrive at trailer and promo ideas and scripts

Propose storylines for social media posts:


  • Based on the analysis of the content, several attractive mini storylines are created and reformatted for promotional appeal
  • A suitable tone is applied to the storylines to make them attractive
  • Based on the multimodal analysis, LLMs are used to identify highlight moments in the content
  • Pre-identified key moments, key dialogues, and compilations are used to make predictions of higher accuracy and quality

Vectorize for semantic search


The intelligently captured metadata is vectorized using LLM embeddings and stored in the index of a vector-enabled AI search engine. At each clip level, based on the multimodal identification of content, a document that can be vectorized across several dimensions is created. This per-clip document is then vectorized and stored in a vector search engine that can compute semantic similarity between a query and the stored data.
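The vectorize-then-rank flow can be sketched as below. This is a toy illustration only: `embed()` stands in for a real LLM embedding model, and a plain Python list stands in for a vector search engine's index.

```python
import math

# Minimal sketch: vectorize per-clip documents and rank them against a
# query by cosine similarity. The hash-style embed() is illustrative
# only; a real system would call an embedding model.
def embed(text: str, dims: int = 8) -> list:
    vec = [0.0] * dims
    for i, ch in enumerate(text.lower()):
        vec[i % dims] += ord(ch)
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]  # unit-normalized

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

# Stand-in for the vector index: (document, embedding) pairs.
index = [(doc, embed(doc)) for doc in [
    "romantic beach sunset scene",
    "high-speed car chase at night",
]]

def search(query: str, k: int = 1):
    qv = embed(query)
    ranked = sorted(index, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]
```

With real embeddings, semantically related clips rank highly even when they share no keywords with the query.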


Conversational search with RAG


Conversational search is the most natural use case to enable on top of this rich metadata, and it is powered by RAG frameworks. The search query is first classified and analyzed to determine which part of it is most relevant for search. That part is then vectorized and looked up in the vector index using KNN and similar similarity-search algorithms.


After retrieving the search results, the context of the search, along with any preceding context stored in the thread, is analyzed using LLMs. The results are then re-ranked and reformatted, and a response based on the identified results is created and shown to the user.


While the identified results may only be metadata, the engine translates them into video clips and presents them as a set of clips, along with the key metadata that matched, links to each clip’s catalogue entry, and the actions that can be taken on them.
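The retrieval steps above can be sketched as a small pipeline. Every component here is a stub standing in for the real thing: the query classifier and re-ranker would be LLM calls, and the keyword-overlap lookup stands in for a KNN similarity search over vectors.

```python
# Sketch of the RAG search flow: extract the searchable clause from the
# query, look it up in an index, then assemble a response. All names and
# logic are illustrative assumptions, not a production API.
def extract_search_clause(query: str) -> str:
    # Stand-in for LLM query classification/analysis.
    return query.removeprefix("show me ").strip()

def knn_lookup(clause: str, index: dict, k: int = 2) -> list:
    # Stand-in for KNN over vectors: naive keyword-overlap scoring.
    def score(clip_id):
        return len(set(clause.split()) & set(index[clip_id].split()))
    return sorted(index, key=score, reverse=True)[:k]

def answer(query: str, index: dict) -> dict:
    clause = extract_search_clause(query)
    clip_ids = knn_lookup(clause, index)
    # A real system would LLM-re-rank here and attach catalogue links.
    return {"query": clause, "clips": clip_ids}
```

The returned clip IDs are what the engine would then resolve into playable clips with their matched metadata and catalogue links.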


Conversational actions with RAG


During a conversational search, the user may ask for something that is not readily available through search alone. For example, the user may want the content described in 20 lines, or may want it translated into another language using Generative AI.


The solution will apply RAG to solve this as follows:


  • Classify the request using LLMs
  • Identify the intended action and its parameters
  • Handle the respective use cases appropriately within the system
  • Present the content back to the user
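The four steps above amount to an intent-classification-and-dispatch loop, sketched below. The keyword classifier is a stand-in for an LLM intent classifier, and the handler names are hypothetical.

```python
# Sketch of conversational actions: classify the request, extract its
# parameters, and dispatch to the matching handler. The keyword rules
# stand in for LLM-based classification.
def classify(request: str):
    if "translate" in request:
        lang = request.rsplit("into ", 1)[-1].rstrip(".")
        return "translate", {"target_language": lang}
    if "describe" in request:
        return "describe", {"length_lines": 20}
    return "search", {}

# Stub handlers for each supported action.
HANDLERS = {
    "translate": lambda p: f"Translating content into {p['target_language']}",
    "describe": lambda p: f"Describing content in {p['length_lines']} lines",
    "search": lambda p: "Running semantic search",
}

def handle(request: str) -> str:
    intent, params = classify(request)
    return HANDLERS[intent](params)
```

Requests that classify as neither an action fall back to the regular semantic search path.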

Use cases and benefits


The applications of GenAI-enabled discovery and its downstream use cases can be classified into three key types of benefits, delivered across several key areas.


  • Creative enablement
  • Operational efficiencies
  • Monetization

The key areas where these use cases are most relevant:


Content creation: Content re-use. Search clips and content to find re-usable clips for new content creation.


Examples would be: show all the clips where someone says “Yes” or “I do” at a wedding, or show clips of all skylines and landscapes.


Social media monetization: Create scene lifts and cutdowns, or surface all the romantic clips of a movie. Find clips, pick up key moments, assemble them, and speed up social media distribution. Get recommendations on which clips in the library to distribute on social media.


FAST: Create interesting compilations for monetization. Create automatic highlights or stories of the content.


Content marketing: Discover and get recommended highlights, storylines, promo briefs, and ideas. Search and pick up clips to build social media promotions. Create GenAI-assisted promos and trailers for FAST channels at scale.


Content monetization: GenAI-enabled discoveries of automatic ad slots and contextual ad keywords for the ad slots.


[Image: GenAI discovery-enabled social media monetization of content clips, such as scene lifts]


Conclusion


This approach to content metadata enrichment and discoverability, with its searchability, convenience, speed, and scale, enables faster and better content creation, management, marketing, and monetization. Enterprises can use it to make a significant leap in creative enablement, operational efficiencies, and monetization.

