RTL荷兰文依赖于厚皮类动物的可扩展, Data-Driven Machine Learning Pipeline to Make Broadcast Video Content More Discoverable

RTL荷兰文, 是欧洲最大广播集团的一部分, wanted to use artificial intelligence (AI) to make video content more valuable and discoverable for millions of subscribers. 厚皮类动物提供了数据驱动的机器学习(ML)自动化, scale and reproducibility the team needed to work with massive amounts of unstructured video data.

复杂AI的模块化方法

荷兰RTL电台每天向数百万电视观众广播, along with delivering streaming content that garners hundreds of millions of monthly views online. 它的父, RTL集团, 是欧洲最大的广播公司,也是贝塔斯曼的一部分, 世界上最大的媒体集团之一.

荷兰RTL的一个关键增长指标是收视率, but optimizing the value and discoverability of video assets is an extremely labor-intensive endeavor. 这使得自动化的时机成熟了, and the team applied machine learning to optimize key aspects of its video platform, 比如创建缩略图和预告片, 为那些预告片挑选合适的缩略图, 并在视频流中插入广告内容. The right thumbnail might not seem crucial but it makes all the difference in the world to whether someone clicks or passes a video by forever.

“We want to use artificial intelligence to make sure we optimally apply our human intelligence, 这样必赢网站网址的团队才能更有创造力,更紧密地联系在一起,Vincent Koops说, 荷兰RTL高级数据科学家. “The problem is that it’s not easy to apply AI to managing unstructured content; these content operations are computationally complex, 从数据科学的角度来看,视频人工智能具有挑战性.”

The team solved this problem by breaking complicated tasks into simpler subtasks, eliminating larger task-specific models in favor of an assembly of reusable modules. This modular approach to machine learning allowed the company to train the AI on various elements of the video stream across visual (frame extraction, 镜头分割, 面部识别), 音频(标签, 语音识别, 音乐类型)和文本(语言检测, 关键短语)子任务.

然而,这导致了下一个大挑战:

如何将这些单独的ML管道编织在一起来解决更复杂的任务?

于是研究小组转向厚皮动物.

必赢彩票网

厚皮类动物 provides the data layer that allows machine learning teams to productionize and scale their machine learning lifecycle. 凭借厚皮类动物行业领先的数据版本控制, 管道和血统, 团队获得数据驱动的自动化, pb级可伸缩性和端到端可重复性. 对RTL荷兰文, 厚皮类动物 was the key to combining and orchestrating the various subtasks into a unified way to process videos at scale. 不仅如此,视频处理也是资源密集型的. 厚皮类动物’s incrementality allowed the team to only process new videos as they arrive or change, 而不是从头开始再加工. 这为他们的方法带来了巨大的速度,节省了时间和金钱.

“厚皮类动物’s pipelines are the building blocks that enable us to solve these complex problems,”瘦身达人. “Effectively applying AI to just one of these tasks can save over a thousand hours a year,”他解释说.

RTL荷兰文 also needed to track metadata and content extracted from video streams to assess and improve the effectiveness of its AI models. 厚皮动物的不可改变的血统确保了这种端到端的重现性. 更重要的是, 厚皮类动物允许团队倒回代码的旧版本, 数据或模型, so if a new thumbnail or clip wasn’t performing as well as a previous version they could roll back the change to the more high performance versions.

最后, 视频数据是拍字节级的数据, which made 厚皮类动物’s ability to scale to multiple petabytes crucial to meeting RTL荷兰文’ goals. 厚皮类动物’s inherent parallel processing and code agnosticism allowed the team to handle a huge volume of video without code changes, 而其可伸缩的数据版本化优化了存储和计算成本.

“We’ve tried solutions such as Argo Workflows or DVC that delivered some of the same capabilities, 但是切换不同的工具很麻烦. The benefit of 厚皮类动物 is that all of these features are highly coupled in one coherent platform, 哪个对必赢网站网址很有用.”

文森特瘦身达人
荷兰RTL高级数据科学家

协调ML以解决厚皮动物的复杂挑战

与厚皮类动物, RTL荷兰文 can easily combine various AI models to solve much more complex video challenges, 例如确定AD插入的最佳点, 或者选择一个引人注目的缩略图剪辑.

确定广告插入是一件棘手的事情. 业务规则决定广告频率, but just playing an ad at predetermined intervals will disrupt dialogue or ruin a key scene, 严重降低观看体验. 在理想的情况下, the ad should appear between scene and dialogue transitions – something the team at RTL荷兰文 has trained its AI to recognize.

在这种情况下, 视频存储在Azure的云中, S3或类似服务, 并导入厚皮类动物的视频库中, where pipelines extract the information necessary to detect an ideal ad insertion point, 遵守关于AD频率的业务规则, 内容限制, 和更多的. Boundary detection determines shot transitions with high probability based on frame-to-frame changes in color histograms. 最后, speech detection ensures a break in the dialogue so that speakers aren’t cut off mid-sentence by an unexpected ad. 组合在一起, all these models allow automated ad insertion without degrading the viewing experience.

“We’ve tried solutions such as Argo Workflows or DVC that delivered some of the same capabilities, 但是切换不同的工具很麻烦. The benefit of 厚皮类动物 is that all of these features are highly coupled in one coherent platform, 哪个对必赢网站网址很有用.”

文森特瘦身达人
荷兰RTL高级数据科学家

Koops notes that this approach – creating simple subtasks that are orchestrated through 厚皮类动物 to solve complex challenges – can apply across domains. It allows the company to use or adapt publicly available AI models to its unique needs. “This sort of orchestration works for any task that can be distilled down into smaller components; it’s much more efficient and flexible than creating monolithic models, and with 厚皮类动物 it’s easy to recombine the components to solve other interesting challenges.”

对RTL

RTL Nederland is a 100% subsidiary of RTL集团, Europe’s largest TV, radio and production company. “RTL集团”为75.大型国际传媒集团贝塔斯曼(Bertelsmann)持有1%的股份. Sister companies of RTL集团 are the publishers Penguin Random House and Gruner + Jahr, 音乐公司BMG和客户服务提供商Arvato.