的弗劳恩霍夫-Gesellschaft, 总部位于德国, is the world’s leading applied research organization. With its focus on developing key technologies that are vital for the future and enabling the commercial exploitation of this work by business and industry, 弗劳恩霍夫 plays a central role in the innovation process. 成立于1949年, the 弗劳恩霍夫-Gesellschaft currently operates 75 institutes and research institutions throughout Germany. The majority of the organization’s 29,000 employees are qualified scientists and engineers. The 弗劳恩霍夫 Institute for Intelligent Analysis and Information Systems IAIS is one of the leading scientific institutes in the fields of Artificial Intelligence, Machine Learning and Big Data. Its speech technology team works on advanced speech technology systems and adapts language models to new domains. If your language model needs to understand intricate medical terminology or complex legal jargon then the 弗劳恩霍夫 IAIS scientists can help.
The biggest challenge for the Speech Technologies team was productionizing and scaling of a working prototype. The initial prototype had monolithic scripts that processed their clients’ data linearly. The scripts included lots of discrete steps and that meant when something went wrong it was difficult to debug. Tracking down which step failed and why something took time.
It was also inefficient as they had to pre-allocate resources to the pipeline. The problem was that not all stages of the pipeline needed the same resources. Some steps were memory intensive while others were CPU heavy. Still others were disk intensive. But because the team processed everything with a single big script, they were stuck pre-allocating resources which could have been leveraged for other tasks. They wanted a system that would allow a quick path to productionizing prototypes, easily scalable and can rightsize the containers based on the workloads they were running and quickly release those resources when they finished their task.
Since data was processed linearly, it meant there were scaling limitations, such as having to reprocess large amounts of data that can take weeks to finish. They needed a technology that would parallelize their data processing and scale it to brand new heights and that’s where 厚皮类动物 came into the picture.
Why 弗劳恩霍夫 IAIS Chose 厚皮类动物
厚皮类动物 delivered a simple way to move from prototype into production, while providing mass parallelization to the 弗劳恩霍夫 IAIS Speech Technologies team.
It allocated only the resources each stage in the pipeline actually required. Since 厚皮类动物 uses Kubernetes on the backend, it can quickly scale up one stage in the pipeline, creating new containers with exactly the right resources, and then release them as soon as that stage finishes.
It also parallelized the data processing itself, slicing up the text and audio data into different chunks. That let the scripts work on different parts of the data at the same time, which dramatically sped up processing time from days or weeks, 到几个小时. Now some customers are getting updated models daily, as the system swiftly pulls in new data, trains on it and delivers a newly updated model. The team doesn’t have to think about how to recreate all the steps correctly. They can put data in one end of the pipeline and output a model on the other end. 如果出了什么问题, it helps to debug a pipeline stage in isolation, where inputs and outputs are well defined in 厚皮类动物’s JSON file.
The 厚皮类动物 platform also gave them tremendous flexibility. Too often platforms lock you into the limited set of tools and languages they support. Maybe they only support Python or R, but 厚皮类动物 supports any tool, 或框架, or library you can package up into a container. That let the Speech Technologies group build multistage pipelines in different languages, 一次性使用Bash, Python in another and C/C++ in a different stage, each language doing what it does best.
The 弗劳恩霍夫 IAIS Speech Technologies team likes 厚皮类动物’s data driven approach. For them, it is a paradigm shift in how to automatically process large amounts of data.
厚皮类动物’s data first approach fits the new needs of machine learning engineers everywhere. 最后, it’s 厚皮类动物’s powerful data driven pipelines and mass parallelization that helps advanced research teams adapt to the new realities of data-driven software development fast.请求一个演示