数字推理 is a communication analytics company that uses machine learning and AI to help its customers address some of the world’s toughest challenges. With customers ranging from law enforcement and intelligence agencies to leading healthcare institutes and top financial firms, the company uses data and powerful machine learning models to 发现欺诈, 检测癌症, combat human trafficking. As they set out to explore what a next-generation architecture might look like to address its clients’ complex, 高风险的需要, 数字推理 explored 厚皮类动物 to see how version-controlled data and containerized data pipelines would help them achieve explainable, 可重复的, scalable data science.
To deliver highly effective machine learning solutions to its customers, 数字推理 must process large volumes of disparate, 无组织的, seemingly unrelated information that constantly changes. 吉米·惠特克, Manager of Applied Research, his team use this data to develop complex models that detect key patterns and information in the sprawling array of intangible connections between people, 的地方, 和事件. The team must constantly balance the opportunity cost between agility at scale with the overhead of communications when they collaborate with clients. They need an architecture that can deliver machine learning models that are both explainable and easily reproducible.
During an internal hackathon, 惠特克, a group of 数字推理 developers, an intern set out to build the next-generation architecture for the company’s deep learning workloads. The team sought to accomplish two goals: (1) find a way to make their constantly changing data behave in the same version-controlled manner as code and (2) use the latest scalable infrastructure possible. The 数字推理 team selected Kubernetes to address scalable infrastructure. For its data science platform, it chose 厚皮类动物.
Using 厚皮类动物 and Kubernetes, the team built scalable, 可重复的, explainable data science pipelines in just one day. 厚皮类动物 enables the team to continuously ingest its constantly changing data end-to-end — with complete provenance and without sacrificing agility.
While the team initially set out to build just one pipeline, by the end of the hackathon, they had multiple pipelines set up for different use-cases. For their audio research use case, they built an end-to-end pipeline that would analyze audio files all the way through the transcription process, output the transcripts, then apply some of the natural language processing components that they were working on onto the output of the transcripts — and so on all the way through to the inference testing. Because things were going so smoothly they even expanded into building pipelines to image analysis. And it didn’t end there.
惠特克’s team took the new architecture a step further, integrating Jupyter notebooks into the process so its research engineers and data scientists could easily apply changes to any point of their pipeline and watch the impact in real time as 厚皮类动物 automatically implemented those changes. “厚皮类动物 allows us to look at and component-ize the entire pipeline of analytics and transformations we run. For complex systems, this is incredibly useful to understand the big picture before jumping into the code. We can then easily dive into a specific component to address the needs of a project we are working on.惠特克说:“.
厚皮类动物 helped 数字推理’s data scientists unearth new insights and carry out rapid experimentation without sacrificing speed or functionality. “There’s always a tension between agility, 可解释性, reproducibility, but 厚皮类动物 makes that tension manageable” 增加了惠特克. 厚皮类动物 enables the company’s data scientists to efficiently overcome obstacles, handle data divergence, generate reproducible outputs. With billions of dollars — and even lives — at stake, 数字推理 is always on the lookout for new ways to build accurate models for its clients that inform smart decisions using the best architecture and platforms available.请求一个演示