GraphPipe is a protocol and collection of software designed to simplify machine learning model deployment and decouple it from framework-specific model implementations.
Model-serving network protocols are typically tied to the underlying model implementation. If you have a TensorFlow model, for example, you need to use TensorFlow’s protocol buffer server (tensorflow-serving) to perform remote inference.
PyTorch and Caffe2, on the other hand, do not provide an efficient model server in their codebases, but rely on tools like mxnet-model-server for remote inference. mxnet-model-server is written in Python and provides a JSON API without batch support. While this is good for simple use cases, it is not suitable for back-end infrastructure.
ONNX exists, but it tackles the vendor-coupling problem by standardizing model formats rather than protocol formats. This is useful but challenging, as not all back-end model formats have fully equivalent operations. As a result, a simple conversion doesn’t always work, and sometimes a model rewrite is necessary.
For operators looking to sanely maintain infrastructure, having a standard way for front-end clients to talk to back-end machine-learning models, irrespective of model implementation, is important.
GraphPipe provides:
A minimalist machine learning transport specification based on flatbuffers.
Simple, efficient reference model servers for TensorFlow, Caffe2, and ONNX.
Efficient client implementations in Go, Python, and Java.