TensorFlow Serving

GitHub Support CommunityModel serving and monitoring

TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. TensorFlow Serving makes it easy to deploy new algorithms and experiments, while keeping the same server architecture and APIs. TensorFlow Serving provides out-of-the-box integration with TensorFlow models, but can be easily extended to serve other types of models and data.

Features

Can serve multiple models, or multiple versions of the same model simultaneously
Exposes both gRPC as well as HTTP inference endpoints
Allows deployment of new model versions without changing any client code
Supports canarying new versions and A/B testing experimental models
Adds minimal latency to inference time due to efficient, low-overhead implementation
Features a scheduler that groups individual inference requests into batches for joint execution on GPU, with configurable latency controls
Supports many servables: Tensorflow models, embeddings, vocabularies, feature transformations and even non-Tensorflow-based machine learning models

Official website

Link

Tutorial and documentation

Click here to view

Montreal

1275 Av. des Canadiens-de-Montréal,

Montréal, QC H3B 0G4

Canada

Los Angeles

312 Arizona Ave,

Santa Monica, CA 90401,

USA

Dubai

Gate Avenue Zone D at DIFC – Sheikh Zayed Road

Dubai, United Arab Emirates

Doha

1 Al Corniche St, Burj Doha, level 21,

Doha, Qatar