Hello everyone! I have a TensorFlow Serving related question. I have also posted it on StackOverflow (https://stackoverflow.com/questions/70843006/same-model-served-via-tf-and-tf-serving-produce-different-predictions), but since no one is answering there I thought I would repost it here; maybe some of you have encountered this problem and can give me some advice.
PROBLEM: prediction values differ between a direct TF call and a call to TF-serving (we assume the plain TF prediction is correct and the TF-serving one is incorrect).
SETUP: a Keras model (TensorFlow backend) saved as two files: model.h5 (weights) and architecture.json (model architecture).
GOAL: predict using TF-serving
CURRENT TRIAL:
1. In order to use the model with TF-serving, I save it in SavedModel format:

    model = model_from_json(json_string)
    model.load_weights(model_path)
    export_path = './gModel_Volume/3'
    tf.keras.models.save_model(model, export_path)

We tried several ways of saving it; all of them produce the same SavedModel.
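A sanity check worth running right after the export (a minimal sketch; `architecture_path` and `model_path` are the same paths used in the prediction snippets below, and the random input is just a placeholder): reload the exported SavedModel and compare its output with the original JSON+H5 model on identical data. If the two already disagree here, the problem is in the export step rather than in TF-serving.

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras.models import model_from_json

    # Rebuild the original model from architecture JSON + H5 weights
    original = model_from_json(open(architecture_path).read())
    original.load_weights(model_path)

    # Reload the freshly exported SavedModel as a Keras model
    reloaded = tf.keras.models.load_model('./gModel_Volume/3')

    # Any fixed input with the right shape works for an equality check
    x = np.random.rand(1, 260, 260, 3).astype(np.float32)
    print(original.predict(x))
    print(reloaded.predict(x))
    print(np.allclose(original.predict(x), reloaded.predict(x), atol=1e-5))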
2. Loading the data from the image is the same for both setups:

    X = []
    X.append(preprocess_input(img_to_array(load_img(img_url_str, target_size=(260, 260)))))
    X = np.asarray(X)
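Since the REST path below sends `X.tolist()` as JSON while plain TF gets the numpy array directly, it may also be worth confirming that the array survives that round-trip and has the expected shape, dtype and value range (a small generic check, nothing model-specific):

    import numpy as np

    # Expect (1, 260, 260, 3) and the value range produced by preprocess_input
    print(X.shape, X.dtype, X.min(), X.max())

    # Round-trip through the list representation used in the REST request body
    X_roundtrip = np.asarray(X.tolist(), dtype=X.dtype)
    print(np.array_equal(X, X_roundtrip))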
3a. For prediction with plain TF we use:

    # load:
    json_string = open(architecture_path).read()
    gModel = model_from_json(json_string)
    gModel.load_weights(model_path)
    # predict:
    predicted = gModel.predict_on_batch(X)
    # predicted = 0.2...
3b. For prediction with TF-serving via the REST API we use:

    dataToSend = X.tolist()
    response = requests.post('http://tf-serving:8501/v1/models/gModel_Volume:predict',
                             json={"signature_name": "serving_default", "instances": dataToSend})
    predicted = response.json()["predictions"][0]
    # predicted = -1.17...
3c. For prediction with TF-serving via gRPC we use:

    GRPC_MAX_RECEIVE_MESSAGE_LENGTH = 4096 * 4096 * 3
    channel = grpc.insecure_channel('tf-serving:8500',
                                    options=[('grpc.max_receive_message_length', GRPC_MAX_RECEIVE_MESSAGE_LENGTH)])
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    grpc_request = predict_pb2.PredictRequest()
    grpc_request.model_spec.name = 'gModel_Volume'
    grpc_request.model_spec.signature_name = 'serving_default'
    grpc_request.inputs['input_1'].CopyFrom(tf.make_tensor_proto(X))
    result = stub.Predict(grpc_request, 10)  # 10-second timeout
    predicted = result.outputs['dense_1'].float_val[0]
    # predicted = -1.17...
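One more way to narrow things down (a sketch that only assumes the export directory and the X from above): call the exported SavedModel's `serving_default` signature directly in Python. This is essentially the graph TF-serving executes, so if it already returns -1.17 the export itself is suspect, and if it returns 0.2 the problem is on the serving/request side.

    import tensorflow as tf

    loaded = tf.saved_model.load('./gModel_Volume/3')
    infer = loaded.signatures['serving_default']

    # Shows the expected input/output tensor names of the signature
    print(infer.structured_input_signature)
    print(infer.structured_outputs)

    # Should match gModel.predict_on_batch(X) if the export is faithful
    out = infer(tf.constant(X, dtype=tf.float32))
    print(out)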
As one can see, the results (0.2 and -1.17) are completely different, even though the same SavedModel was used in all cases.
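The serving-side signature can also be inspected through the model metadata endpoint (a sketch; only the model name and host/port from the REST call above are assumed), to confirm that the deployed signature really exposes input_1 and dense_1 the way the gRPC request expects:

    import json
    import requests

    meta = requests.get('http://tf-serving:8501/v1/models/gModel_Volume/metadata').json()
    # The signature_def section lists the input/output tensor names, dtypes and shapes
    print(json.dumps(meta, indent=2))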
Note: TF-serving is run under docker-compose and the service “tf-serving” is using the official tensorflow/serving image
I would appreciate any hints and testing suggestions.
If there are better solutions than TF-serving for running multiple TF models in production (excluding cloud-based solutions like SageMaker), please share them with me.
Notes:
- Batching is disabled by the default settings, but I also disabled it explicitly
- There are 4 different models that produce different types of output (single value, set of values, bitmap data). ALL 4 give different results when served by TF-serving vs. plain TF. The dimensions of the output tensors are fine; only the values they contain are "wrong".