I am working on a federated-learning project using TensorFlow Federated (TFF). While sending a request to my server to check that the code works, I got this error:
RuntimeError: No default context installed.
You should not expect to get this error using the TFF API.
However, I only encounter it under some specific conditions.
The scenario is as follows (all the code is below):
An HTTP request is made from the website and handled by the function upload_and_train in routes/developers.py. That function calls start_processing, which runs the training preprocessing (gathering the training data, initializing hyperparameters, etc.). Finally, federated_computation_new is called, which starts the federated learning. This is where it crashes, at the call to iterative_process.initialize():
iterative_process = tff.learning.build_federated_averaging_process(model_fn, client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.5))
state = iterative_process.initialize()
The confusing part is this: if I run the code locally, everything works and training completes without errors. If I run it on the server, it also works for the first request. Every subsequent request then crashes with the same error (full details below) until I restart the server. After a restart it again works for the first request and crashes on all later ones.
This issue is driving me nuts and I can't figure it out. My only remaining idea is that something persists after the first call (a process that is not closed, or something like that), so subsequent calls do not get a "fresh" start. Even so, that should not happen in the first place.
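To make that suspicion concrete, here is a minimal sketch in plain Python (no TFF involved) that reproduces the same "first request works, later ones fail" pattern. It rests on assumptions I have not confirmed in the TFF source: that the default context is stored per thread, that it is installed as a side effect of the first import, and that the server handles every request in a new thread. All names in it (`_stack`, `lazy_import`, `handle_request`, `serve`) are made up for illustration.

```python
import threading

# Hypothetical stand-in for a per-thread "current context" slot.
_stack = threading.local()
_imported = False               # simulates Python's module cache
_import_lock = threading.Lock()


def lazy_import():
    """Simulate an import inside the handler: side effects run only once."""
    global _imported
    with _import_lock:
        if not _imported:
            # Installed only in the thread that performs the first import.
            _stack.context = "default"
            _imported = True


def handle_request(install_first=False):
    """Simulate one request handler running in its own worker thread."""
    lazy_import()
    if install_first:
        # Possible workaround under the same assumption:
        # install a context explicitly at the start of every request.
        _stack.context = "default"
    if getattr(_stack, "context", None) is None:
        return "RuntimeError: no context"
    return "ok"


def serve(n_requests, install_first=False):
    """Run each simulated request in a fresh thread and collect the outcomes."""
    results = []
    for _ in range(n_requests):
        worker = threading.Thread(
            target=lambda: results.append(handle_request(install_first)))
        worker.start()
        worker.join()
    return results


without_fix = serve(3)
with_fix = serve(3, install_first=True)
print(without_fix)  # ['ok', 'RuntimeError: no context', 'RuntimeError: no context']
print(with_fix)     # ['ok', 'ok', 'ok']
```

If the real cause is similar, it would also explain why a local run never fails: there, everything happens in a single thread. The error message itself hints at explicitly calling `set_local_execution_context()`, which would correspond to the `install_first=True` branch of the sketch.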
Full error message below:
143.205.173.225 - - [12/Oct/2021 13:18:05] "POST /api/Developers/use_cases/text_processing/developer_id/3/upload_and_train HTTP/1.1" 500 -
WARNING:tensorflow:Layer lstm will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
ERROR:main:Exception on /api/Developers/use_cases/text_processing/developer_id/4/upload_and_train [POST]
Traceback (most recent call last):
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/connexion/decorators/decorator.py", line 48, in wrapper
    response = function(request)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/connexion/decorators/uri_parsing.py", line 144, in wrapper
    response = function(request)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/connexion/decorators/validation.py", line 384, in wrapper
    return function(request)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/connexion/decorators/parameter.py", line 121, in wrapper
    return function(**kwargs)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/routes/developers.py", line 46, in upload_and_train
    last_train_metrics = main_proc.start_processing(use_case,developer_id)
  File "processing/text_processing/main_proc.py", line 17, in start_processing
    state,metrics = federated_computation_new(train_dataset,test_dataset)
  File "processing/text_processing/federated_algorithm.py", line 29, in federated_computation_new
    state = iterative_process.initialize()
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/tensorflow_federated/python/core/impl/utils/function_utils.py", line 521, in __call__
    return context.invoke(self, arg)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/tensorflow_federated/python/core/impl/context_stack/runtime_error_context.py", line 41, in invoke
    self._raise_runtime_error()
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/tensorflow_federated/python/core/impl/context_stack/runtime_error_context.py", line 23, in _raise_runtime_error
    raise RuntimeError(
RuntimeError: No default context installed.
You should not expect to get this error using the TFF API.
If you are getting this error when testing a module inside of `tensorflow_federated/python/core/...`, you may need to explicitly invoke `execution_contexts.set_local_execution_context()` in the `main` function of your test.
The first function handles the incoming request. The request carries four parameters: two identifiers, "use_case" and "developer_id", and two formData files containing the training data, which are stored locally.
def upload_and_train(use_case: str, developer_id: int):
    use_case_path = 'processing/' + use_case + '/'
    sys.path.append(use_case_path)
    import main_proc
    app_path = dirname(dirname(abspath(__file__)))
    file_dict = request.files
    db_File_True = file_dict["dataset_file1"]
    db_File_Fake = file_dict["dataset_file2"]
    true_csv_path = os.path.join(app_path + "/" + use_case_path + "db/", "True.csv")
    fake_csv_path = os.path.join(app_path + "/" + use_case_path + "db/", "Fake.csv")
    db_File_True.save(true_csv_path)
    db_File_Fake.save(fake_csv_path)
    time.sleep(5)  # wait for the files to be copied before proceeding
    # THEN start processing
    last_train_metrics = main_proc.start_processing(use_case, developer_id)  # <============== GOES INTO HERE & CRASHES
    metricsJson = trainMetricsToJSON(last_train_metrics)
    return Response(status=200, response=metricsJson)
The function which starts the preprocessing:
def start_processing(use_case, developer_id: int = 0):
    globals.initialize(use_case, developer_id)
    globals.TRAINER_ID = developer_id
    train_dataset, test_dataset = get_preprocessed_train_test_data()
    state, metrics = federated_computation_new(train_dataset, test_dataset)  # <============== GOES INTO HERE & CRASHES
    trained_metrics = metrics['train']
    timestamp = int(time.time())
    globals.DATASET_ID = timestamp
    written_row = save_to_file_CSV(use_case, globals.TRAINER_ID, timestamp, globals.DATASET_ID, trained_metrics['sparse_categorical_accuracy'], trained_metrics['loss'])
    return written_row
The function where the federated training is being done:
def federated_computation_new(train_dataset, test_dataset):
    # Training and evaluating the model
    iterative_process = tff.learning.build_federated_averaging_process(model_fn, client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.5))
    state = iterative_process.initialize()  # <============== CRASHES HERE
    print(type(state))
    for n in range(globals.EPOCHS):
        state, metrics = iterative_process.next(state, train_dataset)
        print('round {}, training metrics={}'.format(n + 1, metrics))
    evaluation = tff.learning.build_federated_evaluation(model_fn)
    eval_metrics = evaluation(state.model, train_dataset)
    print('Training evaluation metrics={}'.format(eval_metrics))
    test_metrics = evaluation(state.model, test_dataset)
    print('Test evaluation metrics={}'.format(test_metrics))
    #############################################################################################
    # Save Last Trained Model
    import pickle
    with open("processing/" + globals.USE_CASE + "/last_model", 'wb') as f:
        pickle.dump(state, f)
    return state, metrics

def model_fn():
    keras_model = get_simple_LSTM_model()
    return tff.learning.from_keras_model(
        keras_model,
        input_spec=globals.INPUT_SPEC,
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
The TFF function where the error is raised (/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/tensorflow_federated/python/core/impl/utils/function_utils.py, line 521):
def __call__(self, *args, **kwargs):
    context = self._context_stack.current
    arg = pack_args(self._type_signature.parameter, args, kwargs, context)
    return context.invoke(self, arg)  # <============== This raises the RuntimeError
Thank you very much in advance for your time and patience.