
Currently I am working on a federated-learning project using TensorFlow Federated. I was making a request to the server to check whether my code was working when I got this error:

    RuntimeError: No default context installed.
    
    You should not expect to get this error using the TFF API.

However, I only encounter it under some specific conditions.

The scenario goes like this (all the code is below):

An HTTP request is made from the website. The function upload_and_train in routes/developers.py handles the request. Inside it, the start_processing function is called, which starts the training preprocessing (gathering training data, initializing hyperparameters, etc.). Finally, the federated_computation_new function is called (which is also where it crashes) to start the federated learning. It crashes when it reaches the call iterative_process.initialize():

    iterative_process = tff.learning.build_federated_averaging_process(
        model_fn, client_optimizer_fn=lambda: tf.keras.optimizers.SGD(lr=0.5))
    state = iterative_process.initialize()

The confusing part is the following. If I run the code locally, everything goes well; the training process works with no errors. If I run it on the server, it also works for the first request. Afterwards it crashes with the same error (stated in more detail below) on all following requests until I restart the server. Then it again works perfectly for the first call and proceeds to crash on subsequent calls.

This issue is driving me nuts; I can't figure it out. My only remaining idea is that something happens after the first call (a process is not closed, or something similar) so that subsequent calls don't get a "fresh" start, although that shouldn't happen in the first place.
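One way to probe that hypothesis (a hypothetical diagnostic helper, not part of the original code) is to log which process and thread serves each request. If the server forks or hands later requests to a different worker process, per-process state such as TFF's context stack set up during the first request will not exist in the process handling the next one:

```python
import os
import threading


def log_request_context(tag):
    # Hypothetical diagnostic: record which process/thread serves a request.
    # If the PID differs between the first (working) request and the later
    # (crashing) ones, per-process state was installed in a different
    # process than the one handling the failing request.
    pid = os.getpid()
    tid = threading.get_ident()
    print("{}: pid={} thread={}".format(tag, pid, tid))
    return pid, tid
```

Calling this at the top of upload_and_train on each request would show whether the crashing requests run in the same process as the successful first one.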

Full error message below:

    143.205.173.225 - - [12/Oct/2021 13:18:05] "POST /api/Developers/use_cases/text_processing/developer_id/3/upload_and_train HTTP/1.1" 500 -
    INFO:werkzeug:143.205.173.225 - - [12/Oct/2021 13:18:05] "POST /api/Developers/use_cases/text_processing/developer_id/3/upload_and_train HTTP/1.1" 500 -
WARNING:tensorflow:Layer lstm will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU
ERROR:main:Exception on /api/Developers/use_cases/text_processing/developer_id/4/upload_and_train [POST]
Traceback (most recent call last):
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/connexion/decorators/decorator.py", line 48, in wrapper
    response = function(request)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/connexion/decorators/uri_parsing.py", line 144, in wrapper
    response = function(request)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/connexion/decorators/validation.py", line 384, in wrapper
    return function(request)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/connexion/decorators/parameter.py", line 121, in wrapper
    return function(**kwargs)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/routes/developers.py", line 46, in upload_and_train
    last_train_metrics = main_proc.start_processing(use_case,developer_id)
  File "processing/text_processing/main_proc.py", line 17, in start_processing
    state,metrics = federated_computation_new(train_dataset,test_dataset)
  File "processing/text_processing/federated_algorithm.py", line 29, in federated_computation_new
    state = iterative_process.initialize()
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/tensorflow_federated/python/core/impl/utils/function_utils.py", line 521, in __call__
    return context.invoke(self, arg)
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/tensorflow_federated/python/core/impl/context_stack/runtime_error_context.py", line 41, in invoke
    self._raise_runtime_error()
  File "/home/itec/bogdan/Articonf/smart/tools/federated-training/app/venv/lib/python3.8/site-packages/tensorflow_federated/python/core/impl/context_stack/runtime_error_context.py", line 23, in _raise_runtime_error
    raise RuntimeError(
RuntimeError: No default context installed.

You should not expect to get this error using the TFF API.

If you are getting this error when testing a module inside of `tensorflow_federated/python/core/...`, you may need to explicitly invoke `execution_contexts.set_local_execution_context()` in the `main` function of your test.

First, the function which handles the incoming requests. The request contains four parameters: two identifiers, "use_case" and "developer_id", and two formData files which contain the training data, which is stored locally.

    import os
    import sys
    import time
    from os.path import dirname, abspath

    from flask import request, Response

    def upload_and_train(use_case: str, developer_id: int):
        use_case_path = 'processing/'+use_case+'/'
        sys.path.append(use_case_path)
        import main_proc

        app_path = dirname(dirname(abspath(__file__)))
        file_dict = request.files
        db_File_True = file_dict["dataset_file1"]
        db_File_Fake = file_dict["dataset_file2"]
        true_csv_path = os.path.join(app_path+"/"+use_case_path+"db/", "True.csv")
        fake_csv_path = os.path.join(app_path+"/"+use_case_path+"db/", "Fake.csv")
        db_File_True.save(true_csv_path)
        db_File_Fake.save(fake_csv_path)
        time.sleep(5)  # wait for the files to be copied before proceeding
        # THEN start processing
        last_train_metrics = main_proc.start_processing(use_case, developer_id)  # <== GOES INTO HERE & CRASHES
        metricsJson = trainMetricsToJSON(last_train_metrics)

        return Response(status=200, response=metricsJson)
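As an aside, sys.path.append runs on every request, so the path list grows with duplicates, and a plain import main_proc is cached after the first call even if use_case differs. A guarded variant avoids both; this is an illustrative sketch using importlib, with names chosen here rather than taken from the project:

```python
import importlib
import sys


def import_use_case_module(use_case_path, module_name):
    # Illustrative sketch: append the path only once, and force a fresh
    # lookup so a different use_case is not silently served the module
    # cached by a previous request.
    if use_case_path not in sys.path:
        sys.path.append(use_case_path)
    if module_name in sys.modules:
        return importlib.reload(sys.modules[module_name])
    return importlib.import_module(module_name)
```

Whether reloading per request is desirable depends on the project; the point is that the import state is made explicit instead of relying on side effects of repeated appends.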

The function which starts the preprocessing:

    def start_processing(use_case, developer_id: int = 0):
        globals.initialize(use_case, developer_id)
        globals.TRAINER_ID = developer_id

        train_dataset, test_dataset = get_preprocessed_train_test_data()

        state, metrics = federated_computation_new(train_dataset, test_dataset)  # <== GOES INTO HERE & CRASHES
        trained_metrics = metrics['train']

        timestamp = int(time.time())
        globals.DATASET_ID = timestamp

        written_row = save_to_file_CSV(use_case, globals.TRAINER_ID, timestamp, globals.DATASET_ID,
                                       trained_metrics['sparse_categorical_accuracy'], trained_metrics['loss'])
        return written_row

The function where the federated training is being done:

    def federated_computation_new(train_dataset, test_dataset):
        # Training and evaluating the model
        iterative_process = tff.learning.build_federated_averaging_process(
            model_fn, client_optimizer_fn=lambda: tf.keras.optimizers.SGD(lr=0.5))
        state = iterative_process.initialize()  # <== CRASHES HERE

        print(type(state))

        for n in range(globals.EPOCHS):
            state, metrics = iterative_process.next(state, train_dataset)
            print('round  {}, training metrics={}'.format(n+1, metrics))

        evaluation = tff.learning.build_federated_evaluation(model_fn)
        eval_metrics = evaluation(state.model, train_dataset)
        print('Training evaluation metrics={}'.format(eval_metrics))

        test_metrics = evaluation(state.model, test_dataset)
        print('Test evaluation metrics={}'.format(test_metrics))

        # Save the last trained model
        import pickle
        with open("processing/"+globals.USE_CASE+"/last_model", 'wb') as f:
            pickle.dump(state, f)
        return state, metrics

    def model_fn():
        keras_model = get_simple_LSTM_model()
        return tff.learning.from_keras_model(
            keras_model,
            input_spec=globals.INPUT_SPEC,
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

The library function where the error is raised (tensorflow_federated/python/core/impl/utils/function_utils.py, line 521):

    def __call__(self, *args, **kwargs):
        context = self._context_stack.current
        arg = pack_args(self._type_signature.parameter, args, kwargs, context)
        return context.invoke(self, arg)  # <== This raises the RuntimeError

Thank you very much in advance for your time and patience.

1 Answer

I think we can point to the mechanism which should be preventing this and give a workaround, but as for diagnosing the root cause, currently I have only guesses.

When you run import tensorflow_federated as tff, a line in TFF's package initialization should execute, installing the execution context at the base of a global context stack which TFF uses to manage the meaning of __call__. It is this context stack that the implementation of __call__ in function_utils.py delegates to.

Before that line executes, a 'default' RuntimeErrorContext is installed at the base of the stack, which simply throws when anyone tries to invoke anything against it. (For that matter, ingesting something into this context raises as well, but you are failing on invoking a no-arg computation, so there is no need to ingest the argument.)

So one possibility here is that this code is simply not running the __init__.py file TFF uses to install the context. It's not obvious to me from the code snippets how that could happen, but I suppose it might be possible.
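To make the mechanism concrete, here is a minimal pure-Python sketch of the pattern described above. This is not TFF's actual code, only a model of it: a global stack whose base context raises on any invocation, with the real context swapped in by package initialization:

```python
class RuntimeErrorContext:
    # Models TFF's placeholder context: any invocation raises, mirroring
    # the "No default context installed" error.
    def invoke(self, comp, arg):
        raise RuntimeError("No default context installed.")


class LocalExecutionContext:
    # Stands in for the real execution context normally installed on import.
    def invoke(self, comp, arg):
        return "executed {!r}".format(comp)


class ContextStack:
    def __init__(self):
        # The error-raising context sits at the base until replaced.
        self._stack = [RuntimeErrorContext()]

    @property
    def current(self):
        return self._stack[-1]

    def set_default_context(self, ctx):
        # Roughly what importing the package is supposed to do.
        self._stack[0] = ctx


context_stack = ContextStack()


def call_computation(comp, arg=None):
    # Mirrors __call__ in function_utils.py: delegate to the current context.
    return context_stack.current.invoke(comp, arg)
```

If set_default_context never runs (or runs in a different process than the one serving the request), every call_computation hits the base RuntimeErrorContext, which matches the traceback above.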

There is a reasonable workaround we can give you while we try to diagnose this issue a little further. If, inside your federated_computation_new function, you call tff.backends.native.set_local_python_execution_context() (or set_local_execution_context, depending on your TFF version), this error should resolve itself.


1 Comment

Thank you for the reply. In my case it was tff.backends.native.set_local_python_execution_context. However, it still returns the same error. I called it before state = iterative_process.initialize() and confirmed that it executes, but it doesn't solve anything; the error pattern is the same.
