
I am trying to deploy an NLP ML model for the first time. It was suggested that I use FastAPI and uvicorn for this. I have had some success in getting FastAPI to respond; however, I have not been able to pass the DataFrame and have it processed. I've tried using dictionaries and have even tried converting the passed JSON to a DataFrame.

With data_dict = data.dict(), the fit_transform() step fails with: ValueError: Iterable over raw text documents expected, string object received.

With data_dict = pd.DataFrame(data.dict()), I get: ValueError: If using all scalar values, you must pass an index

I believe I understand the problem: my Data class is expecting a string, which this is not. However, I have not been able to work out how to set and/or pass the expected data so that fit_transform() will work. Ultimately I want a prediction returned based on the submitted messages value. Bonus if I can pass a DataFrame of one or more rows and have a prediction made and returned for each row. The response will include the id, project, and prediction so that in the future we can use it to post the prediction back to the original (requesting) system.
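Both errors can be reproduced outside FastAPI. A minimal sketch, assuming only pandas and scikit-learn are installed, where the dict record stands in for what data.dict() returns for a single request:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Stand-in for data.dict() from one request: every value is a scalar string.
record = {"project": "project1", "messages": "This is message 1"}

# 1) A dict of scalars has no index, so pandas refuses to build a frame from it.
try:
    pd.DataFrame(record)
except ValueError as exc:
    print(exc)  # If using all scalar values, you must pass an index

# Wrapping the dict in a list gives a one-row DataFrame instead.
df = pd.DataFrame([record])

# 2) TfidfVectorizer expects an iterable of documents, and a bare str is rejected.
vect = TfidfVectorizer()
try:
    vect.fit_transform(record["messages"])
except ValueError as exc:
    print(exc)  # Iterable over raw text documents expected, string object received.

# A list (or a DataFrame column) holding the string is accepted.
X = vect.fit_transform([record["messages"]])
```

In both cases the fix is the same idea: give the library a sequence of rows/documents, even when there is only one.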

test_connection.py

#%%
import requests
import pandas as pd
import json
import os
from pprint import pprint

url = 'http://127.0.0.1:8000/predict'
print(os.getcwd())
#%%
df = pd.DataFrame(
    {
        'id': ['ab410483801c38', 'cd34148639180'],
        'project': ['project1', 'project2'], 
        'messages': ['This is message 1', 'This is message 2']
    }
)
to_predict_dict = df.iloc[0].to_dict()
#%%
r = requests.post(url, json=to_predict_dict)

main.py

#!/usr/bin/env python
# coding: utf-8

import pickle
import pandas as pd
import numpy as np
from pydantic import BaseModel
from sklearn.feature_extraction.text import TfidfVectorizer

# Server
import uvicorn
from fastapi import FastAPI
# Model
import xgboost as xgb


app = FastAPI()

clf = pickle.load(open('data/xgbmodel.pickle', 'rb'))

class Data(BaseModel):
    # id: str
    project: str
    messages: str

@app.get("/ping")
async def test():
    return {"ping": "pong"}

@app.post("/predict")
async def predict(data: Data):
#    data_dict = data.dict()
    data_dict = pd.DataFrame(data.dict())
    tfidf_vect = TfidfVectorizer(stop_words="english", analyzer='word', token_pattern=r'\w{1,}')
    tfidf_vect.fit_transform(data_dict['messages'])
#   to_predict = tfidf_vect.transform(data_dict['messages'])
#   prediction = clf.predict(to_predict)

    return {"response": "Success"}
  • Can't you do it without a DataFrame in main.py, e.g. fit_transform(data.messages)? Commented Jul 31, 2020 at 7:49
  • No, that's when I get the "string object received" ValueError. I apologize that this wasn't clear in my post, but those errors actually occur at the fit_transform() step. Commented Jul 31, 2020 at 11:54
  • I'll add that I haven't tried dot notation, only brackets. Not sure there's a difference, but I will give it a try. Commented Jul 31, 2020 at 11:57
  • Skipping data_dict = data.dict() entirely and simply using data.messages did not work. The issue is my Data class, where I have defined the data features as str, while fit_transform() expects an iterable of raw text documents. Commented Jul 31, 2020 at 13:26
  • My mistake: the name messages was misleading, and I thought it held a list of messages. For a single message (a single string) I would use the name message, without the s. Commented Jul 31, 2020 at 13:34

4 Answers

Probably not the most elegant solution, but I've made progress using the following:

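def predict(data: Data):
    # Wrap each scalar field in a list so pandas builds a one-row DataFrame.
    # Note: data.id requires the "id: str" field in the Data class to be uncommented.
    data_dict = pd.DataFrame(
        {
            'id': [data.id],
            'project': [data.project],
            'messages': [data.messages]
        }
    )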
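The same one-row DataFrame can be built without naming every field, which matters once the model has many columns. A minimal sketch, assuming pydantic (v1 or v2) and pandas; to_frame is a hypothetical helper name:

```python
from typing import Optional

import pandas as pd
from pydantic import BaseModel


class Data(BaseModel):
    id: Optional[str] = None
    project: str
    messages: str


def to_frame(data: Data) -> pd.DataFrame:
    # pydantic v2 provides model_dump(); fall back to dict() on pydantic v1.
    dump = getattr(data, "model_dump", None) or data.dict
    # A list holding one dict becomes one row, and every field becomes a column,
    # so nothing has to be listed by hand even with dozens of fields.
    return pd.DataFrame([dump()])


row = to_frame(Data(id="ab410483801c38", project="project1", messages="This is message 1"))
print(row.columns.tolist())  # ['id', 'project', 'messages']
```

Column order follows the field order of the model, because pandas keeps the key order of the dict.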

3 Comments

Uncommenting the remaining code (tfidf_vect, to_predict, prediction) and attempting to return {"Prediction": prediction} results in a dump of data ending in "in input data" and the error JSONDecodeError: Expecting value: line 1 column 1 (char 0).
Wouldn't this solution be difficult to implement if I have many (e.g. 40+) columns?
@KennethLeung Great question, but I think this answer extends to that case using a dict comprehension, e.g.: data_dict = {c: df[c] for c in df.columns}
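One plausible cause of the JSONDecodeError above (an assumption, not confirmed in the thread): clf.predict returns a NumPy array, which FastAPI's JSON encoder does not know how to serialize, so the server responds with a 500 error instead of JSON, and a client calling r.json() on that response fails to parse it. The standard json module shows the same limitation; to_jsonable is a hypothetical helper name:

```python
import json

import numpy as np


def to_jsonable(prediction: np.ndarray) -> list:
    # ndarray.tolist() converts NumPy scalars (e.g. numpy.int64) to native Python ints/floats.
    return prediction.tolist()


# Stand-in for the output of clf.predict(...).
prediction = np.array([1, 0], dtype=np.int64)

try:
    json.dumps({"prediction": prediction[0]})
except TypeError as exc:
    print(exc)  # Object of type int64 is not JSON serializable

print(json.dumps({"prediction": to_jsonable(prediction)}))  # {"prediction": [1, 0]}
```

For a single prediction, prediction[0].item() does the same conversion for one element.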

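First, encode your DataFrame df as record-oriented JSON. df.to_json() already returns a JSON string, so send it as the raw request body rather than through json=, which would encode it a second time:

r = requests.post(url, data=df.to_json(orient='records'), headers={'Content-Type': 'application/json'})

Then declare the endpoint parameter as a list of records (e.g. data: List[Data], with from typing import List) and decode the data inside the /predict/ endpoint with:

df = pd.DataFrame(jsonable_encoder(data))

Remember to import it with from fastapi.encoders import jsonable_encoder.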

2 Comments

As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.
for me df.to_json() did the trick!

A newer library called pandera now supports passing DataFrames directly through FastAPI without manual conversion. The docs are a bit basic as of this posting, but may be worth reading: https://pandera.readthedocs.io/en/latest/fastapi.html#fastapi-integration.

Comments


I was able to resolve the issue by simply wrapping data.messages in a list. I also had to make an unrelated change: I had failed to pickle my fitted vectorizer (string tokenizer), so the server now loads it and only calls transform().

import pickle
import time
from typing import Optional

from pydantic import BaseModel

# Server / endpoint
import uvicorn
from fastapi import FastAPI
# Model: not used directly, but xgboost must be installed to unpickle the model
import xgboost as xgb


app = FastAPI(debug=True)

# Load the model and the fitted vectorizer once, at startup.
with open('data/xgbmodel.pickle', 'rb') as f:
    clf = pickle.load(f)
with open('data/tfidfvect.pickle', 'rb') as f:
    vect = pickle.load(f)

class Data(BaseModel):
    id: Optional[str] = None
    project: str
    messages: str

@app.get("/ping")
async def ping():
    return {"ping": "pong"}

@app.post("/predict/")
def predict(data: Data):
    start = time.perf_counter()
    data_l = [data.messages]  # make messages iterable: the vectorizer expects a list of documents
    to_predict = vect.transform(data_l)  # transform only; refitting would change the vocabulary
    prediction = clf.predict(to_predict)

    exec_time = round(time.perf_counter() - start, 3)
    return {
        "id": data.id,
        "project": data.project,
        "prediction": prediction[0].item(),  # NumPy scalar -> native Python value for JSON
        "execution_time": exec_time,
    }

if __name__ == "__main__":
    uvicorn.run(app, host="127.0.0.1", port=8000)
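The server above can call transform() only because the vectorizer was fitted and pickled at training time, alongside the model. A minimal sketch of that training side, assuming the file names used above and the vectorizer settings from the question; LogisticRegression stands in for the XGBoost classifier, and train_and_save/load_and_predict are hypothetical helper names:

```python
import os
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def train_and_save(texts, labels, out_dir="data"):
    # Fit the vectorizer once, on the training corpus.
    vect = TfidfVectorizer(stop_words="english", analyzer="word", token_pattern=r"\w{1,}")
    X = vect.fit_transform(texts)
    # Stand-in classifier; the original answer pickles an XGBoost model instead.
    clf = LogisticRegression().fit(X, labels)
    os.makedirs(out_dir, exist_ok=True)
    # Persist both artifacts: the server must reuse this exact fitted vectorizer.
    with open(os.path.join(out_dir, "tfidfvect.pickle"), "wb") as f:
        pickle.dump(vect, f)
    with open(os.path.join(out_dir, "xgbmodel.pickle"), "wb") as f:
        pickle.dump(clf, f)


def load_and_predict(message, out_dir="data"):
    with open(os.path.join(out_dir, "tfidfvect.pickle"), "rb") as f:
        vect = pickle.load(f)
    with open(os.path.join(out_dir, "xgbmodel.pickle"), "rb") as f:
        clf = pickle.load(f)
    # transform, not fit_transform: keeps the training vocabulary and feature order.
    return clf.predict(vect.transform([message])).tolist()[0]
```

Fitting a fresh TfidfVectorizer per request, as in the question's main.py, would produce a different feature space from the one the model was trained on, which is why the pickled vectorizer is needed.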

Comments
