I'm using a generative AI API to return text responses as JSON strings, which I intend to feed into an application in real time. The problem is that the JSON returned by the API often includes small errors, most commonly with double quotes. These syntax errors raise exceptions in my Python code when I parse the string as JSON.

For instance, I have the following JSON string:
'{"test":"this is "test" of "a" test"","result":"your result is "out" in our website"}'

As you can see, the value for "test" contains multiple unescaped double quotes, so parsing it as JSON raises an error. What I want to do is use a regex to convert the inner double quotes to single quotes, so that the result looks as follows:
'{"test":"this is 'test' of 'a' test'", "result": "your result is 'out' in our website"}'

The best I can do is as follows:

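import re

text = '{"test":"this is "test" of "a" test"","result":"your result is "out" in our website"}'

def repl_call(m):
    # Keep the prefix, then swap double quotes inside the value for single quotes
    preq = m.group(1)
    qbody = m.group(2)
    qbody = re.sub(r'"', "'", qbody)
    return preq + '"' + qbody + '"'

print(re.sub(r'([:\[,{]\s*)"(.*?)"(?=\s*[:,\]}])', repl_call, text))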

The code above returns the intended result. However, if the value contains a comma right after an inner quote, such as
{"test":"this is "test" of "a", test"","result":"your result is "out" in our website"}

...the code breaks and returns the following:
'{"test":"this is 'test' of 'a", test"","result":"your result is 'out' in our website"}'

:(

So far I have tried to improve my AI prompt (prompt engineering) so that the model avoids inner double quotes and returns only a valid JSON string. This works to some degree, but I still encounter enough syntax errors that I have to retry the same prompt multiple times, which incurs unnecessary delays and costs.

My question is: is there a common function or regex pattern I can apply in Python to fix my JSON string so that syntax errors, specifically misplaced double quotes, are cleaned up?

I'm open to a variety of suggestions, including Python packages that can deal with JSON string cleansing, or any advice on GenAI tools that enforce JSON output. I presently use Gemini, which I like a lot, but it doesn't seem to allow JSON enforcement as explicitly as OpenAI's API does.

  • About "I presently use Gemini, which I like a lot. But doesn't allow JSON enforcement like OpenAI's API allows more explicitly.": is this report useful? medium.com/google-cloud/… Commented Aug 15, 2024 at 5:18
  • Please edit your question and include a minimal reproducible example. Commented Aug 15, 2024 at 6:35
  • Using regex you could match the value up to the next key or end of the string. Capture the value and use a function to convert double to single quotes, see this Python demo (regex101) Commented Aug 15, 2024 at 9:29
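
The idea in the last comment (match each value up to the next key or the end of the string, then convert the inner quotes) could be sketched roughly as below. This is not the linked regex101 demo, only one possible reading of it. The names `VALUE_RE`, `fix_inner_quotes` and `loads_lenient` are made up for illustration, and the pattern assumes a flat object whose keys are plain words and whose values are all single-line strings. It will still fail if a value itself contains something that looks like `","key":`.

```python
import json
import re

# Assumption: flat object, word-like keys, single-line string values only.
VALUE_RE = re.compile(
    r'("\w+"\s*:\s*")'                     # group 1: key, colon, opening quote
    r'(.*?)'                               # group 2: raw value body (lazy)
    r'"(?=\s*(?:,\s*"\w+"\s*:|\}\s*$))'    # closing quote, then next key or final }
)

def fix_inner_quotes(text):
    """Replace double quotes inside each value with single quotes."""
    return VALUE_RE.sub(
        lambda m: m.group(1) + m.group(2).replace('"', "'") + '"',
        text,
    )

def loads_lenient(text):
    """Try strict parsing first; repair only if that fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return json.loads(fix_inner_quotes(text))

broken = '{"test":"this is "test" of "a", test"","result":"your result is "out" in our website"}'
print(loads_lenient(broken))
```

Parsing strictly first matters: running the repair on JSON that is already valid would turn correctly escaped `\"` sequences into `\'`, which is not a legal JSON escape.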

1 Answer

If you are requesting JSON back, you should set response_mime_type to "application/json" in the generation config. The model then returns JSON directly, and you should no longer run into these parsing issues.

from dotenv import load_dotenv
import google.generativeai as genai
import os

load_dotenv()
genai.configure(api_key=os.environ['API_KEY'])
MODEL_NAME_LATEST = os.environ['MODEL_NAME_LATEST']

model = genai.GenerativeModel(
    model_name=MODEL_NAME_LATEST,
    # Set the `response_mime_type` to output JSON
    generation_config={"response_mime_type": "application/json"})

prompt = """
  List 5 popular cookie recipes.
  Using this JSON schema:
    Recipe = {"recipe_name": str}
  Return a `list[Recipe]`
  """

response = model.generate_content(prompt)
print(response.text)

Just make sure that the JSON object you describe in the prompt is itself correct JSON, with all the commas where they should be, or the model may build its output incorrectly.

response schema

Another option would be to use response schema.

from dotenv import load_dotenv
import google.generativeai as genai
import os
import typing_extensions as typing

load_dotenv()
genai.configure(api_key=os.environ['API_KEY'])
MODEL_NAME_LATEST = os.environ['MODEL_NAME_LATEST']


class Recipe(typing.TypedDict):
    recipe_name: str


model = genai.GenerativeModel(
    model_name=MODEL_NAME_LATEST,
    # Set the `response_mime_type` to output JSON
    # Pass the schema object to the `response_schema` field
    generation_config={"response_mime_type": "application/json",
                       "response_schema": list[Recipe]})

prompt = "List 5 popular cookie recipes"

response = model.generate_content(prompt)
print(response.text)

See: JSON mode
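
In both variants `response.text` is still a string, so it has to be parsed before the application can use it. A minimal sketch of that step, assuming the `list[Recipe]` shape from the examples above: the helper name `parse_recipes` and the sample string are hypothetical, and the sample merely stands in for a real `response.text`.

```python
import json

def parse_recipes(raw_text):
    """Parse JSON-mode output and check it looks like list[{"recipe_name": str}]."""
    data = json.loads(raw_text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of recipes")
    for item in data:
        if not isinstance(item, dict) or not isinstance(item.get("recipe_name"), str):
            raise ValueError(f"unexpected recipe entry: {item!r}")
    return data

# Hypothetical sample standing in for `response.text`.
sample = '[{"recipe_name": "Chocolate Chip Cookies"}, {"recipe_name": "Oatmeal Raisin Cookies"}]'
recipes = parse_recipes(sample)
print([r["recipe_name"] for r in recipes])
```

Keeping the shape check separate from `json.loads` makes it easy to tell a syntax problem (worth a retry) from a response that parsed fine but does not match the schema you asked for.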
