
Gemini - Google AI Studio

Property | Details
Description | Google AI Studio is a fully-managed AI development platform for building and using generative AI.
Provider Route on LiteLLM | gemini/
Provider Doc | Google AI Studio
API Endpoint for Provider | https://generativelanguage.googleapis.com
Supported OpenAI Endpoints | /chat/completions, /embeddings, /completions
Pass-through Endpoint | Supported

API Keys

import os
os.environ["GEMINI_API_KEY"] = "your-api-key"

Sample Usage

from litellm import completion
import os

os.environ['GEMINI_API_KEY'] = ""
response = completion(
    model="gemini/gemini-pro",
    messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}]
)

Supported OpenAI Params

  • temperature
  • top_p
  • max_tokens
  • stream
  • tools
  • tool_choice
  • response_format
  • n
  • stop


Passing Gemini Specific Params

Response schema

LiteLLM supports sending response_schema as a param for Gemini-1.5-Pro on Google AI Studio.

Response Schema

from litellm import completion
import json
import os

os.environ['GEMINI_API_KEY'] = ""

messages = [
    {
        "role": "user",
        "content": "List 5 popular cookie recipes."
    }
]

response_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "recipe_name": {
                "type": "string",
            },
        },
        "required": ["recipe_name"],
    },
}

response = completion(
    model="gemini/gemini-1.5-pro",
    messages=messages,
    response_format={"type": "json_object", "response_schema": response_schema}  # 👈 KEY CHANGE
)

print(json.loads(response.choices[0].message.content))

Validate Schema

To validate the response against response_schema, set "enforce_validation": True inside response_format. This snippet reuses messages and response_schema from the previous example.

from litellm import completion, JSONSchemaValidationError

try:
    completion(
        model="gemini/gemini-1.5-pro",
        messages=messages,
        response_format={
            "type": "json_object",
            "response_schema": response_schema,
            "enforce_validation": True  # 👈 KEY CHANGE
        }
    )
except JSONSchemaValidationError as e:
    print("Raw Response: {}".format(e.raw_response))
    raise

LiteLLM will validate the response against the schema and raise a JSONSchemaValidationError if the response does not match it.

JSONSchemaValidationError inherits from openai.APIError.

Access the raw response with e.raw_response.
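For intuition about what enforce_validation checks, here is a hand-rolled, stdlib-only sketch that validates a raw response string against the cookie-recipe schema used above. validate_recipes is a hypothetical helper written for this example; it is not LiteLLM's validator and only covers this one schema.

```python
import json


def validate_recipes(raw: str) -> list:
    """Hypothetical check for the schema above: an array of objects,
    each with a required string field "recipe_name"."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"item {i} is not an object")
        if "recipe_name" not in item:
            raise ValueError(f"item {i} is missing required key 'recipe_name'")
        if not isinstance(item["recipe_name"], str):
            raise ValueError(f"item {i}: 'recipe_name' must be a string")
    return data


raw = '[{"recipe_name": "Chocolate Chip"}, {"recipe_name": "Snickerdoodle"}]'
print(validate_recipes(raw))  # [{'recipe_name': 'Chocolate Chip'}, {'recipe_name': 'Snickerdoodle'}]
```

For arbitrary schemas, a general-purpose library such as jsonschema (jsonschema.validate(instance, schema)) is a better fit than hand-written checks.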

GenerationConfig Params

To pass additional GenerationConfig params, e.g. topK, pass them as keyword arguments to completion(). LiteLLM passes them straight through as key-value pairs in the request body.

See Gemini's GenerationConfig documentation for the available params.

from litellm import completion
import json
import os

os.environ['GEMINI_API_KEY'] = ""

messages = [
    {
        "role": "user",
        "content": "List 5 popular cookie recipes."
    }
]

response = completion(
    model="gemini/gemini-1.5-pro",
    messages=messages,
    topK=1  # 👈 KEY CHANGE
)

print(response.choices[0].message.content)
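To see which of your keyword arguments fall outside the OpenAI params listed at the top of this page (and so are the Gemini-specific extras described above), here is a rough, hypothetical stdlib sketch. split_params and SUPPORTED_OPENAI_PARAMS are illustrative names assumed for this example; this is not LiteLLM's internal routing logic.

```python
# The OpenAI params documented as supported for Gemini on this page.
SUPPORTED_OPENAI_PARAMS = {
    "temperature", "top_p", "max_tokens", "stream", "tools",
    "tool_choice", "response_format", "n", "stop",
}


def split_params(kwargs: dict) -> tuple:
    """Hypothetical helper: split call kwargs into documented OpenAI params
    and everything else (e.g. GenerationConfig keys such as topK)."""
    openai_params = {k: v for k, v in kwargs.items() if k in SUPPORTED_OPENAI_PARAMS}
    extra_params = {k: v for k, v in kwargs.items() if k not in SUPPORTED_OPENAI_PARAMS}
    return openai_params, extra_params


openai_params, extra_params = split_params({"temperature": 0.2, "topK": 1})
print(openai_params)  # {'temperature': 0.2}
print(extra_params)   # {'topK': 1}
```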


Specifying Safety Settings

In certain use cases you may need to call the models with safety settings that differ from the defaults. To do so, pass the safety_settings argument to completion or acompletion. For example:

response = completion(
    model="gemini/gemini-pro",
    messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}],
    safety_settings=[
        {
            "category": "HARM_CATEGORY_HARASSMENT",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_HATE_SPEECH",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
            "threshold": "BLOCK_NONE",
        },
    ]
)
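Writing the four entries by hand gets repetitive. A minimal sketch of a hypothetical helper that builds the same safety_settings list with one default threshold and optional per-category overrides; build_safety_settings is an illustrative name, not a LiteLLM function.

```python
# The four harm categories used in the example above.
HARM_CATEGORIES = (
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
)


def build_safety_settings(threshold="BLOCK_NONE", overrides=None):
    """Hypothetical helper: one {"category", "threshold"} dict per category,
    using `threshold` unless `overrides` maps that category to another value."""
    overrides = overrides or {}
    return [
        {"category": category, "threshold": overrides.get(category, threshold)}
        for category in HARM_CATEGORIES
    ]


settings = build_safety_settings(
    overrides={"HARM_CATEGORY_DANGEROUS_CONTENT": "BLOCK_ONLY_HIGH"}
)
print(settings[3])  # {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'threshold': 'BLOCK_ONLY_HIGH'}
```

The result can then be passed as safety_settings=build_safety_settings() in the completion call above.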

Tool Calling

from litellm import completion
import os
# set env
os.environ["GEMINI_API_KEY"] = ".."

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]
messages = [{"role": "user", "content": "What's the weather like in Boston today?"}]

response = completion(
    model="gemini/gemini-1.5-flash",
    messages=messages,
    tools=tools,
)
# Add any assertions here to check the response args
print(response)
assert isinstance(response.choices[0].message.tool_calls[0].function.name, str)
assert isinstance(
    response.choices[0].message.tool_calls[0].function.arguments, str
)
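The assertions above show that function.arguments arrives as a JSON string. A minimal sketch of dispatching such a tool call to local code, assuming a hypothetical local get_current_weather implementation that returns placeholder data; AVAILABLE_TOOLS and run_tool_call are illustrative names, not part of LiteLLM.

```python
import json


def get_current_weather(location, unit="fahrenheit"):
    """Hypothetical local implementation; returns placeholder data
    instead of calling a real weather service."""
    return {"location": location, "temperature": 72, "unit": unit}


# Map tool names (as declared in `tools`) to local callables.
AVAILABLE_TOOLS = {"get_current_weather": get_current_weather}


def run_tool_call(name, arguments):
    """Decode the JSON arguments string and call the matching local function."""
    if name not in AVAILABLE_TOOLS:
        raise KeyError(f"unknown tool: {name}")
    kwargs = json.loads(arguments)
    return AVAILABLE_TOOLS[name](**kwargs)


# With a real response you would pass:
#   tool_call = response.choices[0].message.tool_calls[0]
#   run_tool_call(tool_call.function.name, tool_call.function.arguments)
result = run_tool_call("get_current_weather", '{"location": "Boston, MA"}')
print(result)  # {'location': 'Boston, MA', 'temperature': 72, 'unit': 'fahrenheit'}
```

In the OpenAI message format, the result would typically be sent back as a "tool" role message carrying the matching tool_call_id.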


JSON Mode

from litellm import completion
import json
import os

os.environ['GEMINI_API_KEY'] = ""

messages = [
    {
        "role": "user",
        "content": "List 5 popular cookie recipes."
    }
]

response = completion(
    model="gemini/gemini-1.5-pro",
    messages=messages,
    response_format={"type": "json_object"}  # 👈 KEY CHANGE
)

print(json.loads(response.choices[0].message.content))
Gemini-Pro-Vision

LiteLLM supports the following image types passed in `url`:

  • Images with direct links, e.g. https://storage.googleapis.com/github-repo/img/gemini/intro/landmark3.jpg
  • Images in local storage, e.g. ./localimage.jpeg

Sample Usage

import litellm
from dotenv import load_dotenv

# Load GEMINI_API_KEY from the .env file into os.environ
load_dotenv()

prompt = 'Describe the image in a few sentences.'
# Note: You can pass the URL or path of the image directly here.
image_url = 'https://storage.googleapis.com/github-repo/img/gemini/intro/landmark3.jpg'

# Create the messages payload in the OpenAI format
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": prompt
            },
            {
                "type": "image_url",
                "image_url": {"url": image_url}
            }
        ]
    }
]

# Make the API call to the Gemini model
response = litellm.completion(
    model="gemini/gemini-pro-vision",
    messages=messages,
)

# Extract the response content
content = response.choices[0].message.content

# Print the result
print(content)
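The messages payload above is the same for every prompt, so it can be generated. A minimal sketch of a hypothetical helper that builds it for one text prompt and any number of image URLs or paths, assuming the chosen model accepts more than one image per request; build_vision_messages is an illustrative name, not a LiteLLM function.

```python
def build_vision_messages(prompt, image_urls):
    """Hypothetical helper: one user message holding a text part followed
    by one image_url part per URL or local path."""
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return [{"role": "user", "content": content}]


messages = build_vision_messages(
    "Describe the image in a few sentences.",
    ["https://storage.googleapis.com/github-repo/img/gemini/intro/landmark3.jpg"],
)
print(messages[0]["content"][1]["type"])  # image_url
```

The result can be passed unchanged as messages=messages to litellm.completion.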

Usage - PDF / Videos / etc. Files

Inline Data (e.g. audio stream)

LiteLLM follows the OpenAI format and accepts inline data sent as a base64-encoded string.

The format to follow is:

data:<mime_type>;base64,<encoded_data>
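A minimal stdlib sketch of building (and, for debugging, parsing back) a string in this format. to_data_uri and from_data_uri are hypothetical helper names; the MIME type must be supplied by the caller.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as data:<mime_type>;base64,<encoded_data>."""
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def from_data_uri(uri: str) -> tuple:
    """Split a base64 data URI back into (mime_type, raw bytes)."""
    header, encoded = uri.split(",", 1)
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(encoded)


uri = to_data_uri(b"hi", "audio/mp3")
print(uri)  # data:audio/mp3;base64,aGk=
```

In the call below, to_data_uri(audio_bytes, "audio/mp3") would produce the same string as the inline format(...) expression.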

LITELLM CALL

import litellm
from pathlib import Path
import base64
import os

os.environ["GEMINI_API_KEY"] = ""

litellm.set_verbose = True  # 👈 See Raw call

audio_bytes = Path("speech_vertex.mp3").read_bytes()
encoded_data = base64.b64encode(audio_bytes).decode("utf-8")
print("Audio Bytes = {}".format(audio_bytes))
model = "gemini/gemini-1.5-flash"
response = litellm.completion(
    model=model,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please summarize the audio."},
                {
                    "type": "image_url",
                    "image_url": "data:audio/mp3;base64,{}".format(encoded_data),  # 👈 SET MIME_TYPE + DATA
                },
            ],
        }
    ],
)

Equivalent GOOGLE API CALL

import os
import pathlib
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# Initialize a Gemini model appropriate for your use case.
model = genai.GenerativeModel('models/gemini-1.5-flash')

# Create the prompt.
prompt = "Please summarize the audio."

# Read samplesmall.mp3 as raw bytes, then pass the prompt and the audio
# to Gemini as an inline blob (mime_type + data).
response = model.generate_content([
    prompt,
    {
        "mime_type": "audio/mp3",
        "data": pathlib.Path('samplesmall.mp3').read_bytes()
    }
])

# Output Gemini's response to the prompt and the inline audio.
print(response.text)

https:// file

import litellm
import os

os.environ["GEMINI_API_KEY"] = ""

litellm.set_verbose = True  # 👈 See Raw call

model = "gemini/gemini-1.5-flash"
response = litellm.completion(
    model=model,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please summarize the file."},
                {
                    "type": "image_url",
                    "image_url": "https://storage..."  # 👈 SET THE IMG URL
                },
            ],
        }
    ],
)

gs:// file

import litellm
import os

os.environ["GEMINI_API_KEY"] = ""

litellm.set_verbose = True  # 👈 See Raw call

model = "gemini/gemini-1.5-flash"
response = litellm.completion(
    model=model,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please summarize the file."},
                {
                    "type": "image_url",
                    "image_url": "gs://..."  # 👈 SET THE cloud storage bucket url
                },
            ],
        }
    ],
)

Chat Models

Tip: We support ALL Gemini models; just set model=gemini/<any-model-on-gemini> as a prefix when sending LiteLLM requests.

Model Name | Function Call | Required OS Variables
gemini-pro | completion(model='gemini/gemini-pro', messages) | os.environ['GEMINI_API_KEY']
gemini-1.5-pro-latest | completion(model='gemini/gemini-1.5-pro-latest', messages) | os.environ['GEMINI_API_KEY']
gemini-pro-vision | completion(model='gemini/gemini-pro-vision', messages) | os.environ['GEMINI_API_KEY']
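Because routing depends only on the gemini/ prefix, a tiny hypothetical helper can normalize model names coming from config or user input. gemini_model is an illustrative name, not part of LiteLLM.

```python
def gemini_model(name: str) -> str:
    """Return a LiteLLM model string routed to Google AI Studio,
    adding the gemini/ prefix only if it is missing."""
    return name if name.startswith("gemini/") else f"gemini/{name}"


print(gemini_model("gemini-1.5-pro-latest"))  # gemini/gemini-1.5-pro-latest
print(gemini_model("gemini/gemini-pro"))      # gemini/gemini-pro
```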

Context Caching

Google AI Studio context caching is supported by including cache_control in your message content block:

[
    {
        "role": "system",
        "content": ...,
        "cache_control": {"type": "ephemeral"}  # 👈 KEY CHANGE
    },
    ...
]
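A minimal sketch of a hypothetical helper that marks an existing message for caching in the content-block form used in the Example Usage below: plain-string content is wrapped in a single text block, and cache_control is set on the last block. mark_cacheable is an illustrative name, not a LiteLLM function.

```python
import copy


def mark_cacheable(message: dict) -> dict:
    """Return a copy of `message` with cache_control on its last content block."""
    msg = copy.deepcopy(message)  # leave the caller's message untouched
    content = msg["content"]
    if isinstance(content, str):
        # Wrap plain-string content in a single text content block.
        content = [{"type": "text", "text": content}]
        msg["content"] = content
    content[-1]["cache_control"] = {"type": "ephemeral"}
    return msg


system = mark_cacheable({"role": "system", "content": "Here is the full text of a long document"})
print(system["content"][0]["cache_control"])  # {'type': 'ephemeral'}
```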


Notes:

  • Gemini Context Caching only allows 1 block of continuous messages to be cached.

  • If multiple non-continuous blocks contain cache_control, the first continuous block will be used (it is sent to /cachedContent in the Gemini format).

  • The raw request to Gemini's /generateContent endpoint looks like this:
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-001:generateContent?key=$GOOGLE_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
        "contents": [
          {
            "parts": [{
              "text": "Please summarize this transcript"
            }],
            "role": "user"
          }
        ],
        "cachedContent": "'$CACHE_NAME'"
      }'
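To make the "first continuous block" rule from the notes above concrete, here is a hypothetical stdlib sketch that finds the first run of consecutive messages carrying cache_control (on the message itself or on any content block). It illustrates the documented rule only; it is not LiteLLM's implementation, and first_cached_block is an illustrative name.

```python
def _is_cached(message):
    """True if the message or any of its content blocks has cache_control."""
    if "cache_control" in message:
        return True
    content = message.get("content")
    if isinstance(content, list):
        return any(isinstance(b, dict) and "cache_control" in b for b in content)
    return False


def first_cached_block(messages):
    """Return (start, end) of the first run of consecutive cached messages
    (end exclusive), or None if no message is marked for caching."""
    start = None
    for i, msg in enumerate(messages):
        if _is_cached(msg):
            if start is None:
                start = i
        elif start is not None:
            return start, i  # the run ended; later cached blocks are ignored
    return (start, len(messages)) if start is not None else None


cached = {"role": "system", "content": [{"type": "text", "text": "doc", "cache_control": {"type": "ephemeral"}}]}
plain = {"role": "user", "content": "hi"}
print(first_cached_block([cached, cached, plain, cached]))  # (0, 2)
```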

Example Usage

import os
from litellm import completion

os.environ["GEMINI_API_KEY"] = ""

for _ in range(2):
    resp = completion(
        model="gemini/gemini-1.5-pro",
        messages=[
            # System Message
            {
                "role": "system",
                "content": [
                    {
                        "type": "text",
                        "text": "Here is the full text of a complex legal agreement" * 4000,
                        "cache_control": {"type": "ephemeral"},  # 👈 KEY CHANGE
                    }
                ],
            },
            # marked for caching with the cache_control parameter, so that this checkpoint can read from the previous cache.
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "What are the key terms and conditions in this agreement?",
                        "cache_control": {"type": "ephemeral"},
                    }
                ],
            },
        ],
    )

    print(resp.usage)  # 👈 2nd usage block will be less, since cached tokens are used