Generating Text
Introduction
At the interaction station you can run various machine learning models on the station's server. The server hosts a service named LocalAI, which is compatible with OpenAI's API specification and can be used as a drop-in replacement for OpenAI's service. It allows you to use large language models (LLMs), transcribe audio, generate images and generate audio. The only thing you need to know is how to tell the server which model to run so that it can give you a response. Using the Python programming language, this tutorial will walk you through the process. To follow along, some understanding of Python is advised.
As mentioned before, LocalAI is compatible with OpenAI's API specification. That means you can also read OpenAI's API Reference for more information. This guide borrows heavily from their documentation.
Installation
OpenAI has created an easy-to-use Python library to interact with their service. Luckily, with some minor modifications we can use this library with the station's service. To install the library, run the following command in your terminal.
pip install openai
Authentication
API Keys
The server uses API keys for authentication so that the service is not open to everyone outside of WdKA. You can think of an API key as the key to your front door or the password for your email account. For security reasons, the API key changes automatically every week on Sunday. You can check what the current API key is by clicking on this link.
If you have a project that requires a static API key for a longer period of time, you can write me an email at b.smeenk@hr.nl or ask me (Boris) at the station.
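Because the key changes every Sunday, a key typed directly into a script has to be edited each week. One possible approach, sketched below, is to read the key from an environment variable instead. The variable name STATION_API_KEY and the helper load_api_key are hypothetical names chosen for this example; they are not part of LocalAI or the openai library.

```python
import os

# Hypothetical variable name, chosen for this sketch only.
ENV_VAR = "STATION_API_KEY"


def load_api_key(env=None):
    """Return the API key stored in the environment, or raise a clear error."""
    # Allow passing a plain dictionary instead of os.environ (handy for testing).
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(
            f"No API key found. Set the {ENV_VAR} environment variable "
            "to this week's key before running the script."
        )
    return key
```

Once the variable is set in your terminal, the value returned by load_api_key() can be used as the api_key value in the Connection step below.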
Connection
In a new Python script, create a variable named client and set it to an instance of the OpenAI class to establish a connection with the service. The current API key and the service's URL should be entered as the values for the parameters api_key and base_url.
from openai import OpenAI

client = OpenAI(
    api_key="8195436a-9add-4281-ba7d-8595d266aab4",
    base_url="https://ml-api.interactionstation.wdka.hro.nl"
)
The created instance of OpenAI, which we stored in a variable named client, is now connected to the station's LocalAI service using the passed api_key. We can now use the client variable to make all sorts of requests to the server. Let's start by writing some code to chat with an LLM.
Chat
Completion
At the moment of writing this tutorial, the LocalAI instance on the server is running Meta's Llama 3 8B Instruct LLM to perform various chat instructions. We will always name the currently loaded model gpt-3.5-turbo. This may seem a bit strange, as gpt-3.5-turbo is a model made by OpenAI, but in actuality another model is loaded. We chose to do this in order to be compatible with various libraries and plugins that rely on this naming scheme. Just remember that whenever you see gpt-3.5-turbo, we're actually using another model.
The following code starts a chat with the model. Note that we use the client variable we created earlier to call a function named create. We save the result in a variable named response. If you would like to see the result, you can print() it to the console.
# Ask the model a question and save the result in a variable
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Are you feeling cold?"}
    ]
)

# Show the result in the console
print(response)
Chat models take a list of messages as input and return a generated response as output. To keep the context of a conversation, always send the entire list of messages to the chat model. The printed response in the code above will look something like this:
ChatCompletion(
    id="247e11d7-e594-4333-8b12-f3103b29dc9e",
    choices=[
        Choice(
            finish_reason="stop",
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content="I'm just an AI, I don't have a physical body, so I don't feel cold or any other temperature. But I'm happy to chat with you about the weather or anything else you'd like to talk about!",
                refusal=None,
                role="assistant",
                function_call=None,
                tool_calls=None,
            ),
        )
    ],
    created=1724536813,
    model="gpt-3.5-turbo",
    object="chat.completion",
    service_tier=None,
    system_fingerprint=None,
    usage=CompletionUsage(completion_tokens=0, prompt_tokens=0, total_tokens=0),
)
As you can see, the chat model returns a lot of information. Most of the time, however, we are only interested in the textual response of the model. In Python, you can grab specific parts of an object (in this case an instance of the ChatCompletion class) by using something called "dot notation". So in order to grab the value of content, we can print(response.choices[0].message.content). Note how we use [0] at choices because we want to grab the first item in the choices list. Swapping this print statement with your earlier print statement will result in the output below.
"I'm just an AI, I don't have a physical body, so I don't feel cold or any other temperature. But I'm happy to chat with you about the weather or anything else you'd like to talk about!"
Great! Right now, all we have is the chat model's textual response. Looking at the response, however, we can see that the generated content is rather uninteresting. How can we get the chat model to follow our instructions? This is where system prompts come into play.
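Before turning to system prompts, the two ideas from this section, resending the entire list of messages and pulling out content with dot notation, can be combined in a small helper. This is a minimal sketch and not part of the openai library: the function name ask and the history list are hypothetical, and it assumes client is the OpenAI instance created in the Connection section.

```python
def ask(client, history, user_text, model="gpt-3.5-turbo"):
    """Send user_text together with all earlier messages and return the reply text.

    `history` is a plain list of message dictionaries. It is updated in place,
    so the next call automatically includes the whole conversation so far.
    """
    # Add the new question to the conversation
    history.append({"role": "user", "content": user_text})

    # Send the entire conversation, not just the latest message
    response = client.chat.completions.create(model=model, messages=history)

    # Dot notation: first choice -> message -> content
    reply = response.choices[0].message.content

    # Remember the model's answer so it is part of the next request
    history.append({"role": "assistant", "content": reply})
    return reply
```

With the client from earlier, you could start with history = [] and call print(ask(client, history, "Are you feeling cold?")), followed by a second ask() call with a follow-up question. The second request then contains the first question and its answer as well, which is what lets the model keep the context of the conversation.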