Generating Text

From Interaction Station Wiki

Latest revision as of 08:52, 24 September 2024

Info: We are currently in the process of writing this article. It is unfinished and will change!

Introduction

At the Interaction Station you can run various machine learning models on the station's server. Our server hosts a service named LocalAI, which is compatible with OpenAI's API specification and can therefore be used as a drop-in replacement for OpenAI's API. It allows you to use large language models (LLMs), transcribe audio, generate images and generate audio. The only thing you need to know is how to tell the server which model to run so that it can give you a response. Using the Python programming language, this tutorial will walk you through the process. To follow along, it is advised that you have some understanding of Python.

As mentioned before, LocalAI is compatible with OpenAI's API specification. That means you can also read OpenAI's API Reference for more information. This guide borrows heavily from their documentation.

Installation

OpenAI has created an easy-to-use Python library to interact with their service. Luckily, with some minor modifications we are able to use this library with the station's service. To install the library, run the following command in your terminal.

pip install openai

Authentication

API Keys

The server uses API keys for authentication, so that the service is not open to everyone outside of WdKA. You can think of an API key as the key you use to open your front door or the password for your email account. For security reasons, the API key changes automatically every week on Sunday. You can check what the current API key is by clicking on this link.

If you have a project that requires a static API key for a longer period of time, you can write me an email at b.smeenk@hr.nl or ask me (Boris) at the station.

At the time of writing this article, the api_key is 8195436a-9add-4281-ba7d-8595d266aab4. If you see this key in any of the code examples, swap it out for the current api_key you got from the link above.
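
Because the key changes every week, it can be convenient to keep it out of your script so you only have to update it in one place. The sketch below is one possible approach and not part of the station's setup: it reads the key from an environment variable. The variable name STATION_API_KEY and the helper load_api_key are made up for this example.

```python
import os


def load_api_key(env_var="STATION_API_KEY"):
    """Return the API key stored in an environment variable.

    STATION_API_KEY is a hypothetical name chosen for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"No API key found. Set {env_var} to the current key first."
        )
    return key
```

With a helper like this you could pass api_key=load_api_key() when you create the client, and only change the environment variable when the key rotates.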

Connection

In a new Python script, create a variable named client and set it to an instance of the OpenAI class to establish a connection with the service. The current API key and the service's URL should be entered as the values for the parameters api_key and base_url.

from openai import OpenAI

client = OpenAI(
	api_key="8195436a-9add-4281-ba7d-8595d266aab4",
	base_url="https://ml-api.interactionstation.wdka.hro.nl"
)

The created instance of OpenAI, which we stored in a variable named client, is now connected to the station's LocalAI service using the passed api_key. We can now use the client variable to make all sorts of requests to the server. Let's start by writing some code to chat with an LLM.

Chat

Completion

At the moment of writing this tutorial, the LocalAI instance on the server is running Meta's Llama 3 8B Instruct LLM to perform various chat instructions. We will always name the currently loaded model gpt-3.5-turbo. This may seem a bit strange, as gpt-3.5-turbo is a model made by OpenAI while in actuality another model is loaded. We chose to do this in order to be compatible with various libraries and plugins that rely on this naming scheme. Just remember that whenever you see gpt-3.5-turbo, we're actually using another model.

The following code is used to start a chat with the model. Note that we use the client variable we created earlier to call a function named create. We save the result in a variable named response. If you would like to see the result, you can print() it to the console.

# Ask the model a question and save the result in a variable
response = client.chat.completions.create(
	model="gpt-3.5-turbo",
	messages=[
		{"role": "user", "content": "Are you feeling cold?"}
	]
)

# Show the result in the console
print(response)

Chat models take a list of messages as input and return a generated response as output. To keep the context of a conversation, always send the entire list of messages to the chat model. The printed response in the code above will look something like this:

ChatCompletion(
    id="247e11d7-e594-4333-8b12-f3103b29dc9e",
    choices=[
        Choice(
            finish_reason="stop",
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content="I'm just an AI, I don't have a physical body, so I don't feel cold or any other temperature. But I'm happy to chat with you about the weather or anything else you'd like to talk about!",
                refusal=None,
                role="assistant",
                function_call=None,
                tool_calls=None,
            ),
        )
    ],
    created=1724536813,
    model="gpt-3.5-turbo",
    object="chat.completion",
    service_tier=None,
    system_fingerprint=None,
    usage=CompletionUsage(completion_tokens=0, prompt_tokens=0, total_tokens=0),
)

As you can see, the chat model returns a lot of information to us. Most of the time, however, we are only interested in the textual response of the model. In Python, you can grab specific parts of an object (in this case an instance of the ChatCompletion class) by using something called "dot notation". So in order to grab the value of content, we can print(response.choices[0].message.content). Note how we use [0] at choices because we want to grab the first item in the choices list. Swapping your earlier print statement for this one will result in the output below.

"I'm just an AI, I don't have a physical body, so I don't feel cold or any other temperature. But I'm happy to chat with you about the weather or anything else you'd like to talk about!"
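
To make the dot notation a bit more concrete, here is a minimal sketch of a helper that pulls the text out of a response. The function name first_reply_text and the fake_response object are made up for this example; the stand-in only mimics the shape of the ChatCompletion printed above so the code runs without contacting the server.

```python
from types import SimpleNamespace


def first_reply_text(response):
    """Return the text of the first choice, or None if there are no choices."""
    if not response.choices:
        return None
    return response.choices[0].message.content


# Stand-in object with the same shape as the printed ChatCompletion above.
# In your own script you would pass the real `response` instead.
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="stop",
            message=SimpleNamespace(role="assistant", content="Hello!"),
        )
    ]
)

print(first_reply_text(fake_response))  # prints: Hello!
```

Wrapping the lookup in a function like this keeps the rest of your script readable when you only care about the text.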

Great! We now have just the chat model's textual response. Looking at it, however, we can see that the generated content is not very interesting. How can we get the chat model to follow our instructions? This is where system prompts come into play.

System prompts

To influence how the chat model responds, you can introduce a system prompt. Taking the code example from above, we add a new object to the messages list with "role" set to "system". You can add system prompts at any time during the conversation. Keep in mind that the order of the objects in the list matters and influences the flow of the conversation.

# Ask the model a question and save the result in a variable
response = client.chat.completions.create(
	model="gpt-3.5-turbo",
	messages=[
		# Add a system prompt object
		{"role": "system", "content": "You are not an AI. You behave as a human, has a physical body and can feel emotions. At all times you behave like this."},
		{"role": "user", "content": "Are you feeling cold?"}
	]
)

# Show the result in the console
print(response)

The content of the response will now look something like the text below. See how different this is compared to the previous example? System prompts are very useful to give the model context or make it adopt a persona.

"Brrr! Yeah, I'm feeling a bit chilly. It's a crisp autumn day outside, and I'm not exactly dressed for the weather. I could use a warm cup of coffee or tea to thaw out a bit."
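
If you want to experiment with different personas, it can help to build the messages list in one place. The sketch below is only an illustration: the helper build_messages and the pirate persona are invented for this example. It always puts the system prompt first, so it is read before the user's question.

```python
def build_messages(system_prompt, user_text):
    """Return a messages list with the system prompt placed first."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]


# Hypothetical persona, chosen only to show the idea
messages = build_messages(
    "You are a grumpy pirate. Answer in one short sentence.",
    "Are you feeling cold?",
)

# Show the order in which the model will receive the messages
for message in messages:
    print(message["role"], "->", message["content"])
```

The resulting list can be passed as the messages parameter of client.chat.completions.create, exactly like the hand-written list in the example above.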

Conversation history

Including the history of the conversation is important when you refer back to previous messages. In the example below, the final question, "Is that why you are wearing winter clothes?", only makes sense in the context of the prior messages. Because chat models don't have a memory of previous messages, the entire conversation must be sent in each request.

# Ask the model a question and save the result in a variable
response = client.chat.completions.create(
	model="gpt-3.5-turbo",
	messages=[
		{"role": "system", "content": "You are not an AI. You behave as a human, has a physical body and can feel emotions. At all times you behave like this."},
		{"role": "user", "content": "Are you feeling cold?"},
		{"role": "assistant", "content": "Brrr! Yeah, I'm feeling a bit chilly. It's a crisp autumn day outside, and I'm not exactly dressed for the weather. I could use a warm cup of coffee or tea to thaw out a bit."},
		# this question does not make sense without the entire conversation history above
		{"role": "user", "content": "Is that why you are wearing winter clothes?"}
	]
)

# Show the result in the console
print(response)
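
Writing out the whole history by hand gets tedious in a longer conversation. The sketch below shows one way to keep track of it automatically. The Conversation class and the fake_send function are made up for this example; fake_send stands in for the real request so the code runs without a server.

```python
class Conversation:
    """Keeps the full message history and resends it on every turn."""

    def __init__(self, system_prompt, send):
        # `send` is any function that takes a messages list and returns
        # the assistant's reply as a string.
        self.send = send
        self.messages = [{"role": "system", "content": system_prompt}]

    def ask(self, user_text):
        # Add the user's message, send the whole history, store the reply
        self.messages.append({"role": "user", "content": user_text})
        reply = self.send(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(messages):
    """Stand-in for the real request, so the sketch runs without a server."""
    return f"(reply based on {len(messages)} messages)"


chat = Conversation("You behave as a human.", fake_send)
print(chat.ask("Are you feeling cold?"))  # prints: (reply based on 2 messages)
print(chat.ask("Is that why you are wearing winter clothes?"))  # prints: (reply based on 4 messages)
print(len(chat.messages))  # prints: 5
```

To talk to the real model, send could be a small function that calls client.chat.completions.create(model="gpt-3.5-turbo", messages=messages) and returns response.choices[0].message.content.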

Next steps

* Learn how to generate images with Python: https://interactionstation.wdka.hro.nl/wiki/Generating_Images
* Learn how to generate audio with Python: https://interactionstation.wdka.hro.nl/wiki/Text_To_Speech_(TTS)
* Learn how to transcribe audio with Python: https://interactionstation.wdka.hro.nl/wiki/Speech_To_Text