llm
- Description
- Interface to pluggable llm backends
- Latest
- llm-0.15.0.tar (.sig), 2024-May-18, 310 KiB
- Maintainer
- Andrew Hyatt <ahyatt@gmail.com>
- Website
- https://github.com/ahyatt/llm
- Browse ELPA's repository
- CGit or Gitweb
To install this package from Emacs, use package-install or list-packages.
Full description
1. Introduction
This library provides an interface for interacting with Large Language Models (LLMs). It allows elisp code to use LLMs while also giving end-users the choice to select their preferred LLM. This is particularly beneficial when working with LLMs since various high-quality models exist, some of which have paid API access, while others are locally installed and free but offer medium quality. Applications using LLMs can utilize this library to ensure compatibility regardless of whether the user has a local LLM or is paying for API access.
LLMs exhibit varying functionalities and APIs. This library aims to abstract functionality to a higher level, as some high-level concepts might be supported by an API while others require more low-level implementations. An example of such a concept is "examples," where the client offers example interactions to demonstrate a pattern for the LLM. While the GCloud Vertex API has an explicit API for examples, OpenAI's API requires specifying examples by modifying the system prompt. OpenAI also introduces the concept of a system prompt, which does not exist in the Vertex API. Our library aims to conceal these API variations by providing higher-level concepts in our API.
Certain functionalities might not be available in some LLMs. Any such unsupported functionality will raise a 'not-implemented signal.
2. Setting up providers
Users of an application that uses this package should not need to install it themselves. The llm package should be installed as a dependency when you install the package that uses it. However, you do need to require the llm module and set up the provider you will be using. Typically, applications will have a variable you can set. For example, let's say there's a package called "llm-refactoring", which has a variable llm-refactoring-provider. You would set it up like so:
(use-package llm-refactoring
  :init
  (require 'llm-openai)
  (setq llm-refactoring-provider (make-llm-openai :key my-openai-key)))
Here my-openai-key would be a variable you set up before with your OpenAI key. Or, just substitute the key itself as a string. It's important to remember never to check your key into a public repository such as GitHub, because your key must be kept private. Anyone with your key can use the API, and you will be charged.
For embedding users: if you store the embeddings, you must set the embedding model explicitly. The llm package has no way to tell whether you are storing them, and if the default model changes, you may find yourself storing incompatible embeddings.
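As a minimal sketch of pinning the embedding model, the provider from the example above could be created with an explicit :embedding-model. The model name "text-embedding-3-small" is used only as an illustration; substitute whichever model your stored embeddings were generated with.

```elisp
(require 'llm-openai)

;; `my-openai-key' is assumed to be a variable holding your Open AI key.
;; The embedding model is set explicitly so that a later change of the
;; package's default model cannot silently change the vectors you store.
(setq llm-refactoring-provider
      (make-llm-openai :key my-openai-key
                       :embedding-model "text-embedding-3-small"))
```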
2.1. Open AI
You can set up with make-llm-openai, with the following parameters:
- :key: The Open AI key that you get when you sign up to use Open AI's APIs. Remember to keep this private. This is required.
- :chat-model: A model name from the list of Open AI's model names. Keep in mind some of these are not available to everyone. This is optional, and will default to a reasonable 3.5 model.
- :embedding-model: A model name from the list of Open AI's embedding model names. This is optional, and will default to a reasonable model.
2.2. Open AI Compatible
There are many Open AI compatible APIs and proxies of Open AI. You can set up one with make-llm-openai-compatible, with the following parameter:
- :url: The URL leading up to the command ("embeddings" or "chat/completions"). So, for example, "https://api.openai.com/v1/" is the URL to use Open AI (although if you wanted to do that, just use make-llm-openai instead).
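A minimal sketch of a provider for an Open AI compatible server follows. The URL is a hypothetical local endpoint, not something the llm package defines; use the address your server actually listens on.

```elisp
(require 'llm-openai)

;; Hypothetical endpoint: a local server that exposes "chat/completions"
;; and "embeddings" under the /v1/ prefix.
(defvar my-local-llm-provider
  (make-llm-openai-compatible :url "http://localhost:8080/v1/")
  "Example provider for an Open AI compatible server (assumed URL).")
```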
2.3. Gemini (not via Google Cloud)
This is Google's AI model. You can get an API key via their page on Google AI Studio.
Set this up with make-llm-gemini, with the following parameters:
- :key: The Google AI key that you get from Google AI Studio.
- :chat-model: The model name, from the list of models (https://ai.google.dev/models). This is optional and will default to the text Gemini model.
- :embedding-model: The model name, which currently must be "embedding-001". This is optional and will default to "embedding-001".
2.4. Vertex (Gemini via Google Cloud)
This is mostly for those who want to use Google Cloud specifically. Most users should use Gemini instead, which is easier to set up.
You can set up with make-llm-vertex, with the following parameters:
- :project: Your project number from Google Cloud that has Vertex API enabled.
- :chat-model: A model name from the list of Vertex's model names. This is optional, and will default to a reasonable model.
- :embedding-model: A model name from the list of Vertex's embedding model names. This is optional, and will default to a reasonable model.
In addition to the provider, which you may want multiple of (for example, to charge against different projects), there are customizable variables:
- llm-vertex-gcloud-binary: The binary to use for generating the API key.
- llm-vertex-gcloud-region: The gcloud region to use. It's good to set this to a region near where you are for best latency. Defaults to "us-central1".
If you haven't already, you must run the following command before using this:
gcloud beta services identity create --service=aiplatform.googleapis.com --project=PROJECT_ID
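Put together, a Vertex setup might look like the following sketch. The region and project values are placeholders chosen for illustration, not defaults of the package.

```elisp
(require 'llm-vertex)

;; Pick a region close to you; "europe-west4" is only an example value.
(setq llm-vertex-gcloud-region "europe-west4")

;; "my-project-number" is a placeholder for your own Google Cloud project.
(defvar my-vertex-provider
  (make-llm-vertex :project "my-project-number")
  "Example Vertex provider (placeholder project).")
```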
2.5. Claude
Claude is Anthropic's large language model. It does not support embeddings. It does support function calling, but currently not in streaming. You can set it up with the following parameters:
- :key: The API key you get from Claude's settings page. This is required.
- :chat-model: One of the Claude models. Defaults to "claude-3-opus-20240229", the most powerful model.
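The constructor name is not given above. The sketch below assumes it follows the same naming pattern as the other providers (make-llm-claude, from the llm-claude module mentioned in the News section). my-claude-key is a hypothetical variable holding your API key.

```elisp
(require 'llm-claude)

;; Assumption: the constructor is named `make-llm-claude', by analogy with
;; `make-llm-openai' and `make-llm-gemini'.
(defvar my-claude-provider
  (make-llm-claude :key my-claude-key
                   :chat-model "claude-3-opus-20240229")
  "Example Claude provider; `my-claude-key' is assumed to be set elsewhere.")
```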
2.6. Ollama
Ollama is a way to run large language models locally. There are many different models you can use with it. You set it up with the following parameters:
- :scheme: The scheme (http/https) for the connection to ollama. This defaults to "http".
- :host: The host that ollama is run on. This is optional and will default to localhost.
- :port: The port that ollama is run on. This is optional and will default to the default ollama port.
- :chat-model: The model name to use for chat. This is not optional for chat use, since there is no default.
- :embedding-model: The model name to use for embeddings. This is not optional for embedding use, since there is no default.
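A sketch of a local Ollama provider, assuming the constructor follows the naming pattern of the other providers (make-llm-ollama, from the llm-ollama module). The model names are illustrative and must match models you have actually pulled into Ollama.

```elisp
(require 'llm-ollama)

;; Scheme, host and port are left at their defaults (http, localhost, and
;; the default ollama port).  Both model names are example values only.
(defvar my-ollama-provider
  (make-llm-ollama :chat-model "mistral"
                   :embedding-model "nomic-embed-text")
  "Example Ollama provider (assumed constructor and model names).")
```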
2.7. GPT4All
GPT4All is a way to run large language models locally. To use it with the llm package, you must click "Enable API Server" in the settings. It does not offer embeddings or streaming functionality, though, so Ollama might be a better fit for users who are not already set up with local models. You can set it up with the following parameters:
- :host: The host that GPT4All is run on. This is optional and will default to localhost.
- :port: The port that GPT4All is run on. This is optional and will default to the default GPT4All port.
- :chat-model: The model name to use for chat. This is not optional for chat use, since there is no default.
2.8. llama.cpp
llama.cpp is a way to run large language models locally. To use it with the llm package, you need to start the server (with the "--embedding" flag if you plan on using embeddings). The server must be started with a model, so it is not possible to switch models until the server is restarted to use the new model. As such, model is not a parameter to the provider, since the model choice is already set once the server starts.
There is a deprecated provider for llama.cpp, but it is no longer needed: llama.cpp is Open AI compatible, so the Open AI Compatible provider should work.
2.9. Fake
This is a client that makes no call, but is just there for testing and debugging. Mostly this is of use to programmatic clients of the llm package, but end users can also use it to understand what will be sent to the LLMs. It has the following parameters:
- :output-to-buffer: If non-nil, the buffer or buffer name to append the request sent to the LLM to.
- :chat-action-func: A function that will be called to provide a string or symbol and message cons which are used to raise an error.
- :embedding-action-func: A function that will be called to provide a vector or symbol and message cons which are used to raise an error.
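For illustration, a fake provider that records requests and answers with a canned string might be set up as below. Two things are assumptions here: that the constructor is named make-llm-fake (in an llm-fake module, following the pattern of the other providers), and that the action function is called with no arguments.

```elisp
(require 'llm)
(require 'llm-fake)

;; Assumptions: the constructor is `make-llm-fake', and the action function
;; is called with no arguments.
(defvar my-fake-provider
  (make-llm-fake
   :output-to-buffer "*llm-fake-requests*"
   :chat-action-func (lambda () "This is a canned response."))
  "Fake provider for tests; it never contacts a real LLM.")

;; The request is appended to the buffer named above, and the string from
;; the action function stands in for a real model response.
(llm-chat my-fake-provider (llm-make-chat-prompt "Hello?"))
```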
3. llm and the use of non-free LLMs
The llm package is part of GNU Emacs by being part of GNU ELPA. Unfortunately, the most popular LLMs in use are non-free, which is not what GNU software should be promoting by inclusion. On the other hand, by use of the llm package, the user can make sure that any client that codes against it will work with free models that come along. It's likely that sophisticated free LLMs will emerge, although it's unclear right now what free software means with respect to LLMs. Because of this tradeoff, we have decided to warn the user when using non-free LLMs (which is every LLM supported right now except the fake one). You can turn this off the same way you turn off any other warning, by clicking on the left arrow next to the warning when it comes up. Alternatively, you can set llm-warn-on-nonfree to nil. This can be set via customization as well.
To build upon the example from before:
(use-package llm-refactoring
  :init
  (require 'llm-openai)
  (setq llm-refactoring-provider (make-llm-openai :key my-openai-key)
        llm-warn-on-nonfree nil))
4. Programmatic use
Client applications should require the llm package, and code against it. Most functions are generic, and take a struct representing a provider as the first argument. The client code, or the user themselves, can then require the specific module, such as llm-openai, and create a provider with a function such as (make-llm-openai :key user-api-key). The client application will use this provider to call all the generic functions.
For all callbacks, the callback will be executed in the buffer the function was first called from. If the buffer has been killed, it will be executed in a temporary buffer instead.
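A short sketch of both calling styles, for a hypothetical client package whose provider variable is named my-client-provider. The :context keyword is the one listed for llm-make-chat-prompt; every name prefixed with my-client- is illustrative only.

```elisp
(require 'llm)

(defvar my-client-provider nil
  "Provider chosen by the user, e.g. the result of `make-llm-openai'.")

(defun my-client-summarize (text)
  "Return a summary of TEXT, blocking until the LLM answers."
  (llm-chat my-client-provider
            (llm-make-chat-prompt
             (concat "Summarize the following text:\n\n" text)
             :context "You are a concise technical summarizer.")))

(defun my-client-summarize-async (text)
  "Summarize TEXT in the background and show the result in the echo area."
  (llm-chat-async
   my-client-provider
   (llm-make-chat-prompt (concat "Summarize the following text:\n\n" text))
   ;; Response callback: receives the text response.
   (lambda (response)
     (message "Summary: %s" response))
   ;; Error callback: receives the error symbol and an error message.
   (lambda (type message)
     (message "LLM request failed (%s): %s" type message))))
```

Both callbacks in my-client-summarize-async run in the buffer that was current when the function was called, or in a temporary buffer if that buffer has since been killed.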
4.1. Main functions
- llm-chat provider prompt: With the user-chosen provider and an llm-chat-prompt structure (created by llm-make-chat-prompt), send that prompt to the LLM and wait for the string output.
- llm-chat-async provider prompt response-callback error-callback: Same as llm-chat, but executes in the background. Takes a response-callback, which will be called with the text response. The error-callback will be called in case of error, with the error symbol and an error message.
- llm-chat-streaming provider prompt partial-callback response-callback error-callback: Similar to llm-chat-async, but requests a streaming response. As the response is built up, partial-callback is called with all the text retrieved up to the current point. Finally, response-callback is called with the complete text.
- llm-embedding provider string: With the user-chosen provider, send a string and get an embedding, which is a large vector of floating point values. The embedding represents the semantic meaning of the string, and the vector can be compared against other vectors, where smaller distances between the vectors represent greater semantic similarity.
- llm-embedding-async provider string vector-callback error-callback: Same as llm-embedding, but this is processed asynchronously. vector-callback is called with the vector embedding, and, in case of error, error-callback is called with the same arguments as in llm-chat-async.
- llm-count-tokens provider string: Count how many tokens are in string. This may vary by provider, because some providers implement an API for this, but the count is typically about the same. This gives an estimate if the provider has no API support.
- llm-cancel-request request: Cancels the given request, if possible. The request object is the return value of the async and streaming functions.
- llm-name provider: Provides a short name of the model or provider, suitable for showing to users.
- llm-chat-token-limit: Gets the token limit for the chat model. This isn't possible for some backends like llama.cpp, in which the model isn't selected or known by this library.
And the following helper functions:
- llm-make-chat-prompt text &key context examples functions temperature max-tokens: This is how you make prompts. text can be a string (the user input to the llm chatbot), or a list with an odd number of elements representing a series of back-and-forth exchanges, with the last element of the list representing the user's latest input. This supports inputting context (also commonly called a system prompt, although it isn't guaranteed to replace the actual system prompt), examples, and other important elements, all detailed in the docstring for this function.
- llm-chat-prompt-to-text prompt: From a prompt, return a string representation. This is not usually suitable for passing to LLMs, but is useful for debugging purposes.
- llm-chat-streaming-to-point provider prompt buffer point finish-callback: Same basic arguments as llm-chat-streaming, but will stream to point in buffer.
- llm-chat-prompt-append-response prompt response role: Append a new response (from the user, usually) to the prompt. The role is optional, and defaults to 'user.
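To make the embedding comparison concrete, here is a small helper that computes cosine similarity between two vectors. Cosine similarity is a choice made for this sketch, not something the llm package prescribes; values closer to 1 correspond to a smaller angular distance and therefore greater semantic similarity. The function name is hypothetical.

```elisp
(defun my-embedding-cosine-similarity (a b)
  "Return the cosine similarity of the numeric vectors A and B.
Values close to 1.0 mean the vectors point in nearly the same direction.
A and B are assumed to have the same length and to be non-zero."
  (let ((dot 0.0) (norm-a 0.0) (norm-b 0.0))
    (dotimes (i (length a))
      (let ((x (aref a i))
            (y (aref b i)))
        (setq dot (+ dot (* x y))
              norm-a (+ norm-a (* x x))
              norm-b (+ norm-b (* y y)))))
    (/ dot (* (sqrt norm-a) (sqrt norm-b)))))

;; With a real provider, the vectors would come from `llm-embedding':
;; (my-embedding-cosine-similarity
;;  (llm-embedding provider "How do I install a package?")
;;  (llm-embedding provider "Installing packages in Emacs"))
```

Only embeddings produced by the same embedding model are meaningfully comparable this way.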
4.2. Logging
Interactions with the llm package can be logged by setting llm-log to a non-nil value. This should be done only when developing. The log can be found in the *llm log* buffer.
4.3. How to handle conversations
Conversations can take place by repeatedly calling llm-chat and its variants. The prompt should be constructed with llm-make-chat-prompt. For a conversation, the entire prompt must be kept as a variable, because the llm-chat-prompt-interactions slot will be getting changed by the chat functions to store the conversation. For some providers, this will store the history directly in llm-chat-prompt-interactions, but other LLMs have an opaque conversation history. For that reason, the correct way to handle a conversation is to repeatedly call llm-chat or variants with the same prompt structure, kept in a variable, and after each time, add the new user text with llm-chat-prompt-append-response. The following is an example:
(defvar-local llm-chat-streaming-prompt nil)

(defun start-or-continue-conversation (text)
  "Called when the user has input TEXT as the next input."
  (if llm-chat-streaming-prompt
      ;; Continuing: add the new user text to the stored prompt.
      (llm-chat-prompt-append-response llm-chat-streaming-prompt text)
    ;; Starting: create the prompt that will hold the conversation.
    (setq llm-chat-streaming-prompt (llm-make-chat-prompt text)))
  ;; `provider' is assumed to be bound to the user's chosen provider.
  (llm-chat-streaming-to-point provider llm-chat-streaming-prompt
                               (current-buffer) (point-max)
                               (lambda ())))
4.4. Caution about llm-chat-prompt-interactions
The interactions in a prompt may be modified by conversation or by the conversion of the context and examples to what the LLM understands. Different providers require different things from the interactions. Some can handle system prompts, some cannot. Some require alternating user and assistant chat interactions, others can handle anything. It's important that clients keep to behaviors that work on all providers. Do not attempt to read or manipulate llm-chat-prompt-interactions after initially setting it up for the first time, because you are likely to make changes that only work for some providers. Similarly, don't directly create a prompt with make-llm-chat-prompt, because it is easy to create something that wouldn't work for all providers.
4.5. Function calling
Note: function calling functionality is currently alpha quality. If you want to use function calling, please watch the llm discussions for any announcements about changes.
Function calling is a way to give the LLM a list of functions it can call, and have it call the functions for you. The standard interaction has the following steps:
- The client sends the LLM a prompt with functions it can call.
- The LLM may return which functions to execute, and with what arguments, or text as normal.
- If the LLM has decided to call one or more functions, those functions should be called, and their results sent back to the LLM.
- The LLM will return with a text response based on the initial prompt and the results of the function calling.
- The client can now continue the conversation.
This basic structure is useful because it can guarantee a well-structured output (if the LLM does decide to call the function). Not every LLM can handle function calling, and those that do not will ignore the functions entirely. The function llm-capabilities will return a list with function-calls in it if the LLM supports function calls. Right now only Gemini, Vertex, Claude, and Open AI support function calling. Ollama should get function calling soon. However, even for LLMs that handle function calling, there is a fair bit of difference in the capabilities. Right now, it is possible to write function calls that succeed in Open AI but cause errors in Gemini, because Gemini does not appear to handle functions that have types that contain other types. So client programs are advised for right now to keep functions to simple types.
The way to call functions is to attach a list of functions to the functions slot in the prompt (the functions argument of llm-make-chat-prompt). This is a list of llm-function-call structs, each of which takes a function, a name, a description, and a list of llm-function-arg structs. The docstrings give an explanation of the format.
The various chat APIs will execute the functions defined in the llm-function-call structs with the arguments supplied by the LLM. Instead of returning (or passing to a callback) a string, they will return an alist of function names and return values.
The client must then send this back to the LLM, to get a textual response from the LLM based on the results of the function call. These have already been added to the prompt, so the client only has to call the LLM again. Gemini and Vertex require this extra call to the LLM, but Open AI does not.
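The flow described above can be sketched as follows. The struct constructors and their keywords (make-llm-function-call with :function, :name, :description, :args, and make-llm-function-arg with :name, :description, :type, :required) are assumptions based on the struct names mentioned above; check the docstrings before relying on them. The my-client- names are hypothetical.

```elisp
(require 'llm)

(defun my-client-capital-prompt ()
  "Build a prompt offering the LLM one callable function."
  (llm-make-chat-prompt
   "What is the capital of France?"
   :functions
   (list (make-llm-function-call
          ;; Stand-in implementation; a real client would do actual work.
          :function (lambda (country)
                      (format "Capital requested for %s" country))
          ;; A JavaScript-style name, without hyphens.
          :name "capital_of_country"
          :description "Get the capital of a country."
          :args (list (make-llm-function-arg
                       :name "country"
                       :description "The country whose capital to look up."
                       :type 'string
                       :required t))))))

(defun my-client-ask-capital (provider)
  "Ask PROVIDER the capital question, following up if functions were called."
  (unless (member 'function-calls (llm-capabilities provider))
    (user-error "This provider does not support function calling"))
  (let* ((prompt (my-client-capital-prompt))
         (result (llm-chat provider prompt)))
    (if (stringp result)
        ;; The LLM answered with plain text.
        result
      ;; RESULT is an alist of function names and return values.  The
      ;; results are already stored in PROMPT, so call the LLM again to
      ;; get a textual answer.
      (llm-chat provider prompt))))
```

The argument type is kept to a simple string because, as noted above, nested types may not work on every provider.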
Be aware that there is no guarantee that the function will be called correctly. While the LLMs mostly get this right, they are trained on Javascript functions, so imitating Javascript names is recommended. So, "write_email" is a better name for a function than "write-email".
Examples can be found in llm-tester. There is also a function call to generate function calls from existing elisp functions in utilities/elisp-to-function-call.el.
5. Contributions
If you are interested in creating a provider, please send a pull request, or open a bug. This library is part of GNU ELPA, so any major provider that we include in this module needs to be written by someone with FSF papers. However, you can always write a module and put it on a different package archive, such as MELPA.
Old versions
llm-0.14.3.tar.lz | 2024-May-17 | 54.5 KiB |
llm-0.14.2.tar.lz | 2024-May-15 | 46.3 KiB |
llm-0.12.3.tar.lz | 2024-Mar-31 | 46.0 KiB |
llm-0.12.2.tar.lz | 2024-Mar-25 | 45.9 KiB |
llm-0.12.1.tar.lz | 2024-Mar-22 | 45.5 KiB |
llm-0.12.0.tar.lz | 2024-Mar-17 | 45.4 KiB |
llm-0.10.0.tar.lz | 2024-Mar-02 | 44.2 KiB |
llm-0.9.1.tar.lz | 2024-Feb-04 | 35.6 KiB |
llm-0.9.0.tar.lz | 2024-Jan-21 | 35.3 KiB |
llm-0.8.0.tar.lz | 2023-Dec-30 | 34.2 KiB |
llm-0.7.0.tar.lz | 2023-Dec-18 | 31.9 KiB |
llm-0.6.0.tar.lz | 2023-Dec-09 | 31.6 KiB |
llm-0.5.2.tar.lz | 2023-Nov-05 | 30.2 KiB |
llm-0.5.1.tar.lz | 2023-Nov-01 | 29.5 KiB |
llm-0.5.0.tar.lz | 2023-Oct-26 | 28.7 KiB |
llm-0.4.0.tar.lz | 2023-Oct-14 | 26.1 KiB |
llm-0.3.0.tar.lz | 2023-Oct-02 | 24.3 KiB |
llm-0.2.1.tar.lz | 2023-Oct-01 | 22.4 KiB |
llm-0.2.tar.lz | 2023-Sep-30 | 22.0 KiB |
llm-0.1.1.tar.lz | 2023-Sep-21 | 21.3 KiB |
News
1. Version 0.15.0
- Move to plz backend, which uses curl. This helps move this package to a stronger foundation backed by parsing to spec. Thanks to Roman Scherer for contributing the plz extensions that enable this, which are currently bundled in this package but will eventually become their own separate package.
- Add model context information for Open AI's GPT 4-o.
- Add model context information for Gemini's 1.5 models.
2. Version 0.14.2
- Fix mangled copyright line (needed to get ELPA version unstuck).
- Fix Vertex response handling bug.
3. Version 0.14.1
- Fix various issues with the 0.14 release
4. Version 0.14
- Introduce new way of creating prompts: llm-make-chat-prompt, deprecating the older ways.
- Improve Vertex error handling
5. Version 0.13
- Add Claude's new support for function calling.
- Refactor of providers to centralize embedding and chat logic.
- Remove connection buffers after use.
- Fixes to provide more specific error messages for most providers.
6. Version 0.12.3
- Refactor of warn-non-nonfree methods.
- Add non-free warnings for Gemini and Claude.
7. Version 0.12.2
- Send connection issues to error callbacks, and fix an error handling issue in Ollama.
- Fix issue where, in some cases, streaming does not work the first time attempted.
8. Version 0.12.1
- Fix issue in llm-ollama with not using provider host for sync embeddings.
- Fix issue in llm-openai where we were incompatible with some Open AI-compatible backends due to assumptions about inconsequential JSON details.
9. Version 0.12.0
- Add provider llm-claude, for Anthropic's Claude.
10. Version 0.11.0
- Introduce function calling, now available only in Open AI and Gemini.
- Introduce llm-capabilities, which returns a list of extra capabilities for each backend.
- Fix issue with logging when we weren't supposed to.
11. Version 0.10.0
- Introduce llm logging (for help with developing against llm): set llm-log to non-nil to enable logging of all interactions with the llm package.
- Change the default interaction with ollama to one more suited for conversations (thanks to Thomas Allen).
12. Version 0.9.1
- Default to the new "text-embedding-3-small" model for Open AI. Important: Anyone who has stored embeddings should either regenerate embeddings (recommended) or hard-code the old embedding model ("text-embedding-ada-002").
- Fix response breaking when prompts run afoul of Gemini / Vertex's safety checks.
- Change Gemini streaming to be the correct URL. This doesn't seem to have an effect on behavior.
13. Version 0.9
- Add llm-chat-token-limit to find the token limit based on the model.
- Add request timeout customization.
14. Version 0.8
- Allow users to change the Open AI URL, to allow for proxies and other services that re-use the API.
- Add llm-name and llm-cancel-request to the API.
- Standardize handling of how context, examples and history are folded into llm-chat-prompt-interactions.
15. Version 0.7
- Upgrade Google Cloud Vertex to Gemini - previous models are no longer available.
- Added gemini provider, which is an alternate endpoint with alternate (and easier) authentication and setup compared to Cloud Vertex.
- Provide default for llm-chat-async to fall back to streaming if not defined for a provider.
16. Version 0.6
- Add provider llm-llamacpp.
- Fix issue with Google Cloud Vertex not responding to messages with a system interaction.
- Fix use of (pos-eol) which is not compatible with Emacs 28.1.
17. Version 0.5.2
- Fix incompatibility with older Emacs introduced in Version 0.5.1.
- Add support for Google Cloud Vertex model text-bison and variants.
- llm-ollama can now be configured with a scheme (http vs https).
18. Version 0.5.1
- Implement token counting for Google Cloud Vertex via their API.
- Fix issue with Google Cloud Vertex erroring on multibyte strings.
- Fix issue with small bits of missing text in Open AI and Ollama streaming chat.
19. Version 0.5
- Fixes for conversation context storage, requiring clients to handle ongoing conversations slightly differently.
… …