This package caches the responses to prompts sent to artificial intelligence services.
It provides a class with a function that takes a prompt text and a callback function. The callback calls an artificial intelligence client class to send the request to a service that returns a response to the prompt.
The class stores the response in a cache storage container. The next time the same artificial intelligence service is asked for a response to the same prompt text, the class retrieves the response from the cache storage container.
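
The flow described above can be sketched roughly as follows. This is a minimal illustration in Python, not the package's actual code: the class name `PromptCache`, the method `get_response`, the provider label, and the in-memory dictionary standing in for the cache storage container are all hypothetical.

```python
import hashlib
from typing import Callable, Dict, List


class PromptCache:
    """Hypothetical exact-match prompt cache; not the package's real class."""

    def __init__(self) -> None:
        # An in-memory dict stands in for the cache storage container.
        self._store: Dict[str, str] = {}

    @staticmethod
    def _key(provider: str, prompt: str) -> str:
        # Key on provider and prompt so different services do not share entries.
        return hashlib.sha256(f"{provider}\n{prompt}".encode("utf-8")).hexdigest()

    def get_response(self, provider: str, prompt: str,
                     callback: Callable[[str], str]) -> str:
        key = self._key(provider, prompt)
        if key not in self._store:
            # Cache miss: ask the service through the caller-supplied callback.
            self._store[key] = callback(prompt)
        return self._store[key]


calls: List[str] = []


def fake_client(prompt: str) -> str:
    # Stand-in for a real AI client call; records each invocation.
    calls.append(prompt)
    return f"answer to: {prompt}"


cache = PromptCache()
first = cache.get_response("example-provider", "What is a cache?", fake_client)
second = cache.get_response("example-provider", "What is a cache?", fake_client)
```

In this sketch the second call returns the stored response, so `fake_client` runs only once for the repeated prompt.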
Currently it can:
- Store and retrieve cached responses for prompts that exactly match the prompt text used to obtain a previous response.
- Store and retrieve cached responses for prompts that are semantically similar to the prompt text used to obtain a previous response, when no exact prompt text match was previously cached.
- Store and retrieve cached responses for prompts matched by cosine similarity with the prompt text used to obtain a previous response, when no exact prompt text match was previously cached. The cosine similarity threshold is configurable.
- Stream responses by splitting each response into small chunks, which can be obtained with multiple retrieve calls until the whole response has been returned.
- Store responses through different cache driver classes, one for each type of storage container: SQLite, JSON file, and Redis.
- Support multiple artificial intelligence service providers: OpenAI, Ollama, and a built-in deterministic fallback.
- Provide statistics counters for requests, exact hits, semantic hits, misses, an estimate of tokens saved, and an approximate USD figure.
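
As a rough illustration of the exact match, cosine similarity fallback, and hit counters listed above, here is a small Python sketch. It is an assumption-laden toy, not the package's implementation: `SemanticCache`, `embed`, and `cosine` are hypothetical names, the word-count "embedding" is a placeholder for whatever vectors the real package compares, and the token and USD estimates are left out.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Tuple


def embed(text: str) -> Counter:
    # Toy embedding (assumption): word counts instead of a model-generated vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity: dot product divided by the product of the vector norms.
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical cache: exact match first, then cosine similarity."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # configurable similarity cutoff
        self.entries: List[Tuple[str, Counter, str]] = []
        self.stats = {"requests": 0, "exact_hits": 0,
                      "semantic_hits": 0, "misses": 0}

    def get_response(self, prompt: str, callback: Callable[[str], str]) -> str:
        self.stats["requests"] += 1
        vector = embed(prompt)
        best_score = 0.0
        best_response: Optional[str] = None
        for cached_prompt, cached_vector, response in self.entries:
            if cached_prompt == prompt:
                # An exact match always wins over a similar prompt.
                self.stats["exact_hits"] += 1
                return response
            score = cosine(vector, cached_vector)
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self.threshold:
            self.stats["semantic_hits"] += 1
            return best_response
        # Nothing close enough: call the service and cache the new response.
        self.stats["misses"] += 1
        response = callback(prompt)
        self.entries.append((prompt, vector, response))
        return response


def fake_client(prompt: str) -> str:
    # Stand-in for a real AI client call.
    return f"answer to: {prompt}"


cache = SemanticCache(threshold=0.7)
cache.get_response("how do I reset my password", fake_client)         # miss
cache.get_response("how do I reset my password", fake_client)         # exact hit
cache.get_response("how do I reset my password please", fake_client)  # semantic hit
cache.get_response("what is the weather today", fake_client)          # miss
```

Raising the threshold makes the cache stricter about which prompts count as similar, and lowering it reuses responses more freely at the risk of returning an answer to a different question.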