/ NEWS

Introducing the prompt() Function:Enhancing SQL with LLMs

MotherDuck has launched the prompt() function, a groundbreaking feature that integrates large language models (LLMs) directly into SQL, allowing users to generate, summarize, and extract structured data efficiently. This new capability democratizes access to advanced natural language processing techniques, making it easier for developers to leverage AI in their data workflows.

In recent years, the costs associated with running large language models (LLMs) have significantly decreased, making advanced natural language processing techniques more accessible. The emergence of small language models (SLMs) like OpenAI’s gpt-4o-mini has further contributed to this trend, enabling cost-effective integration of AI capabilities into various applications. MotherDuck is excited to announce the introduction of the prompt() function, which simplifies the use of LLMs and SLMs within SQL queries, allowing users to generate and summarize text as well as extract structured data without needing separate infrastructure.

The prompt() function currently supports OpenAI's gpt-4o-mini and gpt-4o models, providing flexibility in terms of cost-effectiveness and performance. In its preview release, users can apply gpt-4o-mini-based prompts to all rows in a table, unlocking use cases such as bulk text summarization and structured data extraction. The function allows for single-row and constant inputs with gpt-4o, enabling high-quality responses ideal for retrieval-augmented generation (RAG) scenarios. Users can specify the model for inference using the optional (model:=) parameter, which enhances customization based on specific needs. Additionally, the prompt function supports returning structured output through parameters like struct</strong> and struct_descr, facilitating integration into analytical workflows.</p>

The prompt() function is designed for various practical applications. One prominent use case is text summarization; for instance, users can retrieve concise summaries from a text column in a dataset by executing a simple SQL query that applies the prompt function across multiple rows. In testing, processing 100 rows took approximately 2.8 seconds, showcasing significant speed improvements compared to traditional methods that would take hours without concurrency. Another compelling application is converting unstructured data into structured formats. By utilizing the struct and struct_descr parameters, users can extract specific information—such as topics, sentiments, and technologies mentioned in comments—transforming raw text into easily analyzable structured outputs.

The performance of the prompt() function is noteworthy; it allows for concurrent processing of multiple requests to enhance efficiency significantly. For example, when extracting structured information from a dataset like Hacker News, users can achieve high accuracy while maintaining reasonable compute costs. However, it is essential to consider practical limitations; for certain tasks like extracting email addresses from text, traditional SQL methods such as regex may be more efficient and reliable than using LLMs.

The prompt() function is currently available in Preview for MotherDuck users on both Free Trial and Standard Plans. To start exploring this functionality, users are encouraged to consult the documentation provided by MotherDuck. Notably, while running the prompt() function over large datasets can incur higher compute costs compared to other analytical queries, MotherDuck has set quotas to manage usage effectively—Free Trial users receive 40 compute unit hours per day while Standard Plan users have similar limits that can be adjusted upon request.

The introduction of the prompt() function represents a significant advancement in integrating LLMs with SQL databases. By enabling users to harness the power of AI within their existing workflows seamlessly, MotherDuck is paving the way for more efficient data processing and analysis. As developers explore this new functionality, they are invited to share their experiences and feedback through community channels to foster collaboration and innovation.

For further details on how to utilize the prompt() function and its capabilities, visit MotherDuck's official website or check official announcement.