Feedback on supporting AI queries in Postgres with EvaDB


We are developing the EvaDB query engine [] to support AI queries in Postgres. Our goal is to help software developers build AI-powered apps with SQL. EvaDB lets you query data with AI models like this:

-- Convert the Super Bowl audio to text using a speech-to-text model
CREATE TABLE transcript AS
SELECT SpeechToText(audio) FROM super_bowl_video;

-- Run a ChatGPT-based query on the derived text column
SELECT ChatGPT("When did touchdowns happen in this game?", text)
FROM transcript;

Here is a more interesting query that analyzes the sentiment of food reviews and generates responses to the reviews with negative sentiment:

-- Analyze sentiments in reviews and respond to "negative" reviews using ChatGPT
SELECT ChatGPT(
         "Respond to the review with a solution to address the reviewer's concern",
         review)
FROM postgres_data.review_table
WHERE ChatGPT(
        "Is the review positive or negative? Only reply 'positive' or 'negative'.",
        review) = "negative"
  AND location = "waffle house";

To process this query, EvaDB’s optimizer pushes the scan operator at the bottom of the query plan down to Postgres (SELECT review FROM postgres_data.review_table;) and executes the remaining operators (function expression evaluation, filtering, etc.) itself []. The optimizer focuses on accelerating queries whose AI functions are written in Python. For example, it automatically reorders query predicates based on the cost of evaluating them. For the review analysis query, EvaDB first filters out tuples from restaurants other than “waffle house” and only then runs ChatGPT on the remaining tuples, so the expensive model calls are skipped for tuples that the cheap location predicate would discard anyway.
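The cost-based predicate reordering described above can be sketched in a few lines of Python. This is a simplified illustration, not EvaDB's optimizer: it assumes each predicate carries a made-up per-tuple cost estimate, sorts the conjuncts cheapest first, and relies on short-circuit evaluation so the (mocked) ChatGPT call runs only on tuples that pass the location check. `Predicate`, `evaluate_conjunction`, and `mock_chatgpt_sentiment` are hypothetical names, and the in-memory rows stand in for tuples fetched from Postgres.

```python
from collections import Counter
from collections.abc import Callable
from dataclasses import dataclass

Row = dict[str, str]


@dataclass
class Predicate:
    name: str
    cost: float                # assumed per-tuple cost estimate (hypothetical)
    fn: Callable[[Row], bool]  # True if the tuple passes this predicate


def evaluate_conjunction(rows: list[Row], predicates: list[Predicate]) -> list[Row]:
    """Keep rows that satisfy every predicate, evaluating the cheapest first."""
    ordered = sorted(predicates, key=lambda p: p.cost)
    # all() stops at the first False, so an expensive predicate only runs
    # on tuples that already passed the cheaper ones.
    return [row for row in rows if all(p.fn(row) for p in ordered)]


calls = Counter()


def mock_chatgpt_sentiment(review: str) -> str:
    """Hypothetical stand-in for a ChatGPT call; counts how often it runs."""
    calls["chatgpt"] += 1
    return "negative" if "cold" in review else "positive"


# Hypothetical in-memory tuples standing in for rows of postgres_data.review_table.
rows = [
    {"review": "Food was cold", "location": "waffle house"},
    {"review": "Great waffles", "location": "waffle house"},
    {"review": "Food was cold", "location": "diner"},
    {"review": "Nice coffee", "location": "cafe"},
]

# Listed in the order they appear in the WHERE clause; costs are made up.
predicates = [
    Predicate("sentiment", 1.0,
              lambda r: mock_chatgpt_sentiment(r["review"]) == "negative"),
    Predicate("location", 1e-6, lambda r: r["location"] == "waffle house"),
]

result = evaluate_conjunction(rows, predicates)
print(result)            # [{'review': 'Food was cold', 'location': 'waffle house'}]
print(calls["chatgpt"])  # 2: the mock never sees the non-waffle-house rows
```

Sorting by cost alone is the simplest policy. A fuller optimizer would also weigh each predicate's selectivity, since a cheap predicate that rejects almost nothing saves little work for the predicates after it.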

Because these optimizations happen outside Postgres, I refer to them as “external” query optimization. Besides predicate reordering, EvaDB uses parallel query execution, function caching, and other techniques to speed up AI queries [].
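As a rough illustration of function caching, the sketch below memoizes an expensive AI function on its argument tuple, so duplicate inputs within a query and repeated runs of the same query skip the model call. `FunctionCache` and `mock_chatgpt` are hypothetical names, and the in-memory dict is an assumption for illustration; EvaDB's actual caching mechanism may work differently.

```python
from collections.abc import Callable, Hashable


class FunctionCache:
    """Memoize an expensive AI function on its argument tuple (in memory)."""

    def __init__(self, fn: Callable[..., str]):
        self.fn = fn
        self.store: dict[tuple[Hashable, ...], str] = {}
        self.misses = 0  # number of times the underlying function actually ran

    def __call__(self, *args: Hashable) -> str:
        if args not in self.store:
            self.misses += 1
            self.store[args] = self.fn(*args)
        return self.store[args]


def mock_chatgpt(prompt: str, text: str) -> str:
    """Hypothetical stand-in for a slow, billed ChatGPT call."""
    return "negative" if "cold" in text else "positive"


chatgpt = FunctionCache(mock_chatgpt)
prompt = "Is the review positive or negative? Only reply 'positive' or 'negative'."
reviews = ["Food was cold", "Great waffles", "Food was cold"]

first = [chatgpt(prompt, r) for r in reviews]   # the duplicate review is a cache hit
second = [chatgpt(prompt, r) for r in reviews]  # re-running the query: all cache hits
print(first)           # ['negative', 'positive', 'negative']
print(chatgpt.misses)  # 2
```

One caveat: caching assumes the function returns the same answer for the same inputs. LLM responses can vary between calls, so a cache effectively pins the first response it sees for each input.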

We would appreciate feedback on:

  1. How can we better use Postgres to accelerate such AI queries? Is PL/Python relevant for this use case?

  2. What are some prior “external” query optimization systems tailored for Postgres that we can learn from?

Thanks for your time,

