Category: AI

  • Improving Procedural AI Reliability

    One of the biggest problems I encountered while building my AI-powered meal planning and grocery shopping assistant was getting it to reliably follow all the procedures.

    I had broken all the skills down into bite-size pieces and laid out the structure in a sensible way, but the agent would often improvise. When I pressed the agent on why it wasn’t following the skills, I would usually hear something like:

    “The skills are clearly written, I just didn’t follow them well. I skimmed them, and then improvised. I should have done x, y, z. Should I redo it?”

    The solution wasn’t to write clearer skills. It was to codify correctness so it was enforceable.

    Backend Gates FTW

    The failures were happening because there was no way for me to force the agent to follow the skills or redirect the agent if it made a misstep during the procedure. Backend gates help address this by doing things like:

    • Only give the agent the initial step. It can’t complete its full task without a backend request for the next step.
    • Don’t allow writing to the database without an access token.
    • Disallow writes that don’t have required data.

    In combination, these form powerful guardrails to keep your agent on track.
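
    To make the three gates concrete, here is a minimal sketch of how they could fit together. It is not my actual backend: the StepGate class, the step names, and the required fields are all hypothetical, and a plain Python list stands in for the database.

```python
import secrets

# Hypothetical procedure; step names and required fields are illustrative only.
PROCEDURE = [
    {"name": "search_store", "required": ["item_id", "results"]},
    {"name": "write_selection", "required": ["item_id", "selection"]},
]


class StepGate:
    """Hands out one step at a time and rejects writes that bypass the gate."""

    def __init__(self, procedure):
        self.procedure = procedure
        self.index = 0      # position of the step the agent may work on
        self.token = None   # access token for the current step, once issued
        self.written = []   # stand-in for the database

    def next_step(self):
        """Return the current step with a fresh token, or None when finished."""
        if self.index >= len(self.procedure):
            return None
        self.token = secrets.token_hex(16)
        step = self.procedure[self.index]
        return {"step": step["name"], "required": step["required"], "token": self.token}

    def write(self, token, payload):
        """Accept a write only with the current token and all required fields."""
        valid = (
            self.token is not None
            and isinstance(token, str)
            and secrets.compare_digest(token, self.token)
        )
        if not valid:
            return {"ok": False, "error": "Invalid access token. Request the next step first."}
        step = self.procedure[self.index]
        missing = [field for field in step["required"] if not payload.get(field)]
        if missing:
            return {"ok": False, "error": "Missing required data: " + ", ".join(missing)}
        self.written.append({"step": step["name"], **payload})
        self.index += 1
        self.token = None  # single-use: the agent must ask for the next step
        return {"ok": True}
```

    The point of the single-use token is that the agent cannot skip ahead: it only learns what the next step is, and only gets permission to write, by coming back to the backend.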

    Applying Backend Gates for Grocery Shopping

    In my case, I wanted the grocery shopping procedure to be split into concrete tasks handled by smaller single-responsibility agents that can operate in parallel:

    Item Resolution Agent

    Receives the stores to shop at and the item to search for. Then it can:

    • Use the store’s API to search for the item
    • Collect the results
    • Hand them to the parent orchestrator

    In practice, the agent would discard relevant results, fail to search more thoroughly when the first results were lackluster, or skip writing images.

    Backend Gate

    To address this, I now require it to write its results to a transient database row (another default WordPress feature that majorly helps with AI infrastructure) together with the list id, item id and shop cycle id. Now, when it writes its results, if they don’t satisfy the backend’s requirements, it receives instructions on how to improve them.
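
    As a rough illustration of that feedback loop, here is a sketch in Python rather than the WordPress code itself. The function name, the result keys (store, name, image_url), the key format, and the minimum of 3 results per store are assumptions made for the example, and a dict stands in for the transient row.

```python
# In-memory stand-in for the transient row; a real backend would persist this.
TRANSIENTS = {}

MIN_RESULTS_PER_STORE = 3  # assumed threshold for this sketch


def write_item_results(list_id, item_id, shop_cycle_id, stores, results):
    """Store the agent's search results, or return instructions for fixing them.

    `results` is a list of dicts with hypothetical keys: store, name, image_url.
    """
    instructions = []
    for store in stores:
        found = [r for r in results if r.get("store") == store]
        if len(found) < MIN_RESULTS_PER_STORE:
            instructions.append(
                f"Only {len(found)} result(s) for {store}; retry with broader search terms."
            )
        without_image = [r.get("name", "unnamed") for r in found if not r.get("image_url")]
        if without_image:
            instructions.append(
                f"Add image_url for these {store} results: {', '.join(without_image)}."
            )

    # Reject the write and tell the agent exactly what to fix.
    if instructions:
        return {"accepted": False, "instructions": instructions}

    key = f"shop_{list_id}_{item_id}_{shop_cycle_id}"
    TRANSIENTS[key] = results
    return {"accepted": True, "key": key}
```

    Because a rejected write returns specific instructions instead of a bare error, the agent gets redirected at the moment it goes off course rather than after the whole run is done.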

    Writing the Selection

    When the orchestrator wrote the final selection, it often discarded results or wrote only the final selection instead of offering alternates. It also capped alternates at 3 when there were really 10 valid ones.

    Backend Gate

    To improve results, I added these backend gates:

    • Must include an alternates array, or a flag and a reason why there are no alternates
    • Must include images, or a flag and a reason why there are no images
    • Must include at least 3 items per store, or a flag and a reason why there are fewer than 3 per store
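
    All three rules share one pattern: supply the data, or explicitly waive it with a reason. A minimal sketch of that pattern follows; the payload shape and every field name (alternates, images, no_alternates, fewer_than_min, and so on) are hypothetical, not the ones my backend uses.

```python
def require_or_waive(payload, field, waiver_flag, problems):
    """Pass if `field` is non-empty, or if `waiver_flag` is set with a reason."""
    if payload.get(field):
        return
    if payload.get(waiver_flag) and payload.get(f"{waiver_flag}_reason"):
        return
    problems.append(
        f"Provide '{field}', or set '{waiver_flag}' with '{waiver_flag}_reason'."
    )


def validate_selection(payload, min_per_store=3):
    """Return a list of problems; an empty list means the write is allowed."""
    problems = []
    require_or_waive(payload, "alternates", "no_alternates", problems)
    require_or_waive(payload, "images", "no_images", problems)

    # Count the selected item plus its alternates for each store.
    counts = {}
    items = [payload.get("selection"), *(payload.get("alternates") or [])]
    for item in items:
        if item:
            counts[item["store"]] = counts.get(item["store"], 0) + 1

    short = [s for s in payload.get("stores", []) if counts.get(s, 0) < min_per_store]
    waived = payload.get("fewer_than_min") and payload.get("fewer_than_min_reason")
    if short and not waived:
        problems.append(
            f"Fewer than {min_per_store} items for: {', '.join(short)}. "
            "Add more, or set 'fewer_than_min' with 'fewer_than_min_reason'."
        )
    return problems
```

    Requiring a reason alongside each flag matters: the agent can still opt out when there truly are no alternates, but it has to say so explicitly instead of silently dropping them.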

    Given how significant the improvements have been, I highly recommend using server-side gates to improve agents’ reliability.