Build Your First Human-in-the-Loop AI Agent with NVIDIA NIM

AI agents powered by large language models (LLMs) help organizations streamline workflows and reduce manual work. These agents use multilevel, iterative reasoning to analyze problems, devise solutions, and execute tasks with various tools. Unlike traditional chatbots, LLM-powered agents can automate complex tasks by interpreting information and acting on it. Because autonomous agents can introduce risk in some applications, human oversight remains essential.

In this post, you’ll learn how to build a human-in-the-loop AI agent using NVIDIA NIM microservices, which provide accelerated APIs optimized for AI inference. The post uses a social media use case to show how these AI agents can handle complex tasks. With NIM microservices, you can integrate advanced LLMs into your workflows with the scalability and flexibility that AI-driven tasks require. Whether you’re creating promotional content or automating complex workflows, this tutorial is designed to accelerate your processes.

To see a demo, watch How to Build a Simple AI Agent in 5 Minutes with NVIDIA NIM.

Building an AI agent for personalized social media content

One of the biggest challenges marketers face today is generating high-quality, creative promotional content across platforms. The goal is to create varied promotional messages and artwork that can be published on social media. 

Traditionally, a project leader assigns these tasks to specialists like content writers and digital artists. But what if AI agents could help make this process more efficient?

This use case involves two AI agents—the Content Creator Agent and the Digital Artist Agent. These AI agents will generate promotional content and submit it to a human decision-maker for final approval, ensuring that human control remains central to the creative process. 

Architecting the human-agent decision-making workflow

Building this human-in-the-loop system involves creating a cognitive workflow where AI agents assist in specific tasks, while humans perform the final decision-making. Figure 1 outlines the interaction between the human decision-maker and the agents.

 Diagram showing the interaction flow between a human decision maker and the AI agents.
Figure 1. Human-agent interaction conceptual architecture

The Content Creator Agent uses the Llama 3.1 405B model, accelerated by NVIDIA LLM NIM microservices. LangChain ChatNVIDIA with NIM function calling and structured output is also integrated to ensure organized, reliable results. ChatNVIDIA is an open-source Python library contributed by NVIDIA to LangChain that enables developers to easily connect with NVIDIA NIM. These combined capabilities are consolidated into LangChain Expression Language (LCEL) runnable chains, creating a robust agent workflow.

Constructing the Content Creator Agent

Begin by constructing the Content Creator Agent. This agent generates promotional messages following specific formatting guidelines, using the NVIDIA API catalog preview API endpoints. NVIDIA AI Enterprise customers can also download and run NIM endpoints locally. 

Use the Python code below to get started:

from typing import List

from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_core.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field


## 1. construct the system prompt ---------
prompt_template = """
### [INST]


You are an expert social media content creator.
Your task is to create a different promotion message with the following 
Product Description :
------
{product_desc}
------
The output promotion message MUST use the following format :
'''
Title: a powerful, short message that depicts what this product is about
Message: be creative for the promotion message, but make it short and ready for social media feeds.
Tags: the hashtags people normally use on social media
'''
Begin!
[/INST]
 """
prompt = PromptTemplate(
    # must match the {product_desc} placeholder in the template
    input_variables=['product_desc'],
    template=prompt_template,
)


## 2. provide seeded product_desc text
product_desc="Explore the latest community-built AI models with an API optimized and accelerated by NVIDIA, then deploy anywhere with NVIDIA NIM™ inference microservices."


## 3. structural output using LMFE 
class StructureOutput(BaseModel):     
    Title: str = Field(description="Title of the promotion message")
    Message : str = Field(description="The actual promotion message")
    Tags: List[str] = Field(description="Hashtags for social media, usually starts with #")


## 4. A powerful LLM 
llm_with_output_structure=ChatNVIDIA(model="meta/llama-3.1-405b-instruct").with_structured_output(StructureOutput)     


## construct the content_creator agent
content_creator = ( prompt | llm_with_output_structure )
out=content_creator.invoke({"product_desc":product_desc})

Using the Digital Artist Agent

Next, introduce the Digital Artist Agent, which turns promotional text into visuals using the SDXL-Turbo text-to-image model (stabilityai/sdxl-turbo) from the NVIDIA API catalog. The agent first rewrites the input query into an image generation prompt, then generates an image for social media promotion campaigns. The following code shows the image generation function the agent uses:

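import base64
import os

import requests
from PIL import Image

# Assumption: the API key is exported as the NVIDIA_API_KEY environment variable
nvapi_key = os.environ["NVIDIA_API_KEY"]
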
def generate_image(prompt :str) -> str :
    """
    generate image from text
    Args:
        prompt: input text
    """
    ## re-writing the input promotion title in to appropriate image_gen prompt 
    gen_prompt=llm_rewrite_to_image_prompts(prompt)
    print("start generating image with llm re-write prompt:", gen_prompt)
    invoke_url = "https://ai.api.nvidia.com/v1/genai/stabilityai/sdxl-turbo"
    
    headers = {
        "Authorization": f"Bearer {nvapi_key}",
        "Accept": "application/json",
    }
    
    payload = {
        "text_prompts": [{"text": gen_prompt}],
        "seed": 0,
        "sampler": "K_EULER_ANCESTRAL",
        "steps": 2
    }
    
    response = requests.post(invoke_url, headers=headers, json=payload)
    
    response.raise_for_status()
    response_body = response.json()
    ## decode the base64-encoded image returned by the endpoint and save it to disk
    print(response_body['artifacts'][0].keys())
    imgdata = base64.b64decode(response_body["artifacts"][0]["base64"])
    filename = 'output.jpg'
    with open(filename, 'wb') as f:
        f.write(imgdata)   
    im = Image.open(filename)  
    img_location=f"the output of the generated image will be stored in this path : {filename}"
    return img_location
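The decoding step in the function above can be tried without calling the endpoint. The following is a minimal sketch, assuming the response body has the same `artifacts[0]["base64"]` shape used above. The helper name `save_first_artifact` and the stand-in response are hypothetical; the payload here is placeholder bytes, not a real image.

```python
import base64
import tempfile
from pathlib import Path


def save_first_artifact(response_body: dict, filename: str = "output.jpg") -> Path:
    """Decode the first base64 artifact in the response and write it to disk."""
    artifacts = response_body.get("artifacts") or []
    if not artifacts:
        raise ValueError("response contains no artifacts")
    image_bytes = base64.b64decode(artifacts[0]["base64"])
    path = Path(filename)
    path.write_bytes(image_bytes)
    return path


# Stand-in response body; a real response carries encoded image data
fake_body = {"artifacts": [{"base64": base64.b64encode(b"not-a-real-image").decode()}]}

with tempfile.TemporaryDirectory() as tmp:
    saved = save_first_artifact(fake_body, str(Path(tmp) / "demo_output.bin"))
    round_trip = saved.read_bytes()

print(len(round_trip), "bytes written")
```

Raising an error on an empty `artifacts` list makes a failed generation visible immediately instead of surfacing later as a missing key.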

Use the following Python script to rewrite user input queries into image generation prompts:

from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate


def llm_rewrite_to_image_prompts(user_query):
    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                "Summarize the following user query into a very short, one-sentence theme for image generation, MUST follow this format : A iconic, futuristic image of , no text, no amputation, no face, bright, vibrant",
            ),
            ("user", "{input}"),
        ]
    )
    model = ChatNVIDIA(model="mistralai/mixtral-8x7b-instruct-v0.1")
    # prompt -> model -> plain string
    chain = prompt | model | StrOutputParser()
    out = chain.invoke({"input": user_query})
    return out

Next, bind the image generation into the selected LLM and wrap it in LCEL to create the Digital Artist Agent:

## bind image generation as a tool to the llama-3.1-405b LLM
llm = ChatNVIDIA(model="meta/llama-3.1-405b-instruct")
llm_with_img_gen_tool = llm.bind_tools([generate_image], tool_choice="generate_image")
## use LCEL to construct the Digital Artist Agent
## output_to_invoke_tools must be defined beforehand: a callable that takes
## the model reply and runs the tool call it contains
digital_artist = (
    llm_with_img_gen_tool
    | output_to_invoke_tools
)
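The chain above pipes the model reply into `output_to_invoke_tools`, which is not defined in this post. The following is one possible sketch of such a helper, not the original implementation. It assumes the model reply exposes a `tool_calls` list of dictionaries with `name` and `args` keys, as LangChain chat messages do. `FakeMessage`, `make_tool_dispatcher`, and `fake_generate_image` are hypothetical stand-ins so the example runs without an API key.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakeMessage:
    """Stand-in for a chat model reply; mirrors the `tool_calls` list of dicts."""
    tool_calls: list = field(default_factory=list)


def make_tool_dispatcher(tools: dict[str, Callable[..., str]]) -> Callable[[object], str]:
    """Build a function that runs the first tool call found on a model reply."""
    def output_to_invoke_tools(message) -> str:
        tool_calls = getattr(message, "tool_calls", None) or []
        if not tool_calls:
            return "no tool call found in the model output"
        call = tool_calls[0]
        tool = tools.get(call.get("name"))
        if tool is None:
            return f"unknown tool requested: {call.get('name')}"
        # Pass the model-supplied arguments through to the tool
        return tool(**call.get("args", {}))
    return output_to_invoke_tools


def fake_generate_image(prompt: str) -> str:
    """Stub that stands in for the real image generation call."""
    return f"image generated for: {prompt}"


output_to_invoke_tools = make_tool_dispatcher({"generate_image": fake_generate_image})
reply = FakeMessage(tool_calls=[{"name": "generate_image", "args": {"prompt": "a robot"}, "id": "call_1"}])
print(output_to_invoke_tools(reply))
```

In the real workflow, the dictionary would map `"generate_image"` to the actual `generate_image` function. Returning a message string when no tool call is present keeps the output type consistent for the code that prints the agent response.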

Integrating human-in-the-loop with the role of the decision-maker

To maintain human oversight, the agents will share their outputs for final approval. A human decision-maker will review both the text generated by the Content Creator Agent and the artwork produced by the Digital Artist Agent.

This interaction allows for multiple iterations, ensuring that both the promotional messages and images are polished and ready for deployment.  

The agentic logic places humans at the center as decision-makers, assigning the appropriate agents for each task. LangGraph is used to orchestrate the agentic cognitive architecture.

This involves a function that asks for human input:

# HumanInputRun wraps a Python function that collects input from a person
from langchain_community.tools import HumanInputRun


def get_human_input() -> str:
    """ Put human as decision maker, human will decide which agent is best for the task"""
    print("You have been given 2 agents. Please select exactly _ONE_ agent to help you with the task, enter 'y' to confirm your choice.")
    print("""Available agents are : \n
            1 ContentCreator  \n
            2 DigitalArtist \n          
            Enter 1 or 2""")
    tool = None  # no agent chosen yet
    while True:
        try:
            line = input().strip()
        except EOFError:
            break
        if line == '1':
            tool = "ContentCreator"
        elif line == '2':
            tool = "DigitalArtist"
        elif line == "y" and tool is not None:
            print(f"tool selected : {tool} ")
            break
        else:
            # ignore anything else, including 'y' before an agent is chosen
            print("Enter 1 or 2 to pick an agent, then 'y' to confirm.")
    return tool or ""


# Wrap the function in the HumanInputRun tool so it can be invoked from a graph node
ask_human = HumanInputRun(input_func=get_human_input)
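A selection loop that reads from the keyboard is hard to test automatically. The following is a minimal sketch of a variant that takes its input source as a parameter, so canned answers can stand in for a person. The names `choose_agent`, `scripted`, and `AGENTS` are hypothetical and not part of the original workflow.

```python
from typing import Callable, Iterable

AGENTS = {"1": "ContentCreator", "2": "DigitalArtist"}


def choose_agent(read_line: Callable[[], str]) -> str:
    """Return the confirmed agent name, or '' if input ends without confirmation."""
    selected = ""
    while True:
        try:
            line = read_line().strip()
        except (EOFError, StopIteration):
            return ""
        if line in AGENTS:
            selected = AGENTS[line]  # a later choice replaces an earlier one
        elif line == "y" and selected:
            return selected


def scripted(lines: Iterable[str]) -> Callable[[], str]:
    """Turn a list of canned answers into a read_line callable for tests."""
    iterator = iter(lines)
    return lambda: next(iterator)


print(choose_agent(scripted(["3", "2", "y"])))
```

For interactive use, the built-in `input` can be passed directly, as in `choose_agent(input)`.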

Next, create two additional Python functions to serve as graph nodes, which LangGraph uses to represent steps or actions within a workflow. These nodes enable the agent to execute specific tasks sequentially or in parallel, creating a flexible and structured process: 

from langgraph.graph import END, StateGraph
from colorama import Fore
# Define the node functions
def human_assign_to_agent(state):
    # ensure using original prompt 
    inputs = state["input"]
    input_to_agent = state["input_to_agent"]
    concatenate_str = Fore.BLUE+inputs+ ' : '+Fore.CYAN+input_to_agent + Fore.RESET
    print(concatenate_str)
    print("---"*10)  
    agent_choice=ask_human.invoke(concatenate_str)
    print(Fore.CYAN+ "chosen_agent : " + agent_choice + Fore.RESET)
    return {"agent_choice": agent_choice }


def agent_execute_task(state):    
    inputs= state["input"]
    input_to_agent = state["input_to_agent"]
    print(Fore.CYAN+input_to_agent + Fore.RESET)
    # the chosen agent executes the task
    chosen_agent = state['agent_choice']
    if chosen_agent=='ContentCreator':
        structured_respond=content_creator.invoke({"product_desc":input_to_agent})
        # one line each for the title, the message, and the space-separated hashtags
        respond='\n'.join([structured_respond.Title,structured_respond.Message,' '.join(structured_respond.Tags)])
    elif chosen_agent=="DigitalArtist":
        respond=digital_artist.invoke(input_to_agent)
    else:
        respond="please reselect the agent, there are only 2 agents available: 1.ContentCreator or 2.DigitalArtist"

    print(Fore.CYAN+ "agent_output: \n" + respond + Fore.RESET)
    return {"agent_use_tool_respond": respond}

Finally, bring everything together by connecting the nodes and edges to form the human-in-the-loop multi-agent workflow. Once the graph is compiled, you’re ready to proceed:

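from typing import TypedDict

from langgraph.graph import END, StateGraph


# Shared state passed between nodes; the keys match those read and
# returned by human_assign_to_agent and agent_execute_task
class State(TypedDict):
    input: str
    input_to_agent: str
    agent_choice: str
    agent_use_tool_respond: str


# Define a new graph
workflow = StateGraph(State)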


# Define the two nodes 
workflow.add_node("start", human_assign_to_agent)
workflow.add_node("end", agent_execute_task)


# This means that this node is the first one called
workflow.set_entry_point("start")
workflow.add_edge("start", "end")
workflow.add_edge("end", END)


# Finally, we compile it!
# This compiles it into a LangChain Runnable,
# meaning you can use it as you would any other runnable
app = workflow.compile()
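Each node in the graph receives the current state and returns a dictionary holding only the keys it wants to update. The following plain-Python sketch illustrates that data flow for a two-node linear graph. It is an illustration, not LangGraph's implementation. `DemoState`, `run_linear`, `pick_agent`, and `run_agent` are hypothetical names, and the human choice is hard-coded.

```python
from typing import Callable, TypedDict


class DemoState(TypedDict, total=False):
    input: str
    input_to_agent: str
    agent_choice: str
    agent_use_tool_respond: str


def run_linear(nodes: list[Callable[[DemoState], dict]], state: DemoState) -> DemoState:
    """Call each node in order and merge its returned keys into the state."""
    current = dict(state)
    for node in nodes:
        current.update(node(current))
    return current


def pick_agent(state: DemoState) -> dict:
    # Stand-in for the node that asks the human to choose an agent
    return {"agent_choice": "ContentCreator"}


def run_agent(state: DemoState) -> dict:
    # Stand-in for the node that runs the chosen agent
    return {"agent_use_tool_respond": f"{state['agent_choice']} handled: {state['input_to_agent']}"}


final = run_linear([pick_agent, run_agent], {"input": "write a post", "input_to_agent": "NIM"})
print(final)
```

The second node can read `agent_choice` only because the first node's update was merged into the state before it ran, which is why the edge order in the graph matters.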

Launching the human-agent workflow

Now, launch the app. It prompts you to assign one of the available agents for the given task.

A prompt for writing the promotional text 

First, query the Content Creator Agent to write promotion text, including a title, message, and social media hashtags (Figure 2). Repeat this until satisfied with the output.

Flow diagram illustrating human assigning Content Creator Agent to create promotion text.
Figure 2. A human queries the Content Creator Agent to generate social media promotion text

The following Python code sample launches the workflow with a query and a product description:

my_query="create a good promotional message for social promotion events using the following inputs"
product_desc="NVIDIA NIM microservices power GenAI workflow"
respond=app.invoke({"input":my_query, "input_to_agent":product_desc})

The human enters 1 to select the Content Creator Agent for the task. The agent executes and returns the agent_output, as shown in Figure 3.

Screenshot of a sample response from Content Creator Agent.
Figure 3. Sample output from invoking Content Creator Agent pipeline
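Repeating the query until the output is satisfactory can be expressed as a small loop. The following is a minimal sketch under stated assumptions: `iterate_until_approved` is a hypothetical helper, `generate` stands in for a call such as invoking the workflow, and `approve` stands in for the human reviewer. A round limit is added so the loop always ends.

```python
from typing import Callable


def iterate_until_approved(
    generate: Callable[[], str],
    approve: Callable[[str], bool],
    max_rounds: int = 5,
) -> tuple[str, int]:
    """Regenerate until the reviewer approves or max_rounds is reached."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    draft = ""
    for round_number in range(1, max_rounds + 1):
        draft = generate()
        if approve(draft):
            return draft, round_number
    # No draft was approved; hand back the last one with the round count
    return draft, max_rounds


# Stand-ins: numbered drafts, and a reviewer that accepts the third one
drafts = (f"draft {n}" for n in range(1, 10))
result, rounds = iterate_until_approved(lambda: next(drafts), lambda text: text.endswith("3"))
print(result, rounds)
```

In an interactive session, `approve` could simply ask the reviewer for a yes or no answer, keeping the final decision with the human.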

A prompt for creating illustrations

Once satisfied with the results, move on to query the Digital Artist Agent to create artwork for social media promotion (Figure 4).

Flow chart showing human assigning the Digital Artist Agent to create artwork.
Figure 4. Human assigns Digital Artist Agent to generate social media artwork

The following Python code sample uses the title generated by the Content Creator Agent as input for the image prompt:

## take the title (the first line) from the Content Creator Agent output
prompt_for_image=respond['agent_use_tool_respond'].split('\n')[0].split(':')[-1].strip()
## the human decision-maker gives an instruction to the agent workflow app
input_query="generate an image for me from the below promotion message"
respond2=app.invoke({"input":input_query, "input_to_agent":prompt_for_image})

The generated image is saved as output.jpg (Figure 5).

Screenshot of a sample response from Digital Artist Agent.
Figure 5. Sample output from invoking Digital Artist Agent pipeline
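Taking the text after the last colon on the first line works for simple titles, but it drops part of a title that itself contains a colon. The following is a minimal sketch of a more tolerant extraction. The helper name `extract_title` and the sample text are hypothetical.

```python
def extract_title(agent_output: str) -> str:
    """Return the title from the first line, with or without a 'Title:' label."""
    first_line = agent_output.strip().split("\n")[0]
    label, sep, rest = first_line.partition(":")
    if sep and label.strip().lower() == "title":
        return rest.strip()
    # No label present: treat the whole first line as the title
    return first_line.strip()


sample = "Title: NVIDIA NIM: Faster GenAI\nMessage: Deploy anywhere.\n#AI #NIM"
print(extract_title(sample))
```

Splitting only on the first colon keeps the full title intact, and the fallback handles output whose first line has no label at all.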

Iterating for high-quality results

You can iterate on the generated images to obtain different variations of the artwork until you get the results you’re looking for (Figure 6). Slightly adjusting the input prompt taken from the Content Creator Agent can yield diverse images from the Digital Artist Agent.

Three sample images generated by the Digital Artist Agent.
Figure 6. Sample images generated by the Digital Artist Agent

Refining the final product

Finally, perform post-processing and refine the combined outputs from both agents, formatting them in markdown for final visual review (Figure 7).

An image of a robot head and shoulders with a social media post entitled ‘Unlock Next-Gen AI Innovation.'
Figure 7. Post-processed output of a social media post and accompanying image produced by AI agents and approved by a human
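The post-processing step can be as simple as combining the approved text and the image path into a markdown snippet. The following is a minimal sketch. The helper name `to_markdown_post`, the layout, and the placeholder message are assumptions for illustration, not the formatting used to produce Figure 7.

```python
def to_markdown_post(title: str, message: str, tags: list[str], image_path: str) -> str:
    """Combine the approved text and image path into a markdown snippet."""
    # Make sure every tag carries a leading '#'
    hashtags = " ".join(tag if tag.startswith("#") else f"#{tag}" for tag in tags)
    return "\n\n".join([
        f"## {title}",
        f"![{title}]({image_path})",
        message,
        hashtags,
    ])


post = to_markdown_post(
    "Unlock Next-Gen AI Innovation",
    "Placeholder message text for illustration.",
    ["#AI", "NIM"],
    "output.jpg",
)
print(post)
```

Rendering the result in a notebook or markdown viewer gives the human reviewer the text and image side by side for the final check.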

In this blog post, you’ve learned how to build a human-in-the-loop AI agent using NVIDIA NIM microservices and LangGraph by LangChain to streamline content creation workflows. By incorporating AI agents into your workflow, you accelerate content production, reduce manual effort, and retain full control over the creative process.

NVIDIA NIM microservices enable you to scale your AI-driven tasks with efficiency and flexibility. Whether you’re crafting promotional messages or designing visuals, human-in-the-loop AI agents provide a powerful solution for optimizing workflows and boosting productivity.

