AI agents powered by large language models (LLMs) help organizations streamline and reduce manual workloads. These agents use multilevel, iterative reasoning to analyze problems, devise solutions, and execute tasks with various tools. Unlike traditional chatbots, LLM-powered agents automate complex tasks by effectively understanding and processing information. To avoid potential risks in specific applications, maintaining human oversight remains essential when working with autonomous AI agents.
In this post, you’ll learn how to build a human-in-the-loop AI agent using NVIDIA NIM microservices, an accelerated API optimized for AI inference. The post features a social media use case to showcase how these versatile AI agents can handle complex tasks with ease. With NIM microservices, you can seamlessly integrate advanced LLMs into your workflows, providing the scalability and flexibility required for AI-driven tasks. Whether you’re creating promotional content or automating complex workflows, this tutorial is designed to accelerate your processes.
To see a demo, watch How to Build a Simple AI Agent in 5 Minutes with NVIDIA NIM.
Building an AI agent for personalized social media content
One of the biggest challenges marketers face today is generating high-quality, creative promotional content across platforms. The goal is to create varied promotional messages and artwork that can be published on social media.
Traditionally, a project leader assigns these tasks to specialists like content writers and digital artists. But what if AI agents could help make this process more efficient?
This use case involves two AI agents: the Content Creator Agent and the Digital Artist Agent. These AI agents will generate promotional content and submit it to a human decision-maker for final approval, ensuring that human control remains central to the creative process.
Architecting the human-agent decision-making workflow
Building this human-in-the-loop system involves creating a cognitive workflow where AI agents assist in specific tasks, while humans perform the final decision-making. Figure 1 outlines the interaction between the human decision-maker and the agents.
The Content Creator Agent uses the Llama 3.1 405B model, accelerated by NVIDIA LLM NIM microservices. LangChain ChatNVIDIA with NIM function calling and structured output is also integrated to ensure organized, reliable results. ChatNVIDIA is an open-source Python library contributed by NVIDIA to LangChain that enables developers to easily connect with NVIDIA NIM. These combined capabilities are consolidated into LangChain Expression Language (LCEL) expressions, creating a robust agent workflow.
Constructing the Content Creator Agent
Begin by constructing the Content Creator Agent. This agent generates promotional messages following specific formatting guidelines, using the NVIDIA API catalog preview API endpoints. NVIDIA AI Enterprise customers can also download and run NIM endpoints locally.
Use the Python code below to get started:
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain import prompts, chat_models, hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field, validator
from typing import Optional, List
## 1. construct the system prompt ---------
prompt_template = """
### [INST]
You are an expert social media content creator.
Your task is to create a different promotion message with the following
Product Description :
------
{product_desc}
------
The output promotion message MUST use the following format :
'''
Title: a powerful, short message that depicts what this product is about
Message: be creative for the promotion message, but make it short and ready for social media feeds.
Tags: the hashtags a human will normally use on social media
'''
Begin!
[/INST]
"""
prompt = PromptTemplate(
    input_variables=['product_desc'],
    template=prompt_template,
)
## 2. provide seeded product_desc text
product_desc="Explore the latest community-built AI models with an API optimized and accelerated by NVIDIA, then deploy anywhere with NVIDIA NIM™ inference microservices."
## 3. structural output using LMFE
class StructureOutput(BaseModel):
    Title: str = Field(description="Title of the promotion message")
    Message: str = Field(description="The actual promotion message")
    Tags: List[str] = Field(description="Hashtags for social media, usually starts with #")
## 4. A powerful LLM
llm_with_output_structure=ChatNVIDIA(model="meta/llama-3.1-405b-instruct").with_structured_output(StructureOutput)
## construct the content_creator agent
content_creator = ( prompt | llm_with_output_structure )
out=content_creator.invoke({"product_desc":product_desc})
Using the Digital Artist Agent
Next, we introduce the Digital Artist Agent, which transforms promotional text into creative visuals using the Stability AI SDXL-Turbo text-to-image model, served through the NVIDIA API catalog. This agent rewrites input queries and generates high-quality images designed for social media promotion campaigns. The following code provides an example of how the agent integrates:
import base64
import os
import requests
from PIL import Image

# Assumption: the API key is exported as the NVIDIA_API_KEY environment variable
nvapi_key = os.environ["NVIDIA_API_KEY"]

def generate_image(prompt: str) -> str:
    """
    generate image from text
    Args:
        prompt: input text
    """
    ## rewrite the input promotion title into an appropriate image generation prompt
    gen_prompt = llm_rewrite_to_image_prompts(prompt)
    print("start generating image with llm re-write prompt:", gen_prompt)
    invoke_url = "https://ai.api.nvidia.com/v1/genai/stabilityai/sdxl-turbo"
    headers = {
        "Authorization": f"Bearer {nvapi_key}",
        "Accept": "application/json",
    }
    payload = {
        "text_prompts": [{"text": gen_prompt}],
        "seed": 0,
        "sampler": "K_EULER_ANCESTRAL",
        "steps": 2
    }
    response = requests.post(invoke_url, headers=headers, json=payload)
    response.raise_for_status()
    response_body = response.json()
    ## decode the base64-encoded image returned by the endpoint
    print(response_body['artifacts'][0].keys())
    imgdata = base64.b64decode(response_body["artifacts"][0]["base64"])
    filename = 'output.jpg'
    with open(filename, 'wb') as f:
        f.write(imgdata)
    ## open the saved file to confirm it is a readable image
    im = Image.open(filename)
    img_location = f"the output of the generated image will be stored in this path : {filename}"
    return img_location
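The decoding step in the function above can be exercised without calling the endpoint. The following is a minimal sketch, assuming the response body has the `artifacts[0]["base64"]` shape used above; the helper name `save_first_artifact` and the fabricated payload are hypothetical and exist only for illustration:

```python
import base64
import os
import tempfile
from typing import Any, Dict


def save_first_artifact(response_body: Dict[str, Any], filename: str) -> str:
    """Decode the first base64 artifact in the response and write it to disk."""
    artifacts = response_body.get("artifacts") or []
    if not artifacts:
        raise ValueError("response contains no artifacts")
    image_bytes = base64.b64decode(artifacts[0]["base64"])
    with open(filename, "wb") as f:
        f.write(image_bytes)
    return filename


# Fabricated payload for illustration only; a real response carries image bytes.
fake_body = {
    "artifacts": [{"base64": base64.b64encode(b"not-a-real-image").decode("ascii")}]
}
out_path = os.path.join(tempfile.mkdtemp(), "output.jpg")
saved = save_first_artifact(fake_body, out_path)

# Read the file back to confirm the bytes survived the round trip.
with open(saved, "rb") as f:
    round_trip = f.read()
```

Separating the decode-and-save step from the network call makes it possible to check the file handling with a canned response before spending API credits.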
Use the following Python script to rewrite user input queries into image generation prompts:
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain import prompts, chat_models, hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
def llm_rewrite_to_image_prompts(user_query):
    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                "Summarize the following user query into a very short, one-sentence theme for image generation, MUST follow this format : A iconic, futuristic image of , no text, no amputation, no face, bright, vibrant",
            ),
            ("user", "{input}"),
        ]
    )
    model = ChatNVIDIA(model="mistralai/mixtral-8x7b-instruct-v0.1")
    chain = ( prompt | model | StrOutputParser() )
    out = chain.invoke({"input": user_query})
    return out
Next, bind the image generation into the selected LLM and wrap it in LCEL to create the Digital Artist Agent:
## bind image generation as tool into llama3.1-405b llm
llm=ChatNVIDIA(model="meta/llama-3.1-405b-instruct")
llm_with_img_gen_tool=llm.bind_tools([generate_image],tool_choice="generate_image")
## use LCEL to construct Digital Artist Agent
digital_artist = (
llm_with_img_gen_tool
| output_to_invoke_tools
)
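The chain above pipes the model output into `output_to_invoke_tools`, which is not defined in this post. The following is a minimal sketch of what such a helper might look like, assuming the model response exposes a `tool_calls` list of dictionaries with `name` and `args` keys (as LangChain AI messages do) and that only the first tool call should be run. The factory `make_tool_invoker`, the stand-in `fake_generate_image`, and the fake message are hypothetical and let the sketch run without network access:

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict


def make_tool_invoker(tools: Dict[str, Callable[..., Any]]) -> Callable[[Any], Any]:
    """Build a function that runs the first tool call found on a model response."""

    def output_to_invoke_tools(message: Any) -> Any:
        # Tool calls are expected as a list of dicts with "name" and "args" keys.
        tool_calls = getattr(message, "tool_calls", None) or []
        if not tool_calls:
            raise ValueError("the model response contained no tool call")
        call = tool_calls[0]  # assumption: one tool call per response
        name = call["name"]
        if name not in tools:
            raise KeyError(f"unknown tool requested: {name}")
        return tools[name](**call["args"])

    return output_to_invoke_tools


def fake_generate_image(prompt: str) -> str:
    """Stand-in for generate_image so the sketch runs offline."""
    return f"image generated for: {prompt}"


output_to_invoke_tools = make_tool_invoker({"generate_image": fake_generate_image})

# Hypothetical model response carrying a single tool call.
fake_message = SimpleNamespace(
    tool_calls=[{"name": "generate_image", "args": {"prompt": "a bright GPU"}, "id": "call_1"}]
)
result = output_to_invoke_tools(fake_message)
```

In the real workflow, the dictionary would map `"generate_image"` to the `generate_image` function defined earlier. LCEL coerces a plain function on the right-hand side of `|` into a runnable, so the helper can be placed in the chain directly.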
Integrating human-in-the-loop with the role of the decision-maker
To maintain human oversight, the agents will share their outputs for final approval. A human decision-maker will review both the text generated by the Content Creator Agent and the artwork produced by the Digital Artist Agent.
This interaction allows for multiple iterations, ensuring that both the promotional messages and images are polished and ready for deployment.
The agentic logic places humans at the center as decision-makers, assigning the appropriate agents for each task. LangGraph is used to orchestrate the agentic cognitive architecture.
This involves a function that asks for human input:
# Or you can directly instantiate the tool
from langchain_community.tools import HumanInputRun
def get_human_input() -> str:
    """ Put human as decision maker, human will decide which agent is best for the task"""
    print("You have been given 2 agents. Please select exactly _ONE_ agent to help you with the task, enter 'y' to confirm your choice.")
    print("""Available agents are : \n
1 ContentCreator \n
2 DigitalArtist \n
Enter 1 or 2""")
    contents = []
    tool = None  # stays None until the human enters 1 or 2
    while True:
        try:
            line = input()
            if line == '1':
                tool = "ContentCreator"
                line = tool
            elif line == '2':
                tool = "DigitalArtist"
                line = tool
            else:
                pass
        except EOFError:
            break
        if line == "y":
            print(f"tool selected : {tool} ")
            break
        contents.append(line)
    return "\n".join(contents)
# You can modify the tool when loading
ask_human = HumanInputRun(input_func=get_human_input)
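The selection logic above reads from standard input, which makes it awkward to test. The following is a minimal sketch of a variant that accepts the reader as a parameter and keeps only the most recent choice; the names `choose_agent`, `AGENTS`, and `scripted` are hypothetical and not part of the original workflow:

```python
from typing import Callable, Dict, Iterable, Optional

# Menu numbers mapped to the agent names used by the workflow.
AGENTS: Dict[str, str] = {"1": "ContentCreator", "2": "DigitalArtist"}


def choose_agent(read_line: Callable[[], str] = input) -> Optional[str]:
    """Return the confirmed agent name, or None if input ends before confirmation."""
    selected: Optional[str] = None
    while True:
        try:
            line = read_line().strip()
        except EOFError:
            return None
        if line in AGENTS:
            selected = AGENTS[line]  # a later choice replaces an earlier one
        elif line == "y" and selected is not None:
            return selected
        # Any other input is ignored and the loop keeps waiting.


def scripted(lines: Iterable[str]) -> Callable[[], str]:
    """Build a fake reader that replays canned keystrokes, then signals end of input."""
    iterator = iter(lines)

    def read() -> str:
        try:
            return next(iterator)
        except StopIteration:
            raise EOFError from None

    return read


# A stray "y" is ignored, then "2" overrides "1" before the final confirmation.
choice = choose_agent(scripted(["y", "1", "2", "y"]))
```

Because the function returns a single agent name, the result can be compared directly against `'ContentCreator'` or `'DigitalArtist'` even if the human changes their mind before confirming.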
Next, create two additional Python functions to serve as graph nodes, which LangGraph uses to represent steps or actions within a workflow. These nodes enable the agent to execute specific tasks sequentially or in parallel, creating a flexible and structured process:
from langgraph.graph import END, StateGraph
from colorama import Fore, Style

# Define the functions needed
def human_assign_to_agent(state):
    # ensure using original prompt
    inputs = state["input"]
    input_to_agent = state["input_to_agent"]
    concatenate_str = Fore.BLUE + inputs + ' : ' + Fore.CYAN + input_to_agent + Fore.RESET
    print(concatenate_str)
    print("---"*10)
    agent_choice = ask_human.invoke(concatenate_str)
    print(Fore.CYAN + "chosen_agent : " + agent_choice + Fore.RESET)
    return {"agent_choice": agent_choice}

def agent_execute_task(state):
    inputs = state["input"]
    input_to_agent = state["input_to_agent"]
    print(Fore.CYAN + input_to_agent + Fore.RESET)
    # the chosen agent will execute the task
    chosen_agent = state['agent_choice']
    if chosen_agent == 'ContentCreator':
        structured_respond = content_creator.invoke({"product_desc": input_to_agent})
        # one line each for the title, the message, and the space-separated hashtags
        respond = '\n'.join([structured_respond.Title, structured_respond.Message, ' '.join(structured_respond.Tags)])
    elif chosen_agent == "DigitalArtist":
        respond = digital_artist.invoke(input_to_agent)
    else:
        respond = "please reselect the agent, there are only 2 agents available: 1.ContentCreator or 2.DigitalArtist"
    print(Fore.CYAN + "agent_output: \n" + respond + Fore.RESET)
    return {"agent_use_tool_respond": respond}
Finally, bring everything together by connecting the nodes and edges to form the human-in-the-loop multi-agent workflow. Once the graph is compiled, you’re ready to proceed:
from typing import TypedDict
from langgraph.graph import END, StateGraph

# State shared between the nodes; the keys match those read and returned by the node functions
class State(TypedDict):
    input: str
    input_to_agent: str
    agent_choice: str
    agent_use_tool_respond: str

# Define a new graph
workflow = StateGraph(State)
# Define the two nodes
workflow.add_node("start", human_assign_to_agent)
workflow.add_node("end", agent_execute_task)
# This means that this node is the first one called
workflow.set_entry_point("start")
workflow.add_edge("start", "end")
workflow.add_edge("end", END)
# Finally, we compile it!
# This compiles it into a LangChain Runnable,
# meaning you can use it as you would any other runnable
app = workflow.compile()
Launching the human-agent workflow
Now, launch the app. It prompts you to assign one of the available agents for the given task.
A prompt for writing the promotional text
First, query the Content Creator Agent to write promotion text, including a title, message, and social media hashtags (Figure 2). Repeat this until satisfied with the output.
A Python code sample:
my_query="create a good promotional message for social promotion events using the following inputs"
product_desc="NVIDIA NIM microservices power GenAI workflow"
respond=app.invoke({"input":my_query, "input_to_agent":product_desc})
The human selects 1 = Content Creator Agent for the task. The agent executes and returns the agent_output, as shown in Figure 3.
A prompt for creating illustrations
Once satisfied with the results, move on to query the Digital Artist Agent to create artwork for social media promotion (Figure 4).
The following Python code sample uses the title generated by the Content Creator Agent as input for the image prompt:
## take the Title from the output of the Content Creator Agent
prompt_for_image=respond['agent_use_tool_respond'].split('\n')[0].split(':')[-1].strip()
## the human decision-maker gives instructions to the agent workflow app
input_query="generate an image for me from the below promotion message"
respond2=app.invoke({"input":input_query, "input_to_agent":prompt_for_image})
The generated image is saved as output.jpg.
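The title extraction above keeps only the text after the last colon on the first line, so a title that itself contains a colon would be truncated. The following is a minimal sketch of a more tolerant alternative, assuming the agent output puts the title on its first line with an optional `Title:` label; the helper name `extract_title` is hypothetical:

```python
def extract_title(agent_output: str) -> str:
    """Return the first line of the agent output without a leading 'Title:' label."""
    first_line = agent_output.strip().split("\n", 1)[0].strip()
    label = "title:"
    # Strip the label only when it leads the line, so colons inside the title survive.
    if first_line.lower().startswith(label):
        first_line = first_line[len(label):].strip()
    return first_line


labelled = extract_title("Title: NIM: Fast AI\nDeploy anywhere.\n#AI #NIM")
unlabelled = extract_title("Unlock GenAI\nDeploy anywhere.\n#AI #NIM")
```

The result can be passed as `input_to_agent` in the same way as `prompt_for_image`.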
Iterating for high-quality results
You can iterate on the generated images to obtain different artwork variations until you get the results you’re looking for (Figure 6). Adjusting the input prompt slightly from the Content Creator Agent can yield diverse images from the Digital Artist Agent.
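The iteration described here can be expressed as a small review loop. The following is a minimal sketch, assuming a generator callable that produces one draft per round and an approval callable that stands in for the human decision-maker; `iterate_until_approved` and both callables are hypothetical, and in the real workflow each round would invoke `app` and the approval would come from a person:

```python
from typing import Callable, List, Optional, Tuple


def iterate_until_approved(
    generate: Callable[[int], str],
    approve: Callable[[str], bool],
    max_rounds: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Generate drafts until one is approved or the round budget is spent."""
    drafts: List[str] = []
    for round_number in range(1, max_rounds + 1):
        draft = generate(round_number)
        drafts.append(draft)
        if approve(draft):
            return draft, drafts
    # No draft was approved within the budget.
    return None, drafts


# Canned stand-ins: the "human" accepts the second draft.
approved, history = iterate_until_approved(
    generate=lambda n: f"draft {n}",
    approve=lambda draft: draft.endswith("2"),
)
```

Capping the number of rounds keeps the loop from running indefinitely, and returning the full history lets the decision-maker go back to an earlier variation.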
Refining the final product
Finally, perform post-processing and refine the combined outputs from both agents, formatting them in markdown for final visual review (Figure 7).
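One way to combine the two outputs for review is a small formatting helper. The following is a minimal sketch, assuming the title, message, and tags come from the Content Creator Agent and the image path from the Digital Artist Agent; the function name `render_post_markdown`, the layout, and the sample values are hypothetical:

```python
from typing import List


def render_post_markdown(title: str, message: str, tags: List[str], image_path: str) -> str:
    """Combine the text and image outputs into one markdown snippet for review."""
    # Add a leading '#' to any tag that lacks one.
    hashtags = " ".join(tag if tag.startswith("#") else f"#{tag}" for tag in tags)
    sections = [
        f"## {title}",
        message,
        hashtags,
        f"![{title}]({image_path})",
    ]
    return "\n\n".join(sections)


# Sample values for illustration only.
markdown = render_post_markdown(
    title="Unlock GenAI",
    message="Deploy anywhere with NIM.",
    tags=["#AI", "NIM"],
    image_path="output.jpg",
)
```

The resulting string can be rendered in a notebook or pasted into any markdown viewer for the final visual check.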
In this blog post, you’ve learned how to build a human-in-the-loop AI agent using NVIDIA NIM microservices and LangGraph by LangChain to streamline content creation workflows. By incorporating AI agents into your workflow, you accelerate content production, reduce manual effort, and retain full control over the creative process.
NVIDIA NIM microservices enable you to scale your AI-driven tasks with efficiency and flexibility. Whether you’re crafting promotional messages or designing visuals, human-in-the-loop AI agents provide a powerful solution for optimizing workflows and boosting productivity.
Learn more with these additional resources: