Agentic Economy Glossary

Key terms and definitions for professionals building in the age of autonomous AI — from foundational concepts to emerging roles.

Agent Orchestration

Agent orchestration is the coordination of multiple AI agents, tools, and sub-tasks within a larger workflow — managing how agents communicate, share context, and hand off work to one another.

Orchestration systems determine execution order (sequential, parallel, or conditional), route outputs between agents, and handle failures or retries. Common orchestration patterns include supervisor-worker hierarchies, pipeline chains, and event-driven architectures.
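The pipeline-chain pattern with retries can be sketched in a few lines. The "agents" below are plain functions standing in for LLM-backed agents; the structure, not the agent logic, is the point.

```python
# Minimal sketch of a sequential pipeline orchestrator with retries.
# Each agent's output becomes the next agent's input.

def run_pipeline(task, agents, max_retries=2):
    """Run each agent in order, passing outputs forward; retry on failure."""
    result = task
    for agent in agents:
        for attempt in range(max_retries + 1):
            try:
                result = agent(result)
                break  # success: hand off to the next agent
            except Exception:
                if attempt == max_retries:
                    raise  # retries exhausted: surface the failure
    return result

# Hypothetical worker agents for illustration.
def research(topic):
    return f"notes on {topic}"

def draft(notes):
    return f"draft based on {notes}"

report = run_pipeline("agent orchestration", [research, draft])
```

Supervisor-worker and event-driven patterns replace the fixed `agents` list with dynamic routing, but the hand-off and retry concerns remain the same.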

Agentic Economy

The agentic economy is the emerging economic paradigm where autonomous AI agents work alongside humans to execute tasks, make decisions, and create value at scale.

In this model, AI agents are not merely tools that respond to direct commands but active participants in workflows — browsing the web, writing code, booking services, and managing processes with minimal human intervention. The agentic economy is reshaping labour markets, business operations, and the very definition of productivity.

Agentic Operator

An agentic operator is a professional who deploys, monitors, and directs AI agent systems on behalf of a business — acting as the human-in-the-loop supervisor responsible for agent performance and outcomes.

Agentic operators occupy an emerging occupational category between traditional operations roles and AI engineering. They configure agent tasks, review outputs, escalate edge cases, and continuously refine agent instructions to improve reliability. This role is rapidly growing as enterprises adopt autonomous AI workflows.

Agentic Workflow

An agentic workflow is a multi-step automated process in which one or more AI agents execute tasks, evaluate intermediate results, and adapt their actions dynamically until a goal is achieved.

Unlike a rigid scripted automation, an agentic workflow leverages the reasoning capabilities of LLMs to handle variability and ambiguity. Typical examples include research pipelines, code generation-and-test loops, customer support escalation flows, and document processing systems.
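The generate-evaluate-adapt loop at the heart of an agentic workflow can be sketched as follows; `generate` and `evaluate` are hypothetical stand-ins for LLM calls.

```python
# Minimal sketch of an agentic loop: generate a candidate, evaluate it, and
# retry with feedback until the result passes or an iteration budget runs out.

def agentic_loop(goal, generate, evaluate, max_steps=5):
    feedback = None
    for _ in range(max_steps):
        candidate = generate(goal, feedback)
        ok, feedback = evaluate(candidate)
        if ok:
            return candidate  # goal achieved
    raise RuntimeError("budget exhausted without meeting the goal")

# Toy example: keep expanding a string until it is long enough.
def generate(goal, feedback):
    return goal if feedback is None else goal + " (expanded)"

def evaluate(candidate):
    if len(candidate) >= 10:
        return True, None
    return False, "too short"

result = agentic_loop("hi", generate, evaluate)
```

A code generation-and-test loop is this same structure with a compiler or test suite playing the evaluator role.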

AI Agent

An AI agent is a software system powered by a large language model that perceives its environment, reasons about goals, and takes autonomous actions — including calling external tools and APIs — to complete multi-step tasks.

Unlike a simple chatbot, an AI agent maintains memory across steps, plans sequences of actions, and can loop back to re-evaluate its approach when it encounters unexpected results. Agents are the core building block of agentic workflows and the agentic economy.

AI Agent Engineer

An AI Agent Engineer designs, builds, and maintains autonomous AI agent systems — including the orchestration logic, tool integrations, memory layers, and safety constraints that enable agents to operate reliably.

This role combines software engineering, prompt design, and systems thinking. AI Agent Engineers typically work with frameworks such as LangChain, LlamaIndex, AutoGen, or custom orchestration stacks, and they are responsible for ensuring agents behave predictably in production environments.

AI Alignment

AI alignment is the field of research and engineering focused on ensuring that AI systems pursue goals and behave in ways that are consistent with human intentions, values, and welfare.

Misaligned AI systems may achieve their specified objective while causing unintended harm — a problem known as specification gaming. Alignment research spans reward modelling, interpretability, constitutional AI, and reinforcement learning from human feedback (RLHF), and it is considered foundational to safe deployment of increasingly capable agents.

AI Infrastructure

AI infrastructure refers to the hardware, software, and services that underpin the training, deployment, and operation of AI models — including GPUs, model-serving platforms, vector databases, orchestration frameworks, and observability tooling.

As AI moves from research to production, AI infrastructure has become a distinct engineering discipline. Key concerns include latency, throughput, cost-per-token, fault tolerance, and the ability to swap or version models without disrupting downstream applications.

AI Safety

AI safety is the interdisciplinary effort to ensure that advanced AI systems remain beneficial, controllable, and free from catastrophic risks as their capabilities increase.

AI safety encompasses both near-term concerns — such as preventing harmful outputs, jailbreaks, and misuse — and long-term concerns about highly capable or general-purpose AI systems acting in ways that are difficult to predict or reverse. Safety considerations are increasingly integrated into the product development process at leading AI labs.

Autonomous Agent

An autonomous agent is an AI system that pursues objectives over extended time horizons without requiring step-by-step human instruction, self-directing its actions based on environmental feedback and internal reasoning.

Autonomy exists on a spectrum. A fully autonomous agent sets its own sub-goals, manages its own context, and decides when a task is complete. Partial autonomy involves the agent handling individual steps while a human approves higher-level decisions. The degree of autonomy deployed in production is typically calibrated to the risk tolerance of the use case.

Chain of Thought

Chain of thought (CoT) is a prompting technique that encourages a language model to generate intermediate reasoning steps before producing a final answer, improving accuracy on complex tasks.

By prompting the model to 'think step by step,' CoT elicits explicit reasoning traces that break problems into manageable sub-steps. Extended thinking variants — used in models like Claude and o1 — perform this reasoning internally before outputting a response. CoT is particularly effective for mathematical reasoning, multi-step logic, and code generation.
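The difference between a direct prompt and a CoT prompt is purely in how the input is worded. The prompts below are illustrative; no API call is made.

```python
# Sketch of zero-shot chain-of-thought prompting: the same question asked
# directly versus with an instruction to reason step by step.

question = "A shop sells pens at 3 for $2. How much do 12 pens cost?"

direct_prompt = f"{question}\nAnswer:"

cot_prompt = (
    f"{question}\n"
    "Think step by step, showing your reasoning, "
    "then give the final answer on its own line."
)
```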

Context Window

The context window is the maximum amount of text (measured in tokens) that a language model can process in a single inference call — encompassing the system prompt, conversation history, retrieved documents, and the model's own output.

Context window size directly determines what information an agent can 'hold in mind' at one time. Larger context windows (e.g. 200k or 1M tokens) enable longer documents and richer conversation histories, but increase inference cost and latency. Effective context management — deciding what to include, summarise, or offload to external memory — is a core concern in agent engineering.
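A simple form of context management, dropping the oldest turns until the conversation fits, can be sketched as below. It uses the rough 4-characters-per-token heuristic; a real system would count with the model's actual tokeniser.

```python
# Sketch of context-budget management: keep the system prompt and as many
# recent conversation turns as fit within the token window.

def estimate_tokens(text):
    return max(1, len(text) // 4)  # rough heuristic, not a real tokeniser

def fit_to_window(system_prompt, history, window_tokens):
    budget = window_tokens - estimate_tokens(system_prompt)
    kept = []
    for turn in reversed(history):       # newest first
        cost = estimate_tokens(turn)
        if cost > budget:
            break
        kept.append(turn)
        budget -= cost
    return list(reversed(kept))          # restore chronological order
```

Summarising the dropped turns, rather than discarding them, is the usual next refinement.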

Embeddings

Embeddings are dense numerical vector representations of text (or other data) that encode semantic meaning, such that similar concepts are positioned close together in the resulting high-dimensional space.

Embedding models convert words, sentences, or documents into fixed-length vectors. These vectors are stored in vector databases and used to perform semantic similarity search — a foundation of retrieval-augmented generation. Embeddings are also used for clustering, classification, recommendation, and deduplication tasks.
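Semantic similarity between embeddings is typically measured with cosine similarity. The tiny hand-made vectors below are illustrative only; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
# Sketch of cosine similarity, the standard measure of embedding closeness.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

king  = [0.9, 0.8, 0.1]
queen = [0.8, 0.9, 0.2]
pizza = [0.1, 0.2, 0.9]

# Related concepts score higher than unrelated ones.
assert cosine_similarity(king, queen) > cosine_similarity(king, pizza)
```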

Fine-Tuning

Fine-tuning is the process of continuing to train a pre-trained language model on a domain-specific or task-specific dataset to improve its performance or adapt its behaviour for a particular application.

Fine-tuned models can produce outputs that more closely match a desired style, vocabulary, or format, and they can encode knowledge that was not present in the original training data. Common fine-tuning methods include supervised fine-tuning (SFT), parameter-efficient fine-tuning (PEFT), and low-rank adaptation (LoRA). Fine-tuning is typically considered only after prompt-engineering approaches have been exhausted, since it is slower and costlier to iterate on.
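SFT datasets are usually prepared as labelled input-output pairs. The chat-style record below is a common convention, but the exact schema depends on the provider or training framework, so treat it as a sketch.

```python
# Sketch of preparing a supervised fine-tuning (SFT) dataset in a
# chat-message format, stored as JSON Lines (one example per line).
import json

examples = [
    {"messages": [
        {"role": "user", "content": "Summarise: The meeting moved to Friday."},
        {"role": "assistant", "content": "Meeting rescheduled to Friday."},
    ]},
]

jsonl = "\n".join(json.dumps(ex) for ex in examples)
```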

Guardrails

Guardrails are constraints, filters, and validation mechanisms applied to the inputs and outputs of AI systems to prevent harmful, inaccurate, off-topic, or policy-violating behaviour.

Input guardrails screen user requests for prompt injection, sensitive data, or disallowed topics. Output guardrails evaluate model responses for toxicity, hallucination, schema compliance, or brand safety before they are delivered to end users. Frameworks such as NVIDIA NeMo Guardrails and LangChain's output parsers provide tooling for implementing these controls programmatically.
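The two sides of the pattern can be sketched as below. Real deployments use classifier models or dedicated frameworks; the keyword check here is a deliberately naive placeholder.

```python
# Sketch of input and output guardrails: screen the request on the way in,
# validate the response on the way out.
import json

BLOCKED_INPUT_PATTERNS = ["ignore previous instructions"]  # naive injection check

def input_guardrail(user_message):
    lowered = user_message.lower()
    return not any(p in lowered for p in BLOCKED_INPUT_PATTERNS)

def output_guardrail(response, required_keys):
    """Schema-compliance check: the model must return JSON with these keys."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return False
    return all(k in data for k in required_keys)
```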

Hallucination

Hallucination is the phenomenon where a language model generates text that is factually incorrect, fabricated, or unsupported by the provided context, presented with apparent confidence.

Hallucinations arise because language models are trained to produce statistically plausible continuations of text, not to verify factual accuracy. Common mitigation strategies include retrieval-augmented generation, constrained decoding, model grounding with citations, and post-hoc fact-checking. Hallucination rate is a key quality metric in production AI systems.

Human-in-the-Loop

Human-in-the-loop (HITL) is a design pattern in which a human reviewer approves, corrects, or redirects an AI system's actions at defined checkpoints before execution continues.

HITL mechanisms range from fully manual review of every output to exception-based escalation triggered only when the agent's confidence falls below a threshold. As AI agents take on higher-stakes tasks — such as sending emails, executing financial transactions, or modifying production databases — HITL controls are an essential component of responsible deployment.
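Exception-based escalation often reduces to a simple routing decision. The confidence score and risk list below are assumptions for illustration; in practice both come from your own evaluation and policy layers.

```python
# Sketch of exception-based human-in-the-loop routing: high-risk actions and
# low-confidence outputs go to a reviewer instead of executing automatically.

HIGH_RISK_ACTIONS = {"send_email", "execute_payment", "modify_database"}

def route_action(action, confidence, threshold=0.8):
    """Return 'auto' to execute immediately, 'review' to escalate to a human."""
    if action in HIGH_RISK_ACTIONS or confidence < threshold:
        return "review"
    return "auto"
```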

Inference

Inference is the process of running a trained AI model on new input to generate a prediction or output — as opposed to training, which updates the model's parameters.

In production systems, inference is the primary compute cost driver. Inference optimisation techniques — including quantisation, batching, speculative decoding, and model distillation — are critical for reducing latency and cost per request. Inference infrastructure providers (e.g. Together AI, Fireworks, Groq) specialise in high-throughput, low-latency model serving.

LLM (Large Language Model)

A large language model is a deep learning model trained on vast quantities of text to understand and generate human language, forming the cognitive core of most modern AI agents and conversational AI systems.

LLMs such as GPT-4o, Claude, Gemini, and Llama learn statistical relationships across billions of tokens, enabling them to perform tasks ranging from code generation and translation to reasoning and summarisation — often without task-specific training. Their emergent capabilities have made them the foundational technology of the agentic economy.

LLM Engineer

An LLM Engineer is a software engineer specialising in the integration, optimisation, and deployment of large language models into production applications.

LLM Engineers work at the intersection of machine learning and software engineering — selecting and evaluating models, designing prompting strategies, building evaluation pipelines, managing context efficiently, and operating LLM-powered services at scale. The role is distinct from traditional ML engineering in its focus on foundation model capabilities rather than training custom models from scratch.

MLOps

MLOps (Machine Learning Operations) is the set of practices, tools, and cultural principles for reliably and efficiently deploying, monitoring, and maintaining machine learning models in production.

MLOps applies DevOps principles — such as continuous integration, versioning, and automated testing — to the machine learning lifecycle. In the LLM era, MLOps increasingly encompasses prompt versioning, evaluation harnesses, A/B testing of model variants, token cost monitoring, and drift detection as model providers release new versions.

Model Context Protocol (MCP)

The Model Context Protocol (MCP) is an open standard developed by Anthropic that defines a universal interface for AI models to connect with external data sources, tools, and services in a composable, secure manner.

MCP standardises the way agents discover and call capabilities — replacing the bespoke, fragile integrations that previously existed between LLMs and external systems. By adopting MCP, tool providers can expose their services once and have them consumed by any MCP-compatible agent, dramatically reducing integration overhead and enabling a richer ecosystem of agent capabilities.
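MCP is built on JSON-RPC 2.0, and the specification defines methods such as "tools/call" for invoking a capability. The request below sketches that shape; treat the field details as illustrative and defer to the official specification, and note that "get_weather" is a hypothetical tool name.

```python
# Sketch of the shape of an MCP tool-call request (JSON-RPC 2.0).
tool_call_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_weather",            # hypothetical tool name
        "arguments": {"city": "London"},  # arguments defined by the tool's schema
    },
}
```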

Multi-Agent System

A multi-agent system is an architecture in which multiple specialised AI agents collaborate, delegate tasks, and share information to accomplish goals that would be difficult or inefficient for a single agent to handle alone.

Individual agents in a multi-agent system may have different models, tools, personas, or areas of expertise. Coordination patterns include hierarchical (a manager agent delegates to worker agents), peer-to-peer (agents communicate directly), and broadcast (one agent publishes results consumed by others). Multi-agent systems are particularly powerful for parallelising independent sub-tasks and for checks-and-balances architectures where one agent verifies another's work.
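The hierarchical pattern can be sketched as a manager routing sub-tasks to specialised workers. The workers here are plain functions standing in for LLM-backed agents with different prompts or models.

```python
# Sketch of hierarchical multi-agent coordination: a manager delegates each
# sub-task to the worker with the matching speciality and collects results.

def research_worker(subtask):
    return f"findings for {subtask}"

def summarise_worker(subtask):
    return f"summary of {subtask}"

WORKERS = {"research": research_worker, "summarise": summarise_worker}

def manager(subtasks):
    """Route each (skill, task) pair to the matching worker."""
    return [WORKERS[skill](task) for skill, task in subtasks]

results = manager([("research", "market size"), ("summarise", "findings")])
```

Peer-to-peer and broadcast patterns differ only in who calls whom; the delegation-and-aggregation shape is the same.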

Prompt Engineer

A prompt engineer is a specialist who designs, tests, and iterates on the instructions and context provided to language models to elicit accurate, reliable, and useful outputs for specific applications.

Prompt engineering involves understanding how LLMs respond to different phrasings, formats, examples, and constraints. Skilled prompt engineers combine linguistic intuition with empirical testing — using evaluation datasets to measure output quality and iterating systematically. The discipline has grown into a distinct professional practice as organisations deploy LLMs across high-stakes use cases.

Prompt Engineering

Prompt engineering is the practice of crafting inputs to language models — including instructions, examples, personas, and context — to reliably produce desired outputs without modifying the model's underlying parameters.

Core techniques include zero-shot prompting (instruction only), few-shot prompting (instruction with examples), chain-of-thought prompting (eliciting step-by-step reasoning), and role prompting (assigning a persona). As models grow more capable, prompt engineering has become a critical competency for both developers building AI products and business users automating workflows.
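Few-shot prompting, for instance, is mostly string assembly: a handful of input-output examples precede the real query so the model infers the task and format. The sentiment task below is illustrative.

```python
# Sketch of building a few-shot prompt for a classification task.

examples = [
    ("The package arrived broken.", "negative"),
    ("Support resolved my issue quickly.", "positive"),
]

def build_few_shot_prompt(query):
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")  # the model completes from here
    return "\n".join(lines)

prompt = build_few_shot_prompt("Delivery took three weeks.")
```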

RAG (Retrieval-Augmented Generation)

Retrieval-Augmented Generation (RAG) is a technique that enhances language model outputs by dynamically retrieving relevant documents or data from an external knowledge base and injecting them into the model's context before generating a response.

RAG addresses two key limitations of static LLMs: knowledge cutoffs and hallucination on specialised topics. A retriever component — typically backed by a vector database — identifies the most semantically relevant content for a given query. The retrieved passages are then prepended to the prompt, giving the model grounded evidence to draw upon. RAG is the dominant architecture for enterprise knowledge management and question-answering systems.
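The retrieve-then-prompt flow can be sketched end to end. Retrieval here is naive keyword overlap for illustration; a production retriever uses embeddings and a vector database, but the prompt-assembly step is the same.

```python
# Sketch of the RAG flow: retrieve the most relevant passage for a query,
# then prepend it to the prompt as grounded context.

DOCUMENTS = [
    "The refund policy allows returns within 30 days of purchase.",
    "Shipping to Europe typically takes 5 to 7 business days.",
]

def retrieve(query, docs, k=1):
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(docs, key=overlap, reverse=True)[:k]

def build_rag_prompt(query):
    context = "\n".join(retrieve(query, DOCUMENTS))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_rag_prompt("What is the refund policy?")
```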

ReAct Pattern

The ReAct pattern (Reasoning and Acting) is an agent design paradigm in which the model alternates between generating explicit reasoning traces and taking actions — such as calling tools or searching the web — before producing a final answer.

ReAct agents produce a thought step that outlines their plan, followed by an action step that executes a tool call, followed by an observation step that records the result. This cycle repeats until the agent determines it has enough information to answer. ReAct improves transparency and reliability compared to agents that act without showing their reasoning.
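The thought-action-observation cycle can be sketched with a scripted stand-in for the model: it looks up a fact once, observes the result, and then answers.

```python
# Sketch of the ReAct loop. TOOLS and the agent's decisions are hard-coded
# stand-ins for what an LLM would generate at each step.

TOOLS = {"lookup": lambda q: {"capital of France": "Paris"}.get(q, "unknown")}

def react_agent(question, max_steps=3):
    trace, observation = [], None
    for _ in range(max_steps):
        if observation is None:
            trace.append(f"Thought: I should look up '{question}'.")
            trace.append(f"Action: lookup[{question}]")
            observation = TOOLS["lookup"](question)
            trace.append(f"Observation: {observation}")
        else:
            trace.append("Thought: I have enough information to answer.")
            return observation, trace
    return observation, trace

answer, trace = react_agent("capital of France")
```

The `trace` list is the transparency benefit: every decision the agent made is inspectable after the fact.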

Token

A token is the fundamental unit of text that a language model processes — roughly corresponding to four characters or three-quarters of a word in English, though tokenisation varies by model and language.

Language models do not process raw text; they convert text into sequences of integer token IDs using a tokeniser. Token count determines inference cost, context window usage, and the maximum length of inputs and outputs. Understanding tokenisation is essential for managing API costs, designing prompts that fit within context limits, and anticipating model behaviour near token boundaries.
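The cost implications can be sketched with the rough heuristic above. Real billing uses the model's actual tokeniser and the provider's published rates, so the price per million tokens below is an arbitrary illustrative figure.

```python
# Sketch of ballpark token and cost estimation using the
# ~4-characters-per-token heuristic. Not a substitute for a real tokeniser.

def estimate_tokens(text):
    return max(1, round(len(text) / 4))

def estimate_cost(prompt, completion, usd_per_million_tokens=3.0):
    total = estimate_tokens(prompt) + estimate_tokens(completion)
    return total * usd_per_million_tokens / 1_000_000
```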

Tool Use / Function Calling

Tool use (also called function calling) is a capability that allows a language model to invoke external functions, APIs, or services during inference — retrieving real-time data, executing code, or taking actions in the world.

A model with tool use can decide mid-generation to call a weather API, run a SQL query, or send an HTTP request, receive the result, and continue its response incorporating that information. This transforms LLMs from passive text generators into active participants in software systems. Tool definitions are passed to the model as structured schemas, and the model returns structured calls that the host application executes.
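The round trip can be sketched from the host application's side. The schema shape below follows common function-calling conventions rather than any one provider's API, and `get_weather` is a hypothetical tool.

```python
# Sketch of tool use: the host registers a tool schema, the model emits a
# structured call, and the host parses and executes it.
import json

TOOL_SCHEMA = {
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def execute_tool_call(call_json, registry):
    """Host side: parse the model's structured call and run the real function."""
    call = json.loads(call_json)
    return registry[call["name"]](**call["arguments"])

registry = {"get_weather": lambda city: f"18°C and cloudy in {city}"}

# A structured call as a model might emit it.
model_call = json.dumps({"name": "get_weather", "arguments": {"city": "Lisbon"}})
result = execute_tool_call(model_call, registry)
```

The result would then be fed back into the model's context so it can continue its response.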

Vector Database

A vector database is a data store optimised for storing, indexing, and performing similarity searches over high-dimensional embedding vectors at scale.

Traditional databases excel at exact-match lookups, but semantic search requires finding vectors that are mathematically close in a high-dimensional space. Vector databases — such as Pinecone, Weaviate, Qdrant, and pgvector — use approximate nearest-neighbour (ANN) algorithms to return the most semantically similar results efficiently. They are a foundational component of RAG pipelines and long-term agent memory systems.
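What a vector database does can be sketched as a brute-force nearest-neighbour scan; this is the exact computation that ANN indexes such as HNSW approximate efficiently at scale. The two-dimensional vectors and document IDs are toy values.

```python
# Sketch of a tiny in-memory vector store with exact top-k similarity search.
import math

class TinyVectorStore:
    def __init__(self):
        self.items = []  # list of (id, vector) pairs

    def add(self, item_id, vector):
        self.items.append((item_id, vector))

    def search(self, query, k=2):
        """Return the ids of the k vectors closest to the query."""
        ranked = sorted(self.items, key=lambda item: math.dist(query, item[1]))
        return [item_id for item_id, _ in ranked[:k]]

store = TinyVectorStore()
store.add("doc_cats", [1.0, 0.1])
store.add("doc_dogs", [0.9, 0.2])
store.add("doc_tax",  [0.0, 1.0])
nearest = store.search([1.0, 0.0], k=2)
```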