
Glossary of Terms

This glossary provides definitions for key terms used throughout the Physical AI & Humanoid Robotics textbook.

Key Concept Relationships

Physical AI & Humanoid Robotics Ecosystem
┌─────────────────────────────────────────────────────────────────────────────┐
│                               HUMANOID ROBOT                                │
│ ┌─────────────────────┐   ┌─────────────────────┐   ┌─────────────────────┐ │
│ │ HARDWARE            │   │ SOFTWARE            │   │ AI/ML               │ │
│ │                     │   │                     │   │                     │ │
│ │ • Sensors           │   │ • ROS 2             │   │ • Perception        │ │
│ │ • Actuators         │   │ • Isaac ROS         │   │ • Planning          │ │
│ │ • Controllers       │   │ • Nav2              │   │ • Control           │ │
│ │ • IMU, LiDAR        │   │ • Gazebo/Unity      │   │ • LLM Integration   │ │
│ └─────────────────────┘   └─────────────────────┘   └─────────────────────┘ │
└──────────────────────────────────────┬──────────────────────────────────────┘
                                       │
┌──────────────────────────────────────▼──────────────────────────────────────┐
│                                 ENVIRONMENT                                 │
│ ┌─────────────────────┐   ┌─────────────────────┐   ┌─────────────────────┐ │
│ │ SIMULATION          │   │ PHYSICAL            │   │ HUMAN-ROBOT         │ │
│ │                     │   │                     │   │ INTERACTION         │ │
│ │ • Gazebo            │   │ • Real World        │   │ • Voice Commands    │ │
│ │ • Isaac Sim         │   │ • Physics           │   │ • Natural Language  │ │
│ │ • Unity             │   │ • Gravity           │   │ • HRI Principles    │ │
│ │ • Domain Rand.      │   │ • Obstacles         │   │ • Multimodal Input  │ │
│ └─────────────────────┘   └─────────────────────┘   └─────────────────────┘ │
└──────────────────────────────────────┬──────────────────────────────────────┘
                                       │
                               ┌───────▼───────┐
                               │     TASKS     │
                               │    & GOALS    │
                               └───────────────┘

The following diagrams show how key terms connect across different domains:

ROS 2 ──► Nodes ──► Topics ──► Services ──► Actions
   └─► URDF (robot model) ──► TF (transform system) ──► QoS (communication
       patterns) ──► DDS (middleware framework) ──► rclpy/rclcpp (client
       libraries, Python/C++)

AI/ML ──► Perception ──► Planning ──► Control ──► Learning
   └─► Computer Vision ──► Task Planning ──► MPC ──► Reinforcement Learning

Simulation ──► Gazebo ──► Isaac Sim ──► Unity ──► Physics
   └─► Sensors (LiDAR, camera, IMU) ──► Rendering (PBR, ray tracing) ──►
       Omniverse platform ──► Dynamics (ODE, PhysX)

A

Action Server: A ROS 2 component that handles long-running tasks with feedback and the ability to cancel operations.
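
For example, a minimal rclpy action server might look like the sketch below. It uses the Fibonacci demo action from the example_interfaces package; the node and action names are arbitrary.

    import rclpy
    from rclpy.action import ActionServer
    from rclpy.node import Node
    from example_interfaces.action import Fibonacci  # demo action type

    class MinimalActionServer(Node):
        def __init__(self):
            super().__init__('minimal_action_server')
            # The server executes long-running goals and streams feedback.
            self._server = ActionServer(
                self, Fibonacci, 'fibonacci', self.execute_callback)

        def execute_callback(self, goal_handle):
            feedback = Fibonacci.Feedback()
            feedback.sequence = [0, 1]
            for i in range(1, goal_handle.request.order):
                feedback.sequence.append(
                    feedback.sequence[i] + feedback.sequence[i - 1])
                goal_handle.publish_feedback(feedback)  # progress update
            goal_handle.succeed()
            result = Fibonacci.Result()
            result.sequence = feedback.sequence
            return result

    def main():
        rclpy.init()
        rclpy.spin(MinimalActionServer())

    if __name__ == '__main__':
        main()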

AI Perception: The use of artificial intelligence techniques to interpret sensor data and understand the environment.

Any-angle Path Planning: Path planning algorithms that can create paths at any angle, not just grid-aligned paths.

Artificial Intelligence (AI): The simulation of human intelligence processes by machines, especially computer systems.

B

Behavior Tree: A hierarchical structure used to organize and execute complex behaviors in robotics and AI.

Bipedal Locomotion: The act of walking on two legs, as opposed to quadrupedal locomotion on four legs.

Bounding Box: A rectangular or cuboidal volume used to define the extent of an object in 2D or 3D space.

Bridge: In robotics, a connection between different software frameworks or simulation environments.

C

Cognitive Planning: The process of using higher-level reasoning to decompose complex tasks into executable actions.

Commonsense Knowledge: General knowledge about the world that humans typically acquire through experience.

Computer Vision: A field of artificial intelligence that trains computers to interpret and understand the visual world.

Costmap: A representation of the environment used in navigation that assigns costs to different areas based on obstacles and other factors.

Cross-Modal Integration: The combination of information from different sensory modalities (e.g., vision and language).

D

Data Distribution Service (DDS): A middleware protocol and API standard for real-time, scalable, and fault-tolerant machine-to-machine communication.

Deep Learning: A subset of machine learning based on artificial neural networks with representation learning.

Domain Randomization: A technique for training AI models by systematically varying environmental parameters to improve robustness.
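
For instance, a training loop might redraw simulator parameters each episode, as in this minimal sketch (the parameter names, ranges, and the apply step are illustrative, not tied to any particular simulator):

    import random

    def sample_sim_params():
        # Each episode sees a slightly different world, so the learned
        # policy cannot overfit to one exact simulation configuration.
        return {
            'floor_friction': random.uniform(0.4, 1.2),    # contact physics
            'payload_mass_kg': random.uniform(0.0, 2.0),   # robot dynamics
            'light_intensity': random.uniform(0.3, 1.0),   # rendering
            'camera_noise_std': random.uniform(0.0, 0.05), # sensor noise
        }

    for episode in range(1000):
        params = sample_sim_params()
        # apply_params_to_simulator(params)  # hypothetical, simulator-specific
        # run_episode_and_update_policy()    # hypothetical training step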

Dynamic Window Approach (DWA): A local path planning algorithm that considers robot dynamics and kinematics.

E

Embodied Intelligence: Intelligence that emerges from the interaction between an agent and its physical environment.

Event Camera: A neuromorphic camera that asynchronously reports per-pixel brightness changes instead of capturing full intensity frames.

Extensible Markup Language (XML): A markup language that defines rules for encoding documents in a format that is both human-readable and machine-readable.

F

Feature Detection: The process of identifying distinctive points or regions in an image for computer vision tasks.

Field of View (FOV): The extent of the observable world that is seen at any given moment by a camera or sensor.

Forward Kinematics: The use of kinematic equations to compute the position of the end-effector from specified joint parameters.
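
For example, for a planar two-link arm with link lengths l1, l2 and joint angles θ1, θ2, the end-effector position follows directly from the joint parameters:

    x = l_1 \cos\theta_1 + l_2 \cos(\theta_1 + \theta_2), \qquad
    y = l_1 \sin\theta_1 + l_2 \sin(\theta_1 + \theta_2)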

G

Gazebo: A physics-based simulation environment for robotics research and development.

Generalized Coherence: A measure of the linear relationship between two signals as a function of frequency.

Geometric Reasoning: The ability to understand and manipulate spatial relationships and properties.

H

Hardware-in-the-Loop (HIL): A testing technique that connects real hardware components to a simulation environment.

Human-Robot Interaction (HRI): The study of interactions between humans and robots.

Humanoid Robot: A robot with a body structure similar to that of a human.

I

Inertial Measurement Unit (IMU): An electronic device that measures and reports a body's specific force, angular rate, and sometimes the magnetic field surrounding the body.

Instance Segmentation: The computer vision task of identifying each object instance in an image at the pixel level.

Isaac ROS: NVIDIA's collection of GPU-accelerated packages that enhance robotic perception and navigation capabilities.

Isaac Sim: NVIDIA's high-fidelity simulation environment built on the Omniverse platform.

Iterative Closest Point (ICP): An algorithm that minimizes the distance between two point clouds.
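
The core alignment step can be written in a few lines of NumPy, as in this sketch. It assumes point correspondences are already known (row i of src matches row i of dst); a full ICP loop would re-estimate correspondences with a nearest-neighbor search and iterate until the error converges.

    import numpy as np

    def best_fit_transform(src, dst):
        """Rotation R and translation t minimizing ||(src @ R.T + t) - dst||
        for corresponding (N, 3) point clouds, via the SVD (Kabsch) method."""
        c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
        H = (src - c_src).T @ (dst - c_dst)   # 3x3 cross-covariance
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T                        # optimal rotation
        if np.linalg.det(R) < 0:              # guard against reflections
            Vt[-1, :] *= -1
            R = Vt.T @ U.T
        t = c_dst - R @ c_src
        return R, t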

J

Joint State: The position, velocity, and effort values of a robot's joints at a given time.

JSON Web Token (JWT): A compact, URL-safe means of representing claims to be transferred between two parties.

K

Kinematics: The branch of mechanics concerned with the motion of objects without reference to the forces that cause the motion.

Knowledge Graph: A structured representation of knowledge that describes relationships between entities.

L

Large Language Model (LLM): A language model with many parameters that is trained on large amounts of text data.

Latent Semantic Analysis (LSA): A technique in natural language processing for analyzing relationships between a set of documents and the terms they contain.

Light Detection and Ranging (LiDAR): A remote sensing method that uses light in the form of a pulsed laser to measure distances.

Linear Matrix Inequalities (LMI): A mathematical framework used in control theory and optimization.

Local Binary Pattern (LBP): A type of visual descriptor used for classification in computer vision.

M

Machine Learning (ML): The study of computer algorithms that can improve automatically through experience and data.

Manipulation: The ability of a robot to physically interact with and control objects in its environment.

Markov Decision Process (MDP): A mathematical framework for modeling decision-making in situations where outcomes are partly random and partly under the control of a decision maker.
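
Formally, an MDP is a tuple (S, A, P, R, γ), and its optimal value function satisfies the Bellman optimality equation:

    V^*(s) = \max_{a \in A} \Big[ R(s, a)
             + \gamma \sum_{s'} P(s' \mid s, a) \, V^*(s') \Big]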

Model Predictive Control (MPC): An advanced method of process control that uses a model of the system to predict future behavior.
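
In the usual receding-horizon form, at every control step MPC solves a finite-horizon optimization and applies only the first input before re-planning:

    \min_{u_0, \dots, u_{N-1}} \; \sum_{k=0}^{N-1} \ell(x_k, u_k) + \ell_f(x_N)
    \quad \text{s.t.} \quad x_{k+1} = f(x_k, u_k), \;
    x_k \in \mathcal{X}, \; u_k \in \mathcal{U}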

Modular Architecture: A design approach that breaks down a system into separate, interchangeable modules.

N

Natural Language Processing (NLP): A subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language.

Navigation 2 (Nav2): The next-generation navigation framework for ROS 2, designed for robust and flexible path planning.

Neural Radiance Fields (NeRF): A method for synthesizing novel views of complex 3D scenes based on a sparse set of 2D images.

Non-Maximum Suppression (NMS): An algorithm used in computer vision to discard overlapping, lower-scoring detections so that each object is reported only once.
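
The common greedy variant is easy to sketch with NumPy (the [x1, y1, x2, y2] box layout and the 0.5 IoU threshold are conventional choices, not a fixed standard):

    import numpy as np

    def nms(boxes, scores, iou_thresh=0.5):
        """boxes: (N, 4) [x1, y1, x2, y2]; scores: (N,). Returns kept indices."""
        order = scores.argsort()[::-1]   # best-scoring box first
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(int(i))
            # IoU of the winner against all remaining boxes
            x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
            y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
            x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
            y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
            inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
            area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
            areas = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                     (boxes[order[1:], 3] - boxes[order[1:], 1]))
            iou = inter / (area_i + areas - inter)
            order = order[1:][iou <= iou_thresh]  # drop heavy overlaps
        return keep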

O

Object Detection: A computer vision technique for identifying objects in images and videos.

Occupancy Grid: A probabilistic representation of space that divides the environment into discrete cells.
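
A minimal sketch with NumPy (grid size, resolution, and probability values are illustrative):

    import numpy as np

    resolution = 0.05                 # metres per cell
    grid = np.full((200, 200), 0.5)   # 10 m x 10 m map; 0.5 = unknown

    def world_to_cell(x, y):
        """Map world coordinates (metres) to (row, col) cell indices."""
        return int(y / resolution), int(x / resolution)

    # Mark a sensed obstacle at world position (2.0 m, 3.5 m) as likely occupied.
    row, col = world_to_cell(2.0, 3.5)
    grid[row, col] = 0.9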

Omniverse Platform: NVIDIA's real-time 3D design collaboration and virtual world simulation platform.

Open Dynamics Engine (ODE): A free, industrial-quality library for simulating articulated rigid body dynamics.

Open Graphics Library (OpenGL): A cross-language, cross-platform application programming interface for rendering 2D and 3D vector graphics.

Open Robotics Automation Virtual Environment (OpenRAVE): A robotics simulation environment focused on kinematic and geometric analysis.

Open Sound System (OSS): A software interface for sound card device drivers.

OpenCL (Open Computing Language): An open standard for parallel programming of heterogeneous platforms.

OpenCV (Open Source Computer Vision Library): An open-source computer vision and machine learning software library.

P

Perception Pipeline: A sequence of processing steps that convert sensor data into meaningful information about the environment.

Physically Based Rendering (PBR): A shading and lighting method that more accurately simulates how light interacts with surfaces.

Point Cloud: A collection of data points in a coordinate system, typically representing the external surface of an object.

Pose Estimation: The process of determining the position and orientation of an object in 3D space.

Probabilistic Robotics: An approach to robotics that explicitly handles uncertainty using probability theory.

Python Client Library (rclpy): The Python client library for ROS 2.

Q

Quality of Service (QoS): A set of policies that define the behavior of ROS 2 publishers and subscribers.
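
For example, a profile often used for high-rate sensor streams in rclpy (these policy choices are a common convention, not a requirement):

    from rclpy.qos import (QoSProfile, ReliabilityPolicy,
                           DurabilityPolicy, HistoryPolicy)

    # Best-effort delivery with a shallow history: stale sensor readings
    # are dropped rather than queued behind fresh ones.
    sensor_qos = QoSProfile(
        reliability=ReliabilityPolicy.BEST_EFFORT,
        durability=DurabilityPolicy.VOLATILE,
        history=HistoryPolicy.KEEP_LAST,
        depth=5,
    )
    # Passed when creating a publisher or subscription, e.g.
    # node.create_subscription(LaserScan, 'scan', callback, sensor_qos)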

Quaternion: A mathematical concept used to represent rotations in 3D space.
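
For example, a rotation of yaw radians about the vertical axis (the common case in planar navigation) corresponds to the unit quaternion (0, 0, sin(yaw/2), cos(yaw/2)):

    import math

    def quaternion_from_yaw(yaw):
        """Unit quaternion (x, y, z, w) for a rotation about the z axis."""
        return (0.0, 0.0, math.sin(yaw / 2.0), math.cos(yaw / 2.0))

    print(quaternion_from_yaw(math.pi / 2))  # 90-degree left turn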

R

Range Sensor: A sensor that measures distances to objects in the environment.

Ray Tracing: A rendering technique for generating realistic images by tracing the path of light.

Reactive System: A system that responds to changes in its environment in real-time.

Real-time System: A system that must process data and respond within strict time constraints.

Recurrent Neural Network (RNN): A class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence.

Reinforcement Learning: A type of machine learning where an agent learns to make decisions by interacting with an environment.

RGB-D: A combination of color (RGB) and depth (D) information for 3D scene understanding.

Rigid Body Dynamics: The study of the motion of rigid bodies under the influence of forces and torques.

Robot Operating System (ROS): A flexible framework for writing robot software that provides services designed for a heterogeneous computer cluster.

Robotics Middleware: Software that provides services for robot applications, such as communication, coordination, and resource management.

ROS 2: The second generation of the Robot Operating System with improved security, real-time capabilities, and scalability.

ROS Action: An asynchronous communication pattern for long-running tasks with feedback in ROS.

ROS Bridge: Software that connects different robotics software frameworks, often connecting simulation to ROS.

ROS Node: A process that performs computation in ROS, which can be written in any of several supported programming languages.

ROS Package: A reusable software module in ROS that contains nodes, libraries, and other resources.

ROS Service: A synchronous request-response communication pattern in ROS.

ROS Topic: A named bus over which nodes exchange messages in the publish-subscribe communication pattern.
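
To make the topic-based publish-subscribe pattern above concrete, here is a minimal rclpy publisher sketch (the topic name, message type, and rate are arbitrary):

    import rclpy
    from rclpy.node import Node
    from std_msgs.msg import String

    class MinimalPublisher(Node):
        def __init__(self):
            super().__init__('minimal_publisher')
            # Publish String messages on the 'chatter' topic, queue depth 10.
            self.pub = self.create_publisher(String, 'chatter', 10)
            self.timer = self.create_timer(1.0, self.tick)  # 1 Hz

        def tick(self):
            msg = String()
            msg.data = 'hello from ROS 2'
            self.pub.publish(msg)

    def main():
        rclpy.init()
        rclpy.spin(MinimalPublisher())

    if __name__ == '__main__':
        main()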

S

Semantic Segmentation: The computer vision task of classifying each pixel in an image into a category.

Sensor Fusion: The combining of sensory data or data derived from disparate sources such that the resulting information has less uncertainty than would be possible when these sources were used individually.

Simulation: The imitation of the operation of a real-world process or system over time.

Simultaneous Localization and Mapping (SLAM): The computational problem of constructing or updating a map of an unknown environment while simultaneously keeping track of an agent's location within it.

Spatial Reasoning: The ability to reason about the positions, orientations, and relationships of objects in space.

Speech Recognition: The ability of a machine to identify and interpret human speech.

Standard Template Library (STL): A software library for the C++ programming language that provides common programming data structures and functions.

Stereophonic Audio: Sound reproduction that creates an impression of multi-directional audible perspective.

Synchronous Communication: Communication where the sender and receiver are coordinated in time.

System of Systems (SoS): A collection of task-oriented or dedicated systems that pool their resources and capabilities together to create a new, more complex system.
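
A classic minimal example of sensor fusion is the complementary filter, sketched below: it blends a gyroscope (accurate over short intervals but drifting) with an accelerometer (noisy but drift-free) to estimate a tilt angle. The blend factor alpha is a tuning choice.

    def complementary_filter(angle_prev, gyro_rate, accel_angle, dt, alpha=0.98):
        """Fused tilt estimate: integrate the gyro for short-term accuracy,
        pull gently toward the accelerometer angle to cancel drift."""
        return alpha * (angle_prev + gyro_rate * dt) + (1 - alpha) * accel_angle

    # e.g. at a 100 Hz update rate:
    # angle = complementary_filter(angle, gyro_rate, accel_angle, dt=0.01)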

T

Task Planning: The process of decomposing high-level goals into sequences of executable actions.

Temporal Reasoning: The ability to understand and manipulate time-based relationships and properties.

Tensor Processing Unit (TPU): An application-specific integrated circuit developed by Google for machine learning.

TensorRT: NVIDIA's inference optimizer and runtime for deep learning applications.

Timed Elastic Band (TEB): A trajectory optimization method for local path planning that takes time constraints into account.

Trajectory Optimization: The process of finding the optimal path or trajectory for a system.

Transform (TF): The ROS facility (tf2 in ROS 2) that keeps track of coordinate frame relationships over time.
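
For example, a node can query the transform tree through the tf2_ros Python API; the 'map' and 'base_link' frame names below are a widespread convention but depend on the robot's setup:

    import rclpy
    from rclpy.node import Node
    from rclpy.time import Time
    from tf2_ros import Buffer, TransformListener

    class FrameWatcher(Node):
        def __init__(self):
            super().__init__('frame_watcher')
            self.buffer = Buffer()
            self.listener = TransformListener(self.buffer, self)
            self.create_timer(1.0, self.lookup)

        def lookup(self):
            try:
                # Latest transform taking 'base_link' coordinates into 'map'.
                t = self.buffer.lookup_transform('map', 'base_link', Time())
                p = t.transform.translation
                self.get_logger().info(f'robot at ({p.x:.2f}, {p.y:.2f})')
            except Exception as err:  # frames may not be published yet
                self.get_logger().warn(str(err))

    def main():
        rclpy.init()
        rclpy.spin(FrameWatcher())

    if __name__ == '__main__':
        main()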

U

Unified Memory: A memory management feature in CUDA that allows the same memory addresses to be accessed by both CPU and GPU.

Unified Robot Description Format (URDF): An XML format for representing robot models in ROS.

Universal Scene Description (USD): A scalable 3D scene description and file format developed by Pixar.

User Experience (UX): The overall experience of a person using a product such as a website or computer application.

V

Virtual Reality (VR): A simulated experience that can be similar to or completely different from the real world.

Vision-Language-Action (VLA): Systems that integrate visual perception, language understanding, and physical action.

Visual SLAM (VSLAM): Visual Simultaneous Localization and Mapping using camera imagery.

Volumetric Representation: A 3D representation that captures the internal structure of objects, not just their surfaces.

Voxel: A volume element, representing a value on a regular grid in three-dimensional space.

W

Waveform Generation: The creation of audio signals with specific characteristics.

Wheeled Locomotion: The act of moving using wheels as the primary means of ground contact.

World Model: A representation of the environment used by an agent for planning and decision-making.

X

XML Schema: A way to describe and validate the structure and content of XML documents.

Y

Yaw: The rotation of an object around its vertical axis.

Z

Z-buffer: A component in computer graphics that stores depth information for 3D rendering.

Z-order: The order in which objects are drawn in 2D graphics, determining which objects appear in front of others.

Zero Moment Point (ZMP): The point on the ground about which the net moment of gravity and inertial forces has no horizontal component; a standard criterion for the dynamic stability of legged robots.
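
Under the common linear inverted pendulum approximation (a point mass kept at constant height z_c), the ZMP along one horizontal axis relates to the center-of-mass position x and its acceleration by:

    x_{\mathrm{ZMP}} = x - \frac{z_c}{g} \, \ddot{x}

so keeping the ZMP inside the support polygon constrains the admissible center-of-mass accelerations.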

Accessibility Features

This glossary includes the following accessibility features:

  • Semantic HTML structure with proper heading hierarchy (H1, H2, H3)
  • Sufficient color contrast for text and background
  • Clear navigation structure with logical tab order
  • Descriptive headings and section titles
  • Keyboard navigable interactive elements