A comprehensive guide from sensors to reinforcement learning — architectures, algorithms, and real-world implementations of intelligent robotic systems.
Artificial intelligence (AI) in robotics is an interdisciplinary field combining computer science, mechanical engineering, electronics, and mathematics to create machines that interact intelligently with their environment. Modern robots are not mere mechanical automation — they are cognitive agents that can perceive, decide, and act adaptively.
Since Alan Turing proposed the concept of a "Thinking Machine" in 1950, through to today's era of Large Language Models and Embodied AI, the journey of AI in robotics has passed through fundamental paradigm shifts:

- **1940s–1960s:** the Turing Test, the Von Neumann machine, the first electro-mechanical industrial robot (Unimate at GM, 1961).
- **1960s–1970s:** SHAKEY (Stanford), WABOT-1 (Waseda), industrial robots built on symbolic logic, the A* algorithm.
- **1990s:** Kalman Filters, Particle Filters, Bayesian networks. The Cye robot, Sony's AIBO, the Roomba prototype.
- **2000s–2010s:** AlexNet (2012), CNNs for robot vision, modern SLAM, the release of ROS (2007), the DARPA Challenges.
- **2020s:** RT-2, PaLM-E, AlphaFold for protein robotics, sim-to-real transfer, LLMs as planners.
- **Industrial manipulators:** 6-DOF arms, welding robots, pick-and-place. High precision, structured environments.
- **Mobile robots:** AMRs/AGVs in warehouses, autonomous vehicles, drones. Dynamic navigation, obstacle avoidance.
- **Humanoids:** Atlas (Boston Dynamics), Optimus (Tesla), H1 (Unitree). Biped locomotion, dexterous manipulation.
- **Medical robots:** the da Vinci System, haptic feedback, tremor filtering. AI for real-time tissue segmentation.
- **Underwater & aerial robots:** AUVs/ROVs, survey drones. GPS-denied navigation, visual-inertial odometry.
- **Soft robots:** tentacle robots, pneumatic grippers, robotic bees. Compliance control, adaptive morphology.
The architecture of an intelligent robot can be decomposed into three interacting layers: a perception layer, a cognitive processing layer, and an actuation layer. All three operate in a closed control loop (closed-loop control).
Robot cognitive pipeline: sensors → perception → cognitive processing → actuation. Feedback loop: actuation changes the environment state → the sensors read it again → the cycle continues.
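To make the closed loop concrete, here is a minimal sense–think–act skeleton. This is a sketch only; `read_sensors`, `plan_action`, and `send_command` are hypothetical placeholders standing in for the three layers.

import time

class ClosedLoopRobot:
    """Minimal sense–think–act loop; the three hooks are illustrative stubs."""
    def __init__(self, rate_hz=50):
        self.dt = 1.0 / rate_hz

    def read_sensors(self):
        return {}           # perception layer: IMU, camera, LiDAR, ...

    def plan_action(self, obs):
        return None         # cognitive layer: state estimation + planning

    def send_command(self, action):
        pass                # actuation layer: motor/servo commands

    def run(self):
        while True:
            obs = self.read_sensors()        # sense
            action = self.plan_action(obs)   # think
            self.send_command(action)        # act: changes the environment,
            time.sleep(self.dt)              # which the next sense step observes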
| Component | Example | AI Function | Complexity |
|---|---|---|---|
| CPU/GPU/TPU | NVIDIA Jetson Orin, RPi 5 | Real-time neural network inference | High |
| IMU | ICM-42688, BMI088 | State estimation, dead reckoning | Low |
| LiDAR | Ouster OS1, Livox MID-360 | 3D point clouds, SLAM, object detection | Medium |
| Depth Camera | Intel RealSense D435i, ZED 2 | RGB-D SLAM, grasp planning | Medium |
| Force/Torque | ATI Mini45, OnRobot HEX-E | Compliant control, contact detection | Medium |
| Motor Controller | ODrive, VESC, Moteus | Low-level torque control, FOC | High |
Perception is the robot's ability to interpret sensor signals into a representation of the environment useful for decision-making. Because every sensor has noise and limitations, sensor fusion combines multiple modalities to improve accuracy and robustness.
The EKF is the most common method for fusing IMU data (high rate, drifting) with GPS or LiDAR data (low rate, absolutely accurate).
# EKF fusing IMU (predict) with GPS/odometry (update) — simplified
import numpy as np

class EKFRobot:
    def __init__(self):
        # State: [x, y, theta, vx, vy, omega], world-frame velocities
        self.x = np.zeros(6)
        self.P = np.eye(6) * 0.1          # state covariance
        self.Q = np.diag([0.01] * 6)      # process noise
        self.R = np.diag([0.05, 0.05])    # measurement noise (GPS x, y)
        self.dt = 0.01                    # 100 Hz IMU

    def predict(self, accel, gyro):
        # Motion model: integrate IMU; accel is body-frame [ax, ay]
        th = self.x[2]
        # Rotate body-frame acceleration into the world frame
        ax = np.cos(th) * accel[0] - np.sin(th) * accel[1]
        ay = np.sin(th) * accel[0] + np.cos(th) * accel[1]
        # Jacobian of the constant-velocity motion model
        F = np.eye(6)
        F[0, 3] = F[1, 4] = F[2, 5] = self.dt
        self.x[0] += self.x[3] * self.dt
        self.x[1] += self.x[4] * self.dt
        self.x[2] += self.x[5] * self.dt
        self.x[3] += ax * self.dt
        self.x[4] += ay * self.dt
        self.x[5] = gyro[2]               # yaw rate taken directly from gyro
        self.P = F @ self.P @ F.T + self.Q
        return self.x.copy()

    def update(self, gps_pos):
        # Measurement: GPS position [x, y]
        H = np.zeros((2, 6))
        H[0, 0] = H[1, 1] = 1.0
        y = gps_pos - H @ self.x              # innovation
        S = H @ self.P @ H.T + self.R         # innovation covariance
        K = self.P @ H.T @ np.linalg.inv(S)   # Kalman gain
        self.x += K @ y
        self.P = (np.eye(6) - K @ H) @ self.P
        return self.x.copy()
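A brief usage sketch, with made-up sensor values:

ekf = EKFRobot()
for _ in range(100):                     # 1 s of 100 Hz IMU data
    state = ekf.predict(accel=np.array([0.1, 0.0]),
                        gyro=np.array([0.0, 0.0, 0.02]))
state = ekf.update(gps_pos=np.array([0.05, 0.01]))   # low-rate GPS fix
print(state[:3])                         # fused x, y, theta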
SLAM solves a chicken-and-egg problem: the robot needs a map to localize, but needs localization to build the map. Modern solutions use factor graph optimization.
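To give a feel for factor-graph optimization, here is a minimal 2D pose-graph sketch using GTSAM (assuming `pip install gtsam`; the odometry and loop-closure values are invented for illustration):

import numpy as np
import gtsam

graph = gtsam.NonlinearFactorGraph()
noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.1, 0.1, 0.05]))

# Anchor the first pose, then chain odometry "between" factors
graph.add(gtsam.PriorFactorPose2(1, gtsam.Pose2(0, 0, 0), noise))
graph.add(gtsam.BetweenFactorPose2(1, 2, gtsam.Pose2(1.0, 0.0, 0.0), noise))
graph.add(gtsam.BetweenFactorPose2(2, 3, gtsam.Pose2(1.0, 0.0, 1.57), noise))
# Loop closure: re-observing pose 1 from pose 3 constrains accumulated drift
graph.add(gtsam.BetweenFactorPose2(3, 1, gtsam.Pose2(0.0, 2.0, -1.57), noise))

initial = gtsam.Values()
initial.insert(1, gtsam.Pose2(0.0, 0.0, 0.0))
initial.insert(2, gtsam.Pose2(1.1, 0.1, 0.1))    # noisy initial guesses
initial.insert(3, gtsam.Pose2(2.0, -0.1, 1.5))

result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
print(result.atPose2(3))   # drift-corrected pose estimate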
| SLAM Algorithm | Sensor | Strengths | Use Case |
|---|---|---|---|
| GMapping | 2D LiDAR | Simple, stable | Indoor 2D |
| Cartographer | 2D/3D LiDAR | Loop closure, backed by Google | Warehouses |
| LIO-SAM | LiDAR + IMU | High outdoor accuracy | Outdoor 3D |
| ORB-SLAM3 | Mono/Stereo/RGBD | Visual, no LiDAR required | Low-cost |
| RTAB-Map | RGBD + LiDAR | Multi-modal, ROS native | Research |
| KISS-ICP | LiDAR | No tuning needed, real-time | Production |
Visual perception uses deep CNN/Transformer models to solve high-level perception tasks (a monocular-depth sketch follows the list):

- **Object detection:** YOLOv8/v11, RT-DETR. Real-time bounding boxes + classification for obstacle & target identification.
- **Semantic segmentation:** SegFormer, SAM2. Per-pixel classification for scene understanding: floor, walls, objects.
- **Pose estimation:** 6-DoF object pose (FoundationPose), human pose (MediaPipe). Critical for grasping.
- **Depth estimation:** Depth Anything v2, ZoeDepth. Monocular depth for robots without active 3D sensors.
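As an illustration of the last item, a monocular depth sketch via the Hugging Face `transformers` pipeline. The model id `depth-anything/Depth-Anything-V2-Small-hf` is an assumption — check the hub for current checkpoints:

from transformers import pipeline
from PIL import Image

# Model id assumed; see the Hugging Face hub for available checkpoints
depth_estimator = pipeline("depth-estimation",
                           model="depth-anything/Depth-Anything-V2-Small-hf")

img = Image.open("frame.jpg")        # any RGB camera frame
result = depth_estimator(img)
depth_map = result["depth"]          # PIL image of relative depth
depth_map.save("depth.png")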
ML in robotics spans a broad spectrum, from supervised learning for sensor classification to reinforcement learning for complex policies. The main categories, with algorithms and implementations, follow.
import torch
import numpy as np
import cv2
from ultralytics import YOLO

class RobotVision:
    def __init__(self, model_path='yolov8n.pt'):
        self.model = YOLO(model_path)
        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
        self.model.to(self.device)
        print(f"[RobotVision] Running on {self.device}")

    def detect(self, frame: np.ndarray, conf_threshold=0.5):
        results = self.model(frame, conf=conf_threshold, verbose=False)[0]
        detections = []
        for box in results.boxes:
            x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
            detections.append({
                'class': results.names[int(box.cls)],
                'conf': float(box.conf),
                'bbox': (x1, y1, x2, y2),
                'center': ((x1 + x2) / 2, (y1 + y2) / 2)
            })
        return detections

    def compute_grasp_center(self, detection, depth_img):
        # Project the pixel center to a 3D point using the depth image
        cx, cy = [int(v) for v in detection['center']]
        depth = depth_img[cy, cx]         # depth in mm at the bbox center
        # Camera intrinsics (fx, fy, cx0, cy0) — replace with your calibration
        fx, fy, cx0, cy0 = 615.0, 615.0, 320.0, 240.0
        X = (cx - cx0) * depth / fx
        Y = (cy - cy0) * depth / fy
        Z = depth
        return np.array([X, Y, Z]) / 1000.0   # mm → m
RL lets a robot learn a policy through trial-and-error in simulation, which is then transferred to real hardware (sim-to-real transfer).
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import SubprocVecEnv
from stable_baselines3.common.callbacks import EvalCallback

# NOTE: AntBulletEnv-v0 is not registered by gymnasium itself — install a
# PyBullet registration package (e.g. pybullet_envs_gymnasium) first.

def make_env(env_id, seed):
    def _init():
        env = gym.make(env_id)
        env.reset(seed=seed)
        return env
    return _init

if __name__ == '__main__':   # required by SubprocVecEnv's process spawning
    # Parallel envs for faster experience collection
    n_envs = 8
    envs = SubprocVecEnv([make_env("AntBulletEnv-v0", i) for i in range(n_envs)])

    model = PPO(
        "MlpPolicy", envs,
        learning_rate=3e-4,
        n_steps=2048,
        batch_size=512,
        n_epochs=10,
        gamma=0.99,
        gae_lambda=0.95,
        clip_range=0.2,
        ent_coef=0.005,                     # entropy bonus for exploration
        policy_kwargs=dict(net_arch=[256, 256]),
        verbose=1,
        device="cuda",
    )

    eval_cb = EvalCallback(
        gym.make("AntBulletEnv-v0"),
        eval_freq=10_000,
        n_eval_episodes=5,
        best_model_save_path="./best_ant_model",
    )

    model.learn(total_timesteps=5_000_000, callback=eval_cb)
    model.save("ant_locomotion_ppo")
IL lets a robot learn from human demonstrations without designing a complex reward function. The classic approaches are behavior cloning and DAgger; modern transformer- and diffusion-based methods now dominate manipulation benchmarks (a minimal behavior-cloning sketch follows the list):

- **Behavior Cloning (BC):** supervised learning on (state, action) pairs from demonstrations. Fast, but prone to covariate shift.
- **DAgger (Dataset Aggregation):** iteratively grows the dataset by asking the expert to correct the states the learner actually visits.
- **ACT (Action Chunking with Transformers):** transformer-based prediction of action sequences. State-of-the-art for robot manipulation (used in Mobile ALOHA).
- **Diffusion Policy:** denoising diffusion for action generation. Captures the multi-modal distribution of demonstrations; superior performance on manipulation tasks.
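A minimal behavior-cloning sketch in PyTorch — a sketch only, assuming demonstrations are already collected as state/action arrays; the dataset here is random placeholder data:

import torch
import torch.nn as nn

# Placeholder demonstration data: 1000 (state, action) pairs
states = torch.randn(1000, 12)           # e.g. joint angles + object pose
actions = torch.randn(1000, 6)           # e.g. joint velocity commands

policy = nn.Sequential(
    nn.Linear(12, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 6),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(100):
    pred = policy(states)
    loss = nn.functional.mse_loss(pred, actions)   # supervised regression
    opt.zero_grad()
    loss.backward()
    opt.step()

# At runtime the policy maps the current state to an action; covariate shift
# appears when those states drift away from the demonstration distribution.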
Motion planning is the problem of finding a trajectory from a start configuration to a goal while avoiding obstacles and satisfying the robot's kinematic/dynamic constraints.
import numpy as np
from scipy.spatial import KDTree

class RRTStar:
    def __init__(self, start, goal, bounds, obstacles,
                 max_iter=5000, step=0.1, radius=0.5):
        self.start = np.array(start, dtype=float)
        self.goal = np.array(goal, dtype=float)
        self.bounds = bounds          # [(xmin, xmax), (ymin, ymax)]
        self.obstacles = obstacles    # list of (center, radius)
        self.max_iter = max_iter
        self.step = step
        self.radius = radius          # neighbor radius for parent selection
        self.nodes = [self.start]
        self.parents = {0: None}
        self.costs = {0: 0.0}

    def sample(self):
        if np.random.rand() < 0.1:    # goal bias: 10% of samples at the goal
            return self.goal
        return np.array([np.random.uniform(*b) for b in self.bounds])

    def collision_free(self, p1, p2):
        for (c, r) in self.obstacles:
            # Segment-circle intersection test
            d = p2 - p1
            f = p1 - np.asarray(c)
            a = d @ d
            b = 2 * (f @ d)
            cc = f @ f - r ** 2
            disc = b * b - 4 * a * cc
            if a > 0 and disc >= 0:
                t1 = (-b - disc ** 0.5) / (2 * a)
                t2 = (-b + disc ** 0.5) / (2 * a)
                if 0 <= t1 <= 1 or 0 <= t2 <= 1:
                    return False
        return True

    def extract_path(self):
        # Walk parent pointers back from the node nearest the goal
        i = int(np.argmin([np.linalg.norm(n - self.goal) for n in self.nodes]))
        path = []
        while i is not None:
            path.append(self.nodes[i])
            i = self.parents[i]
        return path[::-1]

    def plan(self):
        for _ in range(self.max_iter):
            q_rand = self.sample()
            tree = KDTree(self.nodes)
            _, near_i = tree.query(q_rand)
            q_near = self.nodes[near_i]
            direction = q_rand - q_near
            dist = np.linalg.norm(direction)
            if dist < 1e-9:
                continue
            q_new = q_near + (direction / dist) * min(self.step, dist)
            if not self.collision_free(q_near, q_new):
                continue
            # RRT*: choose the lowest-cost parent among nearby nodes
            new_i = len(self.nodes)
            near_inds = [i for i, n in enumerate(self.nodes)
                         if np.linalg.norm(n - q_new) < self.radius]
            best_parent = near_i
            best_cost = self.costs[near_i] + np.linalg.norm(q_new - q_near)
            for ni in near_inds:
                c = self.costs[ni] + np.linalg.norm(q_new - self.nodes[ni])
                if c < best_cost and self.collision_free(self.nodes[ni], q_new):
                    best_parent, best_cost = ni, c
            self.nodes.append(q_new)
            self.parents[new_i] = best_parent
            self.costs[new_i] = best_cost
        return self.extract_path()
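Usage on a toy 2D scene (the start, goal, and obstacle values are arbitrary):

planner = RRTStar(
    start=[0.5, 0.5], goal=[5.5, 5.5],
    bounds=[(0, 6), (0, 6)],
    obstacles=[(np.array([3.0, 3.0]), 1.0)],   # one circular obstacle
)
path = planner.plan()
print(f"{len(path)} waypoints, goal error "
      f"{np.linalg.norm(path[-1] - planner.goal):.2f} m")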
import rclpy
from moveit.planning import MoveItPy
from geometry_msgs.msg import PoseStamped

def pick_place_demo():
    rclpy.init()
    robot = MoveItPy(node_name="robot_arm")
    arm = robot.get_planning_component("manipulator")

    # Start from the robot's current configuration
    arm.set_start_state_to_current_state()

    # Define target pose (SE3) — set_goal_state expects a PoseStamped
    target = PoseStamped()
    target.header.frame_id = "base_link"   # frame assumed; match your robot
    target.pose.position.x = 0.4
    target.pose.position.y = 0.1
    target.pose.position.z = 0.3
    target.pose.orientation.w = 1.0
    arm.set_goal_state(pose_stamped_msg=target, pose_link="tool0")

    # Plan with the configured pipeline (RRTConnect / STOMP / OMPL)
    plan_result = arm.plan()
    if plan_result:
        robot.execute(plan_result.trajectory,
                      controllers=["joint_trajectory_controller"])
    else:
        print("Planning FAILED — check collision objects")
    rclpy.shutdown()
Robot control is the discipline that bridges the planner's output (trajectory setpoints) and the physical actuators through feedback algorithms. Accuracy, response speed, and stability are the three key metrics.
PID (Proportional-Integral-Derivative) is the most widely used controller for single-joint position/velocity control:
import numpy as np

class PIDController:
    def __init__(self, Kp, Ki, Kd, dt,
                 out_min=-float('inf'), out_max=float('inf')):
        self.Kp, self.Ki, self.Kd = Kp, Ki, Kd
        self.dt = dt
        self.out_min, self.out_max = out_min, out_max
        self._integral = 0.0
        self._prev_err = 0.0

    def update(self, setpoint, measurement):
        err = setpoint - measurement
        # Anti-windup: clamp the integral so Ki * integral stays within range
        self._integral = np.clip(
            self._integral + err * self.dt,
            self.out_min / self.Ki if self.Ki != 0 else -1e9,
            self.out_max / self.Ki if self.Ki != 0 else 1e9,
        )
        derivative = (err - self._prev_err) / self.dt
        output = self.Kp * err + self.Ki * self._integral + self.Kd * derivative
        self._prev_err = err
        return np.clip(output, self.out_min, self.out_max)

# Example: joint position control @ 1 kHz
pid = PIDController(Kp=150.0, Ki=5.0, Kd=8.0, dt=0.001,
                    out_min=-20.0, out_max=20.0)
current_angle = 1.2   # rad, e.g. read from a joint encoder
torque = pid.update(setpoint=1.57, measurement=current_angle)
MPC optimizes an N-step control sequence into the future at every timestep, handling constraints explicitly. It is particularly well suited to legged robots and autonomous driving.
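A minimal linear MPC sketch with CVXPY for a 2D double-integrator (point-mass) model, just to show the receding-horizon idea — the horizon, weights, and limits are arbitrary, and real legged/vehicle MPC uses far richer dynamics:

import numpy as np
import cvxpy as cp

dt, N = 0.1, 20                       # timestep, horizon length
# Double integrator: state [px, py, vx, vy], input [ax, ay]
A = np.block([[np.eye(2), dt * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])
B = np.vstack([0.5 * dt**2 * np.eye(2), dt * np.eye(2)])

x0 = np.array([0.0, 0.0, 0.0, 0.0])
x_goal = np.array([2.0, 1.0, 0.0, 0.0])

x = cp.Variable((4, N + 1))
u = cp.Variable((2, N))
cost, constr = 0, [x[:, 0] == x0]
for k in range(N):
    cost += cp.sum_squares(x[:, k + 1] - x_goal) + 0.1 * cp.sum_squares(u[:, k])
    constr += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
               cp.norm(u[:, k], 'inf') <= 1.0]   # explicit actuator limits

cp.Problem(cp.Minimize(cost), constr).solve()
u_apply = u[:, 0].value   # apply only the first input, then re-plan
print(u_apply)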
WBC lets a humanoid robot accomplish multiple tasks simultaneously (balance + manipulation + gaze) through task-space prioritization; a minimal QP sketch follows the comparison table below:
| Method | Strengths | Weaknesses | Ideal Application |
|---|---|---|---|
| PID | Simple, stable, fast | No constraint handling | Single joints, servos |
| Feedforward + PID | Gravity compensation | Requires a model | Industrial manipulators |
| MPC | Constraint-aware | Computationally heavy | Autonomous vehicles, legged robots |
| WBC / QP | Multi-task | Very complex | Humanoid robots |
| RL Policy | Adaptive, robust | Needs training, black-box | Locomotion, dexterous hands |
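The promised WBC sketch: a weighted task-space QP for a toy 3-DOF arm. The Jacobians and targets are made up, and real WBC adds full rigid-body dynamics, contact constraints, and strict task hierarchies:

import numpy as np
import cvxpy as cp

n = 3                                  # toy 3-DOF arm
qdd = cp.Variable(n)                   # decision variable: joint accelerations

# Task 1 (high priority): end-effector acceleration tracking
J_ee = np.array([[1.0, 0.8, 0.3],      # made-up task Jacobian
                 [0.0, 0.6, 0.9]])
xdd_ee = np.array([0.5, -0.2])         # desired task-space acceleration

# Task 2 (low priority): posture regularization toward zero acceleration
w_posture = 0.01

cost = cp.sum_squares(J_ee @ qdd - xdd_ee) + w_posture * cp.sum_squares(qdd)
constr = [cp.abs(qdd) <= 10.0]         # joint acceleration limits

cp.Problem(cp.Minimize(cost), constr).solve()
print(qdd.value)   # command these accelerations via inverse dynamics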
ROS2 (Robot Operating System 2) is the industry-standard middleware for robot development. Unlike ROS1, ROS2 uses DDS (Data Distribution Service) for deterministic real-time communication and supports embedded systems.
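DDS behavior is exposed in rclpy through QoS profiles. A brief sketch of a sensor-style subscription (best-effort delivery, suited to high-rate lossy streams):

from rclpy.qos import QoSProfile, ReliabilityPolicy, HistoryPolicy
from sensor_msgs.msg import Imu

# Best-effort QoS: drop stale samples instead of blocking on retransmits
sensor_qos = QoSProfile(
    history=HistoryPolicy.KEEP_LAST,
    depth=5,
    reliability=ReliabilityPolicy.BEST_EFFORT,
)
# Inside a Node subclass:
#   self.create_subscription(Imu, '/imu/data', self.imu_callback, sensor_qos)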
# Install ROS2 Humble (Ubuntu 22.04)
sudo apt install ros-humble-desktop python3-colcon-common-extensions
# Install robotics packages
sudo apt install \
ros-humble-moveit \
ros-humble-nav2-bringup \
ros-humble-slam-toolbox \
ros-humble-ros2-control \
ros-humble-hardware-interface
# Create workspace
mkdir -p ~/robot_ws/src
cd ~/robot_ws
source /opt/ros/humble/setup.bash   # make ROS2 packages visible to colcon
colcon build --symlink-install
source install/setup.bash
# Create package
cd src
ros2 pkg create --build-type ament_python ai_robot_demo \
--dependencies rclpy sensor_msgs geometry_msgs cv_bridge
#!/usr/bin/env python3
# ai_robot_demo/vision_node.py
# Blue Dragon Security — AI Robotics Tutorial
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from geometry_msgs.msg import PoseArray, Pose
from std_msgs.msg import String
from cv_bridge import CvBridge
import numpy as np
import cv2
from ultralytics import YOLO

class AIRobotVisionNode(Node):
    def __init__(self):
        super().__init__('ai_robot_vision')
        # Parameters
        self.declare_parameter('model_path', 'yolov8n.pt')
        self.declare_parameter('confidence', 0.5)
        model_path = self.get_parameter('model_path').value
        self.conf = self.get_parameter('confidence').value
        # Load model
        self.model = YOLO(model_path)
        self.bridge = CvBridge()
        # Subscribers
        self.create_subscription(Image, '/camera/color/image_raw',
                                 self.rgb_callback, 10)
        self.create_subscription(Image, '/camera/depth/image_rect_raw',
                                 self.depth_callback, 10)
        # Publishers
        self.det_pub = self.create_publisher(String, '/detections', 10)
        self.pose_pub = self.create_publisher(PoseArray, '/object_poses', 10)
        self.vis_pub = self.create_publisher(Image, '/vision/overlay', 10)
        self._depth_img = None
        self.get_logger().info("🐉 AIRobotVisionNode started")

    def depth_callback(self, msg):
        self._depth_img = self.bridge.imgmsg_to_cv2(msg, 'passthrough')

    def rgb_callback(self, msg):
        frame = self.bridge.imgmsg_to_cv2(msg, 'bgr8')
        results = self.model(frame, conf=self.conf, verbose=False)[0]
        poses = PoseArray()
        poses.header.stamp = self.get_clock().now().to_msg()
        poses.header.frame_id = 'camera_link'
        detections = []
        for box in results.boxes:
            cls_name = results.names[int(box.cls)]
            conf = float(box.conf)
            x1, y1, x2, y2 = [int(v) for v in box.xyxy[0]]
            cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
            detections.append(f"{cls_name}:{conf:.2f}")
            # 3D pose from depth (intrinsics fx=615, cx0=320, cy0=240 assumed;
            # read them from the camera_info topic in practice)
            if self._depth_img is not None:
                depth = self._depth_img[cy, cx]   # depth in mm
                p = Pose()
                p.position.x = (cx - 320) * depth / 615000.0
                p.position.y = (cy - 240) * depth / 615000.0
                p.position.z = depth / 1000.0
                p.orientation.w = 1.0
                poses.poses.append(p)
            # Draw overlay
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 200, 255), 2)
            cv2.putText(frame, f"{cls_name} {conf:.0%}",
                        (x1, y1 - 8), cv2.FONT_HERSHEY_SIMPLEX,
                        0.5, (0, 200, 255), 1)
        self.det_pub.publish(String(data=','.join(detections)))
        self.pose_pub.publish(poses)
        self.vis_pub.publish(self.bridge.cv2_to_imgmsg(frame, 'bgr8'))

def main():
    rclpy.init()
    node = AIRobotVisionNode()
    rclpy.spin(node)
    rclpy.shutdown()

if __name__ == '__main__':
    main()
# launch/robot_ai.launch.py
import os
from ament_index_python.packages import get_package_share_directory
from launch import LaunchDescription
from launch.actions import IncludeLaunchDescription
from launch.launch_description_sources import PythonLaunchDescriptionSource
from launch_ros.actions import Node

def generate_launch_description():
    # Nav2 ships launch files, not a single executable — include it instead
    nav2_launch = IncludeLaunchDescription(
        PythonLaunchDescriptionSource(os.path.join(
            get_package_share_directory('nav2_bringup'),
            'launch', 'navigation_launch.py')))
    return LaunchDescription([
        Node(package='ai_robot_demo', executable='vision_node',
             name='vision',
             parameters=[{'model_path': 'yolov8s.pt', 'confidence': 0.6}]),
        Node(package='slam_toolbox', executable='async_slam_toolbox_node',
             name='slam',
             parameters=['config/slam_params.yaml']),
        nav2_launch,
        Node(package='ai_robot_demo', executable='task_planner_node',
             name='task_planner'),
    ])
AI robotics is entering a new era driven by the convergence of foundation models, increasingly powerful hardware, and large-scale datasets. Trends that will define the next decade:

- **Vision-language-action models:** RT-2, π0, OpenVLA — models trained on internet-scale data that can be prompted in natural language for new tasks without fine-tuning.
- **Production humanoids:** NVIDIA Isaac, 1X Technologies, Figure AI — first-generation humanoid robots ready for manufacturing and logistics.
- **Neuromorphic computing:** Intel Loihi 3, IBM NorthPole — chips that mimic brain architecture for ultra-low-power inference on edge robots.
- **Human-robot collaboration:** cobots that genuinely share a workspace safely, intent communication through NLU, and shared autonomy for assistive robotics.
- **Swarm robotics:** multi-agent RL for coordinating hundreds of robots, decentralization, emergent behavior from simple local rules.
- **Robot security:** adversarial attacks on perception models, firmware exploitation of motor controllers, threat modeling for critical robot infrastructure.
| Resource | Topic | Level |
|---|---|---|
| Probabilistic Robotics — Thrun, Burgard, Fox | SLAM, sensor fusion, Bayes | Intermediate |
| Modern Robotics — Lynch & Park | Kinematics, dynamics, control | Advanced |
| Reinforcement Learning: An Introduction — Sutton & Barto | RL theory | Intermediate |
| Springer Handbook of Robotics | Comprehensive encyclopedia | Advanced |
| ROS2 Documentation — docs.ros.org | Middleware, tooling | Beginner |
| NVIDIA Isaac Lab Docs | Sim-to-real RL, GPU sim | Intermediate |
| Papers: RT-2, ACT, Diffusion Policy | Foundation-model robotics | Research |