UR5 Robot Arm Simulation in Practice (Part 4): YOLOv5 Visual-Servo Grasping and Placement [Ubuntu 20.04 + ROS Noetic + Gazebo]

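1. Environment Setup and Project Scaffolding

Before starting the YOLOv5 visual-servo grasping project, we need a working development environment. I recommend Ubuntu 20.04 together with ROS Noetic and Gazebo 11. Matching these three versions matters: in real projects I have run into all sorts of odd problems caused by version incompatibilities.

First, install the full ROS Noetic desktop: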

sudo apt install ros-noetic-desktop-full

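Next, install Gazebo 11 (ros-noetic-desktop-full already pulls it in; this step makes sure the development headers are present as well):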

sudo apt install gazebo11 libgazebo11-dev

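When creating the catkin workspace, I suggest catkin build (from the python3-catkin-tools package) rather than the traditional catkin_make. catkin build compiles packages in isolation and in parallel, which is especially useful in large projects. In a similar project I measured a build-time reduction of roughly 30% with catkin build.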

mkdir -p ~/ur5_ws/src
cd ~/ur5_ws
catkin init
catkin build

Install the ROS packages for the UR5 arm and the Robotiq gripper:

cd ~/ur5_ws/src
git clone -b melodic-devel https://github.com/ros-industrial/universal_robot.git
git clone https://github.com/crigroup/robotiq.git

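One pitfall to watch for: although we are on Noetic, I found the melodic-devel branch of universal_robot to be the more stable choice. In my tests, using the noetic branch directly led to controller-loading problems.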

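2. YOLOv5 Model Integration and Optimization

Integrating YOLOv5 into the ROS system is one of the core difficulties of the whole project. I recommend the official PyTorch implementation of YOLOv5: its inference is fast, and it has the best community support.

First, clone the official YOLOv5 repository: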

git clone https://github.com/ultralytics/yolov5
cd yolov5
pip install -r requirements.txt

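When training on a custom dataset, I found that these parameters have the biggest effect on model performance:

img_size: for the robot-arm scene, 640x640 (the YOLOv5 default) worked better than a smaller input such as 416x416
batch_size: adjust to the available GPU memory; on my RTX 3060, batch_size=16 worked best
epochs: for simple objects such as LEGO bricks, 100 epochs is enough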

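For deployment, convert the model to TorchScript, which in my experience gives better inference performance. Trace the raw detection model (autoshape=False) rather than the image-handling AutoShape wrapper; the model.autoshape() call is unnecessary, since torch.hub already applies that wrapper by default. The repository's export.py script with --include torchscript does the same job from the command line.

import torch

# autoshape=False returns the raw detection model instead of the AutoShape wrapper
model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt', autoshape=False)
model.eval()

# Trace with a dummy input placed on the same device as the model
device = next(model.parameters()).device
example = torch.rand(1, 3, 640, 640, device=device)
traced_model = torch.jit.trace(model, example, strict=False)  # strict=False: the output is a tuple
traced_model.save('yolov5_lego.pt')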

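When loading the model in a ROS node, I suggest running YOLOv5 in a separate process and talking to it through a ROS service, so that inference does not block the main control loop. The listing below loads the weights through torch.hub, because the results.xyxy accessor used in the handler is provided by YOLOv5's AutoShape wrapper and not by a module loaded with torch.jit.load; serving the TorchScript file instead would mean doing the preprocessing and NMS by hand.

import rospy
import torch
from cv_bridge import CvBridge
from vision.srv import Detection, DetectionResponse  # service type from the project's own vision package

class YOLOv5Server:
    def __init__(self):
        self.bridge = CvBridge()  # missing in the original listing
        # AutoShape model: accepts images directly and returns a results object
        self.model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')
        self.service = rospy.Service('yolov5_detection', Detection, self.handle_detection)

    def handle_detection(self, req):
        img = self.bridge.imgmsg_to_cv2(req.image, "bgr8")
        results = self.model(img[..., ::-1])  # AutoShape expects RGB for NumPy input
        # Assumption: Detection.srv declares `float32[] boxes`; rows are x1, y1, x2, y2, conf, cls
        boxes = results.xyxy[0].cpu().numpy().flatten().tolist()
        return DetectionResponse(boxes=boxes)

if __name__ == '__main__':
    rospy.init_node('yolov5_server')
    YOLOv5Server()
    rospy.spin()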

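3. Visual Servo Control Implementation

The core of visual servo control is converting the 2D image coordinates detected by YOLOv5 into 3D world coordinates. My approach combines depth-camera measurements with the camera calibration parameters.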
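A minimal sketch of that conversion, assuming an ideal pinhole camera with no lens distortion and a depth image already registered to the color image. The function names and the intrinsics below are illustrative and not taken from the project; in practice fx, fy, cx and cy would come from the K field of the CameraInfo message. The result is expressed in the camera frame, so a camera-to-base transform (from hand-eye calibration) is still needed to reach world coordinates.

```python
def bbox_center(x1, y1, x2, y2):
    """Pixel coordinates of the center of a bounding box."""
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def pixel_to_camera_frame(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth in metres to a 3D point in the camera frame."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


# Hypothetical intrinsics for a 640x480 camera
FX, FY, CX, CY = 554.0, 554.0, 320.0, 240.0

u, v = bbox_center(300, 220, 340, 260)  # box centered on the principal point
point = pixel_to_camera_frame(u, v, 0.5, FX, FY, CX, CY)
print(point)
```

Using the box center is the simplest choice; sampling the median depth over a small window around it is usually more robust to holes in the depth image.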

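Start with hand-eye calibration. OpenCV's solvePnP recovers the pose of a calibration target in the camera frame for each view; these per-view poses are the input to the hand-eye solution (for example cv2.calibrateHandEye):

import cv2

ret, rvec, tvec = cv2.solvePnP(
    object_points,   # 3D points in the world (target) frame
    image_points,    # matching 2D image points
    camera_matrix,   # camera intrinsics
    dist_coeffs      # distortion coefficients
)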

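In practice, I found that pose estimation for LEGO bricks needs particular care on the following points:

The symmetry of the bricks makes the pose ambiguous
Lighting changes affect the stability of YOLOv5 detections
The depth camera has larger errors at close range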

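To address these problems, I fuse several frames and take the per-component median of the most recent poses:

import numpy as np

class PoseEstimator:
    def __init__(self):
        self.pose_buffer = []

    def update_pose(self, new_pose):
        # Keep a sliding window of the last 5 poses
        self.pose_buffer.append(new_pose)
        if len(self.pose_buffer) > 5:
            self.pose_buffer.pop(0)
        # The median rejects single-frame outliers better than the mean
        return np.median(self.pose_buffer, axis=0)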
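One caveat with a per-component median is the symmetry ambiguity mentioned earlier: a symmetric brick's yaw is only defined up to its symmetry angle, so consecutive frames may report values that differ by 90 or 180 degrees for the same physical pose. A possible remedy, sketched here under the assumption that yaw is stored in radians as one component of the pose, is to fold the yaw into a canonical range before buffering it. The function name fold_yaw is made up for this example.

```python
import math


def fold_yaw(yaw, symmetry=math.pi / 2):
    """Map yaw into [-symmetry/2, symmetry/2) so equivalent orientations share one value."""
    half = symmetry / 2.0
    return (yaw + half) % symmetry - half


# A square brick looks identical every 90 degrees, a rectangular one every 180 degrees.
square = fold_yaw(math.radians(100))                      # folded with 90-degree symmetry
rectangle = fold_yaw(math.radians(170), symmetry=math.pi)  # folded with 180-degree symmetry
print(math.degrees(square), math.degrees(rectangle))
```

Values close to the edge of the range can still flip sign from frame to frame, so this reduces the ambiguity rather than removing it entirely.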

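The visual servo controller uses a position-based (PBVS) strategy with a proportional law:

import numpy as np

Kp = 0.5  # proportional gain; placeholder value, the original does not state one

def visual_servo_control(current_pose, target_pose):
    error = target_pose - current_pose
    if np.linalg.norm(error[:3]) < 0.01:  # position error below 1 cm
        return None
    # Generate the velocity command for the arm
    velocity = Kp * error
    return velocity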
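To get a feel for how this proportional law behaves, here is a small stand-alone simulation. It assumes the arm tracks the commanded velocity perfectly and treats the pose as a plain 3D position; the gain, control period and poses are made-up values. Under those assumptions the error shrinks by a factor of (1 - Kp * dt) at every step.

```python
import math

KP = 1.0    # proportional gain (illustrative)
DT = 0.05   # control period in seconds (illustrative)
TOL = 0.01  # stop within 1 cm, as in the controller above


def servo_step(current, target, kp=KP):
    """Return a velocity command, or None once the position is within tolerance."""
    error = [t - c for t, c in zip(target, current)]
    if math.sqrt(sum(e * e for e in error)) < TOL:
        return None
    return [kp * e for e in error]


def simulate(start, target, max_steps=1000):
    """Integrate the commanded velocity until the controller reports convergence."""
    pose = list(start)
    for step in range(max_steps):
        velocity = servo_step(pose, target)
        if velocity is None:
            return pose, step
        pose = [p + v * DT for p, v in zip(pose, velocity)]
    return pose, max_steps


final_pose, steps = simulate([0.0, 0.0, 0.0], [0.3, 0.0, 0.4])
print(steps, final_pose)
```

A larger gain converges in fewer steps, but once Kp * dt approaches 1 the discrete loop starts to overshoot, so on a real arm the gain is usually paired with a velocity limit.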

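4. Gazebo Simulation Environment Setup

Setting up the Gazebo environment requires particular attention to the physics parameters. After many test runs, I found the following combination to be the most stable: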

Parameter	Recommended value	Description
max_step_size	0.001	Simulation step size (s)
real_time_update_rate	1000	Real-time update rate (Hz)
physics	ode	Physics engine
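These settings live in the physics element of the world file. As a rough illustration, the Python sketch below builds that element with the standard library and prints it; the helper names are made up for this example. It also shows why the two numeric values go together: the target real-time factor is the product of max_step_size and real_time_update_rate, so 0.001 and 1000 aim for real time.

```python
import xml.etree.ElementTree as ET


def physics_element(max_step_size=0.001, real_time_update_rate=1000, engine="ode"):
    """Build an SDF <physics> element from the values in the table above."""
    physics = ET.Element("physics", {"type": engine})
    ET.SubElement(physics, "max_step_size").text = str(max_step_size)
    ET.SubElement(physics, "real_time_update_rate").text = str(real_time_update_rate)
    return physics


def target_real_time_factor(max_step_size, real_time_update_rate):
    """Simulated seconds per wall-clock second that Gazebo will aim for."""
    return max_step_size * real_time_update_rate


elem = physics_element()
print(ET.tostring(elem, encoding="unicode"))
print(target_real_time_factor(0.001, 1000))
```

Halving max_step_size while keeping the update rate at 1000 would make contacts more accurate but would also halve the target real-time factor.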

Load the Gazebo models for the UR5 arm and the Robotiq gripper:

<include file="$(find ur_gazebo)/launch/ur5.launch">
  <arg name="limited" value="true"/>
</include>
<include file="$(find robotiq_85_gazebo)/launch/robotiq_85_gripper.launch"/>

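The Gazebo model of the LEGO brick needs custom collision properties. I found that a simplified collision shape speeds up the simulation noticeably:

<collision>
  <geometry>
    <box>
      <size>0.0315 0.0315 0.0096</size> <!-- slightly smaller than the real brick -->
    </box>
  </geometry>
</collision>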

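5. System Integration and Debugging Tips

When the modules are brought together, the most common problem is time synchronization. I suggest using ROS message_filters, specifically ApproximateTimeSynchronizer, to pair messages whose timestamps fall within a small tolerance:

import message_filters
from sensor_msgs.msg import Image, CameraInfo

def callback(image_msg, depth_msg, info_msg):
    # Called once per matched (color, depth, camera_info) triple
    pass

image_sub = message_filters.Subscriber('/camera/color/image_raw', Image)
depth_sub = message_filters.Subscriber('/camera/depth/image_raw', Image)
info_sub = message_filters.Subscriber('/camera/color/camera_info', CameraInfo)

ts = message_filters.ApproximateTimeSynchronizer(
    [image_sub, depth_sub, info_sub],
    queue_size=10,
    slop=0.1  # maximum timestamp spread in seconds
)
ts.registerCallback(callback)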
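ApproximateTimeSynchronizer accepts a set of messages when their header stamps are close enough together. A simplified picture of that acceptance test, ignoring the per-topic queues the real class maintains, might look like this; within_slop is a name made up for the illustration and the timestamps are invented.

```python
def within_slop(stamps, slop):
    """True if all timestamps (in seconds) lie within `slop` of each other."""
    return max(stamps) - min(stamps) < slop


# Color, depth and camera_info stamps for two candidate sets (made-up numbers)
good_set = [10.000, 10.033, 10.001]
bad_set = [10.000, 10.250, 10.001]

print(within_slop(good_set, slop=0.1))
print(within_slop(bad_set, slop=0.1))
```

With slop=0.1 a color frame can be paired with a depth frame up to 100 ms apart, so if the camera or the object moves at 0.2 m/s the two images may disagree by about 2 cm. Tightening slop trades fewer matched sets for better consistency.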

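While debugging, I found these tools particularly useful:

rqt_graph: visualizes how nodes communicate
rviz: shows the arm state and detection results in real time
plotjuggler: analyzes the timing of control signals

For common UR5 control problems, I have a few practical suggestions:

If the arm does not move, first check whether the /joint_states topic is publishing data
If the gripper does not respond, confirm that the robotiq_85_driver node is running properly
If YOLOv5 detections are unstable, try adjusting the detection confidence threshold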
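For the last suggestion, a minimal sketch of what adjusting the threshold means in code, assuming detections arrive as rows of x1, y1, x2, y2, conf, cls (the layout of results.xyxy[0]). The helper name and the sample detections are made up. With the torch.hub AutoShape model, the same effect is available by setting the model's conf attribute before inference.

```python
def filter_detections(detections, conf_threshold=0.5):
    """Keep rows [x1, y1, x2, y2, conf, cls] whose confidence meets the threshold."""
    return [d for d in detections if d[4] >= conf_threshold]


# Made-up detections in the (x1, y1, x2, y2, conf, cls) layout
detections = [
    [100, 120, 160, 180, 0.91, 0],
    [300, 200, 350, 260, 0.42, 1],
    [400, 210, 455, 270, 0.67, 0],
]

kept = filter_detections(detections, conf_threshold=0.6)
print(len(kept), [d[4] for d in kept])
```

Raising the threshold removes flickering false positives at the cost of occasionally missing a brick, so it is worth tuning against recorded frames rather than by feel.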

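6. Performance Optimization: Lessons from Practice

Once the basic functionality was working, I went through several rounds of performance optimization. The most effective measures were the following.

Accelerating YOLOv5 inference with TensorRT:

import torch
from torch2trt import torch2trt

model = torch2trt(
    model,  # must already be in eval mode and on the GPU
    [torch.randn(1, 3, 640, 640).cuda()],
    fp16_mode=True,
    max_workspace_size=1 << 25
)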

Planning arm trajectories with OMPL:

#include <ompl/base/spaces/RealVectorStateSpace.h>

// 6-dimensional state space, one dimension per UR5 joint
ompl::base::StateSpacePtr space(new ompl::base::RealVectorStateSpace(6));

Handling image processing and motion control in separate threads:

from concurrent.futures import ThreadPoolExecutor

# process_image, calculate_control and img_msg come from the rest of the node
with ThreadPoolExecutor(max_workers=2) as executor:
    img_future = executor.submit(process_image, img_msg)
    control_future = executor.submit(calculate_control)
    results = [img_future.result(), control_future.result()]

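After these optimizations, the overall system response time dropped from the initial 800 ms to under 200 ms, which fully meets the requirements of real-time control. In testing, the grasp success rate rose from the initial 70% to over 95%.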