A Step-by-Step Tutorial: Parsing Charades and Action Genome Annotations with Python (with Common Pitfalls)

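When you first work with the Charades or Action Genome datasets, the mix of .pkl and .csv files and the nested annotation structure can make it hard to know where to start. This tutorial walks through parsing both datasets in Python, points out common pitfalls, and shows how to turn the raw annotations into structured data that a model can consume.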

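1. Environment Setup and Data Download

Before parsing any data, make sure your Python environment has the required libraries. Creating a fresh conda environment is recommended: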

```
conda create -n charades python=3.8
conda activate charades
# tqdm and h5py are used by the examples in Sections 5 and 7
pip install pandas numpy matplotlib opencv-python pickle5 tqdm h5py
```

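Note: pickle5 is a backport of pickle protocol 5 for Python 3.5 to 3.7. On Python 3.8 and later the built-in pickle module already supports that protocol, so the backport is optional and the standard module reads the Action Genome .pkl files the same way.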

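Dataset downloads:

Charades/CharadesEgo: [official download page]
Action Genome annotations: [Google Drive link]

After downloading, keep the original file structure unchanged. A typical data directory looks like this: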

```
charades_data/
├── Charades_v1_480/           # video files
├── Charades_v1_train.csv      # training-set annotations
├── Charades_v1_test.csv       # test-set annotations
└── Charades_v1_classes.txt    # action classes
action_genome/
├── person_bbox.pkl                     # person bounding boxes
└── object_bbox_and_relationship.pkl    # object boxes and relationship annotations
```

2. Parsing the Charades CSV Annotation Files

Charades stores its main annotations in CSV files, which pandas can read and process directly:

```
import pandas as pd

# Load the training-set annotations
train_df = pd.read_csv('Charades_v1_train.csv')
print(train_df.head())

# Parse the actions column (the most involved part)
def parse_actions(action_str):
    if pd.isna(action_str):
        return []
    actions = action_str.split(';')
    result = []
    for act in actions:
        parts = act.strip().split()
        if len(parts) == 3:  # format: c092 11.90 21.20
            result.append({
                'class_id': parts[0],
                'start': float(parts[1]),
                'end': float(parts[2])
            })
    return result

train_df['parsed_actions'] = train_df['actions'].apply(parse_actions)
```
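The parsed `class_id` values (such as `c092`) are easier to inspect once they are mapped to readable names. The following is a minimal sketch, assuming each line of `Charades_v1_classes.txt` holds a class ID, a space, and a free-text description; `load_class_names` is a hypothetical helper, and the sample lines are illustrative only.

```python
def load_class_names(lines):
    """Map Charades class IDs (e.g. 'c092') to their descriptions.

    Assumes each non-empty line is '<class_id> <description>'.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        class_id, _, description = line.partition(' ')
        mapping[class_id] = description
    return mapping


# Illustrative input; in practice pass an open file handle instead:
#   with open('Charades_v1_classes.txt') as f:
#       names = load_class_names(f)
sample_lines = ['c000 Holding some clothes', 'c001 Putting clothes somewhere', '']
names = load_class_names(sample_lines)
print(names['c000'])
```

Because the function accepts any iterable of strings, it can be checked on a few hand-written lines before pointing it at the real file.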

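Common problems and fixes:

Encoding errors: if you hit a UnicodeDecodeError, try specifying the encoding explicitly:
```
pd.read_csv('file.csv', encoding='latin1')
```
Path mismatches: make sure each video ID in the CSV matches a video filename (for example, <id>.mp4).
Out of memory: for large files, pass the chunksize parameter to pd.read_csv to read the file in chunks.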
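The path check from the list above can be sketched as follows. This is an illustration under assumptions: `find_missing_videos` is a hypothetical helper, and it assumes videos are stored as `<id>.mp4` directly inside one directory.

```python
from pathlib import Path


def find_missing_videos(video_ids, video_dir, extension='.mp4'):
    """Return the IDs that have no '<id><extension>' file in video_dir."""
    video_dir = Path(video_dir)
    return [
        vid for vid in video_ids
        if not (video_dir / f'{vid}{extension}').is_file()
    ]


# Example call with the DataFrame loaded earlier (directory name assumed):
#   missing = find_missing_videos(train_df['id'], 'charades_data/Charades_v1_480')
#   print(f'{len(missing)} videos missing')
```

Running this once before any frame extraction surfaces incomplete downloads early instead of failing midway through preprocessing.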

3. Handling the Action Genome PKL Annotations

Action Genome annotations are stored as Python pickle files, so they are loaded with the pickle module rather than pandas:

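```
try:
    import pickle5 as pickle  # backport, only needed on Python 3.5-3.7
except ImportError:
    import pickle  # Python 3.8+ supports protocol 5 natively

def load_pkl(file_path):
    with open(file_path, 'rb') as f:
        return pickle.load(f)

# Load the person bounding-box annotations
person_bbox = load_pkl('person_bbox.pkl')

# Inspect the annotation for the first sample
first_key = next(iter(person_bbox))
print(f"Key: {first_key}")
print(f"Value: {person_bbox[first_key]}")
```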
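Once the keys are loaded, it is often convenient to group frames by video. The sketch below assumes each key has the form `<video>.mp4/<frame>.png`; check this against the `first_key` printed above before relying on it. `group_frames_by_video` is a hypothetical helper and the sample keys are made up for illustration.

```python
from collections import defaultdict


def group_frames_by_video(frame_keys):
    """Group keys like 'VIDEO.mp4/000089.png' into {video: [frame names]}."""
    grouped = defaultdict(list)
    for key in frame_keys:
        # Split on the first '/' only; the key layout is an assumption
        video_id, _, frame_name = key.partition('/')
        grouped[video_id].append(frame_name)
    return {vid: sorted(frames) for vid, frames in grouped.items()}


# Made-up keys for illustration; in practice pass person_bbox.keys()
sample_keys = ['A.mp4/000020.png', 'B.mp4/000005.png', 'A.mp4/000010.png']
frames_by_video = group_frames_by_video(sample_keys)
print(frames_by_video)
```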

The object relationship annotations have a more complex structure:

```
obj_relations = load_pkl('object_bbox_and_relationship.pkl')

# Parse the object relationships for a single frame
def parse_frame_relations(frame_data):
    objects = []
    for obj in frame_data:
        obj_info = {
            'class': obj['class'],
            'bbox': obj['bbox'],
            'attention': obj['attention_relationship'],
            'spatial': obj['spatial_relationship'],
            'contact': obj['contacting_relationship']
        }
        objects.append(obj_info)
    return objects

# Example: parse the relationships for the first frame
sample_frame = obj_relations[first_key]
parsed_relations = parse_frame_relations(sample_frame)
```

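4. Data Visualization and Validation

Visualization is a key step for confirming that the data was parsed correctly. The example below draws the bounding boxes and annotation labels: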

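```
import cv2
import matplotlib.pyplot as plt

def draw_bboxes(image_path, person_data, obj_data):
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # Draw person boxes. This assumes (x, y, w, h); verify against your data,
    # since a box stored as (x1, y1, x2, y2) must not have w/h added.
    for bbox in person_data['bbox']:
        x, y, w, h = (int(v) for v in bbox)  # cv2.rectangle needs integers
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)

    # Draw object boxes and their relationships
    for obj in obj_data:
        if obj['bbox'] is None:  # skip objects without a box
            continue
        x, y, w, h = (int(v) for v in obj['bbox'])
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
        # Add the relationship labels as text; treat missing lists as empty
        relations = ', '.join(
            list(obj['attention'] or [])
            + list(obj['spatial'] or [])
            + list(obj['contact'] or [])
        )
        cv2.putText(img, f"{obj['class']}: {relations}", (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)

    plt.figure(figsize=(12, 8))
    plt.imshow(img)
    plt.axis('off')
    plt.show()

# Example usage (extract the corresponding frame image first)
# draw_bboxes('frame.jpg', person_bbox[first_key], parsed_relations)
```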

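5. Advanced Tips and Performance Optimization

Efficiency matters once you process the data at scale. Here are a few techniques for speeding things up:

Parallel processing: use multiprocessing to speed up per-frame processing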

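```
from multiprocessing import Pool

def process_frame(args):
    frame_id, data = args
    # Per-frame processing goes here; this placeholder returns the input unchanged
    processed_data = data
    return frame_id, processed_data

# The guard is required on platforms that spawn worker processes (Windows, macOS)
if __name__ == '__main__':
    with Pool(4) as p:  # use 4 worker processes
        results = p.map(process_frame, person_bbox.items())
```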

Cache intermediate results: save the parsed data in a more efficient format

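```
import h5py

# Save parsed data to an HDF5 file. Each value must be convertible to a
# NumPy array (numbers or fixed-shape arrays), not a nested dict.
def save_to_hdf5(parsed_data, path='processed_data.h5'):
    with h5py.File(path, 'w') as f:
        for key, value in parsed_data.items():
            f.create_dataset(key, data=value)
```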

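Use a generator to process the data in batches: this keeps downstream memory usage bounded, although load_pkl itself still reads the whole file into memory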

```
def data_generator(pkl_file, batch_size=32):
    data = load_pkl(pkl_file)
    keys = list(data.keys())
    for i in range(0, len(keys), batch_size):
        batch_keys = keys[i:i+batch_size]
        yield {k: data[k] for k in batch_keys}
```

6. Troubleshooting Common Problems

In practice, you may run into the following problems:

Problem 1: pickle.load raises UnicodeDecodeError

Fix:

```
# Try specifying the encoding (typically needed for pickles written by Python 2)
with open('file.pkl', 'rb') as f:
    data = pickle.load(f, encoding='latin1')
```

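Problem 2: video frames and annotation timestamps are out of sync

Fix:

```
# Convert a timestamp to the nearest frame index
def time_to_frame(time_sec, fps=24):
    # round() gives the nearest frame; int() would always truncate downward
    return round(time_sec * fps)

# Apply to a parsed action (fps must match the rate the frames were extracted at)
action = {'class_id': 'c092', 'start': 11.90, 'end': 21.20}
action['start_frame'] = time_to_frame(action['start'])
action['end_frame'] = time_to_frame(action['end'])
```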
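A slightly more defensive variant of the conversion above also clamps the result to the frames that actually exist, since annotated end times can run past the last extracted frame. This is a sketch under assumptions: `action_to_frame_range` is a hypothetical helper, frame indices are treated as 0-based, and `num_frames` is something you supply from your own frame extraction.

```python
def action_to_frame_range(start_sec, end_sec, fps=24.0, num_frames=None):
    """Convert an action interval in seconds to inclusive 0-based frame indices."""
    if fps <= 0:
        raise ValueError('fps must be positive')
    start_frame = max(0, round(start_sec * fps))
    # Never let the end fall before the start
    end_frame = max(start_frame, round(end_sec * fps))
    if num_frames is not None:
        last = num_frames - 1
        start_frame = min(start_frame, last)
        end_frame = min(end_frame, last)
    return start_frame, end_frame


print(action_to_frame_range(11.90, 21.20))                  # (286, 509)
print(action_to_frame_range(11.90, 21.20, num_frames=400))  # (286, 399)
```

If your extracted frames are numbered from 1, add 1 to both indices when building filenames.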

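Problem 3: handling empty values in the object relationship annotations

Fix:

```
def safe_get_relations(obj):
    # `or []` covers both missing keys and keys that are present but set to None
    return {
        'attention': obj.get('attention_relationship') or [],
        'spatial': obj.get('spatial_relationship') or [],
        'contact': obj.get('contacting_relationship') or []
    }
```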

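7. A Worked Data-Conversion Example

Finally, here is a complete example of converting the raw annotations into the format needed for model training. Suppose we need to prepare data for an action recognition task: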

```
import json

import pandas as pd
from tqdm import tqdm

# Relies on parse_actions() defined in Section 2
def prepare_training_data(csv_path, output_json):
    df = pd.read_csv(csv_path)
    training_samples = []
    for _, row in tqdm(df.iterrows(), total=len(df)):
        video_id = row['id']
        actions = parse_actions(row['actions'])
        # Create one sample per action
        for act in actions:
            sample = {
                'video_id': video_id,
                'class_id': act['class_id'],
                'start': act['start'],
                'end': act['end'],
                'metadata': {
                    'scene': row['scene'],
                    'objects': row['objects']
                }
            }
            training_samples.append(sample)
    # Save as JSON
    with open(output_json, 'w') as f:
        json.dump(training_samples, f)

# Example usage
prepare_training_data('Charades_v1_train.csv', 'train_data.json')
```

For the Action Genome relationship detection task, the conversion is similar but more involved:

```
import pickle

from tqdm import tqdm

# Relies on load_pkl() defined in Section 3
def prepare_relation_data(bbox_pkl, relation_pkl, output_path):
    person_data = load_pkl(bbox_pkl)
    obj_data = load_pkl(relation_pkl)
    relation_samples = []
    # Only keep frames that appear in both annotation files
    common_keys = set(person_data.keys()) & set(obj_data.keys())
    for key in tqdm(common_keys):
        frame_relations = []
        for obj in obj_data[key]:
            relations = {
                'object_class': obj['class'],
                'bbox': obj['bbox'],
                'relations': {
                    'attention': obj['attention_relationship'],
                    'spatial': obj['spatial_relationship'],
                    'contact': obj['contacting_relationship']
                }
            }
            frame_relations.append(relations)
        sample = {
            'frame_id': key,
            'person_bbox': person_data[key]['bbox'],
            'relations': frame_relations
        }
        relation_samples.append(sample)
    # Save the result
    with open(output_path, 'wb') as f:
        pickle.dump(relation_samples, f)
```

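In real projects, the most time-consuming part of working with these datasets is usually data cleaning and validation. Before training a model, take the time to confirm that the annotations are parsed correctly. A practical check is to randomly sample 100 parsed samples and verify them by hand.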
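The random spot check can be made repeatable with a fixed seed, so that a second reviewer sees the same samples. A minimal sketch, where `sample_for_review` is a hypothetical helper and the demo list stands in for the parsed training samples:

```python
import random


def sample_for_review(samples, k=100, seed=0):
    """Pick up to k samples for manual inspection, reproducibly."""
    samples = list(samples)
    rng = random.Random(seed)  # local generator, leaves global random state alone
    return rng.sample(samples, min(k, len(samples)))


# Stand-in data; in practice pass the list built by prepare_training_data
demo = [{'video_id': f'vid{i}', 'class_id': 'c000'} for i in range(250)]
picked = sample_for_review(demo)
print(len(picked))
```

Changing `seed` gives a fresh batch of samples once the first batch has been verified.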