FunASR Speech Recognition Tutorial: JSON Result Parsing and Secondary Development - 程序员充电站


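1. Introduction

1.1 Learning Objectives

This article helps developers understand the output structure of the FunASR speech recognition system. It focuses on parsing the JSON result and offers practical, extensible guidance for secondary development. After working through this tutorial, readers will be able to:

Understand the complete structure of the JSON that FunASR outputs
Extract the key recognition information (text, timestamps, confidence)
Post-process the raw results
Implement custom extensions (such as subtitle generation and speech paragraph splitting)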

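1.2 Prerequisites

Readers should ideally have the following background:

Python programming skills
A basic understanding of the JSON data format
Some familiarity with speech recognition tasks
Experience with the command line and the file system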

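1.3 Why This Tutorial

FunASR is a powerful open-source speech recognition toolkit. Its WebUI version, a secondary development by the community developer "科哥" built on the speech_ngram_lm_zh-cn model, provides an intuitive interface and a rich set of export options. In real projects, however, a graphical interface alone rarely meets the needs of automation, batch processing, or integration.

This tutorial fills that gap. It concentrates on result parsing and programmatic use, and provides the technical basis for building customized speech processing systems.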

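2. The FunASR Output Structure in Detail

2.1 Overall Structure of the JSON Result

When the "output timestamps" option is enabled, the JSON returned by FunASR contains several levels of information. A typical output looks like this: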

{
  "result": [
    {
      "text": "你好欢迎使用语音识别系统",
      "start": 0.0,
      "end": 5.0,
      "tokens": [
        { "word": "你", "start": 0.0, "end": 0.5, "conf": 0.98 },
        { "word": "好", "start": 0.5, "end": 1.0, "conf": 0.97 },
        { "word": "欢", "start": 1.0, "end": 1.8, "conf": 0.96 }
      ]
    }
  ],
  "lang": "zh",
  "model": "paraformer-large",
  "punc": true,
  "vad": true
}
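As a minimal sketch of how this structure can be walked in Python, the snippet below parses a shortened copy of the sample above with the standard json module. The constant SAMPLE and the helper describe_segments are illustrative names chosen here; they are not part of FunASR.

```python
import json

# Shortened copy of the sample output shown above.
SAMPLE = """
{
  "result": [
    {
      "text": "你好欢迎使用语音识别系统",
      "start": 0.0,
      "end": 5.0,
      "tokens": [
        {"word": "你", "start": 0.0, "end": 0.5, "conf": 0.98},
        {"word": "好", "start": 0.5, "end": 1.0, "conf": 0.97},
        {"word": "欢", "start": 1.0, "end": 1.8, "conf": 0.96}
      ]
    }
  ],
  "lang": "zh",
  "model": "paraformer-large",
  "punc": true,
  "vad": true
}
"""


def describe_segments(data):
    """Return one (text, duration_seconds, token_count) tuple per segment."""
    rows = []
    for segment in data["result"]:
        duration = segment["end"] - segment["start"]
        rows.append((segment["text"], duration, len(segment.get("tokens", []))))
    return rows


data = json.loads(SAMPLE)
for text, duration, n_tokens in describe_segments(data):
    print(f"{text} | {duration:.1f}s | {n_tokens} tokens")
```

The top-level keys (lang, model, punc, vad) describe the run as a whole, while everything time-related lives inside the entries of result.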

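2.2 Key Fields

Field	Type	Description
`result`	list	List of recognition results; each element represents one speech segment
`text`	string	Full recognized text of the segment
`start`/`end`	float	Start and end time (seconds) of the segment, or of the token when it appears inside `tokens`
`tokens`	list	Token-level details
`word`	string	A single Chinese character or word
`conf`	float	Recognition confidence (0 to 1)
`lang`	string	Language of the recognized speech
`model`	string	Name of the model used

Note: the returned fields differ slightly between models (for example Paraformer vs SenseVoice), so the parsing logic should be adjusted to the actual output.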
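One way to cope with such differences is to read every field defensively. The sketch below assumes only that a field listed in the table may be absent; it does not assume any particular alternative field names. The helpers normalize_token and collect_tokens are hypothetical names introduced for this example.

```python
from typing import Any, Dict, List, Optional


def normalize_token(token: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Copy the documented token fields, using None for any that are absent."""
    return {
        "word": token.get("word"),
        "start": token.get("start"),
        "end": token.get("end"),
        "conf": token.get("conf"),
    }


def collect_tokens(data: Dict[str, Any]) -> List[Dict[str, Optional[Any]]]:
    """Flatten all tokens; segments without a 'tokens' list contribute nothing."""
    tokens = []
    for segment in data.get("result", []):
        for token in segment.get("tokens") or []:
            tokens.append(normalize_token(token))
    return tokens


# Example: the first token has no confidence, the second segment has no tokens.
partial = {
    "result": [
        {"text": "a", "tokens": [{"word": "a", "start": 0.0, "end": 0.5}]},
        {"text": "b"},
    ]
}
print(collect_tokens(partial))
```

Downstream code can then test for None explicitly instead of failing with a KeyError halfway through a batch.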

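3. JSON Parsing in Practice

3.1 Environment Setup

Make sure the required dependency is installed. The json module is part of the Python standard library and does not need to be installed; section 4.1 additionally requires moviepy.

pip install pandas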

3.2 A Basic Parser Implementation

The following is a complete JSON parsing class that supports conversion to several output formats:

import json
from typing import Any, Dict, List, Optional

import pandas as pd


class ASRResultParser:
    """Parse a FunASR JSON result and convert it to other formats."""

    def __init__(self, json_file: Optional[str] = None,
                 raw_data: Optional[Dict[str, Any]] = None):
        if json_file:
            with open(json_file, 'r', encoding='utf-8') as f:
                self.data = json.load(f)
        elif raw_data is not None:
            self.data = raw_data
        else:
            raise ValueError("Either json_file or raw_data must be provided")

    def get_full_text(self) -> str:
        """Return the full recognized text."""
        texts = [seg['text'] for seg in self.data['result']]
        return ''.join(texts)

    def get_word_level_info(self) -> List[Dict]:
        """Extract word-level information (with timestamps and confidence)."""
        word_list = []
        for seg_idx, segment in enumerate(self.data['result']):
            for token in segment.get('tokens', []):
                word_info = {
                    'segment_id': seg_idx + 1,
                    'word': token['word'],
                    'start_time': round(token['start'], 3),
                    'end_time': round(token['end'], 3),
                    'duration': round(token['end'] - token['start'], 3),
                    'confidence': round(token['conf'], 3)
                }
                word_list.append(word_info)
        return word_list

    def to_dataframe(self) -> pd.DataFrame:
        """Convert to a pandas DataFrame for analysis."""
        word_info = self.get_word_level_info()
        return pd.DataFrame(word_info)

    def save_as_srt(self, output_path: str):
        """Save as an SRT subtitle file (one entry per token)."""
        df = self.to_dataframe()
        with open(output_path, 'w', encoding='utf-8') as f:
            for i, row in df.iterrows():
                # SRT sequence number
                f.write(f"{i+1}\n")
                # Time format: HH:MM:SS,mmm
                start_hms = self._sec_to_hms(row['start_time'])
                end_hms = self._sec_to_hms(row['end_time'])
                f.write(f"{start_hms} --> {end_hms}\n")
                # Text content
                f.write(f"{row['word']}\n\n")

    @staticmethod
    def _sec_to_hms(seconds: float) -> str:
        """Convert seconds to the HH:MM:SS,mmm format."""
        # Work in whole milliseconds so float error cannot truncate e.g. 0.29 s to 289 ms
        total_ms = int(round(seconds * 1000))
        h, rest = divmod(total_ms, 3600000)
        m, rest = divmod(rest, 60000)
        s, ms = divmod(rest, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

3.3 Usage Example

# Example: load and parse a result
parser = ASRResultParser(json_file="outputs/outputs_20260104123456/result_001.json")

# Get the plain text
text = parser.get_full_text()
print("Recognized text:", text)

# Convert to a table
df = parser.to_dataframe()
print("\nWord-level information:")
print(df.head())

# Export SRT subtitles
parser.save_as_srt("subtitle.srt")
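The SRT export in the class above writes one subtitle entry per token. For readable subtitles it is often preferable to emit one entry per segment. The following is a minimal sketch of that variant, assuming each entry of result carries the text, start, and end fields described in section 2.2; format_srt_time and segments_to_srt are hypothetical helper names, not part of FunASR.

```python
from typing import Any, Dict, List


def format_srt_time(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, rounding to the nearest millisecond."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(data: Dict[str, Any]) -> str:
    """Build SRT text with one cue per entry of data['result']."""
    cues: List[str] = []
    for index, segment in enumerate(data["result"], start=1):
        start = format_srt_time(segment["start"])
        end = format_srt_time(segment["end"])
        cues.append(f"{index}\n{start} --> {end}\n{segment['text']}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)


example = {
    "result": [
        {"text": "你好", "start": 0.0, "end": 1.5},
        {"text": "世界", "start": 1.5, "end": 3661.25},
    ]
}
print(segments_to_srt(example))
```

Long segments may still need to be broken into shorter cues before display; that step is left out here.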

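4. Secondary Development Scenarios

4.1 Generating Video Subtitles Automatically

Combined with moviepy, subtitles can be embedded into a video automatically: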

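# Written against the moviepy 1.x API (moviepy.editor, set_position, set_duration, set_start).
# TextClip relies on ImageMagick, and a CJK-capable font may need to be passed via font=.
from moviepy.editor import VideoFileClip, TextClip, CompositeVideoClip


def _hms_to_sec(hms: str) -> float:
    """Convert an SRT timestamp (HH:MM:SS,mmm) to seconds."""
    h, m, s = hms.strip().replace(',', '.').split(':')
    return int(h) * 3600 + int(m) * 60 + float(s)


def add_subtitles_to_video(video_path: str, srt_path: str, output_path: str):
    video = VideoFileClip(video_path)

    # Read the SRT file and create one text clip per subtitle entry
    subtitles = []
    with open(srt_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()

    i = 0
    # Each entry needs three lines: sequence number, timing line, text
    while i + 2 < len(lines):
        if lines[i].strip().isdigit():
            time_line = lines[i+1].strip()
            text_line = lines[i+2].strip()
            start_str, end_str = time_line.split(' --> ')
            start_sec = _hms_to_sec(start_str)
            end_sec = _hms_to_sec(end_str)

            txt_clip = TextClip(text_line, fontsize=48, color='white', bg_color='black')
            txt_clip = txt_clip.set_position(('center', 'bottom')).set_duration(end_sec - start_sec)
            txt_clip = txt_clip.set_start(start_sec)
            subtitles.append(txt_clip)
            i += 4
        else:
            i += 1

    final_video = CompositeVideoClip([video] + subtitles)
    final_video.write_videofile(output_path, codec='libx264')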
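The SRT reading step can also be separated from the video code, which makes it easy to test without moviepy. The sketch below is one possible standalone reader, assuming cues are separated by blank lines and timing lines use the HH:MM:SS,mmm form; parse_srt and srt_time_to_seconds are hypothetical helper names.

```python
import re
from typing import List, Tuple

_TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def srt_time_to_seconds(stamp: str) -> float:
    """Convert 'HH:MM:SS,mmm' to seconds."""
    match = _TIME_RE.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"Not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(text: str) -> List[Tuple[float, float, str]]:
    """Return (start_seconds, end_seconds, text) for every cue in an SRT string."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line for line in block.splitlines() if line.strip()]
        # Locate the timing line instead of assuming a fixed position.
        for pos, line in enumerate(lines):
            if "-->" in line:
                start_str, end_str = line.split("-->")
                cue_text = "\n".join(lines[pos + 1:])
                cues.append((srt_time_to_seconds(start_str),
                             srt_time_to_seconds(end_str),
                             cue_text))
                break
    return cues


sample_srt = "1\n00:00:00,000 --> 00:00:01,500\n你好\n\n2\n00:00:01,500 --> 00:01:02,250\n世界\n"
print(parse_srt(sample_srt))
```

Each returned tuple maps directly onto the start time, duration, and text needed to build a subtitle clip.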

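4.2 Splitting Speech into Paragraphs

VAD and punctuation information can be used to divide the text into natural paragraphs. The function below relies on punctuation tokens and a length limit:

from typing import List


# Relies on the ASRResultParser class from section 3.2
def split_into_paragraphs(parser: ASRResultParser, max_words=50) -> List[str]:
    """Split the text by sentence-ending punctuation and length.

    max_words is compared against the character count of the current paragraph.
    The function assumes punctuation marks appear as separate tokens.
    """
    paragraphs = []
    current_para = ""

    df = parser.to_dataframe()

    # Close a paragraph at a full stop, question mark, etc., or when it gets too long
    for _, row in df.iterrows():
        current_para += row['word']
        if row['word'] in ['。', '？', '！', '；'] or len(current_para) >= max_words:
            paragraphs.append(current_para.strip())
            current_para = ""

    if current_para:
        paragraphs.append(current_para.strip())

    return paragraphs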
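When punctuation is unavailable, the token timestamps offer another cue: a long silence between two tokens often marks a paragraph boundary. The sketch below illustrates that idea on dictionaries shaped like the output of get_word_level_info (keys word, start_time, end_time). The function name split_on_pauses and the 0.8 second default threshold are assumptions made for this example and should be tuned on real recordings.

```python
from typing import Dict, List


def split_on_pauses(words: List[Dict], min_gap: float = 0.8) -> List[str]:
    """Start a new paragraph whenever the silence between two tokens reaches min_gap seconds."""
    paragraphs: List[str] = []
    current = ""
    previous_end = None
    for word in words:
        # Gap between the end of the previous token and the start of this one.
        gap = 0.0 if previous_end is None else word["start_time"] - previous_end
        if current and gap >= min_gap:
            paragraphs.append(current)
            current = ""
        current += word["word"]
        previous_end = word["end_time"]
    if current:
        paragraphs.append(current)
    return paragraphs


words = [
    {"word": "你", "start_time": 0.0, "end_time": 0.5},
    {"word": "好", "start_time": 0.5, "end_time": 1.0},
    {"word": "再", "start_time": 2.5, "end_time": 3.0},
    {"word": "见", "start_time": 3.0, "end_time": 3.4},
]
print(split_on_pauses(words))
```

The two approaches can be combined, for instance by splitting on pauses first and then on punctuation within each long paragraph.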

4.3 Designing a Batch Processing Pipeline

An automated processing pipeline can be built as follows:

import os
import glob


# Relies on the ASRResultParser class from section 3.2
def batch_process_results(input_dir: str, output_dir: str):
    json_files = glob.glob(os.path.join(input_dir, "**", "result_*.json"), recursive=True)

    for json_file in json_files:
        try:
            parser = ASRResultParser(json_file=json_file)

            # Mirror the input directory layout under output_dir
            base_name = os.path.splitext(os.path.basename(json_file))[0]
            rel_dir = os.path.relpath(os.path.dirname(json_file), input_dir)
            dir_name = os.path.normpath(os.path.join(output_dir, rel_dir))
            os.makedirs(dir_name, exist_ok=True)

            # Save the plain text (explicit UTF-8 so Chinese text is written correctly on any platform)
            with open(os.path.join(dir_name, f"{base_name}.txt"), 'w', encoding='utf-8') as f:
                f.write(parser.get_full_text())

            # Save the CSV analysis table
            parser.to_dataframe().to_csv(
                os.path.join(dir_name, f"{base_name}_detail.csv"),
                index=False
            )

            # Generate the SRT file
            parser.save_as_srt(os.path.join(dir_name, f"{base_name}.srt"))

            print(f"✅ Done: {json_file}")

        except Exception as e:
            print(f"❌ Failed {json_file}: {str(e)}")
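A batch run is easier to review when each file also yields a few summary numbers. The sketch below computes such a summary directly from the parsed JSON dictionary. The function name summarize_result and the 0.8 threshold for "low confidence" are assumptions made for this example, not values recommended by FunASR.

```python
from typing import Any, Dict


def summarize_result(data: Dict[str, Any], low_conf: float = 0.8) -> Dict[str, Any]:
    """Summarize one result: segment count, scored tokens, mean and low confidence."""
    segments = data.get("result", [])
    # Collect the confidence of every token that reports one.
    confs = [
        token["conf"]
        for segment in segments
        for token in segment.get("tokens", [])
        if "conf" in token
    ]
    return {
        "segments": len(segments),
        "scored_tokens": len(confs),
        "mean_conf": round(sum(confs) / len(confs), 3) if confs else None,
        "low_conf_tokens": sum(1 for conf in confs if conf < low_conf),
    }


example = {
    "result": [
        {
            "text": "你好欢",
            "tokens": [
                {"word": "你", "conf": 0.98},
                {"word": "好", "conf": 0.97},
                {"word": "欢", "conf": 0.6},
            ],
        }
    ]
}
print(summarize_result(example))
```

Collecting these dictionaries across all files gives a quick way to spot recordings that deserve manual review.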