Building a .NET Speech Recognition API Service with Qwen3-ASR-0.6B
Alibaba's recently open-sourced Qwen3-ASR-0.6B speech recognition model has been getting a lot of attention: it supports 52 languages and dialects, and its accuracy is reportedly quite good. As a .NET developer, I wondered: could I integrate this model into a .NET application and build my own speech recognition service?

I tried it, and it works. Although Qwen3-ASR lives primarily in the Python ecosystem, with a little plumbing we can call it from .NET and stand up a stable, usable speech recognition API. In this post I'll walk through the whole process, from environment setup to API deployment.
1. Preliminaries: Understanding Qwen3-ASR-0.6B
Before we start building, let's take a quick look at the model we'll be using.
Qwen3-ASR-0.6B is a lightweight open-source speech recognition model from Alibaba with roughly 600 million parameters (hence the "0.6B"). Despite its small size, it is quite capable:
- 52 languages and dialects: 30 international languages plus 22 Chinese dialects
- Solid accuracy: strong results on major languages such as Chinese and English
- High throughput: at a concurrency of 128 it can reportedly process about 2,000 seconds of audio per second
- Streaming and offline recognition: one model covers both scenarios
For us .NET developers, the biggest hurdle is that the model is native to the Python ecosystem. Fortunately, there are ways around that.
2. Overall Design
To use Qwen3-ASR from .NET, I considered two main approaches:

Option 1: Python service + .NET client. This is the safer route: stand up a speech recognition service in Python, and call it from .NET over HTTP or gRPC. The upside is that Python-ecosystem tooling works out of the box, and deployment is straightforward.

Option 2: Call the model directly from .NET. Use .NET's Python interop, or convert the model to ONNX and run it natively. More direct, but technically harder.

For stability and ease of use, I went with Option 1. The architecture looks like this:

.NET app → HTTP request → Python FastAPI service → Qwen3-ASR model → recognition result

The .NET side only has to handle HTTP requests, while the Python side does the heavy model inference; each part does what it's best at.
3. Building the Python Speech Recognition Service
First, the Python side. I chose FastAPI because it's lightweight, fast, and generates API documentation automatically.
3.1 Installing the Environment
Create a fresh Python environment and install the required packages:

```bash
# Create a virtual environment
python -m venv qwen_asr_env
source qwen_asr_env/bin/activate   # Linux/Mac
# or: qwen_asr_env\Scripts\activate   # Windows

# Install the base packages
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install fastapi uvicorn pydantic

# Install Qwen3-ASR
pip install qwen-asr

# If you want the vLLM backend (recommended, much faster)
pip install qwen-asr[vllm]
```

If your GPU supports it, I strongly recommend installing the vLLM variant; it is significantly faster.
3.2 Creating the FastAPI Service
Create a new file named asr_service.py:

```python
from fastapi import FastAPI, File, UploadFile, HTTPException
from pydantic import BaseModel
import torch
from qwen_asr import Qwen3ASRModel
import tempfile
import os
from typing import Optional
import logging

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI(
    title="Qwen3-ASR Speech Recognition Service",
    description="Speech recognition API based on Qwen3-ASR-0.6B",
    version="1.0.0"
)

# Global model handle
model = None

class TranscriptionRequest(BaseModel):
    language: Optional[str] = None   # language code such as "zh" or "en"; None = auto-detect
    return_timestamps: bool = False  # whether to return timestamps

class TranscriptionResponse(BaseModel):
    text: str
    language: str
    timestamps: Optional[list] = None
    success: bool
    message: str

@app.on_event("startup")
async def startup_event():
    """Load the model on startup."""
    global model
    try:
        logger.info("Loading the Qwen3-ASR-0.6B model...")
        # Load the model, using the vLLM backend if it is installed
        model = Qwen3ASRModel.from_pretrained(
            "Qwen/Qwen3-ASR-0.6B",
            dtype=torch.bfloat16,
            device_map="cuda:0" if torch.cuda.is_available() else "cpu",
            max_inference_batch_size=32,
            max_new_tokens=256,
        )
        logger.info("Model loaded; service is ready")
    except Exception as e:
        logger.error(f"Failed to load model: {str(e)}")
        raise

@app.get("/")
async def root():
    """Health check."""
    return {
        "status": "running",
        "model": "Qwen3-ASR-0.6B",
        "supported_languages": "52 languages and dialects"
    }

@app.post("/transcribe", response_model=TranscriptionResponse)
async def transcribe_audio(
    file: UploadFile = File(...),
    language: Optional[str] = None,
    return_timestamps: bool = False
):
    """Transcribe a single audio file."""
    if model is None:
        raise HTTPException(status_code=503, detail="Model not loaded")

    # Check the file type
    if not file.filename.lower().endswith(('.wav', '.mp3', '.m4a', '.flac')):
        raise HTTPException(status_code=400, detail="Only wav, mp3, m4a and flac are supported")

    try:
        # Save the upload to a temporary file
        with tempfile.NamedTemporaryFile(delete=False, suffix=os.path.splitext(file.filename)[1]) as tmp_file:
            content = await file.read()
            tmp_file.write(content)
            tmp_path = tmp_file.name

        logger.info(f"Processing file: {file.filename}, size: {len(content)} bytes")

        # Run recognition
        results = model.transcribe(
            audio=tmp_path,
            language=language,
            return_time_stamps=return_timestamps
        )

        # Clean up the temporary file
        os.unlink(tmp_path)

        if not results:
            return TranscriptionResponse(
                text="",
                language="",
                timestamps=[] if return_timestamps else None,
                success=False,
                message="No speech detected"
            )

        result = results[0]
        return TranscriptionResponse(
            text=result.text,
            language=result.language,
            timestamps=result.time_stamps if return_timestamps else None,
            success=True,
            message="Recognition succeeded"
        )
    except Exception as e:
        logger.error(f"Recognition failed: {str(e)}")
        raise HTTPException(status_code=500, detail=f"Recognition failed: {str(e)}")

@app.post("/transcribe_batch")
async def transcribe_batch(files: list[UploadFile] = File(...)):
    """Transcribe a batch of audio files."""
    if model is None:
        raise HTTPException(status_code=503, detail="Model not loaded")

    results = []
    for file in files:
        try:
            # Simplified: files are handled sequentially; a real service should parallelize
            response = await transcribe_audio(file)
            results.append({
                "filename": file.filename,
                "text": response.text,
                "language": response.language,
                "success": response.success
            })
        except Exception as e:
            results.append({
                "filename": file.filename,
                "text": "",
                "language": "",
                "success": False,
                "error": str(e)
            })

    return {"results": results}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

The service exposes two main endpoints:
- /transcribe: recognize a single audio file
- /transcribe_batch: recognize a batch of audio files
3.3 Starting the Service
Run the service:

```bash
python asr_service.py
```

Once the service is running, open http://localhost:8000/docs to see the automatically generated API documentation.
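Before wiring up the .NET client, it's handy to sanity-check the endpoint from a quick script. A minimal client sketch using the `requests` library (the service URL and the audio file name are assumptions; adjust them to your setup):

```python
import mimetypes
from typing import Optional

import requests  # third-party: pip install requests

def transcribe(path: str, base_url: str = "http://localhost:8000",
               language: Optional[str] = None) -> dict:
    """POST an audio file to the ASR service and return the parsed JSON response."""
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as f:
        resp = requests.post(
            f"{base_url}/transcribe",
            files={"file": (path, f, content_type)},
            # language is declared as a plain parameter in FastAPI, so it is a query param
            params={"language": language} if language else None,
            timeout=300,  # long audio can take a while
        )
    resp.raise_for_status()
    return resp.json()

# Example (assumes the service is running and test_audio.wav exists):
# print(transcribe("test_audio.wav", language="zh")["text"])
```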
4. Creating the .NET Client
Now that the Python service is up, let's call it from .NET.
4.1 Creating the .NET Project

```bash
dotnet new webapi -n QwenASR.Client
cd QwenASR.Client
```

4.2 Adding the Required NuGet Packages
```xml
<Project Sdk="Microsoft.NET.Sdk.Web">
  <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
    <Nullable>enable</Nullable>
    <ImplicitUsings>enable</ImplicitUsings>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="Microsoft.AspNetCore.OpenApi" Version="8.0.0" />
    <PackageReference Include="Swashbuckle.AspNetCore" Version="6.4.0" />
    <PackageReference Include="Refit" Version="7.0.0" />
    <PackageReference Include="Refit.HttpClientFactory" Version="7.0.0" />
  </ItemGroup>
</Project>
```

4.3 Defining the API Models
Create Models/TranscriptionModels.cs:
```csharp
namespace QwenASR.Client.Models;

public class TranscriptionRequest
{
    public string? Language { get; set; }
    public bool ReturnTimestamps { get; set; }
}

public class TranscriptionResponse
{
    public string Text { get; set; } = string.Empty;
    public string Language { get; set; } = string.Empty;
    public List<Timestamp>? Timestamps { get; set; }
    public bool Success { get; set; }
    public string Message { get; set; } = string.Empty;
}

public class Timestamp
{
    public string Text { get; set; } = string.Empty;
    public double Start { get; set; }
    public double End { get; set; }
}

public class BatchTranscriptionRequest
{
    public List<IFormFile> Files { get; set; } = new();
}

public class BatchTranscriptionResult
{
    public string Filename { get; set; } = string.Empty;
    public string Text { get; set; } = string.Empty;
    public string Language { get; set; } = string.Empty;
    public bool Success { get; set; }
    public string? Error { get; set; }
}

public class BatchTranscriptionResponse
{
    public List<BatchTranscriptionResult> Results { get; set; } = new();
}
```

4.4 Creating the Refit Interface
Create Services/IASRService.cs:
```csharp
using QwenASR.Client.Models;
using Refit;

namespace QwenASR.Client.Services;

public interface IASRService
{
    [Multipart]
    [Post("/transcribe")]
    Task<TranscriptionResponse> TranscribeAsync(
        [AliasAs("file")] StreamPart file,
        [AliasAs("language")] string? language = null,
        [AliasAs("return_timestamps")] bool returnTimestamps = false);

    [Multipart]
    [Post("/transcribe_batch")]
    Task<BatchTranscriptionResponse> TranscribeBatchAsync(
        [AliasAs("files")] IEnumerable<StreamPart> files);
}
```

4.5 Creating the Service Wrapper Class
Create Services/ASRService.cs:
```csharp
using Microsoft.Extensions.Options;
using QwenASR.Client.Models;
using Refit;

namespace QwenASR.Client.Services;

public class ASRService : IASRService
{
    private readonly IASRService _api;
    private readonly IHttpClientFactory _httpClientFactory;
    private readonly ASRServiceOptions _options;
    private readonly ILogger<ASRService> _logger;

    public ASRService(
        IHttpClientFactory httpClientFactory,
        IOptions<ASRServiceOptions> options,
        ILogger<ASRService> logger)
    {
        _httpClientFactory = httpClientFactory;
        _options = options.Value;
        _logger = logger;

        var httpClient = _httpClientFactory.CreateClient("ASRService");
        httpClient.BaseAddress = new Uri(_options.BaseUrl);
        _api = RestService.For<IASRService>(httpClient);
    }

    public async Task<TranscriptionResponse> TranscribeAsync(
        StreamPart file,
        string? language = null,
        bool returnTimestamps = false)
    {
        try
        {
            _logger.LogInformation("Starting transcription, language: {Language}",
                language ?? "auto-detect");

            var response = await _api.TranscribeAsync(file, language, returnTimestamps);

            _logger.LogInformation("Transcription finished, language: {Language}, text length: {Length}",
                response.Language, response.Text.Length);

            return response;
        }
        catch (ApiException ex)
        {
            _logger.LogError(ex, "ASR API call failed");
            throw new ApplicationException($"Transcription failed: {ex.Message}", ex);
        }
    }

    public async Task<TranscriptionResponse> TranscribeFileAsync(
        IFormFile file,
        string? language = null,
        bool returnTimestamps = false)
    {
        await using var stream = file.OpenReadStream();
        var streamPart = new StreamPart(stream, file.FileName, file.ContentType);
        return await TranscribeAsync(streamPart, language, returnTimestamps);
    }

    public async Task<TranscriptionResponse> TranscribeLocalFileAsync(
        string filePath,
        string? language = null,
        bool returnTimestamps = false)
    {
        if (!File.Exists(filePath))
        {
            throw new FileNotFoundException($"File not found: {filePath}");
        }

        var fileName = Path.GetFileName(filePath);
        var contentType = GetContentType(filePath);

        await using var stream = File.OpenRead(filePath);
        var streamPart = new StreamPart(stream, fileName, contentType);
        return await TranscribeAsync(streamPart, language, returnTimestamps);
    }

    public async Task<BatchTranscriptionResponse> TranscribeBatchAsync(IEnumerable<StreamPart> files)
    {
        try
        {
            _logger.LogInformation("Starting batch transcription, file count: {Count}", files.Count());

            var response = await _api.TranscribeBatchAsync(files);

            _logger.LogInformation("Batch transcription finished, success: {SuccessCount}/{TotalCount}",
                response.Results.Count(r => r.Success), response.Results.Count);

            return response;
        }
        catch (ApiException ex)
        {
            _logger.LogError(ex, "Batch ASR API call failed");
            throw new ApplicationException($"Batch transcription failed: {ex.Message}", ex);
        }
    }

    public async Task<BatchTranscriptionResponse> TranscribeBatchFilesAsync(IEnumerable<IFormFile> files)
    {
        var streamParts = files.Select(file =>
        {
            var stream = file.OpenReadStream();
            return new StreamPart(stream, file.FileName, file.ContentType);
        }).ToList();

        return await TranscribeBatchAsync(streamParts);
    }

    private static string GetContentType(string filePath)
    {
        var extension = Path.GetExtension(filePath).ToLowerInvariant();
        return extension switch
        {
            ".wav" => "audio/wav",
            ".mp3" => "audio/mpeg",
            ".m4a" => "audio/mp4",
            ".flac" => "audio/flac",
            _ => "application/octet-stream"
        };
    }
}

public class ASRServiceOptions
{
    public string BaseUrl { get; set; } = "http://localhost:8000";
    public int TimeoutSeconds { get; set; } = 300;
}
```

4.6 Configuring Dependency Injection
Add the following to Program.cs:
```csharp
using System.Net.Http.Headers;
using Microsoft.Extensions.Options;
using QwenASR.Client.Services;
using Refit;

var builder = WebApplication.CreateBuilder(args);

// Bind the service options
builder.Services.Configure<ASRServiceOptions>(
    builder.Configuration.GetSection("ASRService"));

// Register the named HttpClient
builder.Services.AddHttpClient("ASRService", (serviceProvider, client) =>
{
    var options = serviceProvider.GetRequiredService<IOptions<ASRServiceOptions>>().Value;
    client.BaseAddress = new Uri(options.BaseUrl);
    client.Timeout = TimeSpan.FromSeconds(options.TimeoutSeconds);
    client.DefaultRequestHeaders.Accept.Add(
        new MediaTypeWithQualityHeaderValue("application/json"));
});

// Register the Refit client
builder.Services.AddRefitClient<IASRService>()
    .ConfigureHttpClient((serviceProvider, client) =>
    {
        var options = serviceProvider.GetRequiredService<IOptions<ASRServiceOptions>>().Value;
        client.BaseAddress = new Uri(options.BaseUrl);
        client.Timeout = TimeSpan.FromSeconds(options.TimeoutSeconds);
    });

// Register the wrapper service
builder.Services.AddScoped<ASRService>();

// Controllers
builder.Services.AddControllers();

// Swagger
builder.Services.AddEndpointsApiExplorer();
builder.Services.AddSwaggerGen();

var app = builder.Build();

if (app.Environment.IsDevelopment())
{
    app.UseSwagger();
    app.UseSwaggerUI();
}

app.UseHttpsRedirection();
app.UseAuthorization();
app.MapControllers();

app.Run();
```

Then add the configuration to appsettings.json:
```json
{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft.AspNetCore": "Warning"
    }
  },
  "ASRService": {
    "BaseUrl": "http://localhost:8000",
    "TimeoutSeconds": 300
  },
  "AllowedHosts": "*"
}
```

4.7 Creating the Controller
Create Controllers/TranscriptionController.cs:
```csharp
using Microsoft.AspNetCore.Mvc;
using QwenASR.Client.Models;
using QwenASR.Client.Services;

namespace QwenASR.Client.Controllers;

[ApiController]
[Route("api/[controller]")]
public class TranscriptionController : ControllerBase
{
    private readonly ASRService _asrService;
    private readonly ILogger<TranscriptionController> _logger;

    public TranscriptionController(
        ASRService asrService,
        ILogger<TranscriptionController> logger)
    {
        _asrService = asrService;
        _logger = logger;
    }

    [HttpPost("transcribe")]
    public async Task<ActionResult<TranscriptionResponse>> Transcribe(
        IFormFile file,
        [FromForm] string? language = null,
        [FromForm] bool returnTimestamps = false)
    {
        if (file == null || file.Length == 0)
        {
            return BadRequest("Please upload a valid audio file");
        }

        try
        {
            var result = await _asrService.TranscribeFileAsync(
                file, language, returnTimestamps);
            return Ok(result);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Transcription failed");
            return StatusCode(500, new TranscriptionResponse
            {
                Success = false,
                Message = $"Transcription failed: {ex.Message}"
            });
        }
    }

    [HttpPost("transcribe/local")]
    public async Task<ActionResult<TranscriptionResponse>> TranscribeLocalFile(
        [FromBody] LocalFileRequest request)
    {
        if (string.IsNullOrEmpty(request.FilePath))
        {
            return BadRequest("File path must not be empty");
        }

        try
        {
            var result = await _asrService.TranscribeLocalFileAsync(
                request.FilePath, request.Language, request.ReturnTimestamps);
            return Ok(result);
        }
        catch (FileNotFoundException ex)
        {
            return NotFound(new TranscriptionResponse
            {
                Success = false,
                Message = ex.Message
            });
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Local-file transcription failed");
            return StatusCode(500, new TranscriptionResponse
            {
                Success = false,
                Message = $"Transcription failed: {ex.Message}"
            });
        }
    }

    [HttpPost("transcribe/batch")]
    public async Task<ActionResult<BatchTranscriptionResponse>> TranscribeBatch(
        List<IFormFile> files)
    {
        if (files == null || files.Count == 0)
        {
            return BadRequest("Please upload at least one audio file");
        }

        if (files.Count > 10)
        {
            return BadRequest("At most 10 files per request");
        }

        try
        {
            var result = await _asrService.TranscribeBatchFilesAsync(files);
            return Ok(result);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Batch transcription failed");
            return StatusCode(500, new BatchTranscriptionResponse
            {
                Results = files.Select(f => new BatchTranscriptionResult
                {
                    Filename = f.FileName,
                    Success = false,
                    Error = ex.Message
                }).ToList()
            });
        }
    }

    [HttpGet("health")]
    public async Task<ActionResult> HealthCheck()
    {
        try
        {
            using var client = new HttpClient();
            var response = await client.GetAsync("http://localhost:8000/");

            if (response.IsSuccessStatusCode)
            {
                return Ok(new { Status = "Healthy", Message = "ASR service is running" });
            }

            return StatusCode(503, new { Status = "Unhealthy", Message = "ASR service unavailable" });
        }
        catch (Exception ex)
        {
            return StatusCode(503, new { Status = "Unhealthy", Message = $"Failed to reach ASR service: {ex.Message}" });
        }
    }
}

public class LocalFileRequest
{
    public string FilePath { get; set; } = string.Empty;
    public string? Language { get; set; }
    public bool ReturnTimestamps { get; set; }
}
```

5. Testing and Usage
5.1 Starting the Services
First, make sure the Python service is running:

```bash
# from the directory containing asr_service.py
python asr_service.py
```

Then start the .NET service:

```bash
dotnet run
```

5.2 Testing with Swagger
Open http://localhost:5000/swagger to see all the API endpoints we created.
To test single-file transcription:
- Select the /api/Transcription/transcribe endpoint and click "Try it out"
- Upload an audio file (wav, mp3, or another supported format)
- Optionally specify a language code ("zh" for Chinese, "en" for English)
- Click "Execute"
To test batch transcription:
- Select the /api/Transcription/transcribe/batch endpoint
- Upload several audio files
- Click "Execute"
5.3 Calling from Code
You can also call the service directly from code:

```csharp
// Add a test endpoint in Program.cs
app.MapPost("/test", async (ASRService asrService) =>
{
    // Transcribe a local file
    var result = await asrService.TranscribeLocalFileAsync(
        "test_audio.wav",
        language: "zh",
        returnTimestamps: true);

    Console.WriteLine($"Transcript: {result.Text}");
    Console.WriteLine($"Language: {result.Language}");

    if (result.Timestamps != null)
    {
        foreach (var ts in result.Timestamps)
        {
            Console.WriteLine($"[{ts.Start:F2}s-{ts.End:F2}s] {ts.Text}");
        }
    }

    return result;
});
```

6. Performance Tuning and Caveats
6.1 Performance Tips
- Use the vLLM backend: if you have a GPU, the vLLM variant is noticeably faster
- Batch where you can: send multiple files through the batch endpoint to cut network overhead
- Set sensible timeouts: long audio takes a while to transcribe, so raise the timeout accordingly
- Manage memory: dispose file streams promptly to avoid leaks
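On the batching point: if you drive the single-file endpoint from a script, bounding client-side concurrency gives most of the benefit without overloading the service. A sketch of the pattern with asyncio (here `transcribe_one` is a hypothetical stand-in for the real HTTP call):

```python
import asyncio

async def transcribe_one(path: str) -> str:
    """Placeholder for the actual async HTTP call to the ASR service."""
    await asyncio.sleep(0.01)  # simulate network + inference latency
    return f"transcript of {path}"

async def transcribe_many(paths: list[str], max_concurrency: int = 4) -> list[str]:
    """Transcribe many files concurrently, never more than max_concurrency at once."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(p: str) -> str:
        async with sem:
            return await transcribe_one(p)

    # gather preserves the input order in its results
    return await asyncio.gather(*(bounded(p) for p in paths))

# results = asyncio.run(transcribe_many(["a.wav", "b.wav"], max_concurrency=4))
```

The semaphore is what keeps a large file list from opening hundreds of simultaneous requests against a single GPU-backed service.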
6.2 Troubleshooting
Problem 1: the Python service fails to start
- Check that CUDA is available: torch.cuda.is_available()
- Check that all the Python packages installed correctly
- Check whether the port is already in use
Problem 2: recognition is slow
- Confirm that the vLLM backend is actually being used
- Check that there is enough GPU memory
- Consider using Qwen3-ASR-0.6B rather than the 1.7B variant
Problem 3: accuracy is poor
- Make sure the audio is reasonably clean, without heavy background noise
- Specify the language explicitly to avoid auto-detection mistakes
- For specific dialects, try the corresponding language code
6.3 Production Deployment Tips
- Containerize with Docker: package the Python service as a Docker image for easy deployment
- Add load balancing: under heavy load, run multiple Python service instances
- Add monitoring and logging: track recognition success rate, response time, and similar metrics
- Implement circuit breaking and retries: retry automatically when the network is flaky
- Add authentication and rate limiting: protect the API from abuse
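On the containerization point, here is a minimal Dockerfile sketch for the Python service. The base image tag, the CUDA version, and the requirements.txt file are all assumptions to adapt to your own environment:

```dockerfile
# Assumed base image; match the CUDA version you installed torch against
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04

RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 python3-pip && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# requirements.txt is assumed to list the packages from section 3.1
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

COPY asr_service.py .
EXPOSE 8000
CMD ["python3", "asr_service.py"]
```

Run it with the NVIDIA container runtime (e.g. `docker run --gpus all -p 8000:8000 ...`) so the service can see the GPU.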
7. Extensions
7.1 Adding Streaming Recognition
Qwen3-ASR supports streaming recognition, so we can extend the API to accept raw audio streams:

```python
# Add to asr_service.py (Body must also be imported from fastapi)
from fastapi import Body

@app.post("/transcribe_stream")
async def transcribe_stream(
    audio_stream: bytes = Body(...),
    language: Optional[str] = None
):
    """Streaming-style recognition over raw audio bytes."""
    # Write the byte stream to a temporary file
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(audio_stream)
        tmp_path = tmp.name

    try:
        results = model.transcribe(
            audio=tmp_path,
            language=language
        )
        os.unlink(tmp_path)

        if results:
            return {"text": results[0].text, "language": results[0].language}
        return {"text": "", "language": ""}
    except Exception as e:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise HTTPException(status_code=500, detail=str(e))
```

7.2 Adding WebSocket Support
For real-time recognition scenarios, add a WebSocket endpoint:

```python
from fastapi import WebSocket

@app.websocket("/ws/transcribe")
async def websocket_transcribe(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            # Receive a chunk of audio data
            data = await websocket.receive_bytes()

            # Write it to a temporary file and transcribe
            with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
                tmp.write(data)
                tmp_path = tmp.name

            results = model.transcribe(audio=tmp_path)
            os.unlink(tmp_path)

            if results:
                await websocket.send_json({
                    "text": results[0].text,
                    "language": results[0].language
                })
    except Exception as e:
        await websocket.close(code=1011, reason=str(e))
```

7.3 Adding a Cache
If the same audio may be submitted repeatedly, caching results by a hash of the audio content avoids redundant recognition:
```python
import hashlib

# Simple in-memory cache keyed by (audio hash, language)
_transcription_cache: dict[tuple[str, Optional[str]], dict] = {}

def get_audio_hash(audio_data: bytes) -> str:
    """Hash the raw audio bytes."""
    return hashlib.md5(audio_data).hexdigest()

def cached_transcribe(audio_data: bytes, language: Optional[str] = None) -> dict:
    """Transcribe with caching: identical audio is only recognized once."""
    key = (get_audio_hash(audio_data), language)
    if key in _transcription_cache:
        return _transcription_cache[key]

    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(audio_data)
        tmp_path = tmp.name
    try:
        results = model.transcribe(audio=tmp_path, language=language)
        result = (
            {"text": results[0].text, "language": results[0].language}
            if results else {"text": "", "language": ""}
        )
    finally:
        os.unlink(tmp_path)

    _transcription_cache[key] = result
    return result
```

Note that the cache must store the recognition result under the hash; the hash alone cannot recover the audio, so an unbounded dict (or a proper LRU keyed the same way) is used here instead of wrapping the function in functools.lru_cache.