news 2026/4/17 16:40:39

Building a .NET Speech Recognition API Service with Qwen3-ASR-0.6B

Author: 张小明, Front-end Developer

Alibaba's recently open-sourced Qwen3-ASR-0.6B speech recognition model has been getting a lot of attention: it supports 52 languages and dialects, and its accuracy is reportedly quite good. As a .NET developer, I wondered whether I could integrate this model into a .NET application and run my own speech recognition service.

I gave it a try, and it works. Although Qwen3-ASR lives mainly in the Python ecosystem, with a little plumbing we can call it from .NET and build a stable, usable speech recognition API. In this post I'll walk through the whole process, from environment setup to API deployment.

1. Preliminaries: Understanding Qwen3-ASR-0.6B

Before we start building, let's take a quick look at the model we'll be using.

Qwen3-ASR-0.6B is a lightweight open-source speech recognition model from Alibaba with roughly 600 million parameters (hence the "0.6B" in the name). Despite its small size, it is quite capable:

  • Supports 52 languages and dialects: 30 international languages plus 22 Chinese dialects
  • Solid accuracy: performs well on major languages such as Chinese and English
  • High throughput: at a concurrency of 128, it can reportedly process about 2,000 seconds of audio per second
  • Streaming and offline recognition: one model covers both scenarios

For .NET developers, the biggest hurdle is that the model is native to the Python ecosystem. Fortunately, there are ways around that.

2. Overall Design

There are two main ways to use Qwen3-ASR from .NET:

Option 1: Python service + .NET client. This is the more conservative approach: stand up a speech recognition service in Python and call it from .NET over HTTP or gRPC. The upside is that the Python tooling works out of the box and deployment stays simple.

Option 2: Call the model directly from .NET, either through Python interop or by converting the model to ONNX. This is more direct, but technically harder.

For the sake of stability and ease of use, I went with option 1. The architecture looks like this:

.NET app → HTTP request → Python FastAPI service → Qwen3-ASR model → transcription result

The .NET side only deals with HTTP requests, while the Python side handles the heavy model inference; each part does what it is best at.

3. Building the Python Speech Recognition Service

Let's start with the Python side. I chose FastAPI because it is lightweight, fast, and generates API documentation automatically.

3.1 Installing the Environment

Create a new Python environment and install the required packages:

# Create a virtual environment
python -m venv qwen_asr_env
source qwen_asr_env/bin/activate   # Linux/Mac
# or on Windows:
qwen_asr_env\Scripts\activate

# Install the base packages
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install fastapi uvicorn pydantic

# Install Qwen3-ASR
pip install qwen-asr

# Optional: vLLM backend (recommended, much faster)
pip install qwen-asr[vllm]

If your GPU supports it, I strongly recommend installing the vLLM variant; it is much faster.

3.2 Creating the FastAPI Service

Create a file named asr_service.py:

from fastapi import FastAPI, File, UploadFile, HTTPException
from pydantic import BaseModel
import torch
from qwen_asr import Qwen3ASRModel
import tempfile
import os
from typing import Optional
import logging

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI(
    title="Qwen3-ASR Speech Recognition Service",
    description="Speech recognition API based on Qwen3-ASR-0.6B",
    version="1.0.0"
)

# Global model handle
model = None

class TranscriptionRequest(BaseModel):
    language: Optional[str] = None   # language code such as "zh" or "en"; None = auto-detect
    return_timestamps: bool = False  # whether to return timestamps

class TranscriptionResponse(BaseModel):
    text: str
    language: str
    timestamps: Optional[list] = None
    success: bool
    message: str

@app.on_event("startup")
async def startup_event():
    """Load the model at startup."""
    global model
    try:
        logger.info("Loading the Qwen3-ASR-0.6B model...")
        # Load the model (uses the vLLM backend if installed)
        model = Qwen3ASRModel.from_pretrained(
            "Qwen/Qwen3-ASR-0.6B",
            dtype=torch.bfloat16,
            device_map="cuda:0" if torch.cuda.is_available() else "cpu",
            max_inference_batch_size=32,
            max_new_tokens=256,
        )
        logger.info("Model loaded; service is ready")
    except Exception as e:
        logger.error(f"Failed to load model: {e}")
        raise

@app.get("/")
async def root():
    """Health check."""
    return {
        "status": "running",
        "model": "Qwen3-ASR-0.6B",
        "supported_languages": "52 languages and dialects"
    }

@app.post("/transcribe", response_model=TranscriptionResponse)
async def transcribe_audio(
    file: UploadFile = File(...),
    language: Optional[str] = None,
    return_timestamps: bool = False
):
    """Transcribe an uploaded audio file."""
    if model is None:
        raise HTTPException(status_code=503, detail="Model not loaded")

    # Check the file type
    if not file.filename.lower().endswith(('.wav', '.mp3', '.m4a', '.flac')):
        raise HTTPException(status_code=400, detail="Only wav, mp3, m4a and flac are supported")

    try:
        # Save the upload to a temporary file
        with tempfile.NamedTemporaryFile(delete=False, suffix=os.path.splitext(file.filename)[1]) as tmp_file:
            content = await file.read()
            tmp_file.write(content)
            tmp_path = tmp_file.name

        logger.info(f"Processing file: {file.filename}, size: {len(content)} bytes")

        # Run the model
        results = model.transcribe(
            audio=tmp_path,
            language=language,
            return_time_stamps=return_timestamps
        )

        # Clean up the temporary file
        os.unlink(tmp_path)

        if not results:
            return TranscriptionResponse(
                text="",
                language="",
                timestamps=[] if return_timestamps else None,
                success=False,
                message="No speech detected"
            )

        result = results[0]
        return TranscriptionResponse(
            text=result.text,
            language=result.language,
            timestamps=result.time_stamps if return_timestamps else None,
            success=True,
            message="Transcription succeeded"
        )
    except Exception as e:
        logger.error(f"Transcription failed: {e}")
        raise HTTPException(status_code=500, detail=f"Transcription failed: {e}")

@app.post("/transcribe_batch")
async def transcribe_batch(files: list[UploadFile] = File(...)):
    """Transcribe a batch of audio files."""
    if model is None:
        raise HTTPException(status_code=503, detail="Model not loaded")

    results = []
    for file in files:
        try:
            # Simplified sequential handling; in production these should run in parallel
            response = await transcribe_audio(file)
            results.append({
                "filename": file.filename,
                "text": response.text,
                "language": response.language,
                "success": response.success
            })
        except Exception as e:
            results.append({
                "filename": file.filename,
                "text": "",
                "language": "",
                "success": False,
                "error": str(e)
            })
    return {"results": results}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

The service exposes two main endpoints:

  • /transcribe: transcribe a single audio file
  • /transcribe_batch: transcribe a batch of audio files
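With the Python service listening on port 8000, the /transcribe endpoint can be exercised from a short script. This is a minimal sketch: the requests dependency and the audio filename are illustrative assumptions, not part of the service itself.

```python
import os

API_URL = "http://localhost:8000"  # the service address used throughout this article

def build_transcribe_request(audio_path, language=None, return_timestamps=False):
    """Assemble the URL and query parameters for the /transcribe endpoint.
    Scalar parameters of a FastAPI multipart endpoint are passed as query params."""
    params = {"return_timestamps": str(return_timestamps).lower()}
    if language:
        params["language"] = language
    return f"{API_URL}/transcribe", os.path.basename(audio_path), params

def transcribe_file(audio_path, language=None, return_timestamps=False):
    """POST an audio file to the service (the service must be running)."""
    import requests  # third-party: pip install requests
    url, filename, params = build_transcribe_request(audio_path, language, return_timestamps)
    with open(audio_path, "rb") as f:
        resp = requests.post(url, params=params,
                             files={"file": (filename, f, "audio/wav")})
    resp.raise_for_status()
    return resp.json()

# Example, with the service up:
#   print(transcribe_file("test_audio.wav", language="zh")["text"])
```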

3.3 Starting the Service

Run the service:

python asr_service.py

Once it is up, open http://localhost:8000/docs to see the auto-generated API documentation.

4. Creating the .NET Client

With the Python service running, the next step is to call it from .NET.

4.1 Creating the .NET Project

dotnet new webapi -n QwenASR.Client
cd QwenASR.Client

4.2 Adding the Required NuGet Packages

<Project Sdk="Microsoft.NET.Sdk.Web">

  <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
    <Nullable>enable</Nullable>
    <ImplicitUsings>enable</ImplicitUsings>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.AspNetCore.OpenApi" Version="8.0.0" />
    <PackageReference Include="Swashbuckle.AspNetCore" Version="6.4.0" />
    <PackageReference Include="Refit" Version="7.0.0" />
    <PackageReference Include="Refit.HttpClientFactory" Version="7.0.0" />
  </ItemGroup>

</Project>

4.3 Defining the API Models

Create Models/TranscriptionModels.cs:

namespace QwenASR.Client.Models;

public class TranscriptionRequest
{
    public string? Language { get; set; }
    public bool ReturnTimestamps { get; set; }
}

public class TranscriptionResponse
{
    public string Text { get; set; } = string.Empty;
    public string Language { get; set; } = string.Empty;
    public List<Timestamp>? Timestamps { get; set; }
    public bool Success { get; set; }
    public string Message { get; set; } = string.Empty;
}

public class Timestamp
{
    public string Text { get; set; } = string.Empty;
    public double Start { get; set; }
    public double End { get; set; }
}

public class BatchTranscriptionRequest
{
    public List<IFormFile> Files { get; set; } = new();
}

public class BatchTranscriptionResult
{
    public string Filename { get; set; } = string.Empty;
    public string Text { get; set; } = string.Empty;
    public string Language { get; set; } = string.Empty;
    public bool Success { get; set; }
    public string? Error { get; set; }
}

public class BatchTranscriptionResponse
{
    public List<BatchTranscriptionResult> Results { get; set; } = new();
}

4.4 Creating the Refit Interface

Create Services/IASRService.cs:

using Refit;
using QwenASR.Client.Models;

namespace QwenASR.Client.Services;

public interface IASRService
{
    [Multipart]
    [Post("/transcribe")]
    Task<TranscriptionResponse> TranscribeAsync(
        [AliasAs("file")] StreamPart file,
        [AliasAs("language")] string? language = null,
        [AliasAs("return_timestamps")] bool returnTimestamps = false);

    [Multipart]
    [Post("/transcribe_batch")]
    Task<BatchTranscriptionResponse> TranscribeBatchAsync(
        [AliasAs("files")] IEnumerable<StreamPart> files);
}

4.5 Creating the Service Wrapper

Create Services/ASRService.cs:

using Microsoft.Extensions.Options;
using Refit;
using QwenASR.Client.Models;

namespace QwenASR.Client.Services;

public class ASRService : IASRService
{
    private readonly IASRService _api;
    private readonly IHttpClientFactory _httpClientFactory;
    private readonly ASRServiceOptions _options;
    private readonly ILogger<ASRService> _logger;

    public ASRService(
        IHttpClientFactory httpClientFactory,
        IOptions<ASRServiceOptions> options,
        ILogger<ASRService> logger)
    {
        _httpClientFactory = httpClientFactory;
        _options = options.Value;
        _logger = logger;

        var httpClient = _httpClientFactory.CreateClient("ASRService");
        httpClient.BaseAddress = new Uri(_options.BaseUrl);
        _api = RestService.For<IASRService>(httpClient);
    }

    public async Task<TranscriptionResponse> TranscribeAsync(
        StreamPart file, string? language = null, bool returnTimestamps = false)
    {
        try
        {
            _logger.LogInformation("Starting transcription, language: {Language}",
                language ?? "auto-detect");
            var response = await _api.TranscribeAsync(file, language, returnTimestamps);
            _logger.LogInformation("Transcription finished, language: {Language}, text length: {Length}",
                response.Language, response.Text.Length);
            return response;
        }
        catch (ApiException ex)
        {
            _logger.LogError(ex, "Transcription API call failed");
            throw new ApplicationException($"Transcription failed: {ex.Message}", ex);
        }
    }

    public async Task<TranscriptionResponse> TranscribeFileAsync(
        IFormFile file, string? language = null, bool returnTimestamps = false)
    {
        await using var stream = file.OpenReadStream();
        var streamPart = new StreamPart(stream, file.FileName, file.ContentType);
        return await TranscribeAsync(streamPart, language, returnTimestamps);
    }

    public async Task<TranscriptionResponse> TranscribeLocalFileAsync(
        string filePath, string? language = null, bool returnTimestamps = false)
    {
        if (!File.Exists(filePath))
        {
            throw new FileNotFoundException($"File not found: {filePath}");
        }

        var fileName = Path.GetFileName(filePath);
        var contentType = GetContentType(filePath);

        await using var stream = File.OpenRead(filePath);
        var streamPart = new StreamPart(stream, fileName, contentType);
        return await TranscribeAsync(streamPart, language, returnTimestamps);
    }

    public async Task<BatchTranscriptionResponse> TranscribeBatchAsync(IEnumerable<StreamPart> files)
    {
        try
        {
            _logger.LogInformation("Starting batch transcription, file count: {Count}", files.Count());
            var response = await _api.TranscribeBatchAsync(files);
            _logger.LogInformation("Batch transcription finished, succeeded: {SuccessCount}/{TotalCount}",
                response.Results.Count(r => r.Success), response.Results.Count);
            return response;
        }
        catch (ApiException ex)
        {
            _logger.LogError(ex, "Batch transcription API call failed");
            throw new ApplicationException($"Batch transcription failed: {ex.Message}", ex);
        }
    }

    public async Task<BatchTranscriptionResponse> TranscribeBatchFilesAsync(IEnumerable<IFormFile> files)
    {
        var streamParts = files.Select(file =>
        {
            var stream = file.OpenReadStream();
            return new StreamPart(stream, file.FileName, file.ContentType);
        }).ToList();

        return await TranscribeBatchAsync(streamParts);
    }

    private static string GetContentType(string filePath)
    {
        var extension = Path.GetExtension(filePath).ToLowerInvariant();
        return extension switch
        {
            ".wav" => "audio/wav",
            ".mp3" => "audio/mpeg",
            ".m4a" => "audio/mp4",
            ".flac" => "audio/flac",
            _ => "application/octet-stream"
        };
    }
}

public class ASRServiceOptions
{
    public string BaseUrl { get; set; } = "http://localhost:8000";
    public int TimeoutSeconds { get; set; } = 300;
}

4.6 Configuring Dependency Injection

Add the following to Program.cs:

using Microsoft.Extensions.Options;
using System.Net.Http.Headers;
using Refit;
using QwenASR.Client.Services;

var builder = WebApplication.CreateBuilder(args);

// Bind the service options
builder.Services.Configure<ASRServiceOptions>(
    builder.Configuration.GetSection("ASRService"));

// Register a named HttpClient
builder.Services.AddHttpClient("ASRService", (serviceProvider, client) =>
{
    var options = serviceProvider.GetRequiredService<IOptions<ASRServiceOptions>>().Value;
    client.BaseAddress = new Uri(options.BaseUrl);
    client.Timeout = TimeSpan.FromSeconds(options.TimeoutSeconds);
    client.DefaultRequestHeaders.Accept.Add(
        new MediaTypeWithQualityHeaderValue("application/json"));
});

// Register the Refit client
builder.Services.AddRefitClient<IASRService>()
    .ConfigureHttpClient((serviceProvider, client) =>
    {
        var options = serviceProvider.GetRequiredService<IOptions<ASRServiceOptions>>().Value;
        client.BaseAddress = new Uri(options.BaseUrl);
        client.Timeout = TimeSpan.FromSeconds(options.TimeoutSeconds);
    });

// Register the wrapper service
builder.Services.AddScoped<ASRService>();

// Controllers
builder.Services.AddControllers();

// Swagger
builder.Services.AddEndpointsApiExplorer();
builder.Services.AddSwaggerGen();

var app = builder.Build();

if (app.Environment.IsDevelopment())
{
    app.UseSwagger();
    app.UseSwaggerUI();
}

app.UseHttpsRedirection();
app.UseAuthorization();
app.MapControllers();
app.Run();

Then add the configuration to appsettings.json:

{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft.AspNetCore": "Warning"
    }
  },
  "ASRService": {
    "BaseUrl": "http://localhost:8000",
    "TimeoutSeconds": 300
  },
  "AllowedHosts": "*"
}

4.7 Creating the Controller

Create Controllers/TranscriptionController.cs:

using Microsoft.AspNetCore.Mvc;
using QwenASR.Client.Models;
using QwenASR.Client.Services;

namespace QwenASR.Client.Controllers;

[ApiController]
[Route("api/[controller]")]
public class TranscriptionController : ControllerBase
{
    private readonly ASRService _asrService;
    private readonly ILogger<TranscriptionController> _logger;

    public TranscriptionController(
        ASRService asrService,
        ILogger<TranscriptionController> logger)
    {
        _asrService = asrService;
        _logger = logger;
    }

    [HttpPost("transcribe")]
    public async Task<ActionResult<TranscriptionResponse>> Transcribe(
        IFormFile file,
        [FromForm] string? language = null,
        [FromForm] bool returnTimestamps = false)
    {
        if (file == null || file.Length == 0)
        {
            return BadRequest("Please upload a valid audio file");
        }

        try
        {
            var result = await _asrService.TranscribeFileAsync(file, language, returnTimestamps);
            return Ok(result);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Transcription failed");
            return StatusCode(500, new TranscriptionResponse
            {
                Success = false,
                Message = $"Transcription failed: {ex.Message}"
            });
        }
    }

    [HttpPost("transcribe/local")]
    public async Task<ActionResult<TranscriptionResponse>> TranscribeLocalFile(
        [FromBody] LocalFileRequest request)
    {
        if (string.IsNullOrEmpty(request.FilePath))
        {
            return BadRequest("The file path must not be empty");
        }

        try
        {
            var result = await _asrService.TranscribeLocalFileAsync(
                request.FilePath, request.Language, request.ReturnTimestamps);
            return Ok(result);
        }
        catch (FileNotFoundException ex)
        {
            return NotFound(new TranscriptionResponse { Success = false, Message = ex.Message });
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Local-file transcription failed");
            return StatusCode(500, new TranscriptionResponse
            {
                Success = false,
                Message = $"Transcription failed: {ex.Message}"
            });
        }
    }

    [HttpPost("transcribe/batch")]
    public async Task<ActionResult<BatchTranscriptionResponse>> TranscribeBatch(
        List<IFormFile> files)
    {
        if (files == null || files.Count == 0)
        {
            return BadRequest("Please upload at least one audio file");
        }

        if (files.Count > 10)
        {
            return BadRequest("At most 10 files per request");
        }

        try
        {
            var result = await _asrService.TranscribeBatchFilesAsync(files);
            return Ok(result);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Batch transcription failed");
            return StatusCode(500, new BatchTranscriptionResponse
            {
                Results = files.Select(f => new BatchTranscriptionResult
                {
                    Filename = f.FileName,
                    Success = false,
                    Error = ex.Message
                }).ToList()
            });
        }
    }

    [HttpGet("health")]
    public async Task<ActionResult> HealthCheck()
    {
        try
        {
            // Kept simple here; in production reuse IHttpClientFactory and the configured BaseUrl
            using var client = new HttpClient();
            var response = await client.GetAsync("http://localhost:8000/");
            if (response.IsSuccessStatusCode)
            {
                return Ok(new { Status = "Healthy", Message = "ASR service is running" });
            }
            return StatusCode(503, new { Status = "Unhealthy", Message = "ASR service unavailable" });
        }
        catch (Exception ex)
        {
            return StatusCode(503, new { Status = "Unhealthy", Message = $"Could not reach the ASR service: {ex.Message}" });
        }
    }
}

public class LocalFileRequest
{
    public string FilePath { get; set; } = string.Empty;
    public string? Language { get; set; }
    public bool ReturnTimestamps { get; set; }
}

5. Testing and Usage

5.1 Starting the Services

First make sure the Python service is running:

# inside the Python virtual environment
python asr_service.py

Then start the .NET service:

dotnet run

5.2 Testing with Swagger

Open http://localhost:5000/swagger to see all of the API endpoints we created.

To test single-file transcription:

  1. Expand the /api/Transcription/transcribe endpoint
  2. Click "Try it out"
  3. Upload an audio file (wav, mp3, etc.)
  4. Optionally specify a language code ("zh" for Chinese, "en" for English)
  5. Click "Execute"

To test batch transcription:

  1. Expand the /api/Transcription/transcribe/batch endpoint
  2. Upload several audio files
  3. Click "Execute"
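The batch test can also be scripted rather than clicked through. A sketch against the .NET endpoint, under some assumptions: port 5000 as in the Swagger URL above, placeholder filenames, and the requests library as the HTTP client.

```python
DOTNET_URL = "http://localhost:5000"  # the .NET service, as in the Swagger URL above

def build_batch_payload(named_blobs, content_type="audio/wav"):
    """Turn (filename, bytes) pairs into the multipart 'files' list that the
    requests library expects: every part shares the field name 'files',
    matching the List<IFormFile> files parameter in the controller."""
    return [("files", (name, data, content_type)) for name, data in named_blobs]

def transcribe_batch(paths):
    """POST several audio files to the batch endpoint (service must be running)."""
    import requests  # third-party: pip install requests
    blobs = [(p.split("/")[-1], open(p, "rb").read()) for p in paths]
    resp = requests.post(f"{DOTNET_URL}/api/Transcription/transcribe/batch",
                         files=build_batch_payload(blobs))
    resp.raise_for_status()
    return resp.json()["results"]

# Example, with both services up:
#   for r in transcribe_batch(["a.wav", "b.wav"]):
#       print(r["filename"], "->", r["text"])
```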

5.3 Calling from Code

You can also call the service directly from your own code:

// Add a test endpoint in Program.cs
app.MapPost("/test", async (ASRService asrService) =>
{
    // Transcribe a local file
    var result = await asrService.TranscribeLocalFileAsync(
        "test_audio.wav",
        language: "zh",
        returnTimestamps: true);

    Console.WriteLine($"Text: {result.Text}");
    Console.WriteLine($"Language: {result.Language}");

    if (result.Timestamps != null)
    {
        foreach (var ts in result.Timestamps)
        {
            Console.WriteLine($"[{ts.Start:F2}s-{ts.End:F2}s] {ts.Text}");
        }
    }

    return result;
});

6. Performance Tuning and Caveats

6.1 Performance Tips

  1. Use the vLLM backend: if a GPU is available, the vLLM build is dramatically faster
  2. Batch your requests: for multiple files, prefer the batch endpoint to cut network overhead
  3. Set sensible timeouts: long audio takes a while to transcribe, so raise the timeouts accordingly
  4. Manage memory: dispose file streams promptly to avoid leaks
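On tip 2: the controller above caps a batch at 10 files, so longer file lists need to be split before calling the batch endpoint. A small helper can do that; the default of 10 simply mirrors the controller's limit.

```python
def chunk(items, size=10):
    """Split a list into consecutive batches of at most `size` items.
    The default of 10 mirrors the per-request cap in the controller."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [items[i:i + size] for i in range(0, len(items), size)]
```

Each chunk then becomes one call to the batch endpoint, sequentially or across a few parallel workers.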

6.2 Troubleshooting

Problem 1: the Python service fails to start

  • Check that CUDA is available: torch.cuda.is_available()
  • Check that all the Python packages installed correctly
  • Check whether the port is already taken
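The port check is easy to script with the standard library alone; the CUDA check has to run inside the service's virtual environment. A sketch:

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if something already accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0

# CUDA availability, run inside qwen_asr_env:
#   python -c "import torch; print(torch.cuda.is_available())"

# Example: port_in_use(8000) is True when asr_service.py is running
```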

Problem 2: transcription is slow

  • Confirm that the vLLM backend is actually in use
  • Check that there is enough free GPU memory
  • Consider the 0.6B model rather than the 1.7B variant

Problem 3: accuracy is poor

  • Make sure the audio is reasonably clean, without heavy background noise
  • Specify the language explicitly to avoid auto-detection mistakes
  • For a specific dialect, try its corresponding language code

6.3 Production Deployment Tips

  1. Containerize with Docker: package the Python service as a Docker image for easy deployment
  2. Add load balancing: under heavy concurrency, run several Python service instances
  3. Add monitoring and logging: track success rate, response time, and similar metrics
  4. Implement retries and circuit breaking: retry automatically when the network is flaky
  5. Add authentication and rate limiting: protect the API from abuse
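On point 4, the Python side can start with something as simple as a retry decorator with exponential backoff; a full circuit breaker (or Polly on the .NET side) would be the next step. This is a minimal sketch, not production-grade resilience code.

```python
import functools
import time

def retry(times=3, delay=0.1, backoff=2.0,
          exceptions=(ConnectionError, TimeoutError)):
    """Retry a call up to `times` times, multiplying the wait by `backoff`
    after each failed attempt; re-raise once the attempts are exhausted."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, times + 1):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == times:
                        raise
                    time.sleep(wait)
                    wait *= backoff
        return wrapper
    return decorator

# Usage sketch: wrap the HTTP call to the ASR service
@retry(times=3, delay=0.5)
def call_asr_service():
    ...  # e.g. requests.post("http://localhost:8000/transcribe", ...)
```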

7. Extensions

7.1 Adding Streaming Recognition

Qwen3-ASR supports streaming recognition. As a first step, we can extend the API with an endpoint that accepts raw audio bytes:

# Add to asr_service.py
from fastapi import Body

@app.post("/transcribe_stream")
async def transcribe_stream(
    audio_stream: bytes = Body(...),
    language: Optional[str] = None
):
    """Recognize a raw chunk of audio bytes."""
    # Write the byte payload to a temporary file
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(audio_stream)
        tmp_path = tmp.name

    try:
        results = model.transcribe(
            audio=tmp_path,
            language=language
        )
        os.unlink(tmp_path)
        if results:
            return {"text": results[0].text, "language": results[0].language}
        return {"text": "", "language": ""}
    except Exception as e:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise HTTPException(status_code=500, detail=str(e))

7.2 Adding WebSocket Support

For real-time recognition scenarios, we can add WebSocket support:

from fastapi import WebSocket

@app.websocket("/ws/transcribe")
async def websocket_transcribe(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            # Receive a chunk of audio data
            data = await websocket.receive_bytes()

            # Write it to a temporary file and recognize it
            with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
                tmp.write(data)
                tmp_path = tmp.name

            results = model.transcribe(audio=tmp_path)
            os.unlink(tmp_path)

            if results:
                await websocket.send_json({
                    "text": results[0].text,
                    "language": results[0].language
                })
    except Exception as e:
        await websocket.close(code=1011, reason=str(e))
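On the client side, the audio has to be chopped into chunks before being sent over the socket. A hedged sketch: the frame size (3,200 bytes, about 100 ms of 16 kHz, 16-bit mono PCM) and the websockets package are my own choices, and since the server treats each message as a standalone clip, in practice each chunk would need to be decodable on its own.

```python
def frame_audio(pcm, frame_bytes=3200):
    """Split raw audio bytes into fixed-size frames.
    3200 bytes is about 100 ms of 16 kHz, 16-bit, mono PCM."""
    return [pcm[i:i + frame_bytes] for i in range(0, len(pcm), frame_bytes)]

async def stream_audio(pcm):
    """Send frames to the WebSocket endpoint and print each reply
    (requires the service to be running)."""
    import websockets  # third-party: pip install websockets
    async with websockets.connect("ws://localhost:8000/ws/transcribe") as ws:
        for frame in frame_audio(pcm):
            await ws.send(frame)
            print(await ws.recv())

# Example, with the service up:
#   import asyncio
#   asyncio.run(stream_audio(open("clip.wav", "rb").read()))
```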

7.3 Adding Caching

For identical audio content, a cache avoids redundant recognition:

import hashlib
from typing import Optional

# In-memory cache mapping (audio hash, language) -> result.
# In production, use an external cache such as Redis with an eviction policy.
_cache: dict = {}

def get_audio_hash(audio_data: bytes) -> str:
    """Hash of the raw audio bytes."""
    return hashlib.md5(audio_data).hexdigest()

def cached_transcribe(audio_data: bytes, language: Optional[str] = None) -> dict:
    """Transcribe with a cache in front of the model:
    identical audio is only recognized once."""
    key = (get_audio_hash(audio_data), language)
    if key not in _cache:
        # Same flow as /transcribe: write to a temp file and run the model
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
            tmp.write(audio_data)
            tmp_path = tmp.name
        try:
            results = model.transcribe(audio=tmp_path, language=language)
        finally:
            os.unlink(tmp_path)
        if results:
            _cache[key] = {"text": results[0].text, "language": results[0].language}
        else:
            _cache[key] = {"text": "", "language": ""}
    return _cache[key]
