news 2026/4/17 18:27:05

MAI-UI的prompt

作者头像

张小明

前端开发工程师

1.2k 24
文章封面图
MAI-UI的prompt

MAI-UI prompt.py

1、主要看第三种Prompt ——MAI_MOBILE_SYS_PROMPT_ASK_USER_MCP,内容详细点

2、从Prompt看出,可用APPs主要是英文类

3、这里面的Mobile Use可以看做是 一个MCP Tool

4、和Open-AutoGLM相比,实现了ask_user(对应的是 interact动作),没有 take_over 动作

第一种 MAI_MOBILE_SYS_PROMPT

分成以下4个部分:

1、身份:

You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task.

2、输出格式要求:

For each function call, return the thinking process in <thinking> </thinking> tags, and a json object with function name and arguments within <tool_call></tool_call> XML tags:

<thinking>...</thinking><tool_call>{"name":"mobile_use","arguments":<args-json-object>}</tool_call>

3、动作空间(10个):

这里的动作类型和其他prompt不同,尤其注意。

{"action":"click","coordinate":[x,y]}{"action":"long_press","coordinate":[x,y]}{"action":"type","text":""}{"action":"swipe","direction":"up or down or left or right","coordinate":[x,y]}# "coordinate" is optional. Use the "coordinate" if you want to swipe a specific UI element.{"action":"open","text":"app_name"}{"action":"drag","start_coordinate":[x1,y1],"end_coordinate":[x2,y2]}{"action":"system_button","button":"button_name"}# Options: back, home, menu, enter{"action":"wait"}{"action":"terminate","status":"success or fail"}{"action":"answer","text":"xxx"}# Use escape characters \\', \\", and \\n in text part to ensure we can parse the text in normal python string format.

4、备注:

  • 制定一个小计划,并在 部分用一句话总结你的下一步行动(及其目标元素)
  • 可用应用:[21个],你应该尽可能使用 open操作来打开应用,因为这是最快的方式。 (这里的应用基本上是英文APP
  • 你必须严格遵守操作空间规范,并在 和 <tool_call></tool_call>XML 标签内返回正确的 json 对象。
-Write a small planandfinallysummarize yournextaction(withits target element)inone sentencein<thinking></thinking>part.-Available Apps:`["Camera","Chrome","Clock","Contacts","Dialer","Files","Settings","Markor","Tasks","Simple Draw Pro","Simple Gallery Pro","Simple SMS Messenger","Audio Recorder","Pro Expense","Broccoli APP","OSMand","VLC","Joplin","Retro Music","OpenTracks","Simple Calendar Pro"]`.You should use the `open` action toopenthe appaspossibleasyou can,because itisthe fast way toopenthe app.-You must follow the Action Space strictly,andreturnthe correct jsonobjectwithin<thinking></thinking>and<tool_call></tool_call>XML tags.

第二种 MAI_MOBILE_SYS_PROMPT_NO_THINKING

1、身份:和第一种相同

2、输出格式要求:与第一种相比,少了 <think> 内容

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:

<tool_call>{"name":"mobile_use","arguments":<args-json-object>}</tool_call>

3、动作空间:和第一种相同

4、备注:与第一种相比,少了plan那一句

-Available Apps:`["Camera","Chrome","Clock","Contacts","Dialer","Files","Settings","Markor","Tasks","Simple Draw Pro","Simple Gallery Pro","Simple SMS Messenger","Audio Recorder","Pro Expense","Broccoli APP","OSMand","VLC","Joplin","Retro Music","OpenTracks","Simple Calendar Pro"]`.You should use the `open` action toopenthe appaspossibleasyou can,because itisthe fast way toopenthe app.-You must follow the Action Space strictly,andreturnthe correct jsonobjectwithin<thinking></thinking>and<tool_call></tool_call>XML tags.

第三种 MAI_MOBILE_SYS_PROMPT_ASK_USER_MCP

分成以下5个部分

1、身份:和第一种相同

2、输出格式要求:和第一种相同

3、动作空间(12个):

与第一种相比,多了ask_userdouble_click两个动作

ask_user 是 Agent面对不确定的情况向用户做出提问

{"action":"click","coordinate":[x,y]}{"action":"long_press","coordinate":[x,y]}{"action":"type","text":""}{"action":"swipe","direction":"up or down or left or right","coordinate":[x,y]}# "coordinate" is optional. Use the "coordinate" if you want to swipe a specific UI element.{"action":"open","text":"app_name"}{"action":"drag","start_coordinate":[x1,y1],"end_coordinate":[x2,y2]}{"action":"system_button","button":"button_name"}# Options: back, home, menu, enter{"action":"wait"}{"action":"terminate","status":"success or fail"}{"action":"answer","text":"xxx"}# Use escape characters \\', \\", and \\n in text part to ensure we can parse the text in normal python string format.{"action":"ask_user","text":"xxx"}# you can ask user for more information to complete the task.{"action":"double_click","coordinate":[x,y]}

4、MCP工具:这一部分是本prompt特有的

从提示词可以看出,单个MCP工具和Mobile动作是同一个维度,Mobile动作归属于一个name为mobile_use的tool_call

{%iftools-%}## MCP ToolsYou are also providedwithMCP tools,you can use them to complete the task.{{tools}}If you want to use MCP tools,you must outputasthe followingformat:
<thinking>...</thinking><tool_call>{"name":<function-name>,"arguments":<args-json-object>}</tool_call>
{%endif-%}

5、备注

  • 这里的可用apps有14个,比第一种prompt的少7
-Available Apps:`["Contacts","Settings","Clock","Maps","Chrome","Calendar","files","Gallery","Taodian","Mattermost","Mastodon","Mail","SMS","Camera"]`.-Write a small planandfinallysummarize yournextaction(withits target element)inone sentencein<thinking></thinking>part.

第四种 MAI_MOBILE_SYS_PROMPT_GROUNDING

比较简单

任务: 给定一张截图和用户的定位指令。你的任务是根据用户的指令准确定位一个UI元素。 首先,你需要仔细查看截图并分析用户的指令,将用户的指令转化为有效的推理过程,然后提供最终的坐标。
You are a GUI grounding agent.## TaskGiven a screenshotandthe user's grounding instruction. Your task is to accurately locate a UI element based on the user's instructions.First,you should carefully examine the screenshotandanalyze the user's instructions, translate the user's instruction into a effective reasoning process,andthen provide the final coordinate.## Output FormatReturn a jsonobjectwitha reasoning processin<grounding_think></grounding_think>tags,a[x,y]formatcoordinate within<answer></answer>XML tags:<grounding_think>...</grounding_think><answer>{"coordinate":[x,y]}</answer>
版权声明: 本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若内容造成侵权/违法违规/事实不符,请联系邮箱:809451989@qq.com进行投诉反馈,一经查实,立即删除!
网站建设 2026/4/18 6:27:34

claude-code-mcp:打造高效AI编程助手的完整指南

claude-code-mcp&#xff1a;打造高效AI编程助手的完整指南 【免费下载链接】claude-code-mcp Claude Code as one-shot MCP server 项目地址: https://gitcode.com/gh_mirrors/claud/claude-code-mcp claude-code-mcp是一款革命性的MCP服务器工具&#xff0c;它通过一键…

作者头像 李华
网站建设 2026/4/18 7:52:20

API文档编写规范:让开发者更快接入TTS服务

API文档编写规范&#xff1a;让开发者更快接入TTS服务 在语音合成&#xff08;Text-to-Speech, TTS&#xff09;服务的工程落地中&#xff0c;API文档的质量直接决定了开发者的接入效率与使用体验。尤其当服务基于复杂模型&#xff08;如Sambert-Hifigan&#xff09;并集成Web…

作者头像 李华
网站建设 2026/4/18 6:59:42

Aurora终极指南:5分钟掌握AI助手完整部署教程

Aurora终极指南&#xff1a;5分钟掌握AI助手完整部署教程 【免费下载链接】aurora free 项目地址: https://gitcode.com/GitHub_Trending/aur/aurora Aurora是一个开源的AI助手框架&#xff0c;专为开发者和技术爱好者设计。该项目采用模块化架构&#xff0c;支持多种AI…

作者头像 李华
网站建设 2026/4/18 7:51:30

Qwen3-235B技术突破:高效AI推理的全新范式

Qwen3-235B技术突破&#xff1a;高效AI推理的全新范式 【免费下载链接】Qwen3-235B-A22B-Instruct-2507-FP8 项目地址: https://ai.gitcode.com/hf_mirrors/Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 在人工智能技术快速演进的当下&#xff0c;阿里云通义千问团队正式推…

作者头像 李华
网站建设 2026/4/17 21:37:52

跨平台音频开发实战指南:5分钟快速上手RtAudio

跨平台音频开发实战指南&#xff1a;5分钟快速上手RtAudio 【免费下载链接】rtaudio A set of C classes that provide a common API for realtime audio input/output across Linux (native ALSA, JACK, PulseAudio and OSS), Macintosh OS X (CoreAudio and JACK), and Windo…

作者头像 李华
网站建设 2026/4/18 7:50:13

车载语音系统雏形:导航提示+音乐播报一体化实现

车载语音系统雏形&#xff1a;导航提示音乐播报一体化实现 &#x1f4cc; 引言&#xff1a;让车载语音更“懂”你的情绪 在智能座舱的演进过程中&#xff0c;语音交互正从“能听会说”向“有情感、懂语境”迈进。传统的TTS&#xff08;Text-to-Speech&#xff09;系统往往输出…

作者头像 李华