判定问题与语言, 递归可枚举，非递归可枚举，对角语言（理论计算机基础复习六)-程序员充电站

A decision problem requires checking if an input (string) has some property. Thus, a decision problem is a function from strings to boolean.
一个判定问题要求检查输入（字符串）是否具有某种性质。因此，判定问题可看作从字符串到布尔值的函数。
A decision problem is represented as a formal language consisting of those strings (inputs) on which the answer is yes.
一个判定问题可表示为一个形式语言：该语言恰好由那些答案为yes 的输入字符串组成（满足条件的字符串）

Recursive Enumerability 递归可枚举性

A Turing Machine on input w either accepts (and halts), or rejects (and halts), or never halts.
图灵机在输入 w 上要么接受（并停机），要么拒绝（并停机），要么永不停机。我们叫w 是具有递归可枚举性质的，或者叫w 是图灵机可识别的。

The language of a Turing Machine M, denoted as L(M), is the set of all strings w on which M accepts.
图灵机 M 的语言记作 L(M)，它是所有会被 M 接受的字符串 w 的集合。

A language L is recursively enumerable or Turing recognizable if there is a Turing Machine M such that L(M) = L.
若存在图灵机 M 使得 L(M) = L，则语言 L 称为递归可枚举（或图灵可识别）。

Decidability 可判定性

A language L is decidable if there is a Turing machine M such that L(M) = L andM halts on every input.
若存在图灵机 M，使得L(M) = L 且 M 对每个输入都停机，则语言 L 是可判定的。

Thus, if L is decidable then L is recursively enumerable.
因此，若 L 可判定，则 L 一定是递归可枚举的。

Undecidability 不可判定性

Definition: A language L is undecidable if L is not decidable. Thus, there is no Turing machine M that halts on every input and L(M) = L.
定义：若语言 L 不是可判定的，则称 L 不可判定。因此，不存在图灵机 M 既对每个输入都停机又满足 L(M) = L。

This means either L is not recursively enumerable, i.e. there is no Turing machine M such that L(M) = L
这意味着要么 L 不是递归可枚举的，即不存在图灵机 M 使得 L(M) = L

L is recursively enumerable but not decidable, i.e. for any Turing machine M with L(M) = L, M does not halt on some inputs.
L 是递归可枚举但不可判定，也就是对任何满足 L(M) = L 的图灵机 M，M 在某些输入上不停止。

Machines as Strings 把机器编码成字符串

For the rest of this lecture, let us fix theinput alphabet to be {0,1}; a string over any alphabet can be encoded in binary.
在本讲其余部分，固定输入字母表为 {0,1}；任意字母表上的字符串都可编码为二进制。

Any Turing Machine or program M can itself be encoded as a binary string. Moreover, every binary string can be thought of as encoding a TM or program (if not in the correct format, it is considered the encoding of a default TM).
任意图灵机或程序 M 本身都可编码为二进制串。此外，每个二进制串都可视作某个 TM 或程序的编码（若格式不合法，则视为某个默认 TM 的编码）。

We will consider decision problems (languages) whose inputs are Turing machines (encoded as binary strings).
我们将考虑这类判定问题（语言）：其输入本身是图灵机（以二进制串编码）。

The Diagonal Language 对角语言

Definition: L_d = { M | M ∉ L(M) }.
定义：L_d = { M | M ∉ L(M) }。

Thus, L_d is the collection of Turing machines (programs) M such that M does not halt and accept when given itself as input.
因此，L_d 是这样一类图灵机（程序）M 的集合：当把 M 自身作为输入时，M 不会停机并接受

核心逻辑分析：

输入的特殊性：在图灵机理论中，程序 M 本身可以被编码成字符串。因此，我们可以把程序 M 的代码作为输入数据，喂给 M 自己去运行。

判定标准：

如果 M 运行自己的代码后，进入了“接受”状态，那么 M 就不属于 L_d。

如果 M 运行自己的代码后，结果是“拒绝”或者“死循环（不停机）”，那么 M 就属于 L_d。

为什么叫“对角”：想象一个表格，行和列都是所有的图灵机。第 i 行第 j 列表示“图灵机 i 是否接受字符串 j”。当 i = j 时，就构成了表格的对角线。L_d 关注的就是这些对角线上的元素。

额外的例子：

想象有一类特殊的程序员，他们的工作是测试程序。

有些程序员很自恋，他们写的程序在测试自己的代码时，总是显示“通过（接受）”。
有些程序员很严谨或很奇怪，他们写的程序在测试自己的代码时，总是显示“报错（拒绝）”或者干脆“卡死（死循环）”。

对角语言 L_d 就是“所有用自己的程序测试自己的程序会报错或卡死的程序”的集合。

A non-Recursively Enumerable Language 非递归可枚举语言（L_d）

Proposition: L_d is not recursively enumerable.
命题：L_d 不是递归可枚举的。

Proof.

Recall that,
回忆如下：
Inputs are strings over {0,1}.
输入是 {0,1} 上的字符串。

Every Turing Machine can be described by a binary string, and every binary string can be viewed as a Turing Machine.
每个图灵机都可由某个二进制串描述；每个二进制串也都可看作某个图灵机。

In what follows, we denote the i-th binary string in lexicographic order as the number i. Thus we can say j ∈ L(i), meaning that the Turing machine corresponding to the i-th binary string accepts the j-th binary string.
下文把按字典序第 i 个二进制串记作编号 i。于是可写 j ∈ L(i)，表示「第 i 个二进制串所对应的图灵机」接受「第 j 个二进制串」。

Completing the proof
补全证明

Proof (contd).
证明（续）

We can organize all programs and inputs as an infinite matrix, where the (i, j)-th entry is Y if and only if j ∈ L(i).
把所有程序与所有输入排成一张无穷矩阵：第 (i, j) 格为 Y，当且仅当 j ∈ L(i)。

Suppose L_d is recognized by a Turing machine that is the j-th binary string, i.e. L_d = L(j).
假设 L_d 能被「第 j 个二进制串所对应的」某台图灵机识别，即 L_d = L(j)。

But j ∈ L_d iff j ∉ L(j)!
但 j ∈ L_d 当且仅当 j ∉ L(j)！

（图中矩阵：列为输入 1, 2, 3, …；行为机器 1, 2, 3, …；对角线 (i, i) 用红框标出；其余格为 Y/N。）
（In the figure: columns are inputs 1, 2, 3, …; rows are TMs 1, 2, 3, …; diagonal (i, i) is boxed in red; entries are Y/N.）

Acceptor for L_d?
L_d 的判定器（接受器）？

Consider the following program:
考虑下面程序：

On input i
输入为 i 时

Run program i on i
运行程序 i 在输入 i 上

Output “yes” if i does not accept i
若 i 不接受 i，则输出 “yes”

Output “no” if i accepts i
若 i 接受 i，则输出 “no”

Does the above program recognize L_d?
No, because it may never output “yes” if i does not halt on i.

上面的程序是否识别 L_d？
不是，因为如果 i 在 i 上不停机，它可能永远输出不了 “yes”。、

Recursively Enumerable but not Decidable 递归可枚举但不可判定

L_d is not recursively enumerable, and therefore not decidable. Are there languages that are recursively enumerable but not decidable?
L_d 不是递归可枚举的，因而也不是可判定的。是否存在「递归可枚举但不可判定」的语言？

Yes: A_TM = { ⟨M, w⟩ | M is a TM and M accepts w }.
有：A_TM = { ⟨M, w⟩ | M 是 TM 且 M 接受 w }。

The Universal Language 通用语言（接受问题）

Proposition: A_TM is recursively enumerable but not decidable.
命题：A_TM 是递归可枚举的（可识别），但不是可判定的。

Proof: A_TM is r.e.
证明：A_TM 是递归可枚举的（可识别，有3种结果，停机接受，停机拒绝，陷入循环）。

On input ⟨M, w⟩
输入 ⟨M, w⟩

Run M on input w
在输入 w 上运行 M

Output “yes” if M accepts w
若 M 接受 w，则输出 “yes”

Output “no” if M rejects w
若 M 拒绝 w，则输出 “no”

（注意：若 M 在 w 上不停机，则该过程也不会停；因此这给出的是识别器而非判定器。）
（Note: if M loops on w, this procedure does not halt; hence this is a recognizer, not a decider.）

Proof: A_TM is not decidable
证明：A_TM 不可判定。

Suppose (for contradiction) A_TM is decidable. Then there is a TM M that always halts and L(M) = A_TM. Consider TM D as follows:
为证矛盾，设 A_TM 可判定，则存在总停机的 TM M 且 L(M) = A_TM。
(Atm 可判定意味着存在一个图灵机，比如U，它都能在有限时间内告诉你结果（接受或拒绝），绝对不会死循环。)通过将问题转化（归约）为我们之前学过的“对角语言L d L_dLd”来制造矛盾

定义 TM D 如下：

On input i
输入为 i 时

Run M on input ⟨i, i⟩ （⟨i, i⟩ 意思是<待测试的机器，机器的输入数据>）
在输入 ⟨i, i⟩ 上运行 M

Output “yes” if i rejects i
若 i 拒绝 i，则输出 “yes”

Output “no” if i accepts i
若 i 接受 i，则输出 “no”

Observe that L(D) = L_d! But L_d is not r.e., which is a contradiction.
可见 L(D) = L_d！但 L_d 不是递归可枚举的，矛盾。所有最初的假设——Atm可判定是错误的，即，Atm是不可判定。

Languages ⊃ Recursively Enumerable ⊃ Decidable ⊃ Regular.
语言类 ⊃ 递归可枚举 ⊃ 可判定 ⊃ 正则。

Outside Recursively Enumerable (still languages): examples include L_d and ĀTM (complement of A_TM).
在「仍为语言、但不属于递归可枚举」的区域：例如 L_d 与 ĀTM（A_TM 的补语言）。

Inside Recursively Enumerable but not Decidable: example A_TM.
递归可枚举但不可判定：例如 A_TM。

Inside Decidable but not Regular: example L_0ⁿ1ⁿ.
可判定但不正则：例如 L_0ⁿ1ⁿ。

Innermost: Regular.
最内层：正则语言类。