正規語言相關文章參考資料為陳穎平教授上課講義

Regular Languages

Finite Automata

有限自動機
or DFA: Deterministic finite automata 決定性有限自動機

Formal Definition of Finite Automata

Definition

A finite automaton is a 5-tuple $(Q, Σ, δ, q_0, F)$ , where

$Q$ : is a finite set called the states;
$Σ$ : is a finite set called the alphabet;
$δ$ : $Q × Σ → Q$ is the transition function;
$q_0 ∈ Q$ : is the start state;
$F ⊆ Q$ : is the set of accept states or final states.

Formal Definition of Computation

Finite automaton $M = (Q, Σ, δ, q_0, F)$ accepts string $w = w_1w_2 ··· w_n$ if a sequence of states $r_0, r_1, ··· , r_n$ in $Q$ exists with three conditions:

$r_0 = q_0$ ; 滿足起始狀態
$δ(r_i, w_{i+1}) = r_{i+1}, ∀i = 0, ... , n − 1$ ; 滿足狀態轉移
$r_n ∈ F$ . 滿足結束狀態

If A is the set of all strings that machine M accepts, we say that A is the language of machine M and write $L(M) = A$ $L (M) = A$ .
- M accepts string
- M recognizes/accepts set of strings(language)
A machine always recognizes only one language.

Definition

A language is called a regular language if some finite automaton recognizes it.

The Regular Operations

Definition

Let A and B be languages, let’s define the following three regular operations:

Union: $A ∪ B = \{x~|~x ∈ A~or~x ∈ B\}$ ;
Concatenation: $A ◦ B = \{xy~|~x ∈ A~and~y ∈ B\}$ ;
Star: $A^∗ = \{x_1x_2 ··· x_k~|~k ≥ 0~and~each~x_i ∈ A\}$ .

Definition

The class of regular languages is closed under the union/concatenation/Star operation.

Nondeterminism

NFA: Nondeterministic finite automata 非決定性有限自動機

Deterministic computation: Determined next move.
Nondeterministic computation: Several choices may exist for the next move. (類似 fork a process)

Formal Definition of NFAs

Definition

A nondeterministic finite automaton is a 5-tuple $(Q, Σ, δ, q_0, F)$ , where

$Q$ : is a finite set of states;
$Σ$ : is a finite alphabet;
$δ$ : $Q × Σ_ε → P(Q)$ is the transition function;
$q_0$ ∈ Q: is the start state;
$F ⊆ Q$ : is the set of accept states.

Formal Definition of Computation

Nondeterministic finite automaton $N = (Q, Σ, δ, q_0, F )$ accepts string $w = y_1y_2 ··· y_m$ , where $y_i ∈ Σ_ε$ , if a sequence of states $r_0, r_1, ··· , r_m$ in $Q$ exists with three conditions:

$r_0 = q_0$ ;
$r_{i+1} ∈ δ(r_i, y_{i+1}), ∀i = 0, ... , m − 1$ ;
$r_m ∈ F$ .

在 NFA 定義的 3. $δ$ $δ$ : $Q × Σ_ε → P(Q)$ $Q \times Σ_{ε} \to P (Q)$
- $Σ_ε$ 除了原有的 alphabet 之外還允許 $ε$ (empty string)
- $P(Q)$ 指的是 $Q$ 的 Power set，代表有多種可能
在 Computation 定義的 2. $r_{i+1} ∈ δ(r_i, y_{i+1})$ $r_{i + 1} \in δ (r_{i}, y_{i + 1})$
- 使用 $∈$ 符號是因為只要滿足狀態轉移中的某個可能就好
- string $w = y_1y_2 ··· y_m$ 指的不一定是輸入的字串本身， $m$ 也不一定為字串長度，因為可能包含 $ε$

Equivalence of NFAs and DFAs

Theorem

Every DFA has an equivalent NFA.

容易證明 (DFA 是 NFA 的特例)
只需要修改 transition function

Theorem

Every NFA has an equivalent DFA.

證明: 將一個 NFA: $N = (Q, Σ, δ, q_0, F)$ $N = (Q, Σ, δ, q_{0}, F)$ 改寫為 DFA: $M = (Q', Σ, δ', q_0', F')$ $M = (Q^{'}, Σ, δ^{'}, q_{0}^{'}, F^{'})$
- $Q' = P(Q)$
- $δ'(R,a) = ⋃_{r∈R}δ(r,a)$ $δ^{'} (R, a) = ⋃_{r \in R} δ (r, a)$
  - ex: $δ'(\{q_1, q_2\},a) = δ(q_1,a)∪δ(q_2,a)$
- $q_0' = \{q_0\}$
- $F' = \{R∈Q'~|~R~contain~an~accept~state~of~N\}$
考慮加上 $ε$ $ε$ 的情況
- 定義 $E(R)$ : $R~state$ 經過 0 至多個 $ε$ arrows 之後所能到達的狀態的集合
- $δ'(R,a) = ⋃_{r∈R}E(δ(r,a))$
- $q_0' = E(\{q_0\})$

實際轉換時要注意:

如果一個 state 對於某個操作沒有定義 (沒有 arrow)，則應該指向 ∅ state
完成圖之後可以把只有向外 arrow 且不是 start state 的 states 刪除 (不可能到達)
記得每次狀態轉換時要考慮 $ε$

Corollary

A language is regular if and only if some nondeterministic finite automaton recognizes it.

Closure under the Regular Operations

根據以上性質，我們可以透過建立 NFA 的方式來證明 Regular Operations 的封閉性

Regular Expressions

RE: 正則表達式 - 用於表達某個特定 language

Formal Definition (Inductive Definition)

Definition

$R$ is a regular expression if $R$ is

$a$ $a$ for some $a$ $a$ in the alphabet $Σ~[L(a) = \{a\}]$ $Σ [L (a) = {a}]$ ; 包含 1 個長度為 1 的字串
- regular expression $a$ 所表達的 language 是 $\{a\}$
$ε~[L(ε) = \{ε\}]$ ; 包含 1 個長度為 0 的字串
$∅~[L(∅) = ∅]$ ; 包含 0 個字串 (空集合)
$(R_1 ∪ R_2)$ , where $R_1$ and $R_2$ are regular expressions
$[L(R_1 ∪ R_2) = L(R_1) ∪ L(R_2)]$ ; 類似加法
$(R_1 ◦ R_2)$ , where $R_1$ and $R_2$ are regular expressions
$[L(R_1 ◦ R_2) = L(R_1) ◦ L(R_2)]$ ; 類似乘法
$(R_1^*)$ , where $R_1$ is a regular expression $[L(R_1^*) = (L(R_1))^∗]$ .

Equivalence with Finite Automata

Theorem

A language is regular if and only if some regular expression describes it.

Lemma

If a language is described by a regular expression, then it is regular.

證明: Use an NFA to recognize the language described by a regular expression.
- 根據 regular expression 的定義:
  1. $R=a~(a∈Σ)$ , then $[L(a) = \{a\}]$
  2. $R=ε$ , then $[L(ε) = \{ε\}]$
  3. $R=∅$ , then $[L(∅) = \{∅\}]$
  4. $R=R_1 ∪ R_2$ 已經證過其封閉性
  5. $R=R_1 ◦ R_2$ 已經證過其封閉性
  6. $R=R_1^*$ 已經證過其封閉性

Lemma

If a language is regular, then it is described by a regular expression.

證明: Convert DFAs into regular expressions.

首先定義 GNFA 幫助轉換:

Definition

A generalized nondeterministic finite automaton is a 5-tuple $(Q, Σ, δ, q_{start}, q_{accept})$ , where

$Q$ : is a finite set of states;
$Σ$ : is the input alphabet;
$δ$ : $(Q − \{q_{accept}\}) × (Q − \{q_{start}\}) → R$ is the transition function;
$q_{start}$ : is the start state;
$q_{accept}$ : is the accept state. 唯一的接受狀態

A GNFA accepts a string $w$ in $Σ^∗$ if $w = w_1w_2 ··· w_k$ , where each $w_i ∈ Σ^∗$ and a sequenceof states $q_0, q_1, ··· , q_k$ exists such that

$q_0 = q_{start}$ ;
$q_k = q_{accept}$ ;
for each $i$ , we have $w_i ∈ L(R_i)$ , where $R_i = δ(q_{i−1}, q_i)$ .

注意:

GNFA 每條邊上的是 regular expressions 而不是 symbol
$w_i ∈ Σ^∗$ 所代表的是一段能夠被 regular expressions 描述的子字串，而不是單一 symbol

Every DFA has an equivalent GNFA.
- 容易證明 (DFA 是 GNFA 的特例)
- 加上一個 start state，用 $ε~arrow$ 與原有的 $q_0$ 連接
- 加上一個 accept state，用 $ε~arrow$ 與原有的 $F$ 每個點連接
- 將原本 transition function 的 symbol 改為 regular expressions
CONVERT(G) 是用來刪除節點的函式
- Claim For any GNFA $G$ , CONVERT(G) is equivalent to $G$ .
將 DFA 轉為 GNFA，之後利用 CONVERT(G) 不斷刪除節點，直到只剩下加上的 start state和 accept state 時，兩者之間的就是其對應的 regular expressions
- 刪除節點的順序可能會影響最終的 regular expressions，但是他們描述的 language 是相同的

Nonregular Languages

我們要如何去證明一個 Languages 是 Nonregular Languages 呢?
ex: $B = \{ 0^n1^n|~n ≥ 0 \} .$

Pumping Lemma

Theorem

If $A$ is a regular language, then there is a number p (the pumping length) where, if $s$ is any string in $A$ of length at least $p$ , then $s$ may be divided into three pieces, $s = xyz$ , satisfying the following conditions:

for each $i ≥ 0$ , $xy^iz ∈ A$ ;
$|y| > 0$ ; y 不為空字串
$|xy| ≤ p$ . 在 p 範圍內一定會有重複

證明: 將 $A$ 以 DFA 表示，並且此 DFA 的 states 數量為 p，如果一個 string $s$ 的長度大於 p，則其至少需要經過 p + 1 個 states，根據鴿籠原理必定有 state 是重複經過的，則重複的 state 之間那段就是 y，可以對 y 進行 Pumping 且保持為 regular language

注意:

regular language 一定滿足 Pumping Lemma
Pumping Lemma 是 regular language 的必要條件，不是充分條件
- 如果一個 language 滿足 Pumping Lemma，其不一定為 regular language

有了 Pumping Lemma，我們可以利用反證法去證明一個 Languages 是 nonregular languages:

Assume the language, say, $B$ , is regular in order to obtain a contradiction.
By the pumping lemma, a pumping length p exists, and any string $w ∈ B$ can be pumped if $|w| ≥ p$ .
Find a string $s ∈ B$ , $|s| ≥ p$ , that $s$ cannot be pumped as described in the pumping lemma. 尋找反例
The contradiction is obtained, and therefore, $B$ is proved to be nonregular.

Closure Properties

Let $A$ and $B$ be regular languages. The results of the
following operations are all regular languages:

Union: $A ∪ B$ ;
Concatenation: $AB$ ;
Star: $A^∗$ ;
Intersection: $A ∩ B$ ;
Complement: $\overline A$ $\overline{A}$ (i.e., $Σ^∗ − A$ $Σ^{*} - A$ );
- 證明: 將 DFA 的 accept states 改為 $F' = Q - F$
Difference: $A − B$ ( $A − B = A ∩ \overline B$ );
Reversal: $A^R$ $A^{R}$ ;
- 證明: 將原本的 DFA 修改為 NFA
  - 加上一個 start state，用 $ε~arrow$ 與原有的 $F$ 每個點連接
  - 加上一個 accept state，用 $ε~arrow$ 與原有的 $q_0$ 連接
  - 將原本 transition function 的指向對調 (箭頭方向顛倒)
Homomorphism: $h(A)$ $h (A)$ ;
- A string substitution such that each symbol is replaced by a single string. That is, $h : Σ → Π^∗$ .
- 證明: RE 中的 symbol 經過替換之後會形成新的 RE 用於描述新產生的 languages
Inverse homomorphism: $h^{−1}(A)$ $h^{- 1} (A)$ .
- 所有會映射到 $A$ 的 string 的集合
- $h^{−1}(A ⊆ Π^∗) = \{ w~|~w ∈ Σ^∗,~h(w) ∈ A \}$ .
- 證明: 根據在 $Π^∗$ 上面的 DFA，以及 symbol 對應 string 的關係，可以建立一個在 $Σ^∗$ 上面的 DFA

Myhill-Nerode Theorem

首先定義 indistinguishable 這個關係，主要可以用來將 string 分組:
Relation defined by a given language: Let $x$ and $y$ be strings and $L$ be any language. We say that $x$ and $y$ are distinguishable by $L$ if some string $z$ exists whereby exactly one of the strings $xz$ and $yz$ is a member of $L$ ;

otherwise, for every string $z$ , we have $xz ∈ L$ whenever $yz ∈ L$ and we say that $x$ and $y$ are indistinguishable by $L$ . If $x$ and $y$ are indistinguishable by $L$ we write $x ≡_L y$ . In other words, for $x, y ∈ Σ^∗$ , $x ≡_L y$ iff for each $z ∈ Σ^∗$ : $xz ∈ L ⇔ yz ∈ L$ .

$≡_L$ is an equivlance relation because obviously

Reflexive: $x ≡_L x$ ;
Symmetric: $x ≡_L y ⇒ y ≡_L x$ ;
Transitive: $x ≡_L y ∧ y ≡_L z ⇒ x ≡_L z$ .

Theorem

Let $L$ be a language and let $X$ be a set of strings. Say that $X$ is pairwise distinguishable by $L$ if every two distinct strings in $X$ are distinguishable by $L$ . Define the index of L to be the maximum number of elements in any set that is pairwise distinguishable by $L$ . The index of L may be finite or infinite.

If $L$ is recognized by a DFA with $k$ states, $L$ has index at most $k$ .
If the index of $L$ is a finite number $k$ , it is recognized by a DFA with $k$ states.
$L$ is regular iff it has finite index. Moreover, its index is the size of the smallest DFA recognizing it.

注意:

Myhill-Nerode Theorem 是 regular language 的充分必要條件
可以找出某些 Pumping Lemma 無法辨別出來的 nonregular language

有了 Myhill-Nerode Theorem，我們可以證明一個 Languages 是 regular language 或是 nonregular languages:

Given a langauge $B$ , construct the equivalence relation $≡_B$ by using $B$ .
Prove that $B$ has finite or infinite index:

Finite: $B$ is a regular language.
Infinite: $B$ is a nonregular language.

Minimization of DFAs

根據 Myhill-Nerode Theorem 我們可以知道 index of L 是能夠 accept $L$ 的最小 DFA 的 states 數量，如果有一個 DFA 的 states 數量比 index of L 多，但是也能夠 accept $L$ ，這時這個 DFA 內部有一些 states 實際上是等價的，可以將其進行合併操作，使其最小化

Equivalent states: State $p$ and state $q$ are said to be
equivalent if for all $w ∈ Σ^∗$ , $\hat δ(p, w) ∈ F ⇔ \hat δ(q, w) ∈ F$ ; and distinguishable otherwise.

Table-filling Algorithm

Given DFA $A = (Q, Σ, δ, q_0, F )$ , the following algorithm finds all distinguishable pairs in $A$ .

Basis: If $p ∈ F$ and $q ∉ F$ , $\{ p, q \}$ is distinguishable.
Induction step: If $\{ r, s \}$ is distinguishable, and $r = δ(p, a)$ , $s = δ(q, a)$ , for some $a ∈ Σ$ , then $\{ p, q \}$ is distinguishable.

實際操作: 以一個 table (類似 queue 的功能) 來紀錄要檢查的 pair

先將 accept states 與其餘 states 組成的 pairs 加上標示 (數字)
從 table 中取出 pair，往前推前一步的 distinguishable pairs，並將其標上數字
重複直到 table 上有數字的 pair 都檢查過
table 上沒有被標記過的 pair 即為 Equivalent states pairs

得到 Equivalent state pairs: $\{ A, E \}$ , $\{ B, H \}$ , and $\{ D, F \}$ .

Equivalent states in a given DFA can be found by the table-filling algorithm, and therefore, the DFA can be minimized by combining each set of equivalent states into one single state.

注意:

如果要進行 Minimization 則找到的每個 Equivalent states pairs 都要進行合併，不能只合併某一些 pairs