正規語言相關文章參考資料為陳穎平教授上課講義

Regular Languages

Finite Automata

有限自動機
or DFA: Deterministic finite automata 決定性有限自動機

Formal Definition of Finite Automata

Definition

A finite automaton is a 5-tuple (Q,Σ,δ,q0,F)(Q, Σ, δ, q_0, F), where

  1. QQ: is a finite set called the states;
  2. ΣΣ: is a finite set called the alphabet;
  3. δδ: Q×ΣQQ × Σ → Q is the transition function;
  4. q0Qq_0 ∈ Q: is the start state;
  5. FQF ⊆ Q: is the set of accept states or final states.

Formal Definition of Computation

Finite automaton M=(Q,Σ,δ,q0,F)M = (Q, Σ, δ, q_0, F) accepts string w=w1w2wnw = w_1w_2 ··· w_n if a sequence of states r0,r1,,rnr_0, r_1, ··· , r_n in QQ exists with three conditions:

  1. r0=q0r_0 = q_0; 滿足起始狀態
  2. δ(ri,wi+1)=ri+1,i=0,...,n1δ(r_i, w_{i+1}) = r_{i+1}, ∀i = 0, ... , n − 1; 滿足狀態轉移
  3. rnFr_n ∈ F . 滿足結束狀態
  • If A is the set of all strings that machine M accepts, we say that A is the language of machine M and write L(M)=AL(M) = A .
    • M accepts string
    • M recognizes/accepts set of strings(language)
  • A machine always recognizes only one language.
Definition

A language is called a regular language if some finite automaton recognizes it.

The Regular Operations

Definition

Let A and B be languages, let’s define the following three regular operations:

  • Union: AB={x  xA or xB}A ∪ B = \{x~|~x ∈ A~or~x ∈ B\};
  • Concatenation: AB={xy  xA and yB}A ◦ B = \{xy~|~x ∈ A~and~y ∈ B\};
  • Star: A={x1x2xk  k0 and each xiA}A^∗ = \{x_1x_2 ··· x_k~|~k ≥ 0~and~each~x_i ∈ A\}.
Definition

The class of regular languages is closed under the union/concatenation/Star operation.

Nondeterminism

NFA: Nondeterministic finite automata 非決定性有限自動機

  • Deterministic computation: Determined next move.
  • Nondeterministic computation: Several choices may exist for the next move. (類似 fork a process)

Formal Definition of NFAs

Definition

A nondeterministic finite automaton is a 5-tuple (Q,Σ,δ,q0,F)(Q, Σ, δ, q_0, F), where

  1. QQ: is a finite set of states;
  2. ΣΣ: is a finite alphabet;
  3. δδ: Q×ΣεP(Q)Q × Σ_ε → P(Q) is the transition function;
  4. q0q_0 ∈ Q: is the start state;
  5. FQF ⊆ Q: is the set of accept states.

Formal Definition of Computation

Nondeterministic finite automaton N=(Q,Σ,δ,q0,F)N = (Q, Σ, δ, q_0, F ) accepts string w=y1y2ymw = y_1y_2 ··· y_m, where yiΣεy_i ∈ Σ_ε, if a sequence of states r0,r1,,rmr_0, r_1, ··· , r_m in QQ exists with three conditions:

  1. r0=q0r_0 = q_0;
  2. ri+1δ(ri,yi+1),i=0,...,m1r_{i+1} ∈ δ(r_i, y_{i+1}), ∀i = 0, ... , m − 1;
  3. rmFr_m ∈ F .
  • 在 NFA 定義的 3. δδ: Q×ΣεP(Q)Q × Σ_ε → P(Q)
    • ΣεΣ_ε 除了原有的 alphabet 之外還允許 εε (empty string)
    • P(Q)P(Q) 指的是 QQPower set,代表有多種可能
  • 在 Computation 定義的 2. ri+1δ(ri,yi+1)r_{i+1} ∈ δ(r_i, y_{i+1})
    • 使用 符號是因為只要滿足狀態轉移中的某個可能就好
    • string w=y1y2ymw = y_1y_2 ··· y_m 指的不一定是輸入的字串本身,mm 也不一定為字串長度,因為可能包含 εε

Equivalence of NFAs and DFAs

Theorem

Every DFA has an equivalent NFA.

  • 容易證明 (DFA 是 NFA 的特例)
  • 只需要修改 transition function
Theorem

Every NFA has an equivalent DFA.

  • 證明: 將一個 NFA: N=(Q,Σ,δ,q0,F)N = (Q, Σ, δ, q_0, F) 改寫為 DFA: M=(Q,Σ,δ,q0,F)M = (Q', Σ, δ', q_0', F')
    • Q=P(Q)Q' = P(Q)
    • δ(R,a)=rRδ(r,a)δ'(R,a) = ⋃_{r∈R}δ(r,a)
      • ex: δ({q1,q2},a)=δ(q1,a)δ(q2,a)δ'(\{q_1, q_2\},a) = δ(q_1,a)∪δ(q_2,a)
    • q0={q0}q_0' = \{q_0\}
    • F={RQ  R contain an accept state of N}F' = \{R∈Q'~|~R~contain~an~accept~state~of~N\}
  • 考慮加上 εε 的情況
    • 定義 E(R)E(R): R stateR~state 經過 0 至多個 εε arrows 之後所能到達的狀態的集合
    • δ(R,a)=rRE(δ(r,a))δ'(R,a) = ⋃_{r∈R}E(δ(r,a))
    • q0=E({q0})q_0' = E(\{q_0\})

實際轉換時要注意:

  • 如果一個 state 對於某個操作沒有定義 (沒有 arrow),則應該指向 ∅ state
  • 完成圖之後可以把只有向外 arrow 且不是 start state 的 states 刪除 (不可能到達)
  • 記得每次狀態轉換時要考慮 εε
Corollary

A language is regular if and only if some nondeterministic finite automaton recognizes it.

Closure under the Regular Operations

根據以上性質,我們可以透過建立 NFA 的方式來證明 Regular Operations 的封閉性

Regular Expressions

RE: 正則表達式 - 用於表達某個特定 language

Formal Definition (Inductive Definition)

Definition

RR is a regular expression if RR is

  1. aa for some aa in the alphabet Σ [L(a)={a}]Σ~[L(a) = \{a\}]; 包含 1 個長度為 1 的字串
    • regular expression aa 所表達的 language 是 {a}\{a\}
  2. ε [L(ε)={ε}]ε~[L(ε) = \{ε\}]; 包含 1 個長度為 0 的字串
  3. ∅ [L()=]∅~[L(∅) = ∅]; 包含 0 個字串 (空集合)
  4. (R1R2)(R_1 ∪ R_2), where R1R_1 and R2R_2 are regular expressions
    [L(R1R2)=L(R1)L(R2)][L(R_1 ∪ R_2) = L(R_1) ∪ L(R_2)]; 類似加法
  5. (R1R2)(R_1 ◦ R_2), where R1R_1 and R2R_2 are regular expressions
    [L(R1R2)=L(R1)L(R2)][L(R_1 ◦ R_2) = L(R_1) ◦ L(R_2)]; 類似乘法
  6. (R1)(R_1^*), where R1R_1 is a regular expression [L(R1)=(L(R1))][L(R_1^*) = (L(R_1))^∗].

Equivalence with Finite Automata

Theorem

A language is regular if and only if some regular expression describes it.

Lemma

If a language is described by a regular expression, then it is regular.

  • 證明: Use an NFA to recognize the language described by a regular expression.
    • 根據 regular expression 的定義:
      1. R=a (aΣ)R=a~(a∈Σ), then [L(a)={a}][L(a) = \{a\}]
      2. R=εR=ε, then [L(ε)={ε}][L(ε) = \{ε\}]
      3. R=R=∅, then [L()={}][L(∅) = \{∅\}]
      4. R=R1R2R=R_1 ∪ R_2 已經證過其封閉性
      5. R=R1R2R=R_1 ◦ R_2 已經證過其封閉性
      6. R=R1R=R_1^* 已經證過其封閉性
Lemma

If a language is regular, then it is described by a regular expression.

  • 證明: Convert DFAs into regular expressions.

首先定義 GNFA 幫助轉換:

Definition

A generalized nondeterministic finite automaton is a 5-tuple (Q,Σ,δ,qstart,qaccept)(Q, Σ, δ, q_{start}, q_{accept}), where

  1. QQ: is a finite set of states;
  2. ΣΣ: is the input alphabet;
  3. δδ: (Q{qaccept})×(Q{qstart})R(Q − \{q_{accept}\}) × (Q − \{q_{start}\}) → R is the transition function;
  4. qstartq_{start}: is the start state;
  5. qacceptq_{accept}: is the accept state. 唯一的接受狀態

A GNFA accepts a string ww in ΣΣ^∗ if w=w1w2wkw = w_1w_2 ··· w_k, where each wiΣw_i ∈ Σ^∗ and a sequenceof states q0,q1,,qkq_0, q_1, ··· , q_k exists such that

  1. q0=qstartq_0 = q_{start};
  2. qk=qacceptq_k = q_{accept};
  3. for each ii, we have wiL(Ri)w_i ∈ L(R_i), where Ri=δ(qi1,qi)R_i = δ(q_{i−1}, q_i).

注意:

  • GNFA 每條邊上的是 regular expressions 而不是 symbol
  • wiΣw_i ∈ Σ^∗ 所代表的是一段能夠被 regular expressions 描述的子字串,而不是單一 symbol
  • Every DFA has an equivalent GNFA.
    • 容易證明 (DFA 是 GNFA 的特例)
    • 加上一個 start state,用 ε arrowε~arrow 與原有的 q0q_0 連接
    • 加上一個 accept state,用 ε arrowε~arrow 與原有的 FF 每個點連接
    • 將原本 transition functionsymbol 改為 regular expressions
  • CONVERT(G) 是用來刪除節點的函式
    • Claim For any GNFA GG, CONVERT(G) is equivalent to GG.
  • DFA 轉為 GNFA,之後利用 CONVERT(G) 不斷刪除節點,直到只剩下加上的 start stateaccept state 時,兩者之間的就是其對應的 regular expressions
    • 刪除節點的順序可能會影響最終的 regular expressions,但是他們描述的 language 是相同的

Nonregular Languages

我們要如何去證明一個 LanguagesNonregular Languages 呢?
ex: B={0n1n n0}.B = \{ 0^n1^n|~n ≥ 0 \} .

Pumping Lemma

Theorem

If AA is a regular language, then there is a number p (the pumping length) where, if ss is any string in AA of length at least pp, then ss may be divided into three pieces, s=xyzs = xyz, satisfying the following conditions:

  1. for each i0i ≥ 0, xyizAxy^iz ∈ A;
  2. y>0|y| > 0; y 不為空字串
  3. xyp|xy| ≤ p. 在 p 範圍內一定會有重複
  • 證明: 將 AADFA 表示,並且此 DFAstates 數量為 p,如果一個 string ss 的長度大於 p,則其至少需要經過 p + 1 個 states,根據鴿籠原理必定有 state 是重複經過的,則重複的 state 之間那段就是 y,可以對 y 進行 Pumping 且保持為 regular language

注意:

  • regular language 一定滿足 Pumping Lemma
  • Pumping Lemmaregular language 的必要條件,不是充分條件
    • 如果一個 language 滿足 Pumping Lemma,其不一定為 regular language

有了 Pumping Lemma,我們可以利用反證法去證明一個 Languagesnonregular languages:

  1. Assume the language, say, BB, is regular in order to obtain a contradiction.
  2. By the pumping lemma, a pumping length p exists, and any string wBw ∈ B can be pumped if wp|w| ≥ p.
  3. Find a string sBs ∈ B, sp|s| ≥ p, that ss cannot be pumped as described in the pumping lemma. 尋找反例
  4. The contradiction is obtained, and therefore, BB is proved to be nonregular.

Closure Properties

Let AA and BB be regular languages. The results of the
following operations are all regular languages:

  • Union: ABA ∪ B;
  • Concatenation: ABAB;
  • Star: AA^∗;
  • Intersection: ABA ∩ B;
  • Complement: A\overline A (i.e., ΣAΣ^∗ − A);
    • 證明: 將 DFAaccept states 改為 F=QFF' = Q - F
  • Difference: ABA − B (AB=ABA − B = A ∩ \overline B);
  • Reversal: ARA^R;
    • 證明: 將原本的 DFA 修改為 NFA
      • 加上一個 start state,用 ε arrowε~arrow 與原有的 FF 每個點連接
      • 加上一個 accept state,用 ε arrowε~arrow 與原有的 q0q_0 連接
      • 將原本 transition function 的指向對調 (箭頭方向顛倒)
  • Homomorphism: h(A)h(A);
    • A string substitution such that each symbol is replaced by a single string. That is, h:ΣΠh : Σ → Π^∗ .
    • 證明: RE 中的 symbol 經過替換之後會形成新的 RE 用於描述新產生的 languages
  • Inverse homomorphism: h1(A)h^{−1}(A).
    • 所有會映射到 AA 的 string 的集合
    • h1(AΠ)={w  wΣ, h(w)A}h^{−1}(A ⊆ Π^∗) = \{ w~|~w ∈ Σ^∗,~h(w) ∈ A \} .
    • 證明: 根據在 ΠΠ^∗ 上面的 DFA,以及 symbol 對應 string 的關係,可以建立一個在 ΣΣ^∗ 上面的 DFA

Myhill-Nerode Theorem

首先定義 indistinguishable 這個關係,主要可以用來將 string 分組:
Relation defined by a given language: Let xx and yy be strings and LL be any language. We say that xx and yy are distinguishable by LL if some string zz exists whereby exactly one of the strings xzxz and yzyz is a member of LL;

otherwise, for every string zz, we have xzLxz ∈ L whenever yzLyz ∈ L and we say that xx and yy are indistinguishable by LL. If xx and yy are indistinguishable by LL we write xLyx ≡_L y. In other words, for x,yΣx, y ∈ Σ^∗, xLyx ≡_L y iff for each zΣz ∈ Σ^∗ : xzLyzLxz ∈ L ⇔ yz ∈ L .

L≡_L is an equivlance relation because obviously

  1. Reflexive: xLxx ≡_L x;
  2. Symmetric: xLyyLxx ≡_L y ⇒ y ≡_L x;
  3. Transitive: xLyyLzxLzx ≡_L y ∧ y ≡_L z ⇒ x ≡_L z.
Theorem

Let LL be a language and let XX be a set of strings. Say that XX is pairwise distinguishable by LL if every two distinct strings in XX are distinguishable by LL. Define the index of L to be the maximum number of elements in any set that is pairwise distinguishable by LL. The index of L may be finite or infinite.

  1. If LL is recognized by a DFA with kk states, LL has index at most kk.
  2. If the index of LL is a finite number kk, it is recognized by a DFA with kk states.
  3. LL is regular iff it has finite index. Moreover, its index is the size of the smallest DFA recognizing it.

注意:

  • Myhill-Nerode Theoremregular language 的充分必要條件
  • 可以找出某些 Pumping Lemma 無法辨別出來的 nonregular language

有了 Myhill-Nerode Theorem,我們可以證明一個 Languagesregular language 或是 nonregular languages:

  1. Given a langauge BB, construct the equivalence relation B≡_B by using BB.
  2. Prove that BB has finite or infinite index:
  • Finite: BB is a regular language.
  • Infinite: BB is a nonregular language.

Minimization of DFAs

根據 Myhill-Nerode Theorem 我們可以知道 index of L 是能夠 accept LL 的最小 DFA 的 states 數量,如果有一個 DFA 的 states 數量比 index of L 多,但是也能夠 accept LL,這時這個 DFA 內部有一些 states 實際上是等價的,可以將其進行合併操作,使其最小化

Equivalent states: State pp and state qq are said to be
equivalent if for all wΣw ∈ Σ^∗, δ^(p,w)Fδ^(q,w)F\hat δ(p, w) ∈ F ⇔ \hat δ(q, w) ∈ F ; and distinguishable otherwise.

Table-filling Algorithm

Given DFA A=(Q,Σ,δ,q0,F)A = (Q, Σ, δ, q_0, F ), the following algorithm finds all distinguishable pairs in AA.

  • Basis: If pFp ∈ F and qFq ∉ F , {p,q}\{ p, q \} is distinguishable.
  • Induction step: If {r,s}\{ r, s \} is distinguishable, and r=δ(p,a)r = δ(p, a), s=δ(q,a)s = δ(q, a), for some aΣa ∈ Σ, then {p,q}\{ p, q \} is distinguishable.

實際操作: 以一個 table (類似 queue 的功能) 來紀錄要檢查的 pair

  • 先將 accept states 與其餘 states 組成的 pairs 加上標示 (數字)
  • 從 table 中取出 pair,往前推前一步的 distinguishable pairs,並將其標上數字
  • 重複直到 table 上有數字的 pair 都檢查過
  • table 上沒有被標記過的 pair 即為 Equivalent states pairs

得到 Equivalent state pairs: {A,E}\{ A, E \}, {B,H}\{ B, H \}, and {D,F}\{ D, F \}.

Equivalent states in a given DFA can be found by the table-filling algorithm, and therefore, the DFA can be minimized by combining each set of equivalent states into one single state.

注意:

  • 如果要進行 Minimization 則找到的每個 Equivalent states pairs 都要進行合併,不能只合併某一些 pairs