正規語言相關文章參考資料為陳穎平教授上課講義

L7 Time Complexity

Measuring Complexity

Definition

Let MM be a deterministic Turing machine that halts on all inputs. The running time or time complexity of MM is the function f:NNf : N → N , where f(n)f (n) is the maximum number of steps that MM uses on any input of length nn. If f(n)f (n) is the running time of MM , we say that MM runs in time f(n)f (n) and that MM is an f(n)f (n) time Turing machine. Customarily we use nn to represent the length of the input.

  • In worst-case analysis, we consider the longest running time of all inputs of a particular length.
  • In average-case analysis, we consider the average of all the running times of inputs of a particular length.

Big-O and Small-O Notation

In asymptotic analysis (漸進分析), we seek to understand the running time of the algorithm when it is run on large inputs: Considering the highest order term.

Definition

Let ff and gg be functions f,g:NR+f, g : N → R^+. Say that f(n)=O(g(n))f (n) = O(g(n)) if positive integers cc and n0n_0 exist such that for every integer nn0n ≥ n_0, f(n)cg(n)f (n) ≤ cg(n) .

When f(n)=O(g(n))f (n) = O(g(n)) we say that g(n)g(n) is an upper bound for f(n)f (n), or more precisely, that g(n)g(n) is an asymptotic upper bound for f(n)f (n), to emphasize that we are suppressing constant factors.

loglog 有關的函數的時候,通常可以直接忽略底數,因為 logbn=log2nlog2b\log_bn = \frac{\log_2n}{\log_2b},可以透過除以常數來轉換底數

  • Polynomial bounds: ncn^c for some c>0c > 0.
  • Exponential bounds: 2(nδ)2^{(n^δ)} for some δ>0δ > 0.
Definition

Let ff and gg be functions f,g:NR+f, g : N → R^+. Say that f(n)=o(g(n))f (n) = o(g(n)) if limnf(n)g(n)=0\begin{aligned}\lim_{n\to \infty} \tfrac{f(n)}{g(n)}=0\end{aligned}

In other words, f(n)=o(g(n))f (n) = o(g(n)) means that, for any real number c>0c > 0, a number n0n_0 exists, where f(n)<cg(n)f (n) < cg(n) for all nn0n ≥ n_0. 沒有等號

Analyzing Algorithms

Definition

Let t:NR+t : N → R^+ be a function. Define the time complexity class, TIME(t(n)), to be the collection of all languages that are decidable by an O(t(n))O(t(n)) time Turing machine.

A={0k1k  k0}A = \{0^k1^k~|~k ≥ 0\}

  1. A single-tape TM M1M_1 for AA:
    • O(n2)O(n^2)
    • ATIME(n2)A ∈ TIME(n^2)
  1. A single-tape TM M2M_2 for AA:
    • O(nlogn)O(n\log n)
    • ATIME(nlogn)A ∈ TIME(n\log n)
  1. A two-tape TM M3M_3 for AA:
    • O(n)O(n): linear time
    • ATIME(n)A ∈ TIME(n)

Complexity Relationships among Models

The time complexity relationships for the three models:

  • the single-tape Turing machine;
  • the multitape Turing machine;
  • the nondeterministic Turing machine.
Theorem

Let t(n)t(n) be a function , where t(n)nt(n) ≥ n. Then every t(n)t(n) time multitape Turing machine has an equivalent O(t2(n))O(t^2(n)) time single-tape Turing machine.

  • 證明: 可以用 single-tape Turing machine 來模擬 multitape Turing machine,每次 multitape Turing machine 的操作在模擬的 TM 上需要 t(n)t(n) 的時間,故相乘之後需要 O(t2(n))O(t^2(n)) 的時間
  • 僅考慮 t(n)nt(n) ≥ n 的情況,因為如果不是 t(n)nt(n) ≥ n 的話,代表輸出結果不需要遍歷整個 input string,這個情況過於簡單,不需考慮
Definition

Let NN be a nondeterministic Turing machine that is a decider. The running time of NN is the function f:NNf : N → N , where f(n)f (n) is the maximum number of steps that NN uses on any branch of its computation on any input of length nn.

Theorem

Let t(n)t(n) be a function , where t(n)nt(n) ≥n. Then every t(n)t(n) time nondeterministic single-tape Turing machine has an equivalent 2O(t(n))2^{O(t(n))} time deterministic single-tape Turing machine.

  • 證明: 可以用 deterministic single-tape Turing machine 來模擬 nondeterministic single-tape Turing machine,把它當作一個 queue 來 BFS 遍歷所有可能性,假設每個 state 最多有 bb 個可能性,則會有 O(bt(n))O(b^{t(n)}) 種可能需要遍歷 (等比級數),而每次操作需要 t(n)t(n) 的時間,故總共要 O(t(n)bt(n))=2O(t(n))O(t(n)b^{t(n)}) = 2^{O(t(n))} 的時間

The Class P

  • Single-tape TM vs. Multitape TM: At most a polynomial difference.
  • Deterministic TM vs. Nondeterministic TM: At most an exponential difference.

Polynomial Time

Polynomial differences in running time are considered to be small, whereas exponential differences are considered to be large.

All reasonable deterministic computational models are polynomially equivalent. We focus on aspects of time complexity theory that are unaffected by polynomial differences in running time.

Definition

PP is the class of languages that are decidable in polynomial time on a deterministic single-tape Turing machine. In other words, P=kTIME(nk)P = \bigcup_{k} TIME(n^k)

  • Invariant for all polynomially equivalent models. 使用的 moodel 不會影響是否屬於 P
  • Corresponding to realistically solvable problems. 理論上 solvable

Examples of Problems in P

Describe algorithms with numbered stages for analysis:

  • A polynomial upper bound on the number of stages.
  • The individual stages can be run in polynomial time.

Have to use reasonable encoding methods, such as base kk notation for any k2k ≥ 2 for integers. (至少使用 2 進位來表示數字)

定義 PATH={G,s,t  G is a directed graph that has a directed path from s to t}PATH = \{ ⟨G, s, t⟩~|~G~is~a~directed~graph~that~has~a~directed~path~from~s~to~t \}

Theorem

PATHPPATH ∈ P

  • 證明: Use a graph-searching method.
    • number of stages: 1(1) + m(3) + 1(4)
    • time of individual stages: polynomial time

定義 RELPRIME={x,y  x and y are relatively prime}RELPRIME = \{ ⟨x, y⟩~|~x~and~y~are~relatively~prime \}

Theorem

RELPRIMEPRELPRIME ∈ P

  • 證明: Use the Euclidean algorithm. (輾轉相除法)
    • EE 中的每次操作至少會使 xx 變為原本的一半,故 stage 2,3 總時間會小於 2log2x and 2log2y2\log_2x~and~2\log_2y,因為計算時間複雜度時比較的對象是 tape input 長度 (log2x+log2y)(\log_2x+\log_2y) (假設為 2 進位),故整體為 polynomial (liner) time

Theorem

Every context-free language is a member of PP.

  • 證明: Use the CYK algorithm.
    • CYK 需要填入 O(n2)O(n^2) 個格子,每次填入需要檢查 O(n)O(n) 種組合,共 O(n3)O(n^3) 的時間,為 polynomial time

The Class NP

定義 HAMPATH={G,s,t  G is a directed graph with a Hamiltonian path from s to t}HAMPATH = \{ ⟨G, s, t⟩~|~G~is~a~directed~graph~with~a~Hamiltonian~path~from~s~to~t \}

  • Hamiltonian path: 經過所有 node

定義 COMPOSITES={x  x=pq,for integers,p,q>1}COMPOSITES = \{ ⟨x⟩~|~x = pq, for~integers, p, q > 1 \}

以上兩者都有 polynomial verifiability 的特性,可以在 polynomial time 驗證給定 certificate 是否正確。

Definition

A verifier for language AA is an algorithm VV , where A={w  V accepts w,c for some string c}A = \{ w~|~V~accepts~⟨w, c⟩~for~some~string~c \} .
We measure the time of a verifier only in terms of the length of ww, so a polynomial time verifier runs on polynomial time in the length of ww. A language AA is polynomially verifiable if it has a polynomial time verifier. cc is called a certificate or proof .

Definition

NP is the class of languages that have polynomial time verifiers. The term NP comes from nondeterministic polynomial time.

Theorem

A language is in NP iff it is decided by some nondeterministic polynomial time Turing machine.

  • 證明: Convert a polynomial time verifier to an equivalent polynomial time NTM and vice versa.
Definition

NTIME(t(n))={L  L is a language decided by a O(t(n)) time nondeterministic Turing machine.}NTIME(t(n)) = \{L~|~L~is~a~language~decided~by~a~O(t(n))~time~nondeterministic~Turing~machine.\} .

Corollary

NP=kNTIME(nk)NP = \bigcup_{k} NTIME(n^k)

Examples of Problems in NP

定義 CLIQUE={G,k  G is an undirected graph with a kclique}CLIQUE = \{ ⟨G, k⟩~|~G~is~an~undirected~graph~with~a~k-clique \}

  • k-clique: 大小為 k 的團 (團內每個點彼此互相連接)
Theorem

CLIQUENPCLIQUE ∈ NP

  • 證明: The clique is the certificate.

定義 SUBSETSUM={S,t  S={x1,...,xk} and for some {y1,...,yl}{x1,...,xk},we have Σyi=t}SUBSET-SUM = \{ ⟨S, t⟩~|~S = \{ x_1, ... , x_k \}~and~for~some~\{ y_1, ... , y_l \} ⊆ \{ x_1, ... , x_k \} , we~have~Σ_{y_i} = t \}

  • Note that {x1,...,xk}\{ x_1, ... , x_k \} and {y1,...,yl}\{ y_1, ... , y_l \} are considered
    multisets. (同一元素可以出現多次)
Theorem

SUBSETSUMNPSUBSET-SUM ∈ NP

  • 證明: The subset is the certificate.

coNP contains the languages that are complements of languages in NP, such as CLIQUE\overline{CLIQUE} and SUBSETSUM\overline{SUBSET-SUM}. We don’t know whether coNP is different from NP.

The P versus NP Question

  • PP = the class of languages for which membership can be decided quickly.
  • NPNP = the class of languages for which membership can be verified quickly.

目前不知道是以下哪一種情況:

For now, we can prove that NPEXPTIME=kTIME(2nk)NP ⊆ EXPTIME = \bigcup_{k} TIME(2^{n^k}), but we don’t know whether NP is contained in a smaller deterministic time complexity class.

NP-completeness

NP-complete problem: satisfiability problem.
SAT={ϕ  ϕ is a satisfiable Boolean formula}SAT = \{⟨ϕ⟩~|~ϕ~is~a~satisfiable~Boolean~formula\}

Theorem

SATP iff P=NPSAT ∈ P~iff~P = NP.

Polynomial Time Reducibility

Definition

A function f:ΣΣf : Σ^∗ → Σ^∗ is a polynomial time computable function if some polynomial time Turing machine MM exists that halts with just f(w)f (w) on its tape, when started on any input ww.

  • computable function 加上 polynomial time 的限制
Definition

Language AA is polynomial time mapping reducible, or simply polynomial time reducible, to language BB, written APBA ≤_P B, if a polynomial time computable function f:ΣΣf : Σ^∗ → Σ^∗ exists, where for every ww, $w ∈ A ⇔ f (w) ∈ B $.
The function ff is called the polynomial time reduction of AA to BB.

  • mapping reducible 加上 polynomial time 的限制
Theorem

If APBA ≤_P B and BPB ∈ P, then APA ∈ P.

  • 證明: 假設 BBpolynomial time algorithm MM 存在,則可以利用 MM 以及 polynomial time computable function ff 建立 AApolynomial time algorithm NN

定義:

  • Literal: x or xx~or~\overline{x}
  • Clause: (x1  x2x3x4)(x_1~∨~\overline{x_2}∨\overline{x_3}∨x_4)
  • Conjunctive normal form, cnf-formula: (x1x2x3x4)(x3x5x6)(x3x6)(x_1∨\overline{x_2}∨\overline{x_3}∨x_4) ∧ (x_3∨\overline{x_5}∨x_6) ∧ (x_3∨\overline{x_6})
  • 3cnf-formula: (x1x2x3)(x3x5x6)(x3x6x4)(x4x5x6)(x_1∨\overline{x_2}∨\overline{x_3})∧(x_3∨\overline{x_5}∨x_6)∧(x_3∨\overline{x_6}∨x_4)∧(x_4∨x_5∨x_6)

3SAT={ϕ  ϕ is a satisfiable 3cnfformula}3SAT = \{⟨ϕ⟩~|~ϕ~is~a~satisfiable~3cnf-formula\}

Theorem

3SAT3SAT is polynomial time reducible to CLIQUECLIQUE.

  • 證明:
    • 將每個 Clause 內的 Literal 當成一個 node
    • 將所有 node 互相連接,除了:
      • 同個 Clause 內的 Literal
      • 互為相反的 Literal (x,xx,\overline{x})
    • 如果存在大小為 k (Clause 個數) 的團,則代表 satisfiable

Definition of NP-completeness

Definition

A language B is NP-complete if it satisfies two conditions:

  • B is in NP;
  • every A in NP is polynomial time reducible to B.

NP-hard: Every AA in NP is polynomial time reducible to BB.

Theorem

If BB is NP-complete and BPB ∈ P, then P=NPP = NP.

  • 證明: This theorem follows directly from the definition of polynomial time reducibility.
Theorem

If BB is NP-complete and BPCB ≤_P C for CC in NP, then CC is NP-complete.

  • 證明: Polynomial time reducible to B, then to C.

The Cook-Levin Theorem

Theorem

SATSAT is NP-complete.

  • 證明:
    • 先證明 SATSAT is in NP
    • 接著證明所有 A in NP is polynomial time reducible to SATSAT
      • Boolean formula 表示 AANTMcomputation history 是否合法
        • 檢查 symbol (每個格子只能有一個 symbol 為 true)
        • 檢查起始狀態
        • 檢查狀態轉移
        • 檢查結束狀態
      • 如果 computation history 合法,則該 Boolean formulasatisfiable
      • 得到 polynomial time reduction of AA to SATSAT
Theorem

3SAT3SAT is NP-complete.

  • 證明:
    • 可以將之前的 Boolean formula 轉為 3cnf-formula

Additional NP-complete Problems

Corollary

CLIQUECLIQUE is NP-complete.

  • 證明: 已知
    • CLIQUENPCLIQUE ∈ NP
    • 3SAT3SAT is polynomial time reducible to CLIQUECLIQUE.

定義 VERTEXCOVER={G,k  G is an undirected graph that has a knode vertex cover}VERTEX-COVER = \{⟨G, k⟩~|~G~is~an~undirected~graph~that~has~a~k-node~vertex~cover\}

  • k-node vertex cover 代表只要選擇 kk 個 node,將其所有相鄰邊塗黑,可以塗黑所有邊
Theorem

VERTEXCOVERVERTEX-COVER is NP-complete.

  • 證明: Reduce 3SAT3SAT to VERTEXCOVERVERTEX-COVER.
    • 將所有正反 variable 當成 node 置於上方
    • 將每個 Clause 內的 Literal 當成一個 node 置於下方,並將 Clause 內的 node 互相連接
    • 將上方的 Literal 與下方的相同 Literal 連接
    • 如果這個圖為 k-node vertex cover 則代表 satisfiable
    • k=m+2lk = m + 2l
      • mmVariable 個數: cover 上方的邊 & 為 true 的 Literal
        • 正反 variable 至少要有一個塗黑
      • llClause 個數: cover 為 false 的 Literal
        • 至少兩個 node 塗黑才能 cover 整個 Clause

The Hamiltonian Path Problem

Theorem

HAMPATHHAMPATH is NP-complete.

  • 證明: Reduce 3SAT3SAT to HAMPATHHAMPATH.
    • 將每個 Variable 化為一個菱形圖
      • 左邊為 true,右邊為 fasle
      • 中間 node 總數為 3k+13k + 1
        • kk 為包含該 Variable 的 Clause 數量
    • 將每個 Clause 化作一個 node
      • 連接 Clause 與剛剛的菱形圖
        • 正反 Variable 的方向要相反
    • 這個圖存在 Hamiltonian path 從 s 到 t,則代表 satisfiable

Theorem

UNHAMPATHUNHAMPATH is NP-complete.

  • 證明: Reduce HAMPATHHAMPATH to UNHAMPATHUNHAMPATH.
    • 把原本 HAMPATH 中的每個 node 分解 (除了 s,ts, t)
      • 把 node uu 變為 uin,umid,uoutu^{in}, u^{mid}, u^{out}
      • 三個 node 相連 (uinumiduoutu^{in}-u^{mid}-u^{out})
    • 原有的路徑 s,u1,u2,...,uk,ts, u_1, u_2, ... , u_k, t 變為 sout,u1in,u1mid,u1out,u2in,u2mid,u2out,...,tins^{out}, u_1^{in}, u_1^{mid}, u_1^{out}, u_2^{in}, u_2^{mid}, u_2^{out}, ... , t^{in}

The Subset Sum Problem

Theorem

SUBSETSUMSUBSET-SUM is NP-complete.

  • 證明: Reduce 3SAT3SAT to SUBSETSUMSUBSET-SUM.
    • 化成一個表格
      • 大小為 ((2l+2k)(l+k)(2l + 2k) * (l + k))
        • ll 為 Variable 數量
        • kk 為 Clause 數量
      • 左下角留空
      • 左上和右下填法相同
      • 右上角根據 Clause 與 Variable 對應填入
    • 表格的上方會決定 Variable 選擇正反
      • 左邊總和為 1: 每個 Variable 不是正就是反
      • 右半總和為 1~3: Clause 內至少一個為 true
    • 表格的下方會把右半部分全部補成 3