# 基本概念

字符串匹配 (string-matching) 问题又称字符串搜索 (string-searching) 问题或模式匹配 (pattern-matching) 问题。可以描述为在一个字符串 $S$ 中寻找字符串 $T$ 的问题。在字符串匹配问题中，我们称 $S$ 为主串 (main string), 长度为 $n$ , 而称子串 $T$ 为模式串 (pattern string), 长度为 $m$ . 此外，我们记字符串的字母表 (alphabet) 为 $\varSigma$ , 大小为 $k$ .

# 复杂度

基本的字符串算法及复杂度如下表所示：

算法	预处理时间	匹配时间	空间
Brute-force (暴力算法)	none	$O(mn)$	none
Rabin-Karp	$\Theta(m)$	平均 $\Theta(n)$ , 最差 $O(mn)$	$O(1)$
Knuth-Morris-Pratt (KMP)	$\Theta(m)$	$\Theta(n)$	$\Theta(m)$

# Rabin-Karp 算法

算法伪代码 (Python) 如下：

	def RabinKarp(s, t):
	n = len(s)
	m = len(t)
	h_t = hash(t)
	for i in range(n-m+1):
	h_s = hash(s[i:i+m])
	if h_s == h_t:
	if s[i:i+m] == t
	return i
	return -1

可以看出，算法的复杂度主要取决于其哈希算法的复杂度。若一次 hash(s[i,i+m]) 的复杂度是 $O(m)$ 的，那么最终的复杂度是 $O(mn)$ 的。若 hash(s[i+1, i+m+1]) 可以由 hash(s[i,i+m]) 在常数时间内推导得到，且可以有效避免哈希碰撞，那么算法的复杂度为 $O(n)$ 的。

# KMP 算法

KMP 算法的核心是 next 数组的计算。 next[i] 表示模式串前 $i-1$ ( $i \geq 1$ ) 位中最大公共前后缀的长度加一。例如，对 ABABC , 有 next[4] = 2 , next[5] = 3 , next[2] = next[3] = next[6] = 1 . 此外，习惯设 next[1] = 0 .

	def get_next():
	i = 1
	j = 0
	next[1] = 0

用 C++：

	void get_next(string t, int next[]) {
	int i=1, j=0;
	next[1] = 0;
	while (i<t.length()) {
	if (j==0 \|\| t[i] == t[j]) {
	next[++i] = ++j;
	}
	else j = next[j];
	}
	}

也可以

	void get_next(string t, int next[]) {
	int j=0;
	next[1] = 0;
	for (int i=2;i<=t.length();i++) {
	while (j && t[i]!=t[j+1]) {
	j = next[j];
	}
	if (b[j+1]==b[i]) next[i] = ++j;
	}
	}

kmp 算法
字符串学习笔记・浅析 KMP—— 单模式串匹配算法

# Reference

Wikipedia: String-searching algorithm

# 基本概念

# 复杂度

# Rabin-Karp 算法

# KMP 算法

# Reference

vector

STL set & multiset