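Regular Expression Basics

Interviewer: "How do you use regular expressions in Java?"

The candidate, Xiao Wu, answers: "With the Pattern and Matcher classes."

The interviewer follows up: "What does Pattern.compile() do?"

Xiao Wu says: "It compiles the regular expression?"

The interviewer presses further: "Why precompile at all? What is wrong with compiling on every match?"

Xiao Wu has no answer.

[Interviewer's mindset] This question tests how well the candidate understands regex performance. A candidate who can explain why precompiling a Pattern pays off and how Matcher exposes capturing groups shows real performance awareness.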

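1. Core API 🔴

1.1 Pattern and Matcher

// ❌ Wrong: compiles the regex on every match
boolean matchedEachTime = input.matches("\\d+"); // String.matches() compiles the regex on every call!

// ✅ Right: precompile
Pattern pattern = Pattern.compile("\\d+"); // precompiled, compiled only once
Matcher matcher = pattern.matcher(input);
boolean match = matcher.matches();

// Common methods
matcher.matches();   // does the entire input match?
matcher.lookingAt(); // does a prefix of the input match? (anchored at the start only)
matcher.find();      // find the next match anywhere in the input
matcher.group();     // text of the previous successful match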
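
The difference between `matches()`, `lookingAt()`, and `find()` is easiest to see side by side. The following is a minimal sketch; the class name `MatcherMethodsDemo` and its helper methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherMethodsDemo {
    // Compiled once and shared; Pattern instances are immutable and thread-safe.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // True only if the entire input consists of digits.
    static boolean isAllDigits(String input) {
        return DIGITS.matcher(input).matches();
    }

    // True if the input begins with digits; the rest may be anything.
    static boolean startsWithDigits(String input) {
        return DIGITS.matcher(input).lookingAt();
    }

    // Collects every run of digits found anywhere in the input.
    static List<String> findAllDigits(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = DIGITS.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("123abc"));      // false
        System.out.println(startsWithDigits("123abc")); // true
        System.out.println(findAllDigits("a1b22c333")); // [1, 22, 333]
    }
}
```

Note that `group()` is only valid after a successful `matches()`, `lookingAt()`, or `find()`; calling it otherwise throws `IllegalStateException`.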

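1.2 Pattern flags

Pattern p = Pattern.compile("abc", Pattern.CASE_INSENSITIVE); // case-insensitive matching
Pattern p2 = Pattern.compile("abc", Pattern.MULTILINE);        // multiline mode: ^ and $ match at line boundaries
Pattern p3 = Pattern.compile("abc", Pattern.DOTALL);           // . also matches line terminators
Pattern p4 = Pattern.compile("(?i)abc(?-i)");                   // inline flags: (?i) turns the flag on, (?-i) turns it off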
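
The flags only show an effect when the pattern actually contains `^`, `$`, or `.`, so here is a small sketch that exercises each one. The class `FlagsDemo` and its method names are hypothetical and exist only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagsDemo {
    private static final Pattern IGNORE_CASE =
            Pattern.compile("abc", Pattern.CASE_INSENSITIVE);
    // Flags are bit masks, so several can be combined with |.
    private static final Pattern LINE_START =
            Pattern.compile("^abc", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    private static final Pattern DOT_DEFAULT = Pattern.compile("a.c");
    private static final Pattern DOT_ALL = Pattern.compile("a.c", Pattern.DOTALL);

    static boolean matchesIgnoringCase(String input) {
        return IGNORE_CASE.matcher(input).matches();
    }

    // With MULTILINE, ^ matches at the start of every line, not just the input.
    static int countLinesStartingWithAbc(String input) {
        Matcher m = LINE_START.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Without DOTALL, . refuses to match a line terminator such as \n.
    static boolean dotMatches(String input, boolean dotAll) {
        return (dotAll ? DOT_ALL : DOT_DEFAULT).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesIgnoringCase("AbC"));                         // true
        System.out.println(countLinesStartingWithAbc("abc one\nxyz\nABC two")); // 2
        System.out.println(dotMatches("a\nc", false));                          // false
        System.out.println(dotMatches("a\nc", true));                           // true
    }
}
```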

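2. Common syntax 🔴

Syntax	Meaning	Example
`.`	Any character (line terminators only with DOTALL)	`a.c` matches `abc`
`\d`	Digit	`\d{3}` matches `123`
`\w`	Word character	`\w+` matches `hello`
`\s`	Whitespace character	`\s+`
`*`	Zero or more	`a*`
`+`	One or more	`a+`
`?`	Zero or one	`a?`
`{n}`	Exactly n	`a{3}`
`{n,}`	At least n	`a{3,}`
`{n,m}`	Between n and m	`a{3,5}`
`^`	Start of input	`^abc`
`$`	End of input	`abc$`
`[]`	Character class	`[aeiou]`
`[^]`	Negated character class	`[^0-9]`
`()`	Capturing group	`(\d+)-(\d+)`
`|`	Alternation (or)	`a|b`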
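
The table shows regex syntax as the regex engine sees it. Inside a Java string literal every backslash must itself be escaped, so `\d` is written `"\\d"`. The sketch below illustrates this, plus `Pattern.quote()` for treating arbitrary text literally; the class `EscapingDemo` and its method names are assumptions made for the example.

```java
import java.util.regex.Pattern;

public class EscapingDemo {
    // Regex \d{3} written as a Java string literal.
    private static final Pattern THREE_DIGITS = Pattern.compile("\\d{3}");
    // Unescaped dot: any character. Escaped dot: a literal '.'.
    private static final Pattern ANY_CHAR = Pattern.compile("a.c");
    private static final Pattern LITERAL_DOT = Pattern.compile("a\\.c");
    // Pattern.quote() escapes every metacharacter in the given text.
    private static final Pattern LITERAL_SUM = Pattern.compile(Pattern.quote("1+1=2"));

    static boolean isThreeDigits(String s) {
        return THREE_DIGITS.matcher(s).matches();
    }

    static boolean matchesAnyChar(String s) {
        return ANY_CHAR.matcher(s).matches();
    }

    static boolean matchesLiteralDot(String s) {
        return LITERAL_DOT.matcher(s).matches();
    }

    static boolean matchesLiteralSum(String s) {
        return LITERAL_SUM.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isThreeDigits("123"));       // true
        System.out.println(matchesAnyChar("abc"));      // true
        System.out.println(matchesLiteralDot("abc"));   // false
        System.out.println(matchesLiteralDot("a.c"));   // true
        System.out.println(matchesLiteralSum("1+1=2")); // true
    }
}
```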

3. Capturing groups 🔴

Pattern p = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");
Matcher m = p.matcher("2024-06-15");

if (m.matches()) {
    String full = m.group(0);     // "2024-06-15" (the entire match)
    String year = m.group(1);    // "2024"
    String month = m.group(2);   // "06"
    String day = m.group(3);     // "15"

    int yearInt = Integer.parseInt(m.group(1)); // convert to an integer
}
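
Groups are just as useful with `find()` and with replacement strings. The sketch below pulls every year out of free text and rewrites dates with `$n` back-references. It also uses named groups (`(?<name>...)`), which Java supports alongside numbered ones; the class `DateGroupsDemo` and its methods are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateGroupsDemo {
    // Named groups are still numbered left to right: year=1, month=2, day=3.
    private static final Pattern DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // Returns the year of every date found in the text.
    static List<Integer> years(String text) {
        List<Integer> result = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            result.add(Integer.parseInt(m.group("year")));
        }
        return result;
    }

    // Rewrites yyyy-MM-dd as dd/MM/yyyy using numbered back-references.
    static String toDayMonthYear(String text) {
        return DATE.matcher(text).replaceAll("$3/$2/$1");
    }

    public static void main(String[] args) {
        System.out.println(years("From 2023-01-05 to 2024-06-15")); // [2023, 2024]
        System.out.println(toDayMonthYear("Due 2024-06-15"));       // Due 15/06/2024
    }
}
```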

4. Common validation scenarios 🟡

// Mobile phone number (mainland China, 11 digits)
Pattern phone = Pattern.compile("1[3-9]\\d{9}");

// Email address
Pattern email = Pattern.compile("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}");

// URL
Pattern url = Pattern.compile("https?://[\\w.-]+(?:/[^\\s]*)?");

// ID card number (18 characters)
Pattern idCard = Pattern.compile("[1-9]\\d{5}(19|20)\\d{2}(0[1-9]|1[0-2])(0[1-9]|[12]\\d|3[01])\\d{3}[\\dXx]");

// Password strength (at least 8 characters, with lowercase, uppercase, and a digit)
Pattern password = Pattern.compile("^(?=.*[a-z])(?=.*[A-Z])(?=.*\\d).{8,}$");
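
Most of these patterns carry no `^` or `$` anchors, so they validate a whole string only when used with `matches()`; with `find()` they would accept any input that merely contains a valid value. A minimal sketch of that distinction follows, with `ValidatorsDemo` and its method names invented for illustration.

```java
import java.util.regex.Pattern;

public class ValidatorsDemo {
    private static final Pattern PHONE = Pattern.compile("1[3-9]\\d{9}");
    private static final Pattern PASSWORD =
            Pattern.compile("^(?=.*[a-z])(?=.*[A-Z])(?=.*\\d).{8,}$");

    // matches() requires the whole string to be a phone number.
    static boolean isValidPhone(String s) {
        return s != null && PHONE.matcher(s).matches();
    }

    // find() only requires a phone number somewhere inside the string.
    static boolean containsPhone(String s) {
        return s != null && PHONE.matcher(s).find();
    }

    // Each lookahead (?=...) checks one requirement without consuming input.
    static boolean isStrongPassword(String s) {
        return s != null && PASSWORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidPhone("13812345678"));   // true
        System.out.println(isValidPhone("138123456789"));  // false: 12 digits
        System.out.println(containsPhone("138123456789")); // true: first 11 digits match
        System.out.println(isStrongPassword("Abcdef12"));  // true
        System.out.println(isStrongPassword("abcdef12"));  // false: no uppercase letter
    }
}
```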

5. Performance optimization 🟡

5.1 Precompilation

// ✅ Precompile, then reuse the Matcher
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("123");
while (m.find()) {
    System.out.println(m.group());
}

// ❌ Avoid recompiling inside a loop
for (String s : list) {
    Pattern.compile("\\d+").matcher(s).find(); // ❌ compiles on every iteration
}
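
One common way to fix the loop above is to hoist the pattern into a `static final` constant. The sketch below shows that, together with reusing a single `Matcher` through `reset(CharSequence)`. The class `PrecompileDemo` and its methods are hypothetical, and no timing figures are claimed here.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrecompileDemo {
    // Compiled once when the class loads; safe to share across threads.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Creates a new Matcher per string but never recompiles the pattern.
    static long countContainingDigits(List<String> items) {
        long count = 0;
        for (String s : items) {
            if (DIGITS.matcher(s).find()) {
                count++;
            }
        }
        return count;
    }

    // Reuses one Matcher; a Matcher is NOT thread-safe, so keep it local.
    static long countWithReusedMatcher(List<String> items) {
        Matcher m = DIGITS.matcher("");
        long count = 0;
        for (String s : items) {
            if (m.reset(s).find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> items = List.of("a1", "b", "22");
        System.out.println(countContainingDigits(items));  // 2
        System.out.println(countWithReusedMatcher(items)); // 2
    }
}
```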

5.2 Non-greedy matching

// Greedy: .* matches as much as possible
"abc123def456ghi".replaceFirst("\\d+.*\\d+", "X"); // "abcXghi"

// Non-greedy: .*? matches as little as possible
"abc123def456ghi".replaceFirst("\\d+?.*?\\d+", "X"); // "abcXdef456ghi"

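A typical place where the greedy default surprises people is delimiter matching, such as angle-bracket tags. The following sketch contrasts `<.*>` with `<.*?>`; the class `GreedyVsReluctantDemo` and its methods are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctantDemo {
    private static final Pattern GREEDY_TAG = Pattern.compile("<.*>");
    private static final Pattern RELUCTANT_TAG = Pattern.compile("<.*?>");

    // Greedy: runs to the LAST '>' in the input.
    static String firstGreedy(String input) {
        Matcher m = GREEDY_TAG.matcher(input);
        return m.find() ? m.group() : null;
    }

    // Reluctant: stops at the FIRST '>' after the opening '<'.
    static String firstReluctant(String input) {
        Matcher m = RELUCTANT_TAG.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstGreedy("<b>bold</b>"));    // <b>bold</b>
        System.out.println(firstReluctant("<b>bold</b>")); // <b>
    }
}
```
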
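6. Follow-up questions

Interviewer: "What is the difference between greedy, reluctant, and possessive quantifiers in a regular expression?"

// Greedy: the default; matches as much as possible and backtracks when needed
// Reluctant: add ?; matches as little as possible
// Possessive: add +; never gives back what it matched, which avoids backtracking cost but can make the match fail

// Example:
String input = "ababc";

// Greedy: .* consumes every character it can, then backtracks
Pattern.compile(".*ab").matcher(input).find(); // true; group() is "abab" (.* backs off to the last "ab")

// Reluctant:
Pattern.compile(".*?ab").matcher(input).find(); // true; group() is "ab" (the first one)

// Possessive: no backtracking, so the match fails if the rest cannot match
Pattern.compile(".*+ab").matcher(input).find(); // false (.*+ consumed everything)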
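
To check the three modes by looking at the matched text rather than only the boolean result, a small helper is enough. This is a sketch: `QuantifierModesDemo.firstMatch` is a hypothetical helper, and it compiles the regex on every call purely to keep the demo short.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierModesDemo {
    // Returns the first matched substring, or null when nothing matches.
    // Compiling per call is acceptable only because this is a demo helper.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "ababc";
        System.out.println(firstMatch(".*ab", input));  // abab
        System.out.println(firstMatch(".*?ab", input)); // ab
        System.out.println(firstMatch(".*+ab", input)); // null
        // A possessive quantifier still succeeds when what follows
        // cannot overlap with what it consumed.
        System.out.println(firstMatch("\\d++abc", "123abc")); // 123abc
    }
}
```

Possessive quantifiers are worth considering when the quantified part and the text after it cannot overlap, as in the last call, because the engine then skips backtracking attempts that could never succeed.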
