Working with regular expressions using Python's re module

编程我只用CPP

October 14, 2017

Programming Languages

1. Overview

re is the regular expression module in Python's standard library. Its most commonly used functions are:

re.match(pattern, string, flags=0)

Tries to match pattern at the start of string and returns a match object, or None if the start of the string does not match.

re.search(pattern, string, flags=0)

Scans string for the first position where pattern matches and returns a match object, or None if nothing matches.

re.findall(pattern, string, flags=0)

Finds all non-overlapping matches of pattern in string and returns them as a list.

re.finditer(pattern, string, flags=0)

Finds all non-overlapping matches and returns an iterator that yields one match object per match.

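re.sub(pattern, repl, string, count=0, flags=0)

Replaces the parts of string that match pattern with repl and returns the resulting string. count is the maximum number of replacements; the default, 0, replaces every match.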

re.subn(pattern, repl, string, count=0, flags=0)

Works the same as sub, but returns a tuple holding the new string and the number of replacements made.

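The pos and endpos arguments belong to the methods of a compiled pattern object (the result of re.compile()), not to the module-level functions: the compiled match, search, findall and finditer methods accept them as in match(string, pos, endpos), while sub and subn do not. They restrict matching to the index range [pos, endpos), with indices starting at 0; when they are omitted, the whole string is used.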
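As a rough illustration of the functions above, the following sketch calls each one on a small sample string. The sample text, the pattern and the variable names are made up for this example; the last part assumes a compiled pattern, since that is where pos and endpos are accepted.

```python
import re

text = "a1b22c333"  # hypothetical sample input

# Module-level functions take the pattern as their first argument.
first_at_start = re.match(r"\d+", text)       # None: the string starts with "a"
first_anywhere = re.search(r"\d+", text)      # match object for "1"
all_numbers = re.findall(r"\d+", text)        # list of strings
all_matches = [m.group() for m in re.finditer(r"\d+", text)]
replaced = re.sub(r"\d+", "#", text)          # new string
replaced_n = re.subn(r"\d+", "#", text, count=1)  # (new string, replacements)

# Methods of a compiled pattern additionally accept pos and endpos.
digits = re.compile(r"\d+")
from_pos = digits.match(text, 1)              # matching starts at index 1
in_range = digits.findall(text, 2, 5)         # only indices 2..4 ("b22") are searched

print(first_at_start, first_anywhere.group(), all_numbers, all_matches)
print(replaced, replaced_n)
print(from_pos.group(), in_range)
```

Compiling the pattern once is also convenient when the same expression is reused many times, which is the style the examples in section 3 follow.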

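2. Match objects

re.match() and re.search() both return a match object (<type "_sre.SRE_Match"> on Python 2; recent Python 3 versions report it as re.Match). Its most commonly used methods are: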

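2.1 group()

Returns the substring that was matched.

2.2 start() and end()

Return the start index and the end index of the matched substring. The end index is exclusive.

2.3 span()

Returns the start and end indices as a tuple.

2.4 groups()

Returns a tuple containing all captured groups.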
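A minimal sketch of these methods on a pattern that contains groups. The pattern and the date string are assumptions chosen for illustration; group(n) with a group number, which the list above does not cover, is also part of the match object API.

```python
import re

# Hypothetical input: a date with three captured groups.
m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "released on 2017-10-14")

whole = m.group()         # the entire match
year = m.group(1)         # a single group, selected by number
parts = m.groups()        # all groups as a tuple
bounds = (m.start(), m.end())
same_bounds = m.span()    # the same two indices in one call

print(whole, year, parts, bounds, same_bounds)
```

Since match() and search() return None on failure, real code should check the result before calling any of these methods, as the examples in section 3 do.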

3. Examples

3.1 The match method and match objects

import re

p = re.compile(r"maqian")
t = "Hellomaqian"
rs = p.match(t)  # match() only tries index 0, where "H" does not fit the pattern
if rs is not None:
    print(type(rs))
    print(rs.group())
else:
    print("no match")  # no match
rs = p.match(t, 5)  # start matching at index 5
if rs is not None:
    print(rs.group())  # maqian
    print(rs.groups())  # () because the pattern has no groups
    print(rs.start(), rs.end())  # 5 11
    print(rs.span())  # (5, 11), returned as a tuple
else:
    print("no match")

3.2 The search method

import re

p = re.compile(r"maqian")
t = "hellomaqian"
rs = p.search(t)  # search() scans the whole string, so no start index is needed
if rs is not None:
    print(rs.group())  # maqian
else:
    print("no match")

3.3 The findall method

import re

p = re.compile(r"\d{3}")
t = "123abc456def789"
rs = p.findall(t)
if rs:  # findall() returns an empty list, never None, when nothing matches
    print(rs)  # ['123', '456', '789']
else:
    print("no match")
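One detail the example above does not show: when the pattern contains capturing groups, findall() returns the group contents instead of the whole match, and with several groups each item is a tuple. A small sketch that reuses the sample string; the three patterns are made up for illustration.

```python
import re

t = "123abc456def789"

no_group = re.findall(r"\d{3}[a-z]{3}", t)        # whole matches
one_group = re.findall(r"(\d{3})[a-z]{3}", t)     # only the group's text
two_groups = re.findall(r"(\d{3})([a-z]{3})", t)  # one tuple per match

print(no_group)    # ['123abc', '456def']
print(one_group)   # ['123', '456']
print(two_groups)  # [('123', 'abc'), ('456', 'def')]
```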

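3.4 The finditer method

import re

p = re.compile(r"\d{3}")
t = "123abc456def789"
matched = False
for m in p.finditer(t):  # finditer() returns an iterator of match objects, never None
    matched = True
    print(m.group())  # prints 123, 456 and 789 in turn
if not matched:
    print("no match")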
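Because finditer() yields match objects rather than plain strings, the position of each match is available as well. A short sketch on the same sample string; the variable names are illustrative.

```python
import re

p = re.compile(r"\d{3}")
t = "123abc456def789"

# Collect (matched text, start index, end index) for every match.
found = [(m.group(), m.start(), m.end()) for m in p.finditer(t)]
for text, start, end in found:
    print(text, start, end)
```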

3.5 Groups

import re

x = r"(?P<id>\d{3})(\w*).*(?P=id)"  # (?P=id) must repeat whatever the group named id matched
p = re.compile(x)
t = "123abc456123"
rs = p.match(t)
if rs is not None:
    print(rs.group())  # the whole match: 123abc456123
    print(rs.groups())  # the captured groups: ('123', 'abc456')
else:
    print("no match")
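The group named id can also be read back by name. The sketch below assumes the same pattern and input as the example; group() with a name and groupdict() are standard match object methods that the example does not use.

```python
import re

p = re.compile(r"(?P<id>\d{3})(\w*).*(?P=id)")
rs = p.match("123abc456123")

by_name = rs.group("id")   # look the group up by its name
by_index = rs.group(1)     # the same group, by position
named = rs.groupdict()     # only named groups appear here

print(by_name, by_index, named)
```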

3.6 Substitution

import re

p = re.compile(r"\d{3}")
t = "123abc456def789ghi888"
rs = p.sub("000", t)  # replace every match with 000
print(rs)  # the new string: 000abc000def000ghi000
rs = p.subn("000", t, 2)  # replace at most 2 matches
print(rs)  # the new string and the number of replacements: ('000abc000def789ghi888', 2)

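The replacement can also be a function. It is called once for each match, with the match object as its argument, and its return value is used as the replacement text: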

import re

def rep_func(match_obj):  # replacement function; it must be defined before it is used
    match_str = match_obj.group()  # the argument is a match object; group() gives the matched text
    if match_str == "123":
        return "000"
    elif match_str == "456":
        return "111"
    elif match_str == "789":
        return "222"
    else:
        return "999"

p = re.compile(r"\d{3}")
t = "123abc456def789ghi888"
rs = p.subn(rep_func, t)  # the replacement is a function
print(rs)

The final result:

('000abc111def222ghi999', 4)
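When the replacement function is only a lookup, the if/elif chain can be expressed as a dictionary instead. This is an alternative sketch, not the approach used in the example; the names REPLACEMENTS and rep_func_from_table are hypothetical.

```python
import re

# Hypothetical lookup table mirroring the branches of rep_func.
REPLACEMENTS = {"123": "000", "456": "111", "789": "222"}

def rep_func_from_table(match_obj):
    # Fall back to "999" for any match that is not in the table.
    return REPLACEMENTS.get(match_obj.group(), "999")

p = re.compile(r"\d{3}")
result = p.subn(rep_func_from_table, "123abc456def789ghi888")
print(result)  # ('000abc111def222ghi999', 4)
```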
