Regex in Python (1) - PROj - ITeye Blog

provista


import re
s = 'ft&#65292;&#25105;'
# Replace each numeric character reference with the character it encodes
print(re.sub(r'(?s)&#(\d+);', lambda m: chr(int(m.group(1))), s))

Output:

ft，我

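In fact, the second argument of Python's re.sub, the replacement, can be a function. The function receives the match object of each successful match, and its return value is used as the replacement string. This makes it possible to apply a different replacement depending on what each individual match actually matched.

As another example, define a replacement function: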

def replacem(o):
    # A single hyphen becomes a space; a double hyphen becomes an asterisk
    if o.group(0) == '-':
        return ' '
    return '*'

Then:

print(re.sub('-{1,2}', replacem, 'pro---g--r-am'))

Output:

pro* g*r am
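To make the mechanism concrete, here is a minimal sketch of roughly what re.sub does when the replacement is a function. The helper name sub_with_callable is hypothetical and exists only for illustration; it is not how the re module is implemented, and it ignores edge cases such as empty matches.

```python
import re

def sub_with_callable(pattern, repl, text):
    """Rough, illustrative equivalent of re.sub(pattern, repl, text) for a callable repl."""
    pieces = []
    last = 0
    for m in re.finditer(pattern, text):
        pieces.append(text[last:m.start()])  # text between matches is copied unchanged
        pieces.append(repl(m))               # the callback decides each replacement
        last = m.end()
    pieces.append(text[last:])               # trailing text after the last match
    return ''.join(pieces)

def replacem(o):
    # A single hyphen becomes a space; a double hyphen becomes an asterisk
    return ' ' if o.group(0) == '-' else '*'

result = sub_with_callable('-{1,2}', replacem, 'pro---g--r-am')
print(result)
```

The run of three hyphens is consumed as a double hyphen followed by a single one, because -{1,2} is greedy and matching resumes right after the previous match.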

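For a match object, group() is defined in Python as follows: group(0) is the entire matched string, while group(1) through group(N) are the substrings matched by the capturing parentheses that may appear in the pattern. (This also suggests that Python, like most languages, uses an NFA-style regex engine. :))
In the opening example there are two matches, the references that decode to "，" and "我". For each of them, group(1) is what (\d+) matched, namely 65292 and 25105. The function then converts these to decimal integers, which are Unicode code points, and turns them into the corresponding Unicode characters.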
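The relationship between group(0) and group(1) can be inspected directly. The following is a small sketch, assuming Python 3, that lists both groups for each match in the opening example along with the decoded character; the variable name groups is chosen only for illustration.

```python
import re

s = 'ft&#65292;&#25105;'

# For each match: the whole match, the captured digits, and the decoded character
groups = [
    (m.group(0), m.group(1), chr(int(m.group(1))))
    for m in re.finditer(r'&#(\d+);', s)
]

for whole, digits, char in groups:
    print(whole, digits, char)
```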

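Readers may have been puzzled by the (?s) in the opening example. This is the so-called single-line mode, which makes the dot match the newline character that it otherwise does not match. My guess as to why it is called single-line mode: a text spanning several physical lines is treated as if it were a single line.
As for why the example includes it, there is no way to know, because the pattern contains no dot and therefore works exactly the same without (?s). It is probably just a habit of the original author of the regex. -_-b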
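A short sketch, assuming Python 3, of what (?s) changes and why it makes no difference in the opening example. The sample string 'a\nb' and the variable names are chosen only for illustration; re.DOTALL is the flag constant equivalent to the inline (?s).

```python
import re

text = 'a\nb'

without_flag = re.findall(r'a.b', text)           # dot does not match the newline
with_inline = re.findall(r'(?s)a.b', text)        # (?s) lets the dot match it
with_const = re.findall(r'a.b', text, re.DOTALL)  # same effect via the flag constant

print(without_flag, with_inline, with_const)

# The opening example's pattern has no dot, so (?s) has no effect on it
s = 'ft&#65292;&#25105;'

def decode(m):
    return chr(int(m.group(1)))

same = re.sub(r'(?s)&#(\d+);', decode, s) == re.sub(r'&#(\d+);', decode, s)
print(same)
```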


2009-09-08 21:02