Getting a file extension using pattern matching in Python

I am trying to find the extension of a file, given its name as a string. I know I can use os.path.splitext, but it does not do what I want when the extension is .tar.gz or .tar.bz2: it returns .gz and .bz2 instead of .tar.gz and .tar.bz2 respectively.
So I decided to find the extension myself using pattern matching.

import re

print(re.compile(r'^.*[.](?P<ext>tar\.gz|tar\.bz2|\w+)$').match('a.tar.gz').group('ext'))
# prints: gz     (I want this to be 'tar.gz')
print(re.compile(r'^.*[.](?P<ext>tar\.gz|tar\.bz2|\w+)$').match('a.tar.bz2').group('ext'))
# prints: bz2    (I want this to be 'tar.bz2')

I am using the named group (?P<ext>...) in the pattern because I also want to capture the extension.

Please help.


>>> import re
>>> print(re.compile(r'^.*[.](?P<ext>tar\.gz|tar\.bz2|\w+)$').match('a.tar.gz').group('ext'))
gz
>>> print(re.compile(r'^.*?[.](?P<ext>tar\.gz|tar\.bz2|\w+)$').match('a.tar.gz').group('ext'))
tar.gz

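The ? after .* makes the quantifier non-greedy: it matches as few characters as possible. So instead of .* consuming "a.tar" and leaving only "gz" for the group, .*? consumes just "a", which is the shortest prefix that still lets the group match "tar.gz" in full.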
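
To try the non-greedy pattern on a few more names, here is a small sketch. The helper name get_ext and the sample file names are my own and not part of the question; the pattern is the one shown above.

```python
import re

# Lazy .*? stops at the first dot that still lets the rest of the pattern match.
EXT_RE = re.compile(r'^.*?[.](?P<ext>tar\.gz|tar\.bz2|\w+)$')

def get_ext(filename):
    """Return the extension without the leading dot, or None if there is no dot."""
    m = EXT_RE.match(filename)
    return m.group('ext') if m else None

print(get_ext('a.tar.gz'))    # tar.gz
print(get_ext('a.tar.bz2'))   # tar.bz2
print(get_ext('notes.txt'))   # txt
print(get_ext('README'))      # None
```

Because the pattern is anchored with $, a name such as my.file.txt still yields txt: the lazy prefix simply grows until the remainder is a valid extension.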


import os

root, ext = os.path.splitext('a.tar.gz')
if ext in ['.gz', '.bz2']:
    # Prepend the inner extension (e.g. '.tar') to the compression suffix.
    ext = os.path.splitext(root)[1] + ext
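
Wrapped as a function, the same idea might look like the sketch below. The name split_ext and the extra check that the inner suffix is exactly '.tar' are my own assumptions; without that check, a name like report.final.gz would come back with the extension '.final.gz'.

```python
import os

# Compression suffixes that may wrap a tar archive (assumed list, extend as needed).
COMPRESSION_SUFFIXES = ('.gz', '.bz2')

def split_ext(filename):
    """Like os.path.splitext, but treats .tar.gz and .tar.bz2 as one extension."""
    root, ext = os.path.splitext(filename)
    if ext in COMPRESSION_SUFFIXES:
        inner_root, inner_ext = os.path.splitext(root)
        # Only merge the two suffixes when the inner one really is '.tar'.
        if inner_ext == '.tar':
            return inner_root, inner_ext + ext
    return root, ext

print(split_ext('a.tar.gz'))         # ('a', '.tar.gz')
print(split_ext('report.final.gz'))  # ('report.final', '.gz')
```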

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.


I have an idea that is easier than racking your brain over regular expressions, even if it may sound a bit silly:
name="filename.tar.gz"
extensions=('.tar.gz','.py')
[x for x in extensions if name.endswith(x)]
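
If you want a single extension back instead of a list, one possible variation is sketched below. The function name get_extension, the contents of KNOWN_EXTENSIONS, and the fallback to os.path.splitext for names not in the list are all my assumptions, not part of the snippet above.

```python
import os

# Multi-part extensions to recognise (assumed list, extend as needed).
KNOWN_EXTENSIONS = ('.tar.gz', '.tar.bz2')

def get_extension(name, known=KNOWN_EXTENSIONS):
    """Return the longest known extension that name ends with,
    falling back to os.path.splitext for everything else."""
    # Check longer extensions first so the most specific match wins.
    for ext in sorted(known, key=len, reverse=True):
        if name.endswith(ext):
            return ext
    return os.path.splitext(name)[1]

print(get_extension('filename.tar.gz'))  # .tar.gz
print(get_extension('script.py'))        # .py
```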
