Getting a file extension using pattern matching in Python
I am trying to find the extension of a file, given its name as a string. I know I can use the function os.path.splitext, but it does not work as expected when the file extension is .tar.gz or .tar.bz2: it gives the extensions as .gz and .bz2 instead of .tar.gz and .tar.bz2, respectively.
So I decided to find the extension of files myself using pattern matching.
import re
print(re.compile(r'^.*[.](?P<ext>tar\.gz|tar\.bz2|\w+)$').match('a.tar.gz').group('ext'))
# prints gz, but I want this to come out as 'tar.gz'
print(re.compile(r'^.*[.](?P<ext>tar\.gz|tar\.bz2|\w+)$').match('a.tar.bz2').group('ext'))
# prints bz2, but I want this to come out as 'tar.bz2'
I am using (?P<ext>...) in my pattern because I also want to capture the extension.
Please help.
>>> import re
>>> print(re.compile(r'^.*[.](?P<ext>tar\.gz|tar\.bz2|\w+)$').match('a.tar.gz').group('ext'))
gz
>>> print(re.compile(r'^.*?[.](?P<ext>tar\.gz|tar\.bz2|\w+)$').match('a.tar.gz').group('ext'))
tar.gz
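A ? placed after * makes the quantifier non-greedy, so it matches as little as possible. Instead of .* consuming "a.tar" and leaving only "gz" for the group, .*? settles on the shortest prefix that still lets the rest of the pattern match, which allows the tar.gz alternative to match.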
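To make the difference concrete, here is a minimal sketch comparing the greedy and non-greedy prefixes side by side. It uses Python 3 print syntax, and the names GREEDY and LAZY are only illustrative, not part of the original answer.

```python
import re

# Greedy prefix: ".*" takes as much as it can, so only the last component is left.
GREEDY = re.compile(r'^.*[.](?P<ext>tar\.gz|tar\.bz2|\w+)$')
# Non-greedy prefix: ".*?" takes as little as it can, so the longer alternatives get a chance.
LAZY = re.compile(r'^.*?[.](?P<ext>tar\.gz|tar\.bz2|\w+)$')

for name in ('a.tar.gz', 'a.tar.bz2', 'a.txt'):
    greedy_ext = GREEDY.match(name).group('ext')
    lazy_ext = LAZY.match(name).group('ext')
    print(name, '-> greedy:', greedy_ext, '| non-greedy:', lazy_ext)
```

For a name with a single extension, such as a.txt, both patterns should agree; they only differ when one of the compound alternatives can match.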
import os

root, ext = os.path.splitext('a.tar.gz')
# For compressed files, prepend whatever extension precedes .gz/.bz2 (e.g. .tar).
if ext in ['.gz', '.bz2']:
    ext = os.path.splitext(root)[1] + ext
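The same idea can be wrapped in a small helper. This is only a sketch: split_ext is a hypothetical name, and the extra check that the inner extension is exactly .tar is an assumption added here (the snippet above would turn notes.txt.gz into .txt.gz, which may or may not be what you want).

```python
import os

def split_ext(filename):
    """Return the extension, treating .tar.gz and .tar.bz2 as a single unit."""
    root, ext = os.path.splitext(filename)
    if ext in ('.gz', '.bz2'):
        inner = os.path.splitext(root)[1]
        # Assumption: only merge when the inner extension is .tar.
        if inner == '.tar':
            ext = inner + ext
    return ext

for name in ('a.tar.gz', 'a.tar.bz2', 'notes.txt.gz', 'a.txt', 'README'):
    print(repr(name), '->', repr(split_ext(name)))
```

Note that, like os.path.splitext itself, this returns the extension with its leading dot, and an empty string when there is no extension.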
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
I have an idea that is easier than racking your brain over regular expressions, even if it may sound a bit silly.
name = "filename.tar.gz"
extensions = ('.tar.gz', '.py')
[x for x in extensions if name.endswith(x)]  # ['.tar.gz']
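If the list of known extensions contains overlapping entries (for example both .gz and .tar.gz), the comprehension returns more than one hit. One possible way to handle that, sketched here with a hypothetical helper named known_extension and an assumed default list of extensions, is to keep the longest match.

```python
def known_extension(name, extensions=('.tar.gz', '.tar.bz2', '.gz', '.bz2', '.py')):
    """Return the longest known extension that name ends with, or None."""
    matches = [ext for ext in extensions if name.endswith(ext)]
    # Prefer the longest suffix so that '.tar.gz' wins over '.gz'.
    return max(matches, key=len) if matches else None

for name in ('filename.tar.gz', 'archive.gz', 'script.py', 'README'):
    print(name, '->', known_extension(name))
```

This only recognises extensions you list explicitly, which is the trade-off of avoiding regular expressions here.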