Pattern matching and placeholder values
I'm writing an application that uses renaming rules to rename a list of files based on information given by the user. The files may be inconsistently named to begin with, or the filenames may be consistent. The user selects a list of files, and inputs information about the files (for MP3s, they would be Artist, Title, Album, etc). Using a rename rule (example below), the program uses the user-inputted information to rename the files accordingly.
However, if all or some of the files are named consistently, I would like to allow the program to 'guess' the file information. That is the problem I'm having. What is the best way to do this?
Sample filenames:
Kraftwerk-Kraftwerk-01-RuckZuck.mp3
Kraftwerk-Autobahn-01-Autobahn.mp3
Kraftwerk-Computer World-03-Numbers.mp3
Rename Rule:
%Artist%-%Album%-%Track%-%Title%.mp3
The program should properly deduce the Artist, Track number, Title, and Album name.
Again, what's the best way to do this? I was thinking regular expressions, but I'm a bit confused.
The easiest approach would be to replace each %Label% with (?<Label>.*?), and escape any other characters.
%Artist%-%Album%-%Track%-%Title%.mp3
becomes
(?<Artist>.*?)-(?<Album>.*?)-(?<Track>.*?)-(?<Title>.*?)\.mp3
You would then get each component into named capture groups.
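// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
Dictionary<string, string> match_filename(string rule, string filename)
{
    // Matches each %Label% placeholder in the rule.
    Regex tag_re = new Regex(@"%(\w+)%");
    // Escape literal characters, then turn each placeholder into a named group.
    string pattern = tag_re.Replace(Regex.Escape(rule), @"(?<$1>.*?)");
    // Anchor the pattern so the lazy groups must cover the whole filename.
    Regex filename_re = new Regex("^" + pattern + "$");
    Match match = filename_re.Match(filename);
    Dictionary<string, string> tokens = new Dictionary<string, string>();
    if (!match.Success)
        return tokens; // the filename does not fit the rule
    // Group 0 is the whole match, so the named groups start at 1.
    for (int counter = 1; counter < match.Groups.Count; counter++)
    {
        string group_name = filename_re.GroupNameFromNumber(counter);
        tokens.Add(group_name, match.Groups[counter].Value);
    }
    return tokens;
}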
But if the user leaves out the delimiters, or if the delimiters can also appear within the fields, you could get some strange results. The pattern for %Artist%%Album% would become (?<Artist>.*?)(?<Album>.*?), which is equivalent to .*?.*?. The pattern wouldn't know where to split.
This could be solved if you know the format of certain fields, such as the track number. If you translate %Track% to (?<Track>\d+) instead, the pattern would know that the Track field can only consist of digits, which gives it a place to split.
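As a rough sketch of that idea (not a definitive implementation), the translation step could look up a stricter pattern for fields whose format is known and fall back to .*? for everything else. The class name TypedRuleMatcher, the method Parse, and the FieldPatterns table are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TypedRuleMatcher
{
    // Hypothetical per-field patterns (an assumption for this sketch).
    // Any label not listed here falls back to the lazy ".*?".
    private static readonly Dictionary<string, string> FieldPatterns =
        new Dictionary<string, string>
        {
            { "Track", @"\d+" },
        };

    public static Dictionary<string, string> Parse(string rule, string filename)
    {
        // Escape literal characters, then swap each %Label% for a named group.
        string pattern = Regex.Replace(
            Regex.Escape(rule),
            @"%(\w+)%",
            m =>
            {
                string label = m.Groups[1].Value;
                string body;
                if (!FieldPatterns.TryGetValue(label, out body))
                {
                    body = ".*?";
                }
                return "(?<" + label + ">" + body + ")";
            });

        var tokens = new Dictionary<string, string>();
        var regex = new Regex("^" + pattern + "$");
        Match match = regex.Match(filename);
        if (!match.Success)
        {
            return tokens; // empty result when the filename does not fit the rule
        }
        foreach (string name in regex.GetGroupNames())
        {
            if (name != "0") // "0" is the whole match, not a field
            {
                tokens[name] = match.Groups[name].Value;
            }
        }
        return tokens;
    }
}
```

With the rule %Artist%%Track%%Title%.mp3 and the filename Kraftwerk01RuckZuck.mp3, this should split into Artist = Kraftwerk, Track = 01 and Title = RuckZuck, even though there are no delimiters. It is still fragile: an artist name that itself contains digits would be cut at the first digit.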
Not the answer to the question you asked, but an ID3 tag reading library might be a better way to do this when you are using MP3s. A quick Google came up with: C# ID3 Library.
As for guessing which string positions hold the artist, album, and song title... the first thing I can think of is that if you have a good selection to work with, say several albums, you could first see which position repeats the most, which would be the artist, which repeats the second most (album) and which repeats the least (song title).
Otherwise, it seems like a difficult guess to make based solely on a few strings in the file name... could you ask the user to also input a matching expression for the file name that describes the order of the fields?
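To make the repetition idea a bit more concrete, here is a minimal sketch that splits each name on a delimiter and ranks the positions by how few distinct values they hold. The names PositionGuesser and RankColumnsByRepetition are invented for this example, and it assumes the extension has already been stripped and that the delimiter never occurs inside a field.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PositionGuesser
{
    // Returns column indexes ordered from most repetitive
    // (fewest distinct values) to least repetitive.
    public static List<int> RankColumnsByRepetition(
        IEnumerable<string> names, char delimiter)
    {
        List<string[]> rows = names.Select(n => n.Split(delimiter)).ToList();
        if (rows.Count == 0)
        {
            return new List<int>();
        }

        // Only compare positions that exist in every name.
        int columns = rows.Min(r => r.Length);

        // OrderBy is stable, so tied columns keep their left-to-right order.
        return Enumerable.Range(0, columns)
            .OrderBy(c => rows.Select(r => r[c]).Distinct().Count())
            .ToList();
    }
}
```

The first index returned would be the artist candidate, the next the album candidate, and so on. On just the three sample filenames from the question this is not enough: the album and title positions each hold three distinct values and tie, and the track position (01, 01, 03) ranks ahead of both. Numeric positions probably need to be recognized separately, and the guess only becomes meaningful with several tracks per album.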
The filenames in your example seem pretty consistent to me. You can simply call string.Split() on the delimiter and assign each element of the resulting array to its corresponding tag.
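Something along these lines might do for the consistent case. It is only a sketch: SplitMatcher and Parse are hypothetical names, and it assumes the rule and the filename use the same single-character delimiter, which never appears inside a field.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SplitMatcher
{
    public static Dictionary<string, string> Parse(
        string rule, string filename, char delimiter)
    {
        // Drop the extension from both, then split on the delimiter.
        string[] labels = Path.GetFileNameWithoutExtension(rule).Split(delimiter);
        string[] values = Path.GetFileNameWithoutExtension(filename).Split(delimiter);

        var tokens = new Dictionary<string, string>();
        if (labels.Length != values.Length)
        {
            return tokens; // the filename does not line up with the rule
        }

        for (int i = 0; i < labels.Length; i++)
        {
            // "%Artist%" becomes "Artist".
            tokens[labels[i].Trim('%')] = values[i];
        }
        return tokens;
    }
}
```

For Kraftwerk-Computer World-03-Numbers.mp3 and the rule %Artist%-%Album%-%Track%-%Title%.mp3 with '-' as the delimiter, this should give Album = Computer World and Track = 03. A title containing a hyphen would produce a length mismatch and an empty result.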
Guessing at which position is which tag information would involve TONS of heuristics.
By the way, folders that contain song files usually have some pattern in their names as well, for example:
1998 - Seven
1999 - Periscope
2000 - CO2
The format here is %Year% - %AlbumName%, which might help you identify which element in the filename is the album.
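A small sketch of reading such a folder name, assuming a four-digit year followed by a hyphen and the album name (FolderNameParser and TryParse are made-up names):

```csharp
using System;
using System.Text.RegularExpressions;

public static class FolderNameParser
{
    // Assumed layout: four-digit year, a hyphen with optional spaces, album name.
    private static readonly Regex FolderPattern =
        new Regex(@"^(?<Year>\d{4})\s*-\s*(?<AlbumName>.+)$");

    public static bool TryParse(string folderName, out string year, out string album)
    {
        Match match = FolderPattern.Match(folderName);
        if (!match.Success)
        {
            year = null;
            album = null;
            return false;
        }
        year = match.Groups["Year"].Value;
        album = match.Groups["AlbumName"].Value;
        return true;
    }
}
```

The album name recovered from the folder could then be compared against each position in the filenames to see which one holds the album.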