Regular Expression to Match Only the First File in a Rar File Set

Regular expression to match only the first file in a RAR file set

The short answer is that your regex can't work as written in Ruby 1.8: its regex engine does not support lookbehind assertions (the (?<! construct in your example regex). This leaves you with two options.

1) Use more than one regex to do it.

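def is_first_rar(filename)
  if (filename =~ /part(\d+)\.rar$/) == nil
    # No "partN" in the name: any plain .rar counts as the first file
    return (filename =~ /\.rar$/) != nil
  else
    # "partN" name: only part 1 (with any zero padding) is the first file
    return $1.to_i == 1
  end
end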

2) Use Oniguruma, the regex engine of Ruby 1.9. It supports lookbehind assertions, and you can install it as a gem for Ruby 1.8. After that, you can do something like this:

def is_first_rar(filename)
  reg = Oniguruma::ORegexp.new('.*(?:(?<!part\d\d\d|part\d\d|\d)\.rar|\.part0*1\.rar)')
  match = reg.match(filename)
  return match != nil
end
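The two-regex idea from option 1 carries over to any engine, with or without lookbehind support. A minimal Python sketch of the same logic, assuming the same naming rules as the Ruby code (the constant names are illustrative, not from the original answer):

```python
import re

# Hypothetical names; the two patterns mirror the Ruby version of option 1.
PART_RE = re.compile(r"part(\d+)\.rar$")
RAR_RE = re.compile(r"\.rar$")


def is_first_rar(filename: str) -> bool:
    """Return True for a plain .rar file or for part 1 of a multi-part set."""
    part = PART_RE.search(filename)
    if part is None:
        # No "partN" in the name: any plain .rar counts as the first file.
        return RAR_RE.search(filename) is not None
    # "partN" name: only part 1 (with any zero padding) is the first file.
    return int(part.group(1)) == 1


for name in ["a.rar", "a.part01.rar", "a.part2.rar", "a.txt"]:
    print(name, is_first_rar(name))
```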

Regex to match the first file in a rar archive file set in Python

There's no need to use lookbehind assertions for this. Since you start matching from the beginning of the string, you can do everything with lookaheads that you can with lookbehinds. This should work:

^((?!\.part(?!0*1\.rar$)\d+\.rar$).)*\.(?:rar|r?0*1)$

To capture the first part of the filename as you requested, you could do this:

^((?:(?!\.part\d+\.rar$).)*)\.(?:(?:part0*1\.)?rar|r?0*1)$
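Both patterns can be checked with a few sample names. The sketch below copies them verbatim into Python's re module; the helper names and the sample filenames are made up for illustration:

```python
import re

# Both patterns are copied verbatim from the answer above.
FIRST_FILE = re.compile(r"^((?!\.part(?!0*1\.rar$)\d+\.rar$).)*\.(?:rar|r?0*1)$")
FIRST_FILE_BASENAME = re.compile(
    r"^((?:(?!\.part\d+\.rar$).)*)\.(?:(?:part0*1\.)?rar|r?0*1)$"
)


def is_first_file(name: str) -> bool:
    """True if the name looks like the first file of a set."""
    return FIRST_FILE.match(name) is not None


def base_name(name: str):
    """Return the captured first part of the filename, or None if there is no match."""
    m = FIRST_FILE_BASENAME.match(name)
    return m.group(1) if m else None


samples = ["movie.rar", "movie.part01.rar", "movie.part02.rar", "movie.001", "movie.002"]
for name in samples:
    print(name, is_first_file(name), base_name(name))
```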

Regex to match the first file in a rar archive file set

This should pass your tests:

var regex = new Regex(@"(\.001|\.part0*1\.rar|^((?!part\d*\.rar$).)*\.rar)$", RegexOptions.IgnoreCase | RegexOptions.Singleline);
Assert.That(regex.IsMatch("filename.001"));
Assert.That(regex.IsMatch("filename.rar"));
Assert.That(regex.IsMatch("filename.part1.rar"));
Assert.That(regex.IsMatch("filename.part01.rar"));
Assert.That(regex.IsMatch("filename.004"), Is.False);
Assert.That(regex.IsMatch("filename.057"), Is.False);
Assert.That(regex.IsMatch("filename.r67"), Is.False);
Assert.That(regex.IsMatch("filename.s89"), Is.False);
Assert.That(regex.IsMatch("filename.part2.rar"), Is.False);
Assert.That(regex.IsMatch("filename.part04.rar"), Is.False);
Assert.That(regex.IsMatch("filename.part11.rar"), Is.False);

python regex expression to match (first multipart or simple part) rar archive

You sometimes match the last character because the pattern (.*)(?:part0*1|.*[^(part\d+)])\.rar that you tried first captures the whole line in capture group 1.

That capture group is followed by an alternation matching either part0*1 or .*[^(part\d+)]

You can see that the lines that have part followed by a digit at the end are matched.

But, when there is no match for part0*1 the next alternative is tried which is .*[^(part\d+)].

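The .* in the second alternative matches until the end of the string (where the engine already is), and the engine then backtracks so that [^(part\d+)] can match a single character. The square brackets make it a character class, not a group: without a quantifier it matches exactly one character that is not (, ), p, a, r, t, + or a digit.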
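The character-class behaviour is easy to see in isolation. A small Python sketch (the filename archive.rar is an invented example, not one from the question):

```python
import re

# The class from the attempted pattern: exactly one character that is
# NOT any of ( ) p a r t + or a digit.
char_class = re.compile(r"[^(part\d+)]")

for ch in "e.xp7+(":
    print(repr(ch), bool(char_class.fullmatch(ch)))

# The full pattern from the question: group 1 has to give back the last
# character before ".rar" so that the character class can consume it.
attempt = re.compile(r"(.*)(?:part0*1|.*[^(part\d+)])\.rar")
group_one = attempt.match("archive.rar").group(1)
print(group_one)
```

Here group 1 comes out as archiv rather than archive, because the final e is matched by the character class instead of by (.*).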


One option is a negative lookahead asserting that the string does not contain part followed by optional zeroes and then either a digit 2-9 with optional further digits, or a digit 1-9 with one or more further digits (in other words, any part number other than 1).

^(?!.*part0*(?:[2-9]\d*|[1-9]\d+)\.rar)(.+)\.rar$
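A short Python check of this pattern, as a sketch; the helper name and the sample filenames are invented for illustration:

```python
import re

# Pattern copied verbatim from the answer above.
FIRST_OR_SINGLE = re.compile(r"^(?!.*part0*(?:[2-9]\d*|[1-9]\d+)\.rar)(.+)\.rar$")


def stem_if_first(name: str):
    """Return capture group 1 for a single-file or first-part archive, else None."""
    m = FIRST_OR_SINGLE.match(name)
    return m.group(1) if m else None


for name in ["backup.rar", "backup.part1.rar", "backup.part001.rar",
             "backup.part2.rar", "backup.part10.rar"]:
    print(name, stem_if_first(name))
```

Note that group 1 keeps the part1 suffix, because (.+) captures everything before the final .rar.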


Regex to determine if a file is a rar file

\.(?:rar|r\d\d|\d\d\d)$

I think.

Edit: Credit to Peter for another correction.
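A quick way to try the pattern against common volume extensions, sketched in Python (the function name and sample filenames are made up):

```python
import re

# Pattern copied verbatim from the answer above.
RAR_EXTENSION = re.compile(r"\.(?:rar|r\d\d|\d\d\d)$")


def looks_like_rar(name: str) -> bool:
    """True if the name ends in .rar, .rNN or .NNN (N being a digit)."""
    return RAR_EXTENSION.search(name) is not None


for name in ["a.rar", "a.r00", "a.001", "a.zip", "a.r1"]:
    print(name, looks_like_rar(name))
```

search is used rather than match because the pattern is only anchored at the end of the string.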

Validating file types by regular expression

Your regex seems a bit too complex in my opinion. Also, remember that the dot is a special character meaning "any character". The following regex should work (note the escaped dots):

^.*\.(jpg|JPG|gif|GIF|doc|DOC|pdf|PDF)$

You can use a tool like Expresso to test your regular expressions.
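If your engine has a case-insensitive flag, the duplicated upper-case alternatives are not needed. A Python sketch of that variation, which is an assumption on my part and not part of the regex above; note that it also accepts mixed case such as .Jpg, which the original pattern rejects:

```python
import re

# Same extensions as above, but case is handled by the flag instead of
# listing each extension twice.
ALLOWED = re.compile(r"^.*\.(jpg|gif|doc|pdf)$", re.IGNORECASE)


def is_allowed(filename: str) -> bool:
    """True if the filename ends in one of the allowed extensions."""
    return ALLOWED.match(filename) is not None


for name in ["report.pdf", "PHOTO.JPG", "photo.Jpg", "script.exe", "filejpg"]:
    print(name, is_allowed(name))
```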

Modify working AddHandler to match files only in the CURRENT directory, NOT child directories

The <If> directive (available in Apache 2.4 and later) can be used to add the handler only for files matching the pattern in the current folder.

The following example adds the handler only for files in the document root, such as /sitemap.xml and /opensearch.xml, but not for /folder/sitemap.xml or /folder/opensearch.xml.

<FilesMatch "^(opensearch|sitemap)\.xml$">
    <If "%{REQUEST_URI} =~ m#^\/(opensearch|sitemap)\.xml$#">
        AddHandler application/x-httpd-php .xml
    </If>
</FilesMatch>

In the above example, the condition checks that REQUEST_URI matches the regex pattern delimited by m# #.
The =~ comparison operator checks that a string matches a regular expression.

The pattern ^\/(opensearch|sitemap)\.xml$ matches REQUEST_URI variable (the path component of the requested URI) such as /opensearch.xml or /sitemap.xml

^                      # start of string
\/                     # escaped forward slash
(opensearch|sitemap)   # "opensearch" or "sitemap"
\.                     # literal dot
xml                    # "xml"
$                      # end of string
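The pattern can also be tried outside Apache. A small Python sketch, assuming Python's re module treats this simple pattern the same way Apache's regex engine does (the helper name is hypothetical; the sample paths are the ones mentioned above):

```python
import re

# Pattern copied from the <If> condition above.
URI_PATTERN = re.compile(r"^\/(opensearch|sitemap)\.xml$")


def handler_applies(request_uri: str) -> bool:
    """True if the path is one of the two XML files in the document root."""
    return URI_PATTERN.match(request_uri) is not None


for uri in ["/sitemap.xml", "/opensearch.xml", "/folder/sitemap.xml", "/folder/opensearch.xml"]:
    print(uri, handler_applies(uri))
```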

Java iterating through all the files and only the unique RAR archives in a directory

I've written this bit of code to identify RAR archives where I only take the first-volume of a spanned archive into consideration and omit the others.

import java.io.File;
import java.io.FileInputStream;
import java.util.Arrays;

/**
 * Checks whether a file is an archive
 *
 * @param filFile the file to check
 * @return a boolean value indicating the result
 */
public static Boolean isArchive(File filFile) {

    byte[] bytSignature = new byte[] {0x52, 0x61, 0x72, 0x21, 0x1a, 0x07, 0x00};

    // try-with-resources closes the stream even if reading fails
    try (FileInputStream fisFileInputStream = new FileInputStream(filFile)) {

        byte[] bytHeader = new byte[20];
        fisFileInputStream.read(bytHeader);

        // Header flags, assembled from bytes 10 and 11
        short shoFlags = (short) (((bytHeader[10] & 0xFF) << 8) | (bytHeader[11] & 0xFF));

        // Check if it is an archive
        if (Arrays.equals(Arrays.copyOfRange(bytHeader, 0, 7), bytSignature)) {
            // Check if it is a spanned archive
            if ((shoFlags & 0x0100) != 0) {
                // Accept only the first part of a spanned archive
                return (shoFlags & 0x0001) != 0;
            } else {
                return true;
            }
        } else {
            // Signature mismatch: files that are not RAR archives also yield true here
            return true;
        }

    } catch (Exception e) {
        return false;
    }

}

I've used the official RAR header specifications. In order to implement this and parse the bytes, I've followed a discussion here:

How do I read in hex values from a binary file and decipher some bytes containing bitflag values?.
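For comparison, the same header check can be sketched in Python. The signature bytes, byte offsets and bit masks below are taken from the Java code above and are not independently verified against the RAR specification; the function name is hypothetical. One deliberate difference, which is my assumption about the intent: files without the RAR signature are rejected here, whereas the Java code returns true for them.

```python
# Signature bytes as used in the Java code above.
RAR_SIGNATURE = bytes([0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x00])


def is_first_or_only_rar(path) -> bool:
    """True for a non-spanned RAR file or the first part of a spanned one."""
    try:
        with open(path, "rb") as fh:
            header = fh.read(20)
    except OSError:
        return False
    if len(header) < 12 or header[:7] != RAR_SIGNATURE:
        # Assumption: unlike the Java code, non-RAR files are rejected.
        return False
    # Same byte order and masks as the Java code: byte 10 is the high byte.
    flags = (header[10] << 8) | header[11]
    if flags & 0x0100:               # spanned archive
        return bool(flags & 0x0001)  # accept only the first part
    return True
```

A directory scan would then call is_first_or_only_rar on each file and skip those for which it returns False.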

How do i search directories and find files that match regex?

import os
import re

rootdir = "/mnt/externa/Torrents/completed"
regex = re.compile(r'(.*zip$)|(.*rar$)|(.*r01$)')

for root, dirs, files in os.walk(rootdir):
    for file in files:
        if regex.match(file):
            print(file)

CODE BELOW ANSWERS THE QUESTION IN THE FOLLOWING COMMENT

That worked really well, is there a way to do this if match is found on regex group 1 and do this if match is found on regex group 2 etc ? – nillenilsson

import os
import re

regex = re.compile(r'(.*zip$)|(.*rar$)|(.*r01$)')

for root, dirs, files in os.walk("../Documents"):
    for file in files:
        res = regex.match(file)
        if res:
            # Only the group of the alternative that matched is set
            if res.group(1):
                print("ZIP", file)
            if res.group(2):
                print("RAR", file)
            if res.group(3):
                print("R01", file)

It might be possible to do this in a nicer way, but this works.
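One possibly nicer variant, as a sketch: give each alternative a named group and dispatch on the match object's lastgroup attribute, which holds the name of the group that matched. The group names and handler functions below are invented for illustration:

```python
import re

# Same three alternatives as above, with a name for each group.
ARCHIVE = re.compile(r"(?P<zip>.*zip$)|(?P<rar>.*rar$)|(?P<r01>.*r01$)")


# Hypothetical handlers; replace the bodies with the real per-type work.
def handle_zip(name):
    return f"ZIP {name}"


def handle_rar(name):
    return f"RAR {name}"


def handle_r01(name):
    return f"R01 {name}"


HANDLERS = {"zip": handle_zip, "rar": handle_rar, "r01": handle_r01}


def classify(name):
    """Run the handler for whichever alternative matched, or return None."""
    m = ARCHIVE.match(name)
    if m is None:
        return None
    return HANDLERS[m.lastgroup](name)


for name in ["a.zip", "b.rar", "c.r01", "d.txt"]:
    print(classify(name))
```

Adding a new file type then means adding one named alternative and one dictionary entry, with no extra if branch.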

In bash, what's the regular expression to list two types of files?

Do you really want a regular expression?

This uses * ("globbing") and {...} ("brace expansion").

$ ls *.{zip,rar}

See also this question for many, many more shortcuts.


