Regular expression to match only the first file in a RAR file set
The short answer is that the regex you tried can't work in Ruby 1.8, because its regex engine supports lookahead but not lookbehind assertions (the (?<! part of your example regex). This leaves you with two options.
1) Use more than one regex to do it.
def is_first_rar(filename)
  if filename =~ /part(\d+)\.rar$/
    # "partN.rar" volumes: only part 1 (or 01, 001, ...) is the first one
    $1.to_i == 1
  else
    # Any other file counts when it simply ends in .rar
    !(filename =~ /\.rar$/).nil?
  end
end
2) Use the regex engine of Ruby 1.9, Oniguruma. It supports lookaround assertions, including lookbehind, and you can install it as a gem for Ruby 1.8. After that, you can do something like this:
require 'oniguruma'

def is_first_rar(filename)
  reg = Oniguruma::ORegexp.new('.*(?:(?<!part\d\d\d|part\d\d|\d)\.rar|\.part0*1\.rar)')
  !reg.match(filename).nil?
end
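For readers outside Ruby, option 1 translates directly to Python. This is a sketch that mirrors the two-regex logic above (the function name is reused from the answer); like the Ruby version, it treats any non-"partN" file ending in .rar as a first volume and does not handle .001 or .r00 style sets.

```python
import re

def is_first_rar(filename):
    # First regex: a "partN.rar" volume, capturing N
    m = re.search(r"part(\d+)\.rar$", filename)
    if m is None:
        # Second regex: any other file ending in .rar counts as the first one
        return re.search(r"\.rar$", filename) is not None
    # part1, part01, part001, ... are all the first volume
    return int(m.group(1)) == 1
```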
Regex to match the first file in a rar archive file set in Python
There's no need to use lookbehind assertions for this. Since the match starts at the beginning of the string, a lookahead checked before each consumed character can do the same job as a lookbehind here. This should work:
^((?!\.part(?!0*1\.rar$)\d+\.rar$).)*\.(?:rar|r?0*1)$
To capture the first part of the filename as you requested, you could do this:
^((?:(?!\.part\d+\.rar$).)*)\.(?:(?:part0*1\.)?rar|r?0*1)$
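Both patterns can be tried directly with Python's re module. The sketch below copies them unchanged; first_volume_base is a hypothetical helper name, and re.match is used because both patterns are already anchored with ^ and $.

```python
import re

# Boolean check: the first pattern above, copied unchanged
IS_FIRST = re.compile(r"^((?!\.part(?!0*1\.rar$)\d+\.rar$).)*\.(?:rar|r?0*1)$")
# Capturing version: the second pattern above, copied unchanged
FIRST_BASE = re.compile(r"^((?:(?!\.part\d+\.rar$).)*)\.(?:(?:part0*1\.)?rar|r?0*1)$")

def first_volume_base(filename):
    """Return the name before the volume suffix, or None if not a first volume."""
    m = FIRST_BASE.match(filename)
    return m.group(1) if m else None
```

For example, first_volume_base("filename.part01.rar") gives "filename", while "filename.part2.rar" gives None.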
Regex to match the first file in a rar archive file set
This should pass your tests:
var regex = new Regex(@"(\.001|\.part0*1\.rar|^((?!part\d*\.rar$).)*\.rar)$", RegexOptions.IgnoreCase | RegexOptions.Singleline);
Assert.That(regex.IsMatch("filename.001"));
Assert.That(regex.IsMatch("filename.rar"));
Assert.That(regex.IsMatch("filename.part1.rar"));
Assert.That(regex.IsMatch("filename.part01.rar"));
Assert.That(regex.IsMatch("filename.004"), Is.False);
Assert.That(regex.IsMatch("filename.057"), Is.False);
Assert.That(regex.IsMatch("filename.r67"), Is.False);
Assert.That(regex.IsMatch("filename.s89"), Is.False);
Assert.That(regex.IsMatch("filename.part2.rar"), Is.False);
Assert.That(regex.IsMatch("filename.part04.rar"), Is.False);
Assert.That(regex.IsMatch("filename.part11.rar"), Is.False);
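The same test table can be run outside .NET. This is a Python port of the tests above, assuming Python's re behaves like .NET's engine for this particular pattern: RegexOptions.IgnoreCase maps to re.IGNORECASE, Singleline to re.DOTALL, and IsMatch to an unanchored re.search. The names CASES and failures are hypothetical.

```python
import re

# Same pattern as the C# answer
FIRST_RAR = re.compile(
    r"(\.001|\.part0*1\.rar|^((?!part\d*\.rar$).)*\.rar)$",
    re.IGNORECASE | re.DOTALL,
)

# (filename, expected IsMatch result), taken from the Assert lines above
CASES = [
    ("filename.001", True),
    ("filename.rar", True),
    ("filename.part1.rar", True),
    ("filename.part01.rar", True),
    ("filename.004", False),
    ("filename.057", False),
    ("filename.r67", False),
    ("filename.s89", False),
    ("filename.part2.rar", False),
    ("filename.part04.rar", False),
    ("filename.part11.rar", False),
]

def failures():
    """Return the filenames whose result differs from the expected one."""
    return [name for name, expected in CASES
            if (FIRST_RAR.search(name) is not None) != expected]

print(failures())
```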
python regex expression to match (first multipart or simple part) rar archive
The reason you sometimes lose the last character is the pattern (.*)(?:part0*1|.*[^(part\d+)])\.rar that you tried. The greedy capture group 1 first captures the whole line and then backtracks. It is followed by an alternation matching either part0*1 or .*[^(part\d+)].
That is why lines such as name.part01.rar are matched through the first alternative.
But when there is no match for part0*1, the next alternative .*[^(part\d+)] is tried. Its .* can match the empty string, and [^(part\d+)] then matches exactly one character: the square brackets make it a negated character class (any single character except (, p, a, r, t, a digit, + or )), and it has no quantifier. That single character is taken away from capture group 1.
One option is to use a negative lookahead asserting that the string does not contain part followed by optional zeros and then either a digit 2-9 with optional further digits, or a digit 1-9 followed by one or more digits:
^(?!.*part0*(?:[2-9]\d*|[1-9]\d+)\.rar)(.+)\.rar$
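A quick way to check the lookahead pattern is to run it over a few names. This Python sketch copies the pattern unchanged; capture is a hypothetical helper. Note that group 1 keeps the ".part1" suffix for first volumes, because the pattern only strips the final .rar.

```python
import re

# The answer's pattern: rejects part2+, part10+ etc., keeps part1/part01 and plain .rar
PATTERN = re.compile(r"^(?!.*part0*(?:[2-9]\d*|[1-9]\d+)\.rar)(.+)\.rar$")

def capture(filename):
    """Return capture group 1 for a first or simple .rar name, else None."""
    m = PATTERN.match(filename)
    return m.group(1) if m else None
```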
Regex to determine if a file is a rar file
\.(?:rar|r\d\d|\d\d\d)$
I think.
Edit: Credit to Peter for another correction.
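Because this pattern is anchored only at the end, it has to be used with a search rather than a match from the start of the string. A minimal Python sketch (looks_like_rar is a hypothetical name); the check is case-sensitive, so add re.IGNORECASE if names like FILE.RAR should count.

```python
import re

# .rar, old-style .r00-.r99 volumes, or .000-.999 split files
RAR_FILE = re.compile(r"\.(?:rar|r\d\d|\d\d\d)$")

def looks_like_rar(filename):
    # search, not match: the pattern is only anchored at the end
    return RAR_FILE.search(filename) is not None
```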
Validating file types by regular expression
Your regex seems a bit too complex in my opinion. Also, remember that the dot is a special character meaning "any character". The following regex should work (note the escaped dots):
^.*\.(jpg|JPG|gif|GIF|doc|DOC|pdf|PDF)$
You can use a tool like Expresso to test your regular expressions.
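Listing each extension in both upper and lower case still misses mixed case such as Photo.Jpg. Where the regex engine has a case-insensitive flag, that is usually simpler; here is a Python sketch of that alternative (the flag is Python's re.IGNORECASE, and is_allowed is a hypothetical helper).

```python
import re

# Same idea as the pattern above, with the dot escaped, but matched case-insensitively
ALLOWED = re.compile(r"^.*\.(jpg|gif|doc|pdf)$", re.IGNORECASE)

def is_allowed(filename):
    return ALLOWED.match(filename) is not None
```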
Modify working AddHandler to match files only in the CURRENT directory, NOT child directories
The <If> directive can be used to add the handler only for files that match the pattern in the current folder.
The following example adds the handler only for files in the document root, such as /sitemap.xml and /opensearch.xml, but not for /folder/sitemap.xml and /folder/opensearch.xml:
<FilesMatch ^(opensearch|sitemap)\.xml$>
<If "%{REQUEST_URI} =~ m#^\/(opensearch|sitemap)\.xml$#">
AddHandler application/x-httpd-php .xml
</If>
</FilesMatch>
In the above example, the condition checks that REQUEST_URI matches the regex pattern delimited by m# and #.
The =~ operator checks that a string matches a regular expression.
The pattern ^\/(opensearch|sitemap)\.xml$ matches values of the REQUEST_URI variable (the path component of the requested URI) such as /opensearch.xml or /sitemap.xml:
^                      # start of string
\/                     # escaped forward slash
(opensearch|sitemap)   # "opensearch" or "sitemap"
\.                     # literal dot
xml                    # "xml"
$                      # end of string
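To see why only document-root requests qualify, the pattern can be tried against a few request paths. This is a Python sketch, assuming Python's re treats this simple pattern the same way as Apache's PCRE-based expression parser; handled is a hypothetical helper.

```python
import re

# Pattern from the <If> condition, copied unchanged
ROOT_ONLY = re.compile(r"^\/(opensearch|sitemap)\.xml$")

def handled(uri):
    # True when the request path would get the PHP handler
    return ROOT_ONLY.match(uri) is not None
```

Because of the leading ^\/ and the absence of any other slash in the pattern, a path like /folder/sitemap.xml cannot match.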
Java iterating through all the files and only the unique RAR archives in a directory
I've written this bit of code to identify RAR archives where I only take the first-volume of a spanned archive into consideration and omit the others.
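import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Arrays;

/**
 * Checks whether a file is an archive, taking only the first volume
 * of a spanned (multi-volume) RAR archive into consideration.
 *
 * @param file the file to check
 * @return a boolean value indicating the result
 */
public static boolean isArchive(File file) {
    // Marker block of RAR 1.5-4.x archives: "Rar!" 0x1A 0x07 0x00
    byte[] signature = {0x52, 0x61, 0x72, 0x21, 0x1a, 0x07, 0x00};
    // Marker block (7 bytes) followed by the start of the archive header
    byte[] header = new byte[20];
    try (FileInputStream in = new FileInputStream(file)) {
        in.read(header);
    } catch (IOException e) {
        return false;
    }
    if (!Arrays.equals(Arrays.copyOfRange(header, 0, 7), signature)) {
        // Not a RAR file, so it can't be a later volume: keep it
        // (return false here instead to accept RAR files only)
        return true;
    }
    // HEAD_FLAGS of the archive header: bytes 10-11, stored little-endian
    int flags = (header[10] & 0xFF) | ((header[11] & 0xFF) << 8);
    boolean isVolume = (flags & 0x0001) != 0;      // part of a spanned archive
    boolean isFirstVolume = (flags & 0x0100) != 0; // first volume (RAR 3.0+)
    // Keep non-spanned archives and only the first volume of spanned ones
    return !isVolume || isFirstVolume;
}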
I've used the official RAR header specifications. In order to implement this and parse the bytes, I've followed a discussion here:
How do I read in hex values from a binary file and decipher some bytes containing bitflag values?.
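For comparison, here is a Python sketch of the same header check. It assumes the RAR 1.5-4.x layout described above (7-byte marker block, then the archive header with HEAD_FLAGS at bytes 10-11, little-endian), where flag 0x0001 marks a volume and 0x0100 marks the first volume. RAR 5.0 archives use a different signature and are not handled. Unlike the Java method, this sketch returns False for files that aren't RAR archives; both function names are hypothetical.

```python
import struct

RAR4_SIGNATURE = b"Rar!\x1a\x07\x00"  # marker block of RAR 1.5-4.x archives
MHD_VOLUME = 0x0001                   # archive is part of a multi-volume set
MHD_FIRSTVOLUME = 0x0100              # first volume (set by RAR 3.0 and later)

def is_first_or_single_rar(header: bytes) -> bool:
    """True for a single-volume RAR archive or the first volume of a set."""
    if len(header) < 12 or header[:7] != RAR4_SIGNATURE:
        return False
    # HEAD_FLAGS of the archive header: 2 bytes, little-endian, at offset 10
    (flags,) = struct.unpack_from("<H", header, 10)
    return not (flags & MHD_VOLUME) or bool(flags & MHD_FIRSTVOLUME)

def file_is_first_or_single_rar(path):
    with open(path, "rb") as f:
        return is_first_or_single_rar(f.read(20))
```

Reading the flags as little-endian lets the constants match the specification's values directly, instead of relying on swapped masks.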
How do i search directories and find files that match regex?
import os
import re

rootdir = "/mnt/externa/Torrents/completed"
# Escape the dots so names like "gzip" or "tar" don't count as archives
regex = re.compile(r'(.*\.zip$)|(.*\.rar$)|(.*\.r01$)')

for root, dirs, files in os.walk(rootdir):
    for file in files:
        if regex.match(file):
            print(file)
The code below answers the question in the following comment:
That worked really well, is there a way to do this if match is found on regex group 1 and do this if match is found on regex group 2 etc ? – nillenilsson
import os
import re

regex = re.compile(r'(.*\.zip$)|(.*\.rar$)|(.*\.r01$)')

for root, dirs, files in os.walk("../Documents"):
    for file in files:
        res = regex.match(file)
        if res:
            # Only one alternative can match, so check which group captured
            if res.group(1):
                print("ZIP", file)
            elif res.group(2):
                print("RAR", file)
            elif res.group(3):
                print("R01", file)
It might be possible to do this in a nicer way, but this works.
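One possibly nicer way, offered as a suggestion rather than the answer's own code, is to name the groups and read Match.lastgroup, which holds the name of the last capturing group that took part in the match. Since the alternatives are mutually exclusive, that name says which extension matched. The group names and archive_kind are hypothetical.

```python
import os
import re

# Named groups: m.lastgroup tells which alternative matched
ARCHIVE_RE = re.compile(r".*\.(?:(?P<ZIP>zip)|(?P<RAR>rar)|(?P<R01>r01))$")

def archive_kind(filename):
    m = ARCHIVE_RE.match(filename)
    return m.lastgroup if m else None

if __name__ == "__main__":
    for root, dirs, files in os.walk("../Documents"):
        for name in files:
            kind = archive_kind(name)
            if kind:
                print(kind, name)
```

Adding a new archive type then only needs another named alternative, not another if branch.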
In bash, what's the regular expression to list two types of files?
Do you really want a regular expression?
This uses * ("globbing") and {...} ("brace expansion").
$ ls *.{zip,rar}
See also this question for many, many more shortcuts.
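If the same listing is needed from a script rather than the shell, a rough Python equivalent can filter on the file suffix with pathlib. This is a sketch: list_archives is a hypothetical name, and it approximates the shell's behaviour by skipping dotfiles, as * does by default, and by listing regular files only.

```python
from pathlib import Path

def list_archives(directory, suffixes=(".zip", ".rar")):
    """Roughly `ls *.{zip,rar}`: non-recursive, case-sensitive, no dotfiles."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and not p.name.startswith(".") and p.suffix in suffixes
    )
```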