Find a string matching a regular expression programmatically?
Assume you define regular expressions like this:
R :=
<literal string>
(RR) -- concatenation
(R*) -- kleene star
(R|R) -- choice
Then you can define a recursive function S(r) which finds a matching string:
S(<literal string>) = <literal string>
S(rs) = S(r) + S(s)
S(r*) = ""
S(r|s) = S(r)
For example: S(a*(b|c)) = S(a*) + S(b|c) = "" + S(b) = "" + "b" = "b".
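The four rules for S can be sketched in Python as follows. This is only an illustration: the node classes `Lit`, `Cat`, `Star` and `Alt` and the function name `sample` are hypothetical names chosen here, and the sketch assumes the regular expression has already been parsed into such a tree.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Lit:
    text: str          # <literal string>


@dataclass(frozen=True)
class Cat:
    left: "Regex"      # (RR), concatenation
    right: "Regex"


@dataclass(frozen=True)
class Star:
    inner: "Regex"     # (R*), Kleene star


@dataclass(frozen=True)
class Alt:
    left: "Regex"      # (R|R), choice
    right: "Regex"


Regex = Union[Lit, Cat, Star, Alt]


def sample(r: Regex) -> str:
    """Return one string matched by r, following the rules for S."""
    if isinstance(r, Lit):
        return r.text
    if isinstance(r, Cat):
        return sample(r.left) + sample(r.right)
    if isinstance(r, Star):
        return ""              # zero repetitions always match
    if isinstance(r, Alt):
        return sample(r.left)  # either branch would do; take the first
    raise TypeError(f"not a regex node: {r!r}")


# a*(b|c)
example = Cat(Star(Lit("a")), Alt(Lit("b"), Lit("c")))
print(sample(example))  # prints: b
```

Picking the first branch of a choice and zero repetitions of a star is arbitrary; any other consistent choice would also produce a matching string.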
If you have a more complex notion of regular expression, you can rewrite it in terms of the basic primitives and then apply the above. For example, R+ = RR* and [abc] = (a|b|c).
Note that if you've got a parsed regular expression (so you know its syntax tree), then the above algorithm takes at most time linear in the size of the regular expression (assuming you're careful to perform the string concatenations efficiently).
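As a rough Python sketch of both points, the snippet below rewrites R+ and [abc] into the basic primitives and computes S with a single pass that collects the literal pieces and joins them once at the end, so no intermediate strings are rebuilt. The tuple encoding and the helper names `plus`, `char_class` and `sample` are assumptions made for this illustration only.

```python
from functools import reduce

# Hypothetical tuple encoding of the basic primitives:
#   ("lit", s)   ("cat", r, s)   ("star", r)   ("alt", r, s)


def plus(r):
    """R+ = RR*"""
    return ("cat", r, ("star", r))


def char_class(chars):
    """[abc] = (a|b|c); chars must be a non-empty string."""
    first = ("lit", chars[0])
    return reduce(lambda acc, c: ("alt", acc, ("lit", c)), chars[1:], first)


def sample(r):
    """Visit each needed node once, collect literals, join at the end."""
    pieces = []
    stack = [r]
    while stack:
        node = stack.pop()
        kind = node[0]
        if kind == "lit":
            pieces.append(node[1])
        elif kind == "cat":
            stack.append(node[2])  # push the right side first so that
            stack.append(node[1])  # the left side is processed first
        elif kind == "alt":
            stack.append(node[1])  # S(r|s) = S(r)
        elif kind == "star":
            pass                   # S(r*) = ""
        else:
            raise ValueError(f"unknown node kind: {kind!r}")
    return "".join(pieces)


print(sample(plus(char_class("abc"))))  # prints: a
```

Note that the running time is linear in the size of the rewritten expression; a rewrite such as R+ = RR* duplicates R, so the rewritten expression can be larger than the original.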
Finding regular expressions from a list of string examples
I implemented a solution myself. I made a small Python package out of it and put it in my GitHub repo @ http://github.com/shivylp/RegexUtils
If anyone is looking for something similar, feel free to use it.
How to know if a string could match a regular expression by adding more characters
You can do it as easily as this:
boolean couldMatch(CharSequence charsSoFar, Pattern pattern) {
    Matcher m = pattern.matcher(charsSoFar);
    return m.matches() || m.hitEnd();
}
If the sequence does not match and the engine did not reach the end of the input, it implies that there is a contradicting character before the end, which won’t go away when adding more characters at the end.
Or, as the documentation says:
Returns true if the end of input was hit by the search engine in the last match operation performed by this matcher.
When this method returns true, then it is possible that more input would have changed the result of the last search.
This is also used by the Scanner class internally, to determine whether it should load more data from the source stream for a matching operation.
Using the method above with your sample data yields
Pattern fpNumber = Pattern.compile("[+-]?\\d*\\.?\\d*");
String[] positive = { "+", "-", "123", ".24", "-1.04" };
String[] negative = { "+A", "-B", "123z", ".24.", "-1.04+" };
for (String p : positive) {
    System.out.println("should accept more input: " + p
        + ", couldMatch: " + couldMatch(p, fpNumber));
}
for (String n : negative) {
    System.out.println("can never match at all: " + n
        + ", couldMatch: " + couldMatch(n, fpNumber));
}
should accept more input: +, couldMatch: true
should accept more input: -, couldMatch: true
should accept more input: 123, couldMatch: true
should accept more input: .24, couldMatch: true
should accept more input: -1.04, couldMatch: true
can never match at all: +A, couldMatch: false
can never match at all: -B, couldMatch: false
can never match at all: 123z, couldMatch: false
can never match at all: .24., couldMatch: false
can never match at all: -1.04+, couldMatch: false
Of course, this doesn’t say anything about the chances of turning nonmatching content into a match. You could still construct patterns for which the engine hits the end of the input although no additional characters could ever produce a match. However, for ordinary use cases like the floating point number format, it’s reasonable.
Regular Expression to get a string between backtick `` in Console application (C#)
Something like this (Regular expression and Linq):
String test = "select t.`ProductID` AS `ProductID`, t.`AttributeID` ...";
// If you want to preserve `` the pattern is @"\bAS\s*(`[^`]*?`)"
String pattern = @"\bAS\s*`([^`]*?)`";
var result = Regex
    .Matches(test, pattern, RegexOptions.IgnoreCase)
    .OfType<Match>()
    .Select(match => match.Groups[1].Value)
    .ToArray(); // if you want, say, an array representation
Console.Write(String.Join(", ", result));
And you'll get
ProductID, AttributeID, ... , ModifiedBy
However, be careful: in the general case, regular expressions are not a good choice for parsing SQL; let me provide some examples to show the problems that emerge:
-- commented AS ("abc" should not be returned)
select a /* AS `abc`*/
from tbl
-- commented value ("abc" should be returned, not "obsolete" or "proposed")
select a AS /*`obsolete`*/ `abc` /*`proposed`*/
from tbl
-- String ("abc" should not be returned)
select 'a AS `abc`'
from tbl
-- honest AS ("abc" should be returned)
select a /*'*/AS `abc`--'
from tbl
-- commented comment ("abc" should be returned)
select -- /*
a AS `abc`
--*/
from tbl
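One way to cope with the cases above is to tokenize the query first, so that comments and string literals are recognized as whole units before any `AS` is looked for. The following Python sketch illustrates the idea; it is not a SQL parser. The names `TOKEN` and `aliases` are hypothetical, and the sketch assumes `--` and non-nested `/* */` comments, single-quoted strings with `''` as the escaped quote, and backtick-quoted identifiers. Real dialects differ (for example in escape rules or other comment styles), so treat it as a starting point only.

```python
import re

# One alternative per token kind; earlier alternatives win.
TOKEN = re.compile(r"""
    (?P<comment> --[^\n]* | /\*.*?\*/ )   # line or block comment
  | (?P<string>  '(?:[^']|'')*' )         # single-quoted string literal
  | (?P<ident>   `[^`]*` )                # backtick-quoted identifier
  | (?P<word>    \w+ )                    # keyword or bare name
  | (?P<other>   . )                      # any other single character
""", re.VERBOSE | re.DOTALL)


def aliases(sql):
    """Return the backtick-quoted names that directly follow AS."""
    found = []
    expect_alias = False
    for m in TOKEN.finditer(sql):
        kind, text = m.lastgroup, m.group()
        # Comments and whitespace may sit between AS and the alias.
        if kind == "comment" or (kind == "other" and text.isspace()):
            continue
        if expect_alias and kind == "ident":
            found.append(text[1:-1])      # strip the backticks
        expect_alias = kind == "word" and text.upper() == "AS"
    return found


print(aliases("select a AS /*`obsolete`*/ `abc` /*`proposed`*/\nfrom tbl"))
# prints: ['abc']
```

Because comments and strings are consumed as single tokens, an `AS` inside them is never seen as a keyword, and a quote character inside a comment cannot open a string.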