Grammar Inference Library

Grammar inference library?

I have not used them yet, but I had this identical question and (after much searching) found these two libraries, at the very least:

  • libalf for C++
  • gitoolbox for MATLAB

Unlike the other answers to the question, these are actual grammar inference libraries rather than parser generators.

Grammatical inference of regular expressions for given finite list of representative strings?

Yes, it turns out this does exist; what is required is what is known academically as a DFA Learning algorithm, examples of which include:

  • Angluin's L*
  • L* (adding counter-examples to columns)
  • Kearns / Vazirani
  • Rivest / Schapire
  • NL*
  • Regular positive negative inference (RPNI)
  • DeLeTe2
  • Biermann & Feldman's algorithm
  • Biermann & Feldman's algorithm (using SAT-solving)

Source for the above is libalf, an open-source automata learning algorithm framework in C++; descriptions of at least some of these algorithms can be found in this textbook, among others. There are also implementations of grammatical inference algorithms (including DFA learning) in gitoolbox for MATLAB.
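As a rough illustration of what these algorithms work with, here is a minimal Python sketch of the first step of RPNI: building a prefix-tree acceptor (PTA) from positive example strings. Real RPNI then merges states while staying consistent with the negative examples; that phase is omitted here, and the function names are illustrative rather than taken from libalf or gitoolbox.

```python
def build_pta(positives):
    """Build a prefix-tree acceptor as {state: {symbol: state}} plus
    a set of accepting states. State 0 is the start state."""
    transitions = {0: {}}
    accepting = set()
    next_state = 1
    for word in positives:
        state = 0
        for symbol in word:
            # Create a fresh state for each unseen prefix extension.
            if symbol not in transitions[state]:
                transitions[state][symbol] = next_state
                transitions[next_state] = {}
                next_state += 1
            state = transitions[state][symbol]
        accepting.add(state)
    return transitions, accepting

def accepts(transitions, accepting, word):
    """Run the acceptor over a word; reject on any missing transition."""
    state = 0
    for symbol in word:
        if symbol not in transitions[state]:
            return False
        state = transitions[state][symbol]
    return state in accepting

transitions, accepting = build_pta(["ab", "abb", "b"])
print(accepts(transitions, accepting, "ab"))  # True
print(accepts(transitions, accepting, "a"))   # False
```

The PTA accepts exactly the sample set; the interesting (and hard) part of RPNI is deciding which of its states to merge so the automaton generalizes beyond the samples.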

Since this question has come up before and has not been satisfactorily answered in the past, I am in the process of evaluating these algorithms and will update with more information about how useful they are, unless someone with more expertise in the area does so first (which would be preferable).

NOTE: I am accepting my own answer for now but will gladly accept a better one if someone can provide one.

FURTHER NOTE: I've decided to go the route of custom code, since a generic algorithm turns out to be overkill for the data I'm working with. I'm leaving this answer here in case someone else needs it, and will update if I ever do evaluate these.

Building an Inference Engine in Python

I'm very surprised that step #3 is the one giving you trouble...

Assuming you can properly label/categorize each token (and that prior to categorization you can find the proper tokens, as there may be many ambiguous cases...), the "Step #3" problem seems like one that could easily be tackled with a context-free grammar, where each of the desired actions (such as ZIP code lookup or mathematical expression calculation...) would be a symbol whose production rule is made of the possible token categories. To illustrate this in BNF notation, we could have something like

<SimpleMathOperation> ::= <NumericalValue><Operator><NumericalValue>

Maybe your concern is that when things get complicated, it will become difficult to express the whole requirement in terms of non-conflicting grammar rules. Or maybe your concern is that rules could be added dynamically, forcing the grammar "compilation" logic to be integrated with the program? Whatever the concern, I think this third step will be comparatively trivial.
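A minimal stdlib sketch of that rule: tokens are classified into categories (Step #2) and the category sequence is matched against a production (Step #3). The category names and regexes are illustrative only, not from any particular library.

```python
import re

# Token categories for Step #2 (illustrative patterns).
TOKEN_CATEGORIES = [
    ("NumericalValue", re.compile(r"\d+(\.\d+)?$")),
    ("Operator",       re.compile(r"[+\-*/]$")),
]

def categorize(token):
    """Return the first category whose pattern matches the whole token."""
    for category, pattern in TOKEN_CATEGORIES:
        if pattern.fullmatch(token):
            return category
    return "Unknown"

# Step #3 production:
# <SimpleMathOperation> ::= <NumericalValue><Operator><NumericalValue>
SIMPLE_MATH_OPERATION = ["NumericalValue", "Operator", "NumericalValue"]

def is_simple_math_operation(tokens):
    """Check whether the token categories match the production."""
    return [categorize(t) for t in tokens] == SIMPLE_MATH_OPERATION

print(is_simple_math_operation(["2", "+", "3.5"]))  # True
print(is_simple_math_operation(["2", "+", "zip"]))  # False
```

A real grammar engine would of course dispatch each matched production to its action (ZIP lookup, calculation, ...) rather than just report a match.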

On the other hand, unless the various categories (and the underlying input text) can also be described with a regular language (as you seem to hint in the question), a text parser and classifier (Steps #1 and #2...) is typically a less-than-trivial affair.

Some example Python libraries that simplify writing and evaluating grammars:

  • pijnu
  • pyparsing

Defining grammar based rules and inferences in android

I assume you are asking about how to program this, not how to use Android as an OS.

Unfortunately there are no grammar tools in Java for Android; you would have to use a parser for natural language (see this parser online, which also has an API for programming).

Parsers like YACC, BISON, ANTLR or similar (you'll get a lot of info by Googling those terms) are no good for you, since they require unambiguous grammars, and natural language is anything but unambiguous.

If your input is relatively simple, you should consider using some sort of regular expression parsing and classification. It is much simpler and easier to implement, and Java supports all sorts of string manipulation with regular expressions.
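A sketch of that regex-classification idea, shown in Python for brevity (Java's java.util.regex supports the same patterns); the categories and patterns are made up for illustration.

```python
import re

# Ordered classifiers: the first matching pattern wins.
CLASSIFIERS = [
    ("zip_lookup", re.compile(r"\d{5}(-\d{4})?")),
    ("math_expr",  re.compile(r"\d+(\s*[+\-*/]\s*\d+)+")),
    ("word",       re.compile(r"[A-Za-z]+")),
]

def classify(text):
    """Return the label of the first pattern matching the whole input."""
    for label, pattern in CLASSIFIERS:
        if pattern.fullmatch(text):
            return label
    return "unknown"

print(classify("90210"))      # zip_lookup
print(classify("2 + 3 * 4"))  # math_expr
```

In Java the equivalent is Pattern.compile(...) plus Matcher.matches(), looping over the patterns in the same priority order.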

Generating a parser with `inverse`, with constraints on the grammar

You can achieve the same results using just Char. As you've already pointed out, you can use isAlpha to define name2char as a partial identity. I changed the following lines of your code.

type NameChar = Char

name2char :: NameChar -> Char
name2char c | isAlpha c = c

The two exemplary expressions then evaluate as follows.

test> parse "<input type=\"submit\" name=\"btn1\"/>"
(Node (Name "input") [(Attr (Name "type") "submit"),(Attr (Name "name") "btn1")] [])

test> parse "<input type=\"submit\"/>"
(Node (Name "input") [(Attr (Name "type") "submit")] [])

As a side-effect, names with non-alpha characters silently fail with nameFromString.

test> nameFromString "input "

Edit: Since you seem to be a fan of function patterns, you can define generators for Nodes and Attrs and use them in your conversion function.

attr :: Name -> String -> Attr
attr name val
  | name `elem` ["type", "src", "alt", "name"] = Attr name val

node :: String -> [Attr] -> [Node] -> Node
node name [] nodes
  | name `elem` ["a", "p"] = Node name [] nodes
node name attrPairs@(_:_) nodes
  | name `elem` ["img", "input"] = Node name attrPairs nodes

node2string :: Node -> String
node2string (node name attrs children)
  | null children = "<" ++ name ++ attrs' ++ "/>"
  | otherwise     = "<" ++ name ++ attrs' ++ ">"
                    ++ children' ++ "</" ++ name ++ ">"
 where
  attrs' = concatMap ((" " ++) . attr2string) attrs
  children' = concatMap node2string children

attr2string :: Attr -> String
attr2string (attr name val) = name ++ "=\"" ++ escape val ++ "\""
 where
  escape = concatMap (\c -> if c == '"' then "\\\"" else [c])

This approach has its disadvantages: it works quite well for a specific set of valid names, but fails miserably when you instead use a predicate as before (e.g., all isAlpha name).

Edit2:
Besides the fact that the solution with the isAlpha condition is quite a bit "prettier" than your verbose solution, it is also defined in a declarative way.
Without your comments, it doesn't become clear (that easily) that your NameChar data type encodes alphabetic characters. The isAlpha condition, on the other hand, is a good example of a declarative specification of the wanted property.
Does this answer your question? I'm not sure what you are aiming at.

Inference engine to calculate matching set according to internal rules

There are a bunch of embedded Prolog-like SLD solvers for Java; my favourite approach is to use miniKanren for Scala, since it is clean and allows you to use Scala to lazily handle the results of queries, but I have not used it in any depth. See Embedded Prolog Interpreter/Compiler for Java for other options, as well as Ross's answer.

SLD solvers handle all of your criteria, provided they have some extra features that Prolog has:

  1. Conjunction of rules: basic SLD goal processing.
  2. Rules may have different priorities: Prolog's cut rule allows representation of negation, provided the queries are decidable.
  3. Rules may be conflicting: again, with cut you can ensure that lower-priority clauses are not applied if higher-priority goals are satisfied. There are a few ways to go about doing this.
  4. Rules are symmetric: with cut, this is easily ensured for decidable predicates.
  5. Rules (or rather subsets) may have meta rules (explicitly defined) associated with them: your example seems to suggest this is equivalent to 4, so I'm not sure I get what you are after here.
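To make points 2 and 3 concrete, here is a tiny Python sketch (not an SLD solver) of priority-ordered rules where the first match wins, which is roughly the behaviour cut gives you; all rule names and facts are made up.

```python
def evaluate(rules, item):
    """rules: list of (priority, predicate, result) triples.
    Rules are tried from highest to lowest priority; the first rule
    whose predicate holds determines the result, so a conflicting
    lower-priority rule is never applied (cut-like behaviour)."""
    for _priority, predicate, result in sorted(rules, key=lambda r: -r[0]):
        if predicate(item):
            return result
    return None  # no rule fired

rules = [
    (10, lambda x: x["vip"],     "allow"),   # high priority
    (5,  lambda x: x["blocked"], "deny"),    # conflicts with the above
    (1,  lambda x: True,         "review"),  # default fallback
]

print(evaluate(rules, {"vip": True, "blocked": True}))   # allow
print(evaluate(rules, {"vip": False, "blocked": True}))  # deny
```

A real SLD solver would additionally give you unification and backtracking over all solutions, not just a first-match scan.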

The advantages and disadvantages of SLD solvers over description logic-based tools are:

  1. Programmatic power, flexibility: you can generally find programming solutions to modelling difficulties, where description logics might require you to rethink your models. But of course absence of duct-tape means that description logic solutions force you to be clean, which might be a good discipline.
  2. Robustness: SLD solvers are a very well understood technology, while description logic tools are often not many steps from their birth in a PhD thesis.
  3. Absence of semantic tools: description logic has nice links with first-order logic and modal logic, and gives you a very rich set of techniques for reasoning about them. The flexibility of Prolog typically makes this very hard.

If you do not have special expertise in description logic, I'd recommend an SLD solver.


