What Kinds of Patterns Could I Enforce on the Code to Make It Easier to Translate to Another Programming Language

What kinds of patterns could I enforce on the code to make it easier to translate to another programming language?

I've been building tools (DMS Software Reengineering Toolkit) to do general purpose program manipulation (with language translation being a special case) since 1995, supported by a strong team of computer scientists. DMS provides generic parsing, AST building, symbol tables, control and data flow analysis, application of translation rules, regeneration of source text with comments, etc., all parameterized by explicit definitions of computer languages.

The amount of machinery you need to do this well is vast (especially if you want to be able to do this for multiple languages in a general way), and then you need reliable parsers for languages with unreliable definitions (PHP is a perfect example of this).

There's nothing wrong with you thinking about building a language-to-language translator or attempting it, but I think you'll find this a much bigger task for real languages than you expect. We have some 100 man-years invested in just DMS, and another 6-12 months in each "reliable" language definition (including the one we painfully built for PHP), much more for nasty languages such as C++. It will be a "hell of a learning experience"; it has been for us. (You might find the technical Papers section at the above website interesting to jump start that learning).

People often attempt to build some kind of generalized machinery by starting with some piece of technology with which they are familiar, that does a part of the job. (Python ASTs are a great example.) The good news is that part of the job is done. The bad news is that machinery has a zillion assumptions built into it, most of which you won't discover until you try to wrestle it into doing something else. At that point you find out the machinery is wired to do what it originally does, and will really, really resist your attempt to make it do something else. (I suspect trying to get the Python AST to model PHP is going to be a lot of fun.)

The reason I started to build DMS originally was to build foundations that had very few such assumptions built in. It has some that give us headaches. So far, no black holes. (The hardest part of my job over the last 15 years is to try to prevent such assumptions from creeping in).

Lots of folks also make the mistake of assuming that if they can parse (and perhaps get an AST), they are well on the way to doing something complicated. One of the hard lessons is that you need symbol tables and flow analysis to do good program analysis or transformation. ASTs are necessary but not sufficient. This is the reason that Aho&Ullman's compiler book doesn't stop at chapter 2. (The OP has this right in that he is planning to build additional machinery beyond the AST). For more on this topic, see Life After Parsing.
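To make the "necessary but not sufficient" point concrete, here is a minimal sketch using Python's standard `ast` and `symtable` modules (the sample program and variable names are invented for illustration). The two `return x` statements are identical as AST fragments, yet they refer to different variables; only the scope information in a symbol table tells them apart.

```python
import ast
import symtable

SOURCE = """
x = 1
def f():
    x = 2
    return x
def g():
    return x
"""

# The AST alone: both functions end in a return of a Name node spelled "x".
tree = ast.parse(SOURCE)
returned = {}
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        returned[node.name] = ast.dump(node.body[-1].value)

# Scope analysis: which declaration does each "x" actually refer to?
table = symtable.symtable(SOURCE, "<example>", "exec")
scopes = {child.get_name(): child for child in table.get_children()}
x_in_f = scopes["f"].lookup("x")
x_in_g = scopes["g"].lookup("x")

print("same AST shape:", returned["f"] == returned["g"])
print("x in f is local:", x_in_f.is_local())
print("x in g is global:", x_in_g.is_global())
```

A translator that renamed or retyped `x` based on the AST alone would get one of these two functions wrong, and this is still the easy case: it says nothing yet about types or data flow.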

The remark about "I don't need a perfect translation" is troublesome. What weak translators do is convert the "easy" 80% of the code, leaving the hard 20% to do by hand. If the application you intend to convert is pretty small, and you only intend to convert it once well, then that 20% is OK. If you want to convert many applications (or even the same one with minor changes over time), this is not nice. If you attempt to convert 100K SLOC then 20% is 20,000 original lines of code that are hard to translate, understand and modify in the context of another 80,000 lines of translated program you already don't understand. That takes a huge amount of effort. At the million line level, this is simply impossible in practice. (Amazingly there are people who distrust automated tools and insist on translating million line systems by hand; that's even harder and they normally find out painfully with long time delays, high costs and often outright failure.)

What you have to shoot for to translate large-scale systems is high nineties percentage conversion rates, or it is likely that you can't complete the manual part of the translation activity.
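The arithmetic behind this is simple enough to sketch. The 80% rate and the 100K and million-line sizes come from the discussion above; the 95% and 99% rates are only assumed stand-ins for "high nineties", and `manual_lines` is a hypothetical helper name.

```python
def manual_lines(total_sloc: int, conversion_rate: float) -> int:
    """Lines left for hand translation at a given automated conversion rate."""
    return round(total_sloc * (1.0 - conversion_rate))

for total in (100_000, 1_000_000):
    for rate in (0.80, 0.95, 0.99):  # 0.95 and 0.99 are illustrative assumptions
        left = manual_lines(total, rate)
        print(f"{total:>9,} SLOC at {rate:.0%}: {left:>7,} lines by hand")
```

Even at an assumed 99%, a million-line system still leaves on the order of ten thousand lines to fix by hand, embedded in generated code nobody has read yet.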

Another key consideration is size of code to be translated. It takes a lot of energy to build a working, robust translator, even with good tools. While it seems sexy and cool to build a translator instead of simply doing a manual conversion, for small code bases (e.g., up to about 100K SLOC in our experience) the economics simply don't justify it. Nobody likes this answer, but if you really have to translate just 10K SLOC of code, you are probably better off just biting the bullet and doing it. And yes, that's painful.

I consider our tools to be extremely good (but then, I'm pretty biased). And it is still very hard to build a good translator; it takes us about 1.5-2 man-years and we know how to use our tools. The difference is that with this much machinery, we succeed considerably more often than we fail.

translate one language to another?

Translating one language to another is just a special case for the class of programs called compilers, interpreters and translators.

This class of program will take a stream of input symbols ("source code") that can usually be described by a formal grammar and will output a stream of symbols.

That output stream of symbols can be:

  • Native assembly code, usually for the operating system and hardware the machine is running on. If so, the program is referred to as a compiler;
  • Native assembly code for a different OS and/or hardware. This can be called a compiler too but is often referred to as a cross-compiler;
  • An intermediate form that can be executed by a virtual machine of some kind. This isn't a true compiler but is often called a compiler anyway. The Java, C#, F#, VB.NET, etc. "compilers" all fall into this category;
  • Another language entirely. This is called a translator and there are examples of, say, Java to C# translators. They typically have varying degrees of success because idioms often aren't readily translatable;
  • Interpreters follow the same principle but typically execute the processed form in-place rather than saving it somewhere. Perl, PHP and shell scripts all fall into this category. PHP, for example, will keep opcodes in an opcode cache as an intermediate form (if opcode caching is enabled), but this intermediate form isn't stored permanently, so it's still safe to call PHP an interpreter.
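As a toy instance of the translator case, the sketch below takes source text in one notation (Python arithmetic expressions) and emits another (Lisp-style prefix form). It leans on Python's standard `ast` module for parsing; the function names `to_prefix` and `translate` are made up for this example, and anything beyond names, numbers and four operators is deliberately left untranslated.

```python
import ast

# Translation rules for the binary operators this sketch understands.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def to_prefix(node: ast.AST) -> str:
    """Emit a Lisp-style prefix expression for a small subset of Python."""
    if isinstance(node, ast.Expression):
        return to_prefix(node.body)
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        op = OPS[type(node.op)]
        return f"({op} {to_prefix(node.left)} {to_prefix(node.right)})"
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return repr(node.value)
    # Everything else is the "hard part" a real translator must handle.
    raise NotImplementedError(f"no translation rule for {type(node).__name__}")

def translate(source: str) -> str:
    return to_prefix(ast.parse(source, mode="eval"))

print(translate("a + 2 * (b - 1)"))  # (+ a (* 2 (- b 1)))
```

Feeding it something like `f(x)` raises `NotImplementedError`, which is the small-scale version of idioms that aren't readily translatable.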

Reverse Engineering a Programming Language or 'Unsupervised Learning of Languages'

The simple answer is "No".

Any kind of generalization from examples suffers from the basic fact that it is guessing. You may guess that the language has an 'if' token. There's no guarantee that it does, or that it is spelled if or that it has semantics that you understand.
You're not going to get an automated tool to induce the grammar for you.

Your best bet is to take all the documents you can get that describe the language, and, well, guess at a grammar. Then you build a parser for the grammar, and validate it against as big a code base as you can find, and revise. I've done this dozens of times with a wide variety of languages (see my bio).
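The validate-and-revise loop can be sketched as a small harness. This is only an illustration: `validate_parser` is a hypothetical helper, the two-file corpus is invented, and Python's own `ast.parse` stands in for the parser you would generate from your guessed grammar.

```python
import ast
from typing import Callable, Iterable

def validate_parser(parse: Callable[[str], object],
                    corpus: Iterable[tuple[str, str]]) -> list[tuple[str, str]]:
    """Run a candidate parser over (name, source) pairs; return the failures."""
    failures = []
    for name, source in corpus:
        try:
            parse(source)
        except Exception as exc:  # a guessed grammar can fail in many ways
            failures.append((name, f"{type(exc).__name__}: {exc}"))
    return failures

# Stand-in corpus; in practice this is every source file you can find.
corpus = [
    ("ok.src", "x = 1\n"),
    ("surprise.src", "x = = 1\n"),
]
failures = validate_parser(ast.parse, corpus)
for name, reason in failures:
    print(name, "->", reason)
```

Each failure is either a bug in the sample or a hole in your grammar guess; you revise the grammar and run the whole corpus again.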

It is painful, but you often get someplace pretty useful. The good news is that your parser doesn't have to parse anything the users don't know how to write. The bad news is they'll write things based on some obscure example you've never seen, or with a typo that accidentally works. (Even the language designer didn't intend it, but that doesn't matter to the user; his program works and your compiler doesn't. Your problem by definition).

What you'll never know is if the provider of the language has certain features he simply hasn't documented, and hasn't shown anyone else. Be continually prepared to be surprised, long after you are done :-{

Now, the best tool you can use for this process IMHO is a GLR parser generator; it is what my company uses. These will parse any context-free language (that you might propose) without a lot of struggle to bend the grammar to match the otherwise-common restrictions of recursive descent, LL(k), or LR(k) parsers. Life is hard enough when you have to guess the grammar, let alone guess the grammar and then guess how to bend it to make the parser generator swallow it correctly.

You also have the problem of building a translator, once you get the grammar right. You might find this SO answer helpful: What kinds of patterns could I enforce on the code to make it easier to translate to another programming language?

How to bolt on ANTLR 4 front to GCC Generic/GIMBLE?

Gluing a C++ implementation of ANTLR into GCC so that GCC will call it is likely to be the easy step.
[Don't expect it to be easy; GCC wants to be GCC, not your pet. You might get some help from GCC MELT, a package for interfacing to GCC machinery.]

The AST produced for an arbitrary (e.g., your custom DSL) language doesn't "just move (easily)" to a C AST or to the GCC Gimple (not GIMBLE) framework.

You will have to build, in essence, your DSL-AST to C-AST translator, or your DSL-AST to Gimple translator. There is no a priori reason to believe that building such a translator is easy; for example, you didn't tell us your DSL was "just like C except ...". So, you're going to have to build a translator. In the absence of evidence this is easy, you'll have to translate your DSL concepts to C concepts. The better ("non C-like") your DSL is, the harder this is going to be.
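To give a feel for what a DSL-AST to C translator involves, here is a deliberately tiny sketch. The DSL (a `let` statement over integer additions), its node classes, and the function names are all invented for illustration, and it emits C text rather than a real C AST or Gimple. Note the assumption baked into `let_to_c`: every value is a C `int`.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class Add:
    left: "Expr"
    right: "Expr"

@dataclass
class Let:  # hypothetical DSL statement: let name = expr
    name: str
    expr: "Expr"

Expr = Union[Num, Var, Add]

def expr_to_c(e: Expr) -> str:
    """Map each DSL expression node to C expression text."""
    if isinstance(e, Num):
        return str(e.value)
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Add):
        return f"({expr_to_c(e.left)} + {expr_to_c(e.right)})"
    raise NotImplementedError(type(e).__name__)

def let_to_c(stmt: Let) -> str:
    # Assumption: every DSL value fits in a C int. A real DSL rarely allows this.
    return f"int {stmt.name} = {expr_to_c(stmt.expr)};"

print(let_to_c(Let("total", Add(Var("a"), Num(2)))))  # int total = (a + 2);
```

This works only because the toy DSL is "just like C". As soon as the DSL has a concept C lacks, each such concept needs its own mapping into C concepts, and that is where the effort goes.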

This SO link discusses the issues behind translation in more detail: What kinds of patterns could I enforce on the code to make it easier to translate to another programming language?

Experiences with language converters?

It seems to me, as is almost always the case with MS-ACCESS questions having tags that attract the wider StackOverflow population, that the people answering are missing the key question here, which I read as:

Are there any tools that can successfully convert an Access application to any other platform?

And the answer is

ABSOLUTELY NOT

The reason for that is simply that tools in the same family that use similar models for the UI objects (e.g., VB6) lack so many things that Access provides by default (how do you convert an Access continuous subform to VB6 and not lose functionality?). And other platforms don't even share the same core model as VB6 and Access, so those have even more hurdles to clear.

The cited MySQL article is quite interesting, but it really confuses the problems that come with incompetently-developed apps vs. the problems that come with the development tools being used. A bad data schema is not inherent to Access -- it's inherent to [most] novice database users. But the article seems to attribute this problem to Access.

And entirely overlooks the possibility of fixing the schema, upsizing it to MySQL and keeping the front end in Access, which is by far the easiest approach to the problem.

This is exactly what I expect from people who just don't get Access -- they don't even consider that Access as front end to a securable, large-capacity server database engine can be a superior solution to the problem.

That article doesn't even really consider conversion of an Access app, and there's good reason for that. All the tools that I've seen that claim to convert Access applications (to whatever platform) either convert nothing but data (in which case they don't convert the app at all -- morons!), or convert the front end structure slavishly, with a 1:1 correspondence between UI objects in the Access application and in the target app.

This doesn't work.

Access's application design is specific to itself, and other platforms don't support the same set of features. Thus, there has to be translation of Access features into a working substitute for the original feature in the converted application. This is not something that can be done in an automated fashion, in my opinion.

Secondly, when contemplating converting an Access app for deployment in the web browser, the whole application model is different, i.e., from stateful to stateless, and so it's not just a matter of a few Access features that are unsupported, but of a completely different fundamental model of how the UI objects interact with the data. Perhaps a 100% unbound Access app could be converted relatively easily to a browser-based implementation, but how many of those are there? It would mean an Access app that uses no subforms whatsoever (since they can't be unbound), and an app that uses only a handful of events from the rich event model (most of which work only with bound forms/controls). In short, a 100% unbound Access app would be one that fights against the whole Access development paradigm. Anyone who thinks they want to build an unbound app in Access really shouldn't be using Access in the first place, as the whole point of Access is the bound forms/controls! If you eliminate that, you've thrown out the majority of Access's RAD advantage over other development platforms, and gained almost nothing in return (other than enormous code complexity).

To build an app for deployment in the web browser that accomplishes the same tasks as an Access application requires from-the-ground-up redesign of the application UI and workflow. There is no conversion or translation that will work because the successful Access application model is antithetical to the successful web application model.

Of course, all of this changes with Access 2010 and Sharepoint Server 2010 with Access Services. In that case, you can build your app in Access (using web objects) and deploy on Sharepoint for users to run it in the browser. The results are functionally 100% equivalent (and 90% visually), and run on all browsers (no IE-specific dependencies here).

So, starting this June, the cheapest way to convert an Access app for deployment in the browser may very well be to upgrade to A2010, convert the design to use all web objects, and then deploy with Sharepoint. That's not a trivial project, as Access web objects have a limited set of features in comparison to client objects (and no VBA, for instance, so you have to learn the new macros, which are much more powerful and safe than the old ones, so that's not the terrible hardship it may seem for those familiar with Access's legacy macros), but it would likely be much less work than a full-scale redesign for deployment on the web.

The other thing is that it won't require any retraining for end users (insofar as the web-object version is the same as the original client version), as it will be the same in the Access client as in the web browser.

So, in short, I'd say conversion is a chimera, and almost always not worth the effort. I'm agreeing with the cited sentiment, in fact (even if I have a lot of problems with the other comments from that source). But I'd also caution that the desire for conversion is often misguided and misses out on cheaper, easier and better solutions that don't require wholesale replacement of the Access app from top to bottom. Very often the dissatisfaction with Jet/ACE as data store confuses people into thinking they have to replace the Access application as well. And it's true that many user-developed Access apps are filled with terrible, unmaintainable compromises and are held together with chewing gum and baling wire. But a badly-designed Access application can be improved in conjunction with the back-end upsizing and revision of the data schema -- it doesn't have to be discarded.

That doesn't mean it's easy -- it's very often not. As I tell clients all the time, it's usually easier to build a new house than to remodel an old one. But one of the reasons we remodel old houses is because they have irreplaceable characteristics that we don't want to lose. It's very often the case that an Access app implicitly includes a lot of business rules and modelling of workflows that should not be lost in a new app (the old Netscape conundrum, pace Joel Spolsky). These things may not be obvious to the outside developer trying to port to a different platform, but for the end user, if the app produces results that are off by a penny in comparison to the old app, they'll be unhappy (and probably should be, since it may mean that other aspects of the app are not producing reliable results, either).

Anyway, I've rambled on for too long, but my opinion is that conversion never works except for the most trivial apps (or for ones that were designed to be converted, e.g., a 100% unbound Access app). I'm all for revision in place of replacement.

But, of course, that's how I make my living, i.e., fixing Access apps.

Tool for automated porting and language that can compile into others

GCC converts complex C++ code into machine code and thus technically is an answer to your question. In fact, there are lots of compilers like this, but I don't think these are what you intended to ask.

There are tools that are hardwired to translate just one language to another as source code (another poster suggested "f2c", which is a perfect example). These are just like compilers... but rarer.

There are virtually no tools that will map from one language to many others, out of the box. The problem is that languages have different execution models, data types, and execution schemes, which such a translator has to simulate properly in the target language.
There are "code generators" that claim to do this, but they are largely IMHO specifications of rather simple functions that translate trivially to simple code in the target language.

If you want to translate one language to another in a sort of general way, you need a program transformation system, i.e., a system that can parse arbitrary languages, and for which you can provide translation rules that map to other languages in a sort of straightforward way.
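To show what a single "translation rule" looks like in miniature, here is a sketch built on Python's standard `ast.NodeTransformer`. It is only a stand-in: it rewrites Python to Python rather than across languages, it handles one hard-wired language instead of arbitrary ones, and the rule (`name ** 2` becomes `name * name`) and class name are invented for the example. It needs Python 3.9+ for `ast.unparse`.

```python
import ast

class SquareToMultiply(ast.NodeTransformer):
    """Rule: <name> ** 2  ->  <name> * <name> (simple names only)."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # rewrite sub-expressions first
        if (isinstance(node.op, ast.Pow)
                and isinstance(node.right, ast.Constant)
                and type(node.right.value) is int
                and node.right.value == 2
                and isinstance(node.left, ast.Name)):
            copy = ast.Name(id=node.left.id, ctx=ast.Load())
            return ast.BinOp(left=node.left, op=ast.Mult(), right=copy)
        return node

def apply_rule(source: str) -> str:
    tree = SquareToMultiply().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

print(apply_rule("y = x ** 2 + z ** 3"))  # y = x * x + z ** 3
```

The rule is restricted to simple names on purpose: duplicating an arbitrary expression such as `f() ** 2` would call `f` twice, which is exactly the kind of condition that needs the symbol table and flow analysis discussed earlier.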

Our DMS Software Reengineering Toolkit is one of these. This SO answer, "What kinds of patterns could I enforce on the code to make it easier to translate to another programming language?", discusses the issues in more detail.


