Is There Any Difference Between Executable Binary Files Between Distributions

Is there any difference between executable binary files between distributions?

All Linux distributions use the same binary format ELF, but there is still some differences:

different cpu arch use different instruction set.
the same cpu arch may use different ABI, ABI defines how to use the register file, how to call/return a routine. Different ABI can not work together.
Even on same arch, same ABI, this still does not mean we can copy one binary file in a distribution to another. Since most binary files are not statically linked, so they depends on the libraries under the distribution, which means different distribution may use different versions or different compilation configuration of libraries.

So if you want your program to run on all distribution, you may have to statically link a version that depends on the kernel's syscall only, even this you can only run a specified arch.

If you really want to run a program on any arch, then you have to compile binaries for all arches, and use a shell script to start up the right one.

Why are Executable files operating system dependent?

In order to do something meaningful, applications will need to interface with the OS. Since system calls and user-space infrastructure look fundamentally different on Windows and Unix/Linux, having different formats for executable programs is the smallest trouble. It's the program logic that would need to be changed.

(You might argue that this is meaningless if you have a program that solely depends on standardized components, for example the C runtime library. This is theoretically true - but irrelevant for most applications since they are forced to use OS-dependent stuff).

The other differences between Windows PE (EXE,DLL,..) files and Linux ELF binaries are related to the different image loaders and some design characteristics of both OSs. For example on Linux a separate program is used to resolve external library imports while this functionality is built-in on Windows. Another example: Linux shared libraries function differently than DLLs on Windows. Not to mention that both formats are optimized to enable the respective OS kernels to load programs as quick as possible.

Emulators like Wine try to fill the gap (and actually prove that the biggest problem is not the binary format but rather the OS interface!).

Binary files and OS

Executable file formats for Windows (PE), Linux (ELF), OS/X etc (MACH-O), tend to be designed to solve common problems, so they all share common features. However, each platform specifies a different standard, so the files are not compatible across platforms, even if the platforms use the same type of CPU.

Executable file formats are not only used for executable files, but also libraries, which also contain code but are never run directly by the user - only loaded into memory to satisfy the needs to directly executable binaries.

Common Features of an executable file format:

One or more blocks of executable code
One or more blocks of read-only data such as text and numbers
One or more blocks of read/write data
Instructions on where to place these blocks in memory when the application is run
Instructions on what libraries (which are also in an 'executable file format') need to be loaded as well, and how they connect (link) up to this executable file.
One or more tables mapping code and data locations to strings or ids that describe them, useful for linking and debugging.

It's interesting to compare such formats to more basic formats, such as the venerable DOS .com file, which simply describes 64K of assorted 'stuff' to be loaded at the next available location, and has few of the features listed above.

Binary in this sense is used to compare them to 'source' files, which are written in text format. Binary format simply says that they are encoded in a non-text way, and doesn't really relate to the 0-and-1 sense of binary.

Difference between Groovy Binary and Source release?

A source release will be compiled on your own machine while a binary release must match your operating system.

source releases are more common on linux systems because linux systems can dramatically vary in cpu, installed library versions, kernelversions and nearly every linux system has a compiler installed.

binary releases are common on ms-windows systems. most windows machines do not have a compiler installed.

Src v/s Binary?

Source files are uncompiled C/C++ code, while binary files are the compiled programs.

How does OS execute binary files in virtual memory?

This question can't be answered in general because it's totally hardware and OS dependent. However a typical answer is that the initially loaded program can be compiled as you say: Because the VM hardware gives each program its own address space, all addresses can be fixed when the program is linked. No recalculation of addresses at load time is needed.

Things get much more interesting with dynamically loaded libraries because two used by the same initially loaded program might be compiled with the same base address, so their address spaces overlap.

One approach to this problem is to require Position Independent Code in DLLs. In such code all addresses are relative to the code itself. Jumps are usually relative to the PC (though a code segment register can also be used). Data are also relative to some data segment or base register. To choose the runtime location, the PIC code itself needs no change. Only the segment or base register(s) need(s) be set whenever in the prelude of every DLL routine.

PIC tends to be a bit slower than position dependent code because there's additional address arithmetic and the PC and/or base registers can bottleneck the processor's instruction pipeline.

So the other approach is for the loader to rebase the DLL code when necessary to eliminate address space overlaps. For this the DLL must include a table of all the absolute addresses in the code. The loader computes an offset between the assumed code and data base addresses and actual, then traverses the table, adding the offset to each absolute address as the program is copied into VM.

DLLs also have a table of entry points so that the calling program knows where the library procedures start. These must be adjusted as well.

Rebasing is not great for performance either. It slows down loading. Moreover, it defeats sharing of DLL code. You need at least one copy per rebase offset.

For these reasons, DLLs that are part of Windows are deliberately compiled with non-overlapping VM address spaces. This speeds loading and allows sharing. If you ever notice that a 3rd party DLL crunches the disk and loads slowly, while MS DLLs like the C runtime library load quickly, you are seeing the effects of rebasing in Windows.

You can infer more about this topic by reading about object file formats. Here is one example.

Binary file conversion in distributed manner - Spark, Flume or any other option?

In terms of scalability, reliability and future-proofing your solution, I'd look at Apache NiFi rather than Flume. You can start by developing your own ASN.1 Processor or try using the patch thats already available but not part of a released version yet.

CS-Script for notepad++ prepare exe for distribution with icon

I have found this way to get Icon from the embedded resources in the CS-Script compiled binary. Icon file name is alpha.ico.

Please pay attention to the using section.

//css_resource alpha.ico

using System;
using System.Drawing;
using System.Windows.Forms;
using System.Reflection;

class Script
{
    [STAThread]
    static public void Main(string[] args)
    {
        var assembly = Assembly.GetExecutingAssembly();

        using (var stream = assembly.GetManifestResourceStream("alpha.ico"))
        {
            var icon = new Icon(stream);
        }
    }
}

Is There Any Difference Between Executable Binary Files Between Distributions