How Does the Import Library Work - Details

How does the Import Library work? Details?

Linking to a DLL file can occur implicitly at link time, or explicitly at run time. Either way, the DLL ends up loaded into the process's memory space, and all of its exported entry points are available to the application.

If used explicitly at run time, you use LoadLibrary() and GetProcAddress() to manually load the DLL and get pointers to the functions you need to call.

If linked implicitly when the program is built, then stubs for each DLL export used by the program get linked into the program from an import library, and those stubs get updated as the EXE and the DLL are loaded when the process launches. (Yes, I've simplified more than a little here...)

Those stubs need to come from somewhere, and in the Microsoft tool chain they come from a special form of .LIB file called an import library. The required .LIB is usually built at the same time as the DLL, and contains a stub for each function exported from the DLL.

Confusingly, a static version of the same library would also be shipped as a .LIB file. There is no trivial way to tell them apart, except that LIBs that are import libraries for DLLs will usually be smaller (often much smaller) than the matching static LIB would be.

If you use the GCC toolchain, incidentally, you don't actually need import libraries to match your DLLs. The version of the GNU linker ported to Windows understands DLLs directly, and can synthesize almost any required stubs on the fly.

Update

If you just can't resist knowing where all the nuts and bolts really are and what is really going on, there is always something at MSDN to help. Matt Pietrek's article An In-Depth Look into the Win32 Portable Executable File Format is a very complete overview of the format of the EXE file and how it gets loaded and run. It's even been updated to cover .NET and more since it originally appeared in MSDN Magazine ca. 2002.

Also, it can be helpful to know how to learn exactly what DLLs are used by a program. The tool for that is Dependency Walker, aka depends.exe. A version of it is included with Visual Studio, but the latest version is available from its author at http://www.dependencywalker.com/. It can identify all of the DLLs that were specified at link time (both early load and delay load) and it can also run the program and watch for any additional DLLs it loads at run time.

Update 2

I've reworded some of the earlier text to clarify it on re-reading, and to use the terms of art implicit and explicit linking for consistency with MSDN.

So, we have three ways that library functions might be made available to be used by a program. The obvious follow-up question is then: "How do I choose which way?"

Static linking is how the bulk of the program itself is linked. All of your object files are listed, and get collected together into the EXE file by the linker. Along the way, the linker takes care of minor chores like fixing up references to global symbols so that your modules can call each other's functions.

Libraries can also be statically linked. The object files that make up the library are collected together by a librarian in a .LIB file which the linker searches for modules containing symbols that are needed. One effect of static linking is that only those modules from the library that are used by the program are linked to it; other modules are ignored. For instance, the traditional C math library includes many trigonometry functions. But if you link against it and use cos(), you don't end up with a copy of the code for sin() or tan() unless you also called those functions. For large libraries with a rich set of features, this selective inclusion of modules is important. On many platforms such as embedded systems, the total size of code available for use in the library can be large compared to the space available to store an executable in the device. Without selective inclusion, it would be harder to manage the details of building programs for those platforms.

However, having a copy of the same library in every running program creates a burden on a system that normally runs lots of processes. With the right kind of virtual memory system, pages of memory that have identical content need only exist once in the system, but can be used by many processes. That makes it beneficial to increase the chances that pages containing code are identical to pages in as many other running processes as possible. But if programs statically link to the runtime library, then each has a different mix of functions, each laid out in that process's memory map at different locations, and there aren't many sharable code pages unless a program is itself run in more than one process. So the idea of a DLL gained another, major, advantage.

A DLL for a library contains all of its functions, ready for use by any client program. If many programs load that DLL, they can all share its code pages. Everybody wins. (Well, until you update a DLL with a new version, but that isn't part of this story. Google DLL Hell for that side of the tale.)

So the first big choice to make when planning a new project is between dynamic and static linkage. With static linkage, you have fewer files to install, and you are immune from third parties updating a DLL you use. However, your program is larger, and it isn't quite as good a citizen of the Windows ecosystem. With dynamic linkage, you have more files to install, you might have issues with a third party updating a DLL you use, but you are generally being friendlier to other processes on the system.

A big advantage of a DLL is that it can be loaded and used without recompiling or even relinking the main program. This can allow a third party library provider (think Microsoft and the C runtime, for example) to fix a bug in their library and distribute it. Once an end user installs the updated DLL, they immediately get the benefit of that bug fix in all programs that use that DLL. (Unless it breaks things. See DLL Hell.)

The other advantage comes from the distinction between implicit and explicit loading. If you go to the extra effort of explicit loading, then the DLL might not even have existed when the program was written and published. This allows for extension mechanisms that can discover and load plugins, for instance.

Do import libraries work across dll versions?

Yes, an import library will work with different versions of the DLL. Of course, you won't be able to use it to call functions that exist in the DLL but are not defined in the import library (e.g. functions added in a newer version of the DLL).

Note that I am assuming that different versions of the DLL don't have modified function names and/or ordinals (whichever is used by the import library) or modified function signatures. In other words, I assume that the developer of the DLL is following the well-accepted good practice for maintaining compatibility between DLL versions.

How do import libraries work and why doesn't MinGW need them?

It is possible to link to a DLL without an import library, as MinGW clearly demonstrates. Hence the question is why MSVC decided to omit this feature.

The reasons are primarily historic.

Back in 1983, when Windows came around and DLLs were designed, there were many toolchains (compilers, linkers) from different vendors. Going out and asking the vendors to implement support for linking "DLLs" for a minority OS was clearly not an option.

So they made a decision to write a tool to generate a library everyone and their dog could link against even if a linker had absolutely no idea about DLLs.

Besides, import libraries offer some features that were vital three decades ago but are next to obsolete now. First is the ability to import a symbol by ordinal: a DLL has the option to export no names at all, only a list of addresses, and the ordinal is an index into this list. This made sense when the amount of RAM was severely limited.

Second is the support for different name mangling schemes. Even in C there is a name mangling scheme, for instance FooBar may become _FooBar@4 (it depends on the platform and the calling convention). It made perfect sense for a DLL to export "FooBar" on every supported platform for consistency (and it makes life easier for users of GetProcAddress()). The import library implements the mapping from _FooBar@4 to FooBar.

This is based on the blog (1, 2) of Raymond Chen, who has been involved in Windows development from the very beginning.

Import Library benefit

As stated in this answer, an import library (a.k.a. stub library) is useful when one wants to link an executable dynamically but does not want to mess with the LoadLibrary and GetProcAddress functions.

Do import library files contain information that can break my application on updates?

It is your responsibility to deploy a new DLL that is backward compatible with the old one. If you have a void foo(bool) and the new DLL implements void foo(bool,int), your application continues to call foo(TRUE). This means you have to leave void foo(bool) implemented and create a new void foo(bool,int) that is called from void foo(bool) with a default/safe value for the int parameter.

What is the usefulness of the .lib file generated while compiling DLL projects? Can I use it for static linking?

The .lib generated along with a .dll is called an "import library", and it allows you to use the DLL functions, by including their header, as if they were statically linked into your executable. It makes sure that, when the linker has to fix up the DLL function addresses referenced in object files, it can find them inside the import library. The functions found in the import library are actually stubs which retrieve the actual address of the corresponding function in the loaded DLL from the Import Address Table and jump straight to it (traditionally; now there is some smartness in the linker that makes it possible to avoid this double jump).

The import library, in turn, contains special instructions for the linker that instruct it to generate the relevant entries in the import table of the executable, which in turn is read at load time by the loader (the "dynamic linker", in Unix terms). This makes sure that, before the entry point of your executable is called, the referenced DLLs are loaded and the IAT contains the correct addresses of the referenced functions.

Notice that all of this is mostly just convenience stuff to allow you to call dll functions as if they were statically linked into your executable. You don't strictly need the .lib file if you handle the dynamic load/function address retrieval explicitly (using LoadLibrary and GetProcAddress); it's just more convenient to delegate all this stuff to the linker and the loader.

Why does MSVC need an import library (.lib) for a .dll when MinGW doesn't?

According to http://www.mingw.org/wiki/sampleDLL, MinGW can guess the information that's contained in a .lib (DLL name, exported entries' names and ABI) from the corresponding .h file and the DLL itself.

According to http://www.mingw.org/wiki/CreateImportLibraries, this works "for all DLLs created with MinGW and also a few others".

In cases when it can't guess correctly, you still need to provide a .lib file. The latter link has instructions on how to generate one by hand if you haven't got a pristine one.

The former link refers to ld docs for a more in-depth description. Specifically, it's at the ld and WIN32 (cygwin/mingw) node, "direct linking to a dll" section. Among other things, it outlines cases when a .lib is necessary:

until recently, direct linking didn't work for exported data entries
if a .lib contains pure static objects (e.g. cygwinX.dll)
if exported entries do not conform to mangling rules (e.g. Win32 libraries)

Can an import library contain both stubs and static code at the same time?

There's no technical reason that an import library can't contain statically linked entry points.

You'd want to check whether this works properly, but one way that might get you there is to do a postprocess step on the import library to add the static linked objects to it.

This page includes the following notes:

You can use LIB to perform the following library-management tasks:

To add objects to a library, specify the file name for the existing library and the filenames for the new objects.

Provided this operation doesn't remove the DLL import information, it should allow you to create such a library. I'm at work right now, on a Mac, so I don't have access to VS on my Windows system at home to test this for sure.

As to how the linker knows the name of the DLL involved, that's embedded in the import library, and from there gets embedded into the final EXE.
