How is LLVM isa<> implemented?

llvm::BasicBlock::isLandingPad not behaving as expected

LLVM itself is (at least on my system) compiled with assertions disabled, so the assertion doesn't trigger. When you inline it in your code, you are compiling with assertions enabled, so it does trigger.

Note that since isa<...> is a template, it is compiled into every compilation unit that instantiates it. In this case there are at least two: one in LLVM and one in your program. Strictly speaking, the two instantiations must be identical (the "one definition rule"), otherwise you have UB anyway. In practice, a call to isa<...>() from either compilation unit might end up calling the version instantiated in the other one. However, it's likely that the calls to isa<...>() are being inlined, i.e. each compilation unit that instantiates isa<...>() ends up with its own version.
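To illustrate why the same header code can behave differently in two builds, here is a minimal sketch with hypothetical function names. The two "compilation units" are simulated in one file by re-including <cassert>, which (per the C standard's rules for <assert.h>) redefines assert() according to whether NDEBUG is defined at that point:

```cpp
// Hypothetical functions; two builds are simulated in a single file.

// "Your program": compiled without NDEBUG, so assert() is active.
#undef NDEBUG
#include <cassert>
inline bool checked_is_positive(int x) {
  assert(x > 0 && "expected a positive value");  // would abort on x <= 0
  return x > 0;
}

// "LLVM built in release mode": NDEBUG defined, so assert() is compiled out.
#define NDEBUG
#include <cassert>
inline bool release_is_positive(int x) {
  assert(x > 0 && "expected a positive value");  // expands to nothing here
  return x > 0;
}

// Restore active assertions for any code that follows.
#undef NDEBUG
#include <cassert>
```

The assertion text is identical in both functions; only the NDEBUG setting in effect when each was compiled decides whether it can fire.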

Why does Expected<T> in LLVM implement two constructors for Expected<T>&&?

Because, according to the proposal, that constructor is conditionally explicit: it is explicit only if some condition holds (here, that OtherT is not implicitly convertible to T).

Before C++20, the language had no mechanism for this (something like explicit(condition)). Implementations therefore have to use some other technique, such as defining two different constructors, one explicit and one converting, and ensuring that the right one is selected according to the condition. This is typically done via SFINAE with the help of std::enable_if, in which the condition is evaluated.
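A minimal sketch of that pre-C++20 technique, using a hypothetical, simplified Expected that stores only a value (the real llvm::Expected also carries an error state and moves through its own moveConstruct helper). The Base/Derived/Meters types are made up for the example:

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for llvm::Expected<T>: value only, no error state.
template <class T>
class Expected {
public:
  Expected(T V) : Value(std::move(V)) {}

  // Converting (implicit) constructor: enabled only when OtherT is
  // implicitly convertible to T.
  template <class OtherT,
            typename std::enable_if<std::is_convertible<OtherT, T>::value,
                                    int>::type = 0>
  Expected(Expected<OtherT> &&Other) : Value(std::move(Other.get())) {}

  // Explicit constructor: enabled only when OtherT is NOT implicitly
  // convertible to T (T must still be constructible from OtherT).
  template <class OtherT,
            typename std::enable_if<!std::is_convertible<OtherT, T>::value,
                                    int>::type = 0>
  explicit Expected(Expected<OtherT> &&Other)
      : Value(std::move(Other.get())) {}

  T &get() { return Value; }

private:
  T Value;
};

// Example types: Derived* converts implicitly to Base*, while double only
// converts to Meters through an explicit constructor.
struct Base { virtual ~Base() = default; };
struct Derived : Base {};
struct Meters {
  explicit Meters(double V) : Value(V) {}
  double Value;
};
```

With these definitions, Expected<Derived *> converts implicitly to Expected<Base *>, whereas Expected<double> can become Expected<Meters> only through direct initialization, mirroring the implicit/explicit behaviour of the wrapped types.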

C++20 added a conditional form of the explicit specifier, explicit(bool). With it, the implementation becomes much simpler and needs only a single definition:

template <class OtherT>
explicit(!std::is_convertible_v<OtherT, T>)
Expected(Expected<OtherT> &&Other)
{
   moveConstruct(std::move(Other));
}

What is the purpose of method classof in clang?

classof is indeed part of the solution; however, it is not normally meant for direct use.

Instead you should be using the isa<>, cast<> and dyn_cast<> templates. An example from the LLVM Programmer's Manual:

static bool isLoopInvariant(const Value *V, const Loop *L) {
  if (isa<Constant>(V) || isa<Argument>(V) || isa<GlobalValue>(V))
    return true;

  // Otherwise, it must be an instruction...
  return !L->contains(cast<Instruction>(V)->getParent());
}

The difference between cast<> and dyn_cast<> is that the former asserts if the instance cannot be cast, whereas the latter merely returns a null pointer.

Note: cast<> and dyn_cast<> do not accept a null argument; use cast_or_null<> and dyn_cast_or_null<> if the argument may be null.

For further insight into the design of polymorphism without virtual methods, see "How is LLVM isa<> implemented?"; you will note that it uses classof behind the scenes.
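As a rough illustration of how classof ties into these templates, here is a minimal sketch modelled on the Shape/Square/Circle style of LLVM-style RTTI. The my_isa, my_cast, my_dyn_cast and my_dyn_cast_or_null templates are hypothetical simplifications; the real ones in llvm/Support/Casting.h also handle references, const, smart pointers and upcasts, but the core idea is the same: ask To::classof.

```cpp
#include <cassert>

// A kind tag in the base class replaces virtual functions / C++ RTTI.
class Shape {
public:
  enum ShapeKind { SK_Square, SK_Circle };
  ShapeKind getKind() const { return Kind; }

protected:
  explicit Shape(ShapeKind K) : Kind(K) {}

private:
  const ShapeKind Kind;
};

class Square : public Shape {
public:
  explicit Square(double S) : Shape(SK_Square), Side(S) {}
  double Side;
  // Answers "is this Shape actually a Square?"
  static bool classof(const Shape *S) { return S->getKind() == SK_Square; }
};

class Circle : public Shape {
public:
  explicit Circle(double R) : Shape(SK_Circle), Radius(R) {}
  double Radius;
  static bool classof(const Shape *S) { return S->getKind() == SK_Circle; }
};

// Hypothetical, simplified casting templates built on classof.
template <class To, class From>
bool my_isa(const From *V) {
  assert(V && "isa<> used on a null pointer");
  return To::classof(V);
}

template <class To, class From>
To *my_cast(From *V) {
  assert(my_isa<To>(V) && "cast<> argument of incompatible type");
  return static_cast<To *>(V);
}

template <class To, class From>
To *my_dyn_cast(From *V) {
  return my_isa<To>(V) ? static_cast<To *>(V) : nullptr;
}

template <class To, class From>
To *my_dyn_cast_or_null(From *V) {
  return (V && my_isa<To>(V)) ? static_cast<To *>(V) : nullptr;
}
```

Because classof is a plain static function comparing an enum, the check costs one load and one comparison, and no vtable is required.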

What is going on with __vector_base_common?

This is an implementation trick so that the library can be either used only as headers, or have a precompiled part.

Some of vector's member functions, specifically the helper functions that throw exceptions, do not depend on the template arguments at all. Unlike the parts that do depend on the template parameters, they can therefore be compiled once and put in a shared library. This is what happens on macOS, for example.

On the other hand, on platforms where the library is not distributed with the OS, it is more convenient for users if they don't have to distribute the shared library, but can instead use the library header-only, i.e. include <vector> and be done with it, without having to add flags to the linker invocation in their build.

This means that the code for these functions must be available in the headers, but when you use the shared-library variant, it should not actually be compiled into the code that includes the header.

The trick presented here is a way to achieve that. First, the implementation is put into a template, so that it may live in the header without generating multiple definition errors. The template in question only has a dummy parameter; the important thing is that it is a template, not that it has any particular parameters. This is a common technique used by header-only libraries.
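A minimal sketch of the header-only half of the trick, using hypothetical names (helper_base_common, my_small_vector) rather than the libc++ ones. Because helper_base_common is a class template with only a dummy bool parameter, its out-of-class member definitions could sit in a header included by many .cpp files without multiple-definition link errors:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Template with a dummy parameter: the parameter itself is never used.
template <bool Dummy>
class helper_base_common {
protected:
  [[noreturn]] void throw_length_error() const;
  [[noreturn]] void throw_out_of_range() const;
};

// Out-of-class definitions that may live in a header because they are
// members of a template.
template <bool Dummy>
void helper_base_common<Dummy>::throw_length_error() const {
  throw std::length_error("my_small_vector");
}

template <bool Dummy>
void helper_base_common<Dummy>::throw_out_of_range() const {
  throw std::out_of_range("my_small_vector");
}

// A toy fixed-capacity container that reuses the shared throw helpers.
template <class T>
class my_small_vector : protected helper_base_common<true> {
public:
  void push_back(const T &V) {
    if (Size == Capacity) this->throw_length_error();
    Data[Size++] = V;
  }
  T &at(std::size_t I) {
    if (I >= Size) this->throw_out_of_range();
    return Data[I];
  }
  std::size_t size() const { return Size; }

private:
  static const std::size_t Capacity = 4;
  T Data[Capacity] = {};
  std::size_t Size = 0;
};
```

An ordinary inline function would also avoid the link error, but only a template can later have its code generation suppressed with an extern template declaration, which is what the shared-library mode relies on.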

Now you can use the library header-only. But if you want to use the shared library variant, you actually need to compile the code ahead of time and suppress code generation for the library user. Explicit template instantiation can be used for that.

So you put an extern template declaration in the header:

extern template class _LIBCPP_EXTERN_TEMPLATE_TYPE_VIS __vector_base_common<true>;

So the header now contains an explicit instantiation declaration, which suppresses code generation for the template members in every translation unit that includes it.

And then you take a source file, put in the explicit instantiation, and compile it to a shared library.

template class _LIBCPP_CLASS_TEMPLATE_INSTANTIATION_VIS __vector_base_common<true>;

Now you have the shared library usage covered, but you destroyed the ability to use the library header-only. To get that back, you need to make the extern template declaration optional, depending on the usage mode of the library. So you wrap the declaration in a macro whose definition depends on the mode:

_LIBCPP_EXTERN_TEMPLATE(class _LIBCPP_EXTERN_TEMPLATE_TYPE_VIS __vector_base_common<true>)

This macro is conditionally defined:

#ifdef _LIBCPP_DISABLE_EXTERN_TEMPLATE
#define _LIBCPP_EXTERN_TEMPLATE(...)
#endif

#ifndef _LIBCPP_EXTERN_TEMPLATE
#define _LIBCPP_EXTERN_TEMPLATE(...) extern template __VA_ARGS__;
#endif

So if you're in header-only mode (_LIBCPP_DISABLE_EXTERN_TEMPLATE is defined), the declaration vanishes. If you're in shared library mode, the declaration is there, preventing code generation.
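Putting the pieces together, here is a single-file sketch of the pattern with hypothetical MYLIB_* names standing in for the _LIBCPP_* macros. In a real build the "header" part and the "library source" part would be separate files; they are combined here only so the sketch compiles on its own (an explicit instantiation declaration followed by an explicit instantiation definition in the same translation unit is allowed):

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical macro mirroring _LIBCPP_EXTERN_TEMPLATE.
#ifdef MYLIB_DISABLE_EXTERN_TEMPLATE
#define MYLIB_EXTERN_TEMPLATE(...)
#endif

#ifndef MYLIB_EXTERN_TEMPLATE
#define MYLIB_EXTERN_TEMPLATE(...) extern template __VA_ARGS__;
#endif

// ---- what would live in the header ----
template <bool>
struct mylib_base_common {
  [[noreturn]] void throw_out_of_range() const;
};

template <bool B>
void mylib_base_common<B>::throw_out_of_range() const {
  throw std::out_of_range("mylib");
}

// Shared-library mode: expands to an explicit instantiation declaration, so
// including translation units do not emit these members.
// Header-only mode (MYLIB_DISABLE_EXTERN_TEMPLATE defined): expands to nothing.
MYLIB_EXTERN_TEMPLATE(struct mylib_base_common<true>)

// ---- what would live in the library's .cpp file ----
// Explicit instantiation definition: the single place the code is emitted.
template struct mylib_base_common<true>;
```

Compiling with -DMYLIB_DISABLE_EXTERN_TEMPLATE makes the extern declaration disappear, and every user translation unit then instantiates the members itself, as in header-only mode.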

vector<bool> derives privately from __vector_base_common because it has no derived classes of its own that need access to the throw helpers. vector<T>, on the other hand, derives from __vector_base<T>, which in turn derives from __vector_base_common; for vector<T> to have access to the __vector_base_common members, __vector_base<T> must use protected inheritance.
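The access rule can be seen in a small sketch with hypothetical names mirroring that hierarchy. If vector_base inherited base_common privately, the call in my_vector::fail() would no longer compile, because the helper would be private to vector_base:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical names mirroring __vector_base_common / __vector_base / vector.
struct base_common {
protected:
  [[noreturn]] void throw_length_error() const {
    throw std::length_error("too long");
  }
};

// Protected inheritance keeps the helper reachable one level further down.
template <class T>
struct vector_base : protected base_common {};

template <class T>
struct my_vector : private vector_base<T> {
  // OK: throw_length_error is still protected in vector_base<T>.
  void fail() const { this->throw_length_error(); }
};

// No further subclasses need the helper, so private inheritance suffices.
struct my_bool_vector : private base_common {
  void fail() const { throw_length_error(); }
};
```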

How can I implement a string data type in LLVM?

What is a string? An array of characters.

What is a character? An integer.

So while I'm by no means an LLVM expert, I would guess that if, e.g., you wanted to represent some 8-bit character set, you'd use an array of i8 (8-bit integers), or a pointer to i8. And indeed, if we have a simple hello world C program:

#include <stdio.h>

int main() {
        puts("Hello, world!");
        return 0;
}

And we compile it using llvm-gcc and dump the generated LLVM assembly:

$ llvm-gcc -S -emit-llvm hello.c
$ cat hello.s
; ModuleID = 'hello.c'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"
target triple = "x86_64-linux-gnu"
@.str = internal constant [14 x i8] c"Hello, world!\00"         ; <[14 x i8]*> [#uses=1]

define i32 @main() {
entry:
        %retval = alloca i32            ; <i32*> [#uses=2]
        %tmp = alloca i32               ; <i32*> [#uses=2]
        %"alloca point" = bitcast i32 0 to i32          ; <i32> [#uses=0]
        %tmp1 = getelementptr [14 x i8]* @.str, i32 0, i64 0            ; <i8*> [#uses=1]
        %tmp2 = call i32 @puts( i8* %tmp1 ) nounwind            ; <i32> [#uses=0]
        store i32 0, i32* %tmp, align 4
        %tmp3 = load i32* %tmp, align 4         ; <i32> [#uses=1]
        store i32 %tmp3, i32* %retval, align 4
        br label %return

return:         ; preds = %entry
        %retval4 = load i32* %retval            ; <i32> [#uses=1]
        ret i32 %retval4
}

declare i32 @puts(i8*)

Notice the reference to the puts function declared at the end of the file. In C, puts is

int puts(const char *s)

In LLVM, it is

i32 @puts(i8*)

The correspondence should be clear.
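To make the representation concrete, here is a sketch of a hypothetical helper (emit_string_global is not part of LLVM's API) that renders a C string as the kind of global shown above: an [N x i8] array whose length includes the terminating \00, with non-printable bytes, backslashes and double quotes written as \XX hex escapes, as LLVM's textual IR does:

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper: produces IR text like the @.str line in the dump above.
std::string emit_string_global(const std::string &name,
                               const std::string &text) {
  std::string escaped;
  for (unsigned char c : text) {
    if (std::isprint(c) && c != '\\' && c != '"') {
      escaped += static_cast<char>(c);
    } else {
      char buf[4];  // backslash + two hex digits + terminator
      std::snprintf(buf, sizeof buf, "\\%02X", static_cast<unsigned>(c));
      escaped += buf;
    }
  }
  escaped += "\\00";  // the C string terminator is stored explicitly
  return "@" + name + " = internal constant [" +
         std::to_string(text.size() + 1) + " x i8] c\"" + escaped + "\"";
}
```

For "Hello, world!" (13 characters) this yields the [14 x i8] type seen in the dump, because the null terminator is part of the array.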

As an aside, the generated LLVM is very verbose here because I compiled without optimizations. If you turn those on, the unnecessary instructions disappear:

$ llvm-gcc -O2 -S -emit-llvm hello.c
$ cat hello.s 
; ModuleID = 'hello.c'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"
target triple = "x86_64-linux-gnu"
@.str = internal constant [14 x i8] c"Hello, world!\00"         ; <[14 x i8]*> [#uses=1]

define i32 @main() nounwind  {
entry:
        %tmp2 = tail call i32 @puts( i8* getelementptr ([14 x i8]* @.str, i32 0, i64 0) ) nounwind              ; <i32> [#uses=0]
        ret i32 0
}

declare i32 @puts(i8*)

How do I check if an object's type is a particular subclass in C++?

You really shouldn't. If your program needs to know what class an object is, that usually indicates a design flaw. See if you can get the behavior you want using virtual functions. Also, more information about what you are trying to do would help.

I am assuming you have a situation like this:

class Base { public: virtual ~Base() {} };
class A : public Base {/* ... */};
class B : public Base {/* ... */};

void foo(Base *p)
{
  if (dynamic_cast<A *>(p)) /* p is an A */ { /* do X */ }
  else { /* do Y */ }
}

If this is what you have, then try to do something like this:

class Base
{
public:
  virtual ~Base() {}
  virtual void bar() = 0;
};

class A : public Base
{
public:
  void bar() override {/* do X */}
};

class B : public Base
{
public:
  void bar() override {/* do Y */}
};

void foo(Base *p)
{
  p->bar();
}
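A complete, runnable version of the virtual-function design, as a sketch: returning the strings "X" and "Y" is an assumption standing in for "do X" and "do Y". The point is that foo() never asks which subclass it received:

```cpp
#include <cassert>
#include <memory>
#include <string>

class Base {
public:
  virtual ~Base() = default;
  virtual std::string bar() const = 0;
};

class A : public Base {
public:
  std::string bar() const override { return "X"; }
};

class B : public Base {
public:
  std::string bar() const override { return "Y"; }
};

// Dispatch happens through the vtable; no type test is needed.
std::string foo(const Base *p) { return p->bar(); }
```

Adding a new subclass C then only requires overriding bar(); foo() stays unchanged, which is exactly what a type check inside foo() would prevent.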

Edit: Since the debate about this answer is still going on after so many years, I thought I should add some references. If you have a pointer or reference to a base class, and your code needs to know the derived class of the object, then it violates the Liskov substitution principle. Uncle Bob calls this an "anathema to Object Oriented Design".