What Does It Mean to "Poison a Function" in C++

What does it mean to poison a function in C++?

In general it refers to making a function unusable. For example, if you want to ban dynamic allocation in a program, you could "poison" the malloc function so that it can't be used.
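
As a rough sketch of that general idea: GCC and Clang provide #pragma GCC poison, which turns any later appearance of an identifier into a compile-time error. The function names below are hypothetical. A made-up helper is poisoned here rather than malloc itself, because poisoning malloc would have to come after every header that mentions it.

```cpp
// Hypothetical legacy helper that the project wants to ban.
constexpr int legacy_scale(int x) { return x + x; }

// Hypothetical replacement that new code should call instead.
constexpr int scale(int x) { return 2 * x; }

// From here on, the token legacy_scale may not appear in the source at all.
// The guard keeps compilers without this pragma from seeing it.
#if defined(__GNUC__)
#pragma GCC poison legacy_scale
#endif

static_assert(scale(21) == 42, "the replacement remains usable");

// Uncommenting the next line makes GCC and Clang reject the translation unit:
// int bad = legacy_scale(21);
```

Code that was written before the pragma, including the definition of the poisoned function, is unaffected; only later uses are rejected.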

In the video he's using it in a more specific way, which is clear from the slide displayed while he talks about poisoning the function. It says "A way to force compile-time only?"

So he is talking about "poisoning" the function to make it uncallable at run-time, so it's only callable in constant expressions. The technique is to have a branch in the function which is never taken when called in a compile-time context, and to make that branch contain something that will cause an error.

A throw expression is allowed in a constexpr function, as long as it is never reached during compile-time invocations of the function (because you can't throw an exception at compile-time, it's an inherently dynamic operation, like allocating memory). So a throw expression that refers to an undefined symbol will not be used during compile-time invocations (because that would fail to compile) and cannot be used at run-time, because the undefined symbol causes a linker error.

Because the undefined symbol is not "odr-used" in the compile-time invocations of the function, in practice the compiler will not create a reference to the symbol, so it's OK that it's undefined.
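
A minimal sketch of the technique, written in C++11 style. The names factorial_ct and factorial_ct_used_at_run_time are made up for illustration and are not taken from the video. The symbol is declared but deliberately never defined.

```cpp
// Declared but intentionally never defined anywhere (hypothetical name).
// Any code that really needs its value fails at link time.
extern const char* factorial_ct_used_at_run_time;

// Hypothetical compile-time-only factorial.
constexpr unsigned long long factorial_ct(unsigned n) {
    return n > 20 ? throw factorial_ct_used_at_run_time   // 21! no longer fits in 64 bits
         : n <= 1 ? 1ULL
         : n * factorial_ct(n - 1);
}

// Constant evaluation never reaches the throw, so the symbol is not needed.
static_assert(factorial_ct(5) == 120, "evaluated by the compiler");
constexpr unsigned long long ten_factorial = factorial_ct(10);

// unsigned long long r = factorial_ct(some_runtime_value);
// A run-time call like this makes the compiler emit the function body,
// including the reference to the undefined symbol, so linking is expected to fail.
```

Whether a given run-time call really fails to link can depend on the optimizer (a call with a constant argument may be folded away), so treat this as a sketch of the idea rather than a guarantee.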

Is that useful? He's demonstrating how to do it, not necessarily saying it's a good idea or widely useful. If you have a need to do it for some reason then his technique might solve your problem. If you don't have a need for it, you don't need to worry about it.

One reason it might be useful is when the compile-time version of some operation is not as efficient as it could be. There are restrictions on the kind of expressions allowed in a constexpr function (especially in C++11; some restrictions were removed in C++14). So you might have two versions of a function for performing a calculation: one that is optimal but uses expressions that aren't allowed in a constexpr function, and one that is a valid constexpr function but would perform poorly if called at run-time. You could poison the sub-optimal one so that it is never used for run-time calls, ensuring the more efficient (non-constexpr) version is used instead.
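
That pairing might look like the following sketch. Both function names and the undefined symbol are hypothetical, and std::strlen merely stands in for "the optimal run-time version".

```cpp
#include <cstddef>
#include <cstring>

// Never defined (hypothetical name); exists only to break run-time use.
extern const char* ct_strlen_used_at_run_time;

// Valid C++11 constexpr (a single return statement), but recursive and
// therefore a poor choice at run time. The throw branch poisons it.
constexpr std::size_t ct_strlen(const char* s, std::size_t n = 0) {
    return s == nullptr ? throw ct_strlen_used_at_run_time
         : s[n] == '\0' ? n
         : ct_strlen(s, n + 1);
}

// Run-time counterpart: not constexpr, free to use the optimized library routine.
inline std::size_t rt_strlen(const char* s) { return std::strlen(s); }

static_assert(ct_strlen("poison") == 6, "compile-time path");
```

Callers then use ct_strlen only where a constant expression is required and rt_strlen everywhere else.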

N.B. The performance of a constexpr function used at compile-time is not really important, because it has no run-time overhead anyway. It might slow down your compilation by making the compiler do extra work, but it won't have any run-time performance cost.

How to poison an identifier in VC++?

MSVC++ has two ways to do this. To get the equivalent of GCC's #pragma GCC poison you'd use #pragma deprecated. That produces warning C4995, which you can turn into an error with /WX.

However, that poisons every identifier with the name you specified; it isn't selective enough to avoid warnings on C++ members that happen to have the same name. You couldn't use it to deprecate a specific function overload, for example. The second way, __declspec(deprecated), solves that.

In general you'd prefer the latter to avoid accidental matches. But beware that it has a chicken-and-egg problem: you can only deprecate a function that the compiler knows about, which forces you to, say, #include a header that you don't want to use at all.
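
A sketch of both mechanisms follows. The function, macro and identifier names are hypothetical, and the non-MSVC branch uses the GCC/Clang deprecated attribute only so that the sketch also compiles with other compilers.

```cpp
// Hypothetical portability macro: __declspec(deprecated) on MSVC,
// the analogous attribute on GCC/Clang, nothing elsewhere.
#if defined(_MSC_VER)
#define LEGACY_API __declspec(deprecated("use the two-argument overload"))
#elif defined(__GNUC__)
#define LEGACY_API __attribute__((deprecated("use the two-argument overload")))
#else
#define LEGACY_API
#endif

// Only this overload is flagged; calling it produces a deprecation warning.
LEGACY_API constexpr int parse_digit(char c) { return c - '0'; }

// Same name, different overload: not affected by the declaration above.
constexpr int parse_digit(char c, int base) {
    return (c >= '0' && (c - '0') < base) ? c - '0' : -1;
}

#if defined(_MSC_VER)
// Name-based alternative: flags every later use of the identifier
// old_parse_digit (hypothetical), whatever it refers to (warning C4995).
#pragma deprecated(old_parse_digit)
#endif

static_assert(parse_digit('7', 10) == 7, "non-deprecated overload");
```

Note that the declaration-based form needs the function to be declared first, which is the chicken-and-egg problem mentioned above, while the pragma works on a bare name.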

How to demonstrate type poisoning?

Poisoning is implemented by checking the return value of std::thread::panicking within a Drop implementation. If it returns true, then the value should be poisoned. Here's an example:

use std::cell::Cell;
use std::panic::{self, AssertUnwindSafe};
use std::thread;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ResourceState {
    Available,
    Locked,
    Poisoned,
}

struct Resource {
    state: Cell<ResourceState>,
}

struct ResourceGuard<'a> {
    resource: &'a Resource,
}

impl Resource {
    fn new() -> Resource {
        Resource {
            state: Cell::new(ResourceState::Available),
        }
    }

    fn lock(&self) -> ResourceGuard<'_> {
        assert_eq!(self.state.get(), ResourceState::Available);
        self.state.set(ResourceState::Locked);
        ResourceGuard {
            resource: self,
        }
    }
}

impl<'a> Drop for ResourceGuard<'a> {
    fn drop(&mut self) {
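        // If the guard is dropped while the thread is unwinding from a panic,
        // mark the resource as poisoned instead of releasing it normally.
        let next = if thread::panicking() {
            ResourceState::Poisoned
        } else {
            ResourceState::Available
        };
        self.resource.state.set(next);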
    }
}

fn main() {
    let resource = Resource::new();
    println!("state: {:?}", resource.state.get()); // Available

    {
        println!("acquiring lock");
        let _guard = resource.lock();
        println!("state: {:?}", resource.state.get()); // Locked
        println!("dropping lock");
    }

    println!("state: {:?}", resource.state.get()); // Available

    let _ = panic::catch_unwind(AssertUnwindSafe(|| {
        println!("acquiring lock");
        let _guard = resource.lock();
        println!("state: {:?}", resource.state.get()); // Locked
        println!("panicking!");
        panic!("panicking!");
    }));

    println!("state: {:?}", resource.state.get()); // Poisoned
}

Why does optimisation kill this function?

This code violates the strict aliasing rules, which make it illegal to access an object through a pointer of a different type, although access through a char * is allowed. The compiler is allowed to assume that pointers of different types do not point to the same memory and to optimize accordingly. It also means the code invokes undefined behavior and could really do anything.

One of the best references for this topic is Understanding Strict Aliasing and we can see the first example is in a similar vein to the OP's code:

uint32_t swap_words( uint32_t arg )
{
  uint16_t* const sp = (uint16_t*)&arg;
  uint16_t        hi = sp[0];
  uint16_t        lo = sp[1];

  sp[1] = hi;
  sp[0] = lo;

  return (arg);
}

The article explains that this code violates the strict aliasing rules, since sp is an alias of arg but they have different types, and says that although it will compile, it is likely that arg will be unchanged after swap_words returns. With simple tests I was unable to reproduce that result with either the code above or the OP's code, but that does not mean anything: this is undefined behavior and therefore not predictable.

The article goes on to discuss many different cases and presents several working solutions, including type-punning through a union, which is well-defined in C99¹ and may be undefined in C++ but in practice is supported by most major compilers; see, for example, gcc's reference on type-punning. The previous thread Purpose of Unions in C and C++ goes into the gory details. Although there are many threads on this topic, this one seems to do the best job.

The code for that solution is as follows:

typedef union
{
  uint32_t u32;
  uint16_t u16[2];
} U32;

uint32_t swap_words( uint32_t arg )
{
  U32      in;
  uint16_t lo;
  uint16_t hi;

  in.u32    = arg;
  hi        = in.u16[0];
  lo        = in.u16[1];
  in.u16[0] = lo;
  in.u16[1] = hi;

  return (in.u32);
}
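
In C++, where reading an inactive union member is not guaranteed to be defined, a commonly used alternative is to copy the bytes with std::memcpy, which compilers typically optimize away. The sketch below is not from the article and the function names are hypothetical. The second version sidesteps the question entirely by using shifts.

```cpp
#include <cstdint>
#include <cstring>

// Copies the object representation instead of aliasing it through uint16_t*.
inline std::uint32_t swap_words_memcpy(std::uint32_t arg) {
    std::uint16_t halves[2];
    static_assert(sizeof halves == sizeof arg, "two 16-bit halves expected");
    std::memcpy(halves, &arg, sizeof arg);

    const std::uint16_t first = halves[0];
    halves[0] = halves[1];
    halves[1] = first;

    std::memcpy(&arg, halves, sizeof arg);
    return arg;
}

// Pure arithmetic: no pointers, no aliasing, usable in constant expressions.
constexpr std::uint32_t swap_words_shift(std::uint32_t arg) {
    return (arg << 16) | (arg >> 16);
}

static_assert(swap_words_shift(0x12345678u) == 0x56781234u, "halves exchanged");
```

Both versions exchange the two 16-bit halves regardless of byte order, so they should agree on every input.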

For reference the relevant section from the C99 draft standard on strict aliasing is 6.5 Expressions paragraph 7 which says:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:⁷⁶⁾

— a type compatible with the effective type of the object,

— a qualified version of a type compatible with the effective type of the object,

— a type that is the signed or unsigned type corresponding to the effective type of the
object,

— a type that is the signed or unsigned type corresponding to a qualified version of the
effective type of the object,

— an aggregate or union type that includes one of the aforementioned types among its
members (including, recursively, a member of a subaggregate or contained union), or

— a character type.

and footnote 76 says:

The intent of this list is to specify those circumstances in which an object may or may not be aliased.

and the relevant section from the C++ draft standard is 3.10 Lvalues and rvalues, paragraph 10.

The article Type-punning and strict-aliasing gives a gentler but less complete introduction to the topic and C99 revisited gives a deep analysis of C99 and aliasing and is not light reading. This answer to Accessing inactive union member - undefined? goes over the muddy details of type-punning through a union in C++ and is not light reading either.

Footnotes:

¹ Quoting a comment by Pascal Cuoq: [...]C99 that was initially clumsily worded, appearing to make type-punning through unions undefined. In reality, type-punning through unions is legal in C89, legal in C11, and it was legal in C99 all along although it took until 2004 for the committee to fix incorrect wording, and the subsequent release of TC3. open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm

What does (void) mean in c++?

What you see there is really just a "trick" to fake variable/parameter usage.

Without those lines, a pedantic compiler will warn you about the variables not being used.

Using a construct (void)variablename; will result in no instructions being generated, but the compiler will consider it a valid "use" of those variables.
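
For illustration, here is a small sketch with two typical cases: a parameter that the signature requires but the body ignores, and a variable that is only read inside an assert. The function on_event and its parameters are hypothetical.

```cpp
#include <cassert>

// Hypothetical callback: the signature requires context, but this
// implementation has no use for it.
inline int on_event(int code, void* context) {
    (void)context;            // counts as a use, generates no instructions

    const bool ok = code >= 0;
    assert(ok);               // disappears when NDEBUG is defined...
    (void)ok;                 // ...so this keeps ok from being reported as unused

    return code * 2;
}
```

Since C++17 the [[maybe_unused]] attribute expresses the same intent in the declaration itself.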

compiler way to stop using certain C system calls

#define sprintf COMPILE_TIME_ERROR
#define COMPILE_TIME_ERROR switch(0){case 0:case 0:;}

int main(void) {
 char hi[50];
 sprintf(hi,"hi");
 return 0;
}

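Any use of sprintf now expands to a switch statement with two identical case labels, which the compiler has to reject. The compiler output will be something like: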

prog.c: In function ‘main’:
prog.c:6: error: duplicate case value
prog.c:6: error: previously used here
