Are There Stackless or Heapless Implementations of C++

Are there stackless or heapless implementations of C++?

Others have already given good answers about the heap, so I'll leave that alone.

Some implementations (e.g., on IBM mainframes) don't use a stack as most people would think of it, for the simple reason that the hardware doesn't support it. Instead, when you call a function, an activation record (i.e., space for the locals, arguments, and return address) is allocated from (their version of) the heap. These activation records are built into a linked list.

From a purely abstract viewpoint, this is certainly a stack -- it supports last-in, first-out semantics, just like any other stack. You do have to look at it pretty abstractly to call it a stack though. If you showed people a diagram of the memory blocks linked together, I think it's safe to guess most programmers would describe it as a linked list. If you pushed them, I think most would judge it something like "yeah, you can use it in a stack-like manner, but it's still a linked list."
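To make the linked-list picture concrete, here is a minimal C++ sketch of activation records allocated from the heap and chained to their callers. The names ActivationRecord, enter, leave, and depth are hypothetical, and this is not how any IBM compiler actually lays out its records; it only shows last-in, first-out behaviour falling out of a linked list.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical activation record: one heap block per call, linked to its caller.
struct ActivationRecord {
    ActivationRecord* caller;  // link to the record of the calling function
    int argument;              // stand-in for arguments, locals, return address
};

// Record of the currently running "function"; null when nothing is active.
static ActivationRecord* current = nullptr;

// "Call": allocate a record from the heap and link it at the front of the chain.
void enter(int argument) {
    current = new ActivationRecord{current, argument};
}

// "Return": unlink the newest record, release it, and hand back its payload.
int leave() {
    assert(current != nullptr);
    ActivationRecord* done = current;
    int value = done->argument;
    current = done->caller;
    delete done;
    return value;
}

// Number of records currently linked, i.e. the call depth.
std::size_t depth() {
    std::size_t n = 0;
    for (ActivationRecord* r = current; r != nullptr; r = r->caller) {
        ++n;
    }
    return n;
}
```

Calling enter three times and then leave three times yields the payloads in reverse order, which is exactly the stack discipline, even though memory only ever holds separately allocated blocks.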

Does C need a stack and a heap in order to run?

No, it does not. Let's cover the heap first, that's easy.

An implementation that does not provide a heap of any sort just needs to return NULL whenever you try to call malloc (or any other memory allocation function). That's perfectly acceptable behaviour according to the standard.
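As a rough illustration, a heapless allocator could look like the sketch below. The names heapless_malloc, heapless_free, and store_copy are made up for this example (a real implementation would provide malloc itself); the point is only that calling code which checks for a null result keeps working.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for malloc on an implementation with no heap:
// every request fails, which the standard permits.
void* heapless_malloc(std::size_t /*size*/) {
    return nullptr;
}

// Matching no-op release, since nothing is ever handed out.
void heapless_free(void* /*ptr*/) {}

// A caller that checks the result, as portable code must.
// Returns true if a private copy of `text` could be made.
bool store_copy(const char* text) {
    assert(text != nullptr);
    std::size_t size = std::strlen(text) + 1;
    void* copy = heapless_malloc(size);
    if (copy == nullptr) {
        return false;  // allocation failed; report it instead of crashing
    }
    std::memcpy(copy, text, size);
    heapless_free(copy);
    return true;
}
```

With this allocator, store_copy always reports failure, but it never dereferences a null pointer.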

In terms of the stack, it also doesn't need to provide one. ISO C11 mentions the word "stack" exactly zero times.

What an implementation does need to do is simply be a correct "virtual machine" for all the things specified in the standard. Granted, that will be very difficult without a stack, but it's not impossible. As an extreme case, there's nothing that says you can't simply inline every single function call recursively. That would use rather a large amount of code and function-specific data space, but it's certainly doable.

However, it's probably something that would convince me to move to another architecture, one that did have a stack (and heap, for that matter).

Having said that, even if an architecture provides neither a heap nor a stack, both of those can be built out of basic memory I/O operations. In fact, one of the earliest computers I ever had as a teen sported an RCA 1802 CPU which had no dedicated stack. It didn't even have a call or ret instruction.

Yet it could handle subroutines and a stack quite well (for some definition of the word "well") using its SCRT (standard call and return technique). See here for some more detail on how this thing of beauty (or monstrosity, depending on your viewpoint) worked, along with some other unusual architectures.
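The general idea of building call and return out of plain memory operations can be sketched in a few lines of C++. This is not the 1802's actual SCRT register convention; memory, sp, push, pop, call, and ret are hypothetical names, and the "program counter" is just a number passed around.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Plain memory plus an ordinary variable used as a stack pointer;
// no dedicated hardware stack is assumed.
static std::uint16_t memory[64];
static std::size_t sp = 64;  // grows downward; 64 means "empty"

void push(std::uint16_t value) {
    assert(sp > 0);       // overflow check
    memory[--sp] = value;
}

std::uint16_t pop() {
    assert(sp < 64);      // underflow check
    return memory[sp++];
}

// Hypothetical "call": save where to resume, then hand back the new
// program counter (the address of the subroutine).
std::uint16_t call(std::uint16_t return_address, std::uint16_t target) {
    push(return_address);
    return target;
}

// Hypothetical "return": resume at the most recently saved address.
std::uint16_t ret() {
    return pop();
}
```

Nested calls unwind in the right order because the saved return addresses come back last-in, first-out.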

The IBM Z (a.k.a. System z, zSeries, whatever they're calling it this week) actually has a heap (of sorts, in that you can allocate memory from the OS) but no stack. It actually implements a linked-list stack by using this heap memory along with certain registers (similar to the RCA chip referenced in the above link), meaning that a function prolog allocates local function memory using STORAGE OBTAIN and the epilog releases it with STORAGE RELEASE.

Needless to say that puts quite a bit of extra code into the prolog and epilog for each function.

How does a stackless language work?

The modern operating systems we have (Windows, Linux) operate with what I call the "big stack model". And that model is wrong, sometimes, and motivates the need for "stackless" languages.

The "big stack model" assumes that a compiled program will allocate "stack frames" for function calls in a contiguous region of memory, using machine instructions to adjust registers containing the stack pointer (and optional stack frame pointer) very rapidly. This leads to fast function call/return, at the price of having a large, contiguous region for the stack. Because 99.99% of all programs run under these modern OSes work well with the big stack model, the compilers, loaders, and even the OS "know" about this stack area.

One common problem all such applications have is "how big should my stack be?". With memory being dirt cheap, mostly what happens is that a large chunk is set aside for the stack (MS defaults to 1 MB), and typical application call structure never gets anywhere near to using it up. But if an application does use it all up, it dies with an illegal memory reference ("I'm sorry Dave, I can't do that"), by virtue of reaching off the end of its stack.

Most so-called "stackless" languages aren't really stackless. They just don't use the contiguous stack provided by these systems. What they do instead is allocate a stack frame from the heap on each function call. The cost per function call goes up somewhat; if functions are typically complex, or the language is interpreted, this additional cost is insignificant. (One can also determine call DAGs in the program call graph and allocate a heap segment to cover the entire DAG; this way you get both heap allocation and the speed of classic big-stack function calls for all calls inside the call DAG.)

There are several reasons for using heap allocation for stack frames:

1) If the program does deep recursion dependent on the specific problem it is solving, it is very hard to preallocate a "big stack" area in advance because the needed size isn't known. One can awkwardly arrange function calls to check whether there's enough stack left and, if not, reallocate a bigger chunk, copy the old stack, and readjust all the pointers into the stack; that's so awkward that I don't know of any implementations. Allocating stack frames means the application never has to say it's sorry until there's literally no allocatable memory left.
2) The program forks subtasks. Each subtask requires its own stack, and therefore can't use the one "big stack" provided. So, one needs to allocate stacks for each subtask. If you have thousands of possible subtasks, you might now need thousands of "big stacks", and the memory demand suddenly gets ridiculous. Allocating stack frames solves this problem. Often the subtask "stacks" refer back to the parent tasks to implement lexical scoping; as subtasks fork, a tree of "substacks" is created called a "cactus stack".
3) Your language has continuations. These require that the data in lexical scope visible to the current function somehow be preserved for later reuse. This can be implemented by copying parent stack frames, climbing up the cactus stack, and proceeding.
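A cactus stack is easier to picture with a small sketch. The C++ below is only an illustration under simplifying assumptions: each hypothetical Frame holds a single named local, frames are reference-counted with std::shared_ptr, and the names Frame, make_frame, lookup, and cactus_demo are invented for this example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical heap-allocated frame: one named local plus a link to the
// enclosing (parent) frame. Several children may share one parent.
struct Frame {
    std::shared_ptr<Frame> parent;
    std::string name;
    int value;
};

std::shared_ptr<Frame> make_frame(std::shared_ptr<Frame> parent,
                                  std::string name, int value) {
    return std::make_shared<Frame>(
        Frame{std::move(parent), std::move(name), value});
}

// Lexical lookup: climb from the given frame toward the root.
// Returns a pointer to the value, or nullptr if the name is not in scope.
const int* lookup(const std::shared_ptr<Frame>& frame,
                  const std::string& name) {
    for (const Frame* f = frame.get(); f != nullptr; f = f->parent.get()) {
        if (f->name == name) {
            return &f->value;
        }
    }
    return nullptr;
}

// Two subtasks fork from the same parent frame, forming a small tree.
void cactus_demo() {
    std::shared_ptr<Frame> root = make_frame(nullptr, "x", 1);
    std::shared_ptr<Frame> task_a = make_frame(root, "y", 2);
    std::shared_ptr<Frame> task_b = make_frame(root, "y", 3);
    assert(*lookup(task_a, "y") == 2);  // each subtask sees its own local
    assert(*lookup(task_b, "y") == 3);
    assert(*lookup(task_a, "x") == 1);  // both see the parent's local
    assert(*lookup(task_b, "x") == 1);
}
```

Because the parent frame stays alive as long as any child refers to it, the same structure also hints at how a continuation could keep its lexical scope around after the original call has returned.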

The PARLANSE programming language I implemented does 1) and 2). I'm working on 3). It is amusing to note that PARLANSE allocates stack frames from a very fast-access heap-per-thread; it typically costs 4 machine instructions. The current implementation is x86-based, and the allocated frame is placed in the x86 EBP/ESP register much like other conventional x86-based language implementations. So it does use the hardware "contiguous stack" (including pushing and popping), just in chunks. It also generates "frame local" subroutine calls that don't switch stacks for lots of generated utility code where the stack demand is known in advance.
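To give a feel for why a per-thread frame heap can be that cheap, here is a sketch of a free-list pool in C++. It is not PARLANSE's actual allocator; FrameBlock, FramePool, the fixed block size, and the pool size of 4 are all assumptions made for illustration. One pool per thread means no locking is needed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size frame; a real implementation would pick sizes
// to suit the functions being called.
struct FrameBlock {
    FrameBlock* next;          // free-list link while the block is unused
    unsigned char locals[56];  // storage for the function's locals
};

// A pool intended to be owned by a single thread.
class FramePool {
public:
    FramePool() {
        // Thread every block onto the free list once, up front.
        for (std::size_t i = 0; i < kBlocks; ++i) {
            blocks_[i].next = free_;
            free_ = &blocks_[i];
        }
    }

    // Allocation is a pointer pop: load the head, test it, store the next.
    FrameBlock* allocate() {
        FrameBlock* block = free_;
        if (block != nullptr) {
            free_ = block->next;
        }
        return block;  // nullptr when the pool is exhausted
    }

    // Release is a pointer push.
    void release(FrameBlock* block) {
        assert(block != nullptr);
        block->next = free_;
        free_ = block;
    }

private:
    static constexpr std::size_t kBlocks = 4;
    FrameBlock blocks_[kBlocks];
    FrameBlock* free_ = nullptr;
};
```

A real system would fall back to a slower general allocator when the pool runs dry rather than returning nullptr, but the fast path stays a handful of instructions.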

Why did C never implement stack extension?

A new object may be returned through many layers of software. So the wasted space may be that from dozens or even hundreds of function calls.

Consider also a routine that performs some iterative task. In each iteration, it gets some newly allocated object from a subroutine, which it inserts into a linked list or other data structure. Such iterative tasks may repeat for hundreds, thousands, or millions of iterations. The stack will overflow with wasted space.

Is there a Variable allocated on the Heap?

You seem to have understood. There is a float on the heap* and a pointer on the stack*. The disagreement is 'just' a naming convention for how you refer to the float.

Some people talk of things-that-are-pointed-to in terms of the-thing-that-does-the-pointing. I am inclined to agree with you: this is potentially confusing, and can add complexity.

However in the interest of fairness: keep in mind different people have different motivations for the way they use language. If you never want to deal with pointers and they are just a way of having a variable persist outside of its scope then seeing (*a) as the variable and remembering it obeys slightly different rules is not completely without merit.

[*] Modulo grammar/standards nazism.
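To see the two storage locations side by side, here is a small C++ sketch. The original code from the question is not shown here, so the function name heap_float_demo and the values are assumptions; only the pointer name a is taken from the (*a) mentioned above.

```cpp
// Returns the final value of a float that lives on the heap.
float heap_float_demo() {
    // `a` itself is a local pointer with automatic storage (typically the
    // stack or a register); the float it points to has dynamic storage.
    float* a = new float(1.5f);

    *a += 1.0f;         // (*a) is the float people loosely call "the variable"
    float result = *a;  // copy the value out before releasing the heap object

    delete a;           // the heap float is gone; `a` dies at end of scope
    return result;
}
```

Whether you call a or (*a) "the variable" is the naming convention under discussion; the storage itself is unambiguous.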

Why is there no heap overhead for the base reference?

You get heap overhead when you store something on the heap. The two complex values are stored on the heap, so they get overhead. The array of references is also stored on the heap, so it gets overhead.

However, the reference to the array is not stored on the heap. Usually, this reference will be either placed on the stack as a local variable, or that stack storage might be optimized away by using a CPU register instead. In either case, the reference itself is just a local pointer variable which does not have heap allocation overhead itself.
