What Is the Purpose of Anonymous { } Blocks in C Style Languages

What is the purpose of anonymous { } blocks in C style languages?

It limits the scope of variables to the block inside the { }.

Anonymous code blocks in c

It is a statement expression.
It is a compiler extension supported by GCC; it is not standard C++, and hence it is not portable.

If you compile your code with the -pedantic flag, the compiler will tell you so.

This answer of mine talks about it in detail.

Why do we need blocks, function literals, closures in programming languages?

First of all, in order to answer the question in the title:

Why do we need blocks, function literals, closures in programming languages?

Answer #1: we don't. Brainfuck is Turing-complete without them.

Answer #2: because we're lazy and closures are convenient and easy to use.

What specific problem do they solve?

They solve no specific problem. If you are comfortable with Objective-C, you have surely heard of function pointers, right? Now every time a block is used somewhere, that piece of code could be transformed to an equivalent snippet using a function pointer. The great achievement that closures bring to the programmer is readability. Closures (blocks, lambda functions, etc.) can be (and are) used where they are created, unlike "normal" global functions. Consider the following two pieces of code (two imaginary methods invented with regard to the Cocoa networking APIs):

void url_callback(void *data, size_t length)
{
    NSLog(@"Received data %p of size %zu", data, length);
}

[connection sendAsyncRequestWithHandlerFPtr:&url_callback];

versus

[connection sendAsyncRequestWithHandlerLambda:^(void *data, size_t length) {
    NSLog(@"Received data %p of size %zu", data, length);
}];

Of course, in the second one it is obvious to whoever reads the code what it does. In the first one, you have to scroll up and down to get to the implementation of the function (if any!) just so you can understand what happens when some data is received.

Are blocks, closures, function literals, named functions, anonymous functions - one and the same thing?

No, they aren't. (Quite.)

Closures and anonymous functions are a mathematical and/or CS-theory concept: they describe subroutines which are first-class values.

Blocks are a particular implementation of closures, as realized by Apple in an extension to the C (and consequently to the Objective-C) programming language.

Named function expressions are a JavaScript feature that combine the advantages of closures and global functions.

What is the value of an anonymous unattached block in C#?

Scope and garbage collection: When you leave the unattached block, any variables declared in it go out of scope. That lets the garbage collector clean up those objects.

Ray Hayes points out that the .NET garbage collector will not immediately collect the out-of-scope objects, so scoping is the main benefit.

Objective advantages of C-style syntax

IMHO, the only thing that makes C-style syntax popular is that most people know it. Using C-style syntax for a new language thus makes it possible to carry over old habits (like naming conventions). This is a good thing! Although I think syntax is the least concern when learning a new language, a good syntax can help to avoid errors.

Microsoft put a lot of effort into making VB.NET as important as C# (remember all the "null (Nothing in Visual Basic)" notes in the MSDN, which quite annoy me), but C# is still the dominant language for the .NET platform. It seems that VB.NET suffers from the bad reputation of its predecessors. And still, using C-style syntax seems to make for a more professional look.

After all, one of C's design goals was to make the compiler easy to implement. It was easier to use a preprocessor than to define a new language construct for constants, or to pin down the semantics of a[5]. This does not apply to C++, which is very difficult to implement, partly because it tries to stay compatible with C.

Examples:

  • Case sensitivity, although case insensitivity is more natural to humans (though NOT to computers). This doesn't mean that you should spell identifiers differently than they have been declared (beware!), but it can lead to confusion. Real-world example (Java): getWhiteSpace() or getWhitespace()?

EDIT: Another nice example: What is the worst gotcha in C# or .NET?.
But of course, once you are used to it and with the help of an IDE, this isn't much of a problem anymore; sometimes it is even more natural, because it better reflects how computers actually work.

  • Operator precedence

  • = for assignment and == for comparison. if (a = b) anyone? Similarly, && and &, || and |, (! and ~) are syntactically too close, although they mean different things. Personally, I'd prefer and and or, because symbols should just support syntax instead of being the main part.

  • ++ and -- operators; they make some statements just a little bit shorter, but introduce side effects into expressions (a = b+++b++). Originally, compilers could compile i++ more efficiently than i = i + 1.

  • for(init;condition;step) loop; Although best practice is to use it only to increment a variable, no explicit syntax exists for this. Instead, this for construct is redundant, as it is (nearly) the same as

    init;
    while (condition) {
        statement;
        step;
    }
  • switch statement; ever forgotten a break? Why not allow ranges as case labels, as most other languages do?

  • if(condition) statement. Using parentheses wasn't that good a choice, as they can also appear in the condition expression itself:

    if (!(var & 0x02))
  • Preprocessor

  • Braces. This is controversial. I don't agree with arguments that braces "don't use a lot of screen estate", are terser, or are faster to write. First, a language should be designed to be easy to read, not easy to write. Second, depending on your style, braces use exactly the same amount of screen space as keywords: you write them on a line of their own. Isn't that a lot of wasted space?

    Additionally, people criticise LISP for its clutter of parentheses. Has it never happened to you that you had to count your braces to find out where you missed one? I sometimes add a comment after the closing brace to indicate what is supposed to end there. BASIC syntax has this built in already, and doesn't even need an equivalent to an opening brace.
    Partly, I agree that braces are good: they are nearly invisible, and indentation is the dominant visual characteristic. Seen this way, Python is the next step.

  • Semicolons as statement terminator or separator. Why is a single semicolon a valid statement?

    if (condition);
    DoSomething();
  • Indistinguishable sequences of keywords

    public static string main()

    Is it a method declaration? Or a variable declaration? Or a function prototype? Or something else? Some punctuation (and keywords for every declaration type) could have helped here, for instance, to clearly separate the return type. This is what makes C++ hard to parse.

  • Orthogonality. do {} while (condition) doesn't fit the other language constructs, where the statement is followed by the block. I think VB's

    do [while/until condition]
    Statements
    loop [while/until condition]

    is a nice solution, because you get 4 possible combinations with different semantics: while/until after the do or the loop keyword.

  • Strange order of variable type modifiers.

    int * const & i [];
  • Type and variable name just appear after each other, with no marker that this is a local variable declaration. Scala uses val and var to indicate the declaration of a final/mutable variable, and the type is separated by a colon. For most other things, Scala uses Java syntax.

  • Assignment operator returning a value; no distinction between statements (with effects) and expressions (which just return a value).

EDIT: Some more examples here: https://stackoverflow.com/questions/163026/what-is-your-least-favorite-syntax-gotcha

You certainly won't agree with many of these points, and not all of them are necessarily negative (semicolons, for instance); nor do I claim to know a solution that is better in all cases. Even if I did, the resulting language would not be the perfect language. Programming languages will always evolve, and newly invented languages hopefully learn from their predecessors. So, why not stay with a known syntax instead of designing a new one every ten years?

However, when a language designer has the possibility to prevent programming errors that are just typing errors, why not change it? For instance, this was done in C#'s switch statement, which makes break (or goto) mandatory. And once the worst drawbacks have been removed, the benefit that most programmers already know the rest of the syntax far outweighs the advantages of redesigning a language from scratch. But I am still astonished that so many programmers defend C syntax so eagerly, even though they are used to the fact that progress in computer science requires regular revision of nearly everything.

To conclude, I think the only reason C syntax is dominant is that it is known to nearly all professional programmers, who are simply used to it. The actual syntax is less important, although other languages might have advantages. This is the same reason why electrical engineers keep using the conventional direction of electric current as it is.

https://imgs.xkcd.com/comics/urgent_mission.png

(Maybe there will be a comic about a programmer visiting Dennis Ritchie, too: "Please, don't make breaks in switch statements optional!")

Unknown C++ braces syntaxis

They are simply blocks that introduce scope, and hide their contents. Perfectly valid.

Is it practical to create a C language addon for anonymous functions?

I know that C compilers are capable of taking standalone code, and generate standalone shellcode out of it for the specific system they are targeting.

Turning source into machine code is what compilation is. Shellcode is machine code with specific constraints, none of which apply to this use-case. You just want ordinary machine code like compilers generate when they compile functions normally.

AFAICT, what you want is exactly what you get from static int foo(int x){ ...; }, and then passing foo as a function pointer, i.e. a block of machine code with a label attached, in the code section of your executable.

Jumping through hoops to get compiler-generated machine code into an array is not even close to worth the portability downsides (esp. in terms of making sure the array is in executable memory).


It seems the only thing you're trying to avoid is having a separately-defined function with its own name. That's an incredibly small benefit that doesn't come close to justifying doing anything like you're suggesting in the question. AFAIK, there's no good way to achieve it in ISO C11, but:

Some compilers support nested functions as a GNU extension:

This compiles (with gcc6.2). On Godbolt, I used -xc to compile it as C, not C++. It also compiles with ICC17, but not clang3.9.

#include <stdlib.h>

void sort_integers(int *arr, size_t len)
{
    int bar(){return 3;} // gcc warning: ISO C forbids nested functions [-Wpedantic]

    int cmp(const void *va, const void *vb) {
        const int *a=va, *b=vb; // taking const int* args directly gives a warning, which we could silence with a cast
        return *a > *b;
    }

    qsort(arr, len, sizeof(int), cmp);
}

The asm output is:

cmp.2286:
        mov     eax, DWORD PTR [rsi]
        cmp     DWORD PTR [rdi], eax
        setg    al
        movzx   eax, al
        ret
sort_integers:
        mov     ecx, OFFSET FLAT:cmp.2286
        mov     edx, 4
        jmp     qsort

Notice that no definition for bar() was emitted, because it's unused.

Programs with nested functions built without optimization will have executable stacks. (For reasons explained below). So if you use this, make sure you use optimization if you care about security.


BTW, nested functions can even access variables in their parent's scope (like lambdas). Changing cmp into a function that does return len results in this highly surprising asm:

__attribute__((noinline))
void call_callback(int (*cb)()) {
    cb();
}

void foo(int *arr, size_t len) {
    int access_parent() { return len; }
    call_callback(access_parent);
}

## gcc5.4
access_parent.2450:
        mov     rax, QWORD PTR [r10]
        ret
call_callback:
        xor     eax, eax
        jmp     rdi
foo:
        sub     rsp, 40
        mov     eax, -17599
        mov     edx, -17847
        lea     rdi, [rsp+8]
        mov     WORD PTR [rsp+8], ax
        mov     eax, OFFSET FLAT:access_parent.2450
        mov     QWORD PTR [rsp], rsi
        mov     QWORD PTR [rdi+8], rsp
        mov     DWORD PTR [rdi+2], eax
        mov     WORD PTR [rdi+6], dx
        mov     DWORD PTR [rdi+16], -1864106167
        call    call_callback
        add     rsp, 40
        ret

I just figured out what this mess is about while single-stepping it: Those MOV-immediate instructions are writing machine-code for a trampoline function to the stack, and passing that as the actual callback.

gcc must ensure that the ELF metadata in the final binary tells the OS that the process needs an executable stack (note readelf -l shows GNU_STACK with RWE permissions). So nested functions that access outside their scope prevent the whole process from having the security benefits of NX stacks. (With optimization disabled, this still affects programs that use nested functions that don't access stuff from outer scopes, but with optimization enabled gcc realizes that it doesn't need the trampoline.)

The trampoline (from gcc5.2 -O0 on my desktop) is:

0x00007fffffffd714:  41 bb 80 05 40 00              mov    r11d,0x400580       # address of access_parent.2450
0x00007fffffffd71a:  49 ba 10 d7 ff ff ff 7f 00 00  movabs r10,0x7fffffffd710  # address of `len` in the parent stack frame
0x00007fffffffd724:  49 ff e3                       rex.WB jmp r11
# This can't be a normal rel32 jmp, and indirect is the only way to get an absolute near jump in x86-64.

0x00007fffffffd727:  90                             nop
0x00007fffffffd728:  00 00                          add    BYTE PTR [rax],al
...

(trampoline might not be the right terminology for this wrapper function; I'm not sure.)

This finally makes sense, because r10 is normally clobbered without saving by functions. There's no register that foo could set that would be guaranteed to still have that value when the callback is eventually called.

The x86-64 SysV ABI says that r10 is the "static chain pointer", but C/C++ don't use that. (Which is why r10 is treated like r11, as a pure scratch register).

Obviously a nested function that accesses variables in the outer scope can't be called after the outer function returns. e.g. if call_callback held onto the pointer for future use from other callers, you would get bogus results. When the nested function doesn't do that, gcc doesn't do the trampoline thing, so the function works just like a separately-defined function, so it would be a function pointer you could pass around arbitrarily.

What does two consecutive blocks of code {}{} do?

It is just two separate blocks of code, each introducing its own scope in order to hide local variables.

From the answer to the question "Anonymous code blocks in Java":

Blocks restrict variable scope.

public void foo()
{
    {
        int i = 10;
    }
    System.out.println(i); // Won't compile.
}

In practice, though, if you find yourself using such a code block then
it's probably a sign that you want to refactor that block out to a
method.

Anonymous code blocks

No. Only class and def blocks and modules (and, in more recent versions, list comprehensions and generator expressions, which are not applicable here) introduce a new scope. And only classes execute their body directly. So if you want to continue this debatable use of Python, you'll have to stick with abusing class, or define functions and call them directly. Using a different file for each calculation is slightly less ugly at the source code level, but probably not worth it if the calculations are always that small.


