What C++ Pitfalls Should I Avoid

What C++ pitfalls should I avoid?

A short list might be:

  • Avoid memory leaks through use shared pointers to manage memory allocation and cleanup
  • Use the Resource Acquisition Is Initialization (RAII) idiom to manage resource cleanup - especially in the presence of exceptions
  • Avoid calling virtual functions in constructors
  • Employ minimalist coding techniques where possible - for example, declaring variables only when needed, scoping variables, and early-out design where possible.
  • Truly understand the exception handling in your code - both with regard to exceptions you throw, as well as ones thrown by classes you may be using indirectly. This is especially important in the presence of templates.

RAII, shared pointers and minimalist coding are of course not specific to C++, but they help avoid problems that do frequently crop up when developing in the language.

Some excellent books on this subject are:

  • Effective C++ - Scott Meyers
  • More Effective C++ - Scott Meyers
  • C++ Coding Standards - Sutter & Alexandrescu
  • C++ FAQs - Cline

Reading these books has helped me more than anything else to avoid the kind of pitfalls you are asking about.

What are all the common undefined behaviours that a C++ programmer should know about?

Pointer

  • Dereferencing a NULL pointer
  • Dereferencing a pointer returned by a "new" allocation of size zero
  • Using pointers to objects whose lifetime has ended (for instance, stack allocated objects or deleted objects)
  • Dereferencing a pointer that has not yet been definitely initialized
  • Performing pointer arithmetic that yields a result outside the boundaries (either above or below) of an array.
  • Dereferencing the pointer at a location beyond the end of an array.
  • Converting pointers to objects of incompatible types
  • Using memcpy to copy overlapping buffers.

Buffer overflows

  • Reading or writing to an object or array at an offset that is negative, or beyond the size of that object (stack/heap overflow)

Integer Overflows

  • Signed integer overflow
  • Evaluating an expression that is not mathematically defined
  • Left-shifting values by a negative amount (right shifts by negative amounts are implementation defined)
  • Shifting values by an amount greater than or equal to the number of bits in the number (e.g. int64_t i = 1; i <<= 72 is undefined)

Types, Cast and Const

  • Casting a numeric value into a value that can't be represented by the target type (either directly or via static_cast)
  • Using an automatic variable before it has been definitely assigned (e.g., int i; i++; cout << i;)
  • Using the value of any object of type other than volatile or sig_atomic_t at the receipt of a signal
  • Attempting to modify a string literal or any other const object during its lifetime
  • Concatenating a narrow with a wide string literal during preprocessing

Function and Template

  • Not returning a value from a value-returning function (directly or by flowing off from a try-block)
  • Multiple different definitions for the same entity (class, template, enumeration, inline function, static member function, etc.)
  • Infinite recursion in the instantiation of templates
  • Calling a function using different parameters or linkage to the parameters and linkage that the function is defined as using.

OOP

  • Cascading destructions of objects with static storage duration
  • The result of assigning to partially overlapping objects
  • Recursively re-entering a function during the initialization of its static objects
  • Making virtual function calls to pure virtual functions of an object from its constructor or destructor
  • Referring to nonstatic members of objects that have not been constructed or have already been destructed

Source file and Preprocessing

  • A non-empty source file that doesn't end with a newline, or ends with a backslash (prior to C++11)
  • A backslash followed by a character that is not part of the specified escape codes in a character or string constant (this is implementation-defined in C++11).
  • Exceeding implementation limits (number of nested blocks, number of functions in a program, available stack space ...)
  • Preprocessor numeric values that can't be represented by a long int
  • Preprocessing directive on the left side of a function-like macro definition
  • Dynamically generating the defined token in a #if expression

To be classified

  • Calling exit during the destruction of a program with static storage duration

Best practices to avoid problems with pointers

Things that can go wrong when pointers are misused:

  1. Memory leaks - You allocate a pointer in a method and then let it go out of scope without properly deallocating it. The pointer to the memory on the heap is now lost, but the memory remains allocated. Freeing this memory is now extremely difficult. More info from Wikipedia.

  2. Access violations - You create a pointer that points at a memory address that you do not have access to, or that does not exist. Pointers are just integers afterall, and can be manipulated like any other number. When you attempt to dereference your invalid pointer, your program will halt. More info from Wikipedia.

  3. Null pointer errors - This is a special case of an access violation. The proper way to "park" a pointer, so that it doesn't point at anything in particular, is to set its value to zero or null. Attempting to dereference a null pointer will halt your program. More info from Wikipedia.

  4. Buffer overflows - You allocate a pointer for a character buffer of 30 characters. You then proceed to stream user input (from a socket, file, console, etc.) into this buffer. If you fail to properly implement buffer bounding checks, then your program could potentially put more than 30 characters into the buffer. This will corrupt any data stored adjacent to the buffer in memory and possibly expose you to a malicious code attack. More info from Wikipedia.

  5. Memory corruption - A pointer is just an integer that contains the memory address of something it points to. As an integer, pointer arithmetic can be used to manipulate the pointer's value in all sorts of interesting ways. Subtle bugs can develop if the pointer calculations go wrong. The pointer will now point to some unknown location in memory, and anything could happen when it is dereferenced.

  6. Null-terminated string problems - These bugs occur when string library functions that expect null-terminated strings are fed character pointers that are not null terminated. The string library functions will continue to process characters, one at a time, until a null is found -- wherever that may be. A joke best illustrates this bug.

jQuery pitfalls to avoid

Being unaware of the performance hit and overusing selectors instead of assigning them to local variables. For example:-

$('#button').click(function() {
$('#label').method();
$('#label').method2();
$('#label').css('background-color', 'red');
});

Rather than:-

$('#button').click(function() {
var $label = $('#label');
$label.method();
$label.method2();
$label.css('background-color', 'red');
});

Or even better with chaining:-

$('#button').click(function() {
$("#label").method().method2().css("background-color", "red");
});

I found this the enlightening moment when I realized how the call stacks work.

Edit: incorporated suggestions in comments.

Modifying C++ DLL to support unicode - common pitfalls to avoid?

First read this article by Joel Spolsky: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Then run through these links on Stack Overflow: What do I need to know about Unicode?

Generally, you are looking for any code that assumes one character = one byte (memory/buffer allocation, etc). But the links above will give you a pretty good rundown of the details.

How to implement an STL-style iterator and avoid common pitfalls?

https://cplusplus.com/reference/iterator/ has a handy chart that details the specs of § 24.2.2 of the C++11 standard. Basically, the iterators have tags that describe the valid operations, and the tags have a hierarchy. Below is purely symbolic, these classes don't actually exist as such.

iterator {
iterator(const iterator&);
~iterator();
iterator& operator=(const iterator&);
iterator& operator++(); //prefix increment
reference operator*() const;
friend void swap(iterator& lhs, iterator& rhs); //C++11 I think
};

input_iterator : public virtual iterator {
iterator operator++(int); //postfix increment
value_type operator*() const;
pointer operator->() const;
friend bool operator==(const iterator&, const iterator&);
friend bool operator!=(const iterator&, const iterator&);
};
//once an input iterator has been dereferenced, it is
//undefined to dereference one before that.

output_iterator : public virtual iterator {
reference operator*() const;
iterator operator++(int); //postfix increment
};
//dereferences may only be on the left side of an assignment
//once an output iterator has been dereferenced, it is
//undefined to dereference one before that.

forward_iterator : input_iterator, output_iterator {
forward_iterator();
};
//multiple passes allowed

bidirectional_iterator : forward_iterator {
iterator& operator--(); //prefix decrement
iterator operator--(int); //postfix decrement
};

random_access_iterator : bidirectional_iterator {
friend bool operator<(const iterator&, const iterator&);
friend bool operator>(const iterator&, const iterator&);
friend bool operator<=(const iterator&, const iterator&);
friend bool operator>=(const iterator&, const iterator&);

iterator& operator+=(size_type);
friend iterator operator+(const iterator&, size_type);
friend iterator operator+(size_type, const iterator&);
iterator& operator-=(size_type);
friend iterator operator-(const iterator&, size_type);
friend difference_type operator-(iterator, iterator);

reference operator[](size_type) const;
};

contiguous_iterator : random_access_iterator { //C++17
}; //elements are stored contiguously in memory.

You can either specialize std::iterator_traits<youriterator>, or put the same typedefs in the iterator itself, or inherit from std::iterator (which has these typedefs). I prefer the second option, to avoid changing things in the std namespace, and for readability, but most people inherit from std::iterator.

struct std::iterator_traits<youriterator> {        
typedef ???? difference_type; //almost always ptrdiff_t
typedef ???? value_type; //almost always T
typedef ???? reference; //almost always T& or const T&
typedef ???? pointer; //almost always T* or const T*
typedef ???? iterator_category; //usually std::forward_iterator_tag or similar
};

Note the iterator_category should be one of std::input_iterator_tag, std::output_iterator_tag, std::forward_iterator_tag, std::bidirectional_iterator_tag, or std::random_access_iterator_tag, depending on which requirements your iterator satisfies. Depending on your iterator, you may choose to specialize std::next, std::prev, std::advance, and std::distance as well, but this is rarely needed. In extremely rare cases you may wish to specialize std::begin and std::end.

Your container should probably also have a const_iterator, which is a (possibly mutable) iterator to constant data that is similar to your iterator except it should be implicitly constructable from a iterator and users should be unable to modify the data. It is common for its internal pointer to be a pointer to non-constant data, and have iterator inherit from const_iterator so as to minimize code duplication.

My post at Writing your own STL Container has a more complete container/iterator prototype.

Disadvantages of scanf

The problems with scanf are (at a minimum):

  • using %s to get a string from the user, which leads to the possibility that the string may be longer than your buffer, causing overflow.
  • the possibility of a failed scan leaving your file pointer in an indeterminate location.

I very much prefer using fgets to read whole lines in so that you can limit the amount of data read. If you've got a 1K buffer, and you read a line into it with fgets you can tell if the line was too long by the fact there's no terminating newline character (last line of a file without a newline notwithstanding).

Then you can complain to the user, or allocate more space for the rest of the line (continuously if necessary until you have enough space). In either case, there's no risk of buffer overflow.

Once you've read the line in, you know that you're positioned at the next line so there's no problem there. You can then sscanf your string to your heart's content without having to save and restore the file pointer for re-reading.

Here's a snippet of code which I frequently use to ensure no buffer overflow when asking the user for information.

It could be easily adjusted to use a file other than standard input if necessary and you could also have it allocate its own buffer (and keep increasing it until it's big enough) before giving that back to the caller (although the caller would then be responsible for freeing it, of course).

#include <stdio.h>
#include <string.h>

#define OK 0
#define NO_INPUT 1
#define TOO_LONG 2
#define SMALL_BUFF 3
static int getLine (char *prmpt, char *buff, size_t sz) {
int ch, extra;

// Size zero or one cannot store enough, so don't even
// try - we need space for at least newline and terminator.

if (sz < 2)
return SMALL_BUFF;

// Output prompt.

if (prmpt != NULL) {
printf ("%s", prmpt);
fflush (stdout);
}

// Get line with buffer overrun protection.

if (fgets (buff, sz, stdin) == NULL)
return NO_INPUT;

// Catch possibility of `\0` in the input stream.

size_t len = strlen(buff);
if (len < 1)
return NO_INPUT;

// If it was too long, there'll be no newline. In that case, we flush
// to end of line so that excess doesn't affect the next call.

if (buff[len - 1] != '\n') {
extra = 0;
while (((ch = getchar()) != '\n') && (ch != EOF))
extra = 1;
return (extra == 1) ? TOO_LONG : OK;
}

// Otherwise remove newline and give string back to caller.
buff[len - 1] = '\0';
return OK;
}

And, a test driver for it:

// Test program for getLine().

int main (void) {
int rc;
char buff[10];

rc = getLine ("Enter string> ", buff, sizeof(buff));
if (rc == NO_INPUT) {
// Extra NL since my system doesn't output that on EOF.
printf ("\nNo input\n");
return 1;
}

if (rc == TOO_LONG) {
printf ("Input too long [%s]\n", buff);
return 1;
}

printf ("OK [%s]\n", buff);

return 0;
}

Finally, a test run to show it in action:

$ printf "\0" | ./tstprg     # Singular NUL in input stream.
Enter string>
No input

$ ./tstprg < /dev/null # EOF in input stream.
Enter string>
No input

$ ./tstprg # A one-character string.
Enter string> a
OK [a]

$ ./tstprg # Longer string but still able to fit.
Enter string> hello
OK [hello]

$ ./tstprg # Too long for buffer.
Enter string> hello there
Input too long [hello the]

$ ./tstprg # Test limit of buffer.
Enter string> 123456789
OK [123456789]

$ ./tstprg # Test just over limit.
Enter string> 1234567890
Input too long [123456789]


Related Topics



Leave a reply



Submit