std::string length() and size() member functions

As per the documentation, these are just synonyms. size() is there to be consistent with other STL containers (like vector, map, etc.), and length() is there to be consistent with most people's intuitive notion of character strings. People usually talk about the length of a word, sentence or paragraph, not its size, so length() makes code more readable.

std::string::length() vs. std::string::size()

Nope! No difference. The more natural name for this function is length() and size() is provided alongside for uniformity with containers.
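A small sketch of why the uniform name matters: generic code can call size() on a string exactly as it does on any other container. The helper name element_count is hypothetical, chosen only for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical generic helper: works with any container that exposes size().
// std::string qualifies because it offers size() alongside length().
template <typename Container>
std::size_t element_count(const Container& c) {
    return c.size();
}
```

For a string such as std::string("hello"), element_count and length() both report 5. The same template also accepts a std::vector<int>, which has no length() member at all.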

Is this how the size() function really works in std::string?

A std::string can contain null bytes, so your sizeOfString() function will produce a different result on the following input:

std::string evil("abc\0def", 7);

As for your other question: in the common implementations the size() method simply reads out an internal size field, and since C++11 the standard requires it to run in constant time, while yours is linear in the size of the string.
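The difference can be sketched as follows. The question's sizeOfString() is not shown here, so count_until_null is a hypothetical stand-in that assumes the hand-written function stops at the first null byte, as strlen() does; make_evil is likewise just an illustrative name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a hand-written length function that stops at
// the first null byte, the way strlen() does. Runs in linear time.
std::size_t count_until_null(const std::string& s) {
    const char* p = s.c_str();
    std::size_t n = 0;
    while (p[n] != '\0') {
        ++n;
    }
    return n;
}

// Builds the seven-character string from the text: "abc", a null byte, "def".
std::string make_evil() {
    return std::string("abc\0def", 7);
}
```

For this input, make_evil().size() is 7, while count_until_null(make_evil()) stops at the embedded null byte and returns 3.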

You can check for yourself how std::string::size is implemented in the major standard libraries: libc++, MSVC, libstdc++.

Assigning length of a string to an integer: C++

std::string::size() returns an unsigned integer type (std::size_t). Changing the type of i to match is easy enough, but it introduces another problem: with i being a std::size_t, a loop condition such as i >= 0 can never fail, because an unsigned value is by definition always >= 0.
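A minimal sketch of the underlying behavior, using a hypothetical helper named decrement_zero: decrementing an unsigned zero does not produce a negative number, it wraps around to the largest value the type can hold.

```cpp
#include <cstddef>
#include <limits>

// Decrementing an unsigned zero wraps to the largest representable value,
// so a countdown loop that tests `i >= 0` never sees the test fail.
std::size_t decrement_zero() {
    std::size_t i = 0;
    --i;  // well-defined for unsigned types: arithmetic is modulo 2^N
    return i;
}
```

The result equals std::numeric_limits<std::size_t>::max(), which is why a countdown loop written this way runs past the start of the string instead of stopping.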

There are alternatives, the most direct being to restructure the for-loop as follows:

  1. Declare i as std::size_t.
  2. Move the decrement into the condition check as a post-decrement.
  3. Remove the increment step entirely.

The result looks like this:

for (std::size_t i = text.size(); i-- > 0;)
{
    // use text[i] here
}

Inside the loop body, i runs from text.size()-1 down through 0 inclusive, and the loop ends thereafter. It works even if the string is empty.

Note that such hijinks are a sign of a larger problem: using the wrong construct in the first place. std::string provides reverse iterators for walking the string content backward:

for (auto it = text.rbegin(); it != text.rend(); ++it)
{
    // use *it to access each character in reverse order
}

Of these two methods, the second is preferred.
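To compare the two loops side by side, each can be wrapped in a small function that collects the characters it visits. The function names below are hypothetical and exist only for this sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: walks the string backward with an unsigned index.
std::string reversed_by_index(const std::string& text) {
    std::string out;
    for (std::size_t i = text.size(); i-- > 0;) {
        out += text[i];
    }
    return out;
}

// Hypothetical helper: walks the string backward with reverse iterators.
std::string reversed_by_iterator(const std::string& text) {
    std::string out;
    for (auto it = text.rbegin(); it != text.rend(); ++it) {
        out += *it;
    }
    return out;
}
```

Both functions return "cba" for "abc" and an empty string for an empty input, so neither loop touches an out-of-range index.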

compile-time variable-length objects based on string

Adding a deduction guide (CTAD) to UsbDescrString should be enough:

template<size_t N>
struct UsbDescrString final: UsbDescrStd {
    char str[N * 2];

    constexpr UsbDescrString(const char (&s)[N+1]) noexcept
        : UsbDescrStd{sizeof(*this), UsbDescriptorType::USB_DESCR_STRING}
        , str {}
    {
        for(size_t i = 0; i < N; ++i)
            str[i * 2] = s[i];
    }
};

template<size_t N>
UsbDescrString(const char (&)[N]) -> UsbDescrString<N-1>;

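Note that in order to prevent array-to-pointer decay, the constructor parameter must be a reference to an array, const char (&)[N+1]; the deduction guide takes the same kind of reference so that it can deduce the array size.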
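The snippet above depends on types from the original question that are not shown. A self-contained sketch (C++17) might look like the following, where the definitions of UsbDescrStd and UsbDescriptorType are hypothetical stand-ins, the explicit cast is added only to suit those stand-ins, and example_descr is an illustrative name.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for types the original question defines elsewhere.
enum class UsbDescriptorType : std::uint8_t { USB_DESCR_STRING = 3 };

struct UsbDescrStd {
    std::uint8_t length;
    UsbDescriptorType type;
};

template <std::size_t N>
struct UsbDescrString final : UsbDescrStd {
    char str[N * 2];

    // N + 1 accounts for the terminating null of the string literal.
    constexpr UsbDescrString(const char (&s)[N + 1]) noexcept
        : UsbDescrStd{static_cast<std::uint8_t>(sizeof(*this)),
                      UsbDescriptorType::USB_DESCR_STRING},
          str{}
    {
        // Each input character occupies the first byte of a two-byte slot.
        for (std::size_t i = 0; i < N; ++i)
            str[i * 2] = s[i];
    }
};

// Deduction guide: a literal of N chars (terminator included) holds N - 1 characters.
template <std::size_t N>
UsbDescrString(const char (&)[N]) -> UsbDescrString<N - 1>;

// No template argument is written out; the guide deduces UsbDescrString<3>.
constexpr UsbDescrString example_descr("abc");
static_assert(sizeof(example_descr.str) == 6);
```

The guide is needed because N appears only as N + 1 in the constructor parameter, which is a non-deduced context, so the compiler cannot work out N from the constructor alone.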


"String argument to template" is explicitly prohibited as of C++20, cutting that solution off as well.

However, thanks to P0732 (class types in non-type template parameters), and with a helper class such as basic_fixed_string, in C++20 you can write:

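// fixed_string is a helper class you supply yourself; it is not part of the standard library
template<fixed_string Str>
struct UsbDescrString final: UsbDescrStd { /* ... */ };

constexpr UsbDescrString<"Descr str"> uds9;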

Determine the print-size of a string containing escape characters

You can use a regular expression to search for ANSI terminal escape sequences since they have a unique pattern. Incidentally, there is a C function called isprint(x) to check for printable characters.

Combining these two, you should be able to create a function that can count printable characters in a string. (Assuming that the terminal in question supports the ANSI escape codes/sequences, of course.)

// The following only works with C++11 or above

// ...
#include <algorithm>
#include <string>
#include <cctype>
#include <regex>

// The regular expression is defined outside the function so that it is compiled once, not on every call to 'count_no_escape'
const std::regex ansi_reg("\033((\\[((\\d+;)*\\d+)?[A-DHJKMRcf-ilmnprsu])|\\(|\\))");

std::string::difference_type count_no_escape(std::string const& str) {
    std::string::difference_type result = 0;
    std::for_each(std::sregex_token_iterator(str.begin(), str.end(), ansi_reg, -1),
                  std::sregex_token_iterator(),
                  [&result](std::sregex_token_iterator::value_type const& e) {
                      std::string tmp(e);
                      // Take unsigned char: passing a negative char value to isprint is undefined behaviour
                      result += std::count_if(tmp.begin(), tmp.end(),
                                              [](unsigned char c) { return std::isprint(c) != 0; });
                  });
    return result;
}

Small Note: The regex for checking ANSI escape sequences was built using this webpage as reference:

The above function tokenizes the string using the ANSI escape codes as delimiter. After extracting all potential substrings, the printable characters are counted in each of them and the sum total result is returned.
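The tokenizing step can be seen in isolation in the sketch below. It assumes a deliberately simplified pattern that only matches color/style sequences such as ESC[1;31m, not the fuller expression used above; sgr_reg and visible_text are hypothetical names for this illustration.

```cpp
#include <regex>
#include <string>

// Simplified pattern (an assumption for this sketch):
// ESC, '[', any run of digits and semicolons, then 'm'.
const std::regex sgr_reg("\033\\[[0-9;]*m");

// Concatenates the pieces of `s` that lie between escape sequences.
std::string visible_text(const std::string& s) {
    std::string out;
    // Submatch index -1 selects the text between matches rather than the matches themselves.
    for (std::sregex_token_iterator it(s.begin(), s.end(), sgr_reg, -1), end; it != end; ++it) {
        out += it->str();
    }
    return out;
}
```

Given "\033[1;31mabcd\033[0m", the function returns "abcd"; a string without escape sequences comes back unchanged.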

Now you can use it like this:

// ...
std::cout << count_no_escape("\033[1;31mabcd\t\n\033[7h") << std::endl; // 4
// ...



