String-interning at compiletime for profiling
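Identical string literals are not guaranteed to share an address, but you can build a type from a literal. Equal literals produce the same type, so its static member has a single address that can be compared directly (without comparing the characters). Something like: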
// Sequence of char
template <char...Cs> struct char_sequence
{
template <char C> using push_back = char_sequence<Cs..., C>;
};
// Remove every char of a char_sequence from the first '\0' onward
template <typename, char...> struct strip_sequence;
template <char...Cs>
struct strip_sequence<char_sequence<>, Cs...>
{
using type = char_sequence<Cs...>;
};
template <char...Cs, char...Cs2>
struct strip_sequence<char_sequence<'\0', Cs...>, Cs2...>
{
using type = char_sequence<Cs2...>;
};
template <char...Cs, char C, char...Cs2>
struct strip_sequence<char_sequence<C, Cs...>, Cs2...>
{
using type = typename strip_sequence<char_sequence<Cs...>, Cs2..., C>::type;
};
// struct holding the char array (one unique address per char_sequence)
template <typename chars> struct static_string;
template <char...Cs>
struct static_string<char_sequence<Cs...>>
{
static constexpr char str[sizeof...(Cs)] = {Cs...};
};
template <char...Cs>
constexpr
char static_string<char_sequence<Cs...>>::str[sizeof...(Cs)];
// helper to get the i_th character (`\0` for out of bound)
template <std::size_t I, std::size_t N>
constexpr char at(const char (&a)[N]) { return I < N ? a[I] : '\0'; }
// helper to check if the c-string will not be truncated
template <std::size_t max_size, std::size_t N>
constexpr bool check_size(const char (&)[N])
{
static_assert(N <= max_size, "string too long");
return N <= max_size;
}
// Helper macros to build char_sequence from c-string
#define PUSH_BACK_8(S, I) \
::push_back<at<(I) + 0>(S)>::push_back<at<(I) + 1>(S)> \
::push_back<at<(I) + 2>(S)>::push_back<at<(I) + 3>(S)> \
::push_back<at<(I) + 4>(S)>::push_back<at<(I) + 5>(S)> \
::push_back<at<(I) + 6>(S)>::push_back<at<(I) + 7>(S)>
#define PUSH_BACK_32(S, I) \
PUSH_BACK_8(S, (I) + 0) PUSH_BACK_8(S, (I) + 8) \
PUSH_BACK_8(S, (I) + 16) PUSH_BACK_8(S, (I) + 24)
#define PUSH_BACK_128(S, I) \
PUSH_BACK_32(S, (I) + 0) PUSH_BACK_32(S, (I) + 32) \
PUSH_BACK_32(S, (I) + 64) PUSH_BACK_32(S, (I) + 96)
// Macro to create char_sequence from c-string (limited to 128 chars)
#define MAKE_CHAR_SEQUENCE(S) \
strip_sequence<char_sequence<> \
PUSH_BACK_128(S, 0) \
>::type::template push_back<check_size<128>(S) ? '\0' : '\0'>
// Macro to return a static c-string
#define MAKE_STRING(S) \
static_string<MAKE_CHAR_SEQUENCE(S)>::str
So
MEASURE_SCOPE(MAKE_STRING("text_rendering_code"));
always yields the same pointer for the same text, so you can compare the pointers directly.
You can modify your MEASURE_SCOPE macro so that it applies MAKE_STRING itself.
gcc has an extension that simplifies MAKE_STRING:
template <typename CHAR, CHAR... cs>
const char* operator ""_c() { return static_string<char_sequence<cs..., '\0'>>::str; }
and then
MEASURE_SCOPE("text_rendering_code"_c);
When to use intern() on String literals
This is a technique to ensure that CONSTANT is not actually a compile-time constant.
When the Java compiler sees a reference to a final static primitive or String, it inserts the actual value of that constant into the class that uses it. If you then change the constant value in the defining class but don't recompile the using class, it will continue to use the old value.
By calling intern() on the "constant" string, it is no longer considered a static constant by the compiler, so the using class will actually access the defining class' member on each use.
JLS citations:
definition of a compile-time constant: http://docs.oracle.com/javase/specs/jls/se6/html/expressions.html#5313
implication of changes to a compile-time constant (about halfway down the page): http://docs.oracle.com/javase/specs/jls/se6/html/binaryComp.html#45139
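A minimal sketch of the difference, using a hypothetical Constants class (the class and field names are illustrative and not from the original question). The first field is a constant expression that javac copies into callers; the second is initialized by a method call, so callers read it from the defining class at run time.

```java
public class Constants {
    // Constant expression: javac inlines "v1" into every class that reads this field.
    public static final String INLINED = "v1";

    // intern() is a method call, so this initializer is not a constant expression.
    // Callers compile to a field read and see a changed value without recompiling.
    public static final String NOT_INLINED = "v1".intern();

    public static void main(String[] args) {
        // Both fields still refer to the same pooled String instance.
        System.out.println(INLINED == NOT_INLINED); // prints true
    }
}
```

The run-time value is the same either way; only the way other classes are compiled against the field changes.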
Is it good practice to use java.lang.String.intern()?
When would I use this function in favor of String.equals()?
When you need speed, since interned strings can be compared by reference (== is faster than equals).
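As a small illustration (the class and method names here are made up for the example), reference comparison only gives the expected answer once both sides have been interned:

```java
public class InternCompare {
    // Plain reference comparison: cheap, but only valid for interned strings.
    public static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "text_rendering_code";   // literals are interned automatically
        String copy = new String(literal);        // equal content, distinct object

        System.out.println(sameObject(literal, copy));          // false
        System.out.println(sameObject(literal, copy.intern())); // true
        System.out.println(literal.equals(copy));               // true
    }
}
```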
Are there side effects not mentioned in the Javadoc?
The primary disadvantage is that you have to remember to make sure that you actually do intern() all of the strings that you're going to compare. It's easy to forget to intern() all strings and then you can get confusingly incorrect results. Also, for everyone's sake, please be sure to very clearly document that you're relying on the strings being internalized.
The second disadvantage if you decide to internalize strings is that the intern() method is relatively expensive. It has to manage the pool of unique strings so it does a fair bit of work (even if the string has already been internalized). So, be careful in your code design so that you e.g., intern() all appropriate strings on input so you don't have to worry about it anymore.
(from JGuru)
Third disadvantage (Java 7 or less only): interned Strings live in PermGen space, which is usually quite small; you may run into an OutOfMemoryError with plenty of free heap space.
(from Michael Borgwardt)
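The advice above to intern everything on input can be sketched roughly as follows. InternedInput, readTokens and sameToken are hypothetical names, and the comma-separated input format is only an assumption for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class InternedInput {
    // Intern every value once, at the point where it enters the program.
    public static List<String> readTokens(String line) {
        List<String> tokens = new ArrayList<>();
        for (String token : line.split(",")) {
            tokens.add(token.intern());
        }
        return tokens;
    }

    // Documented invariant: both arguments come from readTokens (or are literals),
    // so reference comparison is valid here.
    public static boolean sameToken(String a, String b) {
        return a == b;
    }
}
```

Keeping the intern() call in one place makes it harder to forget, and the cost is paid once per value rather than once per comparison.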
Prepend to static string at compile time
Function arguments are not constexpr, so you have to turn them into types.
gcc/clang have an extension that allows building a UDL from a string literal:
// That template uses the extension
template<typename Char, Char... Cs>
constexpr auto operator"" _cs() -> std::integer_sequence<Char, Cs...> {
return {};
}
See my answer to "String-interning at compiletime for profiling" for a MAKE_STRING macro if you cannot use the extension (much more verbose, with a hard-coded limit on the accepted string length).
Then
template<char ... Cs>
static constexpr auto
create_hint_for_opt_cls(std::integer_sequence<char, Cs...>) {
return hint_for_opt_cls_holder<Cs...>::value;
}
constexpr static auto name = "foobar"_cs;
When is it beneficial to flyweight Strings in Java?
Don't use String.intern() in your code, at least not if you might get 20 or more different strings. In my experience, using String.intern slows down the whole application when you have a few million strings.
To avoid duplicated String objects, just use a HashMap.
private final Map<String, String> pool = new HashMap<String, String>();
// Returns the pooled instance equal to s, adding s to the pool on first sight.
private String interned(String s) {
    String interned = pool.get(s);
    if (interned != null) {
        return interned;
    }
    pool.put(s, s);
    return s;
}
private void readFile(CsvFile csvFile) {
    for (List<String> row : csvFile) {
        for (int i = 0; i < row.size(); i++) {
            row.set(i, interned(row.get(i)));
        }
        // further process the row
    }
    pool.clear(); // allow the garbage collector to clean up
}
With that code you can avoid duplicate strings for one CSV file. If you need to avoid them on a larger scale, call pool.clear() in another place.
String interning in .Net Framework - What are the benefits and when to use interning
Interning is an internal implementation detail. Unlike boxing, I do not think there is any benefit in knowing more than what you have read in Richter's book.
The micro-optimisation benefits of interning strings manually are minimal, so it is generally not recommended.
This probably describes it:
class Program
{
const string SomeString = "Some String"; // gets interned
static void Main(string[] args)
{
var s1 = SomeString; // use interned string
var s2 = SomeString; // use interned string
var s = "String";
var s3 = "Some " + s; // no interning
Console.WriteLine(s1 == s2); // True: same interned instance, so the reference check succeeds at once
Console.WriteLine(s1 == s3); // True: s3 is a separate instance, so the characters are compared
}
}
How can I avoid string.intern() contention and keep the memory footprint low?
Add an extra indirection step: Have a second HashMap that keeps the keys, and look up the keys there first before inserting them in the in-memory structures. This will give you much more flexibility than String#intern().
However, if you need to parse that 200MB XML file on every Tomcat startup, and the extra 10 seconds make people grumble (are they restarting Tomcat that often?), that raises some flags (have you considered using a database, even Apache Derby, to keep the parsed data?).
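One possible shape for that extra indirection step is sketched below. KeyPool is a hypothetical class, and ConcurrentHashMap is swapped in for the plain HashMap mentioned above on the assumption that several threads parse at once; with a single thread a HashMap would do.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class KeyPool {
    // Second map that only keeps one canonical instance per distinct key.
    private final ConcurrentMap<String, String> keys = new ConcurrentHashMap<>();

    // Returns the canonical instance for an equal key, registering it on first sight.
    public String canonical(String key) {
        String existing = keys.putIfAbsent(key, key);
        return existing != null ? existing : key;
    }

    public int size() {
        return keys.size();
    }

    // Drop the pool once parsing is done; unlike String.intern(), the lifetime is yours to control.
    public void clear() {
        keys.clear();
    }
}
```

Each key would be passed through canonical() before being stored in the in-memory structures, so equal keys share one String object.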
store a string in a constexpr struct
You might do:
template<typename Char, Char... Cs>
struct CharSeq
{
static constexpr const Char s[] = {Cs..., 0}; // The unique address
};
// That template uses the extension
template<typename Char, Char... Cs>
constexpr CharSeq<Char, Cs...> operator"" _cs() {
return {};
}
See my answer to "String-interning at compiletime for profiling" for a MAKE_STRING macro if you cannot use the extension (much more verbose, with a hard-coded limit on the accepted string length).
Then
struct A
{
template <char ... Cs>
constexpr A(CharSeq<char, Cs...>) : m_name(CharSeq<char, Cs...>::s) {}
constexpr auto name() const { return m_name; }
std::string_view m_name;
};
With only valid usages similar to:
A a = {"Hello"_cs};
constexpr A b = {"World"_cs};