What Are the Differences Between Concatenating Strings with Cat() and Paste()

What are the differences between concatenating strings with cat() and paste()?

cat and paste are to be used in very different situations.


paste is not print

When you call paste without assigning the result to anything, R auto-prints the returned character vector using print.default, the default print method for character vectors. That is why you see the quotes, the [1] index, etc. See the help for print.default to learn how to modify what the output looks like.

  • print.default does not interpret escape sequences such as \n within a character string. It displays them in escaped form, whereas cat outputs an actual newline.

Look at the answers to this question for how to capture the output from cat.


Quoting from the easy-to-read help for cat (?cat):

Concatenate and Print


Description


Outputs the objects, concatenating the representations. cat performs
much less conversion than print.

...

Details


cat is useful for producing output in user-defined functions. It
converts its arguments to character vectors, concatenates them to a
single character vector, appends the given sep= string(s) to each
element and then outputs them.

Value


None (invisible NULL).

cat does not return a useful value; it just writes output to the console or to another connection.

Thus, if you run length(cat('x')) or mode(cat('x')), x is printed and you are effectively running length(NULL) or mode(NULL), which return 0 and "NULL" respectively.


The help for paste is equally helpful and descriptive:

Concatenate Strings


Description


Concatenate vectors after converting to character.

...

Value


A character vector of the concatenated values. This will be of length
zero if all the objects are, unless collapse is non-NULL in which case
it is a single empty string.

Concatenating and Pasting within R console with cat() and paste()

Just move the calculation inside the paste0() call:

p <- 0.93  # hypothetical example value
cat("It is", paste0(p*100, "%"), "accurate")

paste0(...) is shorthand for paste(..., sep = "").

The glue package is also very nice for this:

library(glue)

cat("It is", glue("{p*100}%"), "accurate")

What is the difference between cat and print?

cat is valid only for atomic types (logical, integer, real, complex, character) and names. This means you cannot, in general, call cat on a non-empty list or on other object types such as functions or environments. In practice it simply converts its arguments to character and concatenates them, so you can think of it as something like as.character() %>% paste() followed by writing the result out.

print is a generic function so you can define a specific implementation for a certain S3 class.

> foo <- "foo"
> print(foo)
[1] "foo"
> attributes(foo)$class <- "foo"
> print(foo)
[1] "foo"
attr(,"class")
[1] "foo"
> print.foo <- function(x) print("This is foo")
> print(foo)
[1] "This is foo"

Another difference between cat and print is the return value: cat invisibly returns NULL, while print invisibly returns its argument. This property of print makes it particularly useful when combined with pipes:

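library(magrittr)  # provides the %>% pipe

coefs <- lm(Sepal.Width ~ Petal.Length, iris) %>%
  print() %>%       # prints the fitted model, then passes it along unchanged
  coefficients()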

Most of the time what you want is print. cat can be useful for things like writing a string to a file:

sink("foobar.txt")
cat('"foo"\n')
cat('"bar"')
sink()

As pointed out by baptiste, you can use cat's file argument to write output directly to a file. So the equivalent of the above would be something like this:

cat('"foo"', '"bar"', file="foobar.txt", sep="\n")

If you want to write lines incrementally, use the append argument (and end each line with \n yourself, since cat does not add it):

cat('"foo"\n', file="foobar.txt", append=TRUE)
cat('"bar"', file="foobar.txt", append=TRUE)

Compared to the sink approach this is far too verbose for my taste, but it is still an option.

Difference between Concatenation and Append

"Concatenate" joins two specific items together, whereas "append" adds what you specify to whatever may already be there.
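A minimal Python sketch of the distinction, assuming a list and a temporary file stand in for "whatever may already be there". The file name and contents are illustrative only. Opening a file in mode "a" plays the same role as cat(..., append=TRUE) in R.

```python
import os
import tempfile

# Concatenation: join two specific items into a new value.
greeting = "foo" + "bar"
print(greeting)

# Appending: add to whatever is already in a container...
items = ["foo"]
items.append("bar")
print(items)

# ...or to whatever is already in a file (mode "a" appends instead of overwriting).
path = os.path.join(tempfile.mkdtemp(), "foobar.txt")  # hypothetical file name
with open(path, "w") as f:
    f.write("foo\n")
with open(path, "a") as f:
    f.write("bar\n")
with open(path) as f:
    contents = f.read()
print(repr(contents))
```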

Concatenating strings with multiple separators using paste() in R

  1. paste0. Perhaps the easiest way is to pass the + signs as arguments to paste0 rather than using sep:

    root <- "https://www.google.com/search?q="
    reprex_df %>%
      mutate(new_col = paste0(root, var1, "+", var2, "+", var3))
  2. sprintf. sprintf is another possibility (the %d placeholders assume var1, var2 and var3 hold whole numbers; use %s for character columns):

    fmt <- "https://www.google.com/search?q=%d+%d+%d"
    reprex_df %>%
      mutate(new_col = sprintf(fmt, var1, var2, var3))
  3. sub. Yet another possibility is to use the code in the question but follow it with code that removes the first +:

    root <- "https://www.google.com/search?q="
    reprex_df %>%
      mutate(new_col = paste(root, var1, var2, var3, sep = "+"),
             new_col = sub("\\+", "", new_col))
  4. Allow the extra +. Google ignores the + right after the equals sign, so another approach is to simply leave the extra plus in place:

    root <- "https://www.google.com/search?q="
    reprex_df %>%
      mutate(new_col = paste(root, var1, var2, var3, sep = "+"))

Concatenating two string variables in r

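The following works (paste and paste0 are also vectorized, so paste0(ddf$year, ddf$period) gives the first result without apply):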

> apply(ddf,1 ,function(x) paste0(toString(x[2]), toString(x[3])))
[1] "1983M01" "1983M02" "1983M03" "1983M04"
>
> apply(ddf,1 ,function(x) paste(toString(x[2]), toString(x[3])))
[1] "1983 M01" "1983 M02" "1983 M03" "1983 M04"

toString(ddf$year), on the other hand, collapses the entire column into one comma-separated string:

> toString(ddf$year)
[1] "1983, 1983, 1983, 1983"
>
> toString(ddf$period)
[1] "M01, M02, M03, M04"
>
> paste(toString(ddf$year), toString(ddf$period))
[1] "1983, 1983, 1983, 1983 M01, M02, M03, M04"

Which is the preferred way to concatenate a string in Python?

The best way of appending a string to a string variable is to use + or +=, because it is readable and fast. The two are equally fast; which one you choose is a matter of taste, though the latter is the more common. Here are timings with the timeit module:

a = a + b:
0.11338996887207031
a += b:
0.11040496826171875
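A minimal sketch of how timings like these could be reproduced with timeit. The loop size, the one-character piece and the function names are assumptions, not the author's exact benchmark. Absolute numbers will vary by machine and Python version.

```python
import timeit

# Hypothetical benchmark: build a string of n copies of `piece`
# using either `a = a + piece` or `a += piece`.
def build_with_plus(n=100_000, piece="x"):
    a = ""
    for _ in range(n):
        a = a + piece
    return a

def build_with_iadd(n=100_000, piece="x"):
    a = ""
    for _ in range(n):
        a += piece
    return a

for fn in (build_with_plus, build_with_iadd):
    seconds = timeit.timeit(fn, number=10)  # total time for 10 runs
    print(f"{fn.__name__}: {seconds:.4f} s")
```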

However, those who recommend appending to a list and then joining it do so because appending a string to a list is presumably very fast compared to extending a string. And this can be true in some cases. Here, for example, are one million appends of a one-character string, first to a string, then to a list:

a += b:
0.10780501365661621
a.append(b):
0.1123361587524414

OK, it turns out that even when the resulting string is a million characters long, extending the string was still faster.

Now let's try appending a thousand-character string a hundred thousand times:

a += b:
0.41823482513427734
a.append(b):
0.010656118392944336

The end string therefore ends up being about 100 MB long. That was pretty slow; appending to a list was much faster. But that timing doesn't include the final ''.join(a). So how long would that take?

''.join(a):
0.43739795684814453

Oops. It turns out that even in this case, append/join is slower overall (0.0107 + 0.4374 ≈ 0.448 seconds, versus 0.418 seconds for +=).
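A fair comparison has to time the final join together with the appends. Here is a minimal sketch under assumed sizes: 10,000 repetitions instead of the 100,000 in the text, to keep the run short. The function names are hypothetical.

```python
import timeit

PIECE = "x" * 1000  # a thousand-character string (contents are arbitrary)
N = 10_000          # fewer repetitions than in the text, to keep the run short

def extend_string(n=N, piece=PIECE):
    s = ""
    for _ in range(n):
        s += piece
    return s

def append_then_join(n=N, piece=PIECE):
    parts = []
    for _ in range(n):
        parts.append(piece)
    # The final join is part of the cost, so it is inside the timed function.
    return "".join(parts)

for fn in (extend_string, append_then_join):
    seconds = timeit.timeit(fn, number=5)  # total time for 5 runs
    print(f"{fn.__name__}: {seconds:.4f} s")
```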

So where does this recommendation come from? Python 2?

a += b:
0.165287017822
a.append(b):
0.0132720470428
''.join(a):
0.114929914474

Well, append/join is marginally faster there if you are using extremely long strings (which you usually aren't; why would you have a 100 MB string in memory?).

But the real clincher is Python 2.3, for which I won't even show you the timings, because it's so slow that the tests haven't finished yet. They suddenly take minutes, except for append/join, which is just as fast as under later Pythons.

Yup. String concatenation was very slow in Python back in the stone age. But on 2.4 it isn't anymore (or at least not on Python 2.4.7), so the recommendation to use append/join became outdated in 2008, when Python 2.3 stopped being updated and you should have stopped using it. :-)

(Update: When I did the testing more carefully, it turned out that using + and += is faster for two strings on Python 2.3 as well. The recommendation to use ''.join() must be a misunderstanding.)

However, this is CPython. Other implementations may have other concerns. And this is just yet another reason why premature optimization is the root of all evil. Don't use a technique that's supposedly faster unless you first measure it.

Therefore the "best" version to do string concatenation is to use + or +=. And if that turns out to be slow for you, which is pretty unlikely, then do something else.

So why do I use a lot of append/join in my code? Because sometimes it's actually clearer, especially when the pieces you concatenate should be separated by spaces, commas, or newlines.
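A small illustration of why join can read more clearly when separators are involved. The word list is a made-up example. With +=, the separator has to be interleaved by hand and the trailing one trimmed off. str.join places it only between items.

```python
words = ["cat", "paste", "print"]  # hypothetical example data

# With +=, separators are added after every item and the last one must be removed.
csv_plus = ""
for w in words:
    csv_plus += w + ", "
csv_plus = csv_plus[:-2]

# str.join puts the separator only *between* items.
csv_join = ", ".join(words)
lines = "\n".join(words)

print(csv_join)
print(lines)
```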

Concatenate a vector of strings/character

Try passing an empty string as the collapse argument of paste:

paste(sdata, collapse = '')

Thanks to http://twitter.com/onelinetips/status/7491806343

What is the most efficient string concatenation method in Python?

If you know all the components beforehand, use literal string interpolation, also known as f-strings or formatted string literals, introduced in Python 3.6.

Given the test case from mkoistinen's answer, with the strings

domain = 'some_really_long_example.com'
lang = 'en'
path = 'some/really/long/path/'

The contenders and their execution time on my computer using Python 3.6 on Linux as timed by IPython and the timeit module are

  • f'http://{domain}/{lang}/{path}' - 0.151 µs

  • 'http://%s/%s/%s' % (domain, lang, path) - 0.321 µs

  • 'http://' + domain + '/' + lang + '/' + path - 0.356 µs

  • ''.join(('http://', domain, '/', lang, '/', path)) - 0.249 µs (notice that building a constant-length tuple is slightly faster than building a constant-length list).

Thus the shortest and most beautiful code possible is also the fastest.


The speed can be contrasted with the fastest method for Python 2, which is + concatenation on my computer; and that takes 0.203 µs with 8-bit strings, and 0.259 µs if the strings are all Unicode.

(In alpha versions of Python 3.6, the implementation of f'' strings was the slowest possible; the generated bytecode was pretty much equivalent to the ''.join() case, with unnecessary calls to str.__format__, which without arguments would just return self unchanged. These inefficiencies were addressed before 3.6 final.)
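A minimal sketch for checking the four contenders yourself with the timeit module, using the same three strings. It is an approximation, not the author's setup: each expression is wrapped in a lambda so timeit can call it. That adds function-call overhead, so the per-call numbers will be higher than the IPython figures quoted above, and the gaps between methods may shrink. RUNS and the dictionary labels are arbitrary.

```python
import timeit

domain = 'some_really_long_example.com'
lang = 'en'
path = 'some/really/long/path/'

# Each contender is wrapped in a zero-argument function so timeit can call it.
contenders = {
    "f-string": lambda: f'http://{domain}/{lang}/{path}',
    "% formatting": lambda: 'http://%s/%s/%s' % (domain, lang, path),
    "+ concatenation": lambda: 'http://' + domain + '/' + lang + '/' + path,
    "''.join(tuple)": lambda: ''.join(('http://', domain, '/', lang, '/', path)),
}

# Sanity check: all four build exactly the same URL.
results = {name: build() for name, build in contenders.items()}
assert len(set(results.values())) == 1

RUNS = 200_000
for name, build in contenders.items():
    per_call_us = timeit.timeit(build, number=RUNS) / RUNS * 1e6
    print(f"{name}: {per_call_us:.3f} µs per call")
```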


