Difference between paste, str_c, str_join, stri_join, stri_c, stri_paste

Difference between `paste`, `str_c`, `str_join`, `stri_join`, `stri_c`, `stri_paste`?

  • stri_join, stri_c, and stri_paste come from package stringi and are pure aliases

  • str_c comes from stringr and, in the stringr version shown below, is just stringi::stri_join with the parameter ignore_null hardcoded to TRUE, whereas stringi::stri_join sets it to FALSE by default. stringr::str_join is a deprecated alias for str_c.

see:

library(stringi)
identical(stri_join, stri_c)
# [1] TRUE
identical(stri_join, stri_paste)
# [1] TRUE

library(stringr)
str_c
# function (..., sep = "", collapse = NULL)
# {
# stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE)
# }
# <environment: namespace:stringr>

stri_join is very similar to base::paste with a few differences enumerated below:


1. sep = "" by default

So it behaves like paste0 by default, but unlike paste0 (which has no sep argument) it still lets you set a separator.

identical(paste0("a","b")        , stri_join("a","b"))
# [1] TRUE
identical(paste("a","b") , stri_join("a","b",sep=" "))
# [1] TRUE
identical(paste("a","b", sep="-"), stri_join("a","b", sep="-"))
# [1] TRUE

str_c will behave just like stri_join here.


2. Behavior with NA

If you paste to NA using stri_join, the result is NA, while paste converts NA to the string "NA".

paste0(c("a","b"),c("c",NA))
# [1] "ac" "bNA"
stri_join(c("a","b"),c("c",NA))
# [1] "ac" NA

str_c will behave just like stri_join here as well


3. Behavior with length 0 arguments

When a zero-length value is encountered, character(0) is returned, unless ignore_null is set to TRUE, in which case the value is ignored. This differs from paste, which converts the zero-length value to "" and therefore puts two consecutive separators in the output.

stri_join("a",NULL, "b")
# character(0)
stri_join("a",character(0), "b")
# character(0)

paste0("a",NULL, "b")
# [1] "ab"
stri_join("a",NULL, "b", ignore_null = TRUE)
# [1] "ab"
str_c("a",NULL, "b")
# [1] "ab"

paste("a",NULL, "b") # produces double space!
# [1] "a  b"
stri_join("a",NULL, "b", ignore_null = TRUE, sep = " ")
# [1] "a b"
str_c("a",NULL, "b", sep = " ")
# [1] "a b"

4. stri_join warns more

paste(c("a","b"),c("c","d","e"))
# [1] "a c" "b d" "a e"
paste("a","b", sep = c(" ","-"))
# [1] "a b"

stri_join(c("a","b"),c("c","d","e"), sep = " ")
# [1] "a c" "b d" "a e"
# Warning message:
# In stri_join(c("a", "b"), c("c", "d", "e"), sep = " ") :
# longer object length is not a multiple of shorter object length
stri_join("a","b", sep = c(" ","-"))
# [1] "a b"
# Warning message:
# In stri_join("a", "b", sep = c(" ", "-")) :
# argument `sep` should be one character string; taking the first one

5. stri_join is faster

microbenchmark::microbenchmark(
stringi = stri_join(rep("a",1000000),rep("b",1000),"c",sep=" "),
base = paste(rep("a",1000000),rep("b",1000),"c")
)

# Unit: milliseconds
#     expr       min       lq      mean    median       uq      max neval cld
#  stringi  88.54199  93.4477  97.31161  95.17157  96.8879 131.9737   100   a
#     base 166.01024 169.7189 178.31065 171.30910 176.3055 215.5982   100   b

What is the difference between paste/paste0 and str_c?

paste0(..., collapse = NULL) is a wrapper for paste(..., sep = "", collapse = NULL), which means there is no separator. In other words, with paste0() you cannot specify a separator, while paste() gives you that option, with a single space as the default.

str_c(..., sep = "", collapse = NULL) works like paste() in that you can customize the separator. The difference is that the default for str_c() is no separator, so by default it acts like paste0().

paste() and paste0() are both functions from the base package, whereas str_c() comes from the stringr package.

I did not test or microbenchmark it, but from my experience I agree with Ryan that str_c() is generally faster.

How to find whether a single insert or delete would make two strings equal?

Since I can't add a comment, I will post the answer here.

def single_insert_or_delete(s1, s2):
    # Returns 0 if the strings are equal (ignoring case), 1 if a single
    # insert or delete makes them equal, and 2 otherwise.
    s1, s2 = s1.lower(), s2.lower()
    if s1 == s2:
        return 0
    elif len(s1) == len(s2):
        return 2
    elif len(s1) - len(s2) == -1:
        # s1 is one character shorter: try deleting each character of s2
        for i in range(len(s2)):
            if s1 == s2[:i] + s2[i+1:]:
                return 1
        return 2
    elif len(s1) - len(s2) == 1:
        # s1 is one character longer: try deleting each character of s1
        for i in range(len(s1)):
            if s2 == s1[:i] + s1[i+1:]:
                return 1
        return 2
    else:
        return 2
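
The slicing approach above rebuilds the longer string once per candidate position. As an alternative, a single pass with two indices can answer the same question. This is only a minimal sketch, assuming the same case-insensitive comparison; the name one_insert_or_delete_away is hypothetical, and it returns a boolean rather than the 0/1/2 codes used above.

```python
def one_insert_or_delete_away(s1, s2):
    """Return True if exactly one insertion or deletion makes the strings equal."""
    s1, s2 = s1.lower(), s2.lower()
    # Name the strings by length so one code path covers insert and delete.
    short, long_ = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    if len(long_) - len(short) != 1:
        return False
    i = j = 0
    skipped = False  # whether the one allowed extra character was already used
    while i < len(short) and j < len(long_):
        if short[i] == long_[j]:
            i += 1
            j += 1
        elif skipped:
            return False  # a second mismatch means more than one edit
        else:
            skipped = True
            j += 1  # skip the extra character in the longer string
    return True


print(one_insert_or_delete_away("Apple", "aple"))    # True
print(one_insert_or_delete_away("apple", "apple"))   # False (no edit needed)
print(one_insert_or_delete_away("apple", "bpples"))  # False
```

If the loop ends without a second mismatch, any remaining character sits at the end of the longer string, which still counts as a single edit.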

What is the difference between char stringA[LEN] and char* stringB[LEN] in C

Are they any different?

Yes.

Both variables stringA and stringB are arrays. stringA is an array of char of size LEN and stringB is an array of char * of size LEN.

char and char * are two different types. stringA can hold a single character string of at most LEN - 1 characters plus the terminating null character, while each of the LEN elements of stringB can point to a separate string.

Or does stringB again becomes immutable as in the case before?

Whether the strings pointed to by the elements of stringB are mutable depends on how their memory is allocated. If they are initialized with string literals

char* stringB[LEN] = { "Apple", "Bapple", "Capple"};  

then they are immutable (attempting to modify a string literal is undefined behavior). In case of

// needs <stdlib.h> for malloc and <string.h> for strcpy
for(int i = 0; i < LEN; i++)
    stringB[i] = malloc(30); // Allocating 30 bytes for each element

strcpy(stringB[0], "Apple");
strcpy(stringB[1], "Bapple");
strcpy(stringB[2], "Capple");

they are mutable.

Joining loop output integers as a string?

The problem is that print ends its output with a newline, which you don't want here.

  • Either change the end. Note that you don't need "".join(str(c)); just use c

    for i in text:
        c = int(i, base=8)
        print(c, end=" ")
  • Or save them in a loop, and print them all at once

    result = []
    for i in text:
        result.append(str(int(i, base=8)))
    print(" ".join(result))

Note that you neither need the strip nor the if/else and can combine all that in a list comprehension

dataset = "120"
result = [str(int(i, base=8)) for i in dataset.split()]
print(" ".join(result))
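
To see the same idea with more than one token, here is a small sketch that wraps the comprehension in a function. The function name octal_tokens_to_decimal and the sample input "120 7 10" are made up for illustration; the input is assumed to be whitespace-separated octal numbers.

```python
def octal_tokens_to_decimal(dataset):
    """Convert whitespace-separated octal numbers to a space-joined decimal string."""
    return " ".join(str(int(token, base=8)) for token in dataset.split())


print(octal_tokens_to_decimal("120 7 10"))  # prints: 80 7 8

# print can also do the joining itself: unpacking the values prints them
# separated by a single space (the default sep) and ends with one newline.
values = [int(token, base=8) for token in "120 7 10".split()]
print(*values)  # prints: 80 7 8
```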

Handling zero length character vectors as empty strings

You don't need to map over tweets; str_extract_all is vectorized and can handle the whole vector at once.

library(stringr)
str_extract_all(tweets, mention_rx)

#[[1]]
#character(0)

#[[2]]
#character(0)

#[[3]]
#[1] "@you"

#[[4]]
#[1] "@you" "@me"

#[[5]]
#[1] "@bla" "@me" "@you"

Now if you need one comma-separated string then you can use map

purrr::map_chr(str_extract_all(tweets, mention_rx), toString)
#[1] "" "" "@you" "@you, @me" "@bla, @me, @you"

To answer the "why" questions, we can look at the documentation of paste and str_c functions.

From ?paste

Vector arguments are recycled as needed, with zero-length arguments being recycled to "".

From ?str_c

Zero length arguments are removed.

Hence, by default str_c removes zero-length arguments, which makes the output a zero-length character vector (character(0)). That fails with map_chr, which expects a single string per element, but it works with map because map returns a list.

map(tweets, ~str_c(str_extract_all(.x, mention_rx)[[1]], collapse = ", "))

#[[1]]
#character(0)

#[[2]]
#character(0)

#[[3]]
#[1] "@you"

#[[4]]
#[1] "@you, @me"

#[[5]]
#[1] "@bla, @me, @you"

How can I avoid complex for loops?

Using lapply to loop over files and dplyr mutate to add new columns

library(dplyr)

setNames(lapply(files, function(x)
  x %>%
    arrange(desc(flow)) %>%
    mutate(area_portion = Area/sum(Area)*100,
           flow_portion = flow/sum(flow) * 100,
           cum_area = cumsum(area_portion),
           cum_flow = cumsum(flow_portion))
), paste0(frames, "_sorted"))

#$A_sorted
#  Area flow area_portion flow_portion  cum_area  cum_flow
#1    4    1     26.66667     33.33333  26.66667  33.33333
#2    6    1     40.00000     33.33333  66.66667  66.66667
#3    5    1     33.33333     33.33333 100.00000 100.00000

#$B_sorted
#  Area flow area_portion flow_portion  cum_area cum_flow
#1    8    2     44.44444           50  44.44444       50
#2    6    1     33.33333           25  77.77778       75
#3    4    1     22.22222           25 100.00000      100

Or, going fully the tidyverse way, we can replace lapply with map and setNames with set_names

library(tidyverse)

map(set_names(files, str_c(frames, "_sorted")),
    . %>% arrange(desc(flow)) %>%
      mutate(area_portion = Area/sum(Area)*100,
             flow_portion = flow/sum(flow) * 100,
             cum_area = cumsum(area_portion),
             cum_flow = cumsum(flow_portion)))

Updated the tidyverse approach following some pointers from @Moody_Mudskipper.


