Difference between 'paste', 'str_c', 'str_join', 'stri_join', 'stri_c', 'stri_paste'

Difference between `paste`, `str_c`, `str_join`, `stri_join`, `stri_c`, `stri_paste`?

stri_join, stri_c, and stri_paste come from the stringi package and are pure aliases of one another.
str_c comes from stringr and is just stringi::stri_join with the ignore_null parameter hardcoded to TRUE, whereas stringi::stri_join sets it to FALSE by default. stringr::str_join is a deprecated alias for str_c.

See:

library(stringi)
identical(stri_join, stri_c)
# [1] TRUE
identical(stri_join, stri_paste)
# [1] TRUE

library(stringr)
str_c
# function (..., sep = "", collapse = NULL) 
# {
#   stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE)
# }
# <environment: namespace:stringr>

stri_join is very similar to base::paste with a few differences enumerated below:

1. sep = "" by default

So by default it behaves like paste0, with the difference that paste0 has no sep argument at all.

identical(paste0("a","b")        , stri_join("a","b"))
# [1] TRUE
identical(paste("a","b")         , stri_join("a","b",sep=" "))
# [1] TRUE
identical(paste("a","b", sep="-"), stri_join("a","b", sep="-"))
# [1] TRUE

str_c will behave just like stri_join here.

2. Behavior with NA

If you paste onto an NA using stri_join, the result is NA, whereas paste converts NA to the string "NA".

paste0(c("a","b"),c("c",NA))
# [1] "ac"  "bNA"
stri_join(c("a","b"),c("c",NA))
# [1] "ac" NA

str_c behaves just like stri_join here as well.

3. Behavior with length 0 arguments

When a length 0 value is encountered, character(0) is returned, unless ignore_null is set to TRUE, in which case the value is ignored. This differs from paste, which converts the length 0 value to "" and therefore leaves two consecutive separators in the output.

stri_join("a",NULL, "b")  
# [1] character(0)
stri_join("a",character(0), "b")  
# [1] character(0)

paste0("a",NULL, "b")
# [1] "ab"
stri_join("a",NULL, "b", ignore_null = TRUE)
# [1] "ab"
str_c("a",NULL, "b")
# [1] "ab"

paste("a",NULL, "b") # produces double space!
# [1] "a  b" 
stri_join("a",NULL, "b", ignore_null = TRUE, sep = " ")
# [1] "a b"
str_c("a",NULL, "b", sep = " ")
# [1] "a b"

4. stri_join warns more

paste(c("a","b"),c("c","d","e"))
# [1] "a c" "b d" "a e"
paste("a","b", sep = c(" ","-"))
# [1] "a b"

stri_join(c("a","b"),c("c","d","e"), sep = " ")
# [1] "a c" "b d" "a e"
# Warning message:
#   In stri_join(c("a", "b"), c("c", "d", "e"), sep = " ") :
#   longer object length is not a multiple of shorter object length
stri_join("a","b", sep = c(" ","-"))
# [1] "a b"
# Warning message:
#   In stri_join("a", "b", sep = c(" ", "-")) :
#   argument `sep` should be one character string; taking the first one

5. stri_join is faster

microbenchmark::microbenchmark(
  stringi = stri_join(rep("a",1000000),rep("b",1000),"c",sep=" "),
  base    = paste(rep("a",1000000),rep("b",1000),"c")
)

# Unit: milliseconds
#    expr       min       lq      mean    median       uq      max neval cld
# stringi  88.54199  93.4477  97.31161  95.17157  96.8879 131.9737   100  a 
# base    166.01024 169.7189 178.31065 171.30910 176.3055 215.5982   100   b

What is the difference between paste/paste0 and str_c?

paste0(..., collapse = NULL) is a wrapper for paste(..., sep = "", collapse = NULL), so it never inserts a separator. In other words, paste0() gives you no way to specify a separator, while paste() does, with a single space as its default.

str_c(..., sep = "", collapse = NULL) works like paste() in that you can customize the separator. The difference is that the default for str_c() is no separator, so by default it acts like paste0().

paste() and paste0() are both functions from the base package, whereas str_c() comes from the stringr package.

I did not test or microbenchmark it, but from my experience I agree with Ryan that str_c() is generally faster.

How to find whether a single insert or delete would make two strings equal?

Since I can't add a comment, I will post the answer here.

def single_insert_or_delete(s1, s2):
    # Returns 0 if the strings are equal (ignoring case), 1 if a single
    # insert or delete makes them equal, and 2 otherwise.
    s1, s2 = s1.lower(), s2.lower()
    if s1 == s2:
        return 0
    elif len(s1) - len(s2) == -1:
        # s1 is one character shorter: try deleting each character of s2.
        for i in range(len(s2)):
            if s1 == s2[:i] + s2[i+1:]:
                return 1
        return 2
    elif len(s1) - len(s2) == 1:
        # s1 is one character longer: try deleting each character of s1.
        for i in range(len(s1)):
            if s2 == s1[:i] + s1[i+1:]:
                return 1
        return 2
    else:
        # Same length but different, or lengths differ by more than one.
        return 2
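
The function above rebuilds a candidate string for every position, which is quadratic in the string length. As a hedged alternative, here is a minimal sketch that skips the common prefix and then compares the remainders once. The name one_edit_apart is hypothetical; the return codes (0, 1, 2) follow the convention of the answer above.

```python
def one_edit_apart(s1, s2):
    """Return 0 if equal, 1 if one insert/delete makes them equal, else 2.

    The comparison is case-insensitive, as in the answer above.
    """
    s1, s2 = s1.lower(), s2.lower()
    if s1 == s2:
        return 0
    # Order the strings by length so that `short` is never the longer one.
    short, long_ = sorted((s1, s2), key=len)
    if len(long_) - len(short) != 1:
        return 2
    # Skip the common prefix.
    i = 0
    while i < len(short) and short[i] == long_[i]:
        i += 1
    # Dropping long_[i] must make the remainders match.
    return 1 if short[i:] == long_[i + 1:] else 2


print(one_edit_apart("abc", "abxc"))
print(one_edit_apart("abc", "abd"))
```

The first call reports a single-edit match and the second does not, because same-length strings that differ would need a substitution rather than an insert or delete.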

What is the difference between char stringA[LEN] and char* stringB[LEN] in C

Are they any different?

Yes.

Both variables stringA and stringB are arrays. stringA is an array of LEN elements of type char, and stringB is an array of LEN elements of type char *.

char and char * are two different types. stringA can hold a single string of at most LEN - 1 characters plus the terminating null character, while the elements of stringB can point to up to LEN separate strings.

Or does stringB again becomes immutable as in the case before?

Whether the strings pointed to by the elements of stringB are mutable depends on how their memory is allocated. If they are initialized with string literals

char* stringB[LEN] = { "Apple", "Bapple", "Capple"};

then they must be treated as immutable, because modifying a string literal is undefined behavior. In the case of

/* malloc needs <stdlib.h>; strcpy needs <string.h> */
for(int i = 0; i < LEN; i++)
    stringB[i] = malloc(30);  // Allocating 30 bytes for each element

strcpy(stringB[0], "Apple");
strcpy(stringB[1], "Bapple");
strcpy(stringB[2], "Capple");

they are mutable.

Joining loop output integers as a string?

The problem is that print ends its output with a newline, which you don't want here.

Either change the end argument. Note that you don't need "".join(str(c)); just print c
```
for i in text:
    c = int(i, base=8)
    print(c, end=" ")
```

Or collect them in a loop and print them all at once:

result = []
for i in text:
    result.append(str(int(i, base=8)))
print(" ".join(result))

Note that you need neither the strip nor the if/else, and you can combine all of that in a list comprehension:

dataset = "120"
result = [str(int(i, base=8)) for i in dataset.split()]
print(" ".join(result))
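
To try the conversion on concrete input, here is a small self-contained sketch. The helper name octal_to_decimal and the sample string are assumptions for illustration; the question's actual text variable is not shown here.

```python
def octal_to_decimal(text):
    """Convert whitespace-separated octal numbers to a space-separated decimal string."""
    # split() with no argument discards leading/trailing whitespace and empty tokens,
    # which is why no strip() or if/else is needed.
    return " ".join(str(int(token, base=8)) for token in text.split())


sample = "110 145 154 154 157"  # hypothetical input
print(octal_to_decimal(sample))  # prints: 72 101 108 108 111
```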

Handling zero length character vectors as empty strings

You don't need to map over tweets; str_extract_all can handle vectors:

library(stringr)
str_extract_all(tweets, mention_rx)

#[[1]]
#character(0)

#[[2]]
#character(0)

#[[3]]
#[1] "@you"

#[[4]]
#[1] "@you" "@me" 

#[[5]]
#[1] "@bla" "@me"  "@you"

Now, if you need one comma-separated string per tweet, you can use map_chr:

purrr::map_chr(str_extract_all(tweets, mention_rx), toString)
#[1] ""    ""      "@you"     "@you, @me"      "@bla, @me, @you"

To answer the "why" questions, we can look at the documentation of paste and str_c functions.

From ?paste

Vector arguments are recycled as needed, with zero-length arguments being recycled to "".

From ?str_c

Zero length arguments are removed.

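Hence, by default str_c removes zero-length arguments, so the output is a zero-length character vector (character(0)) rather than an empty string. This fails with map_chr, which requires each result to have length 1, but it works with map because map returns a list: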

map(tweets, ~str_c(str_extract_all(.x, mention_rx)[[1]], collapse = ", "))

#[[1]]
#character(0)

#[[2]]
#character(0)

#[[3]]
#[1] "@you"

#[[4]]
#[1] "@you, @me"

#[[5]]
#[1] "@bla, @me, @you"

How can I avoid complex for loops?

You can use lapply to loop over the files and dplyr's mutate to add the new columns:

library(dplyr)

setNames(lapply(files, function(x) 
          x %>%
            arrange(desc(flow)) %>%
            mutate(area_portion = Area/sum(Area)*100, 
                   flow_portion = flow/sum(flow) * 100, 
                   cum_area = cumsum(area_portion),
                   cum_flow = cumsum(flow_portion))
),paste0(frames, "_sorted"))

#$A_sorted
#  Area flow area_portion flow_portion  cum_area  cum_flow
#1    4    1     26.66667     33.33333  26.66667  33.33333
#2    6    1     40.00000     33.33333  66.66667  66.66667
#3    5    1     33.33333     33.33333 100.00000 100.00000

#$B_sorted
#  Area flow area_portion flow_portion  cum_area cum_flow
#1    8    2     44.44444           50  44.44444       50
#2    6    1     33.33333           25  77.77778       75
#3    4    1     22.22222           25 100.00000      100

Or, going fully the tidyverse way, we can replace lapply with map and setNames with set_names:

library(tidyverse)

map(set_names(files, str_c(frames, "_sorted")), 
  . %>% arrange(desc(flow)) %>%
  mutate(area_portion = Area/sum(Area)*100, 
         flow_portion = flow/sum(flow) * 100, 
         cum_area = cumsum(area_portion),
         cum_flow = cumsum(flow_portion)))

Updated the tidyverse approach following some pointers from @Moody_Mudskipper.
