Difference between `paste`, `str_c`, `str_join`, `stri_join`, `stri_c`, `stri_paste`?
stri_join, stri_c, and stri_paste come from package stringi and are pure aliases.
str_c comes from stringr and is just stringi::stri_join with the parameter ignore_null hardcoded to TRUE, while stringi::stri_join has it set to FALSE by default.
stringr::str_join is a deprecated alias for str_c.
See:
library(stringi)
identical(stri_join, stri_c)
# [1] TRUE
identical(stri_join, stri_paste)
# [1] TRUE
library(stringr)
str_c
# function (..., sep = "", collapse = NULL)
# {
# stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE)
# }
# <environment: namespace:stringr>
stri_join is very similar to base::paste, with a few differences enumerated below:
1. sep = "" by default
So it behaves more like paste0 by default, but unlike paste0 it keeps its sep argument.
identical(paste0("a","b") , stri_join("a","b"))
# [1] TRUE
identical(paste("a","b") , stri_join("a","b",sep=" "))
# [1] TRUE
identical(paste("a","b", sep="-"), stri_join("a","b", sep="-"))
# [1] TRUE
str_c will behave just like stri_join here.
2. Behavior with NA
If you paste to NA using stri_join, the result is NA, while paste converts NA to "NA".
paste0(c("a","b"),c("c",NA))
# [1] "ac" "bNA"
stri_join(c("a","b"),c("c",NA))
# [1] "ac" NA
str_c will behave just like stri_join here as well.
3. Behavior with length 0 arguments
When a length 0 value is encountered, character(0) is returned, except if ignore_null is set to TRUE, in which case the value is ignored. This differs from the behavior of paste, which converts the length 0 value to "" and thus puts 2 consecutive separators in the output.
stri_join("a",NULL, "b")
# [1] character(0)
stri_join("a",character(0), "b")
# [1] character(0)
paste0("a",NULL, "b")
# [1] "ab"
stri_join("a",NULL, "b", ignore_null = TRUE)
# [1] "ab"
str_c("a",NULL, "b")
# [1] "ab"
paste("a",NULL, "b") # produces double space!
# [1] "a  b"
stri_join("a",NULL, "b", ignore_null = TRUE, sep = " ")
# [1] "a b"
str_c("a",NULL, "b", sep = " ")
# [1] "a b"
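The three zero-length policies above can be modeled outside R. The following is a minimal Python sketch, not the actual implementation of either package: the function names paste_like and stri_join_like are hypothetical, character vectors are represented as Python lists of strings, and NA handling is deliberately left out.

```python
def paste_like(*args, sep=" "):
    """Toy model of base::paste: a zero-length argument is recycled to ""."""
    if all(len(a) == 0 for a in args):
        return []  # models character(0) when every argument is empty
    vecs = [list(a) if len(a) > 0 else [""] for a in args]
    n = max(len(v) for v in vecs)
    # Recycle shorter vectors and join element-wise.
    return [sep.join(v[i % len(v)] for v in vecs) for i in range(n)]


def stri_join_like(*args, sep="", ignore_null=False):
    """Toy model of stri_join's treatment of zero-length arguments."""
    if ignore_null:
        args = [a for a in args if len(a) > 0]  # drop empty arguments
    elif any(len(a) == 0 for a in args):
        return []  # any empty argument empties the whole result
    if not args:
        return []
    n = max(len(a) for a in args)
    return [sep.join(a[i % len(a)] for a in args) for i in range(n)]


print(paste_like(["a"], [], ["b"]))                                # ['a  b']
print(stri_join_like(["a"], [], ["b"]))                            # []
print(stri_join_like(["a"], [], ["b"], ignore_null=True, sep=" ")) # ['a b']
```

The first call shows where the doubled separator comes from: the empty argument still occupies a slot between two separators.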
4. stri_join warns more
paste(c("a","b"),c("c","d","e"))
# [1] "a c" "b d" "a e"
paste("a","b", sep = c(" ","-"))
# [1] "a b"
stri_join(c("a","b"),c("c","d","e"), sep = " ")
# [1] "a c" "b d" "a e"
# Warning message:
# In stri_join(c("a", "b"), c("c", "d", "e"), sep = " ") :
# longer object length is not a multiple of shorter object length
stri_join("a","b", sep = c(" ","-"))
# [1] "a b"
# Warning message:
# In stri_join("a", "b", sep = c(" ", "-")) :
# argument `sep` should be one character string; taking the first one
5. stri_join is faster
microbenchmark::microbenchmark(
stringi = stri_join(rep("a",1000000),rep("b",1000),"c",sep=" "),
base = paste(rep("a",1000000),rep("b",1000),"c")
)
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# stringi 88.54199 93.4477 97.31161 95.17157 96.8879 131.9737 100 a
# base 166.01024 169.7189 178.31065 171.30910 176.3055 215.5982 100 b
What is the difference between paste/paste0 and str_c?
paste0(..., collapse = NULL) is a wrapper for paste(..., sep = "", collapse = NULL), which means there is no separator. In other words, with paste0() you cannot specify a separator, while you do have that option with paste(), where a single space is the default.
str_c(..., sep = "", collapse = NULL) works like paste() in that you can customize the separator. The difference is that for str_c() the default is no separator, so it acts just like paste0() by default.
paste() and paste0() are both functions from the base package, whereas str_c() comes from the stringr package.
I did not test/microbenchmark it, but from my experience I agree with Ryan that str_c() is generally faster.
How to find whether a single insert or delete would make two strings equal?
Since I can't add a comment, I will post the answer here.
def single_insert_or_delete(s1, s2):
    # Returns 0 if the strings are equal (ignoring case),
    # 1 if a single insert or delete makes them equal, 2 otherwise.
    s1, s2 = s1.lower(), s2.lower()
    if s1 == s2:
        return 0
    if abs(len(s1) - len(s2)) != 1:
        return 2
    # Make s1 the shorter string and s2 the longer one.
    if len(s1) > len(s2):
        s1, s2 = s2, s1
    # Try deleting each character of the longer string in turn.
    for i in range(len(s2)):
        if s1 == s2[:i] + s2[i+1:]:
            return 1
    # Only give up after every position has been tried.
    return 2
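The check above rebuilds a candidate string for every position. As an alternative, here is a minimal sketch of a prefix-skipping variant that runs in linear time. The function name one_insert_or_delete_apart is hypothetical, and it returns a boolean rather than the 0/1/2 codes used above.

```python
def one_insert_or_delete_apart(s1, s2):
    """Return True if exactly one insert or delete turns one string into the other."""
    s1, s2 = s1.lower(), s2.lower()
    if abs(len(s1) - len(s2)) != 1:
        return False
    short, long_ = sorted((s1, s2), key=len)
    i = 0
    # Walk over the common prefix.
    while i < len(short) and short[i] == long_[i]:
        i += 1
    # Skip one character in the longer string; the remainders must match.
    return short[i:] == long_[i + 1:]


print(one_insert_or_delete_apart("abc", "abxc"))  # True
print(one_insert_or_delete_apart("abc", "abc"))   # False
```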
What is the difference between char stringA[LEN] and char* stringB[LEN] in C
Are they any different?
Yes.
Both variables stringA and stringB are arrays. stringA is an array of char of size LEN and stringB is an array of char * of size LEN.
char and char * are two different types. stringA can hold only one character string of at most LEN - 1 characters plus the terminating null character, while the elements of stringB can point to LEN different strings.
Or does stringB again become immutable as in the case before?
Whether the strings pointed to by the elements of stringB are mutable or not depends on how the memory is allocated. If they are initialized with string literals
char* stringB[LEN] = { "Apple", "Bapple", "Capple"};
then they are immutable (modifying a string literal is undefined behavior). In the case of
for(int i = 0; i < LEN; i++)
    stringB[i] = malloc(30); // Allocating 30 bytes for each element
strcpy(stringB[0], "Apple");
strcpy(stringB[1], "Bapple");
strcpy(stringB[2], "Capple");
they are mutable.
Joining loop output integers as a string?
The problem is that print ends with a newline, which you don't want.
Either change the end argument. Note that you don't need "".join(str(c)), just use c
for i in text:
    c = int(i, base=8)
    print(c, end=" ")
Or save them in a loop, and print them all at once
result = []
for i in text:
    result.append(str(int(i, base=8)))
print(" ".join(result))
Note that you need neither the strip nor the if/else, and can combine all of that in a list comprehension
dataset = "120"
result = [str(int(i, base=8)) for i in dataset.split()]
print(" ".join(result))
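For comparison, here is a small sketch of two other ways to get the same single-line output without building the strings by hand. The input list text is a hypothetical sample of octal digit strings, not the asker's data.

```python
text = ["12", "7", "120"]  # hypothetical sample input (octal digit strings)

# Convert each token from base 8 to an int.
values = [int(token, base=8) for token in text]

# Option 1: convert to str lazily with map and join once.
line = " ".join(map(str, values))
print(line)  # 10 7 80

# Option 2: unpack the values; print separates them with a space by default.
print(*values)  # 10 7 80
```

Option 2 avoids the explicit str conversion entirely, since print converts each argument itself.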
Handling zero length character vectors as empty strings
You don't need to map over tweets; str_extract_all can handle vectors
library(stringr)
str_extract_all(tweets, mention_rx)
#[[1]]
#character(0)
#[[2]]
#character(0)
#[[3]]
#[1] "@you"
#[[4]]
#[1] "@you" "@me"
#[[5]]
#[1] "@bla" "@me" "@you"
Now if you need one comma-separated string per tweet, you can use map_chr with toString
purrr::map_chr(str_extract_all(tweets, mention_rx), toString)
#[1] "" "" "@you" "@you, @me" "@bla, @me, @you"
To answer the "why" questions, we can look at the documentation of the paste and str_c functions.
From ?paste
Vector arguments are recycled as needed, with zero-length arguments being recycled to "".
From ?str_c
Zero length arguments are removed.
Hence, by default str_c removes zero-length arguments, which makes the output a zero-length character vector. That fails for map_chr, which requires a single string from each call, but it works with map, as map returns a list
map(tweets, ~str_c(str_extract_all(.x, mention_rx)[[1]], collapse = ", "))
#[[1]]
#character(0)
#[[2]]
#character(0)
#[[3]]
#[1] "@you"
#[[4]]
#[1] "@you, @me"
#[[5]]
#[1] "@bla, @me, @you"
How can I avoid complex for loops?
Using lapply to loop over files and dplyr mutate to add new columns
library(dplyr)
setNames(lapply(files, function(x)
  x %>%
    arrange(desc(flow)) %>%
    mutate(area_portion = Area/sum(Area)*100,
           flow_portion = flow/sum(flow) * 100,
           cum_area = cumsum(area_portion),
           cum_flow = cumsum(flow_portion))
), paste0(frames, "_sorted"))
#$A_sorted
# Area flow area_portion flow_portion cum_area cum_flow
#1 4 1 26.66667 33.33333 26.66667 33.33333
#2 6 1 40.00000 33.33333 66.66667 66.66667
#3 5 1 33.33333 33.33333 100.00000 100.00000
#$B_sorted
# Area flow area_portion flow_portion cum_area cum_flow
#1 8 2 44.44444 50 44.44444 50
#2 6 1 33.33333 25 77.77778 75
#3 4 1 22.22222 25 100.00000 100
Or, going completely the tidyverse way, we can replace lapply with map and setNames with set_names
library(tidyverse)
map(set_names(files, str_c(frames, "_sorted")),
    . %>% arrange(desc(flow)) %>%
      mutate(area_portion = Area/sum(Area)*100,
             flow_portion = flow/sum(flow) * 100,
             cum_area = cumsum(area_portion),
             cum_flow = cumsum(flow_portion)))
Updated the tidyverse approach following some pointers from @Moody_Mudskipper.