Breaking up a long regular expression in R
The regular expression is just a string. You can paste it together across multiple lines like any other string:
regex_of_sites <- paste0("side|southeast|north|computer|engineer|",
"first|south|pharm|left|southwest|",
"level|second|thirteenth")
Can I format regular expressions in multiple lines in R?
To turn on free-spacing regular expressions, start the regular expression with the modifier (?x) and specify perl=TRUE. Here is an example where the whitespace in the regular expression between a and b is ignored.
grep("(?x)a
b", c("ab", "a b", "a\nb", "ab"), perl = TRUE)
## [1] 1 4
How to split long regular expression rules to multiple lines in Python
You can split your regex pattern by quoting each segment. No backslashes needed.
test = re.compile(('(?P<full_path>.+):\d+:\s+warning:\s+Member'
'\s+(?P<member_name>.+)\s+\((?P<member_type>%s)\) '
'of (class|group|namespace)\s+(?P<class_name>.+)'
'\s+is not documented') % (self.__MEMBER_TYPES), re.IGNORECASE)
You can also use the raw string flag 'r', but you'll have to put it before each segment.
See the docs.
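As a rough sketch of both ideas, the same pattern can be written with an r prefix on every segment, or as a single re.VERBOSE pattern where whitespace and comments are ignored. MEMBER_TYPES and the sample log line are made up for illustration (they stand in for self.__MEMBER_TYPES, whose real value isn't shown here).

```python
import re

# Hypothetical stand-in for the answer's self.__MEMBER_TYPES value.
MEMBER_TYPES = "function|variable|typedef"

# Option 1: adjacent raw-string segments; the r prefix goes on every segment.
pattern_segments = re.compile(
    (r'(?P<full_path>.+):\d+:\s+warning:\s+Member'
     r'\s+(?P<member_name>.+)\s+\((?P<member_type>%s)\) '
     r'of (class|group|namespace)\s+(?P<class_name>.+)'
     r'\s+is not documented') % MEMBER_TYPES,
    re.IGNORECASE)

# Option 2: re.VERBOSE ignores unescaped whitespace and allows comments,
# so literal spaces are written as [ ] here.
pattern_verbose = re.compile(r'''
    (?P<full_path>.+):\d+:\s+warning:\s+Member   # file path and line number
    \s+(?P<member_name>.+)\s+
    \((?P<member_type>%s)\)[ ]                   # member type in parentheses
    of[ ](class|group|namespace)\s+
    (?P<class_name>.+)\s+is[ ]not[ ]documented
    ''' % MEMBER_TYPES, re.IGNORECASE | re.VERBOSE)

# Made-up sample line, only to show that both patterns agree.
line = "src/foo.h:42: warning: Member bar (function) of class Foo is not documented"
m1 = pattern_segments.match(line)
m2 = pattern_verbose.match(line)
print(m1.groupdict() == m2.groupdict())
```

The verbose form is easier to annotate, but any space that must match literally has to be escaped or put in a character class, otherwise it is silently dropped.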
How to split a long regular expression into multiple lines in JavaScript?
[Edit 2022/08] Created a small github repository to create regular expressions with spaces, comments and templating.
You could convert it to a string and create the expression by calling new RegExp():
// Every backslash of the regex literal is doubled, because the string literal consumes one level.
var myRE = new RegExp (['^(([^<>()[\\]\\\\.,;:\\s@"]+(\\.[^<>()[\\]\\\\.,;:\\s@"]+)*)',
'|(".+"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
'[0-9]{1,3}\\])|(([a-zA-Z\\-0-9]+\\.)+',
'[a-zA-Z]{2,}))$'].join(''));
Notes:
- When converting the expression literal to a string you need to escape all backslashes, as backslashes are consumed when evaluating a string literal. (See Kayo's comment for more detail.)
- RegExp accepts modifiers as a second parameter: /regex/g => new RegExp('regex', 'g')
[Addition ES20xx (tagged template)]
In ES20xx you can use tagged templates. See the snippet.
Note:
- The disadvantage here is that you can't use plain whitespace in the regular expression string (always use \s, \s+, \s{1,x}, \t, \n etc).
(() => {
const createRegExp = (str, opts) =>
new RegExp(str.raw[0].replace(/\s/gm, ""), opts || "");
const yourRE = createRegExp`
^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|
(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|
(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$`;
console.log(yourRE);
const anotherLongRE = createRegExp`
(\byyyy\b)|(\bm\b)|(\bd\b)|(\bh\b)|(\bmi\b)|(\bs\b)|(\bms\b)|
(\bwd\b)|(\bmm\b)|(\bdd\b)|(\bhh\b)|(\bMI\b)|(\bS\b)|(\bMS\b)|
(\bM\b)|(\bMM\b)|(\bdow\b)|(\bDOW\b)
${"gi"}`;
console.log(anotherLongRE);
})();
Syntax in R for breaking up LHS of assignment over multiple lines
You can put a line break between any 2 characters that aren't part of a name, as long as that doesn't leave a syntactically complete expression before the line break (so that the parser knows to look for more). None of these look great, but basically after any [[ or $ or before ]] you can put a line break. For example:
results$
cases[[i]]$
samples[[j]]$
portions[[k]]$
analytes[[l]]$
column <- x
Or going to the extreme, putting in every syntactically valid line break (without introducing parentheses which would let you do even more):
results$
cases[[
i
]]$
samples[[
j
]]$
portions[[
k
]]$
analytes[[
l
]]$
column <-
x
With parentheses, we lose the "doesn't leave a syntactically complete expression" rule, because the expression won't be complete until the parentheses close. You can add breaks anywhere except in the middle of a name (object or function name). I won't bother with nested indentation for this example.
(
results
$
cases
[[
i
]]
$
samples
[[
j
]]
$
portions
[[
k
]]
$
analytes
[[
l
]]
$
column
<-
x
)
If you want to bring attention to the x being assigned, you could also use right assignment.
x -> results$cases[[i]]$samples[[j]]$
portions[[k]]$analytes[[l]]$column
Breaking up PascalCase in R
x <- c("BobDylanUSA",
"MikhailGorbachevUSSR",
"HelpfulStackOverflowPeople")
gsub('[a-z]\\K(?=[A-Z])', ' ', x, perl = TRUE)
# [1] "Bob Dylan USA" "Mikhail Gorbachev USSR"
# [3] "Helpful Stack Overflow People"
Or
gsub('(?<=[a-z])(?=[A-Z])', ' ', x, perl = TRUE)
# [1] "Bob Dylan USA" "Mikhail Gorbachev USSR"
# [3] "Helpful Stack Overflow People"
Or this one, which will also split single-letter words like I or A:
x <- c("BobDylanUSA",
"MikhailGorbachevUSSR",
"HelpfulStackOverflowPeople",
"IAmATallDrinkOfWater")
gsub('(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])', ' ', x, perl = TRUE)
# [1] "Bob Dylan USA" "Mikhail Gorbachev USSR"
# [3] "Helpful Stack Overflow People" "I Am A Tall Drink Of Water"
Splitting a data.table column based on a regular expression
We can use tidyr::separate:
library(data.table)
dt1 <- fread("category label count
Navigation Product || Green 2
Navigation Survey || Green 5
Navigation Product || Red 10
Navigation Survey || Red 10")
tidyr::separate(dt1, label, sep = "\\|\\|", into = c("Type","Color"))
#> category Type Color count
#> 1: Navigation Product Green 2
#> 2: Navigation Survey Green 5
#> 3: Navigation Product Red 10
#> 4: Navigation Survey Red 10
Split code over multiple lines in an R script
You are not breaking code over multiple lines, but rather a single identifier. There is a difference.
For your issue, try
R> setwd(paste("~/a/very/long/path/here",
"/and/then/some/more",
"/and/then/some/more",
"/and/then/some/more", sep=""))
which also illustrates that it is perfectly fine to break code across multiple lines.