Most pythonic way to delete text between two delimiters
Regular expressions are a good match for your problem.
>>> import re
>>> input_str = 'foo [[bar]] baz [[etc.]]'
If you want to remove the whole [[...]], which I think is what you are asking about:
>>> re.sub(r'\[\[.*?\]\]', '', input_str)
'foo  baz '
If you want to leave the contents of the [[...]] in:
>>> re.sub(r'\[\[(.*?)\]\]', r'\1', input_str)
'foo bar baz etc.'
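The two calls above can be folded into one helper that works for arbitrary delimiters. This is only a sketch: the function name strip_delimited and its keep_contents parameter are made up for illustration, and the re.DOTALL flag is an added assumption so that a delimited span may cross line breaks.

```python
import re

def strip_delimited(text, open_delim, close_delim, keep_contents=False):
    """Remove each open_delim...close_delim span, optionally keeping the inside."""
    # re.escape lets delimiters contain regex metacharacters such as '[' or '*'.
    pattern = re.escape(open_delim) + r'(.*?)' + re.escape(close_delim)
    # Assumption: re.DOTALL is added so '.' also matches newlines.
    replacement = r'\1' if keep_contents else ''
    return re.sub(pattern, replacement, text, flags=re.DOTALL)

input_str = 'foo [[bar]] baz [[etc.]]'
print(repr(strip_delimited(input_str, '[[', ']]')))                      # 'foo  baz '
print(repr(strip_delimited(input_str, '[[', ']]', keep_contents=True)))  # 'foo bar baz etc.'
```

Note that removing a span leaves the whitespace on both sides of it, which is why the first result contains two spaces between foo and baz.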
Removing strings between two delimiters
Use regex replace with a reluctant quantifier:
str = str.replaceAll("--/--.*?--/--\\s*", "");
The expression *? is a reluctant quantifier: it matches as little as possible while still matching, so it stops at the first closing delimiter even when the input contains several delimiter pairs.
I added \s* to the end to also remove whitespace after the closing delimiter (which your example seemed to suggest was wanted).
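The difference between the greedy and the reluctant quantifier is easiest to see side by side. Here is a small Python sketch (Python's re module uses the same *? syntax); the sample text is invented for illustration.

```python
import re

# Hypothetical input with two delimiter pairs.
text = "keep --/--alice--/-- this --/--bob--/-- too"

# Greedy: .* runs to the LAST closing delimiter, swallowing " this " as well.
greedy = re.sub(r"--/--.*--/--\s*", "", text)

# Reluctant: .*? stops at the first closing delimiter after each opener.
reluctant = re.sub(r"--/--.*?--/--\s*", "", text)

print(greedy)     # keep too
print(reluctant)  # keep this too
```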
To use this approach, you're going to have to read the text file a line at a time rather than a word at a time, process each line to remove the username, then split it into words:
while (textFile.hasNextLine()) {
    // Strip the delimited username from the line, then split what is left into words
    for (String word : textFile.nextLine().trim().toLowerCase().replaceAll("--/--.*?--/--\\s*", "").split("\\s+")) {
        words.add(word);
    }
}
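For comparison, the same line-at-a-time idea could look like this in Python. It is a sketch under assumptions: collect_words is a hypothetical name, and io.StringIO stands in for the real text file, whose contents are not shown in the question.

```python
import io
import re

# Same pattern as above: delimited span plus any whitespace that follows it.
DELIMITED = re.compile(r"--/--.*?--/--\s*")

def collect_words(lines):
    """Drop the delimited part of each line, then split the rest into words."""
    words = []
    for line in lines:
        cleaned = DELIMITED.sub("", line.strip().lower())
        # split() with no argument never yields empty strings.
        words.extend(cleaned.split())
    return words

# Hypothetical file contents, for illustration only.
sample = io.StringIO("--/--Alice--/-- Hello there\n--/--Bob--/-- Good morning\n")
print(collect_words(sample))  # ['hello', 'there', 'good', 'morning']
```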
C#: Deleting all characters between two Delimiters
A regex can indeed help here:
rs = Regex.Replace(s, "(?<=;)Note=.*?;", "");
Let me explain the more obscure parts of it:
(?<=;) makes sure Note is preceded by a semicolon. That semicolon is, however, not part of the match, so it is not replaced. (That's a positive look-behind assertion.)
.*?; matches all characters up to the next semicolon, but non-greedily. This ensures that in Note=A;x=B; only Note=A; is matched, up to the first semicolon, and x=B is retained.
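The same pattern works unchanged in Python's re module, which makes it easy to try out. The sample strings below are invented for illustration.

```python
import re

PATTERN = r"(?<=;)Note=.*?;"

# Both Note=...; fields are preceded by ';', so both are removed.
with_notes = re.sub(PATTERN, "", "a=1;Note=A;x=B;Note=C;y=2")
print(with_notes)  # a=1;x=B;y=2

# A Note at the very start has no preceding ';', so the look-behind fails.
leading = re.sub(PATTERN, "", "Note=A;x=B;")
print(leading)  # Note=A;x=B;
```

The second case shows a limitation of the look-behind: a Note field at the start of the string is left in place.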
How to remove text between two delimiters in Python
Python has a built-in json module for parsing and modifying JSON. A regular expression is likely to be fragile and more of a headache than it's worth.
You can do the following:
import json

with open('samplfile.json') as input_file, open('output.json', 'w') as output_file:
    data = json.load(input_file)
    # Empty the segmentation list of every annotation
    for annotation in data['annotations']:
        annotation['segmentation'] = []
    json.dump(data, output_file, indent=4)
Then, output.json contains:
{
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "segmentation": [],
            "iscrowd": 0,
            "bbox": [
                621.63,
                1085.67,
                220.02999999999997,
                259.03999999999996
            ],
            "area": 56996,
            "category_id": 1124044
        },
        {
            "id": 2,
            "image_id": 1,
            "segmentation": [],
            "iscrowd": 0,
            "bbox": [
                887.62,
                1355.7,
                227.0200000000001,
                259.8399999999999
            ],
            "area": 58988,
            "category_id": 1124044
        },
        {
            "id": 3,
            "image_id": 1,
            "segmentation": [],
            "iscrowd": 0,
            "bbox": [
                1157.61,
                1411.84,
                247.2800000000002,
                249.7900000000002
            ],
            "area": 61768,
            "category_id": 1124044
        }
    ]
}
Remove string between two delimiter, inclusively
Indeed, sed is greedy. But you can do:
sed 's/(def[^)]*)//gi'
Note that not all sed implementations accept the i flag, so you may need to do:
sed 's/([dD][eE][fF][^)]*)//g'
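The negated character class [^)]* is what keeps the match from running past the first closing parenthesis, and the idea carries over to other regex engines. A rough Python equivalent, assuming the goal is to drop every literal (def...) group regardless of case, might look like this; remove_def_groups is a hypothetical name. Unlike sed's basic regular expressions, Python needs the parentheses escaped to match them literally.

```python
import re

def remove_def_groups(text):
    """Remove every '(def...)' group, matching 'def' case-insensitively."""
    # \( and \) are literal parentheses; [^)]* cannot cross the first ')'.
    return re.sub(r"\(def[^)]*\)", "", text, flags=re.IGNORECASE)

# Invented sample input: only the (def ...) and (DEF ...) groups are dropped.
print(repr(remove_def_groups("abc (def 1) ghi (DEF 2) (xyz) end")))
```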