How to Remove Empty Paragraph Tags from String

How to remove empty paragraph tags from string?

Use this regex to remove empty paragraphs:

/<p[^>]*><\/p[^>]*>/

Example:

<?php
$html = "abc<p></p><p>dd</p><b>non-empty</b>";
// Single quotes keep the backslash escapes exactly as the regex engine expects them
$pattern = '/<p[^>]*><\/p[^>]*>/';
// To remove any empty tag (not just <p>), use this pattern instead:
// $pattern = '/<[^\/>]*>\s*<\/[^>]*>/';

echo preg_replace($pattern, '', $html);
// Output:
// abc<p>dd</p><b>non-empty</b>
?>

How do I remove empty p tags from a string using JavaScript or Cheerio?

In your code, the empty <p> tags contain \u200b (zero-width space) characters. This character is invisible, but it is there.

You can remove those paragraphs with the split() and join('') methods:

var test = "<p>This is a slightly longer post about something. Let's see how long this lasts. Okay so this is one paragraph now. </p><p>​</p><p>Let's write another paragraph, and see how it renders when I read this post later. </p><p>​</p><p>This is another short paragraph</p>";

var str = test.split('<p>​</p>').join('');

console.log(str);
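The split/join approach only matches paragraphs that contain exactly one zero-width space. As a rough sketch of a slightly more tolerant variant, a regular expression can treat any mix of whitespace and zero-width characters as "empty". The function name removeInvisibleParagraphs and the sample string are made up for illustration, and the pattern assumes plain <p> tags without attributes.

```javascript
// Hypothetical helper (not from the original answer): removes <p> elements whose
// content is only whitespace and/or zero-width characters
// (U+200B zero-width space, U+200C, U+200D, U+FEFF).
function removeInvisibleParagraphs(html) {
  // The g flag replaces every occurrence, not just the first one.
  return html.replace(/<p>[\s\u200B\u200C\u200D\uFEFF]*<\/p>/g, '');
}

// Illustrative input: one paragraph holds a zero-width space, one holds a plain space.
var sample = '<p>First</p><p>\u200B</p><p> </p><p>Second</p>';
console.log(removeInvisibleParagraphs(sample));
// prints: <p>First</p><p>Second</p>
```

Paragraphs with real content are left alone, because the character class has to span everything between <p> and </p>.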

Using regex to remove empty paragraph tags <p></p> (standard str_replace on space not working)

I would use:

$str = preg_replace('~<p>\s*<\/p>~i','',$str);

where \s matches whitespace of any kind (tab, space, newline, etc.) and * means zero or more occurrences of it. So <p></p>, <p> </p>, and <p>{multiple spaces here}</p> will all be replaced by an empty string. The i flag makes the match case-insensitive, in case some <p> tags are written as <P>.
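For comparison, here is a minimal JavaScript sketch of the same pattern. The name removeEmptyParagraphs and the sample inputs are assumptions for illustration. Note that preg_replace replaces every match by default, while JavaScript's replace needs the g flag to do the same.

```javascript
// Same idea as the PHP pattern: <p>, any amount of whitespace, </p>.
// i = case-insensitive, g = replace every match (PHP's preg_replace does this by default).
var emptyParagraph = /<p>\s*<\/p>/gi;

// Hypothetical helper name, chosen for this example only.
function removeEmptyParagraphs(html) {
  return html.replace(emptyParagraph, '');
}

// Illustrative inputs: empty, one space, upper-case tags with tab/newline, and real text.
var samples = ['<p></p>', '<p> </p>', '<P>\t\n</P>', '<p>text</p>'];
samples.forEach(function (s) {
  console.log(JSON.stringify(removeEmptyParagraphs(s)));
});
// prints: "", "", "" and "<p>text</p>"
```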

Remove empty <p></p> tag from String javascript

Regex is the wrong tool for this.

If you're doing it in a browser, it's easy:

var div = document.createElement('div');
div.innerHTML = str;
// querySelectorAll returns a static NodeList, so removing nodes while looping is safe
Array.prototype.forEach.call(div.querySelectorAll('p'), function(p) {
    var html = p.innerHTML.trim();
    // innerHTML serializes a non-breaking space as "&nbsp;"
    if (!html || html.toLowerCase() == "&nbsp;") {
        p.parentNode.removeChild(p);
    }
});
str = div.innerHTML; // Yes, the case of tag names may have changed, etc., but nothing substantive

If you're doing it in another environment, there's an HTML parser available for that environment. NodeJS has several, including cheerio. The JVM (if you're using JavaScript on the JVM) has the excellent JSoup. .NET (if you're using "JScript") has a port of JSoup. Etc.

PHP RegEx remove empty paragraph tags

Well, in conflict with my suggestion not to parse HTML with regexes, I wrote up a regex to do just that:

"#<p>(\s|&nbsp;|</?\s?br\s?/?>)*</?p>#"

This will match properly for:

<p></p>

<p> </p> <!-- ([space]) -->

<p>	</p> <!-- (that's a [tab] character in there) -->

<p>&nbsp;</p>

<p><br /></p>

<p>
 </p>

<p>
<br /><br />
 </p>

What it does:

# #                --> regex start (the # delimiter)
# <p>              --> match the opening <p> tag
# (                --> group open
# \s               --> match any whitespace character (newline, space, tab)
# |                --> or
# &nbsp;           --> match the &nbsp; entity
# |                --> or
# </?\s?br\s?/?>   --> match the <br> tag
# )*               --> group close; match any number of any of the elements in the group
# </?p>            --> match the closing </p> tag ("/" optional)
# #                --> regex end (the # delimiter)
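To see the breakdown in action, the sketch below ports the pattern to a JavaScript regex literal and runs it over a few inputs. The helper name stripBlankParagraphs and the sample strings are assumptions for illustration. The group is written as non-capturing, the slashes are escaped because / is the delimiter in JavaScript, and the g flag is added so every match is replaced.

```javascript
// JavaScript port of the PHP pattern: <p>, then any run of whitespace,
// literal "&nbsp;" entities, or <br> tags, then the closing tag.
var blankParagraph = /<p>(?:\s|&nbsp;|<\/?\s?br\s?\/?>)*<\/?p>/g;

// Hypothetical helper name, used only in this example.
function stripBlankParagraphs(html) {
  return html.replace(blankParagraph, '');
}

var cases = [
  '<p></p>',
  '<p>&nbsp;</p>',
  '<p><br /></p>',
  '<p>\n<br /><br />\n </p>',
  '<p>text<br /></p>' // has real content, so it must survive
];
cases.forEach(function (c) {
  console.log(JSON.stringify(stripBlankParagraphs(c)));
});
// prints "" for the first four cases and "<p>text<br /></p>" for the last one
```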

Remove empty p tag from String variable using Java?

Have a look at the following snippet:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Tag;

public class Test {

    public static void main(String[] args) {
        try {
            String html = "<p id=\"Id44\">see the image and see the color... ?</p>\r\n" + "<p id=\"Id40\"></p>\r\n"
                    + "<div id=\"Id87\" style=\"display:inline-block\">\r\n"
                    + "<video id=\"Id30\" src=\"http://Id3.qa.cete.us/117973/video.mp4\"></video>\r\n" + "</div>\r\n"
                    + "<p id=\"Id28\"></p>\r\n" + "<p id=\"Id-1\"></p>\r\n" + "<div id =\"Id21\">\r\n"
                    + "<img id=\"img_44186\" src=\"/129884/apple.jpg\" />\r\n" + "</div>\r\n" + "<p id=\"Id-320046-3-21\"></p>";
            new Test().modifyMediaVariantContent(html);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private void modifyMediaVariantContent(String html) {
        Document doc = Jsoup.parse(html);
        for (Element element : doc.getElementsByTag("p")) {
            if (!element.hasText() && element.isBlock()) {
                // Drop paragraphs that contain no text
                element.remove();
            } else if (element.hasText() && element.parent() == doc.body()) {
                // Wrap top-level paragraphs that do have text in a <div>
                Element replacement = new Element(Tag.valueOf("div"), "");
                replacement.appendChild(element.clone());
                element.replaceWith(replacement);
            }
        }

        System.out.println(doc.body().html());
    }
}

This outputs the following:

<div>
<p id="Id44">see the image and see the color... ?</p>
</div>
<div id="Id87" style="display:inline-block">
<video id="Id30" src="http://Id3.qa.cete.us/117973/video.mp4"></video>
</div>
<div id="Id21">
<img id="img_44186" src="/129884/apple.jpg">
</div>

To convert the Jsoup document to an org.w3c.dom.Document, use org.jsoup.helper.W3CDom:

W3CDom w3cDom = new W3CDom();
org.w3c.dom.Document w3cDoc = w3cDom.fromJsoup(doc);

