How to Replace All XHTML/HTML Line Breaks (<br>) with New Lines

How to replace all XHTML/HTML line breaks (br) with new lines?

I would generally say "don't use regex to work with HTML", but in this case I would probably go with a regex, considering that <br> tags generally look like either:

  • <br>
  • or <br/>, with any number of spaces before the /


I suppose something like this would do the trick:

$html = 'this <br>is<br/>some<br />text <br    />!';
$nl = preg_replace('#<br\s*/?>#i', "\n", $html);
echo $nl;

A couple of notes on the pattern:

  • it starts with <br
  • followed by any number of whitespace characters: \s*
  • optionally, a /: /?
  • and, finally, a >
  • the match is case-insensitive (the i flag after the closing # delimiter), as <BR> would be valid in HTML
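
For comparison, the same pattern can be sketched in JavaScript. The function name brToNewline is made up for this illustration; the regex is the one described above, with the g flag added so that every match is replaced rather than only the first.

```javascript
// Hypothetical helper (not part of the original answer):
// replace every <br>, <br/>, <br /> variant, in any letter case, with "\n".
function brToNewline(html) {
  // "<br", optional whitespace, optional slash, then ">"
  return html.replace(/<br\s*\/?>/gi, "\n");
}

// JSON.stringify makes the inserted newlines visible as \n in the console.
console.log(JSON.stringify(brToNewline("this <br>is<br/>some<br />text <br    />!")));
```

Like the PHP version, this only targets br tags; anything more involved is probably better handled with a real HTML parser.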

Convert (render) HTML to Text with correct line-breaks

The code below works correctly with the example provided and even deals with some odd cases like <div><br></div>. There are still some things to improve, but the basic idea is there; see the comments.

// requires "using HtmlAgilityPack;"
// both methods must be declared inside a static class (SafeSelectNodes is an extension method)
public static string FormatLineBreaks(string html)
{
    //first - remove all the existing '\n' from HTML
    //they mean nothing in HTML, but break our logic
    html = html.Replace("\r", "").Replace("\n", " ");

    //now create an Html Agility Pack document object
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);

    //remove comments, head, style and script tags
    foreach (HtmlNode node in doc.DocumentNode.SafeSelectNodes("//comment() | //script | //style | //head"))
    {
        node.ParentNode.RemoveChild(node);
    }

    //now remove all "meaningless" inline elements like "span"
    foreach (HtmlNode node in doc.DocumentNode.SafeSelectNodes("//span | //label")) //add "b", "i" if required
    {
        node.ParentNode.ReplaceChild(HtmlNode.CreateNode(node.InnerHtml), node);
    }

    //block-elements - convert to line-breaks
    foreach (HtmlNode node in doc.DocumentNode.SafeSelectNodes("//p | //div")) //you could add more tags here
    {
        //we add a "\n" ONLY if the node contains some plain text as "direct" child
        //meaning - text is not nested inside children, but only one-level deep

        //use XPath to find direct "text" in element
        var txtNode = node.SelectSingleNode("text()");

        //no "direct" text - NOT ADDING the \n
        if (txtNode == null || txtNode.InnerHtml.Trim() == "") continue;

        //"surround" the node with line breaks
        node.ParentNode.InsertBefore(doc.CreateTextNode("\r\n"), node);
        node.ParentNode.InsertAfter(doc.CreateTextNode("\r\n"), node);
    }

    //todo: might need to replace multiple "\n\n" into one here, I'm still testing...

    //now BR tags - simply replace with "\n" and forget
    foreach (HtmlNode node in doc.DocumentNode.SafeSelectNodes("//br"))
        node.ParentNode.ReplaceChild(doc.CreateTextNode("\r\n"), node);

    //todo - you should probably add "&code;" processing, to decode all the &nbsp; and such

    //finally - return the text which will have our inserted line-breaks in it
    return doc.DocumentNode.InnerText.Trim();
}

//here's the extension method I use
private static HtmlNodeCollection SafeSelectNodes(this HtmlNode node, string selector)
{
    return (node.SelectNodes(selector) ?? new HtmlNodeCollection(node));
}
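
The two todo items in the code (collapsing repeated line breaks and decoding entities such as &nbsp;) are left open by the answer. A rough JavaScript sketch of what that post-processing might look like is below. The helper names and the small entity table are assumptions made for this illustration, not part of the original code, and mapping &nbsp; to an ordinary space is a deliberate simplification.

```javascript
// Hypothetical post-processing helpers for text produced by an HTML-to-text step.

// Collapse two or more consecutive line breaks (plus any spaces or tabs
// that follow each of them) into a single "\n".
function collapseLineBreaks(text) {
  return text.replace(/(?:\r?\n[ \t]*){2,}/g, "\n");
}

// Only a handful of named entities are handled here; a real decoder covers many more.
const BASIC_ENTITIES = { nbsp: " ", amp: "&", lt: "<", gt: ">", quot: '"' };

// Decode the entities listed above in a single pass, so "&amp;lt;"
// becomes "&lt;" and is not decoded a second time.
function decodeBasicEntities(text) {
  return text.replace(/&(nbsp|amp|lt|gt|quot);/g, (match, name) => BASIC_ENTITIES[name]);
}

console.log(JSON.stringify(collapseLineBreaks("first\r\n\r\n\r\nsecond")));
console.log(decodeBasicEntities("fish&nbsp;&amp;&nbsp;chips"));
```

In the C# code itself, a library routine for entity decoding would likely be a better choice than a hand-written table.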

How can I convert/replace every newline to 'br/'?

You need to use html_safe if you want to render embedded HTML:

<%= @the_string.html_safe %>

If it might be nil, raw(@the_string) won't throw an exception. I'm a bit ambivalent about raw; I almost never try to display a string that might be nil.
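
Marking a string as safe means any markup a user typed into it is rendered too, so it is usually worth escaping the text before inserting the br tags. The following JavaScript sketch shows that ordering; escapeHtml and newlinesToBr are hypothetical names chosen for this example, not Rails helpers.

```javascript
// Hypothetical helper: escape the characters that are significant in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Escape first, then turn each line break into <br/>.
// Doing it in the other order would escape the inserted tags as well.
function newlinesToBr(text) {
  return escapeHtml(text).replace(/\r?\n/g, "<br/>");
}

console.log(newlinesToBr("1 < 2\nsecond line"));
```

The result can then be treated as trusted markup, because the only tags left in it are the ones the function added.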

Removing newline after h1 tags?

Sounds like you want to format them as inline. By default, h1 and h2 are block-level elements which span the entire width of the line. You can change them to inline with css like this:

h1, h2 {
    display: inline;
}

Here's an article that explains the difference between block and inline in more detail: http://www.webdesignfromscratch.com/html-css/css-block-and-inline/

To maintain vertical padding, use inline-block, like this:

h1, h2 {
    display: inline-block;
}

How do I create a new line in Javascript?

Use \n for a newline character.

document.write("\n");

You can also have more than one:

document.write("\n\n\n"); // 3 new lines!  My oh my!

However, if this is rendering to HTML, you will want to use the HTML tag for a newline:

document.write("<br>");

The string Hello\n\nTest in your source will look like this:

Hello

Test

The string Hello<br><br>Test will look like this in HTML source:

Hello<br><br>Test

The HTML version renders as line breaks for the person viewing the page; the \n only moves the text to the next line in the page source (if it is written into an HTML page).
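
The distinction can be captured in a small helper that picks the separator based on where the text is going. This is only a sketch; joinLines and its target parameter are invented for the example.

```javascript
// Hypothetical helper: join lines with "\n" for plain text
// or with "<br>" when the result will be rendered as HTML.
function joinLines(lines, target) {
  const separator = target === "html" ? "<br>" : "\n";
  return lines.join(separator);
}

// The empty string in the middle produces the blank line between Hello and Test.
const lines = ["Hello", "", "Test"];
console.log(JSON.stringify(joinLines(lines, "text")));
console.log(joinLines(lines, "html"));
```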

HTML 5: Is it br, br/, or br /?

Simply <br> is sufficient.

The other forms are there for compatibility with XHTML; to make it possible to write the same code as XHTML, and have it also work as HTML. Some systems that generate HTML may be based on XML generators, and thus do not have the ability to output just a bare <br> tag; if you're using such a system, it's fine to use <br/>, it's just not necessary if you don't need to do it.

Very few people actually use XHTML, however. You need to serve your content as application/xhtml+xml for it to be interpreted as XHTML, and that will not work in old versions of IE. It also means that any small error you make will prevent your page from being displayed in browsers that do support XHTML. So, most of what looks like XHTML on the web is actually being served, and interpreted, as HTML. See "Serving XHTML as text/html Considered Harmful" for some more information.

Replacing line breaks with br tags in multi-line text nodes not enclosed in tags

So it's a little more complicated than what I said in my comment, but I think something like this might work:

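import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public static void main (String[] args)
{
    String text = "text11\n"
            + "text 21<p>tagged text1\n"
            + "tagged text2</p>\n"
            + "text 2";

    // parseBodyFragment already wraps the fragment in a <body> element
    Document doc = Jsoup.parseBodyFragment(text);
    Element body = doc.body();
    List<Node> children = body.childNodes();
    StringBuilder sb2 = new StringBuilder();
    for (Node n : children) {
        if (n instanceof TextNode) {
            // only text nodes directly under <body> get their newlines replaced;
            // appending the string directly keeps "<br/>" from being escaped
            // (note: getWholeText() returns the unescaped text)
            sb2.append(((TextNode) n).getWholeText().replace("\n", "<br/>"));
        } else {
            // tagged content is put back unchanged
            sb2.append(n.outerHtml());
        }
    }
    System.out.println(sb2.toString());
}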

Basically get all the Nodes, do a replace on the TextNodes, and put them back together. I'm not 100% sure this will work as-is, since I am not able to test it at the moment. But hopefully it gets the idea across.

What I said in my comment doesn't work because you have to be able to put the child elements back in place between the text. You can't do that if you just use ownText().

I haven't used Jsoup much myself, so improvements are welcome if anyone has any.

Keep line breaks in HTML string

HTML, in general, uses br tags to denote a new line. A plain textarea does not; it uses whatever newline sequence the user's system uses, and that can vary by operating system.

Your simplest solution is to use CSS:

<main role="main" class="container">
<p style="margin-bottom: 2rem;white-space:pre-wrap;">{{review.body}}</p>
</main>

This will maintain any "white space" formatting, including additional spaces.

If you want to actually replace the newline characters with br tags, you can use the following regex:

<main role="main" class="container">
<p style="margin-bottom: 2rem;" [innerHTML]="review.body.replace(/(?:\r\n|\r|\n)/g, '<br>')"></p>
</main>

Edit: Thanks to ConnorsFan for the heads up on replace not working with interpolation.
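
The order of the alternatives in that regex matters: \r\n is listed first so that a Windows line ending becomes one br tag rather than two. A minimal JavaScript sketch of the difference, with function names invented for this illustration:

```javascript
// Hypothetical helper using the same alternation as the template above:
// "\r\n" is tried before "\r" and "\n", so each line ending yields one <br>.
function lineBreaksToBr(text) {
  return text.replace(/(?:\r\n|\r|\n)/g, "<br>");
}

// For contrast: treating "\r" and "\n" independently doubles up on "\r\n".
function naiveLineBreaksToBr(text) {
  return text.replace(/[\r\n]/g, "<br>");
}

const sample = "windows\r\nold mac\runix\nend";
console.log(lineBreaksToBr(sample));
console.log(naiveLineBreaksToBr(sample));
```

In a component, a function like this would typically be called from the class or a pipe rather than written inline in the template.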

replace br tag from a string in php

preg_replace('/<br\s*\/?>/i', "\n", $your_string);

