Convert HTML to Plain Text in Swift

You can add this extension to convert your HTML to a plain string:

edit/update:

Discussion The HTML importer should not be called from a background
thread (that is, the options dictionary includes documentType with a
value of html). It will try to synchronize with the main thread, fail,
and time out. Calling it from the main thread works (but can still
time out if the HTML contains references to external resources, which
should be avoided at all costs). The HTML import mechanism is meant
for implementing something like markdown (that is, text styles,
colors, and so on), not for general HTML import.

Xcode 11.4 • Swift 5.2

extension Data {
    var html2AttributedString: NSAttributedString? {
        do {
            return try NSAttributedString(data: self, options: [.documentType: NSAttributedString.DocumentType.html, .characterEncoding: String.Encoding.utf8.rawValue], documentAttributes: nil)
        } catch {
            print("error:", error)
            return nil
        }
    }
    var html2String: String { html2AttributedString?.string ?? "" }
}

extension StringProtocol {
    var html2AttributedString: NSAttributedString? {
        Data(utf8).html2AttributedString
    }
    var html2String: String {
        html2AttributedString?.string ?? ""
    }
}

cell.detailTextLabel?.text = item.itemDescription.html2String
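
Given the main-thread note quoted above, one way to guard a call site is a small wrapper that hops to the main thread when needed. This is only a sketch: `onMainThread` is a hypothetical helper name, not an SDK function, and `DispatchQueue.main.sync` blocks the calling thread, so it will deadlock if the main thread is itself waiting on the caller.

```swift
import Foundation

/// Hypothetical helper (not part of any SDK): runs `work` on the main thread
/// and returns its result. Calls `work` directly when already on the main
/// thread, because `DispatchQueue.main.sync` from the main thread deadlocks.
func onMainThread<T>(_ work: () throws -> T) rethrows -> T {
    if Thread.isMainThread {
        return try work()
    }
    // Blocks the current (background) thread until the main thread has run `work`.
    return try DispatchQueue.main.sync(execute: work)
}
```

With the extension above in scope, a call might look like `let text = onMainThread { item.itemDescription.html2String }`.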

Swift - Convert HTML text to Attributed String

I tried this solution with your HTML and it worked fine:

let htmlText = "<medium><b><font color='#2f3744'>IPL Tamil Web Series Episode #3 | யாருடா Swetha ? | Tamil Comedy Web Series | Being Thamizhan</font></b></medium> has been succesfully scheduled on <medium><b><font color='#2f3744'>2018-05-23 08:51 PM</font></b></medium>"
let encodedData = Data(htmlText.utf8)
// Start with an empty value so the variable is always initialized, even if parsing fails.
var attributedString = NSAttributedString()

do {
    attributedString = try NSAttributedString(data: encodedData, options: [.documentType: NSAttributedString.DocumentType.html, .characterEncoding: String.Encoding.utf8.rawValue], documentAttributes: nil)
} catch {
    print(error.localizedDescription)
}

attributedString output:

IPL Tamil Web Series Episode #3 | யாருடா Swetha ? | Tamil Comedy Web Series | Being Thamizhan has been succesfully scheduled on 2018-05-23 08:51 PM

Convert attributed text to HTML in Swift 4

For Swift 4.x, the line:

let att = [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType]

should be:

let att = [NSAttributedString.DocumentAttributeKey.documentType: NSAttributedString.DocumentType.html]
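
To show where that attributes dictionary is used, here is a minimal sketch of exporting an attributed string to HTML with `data(from:documentAttributes:)`. The property name `exportedHTML` is made up for illustration, and the sketch assumes the exported data is UTF-8; it returns nil otherwise.

```swift
#if canImport(UIKit)
import UIKit
#elseif canImport(AppKit)
import AppKit
#endif

#if canImport(UIKit) || canImport(AppKit)
extension NSAttributedString {
    /// Hypothetical helper name: the whole attributed string exported as HTML,
    /// or nil if the export or the UTF-8 decoding fails.
    var exportedHTML: String? {
        let fullRange = NSRange(location: 0, length: length)
        let att: [NSAttributedString.DocumentAttributeKey: Any] = [
            .documentType: NSAttributedString.DocumentType.html
        ]
        guard let htmlData = try? self.data(from: fullRange, documentAttributes: att) else {
            return nil
        }
        return String(data: htmlData, encoding: .utf8)
    }
}
#endif
```

The result is a full HTML document (head, style block, and body), not just a fragment.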

Swift: Display HTML data in a label or textView

For Swift 5:

extension String {
    var htmlToAttributedString: NSAttributedString? {
        guard let data = data(using: .utf8) else { return nil }
        do {
            return try NSAttributedString(data: data, options: [.documentType: NSAttributedString.DocumentType.html, .characterEncoding: String.Encoding.utf8.rawValue], documentAttributes: nil)
        } catch {
            return nil
        }
    }
    var htmlToString: String {
        return htmlToAttributedString?.string ?? ""
    }
}

Then, whenever you want to put HTML text in a UITextView use:

textView.attributedText = htmlText.htmlToAttributedString

Converting HTML Tag to String

Try this instead; there is no need for the whole process you have gone through:

extension String {
    var htmlAttributedString: NSAttributedString? {
        do {
            // Swift 4+ option keys; in Swift 3 these were NSDocumentTypeDocumentAttribute and NSCharacterEncodingDocumentAttribute.
            return try NSAttributedString(data: Data(utf8), options: [.documentType: NSAttributedString.DocumentType.html, .characterEncoding: String.Encoding.utf8.rawValue], documentAttributes: nil)
        } catch {
            print("error:", error)
            return nil
        }
    }
    var htmlString: String {
        return htmlAttributedString?.string ?? ""
    }
}

Usage:

let html = "<strong>I want to test HTML Tags<br></strong>dsfhjdjf sjdfdj  djfjdfj djkf dfjdhf <strong>adjf<br>asks</strong>djfdkf<br><strong>dfdjk dkfjdk </strong>dfjik iai <strong>adsfhj</strong>"
let str = html.htmlString

So you basically just use the String extension on your strings. This is what I use in my projects.

Converting HTML text into plain text using Objective-C

It depends on which iOS version you are targeting. Since iOS 7 there is a built-in method that not only strips the HTML tags but also applies the formatting to the string:

Xcode 9/Swift 4

if let htmlStringData = htmlString.data(using: .utf8), let attributedString = try? NSAttributedString(data: htmlStringData, options: [.documentType: NSAttributedString.DocumentType.html], documentAttributes: nil) {
    print(attributedString)
}

You can even create an extension like this:

extension String {
    var htmlToAttributedString: NSAttributedString? {
        guard let data = self.data(using: .utf8) else {
            return nil
        }

        do {
            return try NSAttributedString(data: data, options: [.documentType: NSAttributedString.DocumentType.html, .characterEncoding: String.Encoding.utf8.rawValue], documentAttributes: nil)
        } catch {
            print("Cannot convert html string to attributed string: \(error)")
            return nil
        }
    }
}

Note that this sample code uses UTF-8 encoding. You can also write a function instead of a computed property and take the encoding as a parameter.
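
A function variant with the encoding as a parameter could look roughly like this. The name `attributedStringFromHTML(encoding:)` is invented for this sketch, and it swallows the error with `try?` rather than printing it.

```swift
#if canImport(UIKit)
import UIKit
#elseif canImport(AppKit)
import AppKit
#endif

#if canImport(UIKit) || canImport(AppKit)
extension String {
    /// Hypothetical name: same idea as the computed property above, but the
    /// caller chooses the encoding (UTF-8 by default).
    func attributedStringFromHTML(encoding: String.Encoding = .utf8) -> NSAttributedString? {
        guard let data = self.data(using: encoding) else { return nil }
        // The same encoding is used to build the data and to tell the importer how to read it.
        return try? NSAttributedString(
            data: data,
            options: [
                .documentType: NSAttributedString.DocumentType.html,
                .characterEncoding: encoding.rawValue
            ],
            documentAttributes: nil
        )
    }
}
#endif
```

As with the other HTML imports here, this should be called from the main thread.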

Swift 3

let attributedString = try NSAttributedString(data: htmlString.data(using: .utf8)!,
                                              options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType],
                                              documentAttributes: nil)

Objective-C

[[NSAttributedString alloc] initWithData:[htmlString dataUsingEncoding:NSUTF8StringEncoding] options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType, NSCharacterEncodingDocumentAttribute: [NSNumber numberWithInt:NSUTF8StringEncoding]} documentAttributes:nil error:nil];

If you just need to remove everything between < and > (dirty way!!!), which might be problematic if you have these characters in the string, use this:

- (NSString *)stringByStrippingHTML {
    NSRange r;
    // Under ARC no autorelease is needed (or allowed) here.
    NSString *s = [self copy];
    // Remove one <...> match at a time until none is left.
    while ((r = [s rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound) {
        s = [s stringByReplacingCharactersInRange:r withString:@""];
    }
    return s;
}
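
The same quick-and-dirty regex approach can be sketched in Swift. `strippingHTMLTags` is a hypothetical property name; like the Objective-C version, it only removes `<...>` runs and does not decode entities such as `&amp;`.

```swift
import Foundation

extension String {
    /// Hypothetical helper: removes every substring that looks like an HTML tag.
    /// A literal "<" followed later by ">" in the text is removed as well.
    var strippingHTMLTags: String {
        replacingOccurrences(of: "<[^>]+>", with: "", options: .regularExpression)
    }
}
```

Unlike the NSAttributedString importer, this does not involve WebKit, so it is not tied to the main thread.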

is there any function to make touchable url while converting html text to plain text

Here is how you convert HTML so that the link stays in the attributed text:

var htmlString = """
<p style=\"text-align: justify;\">The Ministry of Corporate Affairs (MCA) has informed vide Flash Alert that Form AGILE is likely to be revised on MCA21 Company Forms Download page with effect from May 31, 2019. </p><p style=\"text-align: justify;\"><a href=\"http://example.org\">Link</a></p>
"""

let attributedString = try? NSAttributedString(data: Data(htmlString.utf8), options: [
    .documentType: NSAttributedString.DocumentType.html,
    .characterEncoding: String.Encoding.utf8.rawValue
], documentAttributes: nil)
textView.attributedText = attributedString

How do I decode HTML entities in Swift?

This answer was last revised for Swift 5.2 and iOS 13.4 SDK.


There's no straightforward way to do that, but you can use NSAttributedString magic to make this process as painless as possible (be warned that this method will strip all HTML tags as well).

Remember to initialize NSAttributedString from the main thread only. It uses WebKit to parse HTML underneath, hence the requirement.

// This is a[0]["title"] in your case
let htmlEncodedString = "The Weeknd <em>‘King Of The Fall’</em>"

guard let data = htmlEncodedString.data(using: .utf8) else {
    return
}

let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
    .documentType: NSAttributedString.DocumentType.html,
    .characterEncoding: String.Encoding.utf8.rawValue
]

guard let attributedString = try? NSAttributedString(data: data, options: options, documentAttributes: nil) else {
    return
}

// The Weeknd ‘King Of The Fall’
let decodedString = attributedString.string

extension String {
    init?(htmlEncodedString: String) {
        guard let data = htmlEncodedString.data(using: .utf8) else {
            return nil
        }

        let options: [NSAttributedString.DocumentReadingOptionKey: Any] = [
            .documentType: NSAttributedString.DocumentType.html,
            .characterEncoding: String.Encoding.utf8.rawValue
        ]

        guard let attributedString = try? NSAttributedString(data: data, options: options, documentAttributes: nil) else {
            return nil
        }

        self.init(attributedString.string)
    }
}

let encodedString = "The Weeknd <em>‘King Of The Fall’</em>"
let decodedString = String(htmlEncodedString: encodedString)
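
If the main-thread requirement is a problem and only a few entities are expected, a hand-rolled decoder is one possible alternative. This is a different technique from the NSAttributedString approach above: the sketch below (with the made-up name `decodingBasicHTMLEntities`) handles just six named entities plus numeric references, and it does not strip tags.

```swift
import Foundation

extension String {
    /// Hypothetical helper: decodes a small fixed set of named entities and
    /// decimal/hex numeric character references. Anything else is left as-is.
    var decodingBasicHTMLEntities: String {
        let named: [String: Character] = [
            "amp": "&", "lt": "<", "gt": ">",
            "quot": "\"", "apos": "'", "nbsp": "\u{00A0}"
        ]

        // Turns the text between "&" and ";" into a character, if it is recognized.
        func decode(_ name: String) -> Character? {
            if name.hasPrefix("#x") || name.hasPrefix("#X") {
                return UInt32(name.dropFirst(2), radix: 16)
                    .flatMap { Unicode.Scalar($0) }
                    .map { Character($0) }
            }
            if name.hasPrefix("#") {
                return UInt32(name.dropFirst(), radix: 10)
                    .flatMap { Unicode.Scalar($0) }
                    .map { Character($0) }
            }
            return named[name]
        }

        var result = ""
        var rest = Substring(self)
        while let amp = rest.firstIndex(of: "&") {
            result.append(contentsOf: rest[..<amp])
            let tail = rest[rest.index(after: amp)...]
            if let semi = tail.firstIndex(of: ";"),
               let decoded = decode(String(tail[..<semi])) {
                // Recognized entity: emit the character and skip past the ";".
                result.append(decoded)
                rest = tail[tail.index(after: semi)...]
            } else {
                // Not an entity we know: keep the "&" literally and move on.
                result.append("&")
                rest = tail
            }
        }
        result.append(contentsOf: rest)
        return result
    }
}
```

For full entity coverage, the NSAttributedString route remains the more complete option.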

