Converting HTML Text into Plain Text Using Objective-C

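It depends on which iOS version you are targeting. Since iOS 7 there has been a built-in initializer that not only strips the HTML tags but also applies the corresponding formatting to the resulting attributed string. If you only need the plain text, read the string property of the result: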

Xcode 9/Swift 4

if let htmlStringData = htmlString.data(using: .utf8),
   let attributedString = try? NSAttributedString(data: htmlStringData,
                                                  options: [.documentType: NSAttributedString.DocumentType.html],
                                                  documentAttributes: nil) {
    print(attributedString)
}

You can even create an extension like this:

extension String {
    var htmlToAttributedString: NSAttributedString? {
        guard let data = self.data(using: .utf8) else {
            return nil
        }

        do {
            return try NSAttributedString(data: data,
                                          options: [.documentType: NSAttributedString.DocumentType.html,
                                                    .characterEncoding: String.Encoding.utf8.rawValue],
                                          documentAttributes: nil)
        } catch {
            print("Cannot convert html string to attributed string: \(error)")
            return nil
        }
    }
}

Note that this sample code assumes UTF-8 encoding. You could also write a function instead of a computed property and pass the encoding in as a parameter.

Swift 3

let attributedString = try NSAttributedString(data: htmlString.data(using: .utf8)!,
                                              options: [NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType],
                                              documentAttributes: nil)

Objective-C

NSAttributedString *attributedString = [[NSAttributedString alloc] initWithData:[htmlString dataUsingEncoding:NSUTF8StringEncoding]
                                                                        options:@{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                                                                                  NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)}
                                                             documentAttributes:nil
                                                                          error:nil];
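
The initializer above produces an attributed string, while the question asks for plain text. Here is a minimal sketch of the remaining step, assuming UIKit is available and the call is made on the main thread (Apple's documentation warns against using the HTML importer from a background thread). The helper name PlainTextFromHTML is hypothetical, not part of any framework; it takes the encoding as a parameter and reports failures instead of passing nil for the error:

```objc
#import <UIKit/UIKit.h>

// Hypothetical helper (name is an assumption, not framework API).
// Returns the plain-text content of an HTML string, or nil on failure.
// Call on the main thread.
static NSString *PlainTextFromHTML(NSString *html, NSStringEncoding encoding) {
    NSData *data = [html dataUsingEncoding:encoding];
    if (data == nil) {
        return nil; // the string cannot be represented in this encoding
    }

    NSDictionary *options = @{
        NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
        NSCharacterEncodingDocumentAttribute: @(encoding)
    };

    NSError *error = nil;
    NSAttributedString *attributed =
        [[NSAttributedString alloc] initWithData:data
                                         options:options
                              documentAttributes:nil
                                           error:&error];
    if (attributed == nil) {
        NSLog(@"Cannot convert HTML string: %@", error);
        return nil;
    }

    // The string property drops all attributes and keeps only the characters.
    return attributed.string;
}
```

Block-level elements such as paragraphs typically leave newlines in the result, so trimming whitespace afterwards may be useful.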

If you just need to remove everything between < and > (a quick and dirty approach that breaks when the text itself contains these characters), add this method to a category on NSString:

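- (NSString *)stringByStrippingHTML {
    NSRange r;
    // Under ARC a plain copy is enough; with manual reference counting use [[self copy] autorelease].
    NSString *s = [self copy];
    // Repeatedly delete the first remaining match of the tag pattern.
    while ((r = [s rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound) {
        s = [s stringByReplacingCharactersInRange:r withString:@""];
    }
    return s;
}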
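
For typical input the loop above can be collapsed into a single Foundation call. The sketch below uses the same pattern and needs only Foundation; the function name StripTags is hypothetical. The second call in the comments shows the failure mode mentioned above, where literal < and > characters in the text are treated as a tag:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: removes every "<...>" run in one pass.
static NSString *StripTags(NSString *html) {
    return [html stringByReplacingOccurrencesOfString:@"<[^>]+>"
                                           withString:@""
                                              options:NSRegularExpressionSearch
                                                range:NSMakeRange(0, html.length)];
}

// StripTags(@"<p>Hello <b>world</b></p>")  returns @"Hello world"
// StripTags(@"a < b and c > d")            returns @"a  d" (the comparison is lost)
```

Unlike the loop, this version makes one pass and does not rescan text that only becomes tag-like after an earlier removal.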

How to convert any string text into plain-text format in objective-c

I wonder if you're running into this problem...

By default, pasting formatted / rich text into a text field will not keep the formatting. It will be pasted as plain-text, and will be displayed with the font you've set for your field.

Try a test. Copy and paste this next line into your text field:

This should lose formatting.

Then try copy/paste with this line:

𝗧𝗵𝗶𝘀 should 𝗸𝗲𝗲𝗽 formatting.

The first test line uses html tags. If you inspect the source, it should look like this:

<strong>This</strong> should <strong><em>lose</em></strong> formatting.

The second test line, however, uses Unicode characters that merely look bold, and if you inspect the source it will look exactly as it does here in this answer.

If that is what you're seeing, there is no (easy) way to "remove" the formatting, because there is no formatting.

It's much the same as if the user pastes an emoji into your text field.
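
To make the point concrete, the sketch below inspects one such character. It also shows one partial workaround that is not part of the original answer: Unicode compatibility normalization (NFKC), available in Foundation as precomposedStringWithCompatibilityMapping, should fold the "styled" mathematical letters back to their ordinary counterparts. The function names are hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: applies NFKC normalization. Assumption: the styled
// characters come from the Mathematical Alphanumeric Symbols block, whose
// compatibility mappings point at the ordinary letters.
static NSString *FoldStyledUnicode(NSString *input) {
    return input.precomposedStringWithCompatibilityMapping;
}

static void DemonstrateStyledUnicode(void) {
    // U+1D5E7 MATHEMATICAL SANS-SERIF BOLD CAPITAL T: a distinct character,
    // not a "T" with a bold attribute attached.
    NSString *styled = @"\U0001D5E7";

    // Two UTF-16 code units (a surrogate pair), and not equal to a plain "T".
    NSLog(@"UTF-16 length: %lu", (unsigned long)styled.length);
    NSLog(@"equals plain T: %d", [styled isEqualToString:@"T"]);

    // After folding, the character should compare equal to a plain "T".
    NSLog(@"after folding: %@", FoldStyledUnicode(styled));
}
```

Be aware that the same normalization also rewrites other characters, for example ligatures, superscript digits, and full-width forms, so it is not a neutral "remove formatting" step.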

How to Convert HTML String to the Formatted String in Objective C

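Try this one; it might be helpful. Note that it relies on an undocumented key (contentToHTMLString) that is not part of the public UIKit API, so it may stop working on newer iOS versions or cause problems during App Store review.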

textview = [[UITextView alloc] initWithFrame:CGRectMake(10, 130, 250, 170)];
NSString *str = @"<font color='red'>A</font><br/> shared photo of <font color='red'>B</font> with <font color='red'>C</font>, <font color='red'>D</font> ";
// Undocumented key: not part of the public UITextView API.
[textview setValue:str forKey:@"contentToHTMLString"];
textview.textAlignment = NSTextAlignmentLeft;
textview.editable = NO;
textview.font = [UIFont fontWithName:@"Verdana" size:20.0];
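
A sketch of the same idea using only public API, built on the NSAttributedString HTML initializer shown earlier on this page. The helper name MakeHTMLTextView is hypothetical, and the code assumes it runs on the main thread:

```objc
#import <UIKit/UIKit.h>

// Hypothetical helper: builds a read-only UITextView that displays `html`
// rendered through NSAttributedString. Call on the main thread.
static UITextView *MakeHTMLTextView(NSString *html, CGRect frame) {
    UITextView *textView = [[UITextView alloc] initWithFrame:frame];
    textView.editable = NO;
    textView.textAlignment = NSTextAlignmentLeft;

    NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];
    NSDictionary *options = @{
        NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
        NSCharacterEncodingDocumentAttribute: @(NSUTF8StringEncoding)
    };

    NSError *error = nil;
    NSAttributedString *rendered = nil;
    if (data != nil) {
        rendered = [[NSAttributedString alloc] initWithData:data
                                                    options:options
                                         documentAttributes:nil
                                                      error:&error];
    }

    if (rendered != nil) {
        // Fonts and colors come from the HTML itself.
        textView.attributedText = rendered;
    } else {
        // Fall back to showing the raw markup rather than an empty view.
        NSLog(@"Cannot render HTML: %@", error);
        textView.text = html;
    }
    return textView;
}
```

Setting the font property after assigning attributedText applies that font to the whole text, so any font changes specified in the HTML are lost.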

How to convert NSString HTML markup to plain text NSString?

You can do this by scanning the HTML with the NSScanner class:

- (NSString *)flattenHTML:(NSString *)html {
    NSScanner *theScanner = [NSScanner scannerWithString:html];

    while (![theScanner isAtEnd]) {
        // Reset on every pass so a stale tag is never removed twice.
        NSString *text = nil;

        // Skip ahead to the start of the next tag.
        [theScanner scanUpToString:@"<" intoString:NULL];

        // Capture the tag up to, but not including, its closing ">".
        if ([theScanner scanUpToString:@">" intoString:&text]) {
            // Remove every occurrence of that tag from the result.
            html = [html stringByReplacingOccurrencesOfString:[NSString stringWithFormat:@"%@>", text] withString:@""];
        }
    }

    // Trim the whitespace left behind at both ends.
    return [html stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
}
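
As a variation on the scanner approach, the sketch below collects the text between tags in a single pass instead of running a search-and-replace for every tag. The function name TextOutsideTags is hypothetical. Like the method above, it treats any < as the start of a tag and does not decode entities such as &amp;.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text found outside "<...>" runs.
static NSString *TextOutsideTags(NSString *html) {
    NSScanner *scanner = [NSScanner scannerWithString:html];
    // By default NSScanner skips whitespace; disable that to keep spaces between words.
    scanner.charactersToBeSkipped = nil;

    NSMutableString *result = [NSMutableString string];
    while (!scanner.isAtEnd) {
        NSString *chunk = nil;

        // Collect everything up to the next tag.
        if ([scanner scanUpToString:@"<" intoString:&chunk]) {
            [result appendString:chunk];
        }

        // Consume the tag itself: "<", its contents, and the closing ">".
        if ([scanner scanString:@"<" intoString:NULL]) {
            [scanner scanUpToString:@">" intoString:NULL];
            [scanner scanString:@">" intoString:NULL];
        }
    }

    return [result stringByTrimmingCharactersInSet:
                       [NSCharacterSet whitespaceAndNewlineCharacterSet]];
}

// TextOutsideTags(@"<p>Hello <b>world</b></p>")  returns @"Hello world"
```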

Hope this helps.


