What is the best practice to parse html in swift?
There are several good libraries for parsing HTML in Swift and Objective-C, including the following:
- hpple
- NDHpple
- Kanna (formerly Swift-HTML-Parser)
- Fuzi
- SwiftSoup
- Ji
Here are examples for each of the libraries listed above. Most of them use XPath queries:
hpple:
let data = NSData(contentsOfFile: path)
let doc = TFHpple(htmlData: data)
if let elements = doc.searchWithXPathQuery("//a/@href[ends-with(.,'.txt')]") as? [TFHppleElement] {
    for element in elements {
        print(element.content)
    }
}
NDHpple:
let data = NSData(contentsOfFile: path)!
let html = NSString(data: data, encoding: NSUTF8StringEncoding)!
let doc = NDHpple(HTMLData: html)
if let elements = doc.searchWithXPathQuery("//a/@href[ends-with(.,'.txt')]") {
    for element in elements {
        print(element.children?.first?.content)
    }
}
Kanna (XPath and CSS Selectors):
let html = "<html><head></head><body><ul><li><input type='image' name='input1' value='string1value' class='abc' /></li><li><input type='image' name='input2' value='string2value' class='def' /></li></ul><span class='spantext'><b>Hello World 1</b></span><span class='spantext'><b>Hello World 2</b></span><a href='example.com'>example(English)</a><a href='example.co.jp'>example(JP)</a></body>"
if let doc = Kanna.HTML(html: html, encoding: NSUTF8StringEncoding) {
    let bodyNode = doc.body
    if let inputNodes = bodyNode?.xpath("//a/@href[ends-with(.,'.txt')]") {
        for node in inputNodes {
            print(node.contents)
        }
    }
}
Fuzi (XPath and CSS Selectors):
let html = "<html><head></head><body><ul><li><input type='image' name='input1' value='string1value' class='abc' /></li><li><input type='image' name='input2' value='string2value' class='def' /></li></ul><span class='spantext'><b>Hello World 1</b></span><span class='spantext'><b>Hello World 2</b></span><a href='example.com'>example(English)</a><a href='example.co.jp'>example(JP)</a></body>"
do {
    // if encoding is omitted, it defaults to NSUTF8StringEncoding
    let doc = try HTMLDocument(string: html, encoding: NSUTF8StringEncoding)
    // XPath queries
    for anchor in doc.xpath("//a/@href[ends-with(.,'.txt')]") {
        print(anchor.stringValue)
    }
} catch let error {
    print(error)
}
The ends-with function is part of XPath 2.0.
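One caveat about ends-with: as far as I know, hpple, Kanna, Fuzi and Ji all wrap libxml2, which implements XPath 1.0 only, so a query using ends-with may fail or return nothing. If that happens, the same test can be written with the XPath 1.0 functions substring and string-length. Here is a minimal sketch; the helper name endsWithPredicate is made up for illustration and is not part of any of these libraries, and it assumes the suffix contains no single quote.

```swift
/// Hypothetical helper: builds an XPath 1.0 predicate equivalent to
/// ends-with(., suffix). Assumes `suffix` contains no single quote.
func endsWithPredicate(_ suffix: String) -> String {
    // XPath counts Unicode characters, so count scalars rather than graphemes.
    let length = suffix.unicodeScalars.count
    // substring(s, start) returns everything from the 1-based position `start`,
    // so start = string-length(s) - length + 1.
    return "substring(., string-length(.) - \(length - 1)) = '\(suffix)'"
}

let query = "//a/@href[\(endsWithPredicate(".txt"))]"
print(query)
// //a/@href[substring(., string-length(.) - 3) = '.txt']
```

The resulting string can be passed to any of the XPath calls shown above in place of the ends-with query.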
SwiftSoup (CSS Selectors):
do {
    let doc: Document = try SwiftSoup.parse("...")
    let links: Elements = try doc.select("a[href]") // a with href
    let pngs: Elements = try doc.select("img[src$=.png]") // img with src ending .png
    let masthead: Element? = try doc.select("div.masthead").first() // div with class=masthead
    let resultLinks: Elements? = try doc.select("h3.r > a") // a that is a direct child of h3 with class r
} catch Exception.Error(let type, let message) {
    print(message)
} catch {
    print("error")
}
Ji (XPath):
let jiDoc = Ji(htmlURL: URL(string: "http://www.apple.com/support")!)
let titleNode = jiDoc?.xPath("//head/title")?.first
print("title: \(titleNode?.content)") // title: Optional("Official Apple Support")
I hope this helps you.
Is there an easy solution for parsing html in swift to get individual elements into their own variable?
Parsing HTML without a third-party library is not practical without a web view, but you can easily load the HTML into a WKWebView and run getElementsByTagName with JavaScript on it to get anything from the HTML, like this:
1- Define the js code:
let js = "document.getElementsByTagName(\"title\")[0].innerHTML"
2- Import WebKit and load the HTML into a web view:
import UIKit
import WebKit

class MyViewController: UIViewController {
    let html = """
    <#the HTML code, can be loaded from anywhere#>
    """

    override func loadView() {
        let webView = WKWebView()
        webView.navigationDelegate = self // Here is the delegate
        webView.loadHTMLString(html, baseURL: nil)
        self.view = webView
    }
}
3- Conform to the delegate and implement this method:
extension MyViewController: WKNavigationDelegate {
    func webView(_ webView: WKWebView, didFinish navigation: WKNavigation!) {
        webView.evaluateJavaScript(js) { (result, error) in
            guard error == nil else {
                print(error!)
                return
            }
            print(String(describing: result))
        }
    }
}
Note 1: Remember that getElementsByTagName returns an array-like collection, so you must pass the index of the element you want, like [0].
Note 2: Because the script runs in the web view's JavaScript engine (JavaScriptCore) against the page's DOM, this cannot be done without a web view, and evaluateJavaScript must be called on the main thread.
Note 3: You must wait for the delegate callback before evaluating the script, even if you pass the HTML statically.
Note 4: You can also use a third-party framework like SwiftSoup to do this.
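To make Note 1 concrete, here is a small sketch of a helper that builds the script for any tag and index, so the escaping of the inner quotes is done in one place. The name innerHTMLScript is hypothetical (it is not part of WebKit), and it assumes the tag argument is a plain tag name with no quotes in it.

```swift
/// Hypothetical helper (not part of WebKit): builds the JavaScript expression
/// that reads the innerHTML of the `index`-th element with the given tag name.
/// Assumes `tag` is a plain tag name such as "title" or "p".
func innerHTMLScript(forTag tag: String, at index: Int = 0) -> String {
    return "document.getElementsByTagName(\"\(tag)\")[\(index)].innerHTML"
}

let titleScript = innerHTMLScript(forTag: "title")
let secondParagraphScript = innerHTMLScript(forTag: "p", at: 1)
print(titleScript)           // document.getElementsByTagName("title")[0].innerHTML
print(secondParagraphScript) // document.getElementsByTagName("p")[1].innerHTML
```

Either string can be passed to evaluateJavaScript in the delegate method from step 3.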
Trying to parse HTML in Swift 4 using only the Standard Library
You can use a regex to find all occurrences of a string between two specific strings (check this SO answer) and use the extension method ranges(of:) from this answer to get all ranges of that regex pattern. You just need to pass the option .regularExpression to that method.
extension String {
    func ranges(of string: String, options: CompareOptions = .literal) -> [Range<Index>] {
        var result: [Range<Index>] = []
        var start = startIndex
        while let range = range(of: string, options: options, range: start..<endIndex) {
            result.append(range)
            start = range.lowerBound < range.upperBound ? range.upperBound : index(range.lowerBound, offsetBy: 1, limitedBy: endIndex) ?? endIndex
        }
        return result
    }
    func slices(from: String, to: String) -> [Substring] {
        let pattern = "(?<=" + from + ").*?(?=" + to + ")"
        return ranges(of: pattern, options: .regularExpression)
            .map{ self[$0] }
    }
}
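One thing to watch with slices(from:to:) is that the two markers are pasted into the pattern as-is, so a marker containing regex metacharacters such as "." or "?" would change the meaning of the pattern. Here is a sketch of the same lookbehind/lookahead idea with the markers escaped first. It is a variation using NSRegularExpression directly, not the code from the linked answers; the function name substrings(in:between:and:) and the sample HTML are made up for illustration.

```swift
import Foundation

/// Returns every substring of `text` found between the `start` and `end` markers.
func substrings(in text: String, between start: String, and end: String) -> [String] {
    // Escape the markers so characters such as "." or "?" are matched literally.
    let pattern = "(?<=" + NSRegularExpression.escapedPattern(for: start) + ")"
        + ".*?"
        + "(?=" + NSRegularExpression.escapedPattern(for: end) + ")"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).compactMap { match in
        // Convert the NSRange back to a String range before slicing.
        Range(match.range, in: text).map { String(text[$0]) }
    }
}

// Made-up sample markup, not taken from a real page.
let sample = "<a class=\"row\" href=\"/item?id=1\">One</a><a class=\"row\" href=\"/item?id=2\">Two</a>"
let links = substrings(in: sample, between: "class=\"row\" href=\"", and: "\"")
print(links) // ["/item?id=1", "/item?id=2"]
```

Like the original, this only matches within a single line, because "." does not match newlines by default.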
Testing playground
let itemListURL = URL(string: "http://steamcommunity.com/market/search?appid=252490")!
let itemListHTML = try! String(contentsOf: itemListURL, encoding: .utf8)
let result = itemListHTML.slices(from: "market_listing_row_link\" href=\"", to: "\"")
result.forEach({print($0)})
Result
http://steamcommunity.com/market/listings/252490/Night%20Howler%20AK47
http://steamcommunity.com/market/listings/252490/Hellcat%20SAR
http://steamcommunity.com/market/listings/252490/Metal
http://steamcommunity.com/market/listings/252490/Volcanic%20Stone%20Hatchet
http://steamcommunity.com/market/listings/252490/Box
http://steamcommunity.com/market/listings/252490/High%20Quality%20Bag
http://steamcommunity.com/market/listings/252490/Utilizer%20Pants
http://steamcommunity.com/market/listings/252490/Lizard%20Skull
http://steamcommunity.com/market/listings/252490/Frost%20Wolf
http://steamcommunity.com/market/listings/252490/Cloth
parsing HTML in swift
A couple of thoughts:
The use of // says "find this anywhere in the HTML". If you want to control which level to consider, just use / and follow the path from the root of the document. For example, to get the second level, but not the first or third levels, you'd do something like:
let tutorialsParser = TFHpple(HTMLData: data)
let tutorialsXPathString = "/html/body/ul/li/ul/li"
if let tutorialNodes = tutorialsParser.searchWithXPathQuery(tutorialsXPathString) as? [TFHppleElement] {
    for element in tutorialNodes {
        let content = element.firstChild.content.stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceAndNewlineCharacterSet())
        let identifier = element.attributes["id"] as! String
        print("id = \(identifier); content = \(content)")
    }
}
Note, I'm not sure why you were using the scanner, but if you want the attributes of an element, you can use the attributes method.
I also defined tutorialNodes to be an array of TFHppleElement objects, which simplifies the for loop a bit.
If you wanted the top level /ul/li followed by the second level, but not the third level, you could do something like:
let tutorialsParser = TFHpple(HTMLData: data)
let tutorialsXPathString = "/html/body/ul/li"
if let tutorialNodes = tutorialsParser.searchWithXPathQuery(tutorialsXPathString) as? [TFHppleElement] {
    for element in tutorialNodes {
        let content = element.firstChild.content.stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceAndNewlineCharacterSet())
        let identifier = element.attributes["id"] as! String
        print("id = \(identifier); content = \(content)")
        if let ul = element.childrenWithTagName("ul") as? [TFHppleElement] {
            if let li = ul.first?.childrenWithTagName("li") as? [TFHppleElement] {
                for element in li {
                    let content = element.firstChild.content.stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceAndNewlineCharacterSet())
                    let identifier = element.attributes["id"] as! String
                    print(" child id = \(identifier); content = \(content)")
                }
            }
        }
    }
}
Or you could do something like:
let tutorialsParser = TFHpple(HTMLData: data)
let tutorialsXPathString = "/html/body/ul/li"
if let tutorialNodes = tutorialsParser.searchWithXPathQuery(tutorialsXPathString) as? [TFHppleElement] {
    for element in tutorialNodes {
        let content = element.firstChild.content.stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceAndNewlineCharacterSet())
        let identifier = element.attributes["id"] as! String
        print("id = \(identifier); content = \(content)")
        if let children = element.searchWithXPathQuery("/html/body/li/ul/li") as? [TFHppleElement] {
            for element in children {
                let content = element.firstChild.content.stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceAndNewlineCharacterSet())
                let identifier = element.attributes["id"] as! String
                print(" child id = \(identifier); content = \(content)")
            }
        }
    }
}
Swift - Parsing a Web Page
I found the solution:
import UIKit
import Alamofire
import SwiftSoup

class ViewController: UIViewController {
    override func viewDidLoad() {
        super.viewDidLoad()
        let diyanetURL = "https://namazvakitleri.diyanet.gov.tr/tr-TR/8648"
        // let params = ["ulkeId" : 2, "ilId" : 500,"ilceId" : 9146]
        Alamofire.request(diyanetURL, method: .post, parameters: nil, encoding: URLEncoding.default).validate(contentType: ["application/x-www-form-urlencoded"]).response { (response) in
            if let data = response.data, let utf8Text = String(data: data, encoding: .utf8) {
                do {
                    let html: String = utf8Text
                    let doc: Document = try SwiftSoup.parse(html)
                    // Use try (not try!) so selector errors reach the catch block instead of crashing.
                    for row in try doc.select("tr") {
                        print("------------------")
                        for col in try row.select("td") {
                            print(try col.text())
                        }
                    }
                } catch let error {
                    print(error.localizedDescription)
                }
            }
        }
    }
}