Converting Docx Files to Text in Swift

Converting Docx Files To Text In Swift

Your initial issue is with how you get the string from the URL. String(File_Name) is not the correct way to convert a file URL into a file path. The proper way is to use the URL's path property.

let location = NSURL.fileURLWithPath(NSTemporaryDirectory())
let fileURL = location.URLByAppendingPathComponent("My File.docx")
let fileContent = try? NSString(contentsOfFile: fileURL.path, encoding: NSUTF8StringEncoding)

Note the many changes: the variables now follow Swift naming conventions (lowerCamelCase) and have clearer names.
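To see why the path matters, here is a minimal sketch in current Swift (URL instead of NSURL) that prints both forms of the same file URL. The file name is the one used above; nothing here reads the file.

```swift
import Foundation

// "My File.docx" in the temporary directory, mirroring the snippet above.
let fileURL = URL(fileURLWithPath: NSTemporaryDirectory())
    .appendingPathComponent("My File.docx")

// `path` is the plain file-system path, suitable for path-based APIs
// such as String(contentsOfFile:encoding:).
print(fileURL.path)

// `absoluteString` is the URL form: it carries the file:// scheme and
// percent-encodes the space, so it is not a usable file path.
print(fileURL.absoluteString)
```

Passing the second form to a path-based API is the kind of mistake that converting a URL straight to a String tends to produce.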

Now here's the thing: this still won't work, because a docx file is a zipped-up collection of XML and other files. You can't load a docx file into an NSString. You would need to use NSData to load the zip contents, then unzip it, then go through the extracted files to find the desired text. It's far from trivial, and it is far beyond the scope of a single Stack Overflow post.
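A quick way to convince yourself that a docx is a zip archive is to look at its first four bytes. The sketch below checks raw data for the ZIP local file header signature ("PK" followed by 0x03 0x04). The helper name looksLikeZipArchive is made up for this illustration, and it only detects the container; it does not unzip anything.

```swift
import Foundation

/// Returns true when `data` starts with the ZIP local file header
/// signature 0x50 0x4B 0x03 0x04 ("PK\u{03}\u{04}").
/// `looksLikeZipArchive` is a hypothetical helper name.
func looksLikeZipArchive(_ data: Data) -> Bool {
    let signature: [UInt8] = [0x50, 0x4B, 0x03, 0x04]
    return data.count >= signature.count
        && Array(data.prefix(signature.count)) == signature
}

// In-memory examples: a zip-style header versus plain UTF-8 text.
print(looksLikeZipArchive(Data([0x50, 0x4B, 0x03, 0x04, 0x14, 0x00])))
print(looksLikeZipArchive(Data("Hello World".utf8)))
```

For a real file you would load the bytes first, for example with Data(contentsOf: fileURL), and pass them to the same check.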

Convert text to .docx document file and open share dialog, Swift 2

I figured this one out for you.

Take the answer that user Casey posted, and just change this line:

let fileName = "\(title).txt"

to this

let fileName = "\(title).docx"

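and the text will be saved to a file with a .docx extension. Note that Swift does not convert anything here: the file still contains plain text, not a real Word document, so apps that validate the format (such as Word or Pages) may refuse to open it.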

How can I parse text from rich text format files like (.doc, .pages, .docx, etc.)

This can't be done using Data(contentsOf:) or String(contentsOf:) alone, because .docx is a zipped format consisting of XML and other files. To parse the text from a .docx file, you first have to unzip it. In my case, I used ZIPFoundation to unzip the document. Then parse the file named word/document.xml under the extraction path with any XML parser, and you will be able to get the text from the document.

Sources:

Converting Docx Files To Text In Swift

Reading or Converting word .doc files iOS

Read contents of a DOC (.docx or .doc) file and convert it to String

You can use SNDocx.

OR

Doing this is not as easy as you might imagine: a docx file is a zipped-up collection of XML and other files. You can't load a docx file into a String. You would need to use Data to load the zip contents, then unzip it, then go through the files to find word/document.xml, and finally read and parse that XML.

I use Zippy.

Look at this code:

// Requires `import Zippy` at the top of the file; this snippet runs inside a function.
guard let originalFileURL = Bundle.main.url(forResource: "test", withExtension: "docx") else {
    print("file not found :( ")
    return
}
do {
    // Use `try` (not `try!`) so a bad archive reaches the catch block instead of crashing.
    let zipFile = try ZipFile(url: originalFileURL)
    // Entry names in the archive:
    // - 0 : "[Content_Types].xml"
    // - 1 : "word/numbering.xml"
    // - 2 : "_rels/.rels"
    // - 3 : "word/theme/theme1.xml"
    // - 4 : "word/fontTable.xml"
    // - 5 : "word/document.xml"
    // - 6 : "word/settings.xml"
    // - 7 : "word/styles.xml"
    // - 8 : "word/_rels/document.xml.rels"

    for entryName in zipFile {
        // Match the exact entry; contains("document.xml") would also match
        // "word/_rels/document.xml.rels".
        if entryName == "word/document.xml", let data = zipFile[entryName] {
            print(String(data: data, encoding: .utf8) ?? "document.xml is not valid UTF-8")
        }
    }
} catch {
    print(error)
}

Output

<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\r<w:document xmlns:mc=\"http://schemas.openxmlformats.org/markup-compatibility/2006\" xmlns:o=\"urn:schemas-microsoft-com:office:office\" xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\" xmlns:m=\"http://schemas.openxmlformats.org/officeDocument/2006/math\" xmlns:v=\"urn:schemas-microsoft-com:vml\" xmlns:wp=\"http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing\" xmlns:w10=\"urn:schemas-microsoft-com:office:word\" xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\" xmlns:wne=\"http://schemas.microsoft.com/office/word/2006/wordml\" xmlns:sl=\"http://schemas.openxmlformats.org/schemaLibrary/2006/main\" xmlns:a=\"http://schemas.openxmlformats.org/drawingml/2006/main\" xmlns:pic=\"http://schemas.openxmlformats.org/drawingml/2006/picture\" xmlns:c=\"http://schemas.openxmlformats.org/drawingml/2006/chart\" xmlns:lc=\"http://schemas.openxmlformats.org/drawingml/2006/lockedCanvas\" xmlns:dgm=\"http://schemas.openxmlformats.org/drawingml/2006/diagram\" xmlns:wps=\"http://schemas.microsoft.com/office/word/2010/wordprocessingShape\" xmlns:wpg=\"http://schemas.microsoft.com/office/word/2010/wordprocessingGroup\" xmlns:w14=\"http://schemas.microsoft.com/office/word/2010/wordml\" xmlns:w15=\"http://schemas.microsoft.com/office/word/2012/wordml\"><w:body><w:p w:rsidR=\"00000000\" w:rsidDel=\"00000000\" w:rsidP=\"00000000\" w:rsidRDefault=\"00000000\" w:rsidRPr=\"00000000\" w14:paraId=\"00000000\"><w:pPr><w:contextualSpacing w:val=\"0\"/><w:rPr/></w:pPr><w:r w:rsidDel=\"00000000\" w:rsidR=\"00000000\" w:rsidRPr=\"00000000\"><w:rPr><w:rtl w:val=\"0\"/></w:rPr><w:t xml:space=\"preserve\">test</w:t></w:r></w:p><w:sectPr><w:pgSz w:h=\"15840\" w:w=\"12240\"/><w:pgMar w:bottom=\"1440\" w:top=\"1440\" w:left=\"1440\" w:right=\"1440\" w:header=\"0\"/><w:pgNumType w:start=\"1\"/></w:sectPr></w:body></w:document>

You then have to parse this XML. As the output above shows, the document's text sits inside w:t elements, so keep parsing until you reach this value:

<w:t xml:space=\"preserve\">test</w:t>
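One way to do that parsing is Foundation's XMLParser. The following is a minimal sketch, not a complete docx reader: it collects the characters inside every w:t element and adds a line break at the end of each paragraph (w:p). The names DocxTextCollector and plainText(fromDocumentXML:) are made up for this example, and it assumes the document uses the conventional w: prefix, as the output above does. Tabs, line breaks inside a run, tables, headers and footnotes are ignored.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML // XMLParser lives in this module on Linux
#endif

/// Collects the text of every <w:t> element and appends a newline
/// after each paragraph (<w:p>). Hypothetical helper for this sketch.
final class DocxTextCollector: NSObject, XMLParserDelegate {
    private(set) var text = ""
    private var insideTextRun = false

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        // Namespace processing is off by default, so elementName is the
        // qualified name, e.g. "w:t" (assumes the usual "w" prefix).
        if elementName == "w:t" { insideTextRun = true }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        if insideTextRun { text += string }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "w:t" { insideTextRun = false }
        if elementName == "w:p" { text += "\n" }
    }
}

/// Returns the plain text of a word/document.xml payload, or nil if
/// the XML could not be parsed.
func plainText(fromDocumentXML data: Data) -> String? {
    let collector = DocxTextCollector()
    let parser = XMLParser(data: data)
    parser.delegate = collector
    return parser.parse() ? collector.text : nil
}

// A cut-down document.xml with the same structure as the output above.
let sampleXML = """
<?xml version="1.0" encoding="UTF-8"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:body><w:p><w:r><w:t xml:space="preserve">test</w:t></w:r></w:p></w:body>
</w:document>
"""
print(plainText(fromDocumentXML: Data(sampleXML.utf8)) ?? "parse failed")
```

In practice you would pass in the Data that the unzip step returned for word/document.xml instead of the inline sample.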

docx XML Format reference

Create a word document, Swift

Unfortunately, it is very hard to create a .docx file by hand in Swift, given how complicated they are (you can see for yourself by changing the file extension on any old .docx file to .zip, which will reveal its inner structure). The next best thing is to simply create a .txt file, which can also be opened in Pages (though sadly not Docs). If you're looking for a more polished format, complete with formatting and possibly even images, you could choose to create a .pdf file.


Here are some code samples that might be of assistance:

Creating and sharing a .txt file in Swift 3:

func export(_ string: String, title: String) throws {
    // create a file path in a temporary directory
    let fileName = "\(title).txt"
    let filePath = (NSTemporaryDirectory() as NSString).appendingPathComponent(fileName)

    // save the string to the file
    try string.write(toFile: filePath, atomically: true, encoding: .utf8)

    // open share dialog

    // Initialize Document Interaction Controller
    self.interactionController = UIDocumentInteractionController(url: URL(fileURLWithPath: filePath))
    // Configure Document Interaction Controller
    self.interactionController!.delegate = self
    // Present Open In Menu
    self.interactionController!.presentOptionsMenu(from: yourexportbarbuttonoutlet, animated: true) // create an outlet from an Export bar button outlet, then use it as the `from` argument
}

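Because the function is marked throws, it has to be called with try (from a throwing function or inside a do/catch block):

try export("Hello World", title: "HelloWorld")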

to instantly create a txt file and open the share dialog for it.


Creating and sharing a simple .pdf file in Swift 3:

func openPDF(_ string: String, title: String) throws {
    // 1. Create a print formatter

    let html = "<h2>\(title)</h2><br><h4>\(string)</h4>" // create some text as the body of the PDF with html.

    let fmt = UIMarkupTextPrintFormatter(markupText: html)

    // 2. Assign print formatter to UIPrintPageRenderer

    let render = UIPrintPageRenderer()
    render.addPrintFormatter(fmt, startingAtPageAt: 0)

    // 3. Assign paperRect and printableRect

    let page = CGRect(x: 10, y: 10, width: 595.2, height: 841.8) // A4, 72 dpi, x and y are horizontal and vertical margins
    let printable = page.insetBy(dx: 0, dy: 0)

    render.setValue(NSValue(cgRect: page), forKey: "paperRect")
    render.setValue(NSValue(cgRect: printable), forKey: "printableRect")

    // 4. Create PDF context and draw

    let pdfData = NSMutableData()
    UIGraphicsBeginPDFContextToData(pdfData, CGRect.zero, nil)

    // A half-open range avoids a crash when numberOfPages is 0 (1...0 is an invalid range).
    for i in 0..<render.numberOfPages {
        UIGraphicsBeginPDFPage()
        let bounds = UIGraphicsGetPDFContextBounds()
        render.drawPage(at: i, in: bounds)
    }

    UIGraphicsEndPDFContext()

    // 5. Save PDF file

    let path = "\(NSTemporaryDirectory())\(title).pdf"
    pdfData.write(toFile: path, atomically: true)
    print("open \(path)") // check if we got the path right.
    // open share dialog
    print("opening share dialog")
    // Initialize Document Interaction Controller
    self.interactionController = UIDocumentInteractionController(url: URL(fileURLWithPath: path))
    // Configure Document Interaction Controller
    self.interactionController!.delegate = self
    // Present Open In Menu
    self.interactionController!.presentOptionsMenu(from: yourexportbarbuttonoutlet, animated: true) // create an outlet from an Export bar button outlet, then use it as the `from` argument
}

As with the .txt version, the function is marked throws, so it has to be called with try:

try openPDF("Hello World", title: "HelloWorld")

to instantly create a pdf file and open the share dialog for it.


Edit: Found an interesting (though not polished) workaround to getting text to open up in Google Docs: use the function from the "creating a .txt file" section here, and just change the filename to "\(title).docx". This will fool Docs into thinking it's a .docx document, which will allow the text to open in Docs successfully. Unfortunately, this creates an invalid document that can't be opened by Pages, Word, or really any other app because it doesn't actually create a real document file. And the Interaction Controller will make it look to the user like they can also open it in Pages, though that invariably fails.
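To make it concrete why that file is not a real document, here is a small sketch that writes text under a .docx name and reads it straight back. The helper name writeFakeDocx is made up for this illustration and leaves out the share dialog; the point is only that the bytes on disk are the original UTF-8 text, with no Word structure added.

```swift
import Foundation

/// Writes `text` to "<title>.docx" in the temporary directory and
/// returns the file URL. `writeFakeDocx` is a hypothetical helper name.
func writeFakeDocx(_ text: String, title: String) throws -> URL {
    let url = URL(fileURLWithPath: NSTemporaryDirectory())
        .appendingPathComponent("\(title).docx")
    try text.write(to: url, atomically: true, encoding: .utf8)
    return url
}

do {
    let url = try writeFakeDocx("Hello World", title: "HelloWorld")
    // Reading the file back as a plain string succeeds and returns the
    // original text, which would not be possible with a zipped docx package.
    let roundTripped = try String(contentsOf: url, encoding: .utf8)
    print(roundTripped == "Hello World")
} catch {
    print(error)
}
```

An app that only looks at the extension will offer to open the file, while one that checks the contents will reject it, which matches the behaviour described above.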


