Get Docx File Contents Using JavaScript/jQuery

With docxtemplater, you can easily get the full text of a Word document (docx only) by using the doc.getFullText() method.

HTML code:
<body>
  <button onclick="gettext()">Get document text</button>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/docxtemplater/3.26.2/docxtemplater.js"></script>
  <script src="https://unpkg.com/pizzip@3.1.1/dist/pizzip.js"></script>
  <script src="https://unpkg.com/pizzip@3.1.1/dist/pizzip-utils.js"></script>
  <script>
    // Fetch the file at url as binary content; the callback receives (error, content).
    function loadFile(url, callback) {
      PizZipUtils.getBinaryContent(url, callback);
    }

    function gettext() {
      loadFile(
        "https://docxtemplater.com/tag-example.docx",
        function (error, content) {
          if (error) {
            throw error;
          }
          // A .docx is a zip archive: unzip it, then hand it to docxtemplater.
          var zip = new PizZip(content);
          var doc = new window.docxtemplater(zip);
          var text = doc.getFullText();
          console.log(text);
          alert("Text is " + text);
        }
      );
    }
  </script>
</body>
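If promise-style code is preferred over nested callbacks, the loader can be wrapped in a Promise. The following is a minimal sketch: toPromiseLoader is a hypothetical helper name (not part of PizZip or docxtemplater), and the only assumption about the wrapped function is the (url, callback) signature with callback(error, content) shown in the snippet above.

```javascript
// Wrap a callback-style loader (url, callback(error, content)) in a Promise.
// toPromiseLoader is a hypothetical helper, not part of PizZip or docxtemplater.
function toPromiseLoader(loadFn) {
  return function (url) {
    return new Promise(function (resolve, reject) {
      loadFn(url, function (error, content) {
        if (error) {
          reject(error); // the failure becomes a rejected promise
        } else {
          resolve(content);
        }
      });
    });
  };
}

// Possible use in the page above (assumes loadFile, PizZip and
// window.docxtemplater are available as in that snippet):
//   var loadBinary = toPromiseLoader(loadFile);
//   loadBinary("https://docxtemplater.com/tag-example.docx")
//     .then(function (content) {
//       var doc = new window.docxtemplater(new PizZip(content));
//       console.log(doc.getFullText());
//     })
//     .catch(function (error) {
//       console.error("Could not load the document", error);
//     });
```

With this shape, a failed download ends up in .catch() instead of being thrown from inside the callback, where nothing can handle it.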

Output the text from a docx file to a text area

Here is what you can do using Docxtemplater:

Things to remember:

If you download the docxtemplater build file directly instead of loading it from the CDN as this script does, you will have to create new Docxtemplater() instead of new window.docxtemplater().

var openFile = function (event) {
  var input = event.target;
  var reader = new FileReader();
  reader.onload = function () {
    var zip = new JSZip(reader.result);
    var doc = new window.docxtemplater().loadZip(zip);
    var text = doc.getFullText();
    var node = document.getElementById('output');
    node.innerText = text;
  };
  reader.readAsBinaryString(input.files[0]);
};

<script src="https://cdnjs.cloudflare.com/ajax/libs/docxtemplater/3.1.9/docxtemplater.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jszip/2.6.1/jszip.js"></script>
<input type='file' onchange='openFile(event)'><br>
<div id='output'>...</div>

Problem with JavaScript reading text of docx

You can't do it that way. You must use a dedicated library to read docx files. If you open a docx file in Notepad, you will see that it is not plain text: the file also contains additional data, metadata, and so on.

Examples:

get docx file contents using javascript/jquery

JavaScript library to read doc and docx on client
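To make the "not plain text" point concrete: a .docx file is a ZIP archive, so its first bytes are the ZIP local file header signature (0x50 0x4B 0x03 0x04, which shows up as "PK" in a text editor). The sketch below checks for that signature; looksLikeZip is a hypothetical helper name, and a match only suggests a ZIP container, it does not prove the file is a valid .docx.

```javascript
// ZIP local file header signature: "PK" followed by 0x03 0x04.
const ZIP_SIGNATURE = [0x50, 0x4b, 0x03, 0x04];

// Returns true when the first four bytes match the ZIP signature.
// looksLikeZip is a hypothetical helper for illustration only.
function looksLikeZip(bytes) {
  if (bytes.length < ZIP_SIGNATURE.length) {
    return false;
  }
  return ZIP_SIGNATURE.every(function (value, index) {
    return bytes[index] === value;
  });
}

// Example inputs: a ZIP-like header and plain text.
const zipHeader = new Uint8Array([0x50, 0x4b, 0x03, 0x04, 0x14, 0x00]);
const plainText = new TextEncoder().encode("Hello, world");
console.log(looksLikeZip(zipHeader)); // true
console.log(looksLikeZip(plainText)); // false
```

In a browser, the bytes of a user-selected file could be obtained with new Uint8Array(await file.arrayBuffer()) before running the check.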

How to modify the current code to read .docx file using HTML5 file api

To read a DOCX file, you need to unzip its content (which is a mix of folders, XML files, and resources such as images).
You may find some pointers in this post:
Unzipping files
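Once the archive is unzipped, the body text normally lives in the word/document.xml part, where paragraphs are w:p elements and text runs are w:t elements. The following is a rough sketch of pulling the text out of that XML string. extractDocxText and decodeXmlEntities are hypothetical helper names; the approach is regex-based rather than a real XML parser, it ignores tabs, line breaks, headers, footers and footnotes, and it is not meant to describe how any particular library implements this.

```javascript
// Decode the five predefined XML entities. &amp; is handled last so that
// "&amp;lt;" becomes the literal text "&lt;" instead of "<".
function decodeXmlEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// Collect the text of every <w:t> run, one output line per <w:p> paragraph.
// Hypothetical helper: a regex-based sketch, not a full XML parser.
function extractDocxText(documentXml) {
  // Matches <w:t> and <w:t attr="..."> but not <w:tab/>, <w:tbl> or <w:tc>.
  var runPattern = /<w:t(?:\s[^>]*)?>([^<]*)<\/w:t>/g;
  return documentXml
    .split("</w:p>")
    .map(function (chunk) {
      var runs = Array.from(chunk.matchAll(runPattern), function (match) {
        return match[1];
      });
      return decodeXmlEntities(runs.join(""));
    })
    .filter(function (paragraph) {
      return paragraph.length > 0; // drops empty paragraphs and the trailing chunk
    })
    .join("\n");
}

// Small hand-written sample in the shape of word/document.xml.
var sampleXml =
  "<w:body>" +
  '<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t xml:space="preserve">Tom &amp; Ann</w:t></w:r></w:p>' +
  "<w:p><w:r><w:tab/><w:t>Second paragraph</w:t></w:r></w:p>" +
  "</w:body>";
console.log(extractDocxText(sampleXml));
```

With a zip library such as PizZip loaded, the XML string could presumably be obtained with new PizZip(content).file("word/document.xml").asText() and then passed to extractDocxText. For anything beyond a quick experiment, a dedicated library is the safer choice.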

I doubt you can read a DOC file because it's a binary (and closed) format.

Unrecognized character result content for function Load jQuery?

Instead of the jQuery .load() function, I use fancybox.jquery to read Word documents (docx) and other formats (pdf).

JavaScript library to read doc and docx on client

You can use docxtemplater for this. Although it is normally used for templating, it can also simply return the text of the document:

// content holds the binary content of the .docx file
var zip = new JSZip(content);
var doc = new Docxtemplater().loadZip(zip);
var text = doc.getFullText();
console.log(text);

See the documentation for installation information (I'm the maintainer of this project).

However, it only handles docx, not doc.


