Get docx file contents using javascript/jquery
With docxtemplater, you can easily get the full text of a Word document (works with .docx only) by using the doc.getFullText() method.
HTML code:
<body>
<button onclick="gettext()">Get document text</button>
</body>
<script src="https://cdnjs.cloudflare.com/ajax/libs/docxtemplater/3.26.2/docxtemplater.js"></script>
<script src="https://unpkg.com/pizzip@3.1.1/dist/pizzip.js"></script>
<script src="https://unpkg.com/pizzip@3.1.1/dist/pizzip-utils.js"></script>
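<script>
  // Download a file as binary content (PizZipUtils is loaded above)
  function loadFile(url, callback) {
    PizZipUtils.getBinaryContent(url, callback);
  }

  function gettext() {
    loadFile(
      "https://docxtemplater.com/tag-example.docx",
      function (error, content) {
        if (error) {
          throw error;
        }
        // A .docx file is a ZIP archive: open it with PizZip
        var zip = new PizZip(content);
        // The CDN build exposes the constructor as window.docxtemplater
        var doc = new window.docxtemplater(zip);
        // Plain text of the document, without its XML markup
        var text = doc.getFullText();
        console.log(text);
        alert("Text is " + text);
      }
    );
  }
</script>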
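To give an idea of what getFullText() returns, here is a rough approximation written without any library. approximateFullText is a hypothetical helper, not docxtemplater's implementation: it assumes you already have the XML of word/document.xml as a string and simply concatenates the contents of the <w:t> text elements. It inserts no separators between paragraphs and ignores headers, footers and footnotes, which live in other XML parts of the archive.

```javascript
// Hypothetical helper, NOT docxtemplater's implementation: concatenates the
// text of every <w:t> element found in the XML of word/document.xml.
function approximateFullText(documentXml) {
  // Matches <w:t> and <w:t xml:space="preserve">, but not <w:tab/>, <w:tbl>
  // or the empty self-closing <w:t/> (the lookbehind rejects "/>").
  var re = /<w:t(?:\s[^>]*?)?(?<!\/)>([\s\S]*?)<\/w:t>/g;
  var parts = [];
  var match;
  while ((match = re.exec(documentXml)) !== null) {
    parts.push(decodeXmlEntities(match[1]));
  }
  return parts.join("");
}

// Decodes the five predefined XML entities; numeric entities are left as-is.
// &amp; is decoded last so that "&amp;lt;" becomes "&lt;", not "<".
function decodeXmlEntities(s) {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}
```

For anything beyond a quick experiment, getFullText() is the better choice, since it works on the parsed document rather than on regular expressions.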
Output the text from a docx file to a text area
Here is what you can do using Docxtemplater:
Things to remember:
If you download the file directly from the build folder rather than from the CDN used in this script, you will have to create new Docxtemplater() instead of new window.docxtemplater().
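var openFile = function (event) {
  var input = event.target;
  var reader = new FileReader();
  reader.onload = function () {
    // JSZip 2.x API: the constructor accepts the binary string directly
    var zip = new JSZip(reader.result);
    // docxtemplater 3.1.x API: load the zip, then read its full text
    var doc = new window.docxtemplater().loadZip(zip);
    var text = doc.getFullText();
    var node = document.getElementById('output');
    node.innerText = text;
  };
  // Read the selected file as a binary string
  reader.readAsBinaryString(input.files[0]);
};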
<script src="https://cdnjs.cloudflare.com/ajax/libs/docxtemplater/3.1.9/docxtemplater.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jszip/2.6.1/jszip.js"></script>
<input type='file' onchange='openFile(event)'>
<br>
<div id='output'>...</div>
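The snippet above relies on older APIs (readAsBinaryString, JSZip 2.x and loadZip()). A sketch of the same idea using Blob.arrayBuffer() and the PizZip + constructor API from the first answer could look like this. It assumes PizZip 3.x and docxtemplater 3.x are loaded from the CDN as in that answer, with the same <input> and <div id='output'> markup; readDocxText is a hypothetical helper name. The constructors are passed in as parameters so the function does not depend on globals.

```javascript
// Sketch assuming PizZip 3.x and docxtemplater 3.x. In a browser, pass
// window.PizZip and window.docxtemplater as the two constructors.
async function readDocxText(file, PizZipCtor, DocxtemplaterCtor) {
  // Blob.prototype.arrayBuffer() replaces FileReader + readAsBinaryString
  var buffer = await file.arrayBuffer();
  // PizZip accepts an ArrayBuffer directly
  var zip = new PizZipCtor(buffer);
  var doc = new DocxtemplaterCtor(zip);
  return doc.getFullText();
}

// Replacement for the openFile handler used by <input onchange='openFile(event)'>
async function openFile(event) {
  var node = document.getElementById("output");
  try {
    node.innerText = await readDocxText(
      event.target.files[0],
      window.PizZip,
      window.docxtemplater
    );
  } catch (e) {
    // e.g. the file is not a valid .docx (ZIP) archive
    node.innerText = "Could not read file: " + e.message;
  }
}
```

Passing the constructors in also makes the function easy to try outside the browser with stand-in objects.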
problem with javascript reading text of docx
You can't do it that way. You need a dedicated library to read .docx files. If you open a .docx file in Notepad, you will see that it is not plain text: a .docx file is a ZIP archive that also contains additional data, metadata, etc.
Examples:
get docx file contents using javascript/jquery
JavaScript library to read doc and docx on client
How to modify the current code to read .docx file using HTML5 file api
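You can check this yourself: in practice every .docx file starts with the ZIP local-file-header signature "PK\x03\x04". A minimal sketch (looksLikeZip is a hypothetical helper name) that checks those first four bytes, for example on new Uint8Array(await file.arrayBuffer()) in a browser:

```javascript
// Minimal sketch: a .docx file is a ZIP archive, so it begins with the ZIP
// local-file-header signature "PK\x03\x04" (bytes 0x50 0x4B 0x03 0x04).
function looksLikeZip(bytes) {
  return (
    bytes.length >= 4 &&
    bytes[0] === 0x50 && // 'P'
    bytes[1] === 0x4b && // 'K'
    bytes[2] === 0x03 &&
    bytes[3] === 0x04
  );
}
```

A passing check only says "this is a ZIP archive"; it does not prove the archive is a valid Word document.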
In order to read a DOCX file, you need to unzip it: a DOCX file is a ZIP archive containing folders, XML files, and resources such as images.
This post may give you some clues:
Unzipping files
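To make the unzipping step concrete, here is a minimal sketch that only lists the entry names of the archive; for a .docx you would expect entries such as [Content_Types].xml and word/document.xml. listZipEntryNames is a hypothetical helper: it walks the ZIP local file headers, assumes no ZIP64 and no data descriptors, and does not decompress anything. For real use, a library such as JSZip or PizZip is the safer choice.

```javascript
// Hypothetical helper: lists entry names by walking the ZIP local file headers.
// Assumptions: no ZIP64, and bit 3 of the general-purpose flags is unset, so
// each local header stores the compressed size of its data.
function listZipEntryNames(bytes) {
  var view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  var decoder = new TextDecoder("utf-8"); // fine for the ASCII names in a .docx
  var names = [];
  var offset = 0;
  // 0x04034b50 is the local file header signature ("PK\x03\x04")
  while (offset + 30 <= bytes.length && view.getUint32(offset, true) === 0x04034b50) {
    var flags = view.getUint16(offset + 6, true);
    if (flags & 0x08) {
      throw new Error("Data descriptor in use; sizes are not in the local header");
    }
    var compressedSize = view.getUint32(offset + 18, true);
    var nameLength = view.getUint16(offset + 26, true);
    var extraLength = view.getUint16(offset + 28, true);
    names.push(decoder.decode(bytes.subarray(offset + 30, offset + 30 + nameLength)));
    // Skip the 30-byte header, the file name, the extra field and the data
    offset += 30 + nameLength + extraLength + compressedSize;
  }
  return names;
}
```

The loop stops at the first record that is not a local file header, which in a well-formed archive is the start of the central directory.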
I doubt you can read a DOC file because it's a binary (and closed) format.
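If you want to reject legacy .doc files before handing them to a .docx library, one option is to check for the OLE2 Compound File signature D0 CF 11 E0 A1 B1 1A E1 at the start of the file. looksLikeOle2 is a hypothetical helper name. Other legacy Office formats (.xls, .ppt) use the same container, so a match tells you "not a .docx" rather than "definitely a .doc".

```javascript
// Minimal sketch: legacy binary Office files (.doc, .xls, .ppt) are OLE2
// Compound File Binary files, which begin with this 8-byte signature.
var OLE2_SIGNATURE = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1];

function looksLikeOle2(bytes) {
  if (bytes.length < OLE2_SIGNATURE.length) {
    return false;
  }
  // Compare the first eight bytes against the signature
  return OLE2_SIGNATURE.every(function (b, i) {
    return bytes[i] === b;
  });
}
```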
Unrecognized character result content for function Load jQuery?
Instead of using jQuery's load() function to read Word documents (.docx) or other formats (PDF), I use fancybox.jquery.
JavaScript library to read doc and docx on client
You can use docxtemplater for this (although it is normally used for templating, it can also just extract the text of the document):
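// CommonJS (Node.js or a bundler); assumes JSZip 2.x, whose constructor loads the data
var JSZip = require("jszip");
var Docxtemplater = require("docxtemplater");
// content: the binary contents of the .docx file
var zip = new JSZip(content);
var doc = new Docxtemplater().loadZip(zip);
var text = doc.getFullText();
console.log(text);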
See the documentation for installation instructions (I'm the maintainer of this project).
However, it only handles .docx, not .doc.