Converting HTML to Odt, Doc, Docx

To convert to odt it's pretty easy after installing pandoc.
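For what it's worth, here is a minimal sketch of that pandoc step, assuming pandoc is on your PATH. The file name input.html and the helper names odt_name and html_to_odt are placeholders made up for illustration, not anything pandoc ships with.

```shell
#!/bin/sh
# Derive the output name by swapping the .html extension for .odt.
odt_name() {
    printf '%s\n' "${1%.html}.odt"
}

# Convert one HTML file to ODT; pandoc picks the writer from the .odt extension.
html_to_odt() {
    pandoc -f html "$1" -o "$(odt_name "$1")"
}

# input.html is a placeholder; the call is skipped when pandoc or the file is missing.
if command -v pandoc >/dev/null 2>&1 && [ -f input.html ]; then
    html_to_odt input.html
fi
```

Run next to an input.html, this should leave an input.odt in the same directory.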

Now for the relatively hard part, getting to doc or docx: from odt (or even straight from html) you can script (Open|Libre)Office, e.g. via unoconv.

Or you can use AbiWord, like so:

abiword --to=doc filename.odt
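If you would rather drive (Open|Libre)Office directly, a rough sketch could look like the following. It assumes the office binary is installed as soffice or libreoffice (the name varies by system), and find_office and office_convert are hypothetical helper names used only for this example.

```shell
#!/bin/sh
# Print the first office binary found on PATH; fail if neither name exists.
find_office() {
    for bin in soffice libreoffice; do
        if command -v "$bin" >/dev/null 2>&1; then
            printf '%s\n' "$bin"
            return 0
        fi
    done
    return 1
}

# Convert a document to the given format (e.g. doc, docx), writing next to the input file.
office_convert() {
    office=$(find_office) || { echo "no office binary found" >&2; return 1; }
    "$office" --headless --convert-to "$2" --outdir "$(dirname "$1")" "$1"
}

# filename.odt is the same placeholder as in the abiword example above.
if [ -f filename.odt ]; then
    office_convert filename.odt docx
fi
```

With unoconv the equivalent would be along the lines of unoconv -f docx filename.odt, since it wraps the same office installation.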

Also see this thread, and this blog post.

HTH

Differences between command-line and GUI converting (*.html, *.odt, *.doc)

If you want to preserve the HTML, I'd still maintain that LiveDocx might prove useful. I did some more digging and stumbled on phpdocx. At the bottom of the page there's a link that shows you how to embed HTML.
Both LiveDocx and phpdocx offer examples on their respective sites. I suggest you browse through those.

SO showed up a few times, too. Questions that might be interesting:

  • OpenTbs convert html tags to MS Word tags
  • How can I convert a docx document to html using php?

I know the latter is the opposite of what you're trying to do, but don't write it off just for that reason. Often, it's quite helpful to look at things from another perspective.

Since your last comment leads me to believe you haven't actually gotten round to coding ("I just need a script", etc.), I would like to say that SO is not a code generator. When you're done reading about phpdocx and LiveDocx, perhaps you should read what makes a good question.

I found what you were looking for, I think, here. If you want to use the php-cli, my guess would be setting your script's output stream to a file, and using the headers found below (copy-paste from link).

    header("Content-type: application/vnd.ms-word");
    header("Content-Disposition: attachment; Filename=SaveAsWordDoc.doc");

Sorry if I came across a bit harsh, with the remark on SO not being a code generator, and the link to 'what makes a good question'. Didn't mean to bash you.

UPDATE

Sorry, the previous example would only work as a download link. Below is a working script that produces a .doc file from an HTML string:

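#!/bin/php -n
<?php
// Stream context carrying the Content-type header (listed once; the original repeated it).
// Note: writing to a local file does not send HTTP headers.
$opts = array('file'=>array('header'=>'Content-type: application/vnd.ms-word'."\r\n"));
$resource = stream_context_create($opts);
// Open (or create) the target file for writing.
$doc = fopen('asDoc.doc','w+',false,$resource);
if (!$doc)
{
    die('Could not open asDoc.doc for writing');
}
$html='<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">';
// Single-quoted string: the double quotes inside must not be backslash-escaped,
// otherwise the backslashes end up literally in the output.
$html .='<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=Windows-1252"><title>Foo</title></head><body><h1>Hello, world</h1></body></html>';
fwrite($doc,$html);
fclose($doc);
exit();
?>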

The headers are defined in the stream context at the top of the script, mirroring the download example, though as far as I know writing to a local file does not actually send them. What matters more is the <meta http-equiv="Content-Type" content="text/html; charset=Windows-1252"> meta tag, which declares the character set of the HTML. All else is pretty basic.
All the functions you need are here, so refer to their manual pages if you want to know what does what.

Best of luck

Converting HTML + JavaScript to Word/OpenOffice document programmatically

Thanks to all for the answers! In the end I made it work with PhantomJS (custom script converts JS generated parts to images) and Pandoc (which converts the resulting HTML to DOCX).
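The Pandoc half of that pipeline might be sketched as below. This assumes the PhantomJS step has already written a static rendered.html (a placeholder name) with the JS-generated parts replaced by images. The helpers has_scripts and rendered_to_docx are made up for illustration, and the script check is my own addition, included because pandoc parses HTML statically and does not run JavaScript.

```shell
#!/bin/sh
# Succeed if the file still contains a <script> tag (case-insensitive match).
has_scripts() {
    grep -qi '<script' "$1"
}

# Convert pre-rendered HTML to DOCX; warn if script tags remain, since pandoc will not execute them.
rendered_to_docx() {
    if has_scripts "$1"; then
        echo "warning: $1 still contains <script> tags" >&2
    fi
    pandoc -f html -t docx "$1" -o "${1%.html}.docx"
}

# rendered.html is a placeholder for the PhantomJS output; skipped when pandoc or the file is missing.
if command -v pandoc >/dev/null 2>&1 && [ -f rendered.html ]; then
    rendered_to_docx rendered.html
fi
```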


