Converting HTML to odt, doc, docx
Converting to odt is pretty easy once pandoc is installed.
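For reference, the pandoc step could look roughly like the sketch below. It assumes pandoc is on your PATH; html_to_odt and the DRY_RUN switch are names made up for this example, not part of pandoc.

```shell
#!/bin/sh
# Sketch: convert one HTML file to ODT with pandoc.
# html_to_odt and DRY_RUN are hypothetical names used only in this example.
html_to_odt() {
    src=$1
    dst="${src%.html}.odt"      # page.html -> page.odt
    # If DRY_RUN is set and non-empty, the command is printed instead of run.
    ${DRY_RUN:+echo} pandoc -f html -t odt -o "$dst" "$src"
}
```

Calling html_to_odt page.html should then leave page.odt next to the input; setting DRY_RUN=1 first only prints the pandoc command line.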
Then comes the relatively hard part: getting from odt (or even html) to doc. You can script (Open|Libre)Office for that, e.g. via unoconv.
Or you can use AbiWord:
abiword --to=doc filename.odt
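If you're not sure which of the two tools is available on a given machine, a small wrapper can pick one. This is only a sketch: odt_to_doc is a made-up helper name, and it assumes unoconv's -f option for choosing the output format.

```shell
#!/bin/sh
# Sketch: convert an .odt file to .doc with whichever tool is installed.
# odt_to_doc is a hypothetical helper name.
odt_to_doc() {
    src=$1
    if command -v unoconv >/dev/null 2>&1; then
        # unoconv drives (Open|Libre)Office; -f selects the target format
        unoconv -f doc "$src"
    elif command -v abiword >/dev/null 2>&1; then
        abiword --to=doc "$src"
    else
        echo "neither unoconv nor abiword found" >&2
        return 1
    fi
}
```

Both tools write the .doc next to the input file, so odt_to_doc filename.odt should give you filename.doc either way.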
Also see this thread, and this blog post.
HTH
Differences between command-line and GUI converting (*.html, *.odt, *.doc)
If you want to preserve the HTML, I'd still maintain LiveDocx might just prove useful. I did some more digging, and stumbled on phpdocx. At the bottom of the page there's a link that shows you how to embed HTML.
Both LiveDocx and phpdocx offer examples on their respective sites. I suggest you browse through those.
SO showed up a few times, too. Interesting questions might be:
- OpenTbs convert html tags to MS Word tags
- How can I convert a docx document to html using php?
I know the latter is the opposite of what you're trying to do, but don't write it off just for that reason. Often, it's quite helpful to look at things from another perspective.
Since your last comment leads me to believe you haven't actually gotten round to coding (I just need a script etc...) I would like to say that SO is not a code generator. When you're done reading about phpdocx and livedocx, perhaps you should read what makes a good question.
I found what you were looking for, I think, here. If you want to use the php-cli, my guess would be setting your script's output stream to a file, and using the headers found below (copy-paste from link).
header("Content-type: application/vnd.ms-word");
header("Content-Disposition: attachment; Filename=SaveAsWordDoc.doc");
Sorry if I came across a bit harsh, with the remark on SO not being a code generator, and the link to 'what makes a good question'. Didn't mean to bash you.
UPDATE
Sorry, the previous example would only work as a download link. Below is a working script that produces a .doc file from an HTML string:
#!/bin/php -n
<?php
$opts = array('file'=>array('header'=>'Content-type: application/vnd.ms-word'."\r\n"));
$resource = stream_context_create($opts);
$doc = fopen('asDoc.doc','w+',false,$resource);
if (!$doc)
{
die('Could not open asDoc.doc for writing');
}
$html='<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">';
$html .='<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=Windows-1252"><title>Foo</title></head><body><h1>Hello, world</h1></body></html>';
fwrite($doc,$html);
fclose($doc);
exit();
?>
The headers are defined in the stream context, so the first two lines of code are crucial, as is the <meta http-equiv="Content-Type" content="text/html; charset=Windows-1252"> tag. All else is pretty basic.
All functions you need are here, so refer to their man pages for more info if you want to know what does what...
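If you'd rather not involve PHP at all, the same trick (HTML content with the charset meta tag, saved under a .doc name) can be sketched in plain shell. write_html_doc is a made-up helper name, and this version only covers writing the file, not any headers.

```shell
#!/bin/sh
# Sketch: write an HTML document to a file that carries a .doc name.
# write_html_doc is a hypothetical helper: title, body HTML, output path.
write_html_doc() {
    title=$1
    body=$2
    out=$3
    # Unquoted EOF delimiter, so $title and $body are expanded in the heredoc.
    cat > "$out" <<EOF
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=Windows-1252"><title>$title</title></head><body>$body</body></html>
EOF
}
```

Something like write_html_doc Foo '<h1>Hello, world</h1>' asDoc.doc would write the same markup as the PHP script above.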
Best of luck
Converting HTML + Javascript to Word/OpenOffice document programatically
Thanks to all for the answers! In the end I made it work with PhantomJS (custom script converts JS generated parts to images) and Pandoc (which converts the resulting HTML to DOCX).
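The two-step pipeline might be wired together roughly like this. It is a sketch only: render.js stands in for the custom PhantomJS script, which isn't shown here, and its arguments, the js_html_to_docx name and the DRY_RUN switch are all assumptions made for the example.

```shell
#!/bin/sh
# Sketch: render a JavaScript-driven page to static HTML, then convert to DOCX.
# js_html_to_docx, render.js and DRY_RUN are hypothetical names.
js_html_to_docx() {
    src=$1                                # page that relies on JavaScript
    static="${src%.html}.static.html"     # output of the PhantomJS step
    out="${src%.html}.docx"
    run=${DRY_RUN:+echo}                  # DRY_RUN set: print instead of run
    # Step 1: the custom script writes a static version of the page.
    $run phantomjs render.js "$src" "$static" &&
    # Step 2: pandoc converts the static HTML to DOCX.
    $run pandoc -f html -t docx -o "$out" "$static"
}
```

With DRY_RUN=1 the function just prints the two commands, which makes it easy to check the file names before running the real thing.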