Imagemagick: generate raw image data for PDF flate embedding?
Ok, fixed it; the problem was that one had to specify 8-bit depth on the convert command line; thus the correct invocation is:
convert -depth 8 -size 150x150 gradient:\#4b4-\#bfb rgb:test.raw
Then we have:
du -b test.raw # 67500 bytes
python3 -c "import zlib,sys;sys.stdout.buffer.write(zlib.compress(sys.stdin.buffer.read()))" < test.raw > test.flate
du -b test.flate # 664 bytes
# set /Length 664 in the image stream dictionary of hello.pdf, then replace the ### placeholder line with the compressed data:
perl -ne 's/^###/`cat test.flate`/e;print' hello.pdf > hello2.pdf
Finally, hello2.pdf opens in evince and displays the bitmap correctly.
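The /Length bookkeeping can also be done in one step. Below is a minimal Python 3 sketch, assuming 8-bit RGB input as produced by `convert -depth 8 ... rgb:test.raw`; the function name and the object number are hypothetical. It refuses input whose size is not width x height x 3, which is exactly the check that would have caught the missing `-depth 8`.

```python
import zlib


def flate_rgb_image_object(obj_num: int, raw: bytes, width: int, height: int) -> bytes:
    """Wrap raw 8-bit RGB samples in a PDF image XObject compressed with FlateDecode."""
    expected = width * height * 3  # 3 samples per pixel, 1 byte each at 8 bits
    if len(raw) != expected:
        # e.g. 16-bit output from convert is exactly twice the expected size
        raise ValueError(f"expected {expected} bytes of 8-bit RGB, got {len(raw)}")
    data = zlib.compress(raw)  # same zlib format the python -c one-liner produces
    header = (
        f"{obj_num} 0 obj\n"
        f"<< /Type /XObject /Subtype /Image /Width {width} /Height {height}\n"
        "   /ColorSpace /DeviceRGB /BitsPerComponent 8\n"
        f"   /Filter /FlateDecode /Length {len(data)} >>\n"
        "stream\n"
    ).encode("ascii")
    # The end-of-line before "endstream" is not counted in /Length.
    return header + data + b"\nendstream\nendobj\n"
```

Usage might look like `flate_rgb_image_object(5, open("test.raw", "rb").read(), 150, 150)`. The result still has to be spliced into the PDF, and any xref offsets after the insertion point will then be off, so the reader has to repair the table.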
Btw, I found this because I'm actually trying to debug an image in another document; so I basically did the following:
# extract and save the stream of this image object
qpdf --show-object=23 --raw-stream-data mybadfile.pdf > myraw.file
# get the raw binary data: inflate (decompress) the saved object stream
python3 -c "import zlib,sys;sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))" < myraw.file > myraw.deflate
identify myraw.deflate
# identify: no decode delegate for this image format `myraw.deflate' @ constitute.c/ReadImage/530.
identify rgb:myraw.deflate
# identify: Must specify image size `myraw.deflate' @ rgb.c/ReadRGBImage/155.
identify -size 588x508 rgb:myraw.deflate
# rgb:myraw.deflate=>myraw.deflate RGB 588x508 588x508+0+0 16-bit TrueColor DirectClass 875KiB 0.020u 0:00.030
# identify: Unexpected end-of-file `myraw.deflate': No such file or directory @ rgb.c/ReadRGBImage/261.
display -size 588x508 rgb:myraw.deflate
# display: Unexpected end-of-file `myraw.deflate': No such file or directory @ rgb.c/ReadRGBImage/261. ### but it shows correctly, except for size?
identify -depth 8 -size 588x508 rgb:myraw.deflate
# rgb:myraw.deflate=>myraw.deflate RGB 588x508 588x508+0+0 8-bit TrueColor DirectClass 875KiB 0.020u 0:00 ## OK
display -depth 8 -size 588x508 rgb:myraw.deflate
# OK; using rgba: instead already gives a wrong result, which confirms 8-bit RGB
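The trial and error with identify amounts to matching the decompressed byte count against width x height x bytes-per-pixel. A small Python sketch of that check, assuming the stream has no filters beyond Flate (an optional PNG predictor adds one filter-type byte per row); the layout table is illustrative, not exhaustive:

```python
# Candidate layouts: (label, bytes per pixel). 16-bit means 2 bytes per sample.
LAYOUTS = [
    ("8-bit Gray", 1),
    ("16-bit Gray", 2),
    ("8-bit RGB", 3),
    ("8-bit RGBA/CMYK", 4),
    ("16-bit RGB", 6),
    ("16-bit RGBA/CMYK", 8),
]


def guess_layouts(n_bytes: int, width: int, height: int) -> list[str]:
    """Return every layout whose size matches, with or without a PNG predictor byte per row."""
    matches = []
    for label, bpp in LAYOUTS:
        row = width * bpp
        if n_bytes == row * height:
            matches.append(label)
        elif n_bytes == (row + 1) * height:
            matches.append(label + " (PNG-predicted rows)")
    return matches
```

For the 588x508 image, 896112 bytes (875 KiB) fits only 8-bit RGB, matching what `identify -depth 8` reported; the 16-bit interpretation needs twice as much data, hence the "Unexpected end-of-file".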
Hope this helps someone,
Cheers!
Create small high quality PDF embedding optimized PNG?
PDF Spec suggests PNG is supported?
PNG isn't supported per se; PDF allows embedding JPEG images as-is, but not PNG images. PDF does borrow a set of features of the PNG format, however.
rinohtype (full disclosure: I'm the author) tries to embed as much as possible from PNG images as-is into the PDF. This does involve some bit-juggling to separate the alpha channel from the color data for example, but no reencoding of the image is performed. It does not (yet) support interlaced PNGs.
rinohtype should be able to do what you want to achieve. But please note that it currently is in a beta stage, so you might encounter some bugs.
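To illustrate which PNG features PDF borrows: a PNG's concatenated IDAT chunks form a zlib stream in which every row starts with a filter-type byte, which is what a FlateDecode stream with /DecodeParms /Predictor 15 expects. The Python sketch below copies that data into an image XObject dictionary without re-encoding. It is not rinohtype's code; it only handles non-interlaced grayscale and RGB PNGs and ignores palettes, alpha channels and gamma.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_chunks(png: bytes):
    """Yield (type, data) for every chunk after the 8-byte signature."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length  # length field + type + data + CRC


def png_to_pdf_image(png: bytes) -> tuple[str, bytes]:
    """Return (image dictionary, stream data), reusing the PNG's zlib data as-is."""
    header = None
    idat = bytearray()
    for ctype, data in read_png_chunks(png):
        if ctype == b"IHDR":
            header = struct.unpack(">IIBBBBB", data)
        elif ctype == b"IDAT":
            idat += data  # IDAT chunks together form one zlib stream
        elif ctype == b"IEND":
            break
    if header is None:
        raise ValueError("missing IHDR chunk")
    width, height, depth, color_type, _compression, _filter, interlace = header
    colors = {0: 1, 2: 3}.get(color_type)  # gray -> 1 component, RGB -> 3
    if colors is None or interlace:
        raise ValueError("only non-interlaced gray or RGB PNGs are handled here")
    space = "/DeviceGray" if colors == 1 else "/DeviceRGB"
    image_dict = (
        f"<< /Type /XObject /Subtype /Image /Width {width} /Height {height} "
        f"/ColorSpace {space} /BitsPerComponent {depth} /Filter /FlateDecode "
        f"/DecodeParms << /Predictor 15 /Colors {colors} "
        f"/BitsPerComponent {depth} /Columns {width} >> /Length {len(idat)} >>"
    )
    return image_dict, bytes(idat)
```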
Even plain text PDF files are surprisingly large
To keep the PDF size as small as possible, make sure not to embed or subset any of the fonts. Use only the base 14 PDF fonts, which PDF readers provide.
Re-encoding only images of a PDF? (or, ghostscript fails on 8-bit RGB while optimizing)
First, if you find a Ghostscript bug, please report it to us at http://bugs.ghostscript.com
Secondly, I suggest you upgrade from 9.05 to the current shipping version, which probably has this bug fixed.
How to generate plain-text source-code PDF examples that work in a document viewer?
You should append a (syntactically correct) xref and trailer section to the end of the file. That means: each object in your PDF needs one line in the xref table, even if the byte offset isn't correctly stated. Then Ghostscript, pdftk or qpdf can re-establish a correct xref and render the file:
[...]
endobj
xref
0 8
0000000000 65535 f
0000000010 00000 n
0000000020 00000 n
0000000030 00000 n
0000000040 00000 n
0000000050 00000 n
0000000060 00000 n
0000000070 00000 n
trailer
<</Size 8/Root 1 0 R>>
startxref
555
%%EOF
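If the example files are generated by a script anyway, the offsets can simply be computed instead of repaired. A minimal Python sketch, assuming a single page with one line of text (the function name is hypothetical); it references the base-14 Helvetica font rather than embedding one, and writes each xref entry as exactly 20 bytes (10-digit offset, space, 5-digit generation, space, keyword, space, newline), as the PDF specification requires:

```python
def build_pdf(text: str) -> bytes:
    """Build a one-page PDF with exact xref offsets, using the base-14 Helvetica font."""
    # Escape the characters that are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    content = f"BT /F1 24 Tf 72 720 Td ({escaped}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792]"
        b" /Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        # Base-14 font: referenced by name only, so nothing is embedded
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "num 0 obj"
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # every entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

Writing `build_pdf("Hello")` to a file opened with `open("hello.pdf", "wb")` should give a file that viewers open without having to rebuild the xref table.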
PDF: Object Stream with FlateDecode
This PDF is encrypted. PDF file trailer is:
endobj
startxref
116
%%EOF
Cross reference stream @byte offset 116 (with some formatting) is:
<</DecodeParms<</Columns 5/Predictor 12>>
/Encrypt 389 0 R
% ... etc
/Type/XRef /W[1 3 1]
>> stream
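Here the cross-reference table is itself a Flate-compressed stream. /W [1 3 1] says each entry has a 1-byte type, a 3-byte field (byte offset for type 1, object stream number for type 2) and a 1-byte field; /Predictor 12 with /Columns 5 means each 5-byte row was PNG-Up-filtered and prefixed with a filter-type byte. ISO 32000 states that cross-reference streams are not encrypted, so this part can be decoded before dealing with AESV2. A Python sketch under those assumptions (only the None and Up row filters are handled, and the function name is hypothetical):

```python
import zlib


def parse_xref_stream(compressed: bytes, widths=(1, 3, 1), columns=5) -> list[tuple[int, ...]]:
    """Decode a /FlateDecode xref stream that uses /Predictor 12 (PNG Up)."""
    data = zlib.decompress(compressed)
    row_len = columns + 1  # one PNG filter-type byte in front of each row
    prev = bytes(columns)  # the row "above" the first row is all zeros
    entries = []
    for start in range(0, len(data), row_len):
        filter_type, row = data[start], data[start + 1:start + row_len]
        if filter_type == 2:  # Up: add the corresponding byte of the previous row
            row = bytes((b + p) & 0xFF for b, p in zip(row, prev))
        elif filter_type != 0:
            raise ValueError(f"unhandled PNG filter type {filter_type}")
        prev = row
        fields, pos = [], 0
        for w in widths:
            fields.append(int.from_bytes(row[pos:pos + w], "big"))  # high-order byte first
            pos += w
        entries.append(tuple(fields))
    return entries
```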
Encryption dictionary 389 0 R (formatted) is:
389 0 obj <<
/CF <<
/StdCF <<
/AuthEvent /DocOpen
/CFM /AESV2
/Length 16
>>
>>
/EncryptMetadata false
/Filter /Standard
/O (...) % binary owner key
/P -1084
/R 4
/StmF /StdCF
/StrF /StdCF
/U (...) % binary user key
/V 4
/Length 128
>>
endobj
ISO 32000 (the PDF standard) states, in section 7.6.1 General:
A PDF document can be encrypted (PDF 1.1) to protect its contents from unauthorized access. Encryption applies to all strings and streams in the document's PDF file, with the following exceptions:
• The values for the ID entry in the trailer
• Any strings in an Encrypt dictionary
• Any strings that are inside streams such as content streams and compressed object streams, which themselves are encrypted
The referenced object is a content stream in an encrypted PDF. To process this stream, you need to implement decryption (AESV2 in this case) and decrypt streams before applying any other filters.
Note: this PDF is encrypted with a blank user password, so it opens in most viewers without the need to enter a user password.
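Because the user password is blank, the file key can be derived without asking for one. A Python sketch of Algorithm 2 (file key, revisions 3 and 4) and Algorithm 1 (per-object key, with the "sAlT" suffix used for AES) from ISO 32000-1, using only hashlib; the function names are hypothetical, and O, P and the first /ID string have to be read from the actual file. With /EncryptMetadata false and /R 4, as in the dictionary above, four 0xFF bytes are hashed in as well.

```python
import hashlib
import struct

# 32-byte password padding string from ISO 32000-1, Algorithm 2
PAD = bytes.fromhex("28BF4E5E4E758A4164004E56FFFA01082E2E00B6D0683E802F0CA9FE6453697A")


def file_key(user_pw: bytes, o: bytes, p: int, id0: bytes,
             encrypt_metadata: bool = True, length_bits: int = 128) -> bytes:
    """Algorithm 2 for revision 3/4 security handlers."""
    n = length_bits // 8
    h = hashlib.md5((user_pw + PAD)[:32])  # password padded/truncated to 32 bytes
    h.update(o)                            # /O entry
    h.update(struct.pack("<i", p))         # /P as 32 bits, low-order byte first
    h.update(id0)                          # first string of the trailer /ID
    if not encrypt_metadata:               # revision 4 with /EncryptMetadata false
        h.update(b"\xff\xff\xff\xff")
    key = h.digest()
    for _ in range(50):                    # revision >= 3: rehash 50 times
        key = hashlib.md5(key[:n]).digest()
    return key[:n]


def object_key(key: bytes, num: int, gen: int, aes: bool = True) -> bytes:
    """Algorithm 1: key for one object, from its object and generation numbers."""
    data = key + num.to_bytes(3, "little") + gen.to_bytes(2, "little")
    if aes:
        data += b"sAlT"
    return hashlib.md5(data).digest()[: min(len(key) + 5, 16)]
```

Decrypting a stream is then AES-128 in CBC mode with the object key, taking the first 16 bytes of the stream data as the IV and removing the PKCS#5 padding. The standard library has no AES, so that step needs a third-party package (for example `cryptography`) and is not shown. Only after decryption are /FlateDecode and any predictor applied. This sketch does not verify the password against /U.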
Type 3 fonts conversion
PostScript has mostly the same filters as PDF. You don't need to decompress the data, just use the FlateDecode filter in PostScript and leave the compressed data untouched.
Note that you'll need LanguageLevel 3 for Predictor 15 (or any other PNG predictor), but that shouldn't be a problem; level 3 has been the standard for 18 years.
Otherwise you'll need to implement a version of the FlateDecode filter which supports the PNG predictor. zlib is quite capable of the inflate step, but undoing the PNG predictor on the decompressed rows is a separate job that zlib does not do for you.
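To make that split explicit, here is a Python sketch in which `zlib.decompress` does the inflate step and the PNG predictors (row filter types 0 to 4, used when /Predictor is 10 or higher) are then undone on the decompressed rows. TIFF Predictor 2 is not handled, and the function names are hypothetical.

```python
import zlib


def paeth(a: int, b: int, c: int) -> int:
    """PNG Paeth predictor: whichever of left, up, upper-left is closest to a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c


def unpredict_png(data: bytes, columns: int, colors: int = 1, bpc: int = 8) -> bytes:
    """Undo PNG predictors (/Predictor 10..15) on already-inflated data."""
    bpp = max(1, colors * bpc // 8)              # bytes per pixel, at least 1
    row_len = (columns * colors * bpc + 7) // 8  # bytes per row, excluding filter byte
    prev = bytearray(row_len)
    out = bytearray()
    for start in range(0, len(data), row_len + 1):
        ftype = data[start]
        row = bytearray(data[start + 1:start + 1 + row_len])
        for i in range(len(row)):
            left = row[i - bpp] if i >= bpp else 0  # already decoded
            up = prev[i]
            upleft = prev[i - bpp] if i >= bpp else 0
            if ftype == 1:    # Sub
                row[i] = (row[i] + left) & 0xFF
            elif ftype == 2:  # Up
                row[i] = (row[i] + up) & 0xFF
            elif ftype == 3:  # Average
                row[i] = (row[i] + (left + up) // 2) & 0xFF
            elif ftype == 4:  # Paeth
                row[i] = (row[i] + paeth(left, up, upleft)) & 0xFF
            elif ftype != 0:
                raise ValueError(f"bad PNG filter type {ftype}")
        out += row
        prev = row
    return bytes(out)


def flate_decode(stream: bytes, predictor: int = 1, columns: int = 1,
                 colors: int = 1, bpc: int = 8) -> bytes:
    data = zlib.decompress(stream)  # zlib: inflate only
    return unpredict_png(data, columns, colors, bpc) if predictor >= 10 else data
```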
[EDIT]
Your 'PostScript output' is incomplete: you are using PDF operators (q and Q) which you have not provided a definition for. Apart from anything else, this makes it impossible to run the code through an interpreter. Kindly supply a complete, simple example file, as requested. Not pasted code: I'm not inclined to go and create a file myself, and besides, binary doesn't cut and paste at all well.
Off the top of my head, from desk checking, I can't immediately see a problem, but since I can't run the code, I could easily be missing something.
[EDIT 2]
And that file, unsurprisingly, works fine.
You haven't supplied the PostScript file that you are creating. It's rather hard for me to tell what's wrong with the PostScript you created by looking at the PDF file you started with.
You could, of course, use Ghostscript (and I see you've used it to create the PDF file) to create a PostScript file, and then look at what that contains. If you set -dCompressFonts=false then the output font won't even be compressed.
For example:
37 0 4 -52 33 -1 d1
0.01 0 0 0.01 0 0 cm
q 2900 0 0 -5100 400 -99.9998 cm
BI
/IM true
/W 29
/H 51
/BPC 1
/D [1 0]
/F [/A85 /CCF]
/DP [null <</K -1 /Columns 29>>]
ID
-D=,M5m+t^0_>op8\HM"Du]KKrr2rthqG/5qU_ik]$f$TlUslD91qoN93j0%dckk:ld^*DV25!+
!WX>~>
EI Q
Of course you'll need to look at the prolog to see how all the procedures used there are defined, but you can do that yourself; you certainly don't need me to do it. Notice that the imagemask uses the CCITTFax and ASCII85 decode filters; it's trivial to add additional filters. Since the data is guaranteed to be 'monochrome' (it's a mask), the CCITT filter generally gives better compression than Flate.
Note that if you are really using Ghostscript 9.05, then you should upgrade; that version is 6 years old.
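When decoding, a filter array is applied in the order listed: /F [/A85 /CCF] means ASCII85-decode first and hand the result to the CCITTFax decoder. A Python sketch of that chaining, using the standard library for ASCII85, ASCIIHex and Flate; CCITTFax is not in the standard library, so the example chains /A85 with /Fl instead. The function and table names are hypothetical.

```python
import base64
import zlib


def ascii85_decode(data: bytes) -> bytes:
    """ASCII85Decode: the leading <~ is optional in PDF, ~> marks the end."""
    data = data.strip()
    if data.startswith(b"<~"):
        data = data[2:]
    if data.endswith(b"~>"):
        data = data[:-2]
    return base64.a85decode(data)  # whitespace is ignored by default


def asciihex_decode(data: bytes) -> bytes:
    """ASCIIHexDecode: hex digits up to '>', whitespace ignored."""
    digits = bytes(c for c in data.split(b">")[0] if not chr(c).isspace())
    if len(digits) % 2:
        digits += b"0"  # an odd final digit is treated as if followed by 0
    return bytes.fromhex(digits.decode("ascii"))


# Full filter names and the abbreviations allowed in inline images
DECODERS = {
    "ASCII85Decode": ascii85_decode, "A85": ascii85_decode,
    "ASCIIHexDecode": asciihex_decode, "AHx": asciihex_decode,
    "FlateDecode": zlib.decompress, "Fl": zlib.decompress,
}


def decode_chain(data: bytes, filters: list[str]) -> bytes:
    """Apply the decode filters in the order they appear in the /F array."""
    for name in filters:
        data = DECODERS[name](data)
    return data
```

For instance, `decode_chain(base64.a85encode(zlib.compress(data)) + b"~>", ["A85", "Fl"])` gives back `data`.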
It might possibly help if you were to explain why you want to take an ugly, bitmapped, type 3 font from PDF and make an ugly, bitmapped type 3 PostScript font from it.
[EDIT 3]
Well, looking at your PostScript file, the definition of the glyphs does not match what you've put in your question. The actual content looks like this:
/g10135{
88 0 4 -70 82 8 setcachedevice
q
[
0.01 0 0 0.01 0 0 ] M
q
[7800 0 0 -7800 400 800 ]M
<<
/ImageType 1
/Width 78
/Height 78
/ImageMatrix [ 78 0 0 -78 0 78]
/BitsPerComponent 1
/Decode [1
0]
/DataSource ....binary data.....
<< /Predictor 15
/Columns 78
/BitsPerComponent 1>>
/FlateDecode filter def
>> imagemask
Q
Q
}bind def
You have not supplied a file, procedure or string source as the value for the DataSource key in the dictionary. Essentially, the PostScript interpreter reads and tokenises the /DataSource key, and then proceeds to process the binary data as PostScript. Unsurprisingly, this causes the error 'syntaxerror in (binary token, type=156)' when processed with Ghostscript.
If you had got past that, you would have discovered that the filter operator takes a data source as well, and you haven't supplied one for that either.
So you need to create a data source for your binary data. How you do that is up to you, but currentfile is one way, or readstring, given that you know the string length.
So something like:
<<
/ImageType 1
/Width 29
/Height 51
/ImageMatrix [29 0 0 -51 0 51]
/BitsPerComponent 1
/Decode [1 0]
/DataSource
<length> string currentfile exch readstring
.....binary data.....
pop % discard the boolean that readstring leaves on top of the string
<<
/Predictor 15
/Columns 29
>> /FlateDecode filter
>> imagemask
Obviously you'll have to fill in <length> yourself, since you know the string length. The dictionary argument to FlateDecode looks to me like it shouldn't be needed.
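Filling in <length> is easiest if the PostScript is generated. The Python sketch below writes an imagemask dictionary whose DataSource is read inline with currentfile and readstring: the binary data starts right after the single newline that terminates `readstring`, and `pop` runs afterwards to drop readstring's boolean result. It assumes unpredicted Flate data (so no parameter dictionary) and 1-bit rows padded to whole bytes; the function name is hypothetical. Like the outline above, the fragment has to be executed directly, not stored inside a procedure body.

```python
import zlib


def imagemask_ps(width: int, height: int, bits: bytes) -> bytes:
    """Emit a PostScript imagemask whose Flate data is read inline via readstring."""
    row_bytes = (width + 7) // 8  # 1 bit per sample, rows padded to a byte
    if len(bits) != row_bytes * height:
        raise ValueError("mask data does not match width x height")
    data = zlib.compress(bits)  # plain Flate, no predictor
    head = (
        "<<\n/ImageType 1\n"
        f"/Width {width}\n/Height {height}\n"
        f"/ImageMatrix [{width} 0 0 -{height} 0 {height}]\n"
        "/BitsPerComponent 1\n/Decode [1 0]\n"
        "/DataSource\n"
        # The newline after readstring is consumed; the data follows immediately.
        f"{len(data)} string currentfile exch readstring\n"
    ).encode("ascii")
    tail = b"\npop % drop the boolean from readstring\n/FlateDecode filter\n>> imagemask\n"
    return head + data + tail
```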
[Edit 4]
I notice that this appears to be intended for commercial use. Nothing wrong with that, but I'm not going to do all your homework for you; if it's your job, it's up to you to learn the language well enough to do the job.
I'm skipping lightly over the actual implementation details below in an attempt to outline where you are going wrong. In practice things are a little more complex: I haven't discussed how the procedure stored in the CharStrings dictionary is created, or the difference with early name binding (which is an important concept in PostScript).
Your existing code is:
/g10135{
88 0 4 -70 82 8 setcachedevice
q
[
0.01 0 0 0.01 0 0 ] M
q
[7800 0 0 -7800 400 800 ]M
<<
/ImageType 1
/Width 78
/Height 78
/ImageMatrix [ 78 0 0 -78 0 78]
/BitsPerComponent 1
/Decode [1
0]
/DataSource {417 string dup
currentfile exch readstring}
...binary data....
<< /Predictor 15
/Columns 78
>>/FlateDecode filter def
>> imagemask
Q
Q
}bind def
So, the PostScript interpreter reads those bytes one at a time, and converts them into tokens. This either results in an executable token, which is executed, or an operation on one of the stacks.
So /g10135 is terminated by the { character, because that's a reserved character. The / introduces a name object, so we end up with the name object g10135, which we push on to the operand stack. The { character introduces an executable array, so we put a mark on the operand stack.
Next we read 88, terminated by a white space character. That's a numeric so we store that on the operand stack, likewise the other numbers. The operand stack now contains:
/g10135
mark
88
0
4
-70
82
8
We then read setcachedevice, which is terminated by a white space. That isn't a standard token so the interpreter starts looking through the dictionaries on the dictionary stack, looking for a definition. Since it is a standard operator, we find it in systemdict and execute it. That consumes 6 operands from the operand stack, it has no other effects (actually it does, but this is a bit special because we are executing inside a font, but we'll ignore that for now).
Next we encounter a q; again this is looked up in every dictionary on the dictionary stack to find a definition. This is defined in your own prolog as a gsave, so it takes no operands and returns no operands; it simply saves the graphics state, incrementing the save depth by 1.
I'm not going to go through the rest, it would be tedious; however, eventually we reach your /DataSource. This is a name, so we push it on the operand stack. The next thing we encounter is a {; that's a procedure definition, so we push a mark on the operand stack. We then encounter 417, so we push that, then string, dup, currentfile, exch and readstring, so our stack looks like:
/DataSource
mark
417
string
dup
currentfile
exch
readstring
Then we get the character }. That is the closing mark for an executable array, so we create the array and push it onto the operand stack:
/DataSource
{....}
Then we return to the procedure and continue executing it. The next thing we find is some binary data so we try to execute that as PostScript binary tokens. Because it isn't valid the interpreter throws an error.
Just creating an executable array is not sufficient to actually execute it. If you look at the outline code I posted at the end of edit 3 above, you will note that I did not put the readstring and so on in an executable array; I simply allowed the interpreter to execute that code immediately.
By doing so, the readstring acts on currentfile (the actual PostScript program in this case) and reads bytes of data from the current point in that file. The current point will be immediately after the white space which terminates the readstring token, i.e. the start of the actual binary data. The readstring operator reads enough bytes from the file to fill the string, leaving the string and a boolean on the operand stack (the boolean has to be discarded with pop). The file pointer has moved on to the byte after the binary data, and the interpreter resumes token scanning at that point. So it then creates the filter parameters dictionary, pushes the /FlateDecode name on the stack and then executes the filter operator, which consumes the name, the dictionary and the string operands, returning a file object. That file object then becomes the value associated with the DataSource key in the image dictionary, which is passed to the imagemask operator.
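The difference between storing a procedure and executing tokens immediately can be shown with a toy scanner. This Python sketch is not a PostScript interpreter: it handles only integers, literal names, braces and four operators, keeps procedures under construction on a separate list instead of marking the operand stack, and looks operators up in a fixed table instead of the dictionary stack. All names in it are hypothetical.

```python
class Proc(list):
    """An executable array: scanned and stored, not executed."""


# Fixed operator table standing in for systemdict lookups
OPERATORS = {
    "dup": lambda s: s.append(s[-1]),
    "exch": lambda s: s.extend([s.pop(), s.pop()]),  # a b -> b a
    "pop": lambda s: s.pop(),
    "string": lambda s: s.append(bytearray(s.pop())),  # n -> n zero bytes
}


def run(src: str) -> list:
    """Scan src token by token and return the resulting operand stack."""
    stack, building = [], []  # building: procedures currently being scanned
    for tok in src.replace("{", " { ").replace("}", " } ").split():
        if tok == "{":
            building.append(Proc())
            continue
        if tok == "}":
            proc = building.pop()
            (building[-1] if building else stack).append(proc)
            continue
        value = int(tok) if tok.lstrip("-").isdigit() else tok
        if building:  # inside { }: defer, just store the token
            building[-1].append(value)
        elif isinstance(value, int) or tok.startswith("/"):
            stack.append(value)  # literals are pushed
        else:
            OPERATORS[tok](stack)  # executable names run immediately
    return stack
```

`run("/DataSource { 417 string dup }")` leaves the name and an unexecuted procedure on the stack, while `run("/DataSource 417 string")` leaves the name and a 417-byte string, which is the behaviour the outline code relies on.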
While I haven't tested that code, it's basically correct. There are, of course, other ways to achieve the same aim.
That's about as far as I'm prepared to go with this; you need to go and look at what I've written and compare it with your own program.
Note that the simplest way to investigate this is to take the contents of the CharProc (excluding the setcachedevice) and just run that as a PostScript program.