PDF Specifications for Coders: Adobe or ISO

PDF file structure: what's the HTML code for?

So I'm wondering whether this structure is standard or not.

It is not standard. According to the PDF standard (ISO 32000-2, and likewise ISO 32000-1):

The PDF file begins with the 5 characters “%PDF–”

(ISO 32000-2, section 7.5.2 "File header")

Acrobat Reader opens it nonetheless because it uses relaxed criteria:

Acrobat viewers require only that the header appear somewhere within the first 1024 bytes of the file.

(Adobe PDF Reference sixth edition, appendix H.3 "Implementation Notes", item 13)

and a number of other PDF processors, in particular viewers, follow Adobe's example and do so, too.

Nonetheless, this is a deviation from the standard.
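As a rough illustration of the two rules quoted above, the following sketch classifies a file's leading bytes. The function names and the three labels are made up for this example, and it assumes the complete five-byte marker (with an ASCII hyphen) must lie inside the 1024-byte window, which is only one possible reading of Adobe's implementation note.

```python
HEADER_MARKER = b"%PDF-"      # the five header bytes, with an ASCII hyphen
ACROBAT_WINDOW = 1024         # search window from the Adobe implementation note


def find_pdf_header_offset(data: bytes, window: int = ACROBAT_WINDOW) -> int:
    """Return the offset of the header marker within the first `window` bytes, or -1."""
    return data[:window].find(HEADER_MARKER)


def classify_header(data: bytes) -> str:
    """Hypothetical helper: label the header position of a PDF byte string."""
    offset = find_pdf_header_offset(data)
    if offset == 0:
        return "standard"   # header is at the very start, as ISO 32000 requires
    if offset > 0:
        return "relaxed"    # only acceptable under the Acrobat-style rule
    return "missing"        # no header within the search window
```

A strict processor would accept only the "standard" case; a viewer following Adobe's example would also accept "relaxed".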



and what is it used for?

Apparently that PDF was originally retrieved from some web page, and this web page seems to have a bug: it sends an HTML starting segment even though the request was for a PDF. The PDF library used here (mPDF) outputs an error message to that effect right after the PDF. Due to the relaxed requirements of Adobe Reader and other PDF viewers, though, this bug seems to have gone unnoticed, or at least was not considered grave enough to fix.



Somehow, I'm having trouble reading/converting using Java libraries

While PDF viewers can afford to be quite lax (because the user can quickly tell whether the result looks broken and discard the file), automatic PDF processors need to be stricter (because otherwise broken data may be stored in legally required archives or sent out to thousands and thousands of recipients).

What is the difference between pdf markup and pdf annotation terms?

Many annotation types are defined as markup annotations because they are used primarily to mark up PDF documents

source: Section 12.5.6.2 in PDF32000_2008.pdf

In the PDF specification, a Markup is a type of annotation (one which is visible on the page), but not all annotations are visible on the page, and so not all annotations are markups.

See Table 169 in Section 12.5.6.1 PDF32000_2008.pdf.

Of course, any particular library could mean something else by Markup (for instance, editing the PDF page content itself, which annotating does not do). But generally markup should mean annotating.
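To make the distinction concrete, here is a small lookup sketch. The two sets are transcribed from my reading of the Markup column of Table 169 in ISO 32000-1 and should be checked against the table itself; the function name is invented for this example and is not part of any library.

```python
# Annotation subtypes marked "Yes" in the Markup column (as I read Table 169).
MARKUP_SUBTYPES = frozenset({
    "Text", "FreeText", "Line", "Square", "Circle", "Polygon", "PolyLine",
    "Highlight", "Underline", "Squiggly", "StrikeOut", "Stamp", "Caret",
    "Ink", "FileAttachment", "Sound", "Redact",
})

# Annotation subtypes marked "No" in the Markup column.
NON_MARKUP_SUBTYPES = frozenset({
    "Link", "Popup", "Movie", "Widget", "Screen", "PrinterMark",
    "TrapNet", "Watermark", "3D",
})


def is_markup_annotation(subtype: str) -> bool:
    """Hypothetical helper: True if the /Subtype name denotes a markup annotation."""
    name = subtype.lstrip("/")  # accept both "/Highlight" and "Highlight"
    if name in MARKUP_SUBTYPES:
        return True
    if name in NON_MARKUP_SUBTYPES:
        return False
    raise ValueError(f"unknown annotation subtype: {subtype!r}")
```

Every markup annotation is an annotation, but the lookup shows the reverse does not hold: a Link or Widget annotation, for example, is not a markup.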

What is the correct format of a date string?

ISO 32000-1:2008 is the official standard and superseded the Adobe PDF Reference. Many areas were improved from the 1.7 spec prior to publication by ISO.

In this case, as ISO 32000-1 shows, the extra ' (apostrophe) at the end of the date example is indeed incorrect. If Adobe's products are writing that extra one at the end, it's a bug and we'll see about fixing it.

I will also point out that for the forthcoming ISO 32000-2 (PDF 2.0), DocInfo is deprecated in favor of XMP. So if someone is writing a PDF producer in 2016/2017, they shouldn't be worrying about DocInfo but instead focusing on XMP.
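For anyone who still has to read DocInfo dates, here is a minimal parsing sketch. It assumes the ISO 32000-1 form D:YYYYMMDDHHmmSSOHH'mm, where everything after the year is optional, and by default it tolerates the trailing apostrophe from the older Adobe form. The function name and the strict flag are inventions for this example.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSSOHH'mm with everything after the year optional;
# the "trailing" group captures the extra apostrophe of the older Adobe form.
_PDF_DATE = re.compile(
    r"D:(?P<year>\d{4})(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?:(?P<sign>[Z+-])(?:(?P<oh>\d{2})(?:'(?:(?P<om>\d{2})(?P<trailing>')?)?)?)?)?"
)


def parse_pdf_date(text: str, strict: bool = False) -> datetime:
    """Parse a PDF date string; with strict=True, reject the trailing apostrophe."""
    m = _PDF_DATE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a PDF date string: {text!r}")
    if strict and m["trailing"]:
        raise ValueError("trailing apostrophe is not part of the ISO 32000-1 form")

    def num(name: str, default: int) -> int:
        return int(m[name]) if m[name] is not None else default

    tz = None  # no offset given: relationship to UT is unknown
    if m["sign"] == "Z":
        tz = timezone.utc
    elif m["sign"] in ("+", "-"):
        offset = timedelta(hours=num("oh", 0), minutes=num("om", 0))
        tz = timezone(-offset if m["sign"] == "-" else offset)

    # Month and day default to 01, the time fields to 00.
    return datetime(num("year", 0), num("month", 1), num("day", 1),
                    num("hour", 0), num("minute", 0), num("second", 0),
                    tzinfo=tz)
```

A reader probably wants the lenient default, since files with the extra apostrophe exist in the wild; a writer should emit only the form that passes with strict=True.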

How can I know page orientation in PDF files?

Without using a preëxisting PDF handling library? There is nothing "quick" or "easy" about that.

At the very least, you must be able to read and parse the PDF Page tree, which in turn requires you to read and parse the PDF Object tree (which may be compressed and updated several times).

Scanning the Page tree, you may find pages are rotated and/or have dimensions indicating they are wider than they are high, or the other way around (a common definition of "portrait" and "landscape"). Of course, a page may have its size defined in landscape orientation but then rotated by 90 or 270 degrees.
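The size-and-rotation check described so far can be sketched as follows. It assumes you have already extracted the page's MediaBox and Rotate values (including any inherited from parent nodes in the Page tree), which is the hard part and is not shown; the function name and the three labels are made up for this example.

```python
from typing import Sequence


def page_orientation(media_box: Sequence[float], rotate: int = 0) -> str:
    """Classify a page from its MediaBox [x0, y0, x1, y1] and /Rotate value.

    Hypothetical helper: looks only at page size and rotation, not at the
    orientation of the page contents.
    """
    if rotate % 90 != 0:
        raise ValueError("/Rotate must be a multiple of 90 degrees")

    x0, y0, x1, y1 = media_box
    # The rectangle may be given by any two diagonally opposite corners.
    width, height = abs(x1 - x0), abs(y1 - y0)

    # A rotation of 90 or 270 degrees swaps the displayed width and height.
    if rotate % 180 == 90:
        width, height = height, width

    if width > height:
        return "landscape"
    if height > width:
        return "portrait"
    return "square"
```

For example, a US Letter page with MediaBox [0, 0, 612, 792] is classified as portrait when unrotated and as landscape when Rotate is 90.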

But it's more complicated than that! Page rotation or size does not define the orientation; ultimately, it's the text on that page that defines it. Suppose a page has a portrait size and is not rotated; yet, it is perfectly possible to have all of its contents (text and graphics) rotated -- sideways to the left or right, upside down, or at any other angle.

Furthermore, for a PDF designed for a book or journal, it's not uncommon to see an upright page with its header and/or footer in the "regular" position, and have content, such as a wide table, rotated.

Of course, it's tremendous fun to write all of this by yourself. The official PDF Specification contains enough information to get you started; see "PDF specifications for coders: Adobe or ISO?". Make sure to reserve plenty of time to read all of it.

What are some resources for learning to write specifications?

The most important part of development documentation, in my opinion, is having the right person write it.

  • Requirements Docs - Users + Business Analyst
  • Functional Spec - Business Analyst + developer
  • Technical Spec (how the functionality will actually be implemented) - Sr. Developer / Architect
  • Time estimates for scheduling purposes - The specific developer assigned to the task

Having anyone besides the Sr. Developer / Architect define table structures / interfaces etc. is an exercise in futility - as the more experienced developer will generally throw most of it out.

Wikipedia is actually a good start for the Functional Spec, which seems similar to your Spec - http://en.wikipedia.org/wiki/Functional_specification.


