How to Read Xmp Data from a Jpg with PHP

How can I read XMP data from a JPG with PHP?

XMP data is literally embedded in the image file, so you can extract it from the file itself with PHP's string functions.

The following demonstrates this procedure (I'm using SimpleXML, but any other XML API, or even simple and clever string parsing, may give you equal results):

$content = file_get_contents($image);
$xmp_data_start = strpos($content, '<x:xmpmeta');
$xmp_data_end = strpos($content, '</x:xmpmeta>');
if ($xmp_data_start !== false && $xmp_data_end !== false) {
    // include the closing tag itself in the extracted string
    $xmp_length = $xmp_data_end - $xmp_data_start + strlen('</x:xmpmeta>');
    $xmp_data = substr($content, $xmp_data_start, $xmp_length);
    $xmp = simplexml_load_string($xmp_data);
}

Just two remarks:

  • XMP makes heavy use of XML namespaces, so you'll have to keep an eye on that when parsing the XMP data with some XML tools.
  • Considering the possible size of image files, you may not be able to use file_get_contents(), as this function loads the whole image into memory. Using fopen() to open a file stream resource and checking chunks of data for the key sequences <x:xmpmeta and </x:xmpmeta> will significantly reduce the memory footprint.
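The chunked approach from the second remark could look roughly like the following minimal sketch. The function name readXmpPacket() and the 64 KiB default chunk size are illustrative assumptions, not part of the original answer. It keeps a short tail of each chunk so that a marker split across two reads is still found.

```php
<?php
/**
 * Minimal sketch (hypothetical helper): return the raw
 * <x:xmpmeta>...</x:xmpmeta> string from a file without loading the whole
 * file into memory, or null if no packet is found.
 */
function readXmpPacket(string $path, int $chunkSize = 65536): ?string
{
    $start = '<x:xmpmeta';
    $end = '</x:xmpmeta>';
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        return null;
    }
    $buffer = '';
    $inPacket = false;
    try {
        while (!feof($fh)) {
            $chunk = fread($fh, $chunkSize);
            if ($chunk === false) {
                break;
            }
            $buffer .= $chunk;
            if (!$inPacket) {
                $pos = strpos($buffer, $start);
                if ($pos === false) {
                    // Keep just enough bytes to catch a start marker split across chunks.
                    $buffer = substr($buffer, -(strlen($start) - 1));
                    continue;
                }
                $buffer = substr($buffer, $pos);
                $inPacket = true;
            }
            // Once inside the packet, only the (small) packet is buffered.
            $endPos = strpos($buffer, $end);
            if ($endPos !== false) {
                return substr($buffer, 0, $endPos + strlen($end));
            }
        }
    } finally {
        fclose($fh);
    }
    return null;
}
```

Usage: `$xmp = simplexml_load_string(readXmpPacket($image) ?? '<none/>');`. Only the packet itself (plus at most one chunk) is ever held in memory.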

Writing XMP Metadata in jpeg (with PHP) - Using Single or Multiple rdf:Description blocks

About RDF

It appears that what Photoshop is doing is reading a valid, well-formed RDF/XML serialization of some data, and then displaying it back to the user in the UI in another valid, well-formed RDF/XML serialization that happens to follow some additional conventions.

RDF is a graph-based data representation. The fundamental piece of knowledge in RDF is the triple, also called a statement. Each triple has a subject, a predicate, and an object. Subjects, predicates, and objects may all be IRI references; subjects and objects can also be blank nodes, and objects may also be literals (e.g., a string). RDF/XML is one particular serialization of RDF. The RDF/XML snippet:

<rdf:Description rdf:about="" xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/">
<photoshop:Instructions>OOOInstructions</photoshop:Instructions>
<photoshop:Headline>OOOHeadline</photoshop:Headline>
<photoshop:CaptionWriter>OOO </photoshop:CaptionWriter>
</rdf:Description>

contains three triples:

<this-document> <http://ns.adobe.com/photoshop/1.0/Instructions> "OOOInstructions"
<this-document> <http://ns.adobe.com/photoshop/1.0/Headline> "OOOHeadline"
<this-document> <http://ns.adobe.com/photoshop/1.0/CaptionWriter> "OOO "

where <this-document> is the result of resolving the reference "" (the value of the rdf:about attribute). Page 21 of the XMP documentation says that "the value of the rdf:about attribute may be an empty string …, which means that the XMP is physically local to the resource being described. Applications must rely on knowledge of the file format to correctly associate the XMP with the resource."

Doing

<rdf:Description rdf:about=""
xmlns:Iptc4xmpCore="http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/">
<Iptc4xmpCore:IntellectualGenre/>
</rdf:Description>

<rdf:Description rdf:about=""
xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/">
<photoshop:Instructions>OOOInstructions</photoshop:Instructions>
<photoshop:Headline>OOOHeadline</photoshop:Headline>
<photoshop:CaptionWriter>OOO </photoshop:CaptionWriter>
</rdf:Description>

is exactly the same as doing

<rdf:Description rdf:about=""
xmlns:Iptc4xmpCore="http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/"
xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/">
<Iptc4xmpCore:IntellectualGenre/>
<photoshop:Instructions>OOOInstructions</photoshop:Instructions>
<photoshop:Headline>OOOHeadline</photoshop:Headline>
<photoshop:CaptionWriter>OOO </photoshop:CaptionWriter>
</rdf:Description>

They serialize the same set of triples. Neither is invalid or incorrect. It's just a matter of which you prefer. Other variations are possible as well. For instance, in some cases you can use element attributes to indicate property values. The triple:

<this-document> <http://ns.adobe.com/photoshop/1.0/Instructions> "OOOInstructions"

can be serialized using property elements, as described in Section 2.2, Node Elements and Property Elements, of the RDF/XML recommendation:

<rdf:Description rdf:about="" xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/">
<photoshop:Instructions>OOOInstructions</photoshop:Instructions>
</rdf:Description>

or using attributes to indicate the property value, as described in Section 2.5 Property Attributes of the same document:

<rdf:Description rdf:about="" xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/"
photoshop:Instructions="OOOInstructions">
</rdf:Description>

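To check that the split and combined forms (and the element and attribute forms) really carry the same triples, here is a deliberately simplified sketch using PHP's DOM extension. descriptionTriples() is a hypothetical helper and not a full RDF/XML parser: it only handles literal-valued property elements and property attributes directly inside rdf:Description, and it treats a missing rdf:about like the empty one. For a real application, an RDF library is the better tool.

```php
<?php
// Simplified sketch: NOT a full RDF/XML parser (no nested resources,
// rdf:resource, rdf:parseType, or language tags).
const RDF_SYNTAX_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
const XML_LANG_NS = 'http://www.w3.org/XML/1998/namespace';

/** Return the sorted triples (as strings) serialized by some rdf:Description elements. */
function descriptionTriples(string $descriptions): array
{
    $doc = new DOMDocument();
    // Wrap the snippet so that the rdf: prefix is declared.
    $doc->loadXML('<rdf:RDF xmlns:rdf="' . RDF_SYNTAX_NS . '">' . $descriptions . '</rdf:RDF>');
    $triples = [];
    foreach ($doc->getElementsByTagNameNS(RDF_SYNTAX_NS, 'Description') as $desc) {
        // rdf:about="" refers to the containing document; a missing rdf:about
        // (a blank node in real RDF) is lumped in with it here for simplicity.
        $about = $desc->getAttributeNS(RDF_SYNTAX_NS, 'about');
        $subject = $about === '' ? '<this-document>' : "<$about>";
        // Property attributes (RDF/XML section 2.5).
        foreach ($desc->attributes as $attr) {
            $ns = $attr->namespaceURI;
            if ($ns === null || $ns === RDF_SYNTAX_NS || $ns === XML_LANG_NS) {
                continue;
            }
            $triples[] = sprintf('%s <%s%s> "%s"', $subject, $ns, $attr->localName, $attr->value);
        }
        // Property elements with literal content (RDF/XML section 2.2).
        foreach ($desc->childNodes as $child) {
            if ($child instanceof DOMElement) {
                $triples[] = sprintf('%s <%s%s> "%s"', $subject, $child->namespaceURI,
                    $child->localName, $child->textContent);
            }
        }
    }
    sort($triples);
    return $triples;
}
```

Feeding it the two-block form and the single-block form from above returns identical arrays, and the element and attribute forms of photoshop:Instructions both yield the one triple shown above.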
So, as to your second question:

Why should I spend the time to format my output to the RDF specs when it works nicely all jumbled together in a single rdf:Description?

If the output is supposed to be RDF, you should make it valid RDF. Whether it's in a particular aesthetically pleasing format is an entirely different question. It's relatively easy to translate between the two forms, and I expect that Photoshop reads the blob of RDF as it should, that is, without depending on any particular structure of the XML serialization, since that structure is not always the same (which is also why you shouldn't try to manipulate RDF with XPath). It then formats the data for the user in a way it considers nice, namely the convention you mentioned.

If you're not already, I very strongly suggest that you use an RDF library in PHP to construct the metadata graph, and not try to construct the RDF/XML serialization by hand.

About XMP in RDF

Note: this is an update based on the documentation. According to page 19 of the documentation, XMP supports only a subset of RDF, so it is still meaningful to ask whether the RDF above and in the question, though valid as RDF, is also valid as XMP. However, also from page 19:

The sections below describe the high-level structure of XMP data in an XMP Packet:

  • The outermost element is optionally an x:xmpmeta element
  • It contains a single rdf:RDF element
  • which in turn contains one or more rdf:Description elements
  • each of which contains one or more XMP Properties.
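Because the x:xmpmeta wrapper is optional and properties may be spread over several rdf:Description elements, code that reads XMP should follow this structure instead of assuming one fixed layout. A minimal SimpleXML sketch (xmpProperties() is a hypothetical helper) that collects the simple, literal-valued properties of one namespace, whether they are written as elements or as attributes:

```php
<?php
/**
 * Minimal sketch: collect simple properties of namespace $ns from every
 * rdf:Description in an XMP packet. Structured values (rdf:Alt, rdf:Seq, ...)
 * are not handled.
 */
function xmpProperties(string $xmp, string $ns): array
{
    $rdfNs = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
    $root = simplexml_load_string($xmp);
    if ($root === false) {
        return [];
    }
    // The outermost x:xmpmeta element is optional, so the root may be rdf:RDF.
    $rdf = $root->getName() === 'RDF' ? $root : $root->children($rdfNs)->RDF;
    $props = [];
    // rdf:RDF contains one or more rdf:Description elements.
    foreach ($rdf->children($rdfNs)->Description as $desc) {
        // Properties written as attributes of rdf:Description ...
        foreach ($desc->attributes($ns) as $name => $value) {
            $props[$name] = (string) $value;
        }
        // ... or as child elements.
        foreach ($desc->children($ns) as $name => $value) {
            $props[$name] = (string) $value;
        }
    }
    return $props;
}
```

Usage: `xmpProperties($xmp_data, 'http://ns.adobe.com/photoshop/1.0/')` returns an array such as `['Headline' => 'OOOHeadline', ...]`.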

Page 20 contains some elaboration about the rdf:Description elements:

The rdf:RDF element can contain one or more rdf:Description elements.
… By convention, all
properties from a given schema, and only that schema, are listed
within a single rdf:Description element. (This is not a requirement,
just a means to improve readability.)

The final parenthetical, "This is not a requirement, just a means to improve readability," is what we need in order to conclude that both forms we've seen above are acceptable. It's probably easier to just create one big blob, and consider yourself lucky if some other tool splits it into the conventional form for you.
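If you do generate the single blob yourself, building it with DOMDocument rather than by string concatenation at least keeps escaping and namespace declarations correct. This is a minimal sketch, not the RDF library recommended above; buildXmp() and the shape of its input array are hypothetical, and it only writes simple literal properties into one rdf:Description.

```php
<?php
/**
 * Minimal sketch: build an XMP packet with a single rdf:Description.
 * $props maps namespace URI => [prefix, [localName => value, ...]].
 */
function buildXmp(array $props): string
{
    $rdfNs = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
    $doc = new DOMDocument('1.0', 'UTF-8');
    // x:xmpmeta > rdf:RDF > rdf:Description, as in the structure quoted above.
    $meta = $doc->appendChild($doc->createElementNS('adobe:ns:meta/', 'x:xmpmeta'));
    $rdf = $meta->appendChild($doc->createElementNS($rdfNs, 'rdf:RDF'));
    $desc = $rdf->appendChild($doc->createElementNS($rdfNs, 'rdf:Description'));
    $desc->setAttributeNS($rdfNs, 'rdf:about', '');
    foreach ($props as $ns => [$prefix, $values]) {
        foreach ($values as $name => $value) {
            $el = $doc->createElementNS($ns, "$prefix:$name");
            // createTextNode() escapes &, < and > for us.
            $el->appendChild($doc->createTextNode($value));
            $desc->appendChild($el);
        }
    }
    // Serialize only the x:xmpmeta element (no XML declaration).
    return $doc->saveXML($meta);
}
```

The resulting string can then be wrapped in an `<?xpacket ...?>` envelope or handed to whatever embeds it into the JPEG.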

how to get Description metadata of an image?

Different formats

(Without any link to the file you tested, I assume that...)

The reason is that Exif (Exchangeable image file format) is not the only metadata format, and it does not know any item for descriptions, most likely because it is primarily written by cameras, not programs. There are other metadata formats:

  • IPTC (International Press Telecommunications Council):
    record 2 ("application"), dataset 120 ("Caption/Abstract: A textual description of the objectdata, particularly used where the object is not text.")
  • XMP (Extensible Metadata Platform):
    a description element in any namespace, mostly <dc:description>
  • RIFF (Resource Interchange File Format):
    INFO chunk, but without any "description" item
  • QTFF (QuickTime File Format):
    list atoms ©des, desc, ldes, sdes, dscp, or key atom com.apple.quicktime.description
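As an illustration of the first two entries, here is a minimal sketch that reads the IPTC Caption/Abstract (dataset 2#120) with PHP's iptcparse() and the dc:description entry from an XMP packet. The helper names are hypothetical; the IPTC input is assumed to be the raw block that getimagesize() places under the "APP13" key of its second argument, and the XMP input an already extracted packet string.

```php
<?php
/** Return IPTC Caption/Abstract (record 2, dataset 120), or null. */
function iptcDescription(string $iptc): ?string
{
    $parsed = iptcparse($iptc); // array keyed like "2#120", or false
    return isset($parsed['2#120'][0]) ? $parsed['2#120'][0] : null;
}

/**
 * Return the dc:description text for $lang from an XMP packet, falling back
 * to the first language found, or null.
 */
function xmpDescription(string $xmp, string $lang = 'x-default'): ?string
{
    $doc = new DOMDocument();
    if ($xmp === '' || !$doc->loadXML($xmp)) {
        return null;
    }
    $rdfNs = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
    $xmlNs = 'http://www.w3.org/XML/1998/namespace';
    $first = null;
    foreach ($doc->getElementsByTagNameNS('http://purl.org/dc/elements/1.1/', 'description') as $desc) {
        // dc:description is a language alternative: rdf:Alt with one rdf:li per language.
        foreach ($desc->getElementsByTagNameNS($rdfNs, 'li') as $li) {
            if ($li->getAttributeNS($xmlNs, 'lang') === $lang) {
                return $li->textContent;
            }
            $first ??= $li->textContent;
        }
    }
    return $first;
}
```

Typical use: `getimagesize($file, $info); $caption = isset($info['APP13']) ? iptcDescription($info['APP13']) : null;`.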

Files versus metadata

Which meta formats can be expected in which file formats?

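File formats \ Metadata formats              | Exif | IPTC | XMP | RIFF | QTFF | proprietary
JFIF/JPEG                                    |      |      |     |      |      | comment
TIFF, CR2, ORF, DNG, RAW, JPEG-XR, NIFF, MDI |      |      |     |      |      | many
PNG, JNG, MNG                                |      |      |     |      |      | free text
GIF                                          |      |      |     |      |      | comment
WebP                                         |      |      |     |      |      | many
JPEG2000, JPEG-XL, HEIF                      |      |      |     |      |      | many
PSD                                          |      |      |     |      |      | caption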

Remove EXIF data from JPG using PHP

Use GD to recreate the pixel data of the image in a new image and save that under another name; GD does not copy the metadata into the new file.

See the PHP GD documentation.


edit 2017

Use Imagick instead.

Open the image:

<?php
$incoming_file = '/Users/John/Desktop/file_loco.jpg';
$img = new Imagick(realpath($incoming_file));

Be sure to keep any ICC profile embedded in the image:

$profiles = $img->getImageProfiles("icc", true);

Then strip the image, put the profile back if there was one, and save the result:

$img->stripImage();

if (!empty($profiles)) {
    $img->profileImage("icc", $profiles['icc']);
}

// Overwrites the original file; pass another path to keep it.
$img->writeImage($incoming_file);

This comes from a PHP manual page; see the comment from Max Eremin further down that page.
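If re-encoding with GD or depending on Imagick is undesirable, another option (not taken from the answers above) is to drop the Exif APP1 segment at the byte level without touching the image data. A minimal sketch, assuming a JPEG whose metadata segments precede the first start-of-scan (SOS) marker; stripExifSegments() is a hypothetical name. XMP is also stored in APP1 segments, but those do not start with the "Exif\0\0" header, so this sketch keeps them, along with ICC profiles in APP2.

```php
<?php
/**
 * Minimal sketch: remove Exif APP1 segments from JPEG data held in a string.
 * Segments are only examined up to SOS (or EOI); the rest is copied unchanged.
 */
function stripExifSegments(string $jpeg): string
{
    if (substr($jpeg, 0, 2) !== "\xFF\xD8") {
        throw new InvalidArgumentException('Not a JPEG (missing SOI marker)');
    }
    $out = "\xFF\xD8";
    $pos = 2;
    $len = strlen($jpeg);
    while ($pos + 4 <= $len) {
        if ($jpeg[$pos] !== "\xFF") {
            throw new UnexpectedValueException("Expected a marker at offset $pos");
        }
        $marker = ord($jpeg[$pos + 1]);
        if ($marker === 0xFF) {
            $pos++; // fill byte before a marker
            continue;
        }
        if ($marker === 0xDA || $marker === 0xD9) {
            // Start of scan (compressed data follows) or end of image.
            return $out . substr($jpeg, $pos);
        }
        // Segment length is big-endian and includes its own 2 bytes.
        $segLen = unpack('n', substr($jpeg, $pos + 2, 2))[1];
        $segment = substr($jpeg, $pos, 2 + $segLen);
        $isExif = $marker === 0xE1 && substr($segment, 4, 6) === "Exif\0\0";
        if (!$isExif) {
            $out .= $segment;
        }
        $pos += 2 + $segLen;
    }
    return $out . substr($jpeg, $pos);
}
```

Usage: `file_put_contents('clean.jpg', stripExifSegments(file_get_contents('photo.jpg')));`. Unlike the GD route, the compressed image data is copied byte for byte, so there is no quality loss.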

Reading a File's Metadata

While I have not used it myself, the XMP PHP Toolkit on SourceForge sounds like just what you might be looking for: http://xmpphptoolkit.sourceforge.net/ That said, it appears to be in alpha and hasn't been updated in over a year.

XMP Toolkit PHP Extension is a PHP module which includes the Adobe XMP
Toolkit SDK. This PHP5 extension will provide classes and methods to
manipulate XMP Metadatas from files like jpegs, tiff, png, but also
wav, mp3, avi, mpeg4, pdf, ai, eps… It’s based from the Adobe XMP
Toolkit SDK 4.4.2. The goal of this extension is to have php classes
which can open files, extract metadatas, manipulate them, and put them
back within few lines of php code. This project is under GPL v3
License.

You are also able to write arbitrary IPTC metadata to an image file with iptcembed(). As you mention in your comment, this only works for JPEG files.

http://php.net/manual/en/function.iptcembed.php

Here is a class from the comments on that page that gets and sets IPTC data:

<?php

/************************************************************\

IPTC EASY 1.0 - IPTC data manipulator for JPEG images

All reserved www.image-host-script.com

Sep 15, 2008

\************************************************************/

DEFINE('IPTC_OBJECT_NAME', '005');
DEFINE('IPTC_EDIT_STATUS', '007');
DEFINE('IPTC_PRIORITY', '010');
DEFINE('IPTC_CATEGORY', '015');
DEFINE('IPTC_SUPPLEMENTAL_CATEGORY', '020');
DEFINE('IPTC_FIXTURE_IDENTIFIER', '022');
DEFINE('IPTC_KEYWORDS', '025');
DEFINE('IPTC_RELEASE_DATE', '030');
DEFINE('IPTC_RELEASE_TIME', '035');
DEFINE('IPTC_SPECIAL_INSTRUCTIONS', '040');
DEFINE('IPTC_REFERENCE_SERVICE', '045');
DEFINE('IPTC_REFERENCE_DATE', '047');
DEFINE('IPTC_REFERENCE_NUMBER', '050');
DEFINE('IPTC_CREATED_DATE', '055');
DEFINE('IPTC_CREATED_TIME', '060');
DEFINE('IPTC_ORIGINATING_PROGRAM', '065');
DEFINE('IPTC_PROGRAM_VERSION', '070');
DEFINE('IPTC_OBJECT_CYCLE', '075');
DEFINE('IPTC_BYLINE', '080');
DEFINE('IPTC_BYLINE_TITLE', '085');
DEFINE('IPTC_CITY', '090');
DEFINE('IPTC_PROVINCE_STATE', '095');
DEFINE('IPTC_COUNTRY_CODE', '100');
DEFINE('IPTC_COUNTRY', '101');
DEFINE('IPTC_ORIGINAL_TRANSMISSION_REFERENCE', '103');
DEFINE('IPTC_HEADLINE', '105');
DEFINE('IPTC_CREDIT', '110');
DEFINE('IPTC_SOURCE', '115');
DEFINE('IPTC_COPYRIGHT_STRING', '116');
DEFINE('IPTC_CAPTION', '120');
DEFINE('IPTC_LOCAL_CAPTION', '121');

class iptc {
    var $meta = array();
    var $hasmeta = false;
    var $file = false;

    // Must be __construct(): since PHP 8, a method named after the class
    // is no longer treated as the constructor.
    function __construct($filename) {
        getimagesize($filename, $info);
        $this->hasmeta = isset($info["APP13"]);
        if ($this->hasmeta) {
            // iptcparse() returns false if APP13 holds no IPTC datasets.
            $this->meta = iptcparse($info["APP13"]) ?: array();
        }
        $this->file = $filename;
    }
    function set($tag, $data) {
        $this->meta["2#$tag"] = array($data);
        $this->hasmeta = true;
    }
    function get($tag) {
        return isset($this->meta["2#$tag"]) ? $this->meta["2#$tag"][0] : false;
    }

    function dump() {
        print_r($this->meta);
    }
    // Serialize the record-2 datasets back into a raw IPTC block.
    function binary() {
        $iptc_new = '';
        foreach (array_keys($this->meta) as $s) {
            if (strpos($s, "2#") !== 0) {
                continue; // skip envelope (record 1) and other records
            }
            $tag = (int) substr($s, 2);
            $iptc_new .= $this->iptc_maketag(2, $tag, $this->meta[$s][0]);
        }
        return $iptc_new;
    }
    function iptc_maketag($rec, $dat, $val) {
        $len = strlen($val);
        if ($len < 0x8000) {
            // Standard dataset: 2-byte length.
            return chr(0x1c) . chr($rec) . chr($dat) .
                chr($len >> 8) .
                chr($len & 0xff) .
                $val;
        } else {
            // Extended dataset: 0x8004, then a 4-byte length.
            return chr(0x1c) . chr($rec) . chr($dat) .
                chr(0x80) . chr(0x04) .
                chr(($len >> 24) & 0xff) .
                chr(($len >> 16) & 0xff) .
                chr(($len >> 8) & 0xff) .
                chr($len & 0xff) .
                $val;
        }
    }
    function write() {
        if (!function_exists('iptcembed')) return false;
        // Mode 0 makes iptcembed() return the new JPEG data instead of printing it.
        $content = iptcembed($this->binary(), $this->file, 0);
        if ($content === false) return false;
        return file_put_contents($this->file, $content) !== false;
    }

    // Requires the GD extension. Re-encodes the JPEG at quality 100, which
    // drops all metadata (not only IPTC) and is not lossless.
    function removeAllTags() {
        $this->hasmeta = false;
        $this->meta = array();
        $img = imagecreatefromstring(file_get_contents($this->file));
        imagejpeg($img, $this->file, 100);
    }
};

?>

Example read copyright string:

$i = new iptc("test.jpg");
echo $i->get(IPTC_COPYRIGHT_STRING);

Update copyright statement:

$i = new iptc("test.jpg");
$i->set(IPTC_COPYRIGHT_STRING, "Here goes the new data");
$i->write();

php support for WEBP image metadata

It is better to use ExifTool for this.

Install ExifTool

https://exiftool.org/

PHP example

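class ExifToolException extends RuntimeException {}

function getInfo(string $file): object
{
    // 2>&1 also captures ExifTool's error messages in $info.
    $info = shell_exec('exiftool -json ' . escapeshellarg($file) . ' 2>&1');
    if (!is_string($info) || strpos($info, 'Error:') !== false) {
        throw new ExifToolException(rtrim((string) $info, PHP_EOL));
    }
    // -json prints an array with one object per input file.
    $data = json_decode($info);
    if (!is_array($data) || !isset($data[0])) {
        throw new ExifToolException('Unexpected ExifTool output: ' . $info);
    }
    return $data[0];
}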

try {
var_dump(getInfo('abc.webp')->Megapixels);
} catch(ExifToolException $e) {
var_dump($e->getMessage());
}

Update: ExifTool does not support writing webp

Instead you can look at webpmux from Google:
https://developers.google.com/speed/webp/docs/webpmux
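For reading only, a dependency-free alternative (not from the original answer) is to walk the RIFF chunks of the WebP file yourself: in the extended WebP format, Exif and XMP are stored in chunks named "EXIF" and "XMP " (with a trailing space). A minimal sketch with hypothetical function names; it returns the raw payloads, which still need parsing, for example the XMP with SimpleXML.

```php
<?php
/**
 * Minimal sketch: list the chunks of WebP (RIFF) data held in a string as
 * [fourcc, payload] pairs. A truncated chunk stops the scan.
 */
function webpChunks(string $data): array
{
    if (strlen($data) < 12 || substr($data, 0, 4) !== 'RIFF' || substr($data, 8, 4) !== 'WEBP') {
        throw new InvalidArgumentException('Not a RIFF/WebP file');
    }
    $chunks = [];
    $pos = 12; // skip "RIFF", the 4-byte file size and "WEBP"
    $len = strlen($data);
    while ($pos + 8 <= $len) {
        $fourcc = substr($data, $pos, 4);
        $size = unpack('V', substr($data, $pos + 4, 4))[1]; // little-endian uint32
        if ($pos + 8 + $size > $len) {
            break;
        }
        $chunks[] = [$fourcc, substr($data, $pos + 8, $size)];
        // Odd-sized payloads are followed by one padding byte.
        $pos += 8 + $size + ($size & 1);
    }
    return $chunks;
}

/** Return the raw EXIF and XMP payloads (null if absent). */
function webpMetadata(string $data): array
{
    $meta = ['exif' => null, 'xmp' => null];
    foreach (webpChunks($data) as [$fourcc, $payload]) {
        if ($fourcc === 'EXIF') {
            $meta['exif'] = $payload;
        } elseif ($fourcc === 'XMP ') {
            $meta['xmp'] = $payload;
        }
    }
    return $meta;
}
```

Usage: `$meta = webpMetadata(file_get_contents('abc.webp'));`. Simple (lossy or lossless) WebP files without a VP8X header carry no metadata chunks, so both entries stay null.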

Extract EXIF data from an image blob (binary string) in PHP

A stream wrapper can turn your string/image into something useable as a filehandle, as shown here. However, I can't see any way of turning that filehandle into something that can masquerade as the filename that exif_read_data expects.

You might try passing the data:// pseudo-url listed on that page and see if the exif function will accept it.
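Since PHP 7.2, exif_read_data() also accepts a stream resource instead of a file name, which sidesteps the filename problem described above. A minimal sketch, assuming PHP 7.2 or later with the exif extension enabled; exifFromBlob() is a hypothetical helper that copies the string into a php://memory stream.

```php
<?php
/**
 * Minimal sketch: read Exif data from a JPEG held in a string.
 * Returns the array from exif_read_data(), or null if nothing could be read.
 */
function exifFromBlob(string $blob): ?array
{
    $stream = fopen('php://memory', 'r+b');
    fwrite($stream, $blob);
    rewind($stream); // exif_read_data() reads from the current position
    try {
        $data = exif_read_data($stream);
    } finally {
        fclose($stream);
    }
    return $data === false ? null : $data;
}
```

Usage: `$exif = exifFromBlob($imageBlob); echo $exif['Make'] ?? 'unknown';`.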
