Microdata VS Rdfa

Microdata vs RDFa



Differences between Microdata and RDFa

While there are many (technical, smaller) differences, here’s a selection of those I consider important (used my answer on Webmasters as a base).

Specifications

  • As W3C’s HTML WG found no volunteer to edit the Microdata specification, it is now merely a W3C Group Note (see history), which means that there are no plans for any further work on it.

    So the Microdata section in WHATWG’s "HTML Living Standard" is the only place where Microdata may evolve. Depending on what gets changed, it may happen that their Microdata becomes incompatible to W3C’s HTML5.

    Update: In 2017, work started again, with the aim to publish Microdata as W3C Recommendation.

  • RDFa is published as W3C Recommendation.

Applicability

  • Microdata can only be used in (X)HTML5 (resp. HTML as defined by the WHATWG).

  • RDFa can be used in various host languages, i.e. several (X)HTML variants and XML (thus also in SVG, MathML, Atom etc.).

    And new host languages can be supported, as RDFa Core "is a specification for attributes to express structured data in any markup language".

Use of multiple vocabularies

  • In Microdata, it’s harder, and sometimes impossible, to use several vocabularies for the same content.

  • Thanks to its use of prefixes, RDFa allows to mix vocabularies.

Use of reverse properties

  • Microdata doesn’t provide a way to use reverse properties. You need this for vocabularies that don’t define inverse properties (e.g., they only define parent instead of parent & child). The popular Schema.org is such a vocabulary (with only a few older exceptions).

    While the W3C Note Microdata to RDF defines the experimental itemprop-reverse, this attribute is not part of W3C’s nor WHATWG’s Microdata.

  • RDFa supports the use of reverse properties (with the rev attribute).

Semantic Web

  • By using Microdata, you are not directly playing part in the Semantic Web (and AFAIK Microdata doesn’t intend to), mostly because it’s not defined as RDF serialization (although there are ways to extract RDF from Microdata).

  • RDFa is an RDF serialization, and RDF is the foundation of W3C’s Semantic Web.


The specifications RDFa Core and HTML+RDFa may be more complex than HTML Microdata, but it’s not a "fair" comparison because they offer more features.

Similar to Microdata would be RDFa Lite (which "does work for most day-to-day needs"), and this spec, at least in my opinion, is way less complex than Microdata.

What to do?

If you want to support specific consumers (for example, a search engine and a browser add-on), you should check their documentation about supported syntaxes.

If you want to learn only one syntax and have no specific consumers in mind, (attention, subjective opinion!) go with RDFa. Why?

  • RDFa matured over the years and is a W3C Rec, while Microdata is a relatively new invention and not standardized by the W3C.
  • RDFa can be used in many languages, not only HTML5.
  • RDFa allows mixed use of vocabularies for the same content, and it natively supports the use of reverse properties.

Can’t decide? Use both.

Note that you can also use several syntaxes for the same content, so you could have Microdata and RDFa (and Microformats, and JSON-LD, and …) for maximum compatibility.

  • Here’s a simple Microdata snippet:

    <p itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">John Doe</span> is his name.
    </p>
  • Here’s the same snippet using RDFa (Lite):

    <p typeof="schema:Person">
    <span property="schema:name">John Doe</span> is his name.
    </p>
  • And here both syntaxes are used together:

    <p itemscope itemtype="http://schema.org/Person" typeof="schema:Person">
    <span itemprop="name" property="schema:name">John Doe</span> is his name.
    </p>

But it’s typically not necessary/recommended to go down this route.

RDFa Vs. Microdata

  1. Yes, Microdata is also a W3C standard: HTML Microdata. There are two standards because RDFa was designed to address the requirements of the RDF community, and the WHATWG HTML5 editor found the living RDFa spec too advanced, so designed his own thing.

  2. This is a very subjective question, but I'd say the advantages of RDFa are: it makes it easier to combine multiple vocabularies/schemas, it has better support for datatypes and the like, and is compatible with all the other RDF-based standards in case you care about that. The advantage of Microdata is that it's simpler to grasp for HTML authors. Here's a big comparison of both formats (done by and RDFa guy, so perhaps a bit biased).

Microdata, RDFa or JSON-LD Appropriate or best usage?

Schema.org is a vocabulary that can, like any other vocabulary, be used in many forms. The website http://schema.org/ has examples using Microdata and the RDF syntaxes RDFa and JSON-LD, but these are not the only syntaxes it can be used with. You could, for example, use it with any other RDF syntax like Turtle or RDF/XML.

There is no best syntax. They all have advantages and disadvantages. See for example my answer about differences between Microdata and RDFa. Note that you can use different syntaxes (and vocabularies) in the same document.

Now, if you have a specific consumer in mind, you should consult their documentation. However, support of syntaxes comes and goes, and not everything they might support is necessarily documented, and not everything that is documented necessarily works.

In case of Google, you are probably interested in their Rich Snippets. Their documentation about Rich Snippets mentions Microdata, Microformats and RDFa. However, note that not all linked examples use the Schema.org vocabulary, but the older Data-vocabulary.org or Microformats (as you can’t use vocabularies like Schema.org nor Data-vocabulary.org with Microformats). And there are also some Rich Snippets that aren’t listed on that page, like the Sitelinks Search Box, for which they even recommend the JSON-LD syntax.

As general advice: Search engines typically favor visible content over hidden metadata. For example, having keywords as hidden metadata easily allows authors to claim that their documents are about something different than they really are (either because of trying to trick the search engine, or because authors forget to update content in both places). Therefore, uncoupling the metadata from the content, like it’s the case with JSON-LD, could (possibly!) lead to the same issues current search engines have with hidden metadata. (If or which search engines actually handle it like that is a question which is off-topic on Stack Overflow.)

Another possible advantage for coupling the metadata with the content (for example, with RDFa), is that you could easily and automatically generate the same information in JSON-LD, Turtle etc. because everything’s just RDF. Just parse the RDFa, convert to formats of your preference, and embed (in script) or link (with rel-alternate) it if it makes sense.

But yes, adding RDFa is often more complex than adding a JSON-LD blob, because you have to adapt it to the existing markup. (However, it should not "break validation" unless you’re making mistakes.)

What is the relationship between RDF, RDFa, Microformats and Microdata

<sarcasm>I can't imagine what you find so confusing</sarcasm> (edit: these tags were invisible before)

Very briefly:

  • Microformats A way to use html pages as both a human readable document and machine readable data, without repetition (e.g. sticking CSV in the head element).
  • RDF A data model designed for the web. Schemaless, uses URLs to name types and relations.
  • RDFa A way to encode (write) RDF in html, following the style of microformats (i.e. minimising repetition). Works by adding a few attributes to html.
  • Microdata An alternative to RDFa, using different attributes and a different data model.

Less briefly, RDF came from attempts to make data 'webby'. There was even a plug-in for browsing RDF's precursor MCF (from Apple, curiously). The data model was designed so that you could write what you want without having to pre-arrange how to column names or key meanings with everyone else on the web. RDF was standardly written (serialised) using XML (although other, more pleasant, formats are available).

So in this world you might have a readable homepage (me.html) and a data homepage (me.rdf) for machines to scoop up. Those machines won't understand the meaning of <p>I live in <a href="http://example.com/Chicago">Chicago</a></p>, but they might be able to use :me ex:livesIn <http://example.com/Chicago> if they look up 'livesIn'.

Microformats also tries to make data 'webby', or perhaps that should be the web 'data-y'. The insight here is that there's a lot of data in web pages, between the prose. If you have a few hints a machine can work out that that piece of html above is basically an address. Those hints are microformats. Typically they use conventions around html class names to indicate that the content is more than text.

So microformats don't need a separate web page for machines. But microformats only cover a few types of data (addresses, friend links, position...) because of the way they work. Each is a particular convention which needs to be agreed. Used without care they can mix badly, too.

RDFa is an attempt to get both the flexibility of RDF and the simplicity of microformats.
Microdata was a response to RDFa, but with a different data model (roughly arrays and hashes rather than a relational model). Unlike microformats neither dictate what sort of information is provided, just how it is encoded.

To complete what's missing we now have schema.org, which is an initiative from the big search engines (Google, Bing, Yahoo, Yandex) to index web data. schema.org coordinates those meanings so the search engine knows that, say, events have locations. Microdata was originally used by schema.org, but now both microdata and RDFa are supported.

I'd ignore microformats these days and go for schema.org. My personal taste is the RDFa encoding, but either it or microdata should work fine.

microformats, rdf or microdata

*Edit, May 2015: Times have changed... again. Schema.org seems the way to go, using either microdata (W3C note) or RDFa (W3C recommendation), where the RDFa Lite variant is easiest to learn. Meanwhile recently Microformats released a new version as well, which nobody is paying attention to currently.

Also see the answer to What is the relationship between RDF, RDFa, Microformats and Microdata?

Edit, August 2011: Times have changed. Forget my recommendation below. Just use microdata and forget that the other two exist.

Microformats: the oldest and the simplest of the three. If the existing specs cover your needs (that is, you want to mark up addresses, events, friend links, or another one of the supported data types), then they are a nice and practical choice. The problem is that you cannot make your own microformat if you want to mark up some kind of data that's not supported by the official specs.

RDFa: This one is based on W3C's RDF data model (it's basically a way of embedding RDF data into HTML pages). RDF has been around for a long time and there's a large amount of fancy tools for doing stuff with RDF data (stores, search engines, query languages, graph visualizers and so on). So RDFa takes you into this big existing ecosystem. But this also makes RDFa kind of complicated, and the learning curve is steeper than for the other proposals.

Microdata: This is Ian Hickson's counter-proposal to RDFa. In spirit, it is an extensible version of microformats. It doesn't have the RDF connection and is simpler than RDFa. It's still very new and hasn't seen much adoption yet, so it's a bit early to tell. Update: schema.org really seals the deal here.

My recommendation would be to go with microformats if they cover your need, and RDFa otherwise.

Schema.org: Use Microdata, RDFa or JSON-LD?

There is no general answer, it depends on the consumer of the data.

A specific consumer supports a specific set of syntaxes, and might or might not recommend a subset of these supported syntaxes.

Because search engines usually try to make sure not to get lead astray (e.g., a page about X claims via its Schema.org use to be about Y), it seems natural that they would prefer a syntax that couples the Schema.org metadata to the visible content of the page (in HTML5, this would be Microdata or RDFa); for the same reasons why many meta tags are dead for SEO.

However, this is not necessarily always the case. Google, for example, recommends the use of JSON-LD for a few of their features (bold emphasis mine):

  • Promote Critic Reviews:

    Though we strongly recommend using JSON-LD, Google can also read schema.org fields embedded in a web page with the microdata or RDFa standards.

  • Sitelinks Search Box:

    We recommend JSON-LD. Alternatively, you can use microdata.

What is RDFa + microdata / microformats and why might I need them?

Semantics. Nowadays, the main reason is so Google can show Rich Snippets. To see how this works, put your page into http://www.google.com/webmasters/tools/richsnippets and see what it picks up. Try the examples on that page to see what they are using in search results.

Google also have a page called http://schema.org/ that outlines their microdata schema for marking things up.

In my opinion microdata is the way to go, microformats overloaded the class attribute, and though I understand it's within the spec, for me it's no longer the best place to put this info. RDFa is really good, but it's complex and not that obvious to most. Microdata is clear, well defined and backed by Google, hence my recommendation to use that.

On my sites, I've completely replace microformats with microdata, despite being a super early adopter and follower of the whole microformat system.

Microdata or JSON-LD? I'm confused

Consumers that support Microdata support Microdata, no matter if or where Microdata is specified.

It’s conceivable that new consumers might decide not to support it, but the syntax is still very popular and still part of WHATWG’s HTML Living Standard, so it’s probably not going to vanish.

About the consumer Google

Some years ago, JSON-LD was not supported for many of their features, and they recommended that authors use Microdata (and they supported RDFa, too). Today it’s different.

See Google’s Markup formats and placement:

JSON-LD is the recommended format. Google is in the process of adding JSON-LD support for all markup-powered features. The table below lists the exceptions to this. We recommend using JSON-LD where possible.

According to the mentioned table, Microdata and RDFa support all of Google’s data types, while JSON-LD supports everything except their Breadcrumbs feature.

I wouldn’t give much weight to their recommendation. They say that "Structured data markup is most easily represented in JSON-LD format", but I think it’s safe to say that this only applies to authors that generate the structured data programmatically (especially from tools that support JSON).

For authors that manually add the structured data markup, it’s typically easier to use Microdata or RDFa, and using these syntaxes minimizes the risk that an author updates the content without updating the structured data, too (see DRY principle).

JSON-LD vs. Microdata vs. RDFa

Unless you know (and care for) consumers that don’t support all three syntaxes, it doesn’t matter. Use what is easier for you and your tools.

If you have no preference, I would say JSON-LD or RDFa, because contrary to Microdata,

  • both are W3C Recommendations,
  • both can be used in non-HTML5 contexts,
  • both allow to (easily) mix several vocabularies.

JSON-LD if you like your structured data not "intermingled" with your markup (= duplicating the content), RDFa if you like to use your existing markup (= not duplicating the content).

Statistics about Microformat vs HTML+RDFa adoption

Now I see, there are some statistics (!!), the link of Wikipedia was lost... I corrected. It isn't updated, is from "Winter 2013" (~1.5 or 2 years old collected data), but show reality and tendencies.

http://webdatacommons.org/structureddata/index.html#toc2

This is the chart at the report (with RDFa+HTML dominance!):

Sample Image

Sample Image

Interpreting:

  • the section 5, "Extraction Process", say that "on each page, we run our RDF extractor based on the Anything To Triples (Any23) library", so all (RDF and Microformat) resulted in "triples" (not only RDF).

  • The ideia for "per domain" statistics is that domains use uniform politics for all pages... But I think this uniformity is false, only few pages per domain adopt "semantic markup" ... It is not more unbiased than URLs, is only another picture. Anyway, the outcome was dead heat, ~57% vs 43%.

  • Only 21% of the "semantic markup URLs" of 2013 was Microformat, all other are RDFa-HTML (Microdata is also a kind of RDFa).

  • using the average of percentuals of Domains (Ds) and URLs (Us), (Ds+Us)/2, the outcome is ~60% for RDFs and ~40% for Microformats.

  • before 2013 there was a dominance of Microformats, so, is evident the big growing of "RDFa-HTML" since 2011... The tendency is clear.

  • If we adopt the arithmetic mean of "per domain" and "per URL" countings, we have Microformats and RDFa-HTML near each other, with but with little less Microformat (and the strong tendency to RDFa-HTML grow in 2014).

Here a table for @sashoalm discussion, showing the percentuals and totals

Sample Image


NOTE1: HTML5 was released only 2014-10-28, so only ~2015-10 we will can check the real (definitive) impact of the new standard on the Web. An important expected impact is that Microdata not was blessed by HTML5, so the only standard is HTML+RDFa (that recommends RDFa Lite)... In the future perhaps there will less Microdata and more schema.org.

NOTE2: methodological problem of counting web-pages, of boilerplate text with some huge-cloned "semantic markup": I think that the "next generation" of statiscs can use some "per domain analisys" to make URL substatistics (sampling) of diversity (of semantically marked pages). Ideal is to weigh (p. ex. count once the non-clones and use 1+SQRT(count) of clones) the boilerplate.

Conclusion

Today perhaps some people use Microformat, but there are more pages in the Web using RDFa-HTML (Microdata, RDFa, RDFa Lite, etc.), and the tendency is to grow.

If your project is for next years, the statistics say to use RDFa.



NOTE

Another insteresting counting for RDFa is not the use, but the reuse of vocabularies (!). See Linked Open Vocabularies (LOV)

LOV



Related Topics



Leave a reply



Submit