How to Successfully Use the RDAP Protocol Instead of WHOIS

Direct short answer

The answer to "How to successfully use RDAP protocol instead of WHOIS?" is: there is no way to successfully use RDAP yet; at best, perhaps, you can try some experimental implementation... But even for an experimental one, I do not know how to try it.

Answer to your wrong hypothesis

The URL that you used is wrong, so the details of your question partly start from a wrong hypothesis.

The domain RDAP.ORG is not an "official authority"; it is owned by a commercial organization, so it is a false ".ORG". There is a footer on the rdap.org pages acknowledging that it is not an official service and that it is a tau.uk.com playground. Try some RDAP client.

Best interpretation and long answer

This ICANN report of 2015-12-03, "Registration Data Access Protocol (RDAP) Operational Profile for gTLD Registries and Registrars", has some clues, some history, ... and finishes with

(...) it is premature to include a requirement for all gTLDs in the RDAP Profile
(...) A call for volunteers is planned by January 2016.

So no one has decided to enforce RDAP on all registrars... And no "call for volunteers" has been announced this year.

Dream with an intermediate alternative

The main problem with WHOIS today is the "free interpretation" of the published information. There is no "standard Rosetta stone", but we can start one (!), to offer an intermediate step while RDAP matures.

2019 news

Starting 2019-08-26, it will be an ICANN requirement (hence for all gTLDs) to have an RDAP server at registries and registrars.

From https://www.icann.org/rdap:

RDAP Timeline

gTLD registries and registrars are required to implement an RDAP
service by 26 August 2019. ICANN org continues to work with gTLD
registries and registrars to implement a service-level agreement and
registry reporting requirements for RDAP.

Read more on RDAP timeline at https://www.icann.org/resources/pages/rdap-background-2018-08-31-en

2020 news

RDAP services are working! You can try them with an RDAP client (as with the old whois client) or directly through the top-level domain authority's API. Examples:

  • The authority's RDAP API endpoint for the top-level domain .com is https://rdap.verisign.com/com/v1/. For example, the Brazilian UOL.COM is described at https://rdap.verisign.com/com/v1/domain/uol.com .
  • The authority's RDAP API endpoint for the top-level domain .org is https://rdap.publicinterestregistry.net/rdap/org/. For example, W3C's domain is described at https://rdap.publicinterestregistry.net/rdap/org/domain/w3c.org
  • There are many RDAP clients that work fine; OpenRDAP.org is one. You don't need to know the authority or its RDAP API endpoint: the lookup is direct.
    Examples: rdap -v uol.com or rdap -v w3c.org.
  • Online clients can also be used. Examples: client.rdap.org resolving uol.com or openrdap.org resolving w3c.org.

Entity transparency: transparency policies are local. For example, UOL.com.br is registered under the .br authority, and Registro.BR requires that every domain name owner be revealed (see the CNPJ entry). Check it with a good and universal client, like OpenRDAP: rdap -v uol.com.br.

RDAP query has fewer results than whois for google.com?

TL;DR: the registrar of the domain you chose as an example is not following the regulations: it does not show contact data through RDAP while it does show it through whois. This is not what is supposed to happen and should be fixed at some point; it is not a defect of the protocol, just one actor not following the specifications. If you try with other names (at other registrars) you should get better results.

But since your problem may also come from other reasons, please find below more explanations.

This problem is not necessarily specific to RDAP: you have the exact same situation with whois in the case of .COM/.NET, as this is a thin registry, which means the registry does not have data about contacts.

whois clients typically emulate redirects (which do not exist in the whois protocol): they first show the registry whois reply (no contacts there for a .COM) and then continue with the registrar whois reply (which has contacts).

With whois clients you do not see these 2 steps by default unless you pay attention, as it is an operational detail.

RDAP, being structured, gives you the links and lets you follow them, but your client needs to do it.

Let us start from scratch and follow a methodology that will work for all cases, manually emulating an RDAP client using wget and jq.

1) Finding authoritative RDAP server

The process is basically outlined by RFC 7484, but let us do it manually.

IANA is the authoritative source here, so if you go to http://data.iana.org/rdap/dns.json you find the authoritative RDAP server for .COM, which is: https://rdap.verisign.com/com/v1/
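The lookup in the IANA file can be sketched in Python as follows. This is only a minimal sketch, assuming the file keeps the layout described by RFC 7484 (a "services" array whose entries pair a list of TLDs with a list of base URLs); the names load_bootstrap and find_rdap_base are illustrative and do not come from any library.

```python
import json
import urllib.request

BOOTSTRAP_URL = "https://data.iana.org/rdap/dns.json"


def load_bootstrap(url=BOOTSTRAP_URL):
    """Download and decode the IANA bootstrap file (needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def find_rdap_base(bootstrap, domain):
    """Return a base URL for the longest matching suffix of domain, or None."""
    labels = domain.lower().rstrip(".").split(".")
    # Try the longest suffix first: "a.b.com" -> "a.b.com", "b.com", "com".
    for start in range(len(labels)):
        suffix = ".".join(labels[start:])
        for entries, urls in bootstrap.get("services", []):
            if suffix in (entry.lower() for entry in entries):
                # Prefer an HTTPS URL when the entry lists several.
                https_urls = [u for u in urls if u.startswith("https://")]
                return (https_urls or urls)[0]
    return None


# Offline sample in the same layout as the real file (only two entries).
sample = {
    "services": [
        [["com", "net"], ["https://rdap.verisign.com/com/v1/"]],
        [["org"], ["https://rdap.publicinterestregistry.net/rdap/org/"]],
    ]
}
print(find_rdap_base(sample, "google.com"))  # https://rdap.verisign.com/com/v1/
```

With network access, find_rdap_base(load_bootstrap(), "google.com") would do the same against the live file.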

2) Querying registry RDAP server

Per RDAP specifications, from the base URL above you know you need to use
https://rdap.verisign.com/com/v1/domain/google.com as your first step
(i.e. concatenation of base URL, then domain, then the domain name you are after).

You can emulate it manually by something like wget -O - https://rdap.verisign.com/com/v1/domain/google.com | jq .
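The concatenation rule can also be written as a tiny Python helper. This is a sketch only; rdap_domain_url is a hypothetical name, and it covers just the simple domain lookup used here, not the other RDAP query types.

```python
from urllib.parse import quote


def rdap_domain_url(base_url, domain):
    """Concatenate the base URL, the "domain" path segment and the domain name."""
    if not base_url.endswith("/"):
        base_url += "/"
    # Percent-encode the name so unusual characters cannot alter the path.
    return base_url + "domain/" + quote(domain, safe="")


print(rdap_domain_url("https://rdap.verisign.com/com/v1/", "google.com"))
# https://rdap.verisign.com/com/v1/domain/google.com
```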

You will get a lot of data but nothing about contacts, for the reasons outlined above, which have nothing to do with the fact that you are using RDAP: the registry simply does not have the contact data.

But the reply gives you information on where to go next to have the missing data.
If you look closely at the returned JSON data you have this part:

  "links": [
    {
      "value": "https://rdap-core.vrsn.com/com/v1/domain/GOOGLE.COM",
      "rel": "self",
      "href": "https://rdap-core.vrsn.com/com/v1/domain/GOOGLE.COM",
      "type": "application/rdap+json"
    },
    {
      "value": "https://rdap.markmonitor.com/rdap/domain/GOOGLE.COM",
      "rel": "related",
      "href": "https://rdap.markmonitor.com/rdap/domain/GOOGLE.COM",
      "type": "application/rdap+json"
    }
  ],

Pay close attention to the rel property.
The first link (it is an array in the response) has rel=self, which means it gives you the canonical URL of the object for which you just got a reply. Using it again should give you the exact same reply (if the object did not change, of course), and it is useful to keep the source URL in the document itself. The fact that this URL is not the one we queried, because its base URL differs from the one listed at IANA, is just an operational detail without consequences here.

But look at the second one, with rel=related. Per the RDAP specifications and ICANN rules, this is the link to follow to get more data, that is, the registrar part in a split registry/registrar model like the one used in all gTLDs.

So we should use that link for the next step.
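Picking that link out of the reply is a small filtering job. A minimal Python sketch, assuming the reply has already been decoded into a dict; related_links is an illustrative name:

```python
def related_links(rdap_object):
    """Return the href of every link whose rel is "related", in order."""
    return [
        link["href"]
        for link in rdap_object.get("links", [])
        if link.get("rel") == "related" and "href" in link
    ]


# The "links" part of the registry reply shown above.
registry_reply = {
    "links": [
        {
            "value": "https://rdap-core.vrsn.com/com/v1/domain/GOOGLE.COM",
            "rel": "self",
            "href": "https://rdap-core.vrsn.com/com/v1/domain/GOOGLE.COM",
            "type": "application/rdap+json",
        },
        {
            "value": "https://rdap.markmonitor.com/rdap/domain/GOOGLE.COM",
            "rel": "related",
            "href": "https://rdap.markmonitor.com/rdap/domain/GOOGLE.COM",
            "type": "application/rdap+json",
        },
    ]
}
print(related_links(registry_reply))
# ['https://rdap.markmonitor.com/rdap/domain/GOOGLE.COM']
```

An empty list means the reply gives you nowhere else to go, which is what you would expect from a registry that already holds the contact data.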

3) Querying registrar RDAP server

With wget -O - https://rdap.markmonitor.com/rdap/domain/GOOGLE.COM | jq .
if we search for the entities part, where contacts are located, we get:

  "entities": [
    {
      "objectClassName": "entity",
      "handle": "292",
      "events": [
        {
          "eventAction": "registrar expiration",
          "eventDate": "2020-09-14T04:00:00.000+0000"
        }
      ],
      "roles": [
        "registrar"
      ],

...

And indeed there is no other entity after that, that is, no role other than registrar.
The RDAP server of this registrar did not return any contact data, contrary to its whois service. This is obviously against the specification, and this server is not compliant under current ICANN rules.
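A quick way to spot such a reply programmatically is to collect the roles that appear in it. This is a sketch under the assumption that the reply is already decoded into a dict; roles_present is a hypothetical helper, and it also walks entities nested inside other entities, which RDAP allows.

```python
def roles_present(rdap_object):
    """Collect every role found in the entities of a reply, nested ones included."""
    roles = set()
    stack = list(rdap_object.get("entities", []))
    while stack:
        entity = stack.pop()
        roles.update(entity.get("roles", []))
        # Entities may themselves contain entities (e.g. a registrar's contacts).
        stack.extend(entity.get("entities", []))
    return roles


# Trimmed version of the registrar reply shown above.
registrar_reply = {
    "entities": [
        {"objectClassName": "entity", "handle": "292", "roles": ["registrar"]}
    ]
}
print("registrant" in roles_present(registrar_reply))  # False
```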

Unfortunately, there is probably nothing you can do at your level to change that. It will change, as ICANN will start at some point to enforce things, but until then you will need to live with such broken cases, as there are multiple others.

4) Same for another domain, better results

If you repeat the above with another name, say stackoverflow.com you reach another registrar and in the final reply you can see:

  "entities": [

...

    {
      "objectClassName": "entity",
      "handle": "",
      "vcardArray": [
        "vcard",
        [
          [
            "version",
            [],
            "text",
            "4.0"
          ],
          [
            "org",
            {
              "type": "work"
            },
            "text",
            "Stack Exchange, Inc."
          ],
          [
            "adr",
            [],
            "text",
            [
              "",
              "",
              "",
              "",
              "NY",
              "",
              "US"
            ]
          ]
        ]
      ],
      "roles": [
        "registrant"
      ],
      "remarks": [
        {
          "title": "REDACTED FOR PRIVACY",
          "type": "object truncated due to authorization",
          "description": [
            "Some of the data in this object has been removed."
          ]
        }
      ]
    },

As you can see from registrant in roles, this structure describes registrant data. However, due to GDPR and hence the ICANN temporary specification, most of the data is redacted and in fact not there. You basically have just the registrant organization name, state and country, in the vCard part.
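Reading the vCard part takes a little care because it is an array of arrays (the jCard format), not a plain object. A minimal Python sketch, assuming the reply is already decoded; vcard_properties and contacts_by_role are illustrative names, and the parameters element of each property is deliberately ignored since servers fill it inconsistently (the reply above uses both [] and an object there).

```python
def vcard_properties(entity):
    """Map each jCard property name to the list of its values."""
    properties = {}
    vcard = entity.get("vcardArray", [])
    # vcardArray looks like ["vcard", [property, property, ...]].
    if len(vcard) < 2:
        return properties
    for prop in vcard[1]:
        if len(prop) < 4:
            continue  # malformed property, skip it
        name, value = prop[0], prop[3]  # prop[1] = parameters, prop[2] = type
        properties.setdefault(name, []).append(value)
    return properties


def contacts_by_role(rdap_object, role):
    """Return the vCard properties of every top-level entity having this role."""
    return [
        vcard_properties(entity)
        for entity in rdap_object.get("entities", [])
        if role in entity.get("roles", [])
    ]


# Trimmed version of the registrant entity shown above.
reply = {
    "entities": [
        {
            "objectClassName": "entity",
            "vcardArray": [
                "vcard",
                [
                    ["version", [], "text", "4.0"],
                    ["org", {"type": "work"}, "text", "Stack Exchange, Inc."],
                    ["adr", [], "text", ["", "", "", "", "NY", "", "US"]],
                ],
            ],
            "roles": ["registrant"],
        }
    ]
}
registrant = contacts_by_role(reply, "registrant")[0]
print(registrant["org"][0])     # Stack Exchange, Inc.
print(registrant["adr"][0][6])  # US
```

In the adr value, index 4 is the region and index 6 the country, which is why only "NY" and "US" survive the redaction here.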

5) Summary

Three points to remember here:

  • one of the advantages of RDAP (over whois) is exactly to be able to convey clear links on where to go next to get more information; this is the process outlined above
  • for now this relates only to COM/NET names as these TLDs are run under a thin registry model, one where the registry does not have contact data; note that this is bound to disappear: even if the process is postponed multiple times at ICANN it is indeed pending and in some future COM/NET will work like any other gTLD as the registry will have all contact data
  • all the above is heavily influenced by GDPR, which restricts the amount of data shown nowadays in whois, specifically about contacts. As the future model of tiered access is not known today, maybe we will still have a multi-step querying process to get more data on contacts depending on who requests the data.

Is there a listing of known whois query output formats?

"TL;DR: I need a source for as many different output formats from a whois query as possible."

There isn't one, unless you use some provider that does this for you, with whatever caveats that brings.
Or, more precisely, there is nothing public, maintained and exhaustive. You can find various libraries that try to do this, in various languages, but none is complete, as this is basically an impossible task, especially if you want to include any TLD, like ccTLDs (you do not describe your constraints in much detail, nor in fact say whether you are asking about domain name data in whois or about IP address/ASN data).

Some providers of course try to do that and offer you an abstract uniform API. But why would anyone share their internal secret sauce, that is, their list of parsers and so on? There is no business incentive to do that.
As for open source library authors (I was one at some point), it is just tedious and absolutely not rewarding to keep updating a library forever with all the new formats and tweaks per registry (battle scar example: one registrar in the past changed its output format at each query! One query gave you somefield: somevalue while the next time it was somefield:somevalue or somefield somevalue, etc. Of course, that is only a simple example).
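To make the battle scar concrete, here is a minimal sketch of a line parser that tolerates just the three variants mentioned. parse_whois_line is a hypothetical name, and real whois output has many more variants (keys containing spaces but no colon, multi-line values, localized labels) that this does not handle, which is exactly why such parsers never end.

```python
def parse_whois_line(line):
    """Split one whois line into (key, value), or return None if it has no pair."""
    line = line.strip()
    if not line:
        return None
    if ":" in line:
        # "somefield: somevalue" and "somefield:somevalue"
        key, value = line.split(":", 1)
    else:
        # "somefield somevalue": fall back to the first run of whitespace
        parts = line.split(None, 1)
        if len(parts) != 2:
            return None
        key, value = parts
    return key.strip(), value.strip()


for raw in ("somefield: somevalue", "somefield:somevalue", "somefield somevalue"):
    print(parse_whois_line(raw))
# ('somefield', 'somevalue') three times
```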

RFC 3912 specified just the transport part, not the content, hence a lot of variants appeared. Specifically in the ccTLD world, each registry is king in its kingdom and is free to implement whatever it wants the way it wants. Also, the protocol had some serious limitations (e.g. internationalization: what is the "charset" used for the underlying data?) that were circumvented in different ways (like passing "options" in your query; of course none of them is standardized in any way).

At the very least, the gTLD whois format is specified here:
https://www.icann.org/resources/pages/approved-with-specs-2013-09-17-en#whois

Note however that due to GDPR there were changes (see https://www.icann.org/resources/pages/gtld-registration-data-specs-en/#temp-spec) and there will be other changes in the future.

However, you are strongly encouraged to look at RDAP instead of whois.

RDAP is now a requirement for all gTLD registries and registrars. As it is JSON, it immediately solves the problem of format.

Its core specifications are:

  • RFC 7480 HTTP Usage in the Registration Data Access Protocol (RDAP)
  • RFC 7481 Security Services for the Registration Data Access Protocol (RDAP)
  • RFC 7482 Registration Data Access Protocol (RDAP) Query Format
  • RFC 7483 JSON Responses for the Registration Data Access Protocol (RDAP)
  • RFC 7484 Finding the Authoritative Registration Data (RDAP) Service

You can find various libraries doing RDAP for you (see below for links), but at its core it is JSON over HTTPS so you can emulate simple cases with any kind of HTTP client library.
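As an illustration of "any kind of HTTP client library", here is a minimal sketch using only Python's standard library. build_rdap_request and fetch_rdap are hypothetical names; the sketch assumes the server answers with JSON and treats HTTP 404 as "no such object", leaving every other error (rate limiting, broken servers) to the caller.

```python
import json
import urllib.error
import urllib.request


def build_rdap_request(url):
    """Prepare a GET request announcing the RDAP media type."""
    return urllib.request.Request(url, headers={"Accept": "application/rdap+json"})


def fetch_rdap(url, timeout=10):
    """Return the decoded JSON reply, or None when the server answers 404."""
    try:
        with urllib.request.urlopen(build_rdap_request(url), timeout=timeout) as response:
            return json.load(response)
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return None  # the server has no information on this object
        raise
```

A call such as fetch_rdap("https://rdap.verisign.com/com/v1/domain/google.com") needs network access; urllib follows HTTP redirects on its own.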

Work is underway to fix some missing/not precise enough details on RFC 7482 and 7483.

You need also to take into account ICANN specifications (again, only for gTLDs of course):

  • https://www.icann.org/en/system/files/files/rdap-technical-implementation-guide-15feb19-en.pdf
  • https://www.icann.org/en/system/files/files/rdap-response-profile-15feb19-en.pdf

Note that, right now, even if it is an ICANN requirement, you will find a lot of missing or broken gTLD registry or registrar RDAP servers. You will also find a lot of "deviations" in replies from what would be expected per the specification.

I gave full details in various other questions here, so maybe have a look:

  • https://stackoverflow.com/a/61877920/6368697
  • https://stackoverflow.com/a/48066735/6368697
  • https://webmasters.stackexchange.com/a/115605/75842
  • https://security.stackexchange.com/a/213854/137710
  • https://serverfault.com/a/999095/396475

PS: a philosophical note on "Hoping there is a simple "go here and download this" answer. Hoping...": a lot of people hoped for that in the past; see the initial remark at the beginning. Let us imagine you go forward and build this magnificent resource with all the exhaustive details. Would you be inclined to just share it with anyone, for free? The answer is probably no, for obvious reasons. The same happened in the past to others who went down the same path as you, hence today's result: various providers offering you more or less this service (you would need to find details on which formats are parsed, the rate limits, the prices, etc.), but nothing freely available to share.

Now you can just dream/hope that every registry and registrar switches to RDAP AND implements it properly. Then the problem of format is solved once and for all. However, the above requirements ("every" + "properly") are not small, and may not be met "soon". Specifically in ccTLDs, where registries are in no way mandated by any external force (except market pressure?) to implement RDAP at all.

How do I find out the domain expiry date of a .org and .in website in Java

Do not bother with the whois protocol.

Now (since August 26th, 2019), per ICANN requirements, all gTLDs need to have an RDAP server. RDAP is like the successor of whois: roughly the same content is exchanged, but this time on top of HTTPS with a fixed JSON format. Hence, it is trivial to parse.

The expiry date will be in the "events" array with an action called "expiration".
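That rule can be sketched in a few lines of Python (the same logic carries over to Java with any JSON library). expiration_dates is an illustrative name; the sketch assumes the reply is already decoded into a dict and does a substring match on the action, so that it also catches variants such as "registrar expiration".

```python
def expiration_dates(rdap_object):
    """Return the eventDate of every event whose action mentions "expiration"."""
    return [
        event["eventDate"]
        for event in rdap_object.get("events", [])
        if "expiration" in event.get("eventAction", "") and "eventDate" in event
    ]


# Minimal stand-in for a domain reply, reduced to its expiration event.
reply = {
    "events": [
        {"eventAction": "expiration", "eventDate": "2019-10-04T04:00:00.000Z"}
    ]
}
print(expiration_dates(reply))  # ['2019-10-04T04:00:00.000Z']
```

The dates are returned as strings because servers differ in how they write the time zone, so parsing is left to the caller.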

You can go to https://data.iana.org/rdap/dns.json to find out the .ORG RDAP server; it is at the URL https://rdap.publicinterestregistry.net/rdap/org/

You need to learn a little more about RDAP to understand how to use it (structure of the query and the reply); you can find an introduction at https://about.rdap.org/

But in short, for your case, this emulates what you need to do:

$ wget -qO - https://rdap.publicinterestregistry.net/rdap/org/domain/slashdot.org | jq '.events[] | select(.eventAction | contains("expiration")) | .eventDate'
"2019-10-04T04:00:00.000Z"

PS1: if you get no match from a whois query, it normally means that the domain really does not exist; it could also be because of rate limiting.

PS2: .IN may not have an RDAP server yet: since it is a ccTLD, it is not bound by ICANN rules.


