How to Write a Generic .htaccess to Prevent Hotlinking

You can't use server variables such as %{HTTP_HOST} inside the regex (the CondPattern). You can work around this by putting both values in the TestString and using a regex backreference, like so:

RewriteCond %{HTTP_REFERER} ^https?://([^/]+)/ [NC]
RewriteCond %1#%{HTTP_HOST} !^(.+)#\1$
RewriteRule \.(jpe?g|gif|bmp|png|swf|css)$ - [F]

(Note: the # is just used as a boundary. It could be any character that isn't used in domain names.)
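To see why the backreference trick works, here is a minimal Python sketch that mimics the two conditions with the standard re module. The function name is_blocked and the sample hostnames are assumptions made for illustration; Apache itself evaluates these directives, not Python.

```python
import re

def is_blocked(referer: str, host: str) -> bool:
    """Hypothetical stand-in for the two RewriteCond lines above."""
    # Condition 1: capture the hostname from the Referer (plays the role of %1).
    m = re.match(r"^https?://([^/]+)/", referer, re.IGNORECASE)
    if m is None:
        return False  # no usable Referer, so the rule is not triggered
    # Condition 2: build the TestString "%1#%{HTTP_HOST}" and apply the
    # negated pattern. \1 must repeat exactly what (.+) captured before "#".
    test_string = f"{m.group(1)}#{host}"
    same_host = re.match(r"^(.+)#\1$", test_string) is not None
    return not same_host

print(is_blocked("https://example.com/page.html", "example.com"))    # False
print(is_blocked("https://other.example/page.html", "example.com"))  # True
```

The second call is blocked because other.example#example.com cannot satisfy ^(.+)#\1$: the text after the # is not a repeat of the text before it.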

Allow/deny image hotlinking with .htaccess

RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain\.com [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?otherdomain\.com [NC]
RewriteRule \.(gif|jpe?g|js|css)$ - [F,NC,L]

This will work. In plain words, it says:

"The referer is not empty, and the referer does not match mydomain, and the referer does not match otherdomain."

If you were trying to do the opposite (blacklist a set of domains from hotlinking), you'd do something like:

RewriteCond %{HTTP_REFERER} ^http://(www\.)?baddomain1\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?baddomain2\.com [NC]
RewriteRule \.(gif|jpe?g|js|css)$ - [F,NC,L]
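The difference between the two rule sets can be sketched in Python. This is only an approximation of how the conditions combine: the helper names are hypothetical, and the patterns are copied from the directives above with [NC] translated to re.IGNORECASE.

```python
import re

# Patterns copied from the RewriteCond lines above.
ALLOWED = [r"^http://(www\.)?mydomain\.com", r"^http://(www\.)?otherdomain\.com"]
BANNED = [r"^http://(www\.)?baddomain1\.com", r"^http://(www\.)?baddomain2\.com"]

def matches_any(patterns, referer):
    return any(re.search(p, referer, re.IGNORECASE) for p in patterns)

def whitelist_blocks(referer):
    # The three conditions are implicitly AND'd:
    # the referer is not empty AND it is not on the allow list.
    return referer != "" and not matches_any(ALLOWED, referer)

def blacklist_blocks(referer):
    # The two conditions are joined with [OR]:
    # either banned domain triggers the rule.
    return matches_any(BANNED, referer)

print(whitelist_blocks("http://www.mydomain.com/gallery"))  # False
print(whitelist_blocks("http://evil.example/"))             # True
print(blacklist_blocks("http://baddomain2.com/page"))       # True
```

Note that the patterns as written only cover http://, so in this sketch whitelist_blocks("https://mydomain.com/") returns True. Using https? in the patterns would allow both schemes.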

.htaccess, prevent hotlinking, allow big bots, allow some access and allow my own domain without adding it statically

You have the logic in reverse. As written these conditions (RewriteCond directives) will always be successful and the request will always be blocked.

You have a series of negated conditions that are OR'd. Together they would only fail (i.e. not block the request) if every pattern matched the Referer at once, which is impossible (e.g. the Referer header cannot be both bing and facebook).

You need to remove the OR flag on all your RewriteCond directives, so they are implicitly AND'd.

Incidentally, the suggestion in comments from @StephenOstermiller to combine the HTTP_REFERER checks into one (which is a good one) is equivalent to having the individual conditions AND'd, not OR'd (as you posted initially).
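The effect of the OR flag on negated conditions can be checked with a small Python sketch. The two patterns below are assumed examples in the style of the bing condition quoted later, not the asker's full list, and the function names are hypothetical.

```python
import re

# Assumed example patterns for two permitted sites.
PATTERNS = [r"^https?://(www\.)?bing\.", r"^https?://(www\.)?facebook\."]

def blocked_with_or(referer):
    # Negated conditions joined with [OR]:
    # the rule fires if ANY pattern fails to match.
    return any(re.search(p, referer) is None for p in PATTERNS)

def blocked_with_and(referer):
    # Negated conditions implicitly AND'd:
    # the rule fires only if NO pattern matches.
    return all(re.search(p, referer) is None for p in PATTERNS)

for referer in ("https://www.bing.com/", "https://evil.example/"):
    print(referer, blocked_with_or(referer), blocked_with_and(referer))
```

With OR, even a request referred by bing is blocked, because the facebook pattern fails to match it. With AND, only the unlisted site is blocked.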

  1. I want to allow image crawling on my site from a couple of different bots and exclude all others.

Once you've corrected the OR/AND as stated above, this rule will likely allow ALL bots to crawl your site images because bots generally do not send a Referer header. These directives are not really about "crawling"; they allow certain websites to display your images on their domain (i.e. hotlinking). This is probably the intention, but it's not what you are stating in point #1.

(To block bots from crawling your site you would need to check the User-Agent request header, ie. HTTP_USER_AGENT - which would probably be better done in a separate rule.)

RewriteCond %{HTTP_REFERER} !^https?://(www\.)?bing\..+$

Minor point, but the .+$ at the end of the regex is superfluous. There's no need to match the entire Referer when you are only interested in the hostname. (These sites probably have a Referrer-Policy set that prevents the URL-path being sent by the browser in the Referer header anyway, but the trailing pattern is unnecessary either way.)

RewriteCond %{HTTP_HOST}@@%{HTTP_REFERER} !^([^@]*)@@https?://\1/.* [NC]

In comments, you were asking what this line does. This satisfies points #3 and #4 in your list, so it is certainly needed. It ensures that the requested Host header (HTTP_HOST) matches the hostname in the Referer. So the request is coming from the same site.

The alternative is to hardcode your domain in the condition, which you are trying to avoid.

(Again, the trailing .* on the regex is unnecessary and should be removed.)

This is achieved by using an internal backreference \1 in the regex against the HTTP_REFERER that matches HTTP_HOST in the TestString (first argument). The @@ string is just an arbitrary string that does not occur in the HTTP_HOST or HTTP_REFERER server variables.

This is clearer if you expand the TestString to see what is being matched. For example, if you make an internal request to https://example.com/myimage.jpg from your homepage (ie. https://example.com/) then the TestString in the RewriteCond directive is:

example.com@@https://example.com/

This is then matched against the regex ^([^@]*)@@https?://\1/ (the ! prefix on the CondPattern is an operator and is part of the argument, not the regex).

  1. ([^@]*) - the first capturing group captures example.com (The value of HTTP_HOST).
  2. @@https?:// - simply matches @@https:// in the TestString (part of the HTTP_REFERER).
  3. \1 - this is an internal backreference. So this must match the value captured from the first capturing group (#1 above). In this example, it must match example.com. And it does, so there is a successful match.
  4. The ! prefix on the CondPattern (not strictly part of the regex), negates the whole expression, so the condition is successful when the regex does not match.

So, in the above example, the regex matches and so the condition fails (because it's negated), so the rule is not triggered and the request is not blocked.

However, if a request is made to https://example.com/myimage.jpg from an external site, eg. https://external-site.example/ then the TestString in the RewriteCond directive is:

example.com@@https://external-site.example/

Following the steps above, the regex fails to match (because external-site.example does not match example.com). The negated condition is therefore successful and the rule is triggered, so the request is blocked. (Unless one of the other conditions failed.)

Note that with the condition as written, www.example.com is different to example.com. For example, if you were on example.com and you used an absolute URL to your image using www.example.com then the regex will fail to match and the request will be blocked. This could perhaps be incorporated into the regex, to allow for this. But this is very much an edge case and can be avoided with a canonical 301 redirect earlier in the config.
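The walkthrough above can be reproduced with a short Python sketch. The function name is hypothetical, and the pattern is the one from the condition with the trailing .* removed, as suggested earlier.

```python
import re

# CondPattern from the directive, without the "!" prefix.
PATTERN = re.compile(r"^([^@]*)@@https?://\1/", re.IGNORECASE)

def same_site_condition_succeeds(host: str, referer: str) -> bool:
    # TestString: %{HTTP_HOST}@@%{HTTP_REFERER}
    test_string = f"{host}@@{referer}"
    # The "!" prefix negates the result: the condition succeeds
    # (and the request may be blocked) when the regex does NOT match.
    return PATTERN.search(test_string) is None

print(same_site_condition_succeeds("example.com", "https://example.com/"))
print(same_site_condition_succeeds("example.com", "https://external-site.example/"))
print(same_site_condition_succeeds("example.com", "https://www.example.com/"))
```

The first call returns False (same site, so not blocked by this condition). The other two return True, including the www.example.com case described in the note above.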

RewriteCond %{HTTP_REFERER} !^$

This allows an empty (or not present) Referer header. You "probably" do need this. It allows bots to crawl your images. It permits direct requests to images. It also allows users who have chosen to suppress the Referer header to be able to view your images on your site.

HOWEVER, it's also possible these days for a site to set a Referrer-Policy that completely suppresses the Referer header being sent (by the browser) and so bypasses your hotlink protection.

RewriteRule \.(bmp|gif|jpe?g|png|webp)$ - [F,L,NC]

Minor point, but the L flag is not required when the F flag is used (it is implied).

Are you really serving .bmp images?!

Aside: Sites don't necessarily "hotlink"

Some of these external sites (bing, Facebook, Google, Instagram, LinkedIn, Reddit, twitter, etc.) don't necessarily "hotlink" images anyway. They often make their own (resized/compressed) "copy" of the image instead (a bot makes the initial request to retrieve the image - with no Referer - so the request is not blocked).

So, explicitly permitting some of these sites in your "hotlink-protection" script might not be necessary anyway.

Summary

Taking the above points into consideration, the directives should look more like this:

RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?(bing|facebook|google|instagram|linkedin|reddit|twitter)\.
RewriteCond %{REQUEST_URI} !^/cross-origin-resources/ [NC]
RewriteCond %{HTTP_HOST}@@%{HTTP_REFERER} !^([^@]*)@@https?://\1/
RewriteRule \.(gif|jpe?g|png|webp|bmp)$ - [F,NC]
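As a rough check of how these directives combine, the following Python sketch evaluates the same patterns against a few sample requests. The function name and the sample hosts and paths are assumptions, and the sketch ignores everything else in a real Apache configuration.

```python
import re

def summary_rule_blocks(host: str, referer: str, request_uri: str) -> bool:
    """Hypothetical stand-in for the summary directives above."""
    # RewriteRule pattern with [NC]: only image requests are considered.
    if re.search(r"\.(gif|jpe?g|png|webp|bmp)$", request_uri, re.IGNORECASE) is None:
        return False
    conditions = [
        # Referer is not empty.
        referer != "",
        # Referer is not one of the permitted external sites.
        re.search(
            r"^https?://(www\.)?(bing|facebook|google|instagram|linkedin|reddit|twitter)\.",
            referer,
        ) is None,
        # Request is not inside /cross-origin-resources/ ([NC]).
        re.search(r"^/cross-origin-resources/", request_uri, re.IGNORECASE) is None,
        # Referer hostname differs from the requested Host.
        re.search(r"^([^@]*)@@https?://\1/", f"{host}@@{referer}") is None,
    ]
    # All conditions are implicitly AND'd.
    return all(conditions)

print(summary_rule_blocks("example.com", "https://evil.example/page", "/img/a.png"))  # True
print(summary_rule_blocks("example.com", "https://example.com/", "/img/a.png"))       # False
print(summary_rule_blocks("example.com", "", "/img/a.png"))                           # False
```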

Stop hotlinking using htaccess and non-specific domain code

Try this.

RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} ^https?://(www\.)?([^/]+)/.*$ [NC]
RewriteCond %2#%{HTTP_HOST} !^(.+)#(www\.)?\1$ [NC]
RewriteRule \.(bmp|gif|jpe?g|png|swf)$ - [F,L,NC]

This works even when only one of the referrer or the target URL has a leading www.

EDIT : (how does this % thing work?)

%n references the n(th) bracket's matched content from the last matched rewrite condition.

So, in this case

  • %1 = either www. or an empty string (the group is optional because of ()?)
  • %2 = the referring domain, always without a leading www. (e.g. yourdomain.com for your own pages, stealer.com for a hotlinking site)

So, now the rewrite condition actually tries to match a TestString such as

stealer.com#yourdomain.com OR stealer.com#www.yourdomain.com

(the part after the # is %{HTTP_HOST}, i.e. the host the image was requested from) with ^(.+)#(www\.)?\1$. Here (.+)# matches anything and everything before the #, followed by www. (again optional), followed by \1, the first bracket's matched content (within this regex, not the rewrite condition), i.e. the exact same thing that appeared before the #.

So, a referer of stealer.com would fail the regex while a referer of yourdomain.com would pass. But since we've negated the pattern with a !, stealer.com passes the condition and hence the hot-link stopper rule is applied.
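Here is a minimal Python sketch of the same matching steps, assuming yourdomain.com is the site that serves the images. The function name is hypothetical; m.group(2) plays the role of %2.

```python
import re

def www_tolerant_blocks(referer: str, host: str) -> bool:
    """Hypothetical stand-in for the three RewriteCond lines above."""
    if referer == "":
        return False  # first condition: empty referers are allowed
    # Second condition: %1 is the optional "www.", %2 is the rest of the hostname.
    m = re.search(r"^https?://(www\.)?([^/]+)/.*$", referer, re.IGNORECASE)
    if m is None:
        return False
    # Third condition: TestString "%2#%{HTTP_HOST}" against the negated pattern.
    test_string = f"{m.group(2)}#{host}"
    return re.search(r"^(.+)#(www\.)?\1$", test_string, re.IGNORECASE) is None

print(www_tolerant_blocks("https://www.yourdomain.com/page", "yourdomain.com"))  # False
print(www_tolerant_blocks("https://yourdomain.com/page", "www.yourdomain.com"))  # False
print(www_tolerant_blocks("https://stealer.com/page", "yourdomain.com"))         # True
```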

Hotlink protection, correct .htaccess rules?

As most Apache setups already redirect requests like example.com to example.com/, there is no need for the third condition in your edit. So the code would become:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?mydomain\.com/.*$ [NC]
RewriteRule .*\.(jpg|jpeg|png|gif)$ - [F,NC,L]

Scrapers don't even need to use subdomains, as they can simply fake the HTTP headers being sent. There is no way to prevent this.

The rest of the code is okay. I would use this if I needed it.

Weird Hotlinking protection via .HTACCESS

if i put the link of any images hosted on this other website, directly in the address bar, i see the real image.

Well, the code you have posted should be doing the same thing as well. When you type a URL directly into the browser's address bar, the HTTP Referer (part of the HTTP request headers) is empty: there is no Referer. And that is what the first condition checks for:

RewriteCond %{HTTP_REFERER} !^$

With the directives you have posted, the RewriteRule is only processed (ie. the redirect occurs) when the HTTP Referer is not empty.

Also, make sure these directives come before your WordPress directives.

Check the network request headers in the Browser's object inspector. Do you see a Referer header?

UPDATE:

RewriteCond %{HTTP_REFERER} !^http://mywebsite.com/.*$      [NC]
RewriteCond %{HTTP_REFERER} !^http://mywebsite.com$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.mywebsite.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.mywebsite.com$ [NC]
RewriteCond %{HTTP_REFERER} !^https://mywebsite.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^https://mywebsite.com$ [NC]
RewriteCond %{HTTP_REFERER} !^https://www.mywebsite.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^https://www.mywebsite.com$ [NC]

Incidentally, all of these conditions are equivalent to the one-liner from your original code sample:

RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?mywebsite.com [NC]

The code you had originally is preferable.


Just to summarise, the following should effectively prevent hotlinking:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?example\.com [NC]
RewriteRule \.(jpe?g|png|gif|bmp)$ - [NC,F]

Here example.com is your website's domain. This simply returns a 403 Forbidden when an image is hotlinked (rather than redirecting to another image). The first RewriteCond directive allows direct requests, where the HTTP Referer is empty.

If you still want your images viewable from (Google) image search, then you will also need to add a condition for each search engine. For example:

RewriteCond %{HTTP_REFERER} !google\. [NC]

Prevent hotlinking, no www

Try this rule:

RewriteCond %{HTTP_REFERER} !domain\.com [NC]
RewriteRule \.(jpe?g|png|gif)$ - [NC,F,L]

Proper regex for .htaccess redirect and prevention of hotlinks

I need everything other than http://www.domain.com/ or http://domain.com/ or domain.com/:

RewriteCond %{REQUEST_URI} !^/$

But, I don't want any of these urls in the results:

RewriteCond %{REQUEST_URI} !^/some_dir/

The matched part will then be appended to:

RewriteRule ^(.*)$ /some_dir/som_subdir/some_file.php?querystring=$1 [L]

So in all it should look something like this:

RewriteCond %{REQUEST_URI} !^/$
RewriteCond %{REQUEST_URI} !^/some_dir/
RewriteRule ^(.*)$ /some_dir/som_subdir/some_file.php?querystring=$1 [L]

With these rules, a request for something like http://www.domain.com/asa45s.html gets internally rewritten to /some_dir/som_subdir/some_file.php?querystring=asa45s.html. As for the hotlinking bit:

RewriteCond %{REQUEST_URI} ^/image_dir/
RewriteCond %{REQUEST_URI} \.(png|gif|jpe?g|bmp|ico)$ [NC]
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?domain\.com/
RewriteRule ^(.*)$ /some_dir/som_subdir/some_file.php?querystring=$1 [L]

This checks first that the request is for something in the /image_dir/ directory, then that the requested resource ends with a png/gif/jpeg/bmp/ico extension, then that the HTTP referer [sic] does not start with http://www.domain.com/, https://domain.com/ or any other combination of the two schemes with or without www. If all of those are true, it rewrites the request to the /some_dir/som_subdir/some_file.php file with the original URI as the querystring parameter.
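The three checks and the resulting rewrite can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes the .htaccess file sits in the document root, so the path captured as $1 has no leading slash. The dot in domain.com is escaped here, since an unescaped dot would match any character.

```python
import re

TARGET = "/some_dir/som_subdir/some_file.php?querystring="

def hotlink_rewrite(request_uri: str, referer: str) -> str:
    """Hypothetical stand-in for the hotlinking rule above."""
    in_image_dir = re.search(r"^/image_dir/", request_uri) is not None
    is_image = re.search(r"\.(png|gif|jpe?g|bmp|ico)$", request_uri, re.IGNORECASE) is not None
    own_referer = re.search(r"^https?://(www\.)?domain\.com/", referer) is not None
    if in_image_dir and is_image and not own_referer:
        # $1 is the requested path without its leading slash (per-directory context).
        return TARGET + request_uri.lstrip("/")
    return request_uri  # left untouched by this rule

print(hotlink_rewrite("/image_dir/photo.jpg", "https://evil.example/"))
print(hotlink_rewrite("/image_dir/photo.jpg", "http://www.domain.com/gallery"))
```

Because this rule has no condition for an empty Referer, a direct request such as hotlink_rewrite("/image_dir/photo.jpg", "") is rewritten as well.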


