Whilst working on a client site recently, I noticed that, due to our use of MaxCDN, the entire site was accessible over the *.netdna-cdn domain. Luckily we have measures in place to prevent duplication, otherwise the site would have been indexed twice by search engines – not ideal, to say the least.
It did, however, get me thinking: is there a way to find out instantly when our site is served from the wrong domain like this, even if we don’t know that the domain exists in the first place?
After a few evenings writing code and this post, here’s a solution which I’ve dubbed DomainCanary. It’s a snippet you can quickly add through Google Tag Manager, and it will send you an email when your site loads from a domain it shouldn’t.
How to implement
1) Generate your unique token
To protect your email from prying eyes and scrapers, DomainCanary requires you to generate a token linked to your email address.
Head over to the registration form and submit it to get your email token.
Note: Your token can be used across multiple sites.
2) Add the tracking code via Tag Manager
Create a Custom HTML Tag within Tag Manager named “DomainCanary”, which uses the “All Pages” Trigger, and add the code snippet below.
- Replace YOUR_ORIGIN with the full origin your site should be loading from. An origin is the protocol plus the full domain (e.g. https://www.strategiq.co). Be sure not to add a trailing slash!
- Replace YOUR_KEY with the token you generated in Step 1.
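If you’re not sure exactly what your origin looks like, one way to check is to let the URL parser work it out for you. This is just a small illustration and not part of the DomainCanary snippet; the page URL used here is a made-up example.

```javascript
// The origin is the protocol plus the host (and the port, if non-default).
// The path, query string and trailing slash are not part of it.
var pageUrl = new URL('https://www.strategiq.co/blog/some-post/?ref=test');
var origin = pageUrl.origin;

console.log(origin); // https://www.strategiq.co
```

On a live page, typing `window.location.origin` into the browser console gives you the same kind of value for the page you’re on.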
How it works
When the code you added via GTM runs, it checks whether the current origin matches anything in the whitelist you specified. If it doesn’t, it sends a request containing your email token and the current domain to the DomainCanary backend (a small Node.js app hosted on Heroku).
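To make the client-side check a little more concrete, here’s a rough sketch of how it could look. This is not the actual DomainCanary snippet: the endpoint URL, the `key` and `origin` parameter names and both function names are assumptions made up for illustration, and the real request may well be sent differently.

```javascript
// Hypothetical sketch only; the real DomainCanary snippet may differ.
var ALLOWED_ORIGINS = ['https://www.strategiq.co']; // your whitelist
var REPORT_ENDPOINT = 'https://canary.example.com/report'; // assumed URL
var EMAIL_TOKEN = 'YOUR_KEY';

// True when the origin is missing from the whitelist.
function isUnexpectedOrigin(origin, allowed) {
  return allowed.indexOf(origin) === -1;
}

// Builds the GET URL carrying the token and the offending origin.
// The "key" and "origin" parameter names are assumptions.
function buildReportUrl(endpoint, key, origin) {
  return endpoint + '?key=' + encodeURIComponent(key) +
    '&origin=' + encodeURIComponent(origin);
}

// Only runs in a browser; does nothing when the origin is expected.
if (typeof window !== 'undefined' &&
    isUnexpectedOrigin(window.location.origin, ALLOWED_ORIGINS)) {
  // Setting an image src is a simple fire-and-forget GET request.
  new Image().src = buildReportUrl(
    REPORT_ENDPOINT, EMAIL_TOKEN, window.location.origin);
}
```

The comparison is an exact string match, which is why a trailing slash in the whitelist would cause every page load to be treated as a mismatch.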
When the backend receives the request it’ll check whether it’s already sent a notification for the domain and, if it hasn’t, it’ll send one to the email address linked to the token.
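The “only notify once per domain” logic can be sketched in a few lines. Again, this is an illustration under assumptions rather than the real backend: the in-memory `Map` and `Set` stand in for whatever storage the app actually uses, and `handleReport`, `demo-token` and the injected `sendEmail` function are hypothetical names.

```javascript
// Hypothetical sketch of the dedup step; not the actual DomainCanary backend.
// In-memory stores stand in for whatever persistence the real app uses.
var emailsByToken = new Map([['demo-token', 'owner@example.com']]);
var alreadyNotified = new Set();

// Handles one report. Returns true only if a notification was sent.
function handleReport(token, domain, sendEmail) {
  var email = emailsByToken.get(token);
  if (!email) {
    return false; // unknown token: nothing to notify
  }
  var key = token + '|' + domain;
  if (alreadyNotified.has(key)) {
    return false; // this token has already been told about this domain
  }
  alreadyNotified.add(key);
  sendEmail(email, 'Your site loaded from ' + domain);
  return true;
}
```

Passing `sendEmail` in as a function keeps the sketch independent of any particular mail service, and keying on token plus domain means two different users can each be notified about the same domain.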
Triggering by Googlebot
Because JS is executed by the web rendering service and not Googlebot itself (and therefore doesn’t seem to have a Googlebot-related user-agent), we haven’t been able to definitively confirm that Google triggers DomainCanary. However, in our testing, DomainCanary is triggered when you do a fetch and render via Google Search Console, so we’re fairly confident that it does.
If you can shed any more light on this to help us confirm, please do share!
You may get notified that your site has loaded on https://gtm-msr.appspot.com, a domain related to Google Tag Manager which appears when it loads through an iframe. This is normal.
DomainCanary only stores your email so it can send the notification whilst also protecting your address. I don’t want it for marketing or anything nefarious. If you don’t trust me, that’s fine: I’ve made the code available for you to review on GitHub, and you can self-host via Heroku if that’s how you roll.