Links are the very foundation of the web. They connect web resources with each other and make it possible for visitors to navigate between pages and allow pages to reference images and other content.
Unfortunately, unlike diamonds, links are not forever. They have a tendency to break over time. Companies go out of business, servers are shut down, blog posts get deleted, domains expire… the web is dynamic, and there are lots of reasons why a link that works today might stop working tomorrow.
At best, a broken link is merely annoying and results in a poor user experience. At worst, it can pose a security threat to anyone visiting the website.
Imagine what could happen if Google shut down its Analytics service and later let the google-analytics.com domain expire. Millions of websites would be left with obsolete script code that attempts to load and run code from https://www.google-analytics.com/analytics.js. A third party could snatch up the expired domain and serve malicious JavaScript code under this URL. This is one form of an attack called Broken Link Hijacking.
Broken Link Hijacking is an exploit in which an attacker gains control over the target of a broken link.
Typical candidates for link hijacking include:
Depending on how the hijacked link is embedded into the website’s code, there are different ways to exploit the vulnerability, with varying levels of risk.
If you have embedded an external script into your website (using code like <script src="https://example.com/script.js"></script>) and the link’s domain name gets taken over, an attacker can inject arbitrary code into the site.
You might ask what harm could come from some extra JavaScript code. The answer is plenty. Here are a few examples of how an attacker could exploit this vulnerability:
The ability to execute attacker-supplied code effectively makes this a Stored Cross-Site Scripting (XSS) vulnerability, which Bugcrowd classifies as a P2 (high risk) issue.
A hijacked link to an image (<img src="https://example.com/image.jpg">) or style sheet (<link href="https://example.com/styles.css" rel="stylesheet">) is not as bad as a hijacked script link, but can still have serious security implications:
A hijacked style sheet can, for example, be used to swap out images (background: url("https://example.net/hacked.gif")) and to inject text (body::before { content: "HACKED!" }).
Attacks like these are often referred to as defacement or content spoofing and typically fall into Bugcrowd’s P4 (low risk) category.
It’s also worth noting that each request made to an attacker-controlled external server leaks information about both the website and the visitor. The attacker is able to track who visits the site (IP address, browser user-agent, referring website) and how often.
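To get a feeling for how exposed a page is, it helps to list the external hosts it loads scripts, images, and style sheets from. The following is a minimal sketch using only Python's standard library; the class and function names, the sample markup, and the www.mysite.test host are assumptions made up for illustration, and it is not how Dr. Link Check works internally.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalResourceParser(HTMLParser):
    """Collects the hosts of externally loaded scripts, images, and style sheets."""

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host  # host name of the page being inspected
        self.external = []        # list of (tag, host) tuples

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "img"):
            url = attrs.get("src")
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower().split():
            url = attrs.get("href")
        else:
            return
        # Relative URLs have no host name and are therefore skipped.
        host = urlparse(url or "").hostname
        if host and host != self.own_host:
            self.external.append((tag, host))


def external_resource_hosts(html, own_host):
    """Return (tag, host) pairs for all resources loaded from other hosts."""
    parser = ExternalResourceParser(own_host)
    parser.feed(html)
    return parser.external


sample = (
    '<script src="https://example.com/script.js"></script>'
    '<img src="/logo.png">'
    '<link href="https://example.com/styles.css" rel="stylesheet">'
)
hosts = external_resource_hosts(sample, "www.mysite.test")
```

Every host in the resulting list is a domain whose expiry or takeover would affect the page, so it is worth checking that each one is still under the control of a party you trust.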
When you link to an external page from your site (<a href="https://example.com/">Link</a>), this link can be seen as a recommendation. You are indicating that the content of the page is relevant and worth a visit; otherwise, you wouldn’t have included the link as part of your own content.
Gaining access to the target of the link allows an attacker to exploit the trust that your visitors give you and your recommendation in order to:
This is basically an impersonation attack. The attacker pretends that the linked website is legitimate and from a trusted source. Bugcrowd rates Impersonation via Broken Link Hijacking as P4 (low risk).
Subresource Integrity (SRI) allows you to ensure that linked scripts and style sheets are only loaded if they haven’t changed since the page was published. This is accomplished by computing a cryptographic hash of the content and adding it to the <script> or <link> element via the integrity attribute (as a base64-encoded string):
<script src="https://example.com/script.js" integrity="sha384-/u6/iE9tq+bsqNaONz1r5IjNql63ZOiVKQM2/+n/lpaG8qnTYumou93257LhRV8t" crossorigin="anonymous"></script>
Before executing a script or applying a style sheet, the browser compares the requested resource to the expected integrity hash value and discards it if the hashes don’t match.
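As a rough sketch of how such an integrity value can be produced and checked, the snippet below hashes the raw bytes of a resource with SHA-384 and base64-encodes the digest. The function names and the sample script content are assumptions for illustration; in practice you would hash the exact file that is served under the URL.

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an integrity value such as 'sha384-<base64 digest>' for the given bytes."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def matches_integrity(content: bytes, integrity: str) -> bool:
    """Mimic the browser's check: recompute the hash and compare it to the expected value."""
    algorithm = integrity.split("-", 1)[0]
    return sri_hash(content, algorithm) == integrity


script = b"console.log('hello');"  # hypothetical script content
integrity = sri_hash(script)
```

If even a single byte of the file changes, the recomputed hash no longer matches and the check fails, which is exactly what stops a hijacked script from running.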
By adding a Content-Security-Policy HTTP header to your server’s responses, you can restrict which domains resources can be loaded from:
Content-Security-Policy: default-src 'self' example.net *.example.org
In this example, resources (such as scripts, style sheets, and images) may only be requested from the site’s own origin ('self', excluding subdomains), example.net (excluding subdomains), and subdomains of example.org (*.example.org, which does not match example.org itself). Requests to other origins are blocked by the browser.
A Content Security Policy doesn’t help when one of your trusted domains gets hijacked, but it does make sure that you don’t accidentally embed resources from unexpected sources, whether that’s due to a simple typo or an obsolete link on an old and long-forgotten page.
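The matching behind such a policy can be approximated in a few lines. The sketch below is a deliberate simplification: it only looks at host names (plus scheme and port for 'self'), ignores paths and the other source types a real policy can contain, and treats a *. wildcard as matching subdomains only. The function name and the www.mysite.test origin are assumptions for illustration.

```python
from urllib.parse import urlparse


def host_allowed(url, sources, own_origin):
    """Simplified check of a URL against a list of CSP source expressions."""
    target = urlparse(url)
    host = target.hostname or ""
    for source in sources:
        if source == "'self'":
            # 'self' means the exact same origin: scheme, host, and port.
            own = urlparse(own_origin)
            if (target.scheme, host, target.port) == (own.scheme, own.hostname, own.port):
                return True
        elif source.startswith("*."):
            # "*.example.org" matches any subdomain, but not example.org itself.
            if host.endswith(source[1:]):
                return True
        elif host == source:
            return True
    return False


policy = ["'self'", "example.net", "*.example.org"]
own_origin = "https://www.mysite.test"
```

With this policy, host_allowed("https://cdn.example.org/lib.js", policy, own_origin) is allowed, while a request to a subdomain of example.net or to any unlisted host is rejected.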
Broken links happen. And when they do, it’s always better to know sooner rather than later, before an attacker can exploit the issue. Our link checker, Dr. Link Check, allows you to schedule regular scans of your website and notifies you of new link problems by email. Our crawler not only looks for typical issues like 404s, timeouts, and server errors, but also checks if links lead to parked domains.
Quite often, redirects are an early indicator that a link might break soon. When a website is redesigned and restructured, redirects are used to map the old URL structure to the new one. This typically works fine for the first redesign, but with each new restructure, the redirect chains get longer and longer, with more potential breaking points. It’s therefore advisable to keep a close eye on redirected links and update them if necessary.
In order to identify redirects on your website, run a scan in Dr. Link Check and click on one of the items in the Redirects section of the Overview report to see the details.
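To illustrate why long redirect chains are fragile, the following sketch resolves a chain hop by hop and reports loops and overly long chains. It works on a plain dictionary of source and target URLs instead of real HTTP responses, so the URLs, the function name, and the max_hops limit are all assumptions for illustration.

```python
def redirect_chain(url, redirects, max_hops=10):
    """Follow redirects (a dict of source URL to target URL) and return the full chain."""
    chain = [url]
    while chain[-1] in redirects:
        target = redirects[chain[-1]]
        if target in chain:
            raise ValueError(f"redirect loop at {target}")
        chain.append(target)
        if len(chain) - 1 > max_hops:
            raise ValueError("too many redirects")
    return chain


# Hypothetical redirects left behind by two site redesigns.
redirects = {
    "http://example.com/old": "https://example.com/old",
    "https://example.com/old": "https://example.com/new",
}
chain = redirect_chain("http://example.com/old", redirects)
hops = len(chain) - 1
```

Each hop is a separate point of failure, so linking directly to the last URL in the chain removes every intermediate step that could break later.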
A broken external link doesn’t just disrupt the visitor experience; it can also have serious security implications. An attacker might be able to hijack the broken link and gain control over the link’s target. In the worst case, this can lead to an account takeover and the theft of sensitive data.
Using modern browser security features like Subresource Integrity and Content Security Policy, you can mitigate these risks. Regular crawls with a broken link checker help you identify broken links early and reduce the attack surface.
Finishing the initial version of your site is only the first step on your journey as a website owner. Now that you have a website, tracking its vital statistics is crucial for success.
It’s easy to overlook the trivial things that negatively impact your website’s professionalism, security, Google rankings, and ultimately the revenue you make from it.
Luckily, there are a variety of tools and services that take the grunt work out of managing a website. When you maintain these eight essential elements of any successful website, your site’s odds of being successful will increase tremendously.
If your website is suddenly getting a lot more (or fewer) visitors, then it’s good to know when it started happening and why so that you can either capitalize on its newfound popularity or refresh your SEO strategy. It’s also useful to see which devices your users browse your site with, which sites they came from, and where they’re located.
User analytics packages (e.g., Google Analytics or Cloudflare Web Analytics) allow you to track a variety of statistics about the people who use your website, and what they do on it. Google Analytics can even send you an email alert if certain conditions are met (e.g., a sudden spike in traffic).
If you’re getting a lot of hits to your homepage but only a few purchases, then you can also see how much time users are spending on your site, and how many of them make it to each step in the process of buying something.
Links you make to other sites could suddenly stop working at any time if a website that you linked to is revamped, or a domain is sold to someone else. As bad as 404 errors are for your professionalism, the worst situation is when a website changes hands and redirects to something like a phishing site, or a parked domain full of ads.
To avoid having to manually check every link all the time, Dr. Link Check makes sure that all the links on your entire site (including images and external stylesheets) load correctly, have valid SSL certificates, aren’t on a blacklist of malware and phishing sites, and haven’t been parked.
After crawling every link across your whole website, Dr. Link Check prepares a searchable report and lets you download the data as a CSV file to do your own analytics.
A website can’t be successful if it’s down, so services like Uptime Robot and Pingdom check your website’s status every few minutes to make sure it hasn’t encountered an outage. As soon as it does, these services will alert you via an SMS message, email, or various other contact methods, so you can get it working again as quickly as possible.
Uptime Robot can also check protocols other than HTTP/S and generate status pages. Pingdom includes a full performance monitoring and analytics solution, as well. Each will load your site from multiple locations to determine if an outage is only affecting people in a certain geographical area.
Nothing erodes user trust and confidence quite like a security breach, so it’s of the utmost importance to avoid them entirely. Even if you follow good development practices and keep all your software up to date, it’s still possible to mess up somewhere, leaving an opportunity for a hacker to sabotage your business.
While automated tools aren’t a perfect substitute for a professional security audit, tools such as Website Vulnerability Scanner, Mozilla Observatory, and WPScan (if you have a WordPress-based site) can help pinpoint configuration errors, XSS and SQL injection bugs, and outdated server software to keep your customers’ data secure.
Whether you simply have a few outdated plugins, or you forgot to sanitize user input in a hardly used form, an automated check can be the quickest way to find security bugs before hackers take advantage of them.
Your site’s place in search results for any given search term is always changing. Therefore, it’s important to be notified if you suddenly slip off the first page of results for a specific query.
SERPWatcher and RankTrackr are services that check your site’s position in search results on a daily basis and send you a message when it suddenly changes. Additionally, both offer dashboards that display all the different keywords that lead to your site, and where your site has ranked for those searches over time.
Many of these services can also track interactions from social media sites and widgets, so you can completely understand how your users find your website.
Forgetting to renew your SSL certificate is just as bad as your site going down unexpectedly, but with the added consequence that many users may lose trust in your site’s security. Worse, not renewing your domain on time could allow someone to buy it and use it for something else entirely.
To avoid potential customers being turned away by “your connection is not secure” errors, set a calendar reminder and use a service like CertsMonitor to make absolutely certain that you renew on time. Many registrars and certificate merchants offer auto-renew, as well, so you can truly “set it and forget it”.
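If you would rather check the expiry date yourself, Python's standard ssl module is enough for a basic reminder. This is a minimal sketch under stated assumptions: the function names are made up, fetch_not_after needs network access and is only defined here rather than called, and the sample date string simply follows the format in which certificates report their notAfter field.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(hostname, port=443):
    """Connect to the server and return the certificate's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now):
    """Return the number of whole days between now and the certificate's expiry."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


# Example with a fixed date instead of a live lookup.
remaining = days_until_expiry(
    "Jan  5 09:34:43 2018 GMT",
    datetime(2018, 1, 1, tzinfo=timezone.utc),
)
```

A scheduled job could call fetch_not_after for your own domain and send a warning once the remaining days drop below a threshold of your choosing.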
Did you forget to add a title to a page? Did you miss ALT attributes on a few images? No robots.txt? Search engines will drop your page’s position in the rankings if you don’t fix issues like these.
SEOptimer and RavenTools crawl your site and find every instance of SEO mistakes on every page. Implementing the suggestions from either tool can significantly boost your rankings on Google and other engines. Google itself also offers tools to identify issues and assess your site’s speed and mobile compatibility.
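Two of the checks mentioned above, a missing page title and images without an alt attribute, are simple enough to sketch with Python's built-in HTML parser. The class and function names and the sample markup are assumptions for illustration; the tools named above check far more than this.

```python
from html.parser import HTMLParser


class SeoAudit(HTMLParser):
    """Records the page title and all images that lack an alt attribute."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.images_without_alt = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "img" and "alt" not in attrs:
            self.images_without_alt.append(attrs.get("src"))

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def audit(html):
    """Return a list of human-readable issues found in the given HTML."""
    parser = SeoAudit()
    parser.feed(html)
    issues = []
    if not parser.title.strip():
        issues.append("missing <title>")
    for src in parser.images_without_alt:
        issues.append(f"image without alt attribute: {src}")
    return issues


page = '<html><head></head><body><img src="a.png"><img src="b.png" alt="B"></body></html>'
issues = audit(page)
```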
The PageRank algorithm deep within the heart of Google ranks sites based on the number and quality of links that point to them. The idea is that high-quality websites will be linked from many other well-ranking sites. When your website is linked from a reputable blog or goes viral on a social media platform, you’ll notice that your site is displayed more prominently in search results.
To get notified whenever you get linked from both good and bad sites, Monitor Backlinks will tell you when new links begin pointing to your site. It can also give insight into which websites would give you the most beneficial backlinks.
It’s easy to forget to monitor some vital sign on your website, leading to a significant loss of business. Therefore, using a service to address each of the areas that need to be monitored will allow you to focus on your business, instead of the boring tasks required to keep your website up and running.
From SEO concerns to security, uptime, and even link rot, you can count on these monitoring services to alert you when something goes wrong.
FTP, short for File Transfer Protocol, is an old standard for transferring files from one computer to another. The protocol was first proposed in 1971, long before the advent of the modern TCP/IP-based internet. In spite of its age, FTP is still commonly used, and hundreds of thousands of websites link to files stored on FTP servers (using URLs that start with ftp://).
Until recently, that wasn’t an issue. All major browsers had built-in support for FTP and were able to handle ftp:// links. This situation is changing. The developers of Chrome disabled FTP in version 88 (released on January 19, 2021), and it’s likely that other browsers will follow suit.
The rationale behind this decision is that FTP in its original form is an insecure protocol that doesn’t support encryption. This is understandable, but practically, it breaks existing ftp:// links for the majority of users.
If you want to make sure that your website is free of ftp:// links, follow the steps below.
Go to https://www.drlinkcheck.com/, enter the URL of your website, and press the Start Check button.
Wait until the check is complete and the website is fully crawled. The number of found ftp:// links is displayed in the Link Schemes section. If there are no ftp: items in the list, the crawler didn’t locate any ftp:// links on your site, and you are all set and can skip the rest of this post.
Click the ftp: item under Link Schemes to get to the list of ftp:// links and review each item in the list. If you hover over a link and hit the Details button, you can see which pages contain the link (under Linked from). A click on Source will show you the exact location in the HTML source code.
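For a single page, the same kind of check can be approximated by hand. The sketch below scans the href and src attributes of an HTML document for URLs with the ftp scheme; the class name, function name, and sample markup are assumptions for illustration, and unlike a crawler it only looks at the one document you pass in.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class FtpLinkFinder(HTMLParser):
    """Collects all href and src values that use the ftp:// scheme."""

    def __init__(self):
        super().__init__()
        self.ftp_links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # urlparse lowercases the scheme, so "FTP://" is detected as well.
            if name in ("href", "src") and value and urlparse(value).scheme == "ftp":
                self.ftp_links.append(value)


def find_ftp_links(html):
    """Return all ftp:// URLs referenced in the given HTML."""
    finder = FtpLinkFinder()
    finder.feed(html)
    return finder.ftp_links


page = (
    '<a href="ftp://ftp.example.com/file.zip">File</a>'
    '<a href="https://example.com/">Site</a>'
    '<a href="FTP://example.org/a.txt">A</a>'
)
ftp_links = find_ftp_links(page)
```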
Now it’s time to decide what to do with the found links. Here are a few options:
https:// URL you can link to.
Fifty years after its introduction, FTP is slowly being phased out as a protocol for serving files on the internet. With major browsers dropping support for FTP, now is a good time to clean up your website and get rid of all FTP links. You surely don’t want your website to appear outdated and broken.
Links are one of the most important factors in how search engines determine the ranking of a website. If you place a link from your website to another, search engines consider this a vote for the relevance and quality of the linked content. Just like a recommendation to friends or family in real life, a link is something that you put your good name behind.
If you don’t want to endorse a link, adding rel="nofollow" tells search engines to ignore it when ranking pages. This makes sense for sites that you want to warn your visitors about (like a scam or a hack), user-generated links that you haven’t reviewed yet (like in the comments section of a blog), or links in ads that you were paid to place on your website.
Considering the importance of marking links as nofollow (or not), it’s a good idea to periodically check if all your outbound links are correctly qualified. With a small website of only a few pages, you may be able to do this by hand, but larger sites require an automated solution. This is where Dr. Link Check comes into play. Our service is not only a great broken link checker, but can also give you an inventory of all the links on your website.
Head to the Dr. Link Check homepage, enter the address of your website, and click on Start Check.
While crawling your website, Dr. Link Check displays the number of found Dofollow and Nofollow links under Dofollow/Nofollow on the Overview page. Click on Nofollow to list the found links.
As you are probably only interested in links pointing to other websites, limit the report to outbound links by clicking the Add button in the Filter bar on the top and selecting Direction from the menu.
The list of links might also contain links found in image (<img src=...>) or script (<script src=...>) tags. If you want to restrict the report to normal page links, click again on Add and select Link type from the filter menu.
With these two filters, you've now identified all outbound nofollow links on your website. If you want to see all dofollow links instead, simply click on Nofollow in the filter bar and select Dofollow from the menu.
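What separates the two groups is simply whether the rel attribute of an <a> tag contains the nofollow token. As a minimal sketch (the class name, function name, and sample markup are assumptions, and this is not how Dr. Link Check is implemented), the classification for a single page could look like this:

```python
from html.parser import HTMLParser


class RelAudit(HTMLParser):
    """Sorts the links of a page into nofollow and dofollow lists."""

    def __init__(self):
        super().__init__()
        self.nofollow = []
        self.dofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # rel holds a space-separated list of tokens, e.g. "nofollow noopener".
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.dofollow.append(href)


def classify_links(html):
    """Return a (nofollow, dofollow) tuple of link lists for the given HTML."""
    audit = RelAudit()
    audit.feed(html)
    return audit.nofollow, audit.dofollow


page = (
    '<a href="https://a.example/" rel="nofollow noopener">A</a>'
    '<a href="https://b.example/">B</a>'
)
nofollow, dofollow = classify_links(page)
```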
If you have a paid subscription (Standard plan and above), you can create rules for which links to follow and which links to ignore:
While previously a rule could only be based on the URL of a link (example: Url ENDSWITH ".gif"), it’s now also possible to include or exclude links based on where they are found in the HTML code. As an example, the following exclude rule makes our crawler ignore all <a> tags found inside HTML elements that have class="footer":
HtmlElement = ".footer a"
If you have worked with CSS before, the syntax should already be familiar to you. The table below lists the supported ways of selecting HTML elements:
Selector | Description |
---|---|
.class | Selects all elements that have the specified class |
#id | Selects an element based on the value of its ID attribute |
element | Selects all elements that have the specified tag name |
[attr] | Selects all elements that have an attribute with the specified name |
[attr=value] | Selects all elements that have an attribute with the specified name and value |
[attr~=value] | Selects all elements that have an attribute with the specified name and a value containing the specified word (which is delimited by spaces) |
[attr\|=value] | Selects all elements that have an attribute with the specified name and a value equal to the specified string or prefixed with that string followed by a hyphen (-) |
[attr^=value] | Selects all elements that have an attribute with the specified name and a value beginning with the specified string |
[attr$=value] | Selects all elements that have an attribute with the specified name and a value ending with the specified string |
[attr*=value] | Selects all elements that have an attribute with the specified name and a value containing the specified string |
* | Selects all elements |
A B | Selects all elements selected by B that are inside elements selected by A |
A > B | Selects all elements selected by B where the parent is an element selected by A |
A ~ B | Selects all elements selected by B that follow an element selected by A (with the same parent) |
A + B | Selects all elements selected by B that immediately follow an element selected by A (with the same parent) |
A, B | Selects all elements selected by A and B |
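The attribute operators in the table differ only in how the attribute value is compared to the given string. The following sketch spells out these comparisons in Python; the names are assumptions for illustration, it ignores edge cases such as empty comparison strings, and it is not the crawler's actual implementation.

```python
# Maps each attribute selector operator to its comparison rule.
ATTRIBUTE_OPERATORS = {
    "=": lambda actual, expected: actual == expected,
    "~=": lambda actual, expected: expected in actual.split(),
    "|=": lambda actual, expected: actual == expected or actual.startswith(expected + "-"),
    "^=": lambda actual, expected: actual.startswith(expected),
    "$=": lambda actual, expected: actual.endswith(expected),
    "*=": lambda actual, expected: expected in actual,
}


def attribute_matches(actual, operator, expected):
    """Check an attribute value against a selector like [attr<operator>expected].

    actual is the attribute's value, or None if the element lacks the attribute.
    """
    if actual is None:
        return False
    return ATTRIBUTE_OPERATORS[operator](actual, expected)


# rel="nofollow noopener" contains the word nofollow, but is not equal to it.
word_match = attribute_matches("nofollow noopener", "~=", "nofollow")
exact_match = attribute_matches("nofollow noopener", "=", "nofollow")
```

This is why a selector such as a[rel~=nofollow] is usually a better choice than a[rel=nofollow]: it still matches when the rel attribute carries additional tokens.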
Equipped with this knowledge, it’s possible to construct quite powerful rules:
Example | Matched HTML elements |
---|---|
HtmlElement = "#search > a, #filter > a" | <a> tags directly under elements with the IDs search or filter |
HtmlElement = "a[rel~=nofollow]" | <a> tags with a rel="nofollow" attribute |
HtmlElement = "img[src$=.png]" | <img> tags with an src attribute value ending in .png |
HtmlElement = ".comments *" | Everything inside elements with a comments class |
HtmlElement = "head > link[rel=alternate][hreflang=en-us]" | All <link rel="alternate" hreflang="en-us"> elements directly inside the <head> tag |
This feature has been in beta for a while and has proven to be quite a valuable addition. We hope you will find it as useful as we do. If you run into any problems or have a suggestion, please drop us a note.
As the owner of Dr. Link Check, a broken link checker app, I have a love-hate relationship with 404 pages. On the one hand, I do everything I can to help our customers find and fix broken links and keep website visitors from running into 404 errors. On the other hand, I truly enjoy a unique and creative 404 page that sets a positive tone and lightens the mood.
While there are still too many websites that use the plain and dull default pages that come with their web servers, some custom 404 pages are gorgeous masterpieces. These hidden error pages are often the places where web designers can show their creativity and express themselves outside the bounds of what was specified by corporate.
For this article, I modified our crawler to search the web for the most beautiful 404 pages. The crawler went through a total of 500,000 websites, reducing the number of candidates to a few thousand, which I then manually narrowed down to the 40 pages presented below.
I hope these designs give you the inspiration you need to create your own distinctive 404 page. Crafting a pretty error page is not only fun, but also makes sense from a customer conversion point of view. An appealing 404 page design can take away some of the bitterness of the broken link that a user just encountered. There’s a good chance the user will be pleasantly surprised and stay on your site rather than immediately hitting the back button and never visiting you again.
Now, enough with the chit-chat and on to why you’re actually here: the 404 pages.
Compiling this list of 404 pages was a fun thing for me to do. I hope you enjoyed the article and found some inspiration for your own website projects. If you already have a unique 404 page that you’re proud of, or have designed one based on this article, please message me via Twitter @wulfsoft and tell me about it. I’d love to see your creations.