How To Check Local Websites For Broken Links

Ensure all links and images exist

  • Published: November 1, 2016

Overview

After generating a static website, it is good practice to check that there are no broken links or images. There are several popular command-line tools for this; in this article I will be using htmlproofer.

HTMLProofer is a set of tests to validate your HTML output. These tests check if your image references are legitimate, if they have alt tags, if your internal links are working, and so on. It's intended to be an all-in-one checker for your output.

Install

It can be installed directly with gem:

$ gem install html-proofer

Or with Bundler, by adding gem 'html-proofer' to your Gemfile and then running:

$ bundle install
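
As a minimal sketch of the Bundler route, the following shell function appends the gem line to a Gemfile only when it is not already there. The function name add_html_proofer and its optional path argument are illustrative assumptions, not part of html-proofer or Bundler.

```shell
# Hypothetical helper: add html-proofer to a Gemfile if it is not listed yet.
add_html_proofer() {
    gemfile="${1:-Gemfile}"   # defaults to ./Gemfile
    if ! grep -q "html-proofer" "$gemfile" 2>/dev/null; then
        echo "gem 'html-proofer'" >> "$gemfile"
    fi
}
```

After calling it, bundle install installs the gem, and bundle exec htmlproofer ./_site runs the copy managed by Bundler.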

Options

  • --allow-hash-href: If true, ignores links whose href is just #
  • --as-links: Assumes that PATH is a comma-separated array of links to check.
  • --alt-ignore image1,[image2,...]: A comma-separated list of Strings or RegExps containing imgs whose missing alt tags are safe to ignore
  • --assume-extension: Automatically add extension (e.g. .html) to file paths, to allow extensionless URLs (as supported by Jekyll 3 and GitHub Pages) (default: false).
  • --checks-to-ignore check1,[check2,...]: An array of Strings indicating which checks you’d like to not perform.
  • --check-external-hash: Checks whether external hashes exist (even if the webpage exists). This slows the checker down (default: false).
  • --check-favicon: Enables the favicon checker (default: false).
  • --check-html: Enables HTML validation errors from Nokogiri (default: false).
  • --check-img-http: Fails an image if it’s marked as http (default: false).
  • --check-opengraph: Enables the Open Graph checker (default: false).
  • --check-sri: Check that <link> and <script> external resources do use SRI (default: false).
  • --directory-index-file: Sets the file to look for when a link refers to a directory. (default: index.html)
  • --disable-external: If true, does not run the external link checker, which can take a lot of time (default: false)
  • --empty-alt-ignore: If true, ignores images with empty alt tags
  • --error-sort SORT: Defines the sort order for error output. Can be :path, :desc, or :status (default: :path).
  • --enforce-https: Fails a link if it’s not marked as https (default: false).
  • --extension EXT: The extension of your HTML files including the dot. (default: .html)
  • --external_only: Only checks problems with external references
  • --file-ignore file1,[file2,...]: A comma-separated list of Strings or RegExps containing file paths that are safe to ignore
  • --http-status-ignore 123,[xxx, ...]: A comma-separated list of numbers representing status codes to ignore.
  • --report-invalid-tags: Ignore check_html errors associated with unknown markup (default: false)
  • --report-missing-names: Ignore check_html errors associated with missing entities (default: false)
  • --report-script-embeds: Ignore check_html errors associated with scripts (default: false)
  • --log-level <level>: Sets the logging level, as determined by Yell. One of :debug, :info, :warn, :error, or :fatal. (default: :info)
  • --only-4xx: Only reports errors for links that fall within the 4xx status code range
  • --timeframe <time>: A string representing the caching timeframe.
  • --url-ignore link1,[link2,...]: A comma-separated list of Strings or RegExps containing URLs that are safe to ignore. It affects all HTML attributes. Note that non-HTTP(S) URIs are always ignored
  • --url-swap re:string,[re:string,...]: A comma-separated list containing key-value pairs of RegExp => String. It transforms URLs that match RegExp into String via gsub. The escape sequences \: should be used to produce literal :s.
  • --internal-domains domain1,[domain2,...]: A comma-separated list of Strings containing domains that will be treated as internal urls.
  • --storage-dir PATH: Directory where to store the cache log (default: tmp/.htmlproofer)
  • -h, --help: Show this message
  • -v, --version: Print the name and version
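
Several of these options are typically combined in a single call. The sketch below wraps such a call in a shell function. The function name proof_lenient, the default directory, and the choice of status codes are assumptions made for illustration; only flags from the list above are used.

```shell
# Hypothetical wrapper around htmlproofer using flags from the list above.
proof_lenient() {
    site_dir="${1:-./_site}"
    # --allow-hash-href: tolerate placeholder links such as <a href="#">
    # --empty-alt-ignore: do not fail on images with an empty alt attribute
    # --http-status-ignore: skip responses with these status codes
    #                       (429 and 999 are example values)
    htmlproofer \
        --allow-hash-href \
        --empty-alt-ignore \
        --http-status-ignore 429,999 \
        "$site_dir"
}
```

Running proof_lenient public would check the public directory with these relaxed rules; without an argument it falls back to ./_site.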

Executing

To test for broken links, just specify the directory, for example using the common Jekyll output directory _site:


$ htmlproofer ./_site
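
In a build script it helps to fail early when the output directory is missing and to pass the checker's result on to the caller. The following is a sketch under the assumption that htmlproofer exits with a non-zero status when it finds problems; the function name check_site and the return code 2 are illustrative choices.

```shell
# Hypothetical helper: verify the output directory exists, then run the checker.
check_site() {
    site_dir="${1:-./_site}"
    if [ ! -d "$site_dir" ]; then
        echo "error: $site_dir not found; build the site first" >&2
        return 2
    fi
    # The function's return status is whatever htmlproofer returns.
    htmlproofer "$site_dir"
}
```

A CI step could then call check_site ./_site || exit 1 to stop the pipeline when the check fails.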

To check a website generated with Hugo for broken links, point htmlproofer at the public output directory:


$ htmlproofer --check-html \
        --http-status-ignore 999 \
        --internal-domains localhost:1313 \
        --disable-external \
        --assume-extension \
        public


Marcelo Canina
I'm Marcelo Canina, a developer from Uruguay. I build websites and web-based applications from the ground up and share what I learn here.
Except as otherwise noted, the content of this page is licensed under CC BY-NC-ND 4.0.