Deploy a Jekyll website to Amazon S3 with URLs that have no .html extension

Overview

One of the problems with having pretty URLs when hosting a Jekyll site on Amazon S3 is that, even if we set permalink URLs without .html extensions, the files generated by Jekyll still include the extension.

Serving pretty URLs relies on the server being configured to detect each file's content type1, so that it can handle URLs that do not include the .html extension and serve the corresponding file.

Amazon S3 is not able to make this translation, which leaves us with two options to get URLs that do not end in .html:

  • Create each post in Jekyll as a (directory)/index.html file, so S3 serves each index.html (see the _config.yml sketch after this list). E.g.:

    When we try to access https://example.com/my-cool-page, the Amazon S3 server will be able to respond to this request if it finds the file:

      /my-cool-page/index.html
    

Or

  • Generate HTML files without the extension:

      /my-cool-page
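
For the first option, Jekyll can be told to write each post into its own directory. A minimal _config.yml sketch, assuming the built-in pretty permalink style matches the URLs you want (the my-cool-page path above is only an illustration):

# _config.yml (sketch): emit posts as .../my-cool-page/index.html
permalink: pretty   # shorthand for /:categories/:year/:month/:day/:title/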
    

Also, with the second option, the Content-Type header of the extension-less files should be set to text/html after renaming them.

Remove extension from files before deploying

If configuring Jekyll to generate posts/pages in subdirectories is not an option, then we can remove the .html extension from all the files, except those named index.html, just before deploying them to the server.

An easy way to do this is with a shell command that removes the extensions after building the site.

find _site/ -type f ! -iname 'index.html' -iname '*.html' -print0 | while read -d $'\0' f; do mv "$f" "${f%.html}"; done

Command explanation:

  • -type f: the file is a regular file.
  • ! -iname 'index.html': skip files named index.html.
  • -iname '*.html': the base of the file name (the path with the leading directories removed) matches the shell pattern *.html, case-insensitively.
  • -print0: print each full file name followed by a null character instead of a newline, so file names containing newlines or other whitespace are handled correctly by programs reading find's output (this corresponds to the -0 option of xargs).
  • read -d $'\0': the first character of the delimiter terminates the input line instead of a newline, so read consumes one null-terminated file name at a time.
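
If file names might contain backslashes or leading whitespace, a slightly more defensive variant of the same loop (a sketch; behaviourally equivalent for typical Jekyll output) is:

find _site/ -type f ! -iname 'index.html' -iname '*.html' -print0 |
  while IFS= read -r -d '' f; do
    mv "$f" "${f%.html}"   # strip the .html suffix
  done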

Example:


$ jekyll build
Configuration file: /tmp/j/_config.yml
            Source: /tmp/j
       Destination: /tmp/j/_site
 Incremental build: disabled. Enable with --incremental
      Generating... 
                    done in 1.433 seconds.
$ tree _site/
_site/
|-- about
|   `-- index.html
|-- assets
|   `-- main.css
|-- feed.xml
|-- index.html
`-- jekyll
    `-- update
        `-- 2017
            `-- 04
                `-- 24
                    `-- welcome-to-jekyll.html

7 directories, 5 files
$ find _site/ -type f ! -iname 'index.html' -iname '*.html' -print0 | while read -d $'\0' f; do mv "$f" "${f%.html}"; done
$ tree _site/
_site/
├── about
│   └── index.html
├── assets
│   └── main.css
├── feed.xml
├── index.html
└── jekyll
    └── update
        └── 2017
            └── 04
                └── 24
                    └── welcome-to-jekyll

7 directories, 5 files

Upload files

When uploading the files to the server we must set the MIME type Content-Type: text/html on the files that do not have a .html extension; if we don't set it, the server will serve them as Content-Type: binary/octet-stream. The rest of the files will get the correct Content-Type.

The Amazon S3 Command Line Interface has a parameter to set the Content-Type of each file when copying: aws s3 cp local_directory s3://bucket-name --content-type text/html

In this approach, we first copy the files without an extension, setting the right Content-Type explicitly, and then copy the rest of the files, letting the AWS CLI infer their Content-Type from their extensions.

Copy files without extension

Copy local files to the S3 bucket.

aws s3 cp _site/ s3://cachedpage.co/ --content-type text/html --recursive --exclude "*.*"
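
If you want to preview which files this would upload before copying anything, aws s3 cp accepts a --dryrun flag; a quick sketch against the same bucket as above:

aws s3 cp _site/ s3://cachedpage.co/ --content-type text/html --recursive --exclude "*.*" --dryrun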

Copy the rest of the files

aws s3 cp _site/ s3://cachedpage.co/ --recursive --exclude "*" --include "*.*"

Note that, by default, all files are included. This means that providing only an --include filter will not change what files are transferred. --include will only re-include files that have been excluded from an --exclude filter. If you only want to upload files with a particular extension, you need to first exclude all files, then re-include the files with the particular extension.
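
After uploading, it is worth confirming that an extension-less object really reports text/html. A quick check, using the article's example names (example.com as both the bucket and the site host, and /my-cool-page as the sample path):

# Ask S3 directly for the object's metadata
aws s3api head-object --bucket example.com --key my-cool-page

# Or check the header the way a browser would see it
curl -sI https://example.com/my-cool-page | grep -i content-type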

Final script

The complete process is reflected in the following deploy.sh script:

#!/usr/bin/env bash
#
# Copy Jekyll site to S3 bucket
#
####################################
#
# Custom vars
#
s3_bucket="s3://example.com/"
####################################

set -e # halt script on error
set -v # echo on

echo "Building site..."
JEKYLL_ENV=production bundle exec jekyll build

echo "Removing .html extension"
find _site/ -type f ! -iname 'index.html' -iname '*.html' -print0 | while read -d $'\0' f; do mv "$f" "${f%.html}"; done

echo "Copying files to server"
aws s3 cp _site/ s3://cachedpage.co/ --recursive --exclude "*" --include "*.*"

echo "Copying files with content type..."
aws s3 cp _site/ $s3_bucket --content-type text/html --recursive --exclude "*.*"
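
To deploy, make the script executable and run it from the site's root directory (assuming it is saved as deploy.sh next to _config.yml):

chmod +x deploy.sh
./deploy.sh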

Conclusion

This approach will work in most situations; just be careful if you have any other files without an extension, so you don't set the wrong media type on them.
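
One way to spot those cases is to list every file in the build output whose name has no dot before uploading; anything shown here that is not an HTML page should be kept out of the text/html pass:

find _site/ -type f ! -name '*.*'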

References

  1. The Content-Type entity header is used to indicate the media type of the resource: a string sent in the headers of a response to indicate the type of its content.

    The purpose of the Content-Type field is to describe the data contained in the body fully enough that the receiving user agent can pick an appropriate agent or mechanism to present the data to the user, or otherwise deal with the data in an appropriate manner.
