Host a Jekyll Website With Pretty URLs In Amazon S3 and CloudFront
Overview
One of the problems with having pretty URLs when hosting a Jekyll site in Amazon S3 is that even if we configure permalinks to produce URLs without the .html extension, the files generated by Jekyll still include it. Serving those pages relies on the server configuration being able to detect the content type, handle URLs that do not include the .html extension, and serve the corresponding file.
Amazon S3 is not able to make this translation, which leaves us with two options to get URLs that do not end in .html:
- Create each post in Jekyll as a (directory)/index.html file, so the server serves each index.html. E.g.: when we try to access https://example.com/my-cool-page, the Amazon S3 server will be able to respond to the request if it finds the file /my-cool-page/index.html (a configuration sketch follows this list).
- Or generate the HTML files without the extension, e.g. /my-cool-page. In this case the Content-Type header of the extension-less files should be set to text/html after renaming them.
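If the first option fits your site, Jekyll's built-in pretty permalink style already generates every post as a directory containing an index.html. A minimal sketch, assuming a standard Jekyll setup and a placeholder bucket name (the aws s3 website step is only needed if the bucket is not yet configured for static website hosting):

# Append (or set) the built-in "pretty" permalink style in _config.yml so each
# post is written as <path>/index.html instead of <path>.html.
echo "permalink: pretty" >> _config.yml

# Have S3 serve index.html for directory-style requests (placeholder bucket).
aws s3 website s3://example.com/ --index-document index.html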
Remove extension from files before deploying
If configuring Jekyll to generate posts/pages in subdirectories is not an option, then we can remove the .html extension from all the files, except those named index.html, just before deploying them to the server.
An easy way to do this is a shell script that renames the files after building the site.
find _site/ -type f ! -iname 'index.html' -iname '*.html' -print0 | while read -d $'\0' f; do mv "$f" "${f%.html}"; done
Command explanation:
- -type f: the file is a regular file.
- ! -iname 'index.html': avoid matching index.html files.
- -iname '*.html': the base of the file name (the path with the leading directories removed) matches the shell pattern *.html, case insensitively.
- -print0: print the full file name on the standard output, followed by a null character (instead of the newline character that -print uses). This allows file names that contain newlines or other types of white space to be correctly interpreted by programs that process the find output. This option corresponds to the -0 option of xargs.
- read -d delim: the first character of delim is used to terminate the input line, rather than newline.
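Note that a plain read strips leading whitespace and interprets backslashes in file names; a slightly more defensive variant of the same loop would be:

find _site/ -type f ! -iname 'index.html' -iname '*.html' -print0 |
  while IFS= read -r -d '' f; do
    # Strip the .html suffix from each matched file.
    mv "$f" "${f%.html}"
  done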
Example:
$ jekyll build
Configuration file: /tmp/j/_config.yml
            Source: /tmp/j
       Destination: /tmp/j/_site
 Incremental build: disabled. Enable with --incremental
      Generating...
                    done in 1.433 seconds.
$ tree _site/
_site/
|-- about
|   `-- index.html
|-- assets
|   `-- main.css
|-- feed.xml
|-- index.html
`-- jekyll
    `-- update
        `-- 2017
            `-- 04
                `-- 24
                    `-- welcome-to-jekyll.html

7 directories, 5 files
$ find _site/ -type f ! -iname 'index.html' -iname '*.html' -print0 | while read -d $'\0' f; do mv "$f" "${f%.html}"; done
$ tree _site/
_site/
|-- about
|   `-- index.html
|-- assets
|   `-- main.css
|-- feed.xml
|-- index.html
`-- jekyll
    `-- update
        `-- 2017
            `-- 04
                `-- 24
                    `-- welcome-to-jekyll

7 directories, 5 files
Upload files
When uploading the files to the server, we must set the correct MIME type, Content-Type: text/html, for the files that do not have the .html extension; if we don't, the server will interpret them as Content-Type: binary/octet-stream. The other files will get the correct Content-Type.
The Amazon S3 Command Line Interface has a parameter to set the correct Content-Type for each file when copying them: aws s3 cp local_directory s3://bucket-name --content-type text/html
In this approach, we are going to copy the files without extension, setting the right Content-Type, and then just copy the rest of the files, leaving that task to the server.
Copy files without extension
Copy local files to the S3 bucket.
aws s3 cp _site/ s3://cachedpage.co/ --content-type text/html --recursive --exclude "*.*"
or synchronize the directory with the S3 bucket, checking for file differences by size instead of timestamps:
aws s3 sync _site/ $s3_bucket --size-only --content-type text/html --exclude "*.*" --delete
Copy the rest of the files
aws s3 cp _site/ s3://cachedpage.co/ --recursive --exclude "*" --include "*.*"
or with aws sync:
aws s3 sync _site/ $s3_bucket --size-only --exclude "*" --include "*.*" --delete
Note that, by default, all files are included. This means that providing only an --include filter will not change what files are transferred. --include will only re-include files that have been excluded from an --exclude filter. If you only want to upload files with a particular extension, you need to first exclude all files, then re-include the files with the particular extension.
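After uploading, you can confirm that an extension-less object got the right type by inspecting its metadata; a quick check, using placeholder bucket and key names:

# The ContentType field in the output should read text/html.
aws s3api head-object --bucket example.com --key my-cool-page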
Invalidate uploaded files in Cloudfront
If we are using a Content Delivery Network, chances are that our files have been cached and we need to refresh them. To remove an object from CloudFront edge caches before it expires, we need to invalidate it.
The next time a viewer requests the object, CloudFront returns to the origin to fetch the latest version of the object.
To do this, we copy the names of the modified files to a temporary file and then create a new invalidation with those names, written as they are accessed on our website.
AWS CLI support for this service is only available in a preview stage. You can enable this service by running: aws configure set preview.cloudfront true
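For reference, a one-off invalidation of a couple of paths looks like this (the distribution ID is a placeholder):

aws cloudfront create-invalidation --distribution-id ASDFHDFSAF45234 --paths "/my-cool-page" "/"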
tempfile=$(mktemp)
s3_bucket="s3://example.com/"
distribution_id=ASDFHDFSAF45234
echo "Copying files to server..."
aws s3 sync _site/ "$s3_bucket" --size-only --exclude "*" --include "*.*" --delete | tee -a "$tempfile"
echo "Copying files with content type..."
aws s3 sync _site/ "$s3_bucket" --size-only --content-type text/html --exclude "*.*" --delete | tee -a "$tempfile"
# Invalidate only modified files
grep "upload\|delete" "$tempfile" | sed -e "s|.*upload.*to $s3_bucket|/|" | sed -e "s|.*delete: $s3_bucket|/|" | sed -e 's/index.html//' | sed -e 's/\(.*\).html/\1/' | tr '\n' ' ' | xargs aws cloudfront create-invalidation --distribution-id "$distribution_id" --paths
Script explanation:
First we create the temporary file that will hold the names of the modified files with tempfile=$(mktemp).
Then we synchronize the local directory with the remote one in S3, sending the output both to standard output and to the temporary file with:
aws s3 sync _site/ "$s3_bucket" --size-only --exclude "*" --include "*.*" --delete | tee -a "$tempfile"
aws s3 sync _site/ "$s3_bucket" --size-only --content-type text/html --exclude "*.*" --delete | tee -a "$tempfile"
After that we pick out the file names that were uploaded or deleted by the aws command:
grep "upload\|delete" "$tempfile"
Then we strip everything up to and including the bucket name, leaving each object path prefixed with /:
sed -e "s|.*upload.*to $s3_bucket|/|" | sed -e "s|.*delete: $s3_bucket|/|"
As our pages are accessed with URLs like / instead of /index.html, we remove the index.html part:
sed -e 's/index.html//'
Our URLs do not have the .html extension either:
sed -e 's/\(.*\).html/\1/'
Lastly we put all the resulting paths on a single line, separated by spaces, as expected by the aws cloudfront create-invalidation --paths command:
tr '\n' ' ' | xargs aws cloudfront create-invalidation --distribution-id "$distribution_id" --paths
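To see the whole pipeline at work, here is what a single upload line from aws s3 sync ends up as, assuming the bucket s3://example.com/:

echo "upload: _site/about/index.html to s3://example.com/about/index.html" \
  | sed -e "s|.*upload.*to s3://example.com/|/|" \
  | sed -e "s|.*delete: s3://example.com/|/|" \
  | sed -e 's/index.html//' \
  | sed -e 's/\(.*\).html/\1/'
# Prints: /about/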
Final script
The complete process is reflected in the following deploy.sh script. You will probably want to adapt it to a Makefile, Grunt, or some other tool, but I will leave it as a bash script to show its usage:
#!/usr/bin/env bash
#
# Copy Jekyll site to S3 bucket
#
####################################
#
# Custom vars
#
s3_bucket="s3://example.com/"
distribution_id=ASDFHDFSAF45234
####################################
set -e # halt script on error
set -v # echo on
tempfile=$(mktemp)
echo "Building site..."
JEKYLL_ENV=production bundle exec jekyll build
echo "Removing .html extension"
find _site/ -type f ! -iname 'index.html' -iname '*.html' -print0 | while read -d $'\0' f; do mv "$f" "${f%.html}"; done
echo "Copying files to server..."
aws s3 sync _site/ "$s3_bucket" --size-only --exclude "*" --include "*.*" --delete | tee -a "$tempfile"
echo "Copying files with content type..."
aws s3 sync _site/ "$s3_bucket" --size-only --content-type text/html --exclude "*.*" --delete | tee -a "$tempfile"
# Invalidate only modified files
grep "upload\|delete" "$tempfile" | sed -e "s|.*upload.*to $s3_bucket|/|" | sed -e "s|.*delete: $s3_bucket|/|" | sed -e 's/index.html//' | sed -e 's/\(.*\).html/\1/' | tr '\n' ' ' | xargs aws cloudfront create-invalidation --distribution-id "$distribution_id" --paths
Conclusion
This approach will work in most situations, but be careful if you have any other files without an extension, to avoid setting the wrong media type on them.
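A quick way to spot such files before deploying is to list everything in _site/ whose name contains no dot:

# These are the files that will be uploaded with Content-Type: text/html.
find _site/ -type f ! -name '*.*'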
References
- Index documents in S3 http://docs.aws.amazon.com/AmazonS3/latest/dev/IndexDocumentSupport.html
- Catching user input in bash http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_08_02.html
- thkala's answer to "Linux: remove file extensions for multiple files"
- S3 cli docs http://docs.aws.amazon.com/cli/latest/reference/s3/
- S3 Exclude and include filters http://docs.aws.amazon.com/cli/latest/reference/s3/index.html#use-of-exclude-and-include-filters
- Complete list of MIME types https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Complete_list_of_MIME_types
- Content Type Header field https://www.w3.org/Protocols/rfc1341/4_Content-Type.html
- MIME media type name : Text https://www.iana.org/assignments/media-types/text/html
The Content-Type entity header indicates the media type of the resource: a string sent in the headers of a file to indicate its type. The purpose of the Content-Type field is to describe the data contained in the body fully enough that the receiving user agent can pick an appropriate agent or mechanism to present the data to the user, or otherwise deal with the data in an appropriate manner.