Deploying a Django App to Elastic Beanstalk: All of the Gotchas
Recently I had to migrate my Django app off Heroku, and after some research, I opted for AWS Elastic Beanstalk. In the end, something that should have taken a few hours at most ended up taking two whole days.
Though, to be fair, this was my first time working with any of the AWS services. Here I will be describing the whole process, focusing on the many little issues that made me want to tear my hair out.
You may not find this guide all that useful if what you are building is too different from what I needed, but as long as you check at least one of these boxes, I think you should stay:
- A Django app that works as an API for a frontend running elsewhere.
- Long running processes that should run and scale separately.
- A PostgreSQL database that is accessible from more than one environment.
- An SSL API endpoint with a domain hosted in Cloudflare.
Getting Django up and running
Here I’m going to assume that you already have a working Django app, and jump straight ahead to deploying that on Elastic Beanstalk. You should, of course, also have an AWS account for this.
The first thing you’re going to want to do is install the EB CLI. Amazon gives you the choice to install it using setup scripts, or manually. If you already have Python installed and properly configured in your machine, I say go with a manual installation. It’s just installing a package through pip:
pip install awsebcli
Next, we’re going to create an application and environment for your app to run in. You could do this through the web console, but it’s here where I found the first gotcha: The default value for the WSGIPath is set to application, which does not work for us. This should not be an issue, as it’s possible to change it later on. Only it wasn’t. At least, not for me.
Every time I tried to change the WSGIPath, whether through the web console or the configuration files in the .ebextensions folder (I’m getting to that), the changes to the software configuration would revert. In the end, I decided to just start over, and make sure to set that variable properly from the get go.
To do so, you need an .ebextensions folder in the root of your project. Inside, create a file named django.config with the following content, where appname should be replaced by the name you gave to your Django app.
option_settings:
  aws:elasticbeanstalk:container:python:
    WSGIPath: appname.wsgi:application
Now open a console window, cd into your project’s root directory, and run:
eb init -i
It will ask you for a number of things, but the AWS documentation covers this step pretty well. You should now have a new folder named .elasticbeanstalk inside of your project, with a config.yml file within. If you had a .gitignore file, it should now contain rules to prevent you from committing these.
Now that we have our application, let’s create the environment:
eb create
It will ask you some more questions, such as the name for the environment, or the load balancer type. Choose classic for that one. Once it’s done, your environment will be ready and, if everything goes well, your app should be online. You can check your environments here, or use the following command to open the web console of your current project:
eb console
Do remember to add the URL of your app to the Django ALLOWED_HOSTS list, if you use that feature. You can open said URL in your browser with:
eb open
Then commit your changes to your VCS, as the EB CLI will not detect them otherwise, and deploy the new changes with:
eb deploy
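For the ALLOWED_HOSTS step, one way to avoid hard-coding the Elastic Beanstalk URL is to read the host list from an environment variable. Take this as a sketch only: DJANGO_ALLOWED_HOSTS and hosts_from_env are names I made up for illustration, not something Django or Elastic Beanstalk defines.

```python
import os


def hosts_from_env(raw):
    """Turn a comma-separated string into a clean list of host names."""
    return [host.strip() for host in raw.split(",") if host.strip()]


# DJANGO_ALLOWED_HOSTS is a hypothetical variable name; you would set it
# in the environment properties of your Elastic Beanstalk environment.
ALLOWED_HOSTS = hosts_from_env(
    os.environ.get("DJANGO_ALLOWED_HOSTS", "localhost,127.0.0.1")
)
```

With this in your settings, adding a new host becomes a configuration change rather than a code change, although the app still restarts when you update the variable.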
Depending on your app’s specifics, you might find that the environment was not deployed successfully. There can be any number of causes for this.
Figuring out what’s wrong
Before moving forward, let’s see some of the ways we can debug issues in Elastic Beanstalk. That way, if something goes wrong while following this guide, you’ll have the means to identify, and hopefully fix, the problem.
The first thing you should probably always do is check the logs. You can do so from the web console, or with the following command:
eb logs
If the info in the logs is not enough, we can always SSH into the server. If you followed the steps in the AWS documentation that I linked to, when creating your application, you should be golden. If not, then follow these instructions. Now we can access the machine where our app lives using:
eb ssh
If the command raises a NotFoundError saying it can’t find your SSH key file, it’s likely because you decided to use a keypair that already existed, instead of creating a new one. In that case, you would have to manually place the file inside of the ~/.ssh directory, then try again.
There are some common things you might want to do while browsing the server. For example, to navigate to the current app directory you can use:
cd /var/app/current/
Or if the deployment of the environment failed, and you want to browse the files in the staging app directory:
cd /var/app/staging/
In both cases, you will probably want to activate the virtual environment, to have access to the Python packages listed in your requirements:
source $(find /var/app/venv/*/bin/activate)
And while you’re at it, you could also load your environment variables:
export $(sudo cat /opt/elasticbeanstalk/deployment/env | xargs)
Now you should be able to run any of the usual Django management commands, such as makemigrations or createsuperuser. I’m mentioning this now because later on, when we configure the deployment to automatically run new migrations for us, you might find it handy to be able to read the console output of those commands, in case something goes wrong.
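One caveat with the export and xargs trick: it splits on whitespace, so a value that contains a space gets cut in two. If you run into that, a small parser can read the same content more carefully. This is a sketch that assumes the file holds plain KEY=VALUE lines; parse_env_file is a hypothetical helper, not part of any AWS tooling.

```python
def parse_env_file(text):
    """Parse KEY=VALUE lines into a dict, skipping blanks and comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split only once: the value itself may contain '=' characters.
        key, value = line.split("=", 1)
        env[key] = value
    return env


# Made-up sample content, standing in for the real env file.
sample = "RDS_PORT=5432\nSECRET_KEY=abc=def\n\nGREETING=hello world\n"
env = parse_env_file(sample)
```

On the server you could feed it the output of the sudo cat command above and then update os.environ with the result before running your management commands.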
On that note, Django has a feature that allows us to receive an email when an exception is raised and debug mode is off. These emails contain every bit of information one could ever ask for, and are very useful for figuring out those “it works on my machine” issues. Check it out.
The Elastic Beanstalk health reports
This is a short one, but it might very well save you from a scare. Elastic Beanstalk environments are configured by default to ping the root of your application, to check if the environment is “healthy” or not.
In my case, I was using a Django app just as an API, without a frontend. Getting the root of my app returned a 404, which was interpreted as not being healthy by this system, making the web console display an angry red color.
Don’t get me wrong, it’s a useful feature, just a bit melodramatic. It’s made even more useful by having the django-health-check library in your app. It allows you to configure several prepackaged health checks, which can then all be consulted at once on the same endpoint, which behaves exactly as Elastic Beanstalk expects. Plug and play.
You can configure how the health check behaves in your django.config:
option_settings:
  aws:elasticbeanstalk:container:python:
    WSGIPath: appname.wsgi:application
  aws:elasticbeanstalk:environment:process:default:
    HealthCheckPath: "/ht/"
    MatcherHTTPCode: "200"
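To make the MatcherHTTPCode value a bit less abstract, here is a rough model of how such a string reads: a single code, a comma-separated list, or a range. It’s my own illustration (status_matches is a hypothetical helper), not the load balancer’s actual code, but it shows why an API whose root returns a 404 fails a "200" matcher.

```python
def status_matches(code, matcher):
    """Return True if an HTTP status code satisfies a matcher string.

    Handles single codes ("200"), lists ("200,202") and ranges ("200-499").
    """
    for part in matcher.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            if int(low) <= code <= int(high):
                return True
        elif part and int(part) == code:
            return True
    return False


root_is_healthy = status_matches(404, "200")        # strict matcher
root_is_tolerated = status_matches(404, "200-499")  # lenient matcher
```

A lenient range keeps the console green, but it also hides real client errors, so pointing the check at a dedicated health endpoint is usually the cleaner option.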
Plugging in the database
Picking up where we left off, you should now have a Django app deployed in a URL from Elastic Beanstalk. Next we are going to configure a PostgreSQL database in Amazon RDS, directly from the web console.
Inside of the Elastic Beanstalk web console, navigate to Configuration inside of the collapsible menu named after your environment, in the left sidebar. There, scroll to the very bottom and click on the Edit button next to Database.
This will open up a form with several fields. From top to bottom, in the Snapshot field, leave it as None. In the Engine field, change it to postgres. For now, leave the rest of the fields as they are, except for Username and Password. Then click on Apply, and wait. That’s it, you now have a database.
Of course, there’s a gotcha here: This database instance is tied to the lifecycle of your environment. You can’t remove it from your environment, and if you terminate it, the database is terminated as well. By default, a snapshot is created in such a case, so you don’t have to worry about losing your data.
For a production app, it’s recommended to create an instance that’s not tied to your app’s environment. This process is a tad more complex, but very well documented in the official documentation.
All the necessary environment variables for us to connect to the database have been automatically configured. Inside of your Django app settings, modify your database configuration to look something like this:
import os  # skip this line if your settings file already imports os

if 'RDS_HOSTNAME' in os.environ:
    DATABASES = {
        'default': {
            # 'postgresql_psycopg2' is the old name; Django 3.0 removed it
            'ENGINE': 'django.db.backends.postgresql',
            'NAME': os.environ.get('RDS_DB_NAME'),
            'USER': os.environ.get('RDS_USERNAME'),
            'PASSWORD': os.environ.get('RDS_PASSWORD'),
            'HOST': os.environ.get('RDS_HOSTNAME'),
            'PORT': os.environ.get('RDS_PORT'),
        }
    }
else:
    # ...
Here we are checking if one of said environment variables is present, and defining the connection to the RDS database if it is. If it isn’t, you’ll probably want to keep the database connection you had been using for development.
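If you prefer something you can test outside of Django, the same check can be wrapped in a function. This is only a sketch: database_config is a hypothetical helper, the SQLite fallback is my assumption about what a development setup might look like, and I use the engine name django.db.backends.postgresql, which is what current Django versions call the PostgreSQL backend.

```python
import os


def database_config(environ):
    """Build a Django DATABASES dict from a mapping of environment variables."""
    if "RDS_HOSTNAME" in environ:
        return {
            "default": {
                "ENGINE": "django.db.backends.postgresql",
                "NAME": environ["RDS_DB_NAME"],
                "USER": environ["RDS_USERNAME"],
                "PASSWORD": environ["RDS_PASSWORD"],
                "HOST": environ["RDS_HOSTNAME"],
                "PORT": environ["RDS_PORT"],
            }
        }
    # Assumed development fallback: a local SQLite file.
    return {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": "db.sqlite3",
        }
    }


DATABASES = database_config(os.environ)
```

Indexing with environ["..."] instead of .get() makes a half-configured environment fail loudly with a KeyError at startup, rather than later with a confusing connection error.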
It’s possible you’ll need to install some packages to make PostgreSQL work in your environment. You can configure this by adding a packages.config file to the .ebextensions folder. Here’s mine:
packages:
  yum:
    git: []
    gcc: []
    postgresql-libs: []
    postgresql-devel: []
    python3-devel: []
    libcurl-devel: []
    openssl-devel: []
    openssl-static.x86_64: []
Something of note is that how you name these files usually does not matter, except when one configuration depends on another. In that case you should know that they are applied in alphabetical order, according to their names.
Now that we have a working database, we have to apply the Django migrations. You can do this through SSH, as seen before, but it’s more convenient to have them be applied every time you deploy to your environment. You can achieve this by updating your django.config file:
option_settings:
  aws:elasticbeanstalk:application:environment:
    DJANGO_SETTINGS_MODULE: appname.settings
  aws:elasticbeanstalk:container:python:
    WSGIPath: appname.wsgi:application
  aws:elasticbeanstalk:environment:process:default:
    HealthCheckPath: "/ht/"
    MatcherHTTPCode: "200-499"
container_commands:
  01_migrate:
    command: "export $(cat /opt/elasticbeanstalk/deployment/env | xargs) && source /var/app/venv/*/bin/activate && python3 manage.py migrate --noinput"
    leader_only: true
Two things to notice here. The first is how the container commands are named, with a number at the very beginning. This is because they are also run in alphabetical order. The second is the leader_only parameter, which is set to true to show that we only want this command to be run on a single instance.
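The alphabetical ordering has a small trap of its own, which is why the zero in 01_migrate matters. Here is a quick illustration, assuming plain string comparison; every command name other than 01_migrate is made up.

```python
# Hypothetical container command names; only 01_migrate is from this guide.
unpadded = ["10_warm_cache", "2_collectstatic", "01_migrate"]

# Strings are compared character by character, so "10..." sorts before
# "2..." even though 2 is the smaller number.
unpadded_order = sorted(unpadded)

# Zero-padding every number keeps the order you actually meant.
padded = ["10_warm_cache", "02_collectstatic", "01_migrate"]
padded_order = sorted(padded)

print(unpadded_order)
print(padded_order)
```

The same reasoning applies to the file names inside .ebextensions when one configuration depends on another.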
You should now have a fully working Django app deployed in Elastic Beanstalk. Congrats! You might have to take some extra steps if you serve custom static content from your app. I wouldn’t know.
Managing long running tasks
It’s considered bad practice to have API calls that take too long to respond. Too long being more than a few seconds. It’s just not cost-effective. That’s where workers come in. They do the heavy lifting.
They can be an entirely different code base, and will communicate with your web server through a message queuing system. In this case, we’ll be using Amazon SQS for this purpose, and will luckily not have to deal with another code base, thanks to the nifty django-eb-sqs-worker library.
The first thing you are going to have to do is create an environment for your worker. Run the following, replacing nameofworker with a name of your choice:
eb create -t worker nameofworker
Next, if you want your worker to have access to the same database as your web environment, you’ll want to give the worker permission to do so. Open the web console and navigate to the Configuration tab of your new worker. Once inside, click on the Edit button of the Instances section.
Here, in the EC2 security groups section, we have to select the one from our web environment. It should be simple enough to figure out which one of them it is, but if it isn’t, you can always open this same window for your web environment, and check which one it is over there.
Now that the worker has the necessary permissions, we need to define the environment variables, as this time they are not available by default. To define environment variables in your worker, go once again to Configuration, and this time click the Edit button of the Software section. They’re at the bottom of the page, and these are the variables you need, and where to find them.
- RDS_USERNAME: The one set when creating the database. Can be found in the Database section of the Configuration of your web environment.
- RDS_PASSWORD: The one set when creating the database. Can no longer be easily found. You will have to SSH into the server and run the following.
sudo cat /opt/elasticbeanstalk/deployment/env
- RDS_HOSTNAME: Can be found in the Connectivity & security section of the RDS web console for the database. Said console can be accessed by clicking on the link found in the Database section of the Configuration. Actually, the text of the link itself is what we are after, without the port.
- RDS_PORT: Can be found in the Connectivity & security section of the RDS web console for the database. Should be 5432 by default.
- RDS_DB_NAME: Can be found in the Configuration section of the RDS web console for the database. Should be ebdb by default.
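Since these five variables are typed in by hand, it’s easy to miss one. A small check like the following can tell you which ones are absent. It’s a sketch, and missing_rds_vars is a hypothetical helper of mine.

```python
import os

REQUIRED_RDS_VARS = (
    "RDS_USERNAME",
    "RDS_PASSWORD",
    "RDS_HOSTNAME",
    "RDS_PORT",
    "RDS_DB_NAME",
)


def missing_rds_vars(environ):
    """Return the names of required RDS variables that are unset or empty."""
    return [name for name in REQUIRED_RDS_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_rds_vars(os.environ)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All RDS variables are set.")
```

You could run it over SSH on the worker after loading the environment variables, or call missing_rds_vars(os.environ) near the top of your settings file.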
The only thing left to do is install and configure the library itself. For that, you should follow the installation instructions found on the GitHub page.
On top of what is explained in those installation instructions, I recommend adding the following lines to the end of your Django settings file:
AWS_EB_HANDLE_SQS_TASKS = 'ENV_WORKER' in os.environ
if 'RUN_TASKS_LOCALLY' in os.environ:
    AWS_EB_RUN_TASKS_LOCALLY = os.environ.get('RUN_TASKS_LOCALLY')
This way, if you add an ENV_WORKER environment variable to the worker environment, but not to the web one, the library will be properly configured to run as it should in either type of environment.
You can also use an environment variable named RUN_TASKS_LOCALLY in your local machine, to control whether or not the tasks will run locally, which you will probably want them to, as by default you won’t have access to the SQS queue outside of the worker environment.
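One thing to keep in mind is that os.environ.get returns a string, and any non-empty string is truthy in Python, so RUN_TASKS_LOCALLY=False would most likely still count as enabled. If you want the value itself to matter, something along these lines could work. It’s a sketch: env_flag is a hypothetical helper, and unlike the lines above it always defines the setting, defaulting to False.

```python
import os


def env_flag(name, environ, default=False):
    """Read a boolean flag from environment-style string values."""
    value = environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")


AWS_EB_RUN_TASKS_LOCALLY = env_flag("RUN_TASKS_LOCALLY", os.environ)
```

Anything outside the accepted spellings, including an empty string, reads as False.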
Everything is set up, and now you can follow the usage instructions on the GitHub page to define jobs. These are called in your code like any other function, but should not be expected to return anything that you’d want to send back as a response to a request. Instead, they will be triggered by said requests, and you can use other mechanisms, such as an entry in the database, to know whether the job has finished running and what the result was.
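To give an idea of the “entry in the database” approach, here is a self-contained sketch of the pattern. It does not use the django-eb-sqs-worker API at all: an in-memory SQLite table stands in for a Django model, and enqueue_job, run_job and job_status are hypothetical names.

```python
import sqlite3

# In-memory SQLite stands in for your real database (e.g. a Django model).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, result TEXT)"
)


def enqueue_job():
    """Web request side: record the job and return its id right away."""
    cursor = db.execute("INSERT INTO jobs (status) VALUES ('pending')")
    db.commit()
    return cursor.lastrowid


def run_job(job_id, payload):
    """Worker side: do the slow work, then store the outcome."""
    result = payload.upper()  # placeholder for the long running work
    db.execute(
        "UPDATE jobs SET status = 'done', result = ? WHERE id = ?",
        (result, job_id),
    )
    db.commit()


def job_status(job_id):
    """Later request: poll for the job's status and result."""
    return db.execute(
        "SELECT status, result FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()


job_id = enqueue_job()
before = job_status(job_id)  # the job exists but has not run yet
run_job(job_id, "report")
after = job_status(job_id)   # the worker has stored its result
```

In the real app, the request handler would return the job id to the frontend immediately, and the frontend would poll a status endpoint until the job is marked as done.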
Securing your endpoints
Right now we have a fully working Django app, with its own database and the ability to handle long running tasks. The only issue remaining is that it is not secured using SSL. In my case, that meant I could not use it as an API for my frontend, which already had an SSL certificate, because that caused browsers to throw a mixed content error when calling the API.
I’ll be assuming that you already have a domain with some provider, and that you are using Cloudflare to manage things. If that is not the case, some of this will not be relevant to you. I’ll let you know.
The first step is to get an SSL certificate configured in AWS Certificate Manager. Inside said web console, you can either import or create a new certificate. We will be importing a Cloudflare Origin CA certificate. If you use some other provider, refer to their documentation to find out how to proceed.
Log in to your Cloudflare account and select your domain. Then click on the SSL/TLS button, open the Origin Server tab, and click on Create Certificate. Use the default options and click on Create. You will get an Origin Certificate and a Private Key. Store them somewhere safe, or just keep the tab open.
Inside the AWS Certificate Manager console, click on Import a certificate. Copy the Origin Certificate into the Certificate body field, and the Private Key into the Certificate private key field. For the Certificate chain field, you have to download this file, and copy its contents.
After clicking on the Next button, you’ll be able to add some tags to the certificate. Their only use is to make it easier for you to tell it apart from other certificates in your account. When you’re done, click on Review and import. Make sure everything looks good, and then click on the Import button.
Now back to the Elastic Beanstalk web console. Go to the Configuration of your web environment, and scroll down until you see the Load balancer. As always, click on the Edit button. In the Listeners section, at the top of the page, click on the Add listener button. Set the Listener port to 443, the Listener protocol to HTTPS, the Instance port to 80, and the Instance protocol to HTTP.
For the SSL certificate, choose the one you just imported. Make sure that everything is configured as I’ve just indicated. The fields automatically change around on their own, so you have to double check. Once you’re sure, click on the Add button. We’re not done yet! You might have the impression that these changes you just made have already been saved, but they have not. This awful UX cost me nearly an hour of my time. Scroll down and click on the Apply button. Now they have been saved.
Next we need to add a new record to the DNS, but first, let’s find out what we are going to be pointing that record to. Open the Amazon EC2 web console, which is the service that is behind Elastic Beanstalk, and click on the Load balancers button in the Resources section, located in the middle of the page.
Select the load balancer for your app, scroll down a bit, and copy the contents of the DNS name field. It should be a domain ending in .amazonaws.com.
Back to the Cloudflare control panel, click on the DNS button in the navigation bar, then click on Add record. For Type, select CNAME. For Target, use the domain we just copied. The Name is up to you.
We’re basically done now. Remember to add the new domain to the Django ALLOWED_HOSTS list, or you’ll get an error. Some people recommend only allowing connections from Cloudflare IP addresses, as a security measure. I won’t be opening that can of worms, but you can look up how to configure an Elastic Beanstalk security group to add those rules.
Speaking of security, and as a final extra tip: If you have configured Django to send you emails when something goes wrong, you will likely be flooded with hundreds of said emails, saying Invalid HTTP_HOST header. These are caused by bots testing for vulnerabilities in your app. Not only are they a nuisance, but they can also represent a security risk.
Some will tell you to simply disable the alert, but the correct way to deal with them is to prevent them from reaching your Django app in the first place. You can achieve this by adding a .conf file to a new folder structure in the root of your project, like so: .platform/nginx/conf.d/elasticbeanstalk/custom.conf
Take note of the dot before “platform”. That’s part of the folder’s name. The contents of the custom.conf file should be as follows:
if ($host !~* ^((127\.0\.0\.1)|(yourdomain.com))$) {
    return 444;
}
Replace yourdomain.com with your own. The localhost address is there to allow the web environment to communicate with the worker. Notice how I have not added the Elastic Beanstalk environment default URL. You can if you want to, it just wasn’t needed in my case.
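If you want to check which hosts the pattern lets through before deploying, you can try it out with Python’s re module. This is an approximation, since nginx uses PCRE rather than Python’s engine, but for a pattern this simple the two should agree. is_allowed is a hypothetical helper.

```python
import re

# Same pattern as in custom.conf. The ~* operator in nginx is
# case-insensitive, hence re.IGNORECASE.
PATTERN = re.compile(r"^((127\.0\.0\.1)|(yourdomain.com))$", re.IGNORECASE)


def is_allowed(host):
    """True if the host matches, meaning nginx would not return 444."""
    return PATTERN.match(host) is not None


for host in ("yourdomain.com", "127.0.0.1", "api.yourdomain.com",
             "evil.example.com", "yourdomainxcom"):
    print(host, is_allowed(host))
```

Two things stand out when running it. A subdomain such as api.yourdomain.com is rejected, so if your DNS record uses one, it needs to be in the pattern. And yourdomainxcom is accepted, because an unescaped dot matches any character; writing yourdomain\.com makes the rule strict.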
Now you won’t be getting any more of those meaningless emails, and malicious bots probing your site will get a big fat nothing for their efforts.
That’s all from me for now. Thanks for reading!