Using AWS S3

Warning

Some content may be outdated.

What is AWS S3?

Amazon Simple Storage Service, or S3, is one of the many services available through Amazon Web Services. S3 can very basically be described as a limitless (for all practical purposes) external hard-drive in the cloud. It’s a simple system that is secure and also cheap. To read more about S3, check out the official documentation.

To use S3, you will need an AWS account, which is just an extension of a normal Amazon account. Here’s some information on how to get started.

Why Use S3 with Arches?

By the time you are in a production environment, you will have configured Arches with a web server, such as Apache or nginx. While you need a web server to serve the app itself, there are two pieces of the app that can be separated from the web server and served independently. These are the ‘static’ files (the css, javascript, and logos that are used throughout the app) and the ‘media’ files (any user uploaded files, such as images or documents).

Many of the existing tutorials on this matter are concerned with serving both static and media files, because the more load you can take off of your web server the better. However, for the purposes of this tutorial, we are only dealing with media files. This is because S3 is especially suited to storing a large (and growing) amount of files:

  • S3 is cheap: As per the price chart, it costs just $.03 per gb/month. So a database with 10gb of photos will have a media storage cost of $3/month, plus a small amount per transaction ($0.004 per 10,000 GET requests, e.g.).

  • S3 is scalable: You only pay for the amount of data you have stored, and you have no real limit on how much you can store. This allows for an Arches deployment on a small server, either in-house or a small AWS EC2 or DigitalOcean instance to store hundreds of gigabytes of media–photos, audio, video, documents–without having to restructure to accommodate more data.

You should be able to use S3 regardless of where your app is hosted, whether on an internal server, an AWS EC2 instance, a DigitalOcean droplet, etc.

Note

We’ve found that by following the steps below, deleting an Information Resource from within Arches will not automatically remove the file from your S3 bucket. You can manually delete files from the bucket for now, or the intrepid developer may check out the answer to this question on the Arches forum.

Steps to Follow

Having worked through a number of existing tutorials (mostly dylanbfox.blogspot.com, martinbrochhaus.com, www.caktusgroup.com, and www.holovaty.com), we’ve distilled these steps to show how you can use S3 in conjunction with your Arches app. Before beginning, you will need to have set up and logged into your AWS account.

  1. Create credentials for your Arches app

    These new credentials will allow your Arches app to access the S3 bucket.

    1. Access the AWS Identity and Access Management (IAM) Console.

    2. Create a new user (named something like “arches_media”), and download the new credentials. This will be a small .csv file that includes an Access Key ID and a Secret Key.

    3. Also, go to the new user’s properties, and record the User ARN.

  2. Create a new bucket on S3

    Next, you’ll need to create a new bucket and give it the appropriate settings.

    1. Create a bucket, named something like “my_app-media”.

    2. In the new bucket properties, under Permissions, create a new bucket policy

    3. Paste the following text into your new policy, inserting your own BUCKET-NAME and the your new User ARN

    {
        "Statement": [
            {
                "Sid":"PublicReadForGetBucketObjects",
                "Effect":"Allow",
                "Principal": {
                    "AWS": "*"
                    },
                "Action":["s3:GetObject"],
                "Resource":["arn:aws:s3:::BUCKET-NAME/*"
                    ]
            },
            {
                "Action": "s3:*",
                "Effect": "Allow",
                "Resource": [
                    "arn:aws:s3:::BUCKET-NAME",
                    "arn:aws:s3:::BUCKET-NAME/*"
                ],
                "Principal": {
                    "AWS": [
                        "USER-ARN"
                    ]
                }
            }
        ]
    }
    
    1. Also, make sure that the CORS configuration (click “Add CORS Configuration”) looks like this

      <CORSConfiguration>
          <CORSRule>
              <AllowedOrigin>*</AllowedOrigin>
              <AllowedMethod>GET</AllowedMethod>
              <MaxAgeSeconds>3000</MaxAgeSeconds>
              <AllowedHeader>Authorization</AllowedHeader>
          </CORSRule>
      </CORSConfiguration>
      
  3. Update the Virtual Environment

    In order to configure Arches to use your new bucket, you need to install a couple of extra Django modules in your virtual environment. These will augment Django’s flexibility in how it stores uploaded media.

    Activate your virtual environment and run this command

    (ENV) $: pip install boto django-storages
    
  4. Update settings.py

    Finally, you need to tell your app to use these new modules, give it the necessary credentials, and tell it where to store (and find) the uploaded media. Open the your settings.py file…

    1. Find the line that defines the settings “INSTALLED_APPS” and add ‘storages’ to it. It should look like this

      INSTALLED_APPS = INSTALLED_APPS + (PACKAGE_NAME, 'storages',)
      
    2. Next, add the following lines, replacing the AWS settings values with information from earlier steps (remember the credentials.csv file you downloaded?)

      DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
      AWS_STORAGE_BUCKET_NAME = 'aws_bucket_name'
      AWS_ACCESS_KEY_ID = 'aws_access_key_id'
      AWS_SECRET_ACCESS_KEY = 'aws_secret_access_key'
      S3_URL = 'http://%s.s3.amazonaws.com/' % AWS_STORAGE_BUCKET_NAME
      MEDIA_URL = S3_URL
      
    3. Restart your web server.

You should be good to go! To test, create a new Information Resource in your installation and upload a file. Now go back to check out your S3 bucket through the AWS console. Your file should show up in a new folder called files within the bucket. If you are encountering issues, be sure to let us know on the [forum](https://groups.google.com/forum/#!forum/archesproject).