
Varun Jewalikar


Download S3 bucket

Originally published at https://gist.github.com/neo01124/dc31d0b08bd7ac6906d06197e20dc9b6

This must be at least the 5th time I've written this kind of code for different projects, so I decided to make a note of it for good.

This might seem like a trivial task until you realise that S3 has no concept of a folder hierarchy. S3 only has buckets and keys. Buckets are flat, i.e. there are no folders: the whole path (folder1/folder2/folder3/file.txt) is the key for your object. The S3 UI presents it like a file browser, but inside a bucket there are only keys. From the S3 docs:

The Amazon S3 data model is a flat structure: you create a bucket, and the bucket stores objects. There is no hierarchy of subbuckets or subfolders; however, you can infer logical hierarchy using key name prefixes and delimiters as the Amazon S3 console does.
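
To make the quote concrete, here is a minimal sketch of how a folder view can be inferred from flat keys using a prefix and a delimiter. The function name `list_level` and the sample keys are made up for illustration, and it works on a plain Python list rather than calling S3.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Return (files, folders) directly under prefix, inferred from flat keys."""
    files, folders = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A delimiter remains after the prefix, so `head` acts like a folder name
            folders.add(prefix + head + delimiter)
        elif rest:
            # No delimiter left: this key is a "file" at the current level
            files.append(key)
    return files, sorted(folders)


# Hypothetical keys, stored flat in the bucket
keys = [
    "readme.txt",
    "folder1/a.txt",
    "folder1/folder2/b.txt",
    "folder1/folder2/folder3/file.txt",
]
print(list_level(keys))
print(list_level(keys, prefix="folder1/"))
```

The "folders" only exist as shared key prefixes; nothing is stored for them unless someone creates a placeholder object whose key ends in the delimiter.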

The challenge in this task is essentially to recreate, on the local disk, the directory structure encoded in the key (folder1/folder2/folder3/) before downloading the actual content of the S3 object.
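
A minimal sketch of that step, separate from any S3 call: split the key at its last slash, create the directory part locally, and return the path the object would be written to. `local_path_for_key` is a hypothetical helper name, and the demo writes into a temporary directory.

```python
import os
import tempfile


def local_path_for_key(local_root, key):
    """Create the local directories implied by an S3 key and return the file path."""
    # Split on the last "/" only: everything before it is the "folder" part
    folder, _, filename = key.rpartition("/")
    # Keys always use "/", so build the local path piece by piece
    local_dir = os.path.join(local_root, *folder.split("/")) if folder else local_root
    os.makedirs(local_dir, exist_ok=True)  # same effect as `mkdir -p`
    return os.path.join(local_dir, filename)


with tempfile.TemporaryDirectory() as root:
    path = local_path_for_key(root, "folder1/folder2/folder3/file.txt")
    print(os.path.isdir(os.path.dirname(path)))  # the folders now exist
    print(os.path.basename(path))
```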

Option 1 - Shell command

The AWS CLI will do this for you with a sync operation:

aws s3 sync s3://yourbucket /local/path

Option 2 - Python

  • Install boto3
  • Create IAM user with a similar policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:ListMultipartUploadParts",
                "s3:GetObject",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::your_bucket_name",
                "arn:aws:s3:::your_bucket_name/*"
            ]
        }
    ]
}
  • Create a profile in ~/.aws/credentials with access details of this IAM user as explained in the boto documentation
  • Code
import boto3, errno, os

def mkdir_p(path):
    # mkdir -p functionality from https://stackoverflow.com/a/600612/2448314
    # (on Python 3.2+, os.makedirs(path, exist_ok=True) does the same)
    try:
        os.makedirs(path)
    except OSError as exc:  # Python >2.5
        if exc.errno == errno.EEXIST and os.path.isdir(path):
            pass
        else:
            raise

def get_s3_path_filename(key):
    # Split on the last "/" only, e.g. "a/b/c.txt" -> ("a/b/", "c.txt").
    # str.replace would also strip earlier occurrences of the file name.
    path, sep, filename = str(key).rpartition('/')
    return path + sep, filename

def download_s3_bucket(bucket_name, local_folder, aws_user_with_s3_access):
    session = boto3.Session(profile_name=aws_user_with_s3_access)
    s3_resource = session.resource('s3')
    s3_bucket = s3_resource.Bucket(bucket_name)
    for obj in s3_bucket.objects.all():
        s3_path, s3_filename = get_s3_path_filename(obj.key)
        if not s3_filename:
            continue  # key ends in "/": typically an empty "folder" placeholder, nothing to download
        # lstrip keeps a key with a leading "/" from overriding local_folder in os.path.join
        local_folder_path = os.path.join(os.curdir, local_folder, s3_path.lstrip('/'))
        local_fullpath = os.path.join(local_folder_path, s3_filename)
        mkdir_p(local_folder_path)
        s3_bucket.download_file(obj.key, local_fullpath)

# Placeholders: use your own bucket name and a profile name from ~/.aws/credentials
download_s3_bucket(bucket_name="your_bucket_name", local_folder="/tmp/s3_bucket", aws_user_with_s3_access="your_profile_name")
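
A note on the `aws_user_with_s3_access` argument: it is the name of a profile, that is, a section header in `~/.aws/credentials`, not necessarily the IAM user name (you are free to name the profile after the user). The sketch below parses a made-up credentials file with the standard library to list the profile names it defines. The profile `s3-reader` and the key values are placeholders, not real credentials.

```python
import configparser

# Made-up contents of ~/.aws/credentials; the key values are placeholders
SAMPLE_CREDENTIALS = """
[default]
aws_access_key_id = EXAMPLE_KEY_ID
aws_secret_access_key = EXAMPLE_SECRET

[s3-reader]
aws_access_key_id = ANOTHER_EXAMPLE_KEY_ID
aws_secret_access_key = ANOTHER_EXAMPLE_SECRET
"""


def profile_names(credentials_text):
    """Return the profile names (INI section headers) in a credentials file."""
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    return parser.sections()


print(profile_names(SAMPLE_CREDENTIALS))
```

With a file like this, passing `aws_user_with_s3_access="s3-reader"` would make boto3 use the second set of keys.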

I'd make a package, if there is enough interest :)

Top comments (1)

Serge Gosson de Varennes

Hi! I know that I might be a little late on this one, but I am new to AWS. What is my profile_name supposed to be? Is it my user name on AWS? How do I retrieve it? Sorry for the (probably) dumb question.