AWS with Python and Boto3: Implementing Solutions with S3 is out!

S3 is the Simple Storage Service from AWS, and it has many great features you can make use of in your applications and even in your daily life! You can use S3 to store your memories, documents, important files and videos, and you can even host your own website from there!

Join me in this journey to learn the ins and outs of S3 and gain all the information you need to work with S3 using Python and Boto3!

Let’s take a closer look at what we’re going to cover in this course step-by-step.

• In this course, we’ll start off with what we’ll build throughout the course and what you need to have on your computer to follow along with me.

• Don’t worry; I’ll explain everything you need very clearly, and I’ll show you what you need to install and set up on your computer to work with S3. There will be two different sections for Windows and MacOS users. These sections are basically identical and show how you can prepare your computer environment to be ready to work with S3! I’ll show you how to install Python and Boto3 and configure your environments for these tools. I’ll also show you how you can create your own AWS account step-by-step, and you’ll be ready to work with AWS in no time!

• When we’re done preparing our environment to work with AWS using Python and Boto3, we’ll start implementing our solutions for AWS.

• First and foremost, we’ll create a Bucket; the Bucket is the fundamental part of S3, and the whole service is designed around buckets. We’ll build on top of that by adding a Bucket Policy. With bucket policies, you can decide who can access your bucket and what they can do with the objects inside it. Then we’ll learn how to do basic operations around buckets like listing the buckets, getting bucket properties, encrypting bucket objects with Server-Side Encryption and much more! (There’s a small taste of this in the short sketch right after this list.)

• Then we’ll move on to another important part of working with S3, and that is uploading. We’ll start off by learning how to upload a small file to S3. You’ll learn how easy it is to do so. Next up is Multi-Part Upload for large files! I’ll show you how to implement Multi-Part Uploads and make use of threading and parallelization so you can boost the upload speeds for your objects!

• Versioning is another key aspect of S3, and it has various benefits. For example, with versioning enabled, your objects are near impossible to delete, so you won’t lose them to accidental deletes! Versioning provides a safe way to version your files: you can upload to the same object over and over again and keep track of every version along the way. I’ll show you how you can enable versioning on your buckets and how you can upload new versions of your objects.

• We’ll also configure lifecycle policies for our buckets to manage our objects now and in the future. With lifecycle policies, you can decide when and what to do with your objects. For example, you can decide to move your unused files to a cheaper storage class like Glacier. So I’ll show you how you can design your own lifecycle policies for your buckets and objects and apply them with your code.

• And here comes S3 Static Website Hosting! Apart from the many other great benefits of using S3, you can use it to host your static HTML, JavaScript and CSS based websites! S3 even gives you a URL so everyone can access your site from anywhere in the world! We won’t stop there, of course; we’ll learn how Route53 can be used to route traffic to our S3-hosted website using our own custom domain!

• So we’ll implement our S3 Static Website Hosting from scratch. We’ll design a simple website and configure it as a website inside our Bucket. Once we have our website up and running and accessible via a URL, we’ll move on to Route53 to configure our own domain name and DNS records to route traffic to our S3-hosted website from our own custom domain!
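To give you a very small taste of what this looks like in code, here’s a minimal sketch of creating a bucket and enabling versioning with Boto3; the bucket name and region below are just placeholders, not the ones we’ll use in the course:

import boto3

# A tiny preview: create a bucket and turn on versioning.
# 'my-example-bucket' and 'eu-west-1' are placeholder values.
s3 = boto3.resource('s3')

s3.create_bucket(Bucket='my-example-bucket',
                 CreateBucketConfiguration={'LocationConstraint': 'eu-west-1'})

s3.BucketVersioning('my-example-bucket').enable()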

Again, S3 is an amazing service from AWS and there are hundreds of ways you can make use of it. Let’s not lose any more time and jump right into the implementation with S3, so I’ll see you in the course!

AWS S3 MultiPart Upload with Python and Boto3

[Image: AWS S3 course cover]

Hi,

In this blog post, I’ll show you how you can do a multi-part upload with S3 for files of basically any size. We’ll also make use of callbacks in Python to keep track of the progress while our files are being uploaded to S3, and of threading in Python to speed up the process and make the most of it. And I’ll explain everything you need to do to get your environment set up and the implementation you need to have it up and running!

This is a part of my course on S3 Solutions at Udemy, in case you’re interested in how to implement solutions with S3 using Python and Boto3.

First things first, you need to have your environment ready to work with Python and Boto3. If you haven’t set things up yet, please check out my blog post here and get ready for the implementation.

I assume you have already checked out my Setting Up Your Environment for Python and Boto3 post, so I’ll jump right into the Python code.

The first thing we need to make sure of is that we import boto3:

import boto3

We should now create our S3 resource with boto3 to interact with S3:

s3 = boto3.resource('s3')

Ok, we’re ready to develop, let’s begin!

Let’s start by defining ourselves a method in Python for the operation:

def multi_part_upload_with_s3():

There are basically three things we need to implement: the TransferConfig to configure our multi-part upload, the upload call itself, and a callback class to track the progress. First is the TransferConfig, where we will configure our multi-part upload and also make use of threading in Python to speed up the process dramatically. So let’s start with TransferConfig and import it:

from boto3.s3.transfer import TransferConfig

Now we need to make use of it in our multi_part_upload_with_s3 method:

config = TransferConfig(multipart_threshold=1024 * 25, max_concurrency=10,
                        multipart_chunksize=1024 * 25, use_threads=True)

Here’s a base configuration with TransferConfig. Let’s break down each element and explain it all:

multipart_threshold: The transfer size threshold for which multi-part uploads, downloads, and copies will automatically be triggered.

max_concurrency: The maximum number of threads that will be making requests to perform a transfer. If use_threads is set to False, the value provided is ignored as the transfer will only ever use the main thread.

multipart_chunksize: The partition size of each part for a multi-part transfer.

use_threads: If True, threads will be used when performing S3 transfers. If False, no threads will be used in performing transfers: all logic will be run in the main thread.

This is how I configured my TransferConfig, but you can definitely play around with it and make some changes to the thresholds, chunk sizes and so on. But let’s continue now.
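For example, here’s a rough sketch of a heavier configuration; the particular numbers below are just an illustration to play with, not a recommendation:

# An illustrative alternative: 100 MB threshold and chunk size, 20 worker threads.
# These numbers are only an example, not a recommendation.
config = TransferConfig(multipart_threshold=1024 * 1024 * 100,
                        multipart_chunksize=1024 * 1024 * 100,
                        max_concurrency=20,
                        use_threads=True)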

Now we need to find the right file candidate to test out how our multi-part upload performs. So let’s read a rather large file (in my case, this PDF document was around 100 MB).

First, let’s import os library in Python:

import os

Now let’s import largefile.pdf, which is located under our project’s working directory; this call to os.path.dirname(__file__) gives us the path of the directory that contains our script:

file_path = os.path.dirname(__file__) + '/largefile.pdf'

Now that we have our file in place, let’s give it a key for S3 so we can follow along with the S3 key-value methodology and place our file inside a folder called multipart_files with the key largefile.pdf:

key_path = 'multipart_files/largefile.pdf'

Now, let’s proceed with the upload process and call our client to do so:

s3.meta.client.upload_file(file_path, BUCKET_NAME, key_path,
                            ExtraArgs={'ACL': 'public-read',
                                       'ContentType': 'application/pdf'},
                            Config=config,
                            Callback=ProgressPercentage(file_path))

Here I’d like to draw your attention to the last part of this method call: Callback. If you’re familiar with a functional programming language, and especially with JavaScript, then you must be well aware of what callbacks are and what they’re for.

What a Callback basically does is call the passed-in function, method or, as in our case, class, which is ProgressPercentage, and then return control back to the sender. This way, we’ll be able to keep track of our multi-part upload progress: the current percentage, the total and remaining size and so on. But how is this going to work? Where does ProgressPercentage come from? Nowhere, yet; we need to implement it for our needs, so let’s do that now.

Either create a new .py file or use your existing one; it doesn’t really matter where we declare the class, it’s all up to you. So let’s begin:

class ProgressPercentage(object):

In this class, we’re receiving only a single parameter, which will be the name of the file whose upload progress we want to track. Let’s continue with our implementation and add an __init__ method to our class so we can make use of some instance variables we will need (note that we also need to import threading for the lock below):

def __init__(self, filename):
    self._filename = filename
    self._size = float(os.path.getsize(filename))
    self._seen_so_far = 0
    self._lock = threading.Lock()

Here we are preparing the instance variables we will need while managing our upload progress. filename and size are very self-explanatory, so let’s explain what the other ones are:

seen_so_far: the number of bytes already uploaded at any given time. For starters, it’s just 0.

lock: as you can guess, will be used to synchronize the worker threads so their updates to our progress counter don’t clash and we keep everything under control.

Here comes the most important part of ProgressPercentage, and that is the callback method itself. Defining it as __call__ makes instances of our class callable, which is exactly what the Callback parameter needs, so let’s define it:

def __call__(self, bytes_amount):

bytes_amount will, of course, be the number of bytes that have just been transferred to S3. What we need is a way to get the information about the current progress and print it out accordingly, so that we know for sure where we are. Let’s start by taking the thread lock into account and move on:

with self._lock:

After getting the lock, let’s first update seen_so_far, which accumulates bytes_amount over time:

self._seen_so_far += bytes_amount

Next, we need to know the percentage of the progress so we can track it easily:

percentage = (self._seen_so_far / self._size) * 100

We’re simply dividing the already-uploaded byte size by the whole size and multiplying it by 100 to get the percentage. Now, for all of this to be actually useful, we need to print it out. So let’s do that now. I’m making use of the Python sys library to print everything out, so I’ll import it; if you prefer something else, you can definitely use that instead:

import sys

Now let’s use it to print things out:

sys.stdout.write("\r%s  %s / %s  (%.2f%%)" %
                 (self._filename, self._seen_so_far, self._size, percentage))

As you can clearly see, we’re simply printing out filename, seen_so_far, size and percentage in a nicely formatted way.

One last thing before we finish and test things out is to flush stdout, so that our progress line is written out immediately instead of waiting in the buffer:

sys.stdout.flush()

Now we’re ready to test things out. Here’s a complete look to our implementation in case you want to see the big picture:

import threading

import boto3
import os
import sys

from boto3.s3.transfer import TransferConfig

BUCKET_NAME = "YOUR_BUCKET_NAME"

s3 = boto3.resource('s3')


def multi_part_upload_with_s3():
    # Multipart upload configuration (threshold and chunk size are in bytes)
    config = TransferConfig(multipart_threshold=1024 * 25, max_concurrency=10,
                            multipart_chunksize=1024 * 25, use_threads=True)
    file_path = os.path.dirname(__file__) + '/largefile.pdf'
    key_path = 'multipart_files/largefile.pdf'
    s3.meta.client.upload_file(file_path, BUCKET_NAME, key_path,
                               ExtraArgs={'ACL': 'public-read', 'ContentType': 'application/pdf'},
                               Config=config,
                               Callback=ProgressPercentage(file_path)
                               )


class ProgressPercentage(object):
    def __init__(self, filename):
        self._filename = filename
        self._size = float(os.path.getsize(filename))
        self._seen_so_far = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        # To simplify we'll assume this is hooked up
        # to a single filename.
        with self._lock:
            self._seen_so_far += bytes_amount
            percentage = (self._seen_so_far / self._size) * 100
            sys.stdout.write(
                "\r%s  %s / %s  (%.2f%%)" % (
                    self._filename, self._seen_so_far, self._size,
                    percentage))
            sys.stdout.flush()

Let’s now add a main method to call our multi_part_upload_with_s3:

if __name__ == '__main__':
    multi_part_upload_with_s3()

Let’s hit run and see our multi-part upload in action:

[Screenshot: multi-part upload progress output in the terminal]

As you can see, we have a nice progress indicator and two size descriptors: the first one for the bytes already uploaded and the second for the whole file size.
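Given the format string we used, the output looks roughly like the line below; the path and sizes here are purely illustrative:

/path/to/largefile.pdf  52428800 / 104857600.0  (50.00%)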

So this is basically how you implement multi-part upload on S3. There are definitely several ways to implement it; however, I believe this one is cleaner and sleeker.
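For comparison, here’s a rough sketch of what upload_file automates for you under the hood, using the low-level client calls create_multipart_upload, upload_part and complete_multipart_upload directly. It reuses the BUCKET_NAME, file_path and key_path names from the listing above, handles no errors and does no threading, so treat it as an illustration rather than production code:

client = boto3.client('s3')
part_size = 25 * 1024 * 1024  # read the file in 25 MB parts (each part except the last must be at least 5 MB)

# Start the multi-part upload and collect the ETag of every uploaded part
mpu = client.create_multipart_upload(Bucket=BUCKET_NAME, Key=key_path)
parts = []
with open(file_path, 'rb') as f:
    part_number = 1
    while True:
        data = f.read(part_size)
        if not data:
            break
        response = client.upload_part(Bucket=BUCKET_NAME, Key=key_path,
                                      UploadId=mpu['UploadId'],
                                      PartNumber=part_number, Body=data)
        parts.append({'PartNumber': part_number, 'ETag': response['ETag']})
        part_number += 1

# Tell S3 to assemble the uploaded parts into the final object
client.complete_multipart_upload(Bucket=BUCKET_NAME, Key=key_path,
                                 UploadId=mpu['UploadId'],
                                 MultipartUpload={'Parts': parts})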

Make sure to subscribe to my blog or reach me at niyazierdogan@windowslive.com for more great posts and surprises on my Udemy courses.

Have a great day!

Working with AWS using Python and Boto3: Setting Up Your Environment

[Image: AWS Boto3 cover]

Hi,

In this blog post, I’d like to show you how you can set up and prepare your development environment for AWS using Python and Boto3.

I’m assuming you’re familiar with AWS and have your Access Key and Secret Access Key ready; if that’s the case, then great, either set them as environment variables or wait for me to show you how you can do that.

1. Python 3

If you already have Python 3 on your computer, then you can skip this part entirely. For those who don’t, please go to the Python Downloads Page and grab the latest Python 3 version for your operating system. And to make things simpler, get the standalone installer version, which makes things a lot easier.

Once you have it, just launch the installer and follow the steps. It’s almost the same for Mac and Windows operating systems; on Windows, make sure to check the box where it says Add Python to PATH during the installation. This is important; otherwise you’d have to do it manually.

Once you’re done with the installation, open a new Terminal window (for MacOS) or Command Prompt (for Windows) and type:

python --version

You should see output similar to this (your version will probably be different):

[Screenshot: python --version output in the terminal]

Now we’re done with Python, and next up is to get Boto3.

2. Boto3

Python comes with pip, the package manager for Python, by default. Check if you already have it like below:

pip --version

You should have a similar output to this:

[Screenshot: pip --version output]

Once we verify that we have pip, we can install boto3 as follows:

pip install boto3

Once you run the command, it should install the latest version of Boto3. Since I already have it installed, my output looks like below, but yours will take some time to collect the packages and eventually install them:

[Screenshot: pip install boto3 output]

3. IDE Configuration

IDEs vary in features and functionality, and since you’re reading this, you probably have one that you like most. Mine is PyCharm and I think it’s the best Python IDE out there. If you want to use it or give it a try, check it out at the JetBrains PyCharm website; there’s also a free Community Edition available.

So for my IDE, PyCharm, I’ll open up a new project and select the existing Python interpreter as below:

[Screenshot: PyCharm New Project dialog with an existing Python interpreter selected]

Once you have a similar configuration, hit Create.

The project window opens up, and now we can create our Python package to hold our Python files. To do that, right-click on your project name and then New -> Python Package:

[Screenshot: PyCharm New -> Python Package menu]

Name your package anything you want; I used src in my case. Now let’s create a new Python file following the same procedure by right-clicking, and again name it anything you want. I named it boto3_test.py.

Once you create your Python file, try to type in the following import statement:

[Screenshot: import boto3 typed into boto3_test.py]

If you can import boto3 like this, then that’s great! We can move on to configuring our IDE and writing our first lines with Boto3.

3.a AWS Credentials

If you already have your AWS credentials set in your environment, then you can skip this part and move on to the next. But if you don’t, then let me show you how you can do that only for your specific project with the PyCharm IDE. To configure your AWS credentials, click the Add Configuration... button in your IDE as below:

[Screenshot: PyCharm Add Configuration dialog with the Environment Variables section]

Then under the Templates section, you’ll see Python when you expand it. Select it and add your AWS credentials under the Environment Variables section like in the image above (the standard variables Boto3 looks for are AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and, optionally, AWS_DEFAULT_REGION).

Save it and close the window when you’re done.

Now let’s type in our first lines of code and get ready to work with AWS. To do that, you have a couple of options with Boto3. You can either make use of a low-level client or a higher-level resource declaration. I’ll show you both.

In order to use the low-level client for S3 with boto3, define it as follows:

s3_client = boto3.client('s3')

Instead, to use the higher-level resource for S3 with boto3, define it as follows:

s3_resource = boto3.resource('s3')
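As a quick sanity check that your credentials are picked up, you can try listing the buckets in your account; this little snippet is just a suggestion and assumes your credentials (and a default region) are configured as described above:

import boto3

# List the buckets in your account to confirm boto3 can reach AWS
s3_client = boto3.client('s3')
response = s3_client.list_buckets()
for bucket in response['Buckets']:
    print(bucket['Name'])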

That’s it, you have your environment set up and running for Python Boto3 development. You’re ready to rock on with it!

Have a great day!