AWS EMR

gProfiler can be added to your EMR cluster's bootstrap actions from either the AWS console or the AWS CLI, in order to automatically include it when deploying clusters. Note that gProfiler bootstrap actions install and load gProfiler before your cluster's core services such as Spark or Hadoop. To set up a gProfiler bootstrap action, follow these instructions.

In the AWS EMR Console

You can specify a bootstrap action when creating a cluster. Choose Advanced Options, navigate to General Cluster Settings, and under Bootstrap Actions select Configure and add.

In the AWS CLI

You can reference bootstrap action scripts on Amazon EMR by adding the --bootstrap-actions parameter to the create-cluster command. The parameter takes the following form:

--bootstrap-actions Path="s3://mybucket/filename",Args=[arg1,arg2]
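As a minimal sketch of how that value is assembled, the helper below joins a script path and its arguments into the Path=...,Args=[...] form. The function name bootstrap_action is hypothetical (it is not part of the AWS CLI or gProfiler), and the bucket and argument names are the placeholders from the example above.

```shell
# Hypothetical helper: build the value passed to --bootstrap-actions
# from a script path followed by any number of arguments.
bootstrap_action() {
  local path="$1"
  shift
  local IFS=,   # "$*" joins the remaining arguments with commas
  printf 'Path=%s,Args=[%s]\n' "$path" "$*"
}

# Prints: Path=s3://mybucket/filename,Args=[arg1,arg2]
bootstrap_action "s3://mybucket/filename" arg1 arg2
```

Arguments that themselves contain commas or spaces would need extra quoting, which this sketch does not attempt.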

What You Need to Install gProfiler on EMR

To install gProfiler on your EMR cluster you will need the following arguments:

  • Token: Your user's unique API key, or the API token shown on gProfiler's “Install Service” page.

  • EMR Cluster Name: The name of the EMR cluster you would like to install gProfiler on.

  • Service ID: The service identifier you defined, which is used to correlate all the agents that are installed on the machines in your service.

  • Download Bucket: The name of the S3 bucket containing your cluster's bootstrap scripts.

Installing gProfiler on EMR

With the AWS CLI

When creating a cluster, add the gProfiler installation as a bootstrap action using the following command. Replace the cluster name with your own and fill in your Token and Service ID, preserving the TOKEN= and SERVICE= prefixes:

aws emr create-cluster --name "MY-Cluster" ... \
  --bootstrap-actions "Path=s3://download.granulate.io/granulate_generic_env_wrapper.sh,Args=[https://s3.amazonaws.com/download.granulate.io/granulate_run_gprofiler.sh,TOKEN=<TOKEN>,SERVICE=<SERVICE>]"
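Because the bootstrap action string is long and easy to mistype, one option is to build it from variables. The sketch below is an assumption about how you might script this, not an official gProfiler tool: the function name gprofiler_action and the sample token and service values are hypothetical, while the two script locations are taken from the command above. It only prints the string and does not call the AWS CLI.

```shell
# Script locations as given in the command above.
WRAPPER="s3://download.granulate.io/granulate_generic_env_wrapper.sh"
RUNNER="https://s3.amazonaws.com/download.granulate.io/granulate_run_gprofiler.sh"

# Hypothetical helper: print the gProfiler bootstrap action for a token and service ID.
gprofiler_action() {
  local token="$1"
  local service="$2"
  if [ -z "$token" ] || [ -z "$service" ]; then
    echo "usage: gprofiler_action <token> <service-id>" >&2
    return 1
  fi
  # The TOKEN= and SERVICE= prefixes are part of the arguments and must be kept.
  printf 'Path=%s,Args=[%s,TOKEN=%s,SERVICE=%s]\n' "$WRAPPER" "$RUNNER" "$token" "$service"
}

# Placeholder values for illustration only.
gprofiler_action "abc123" "my-service"
```

The printed value can then be passed to the command, for example as --bootstrap-actions "$(gprofiler_action "$TOKEN" "$SERVICE")", alongside whatever other create-cluster options your cluster needs.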

With the AWS Console

  1. Choose Create cluster

  2. Click Go to Advanced Options

  3. On the Create Cluster - Advanced Options screen, in steps 1 and 2 choose the options you prefer and proceed to Step 3: General Cluster Settings

  4. Under Bootstrap Actions select Configure and Add, choose Custom Actions, and fill in the following:

    1. Name: Granulate gProfiler

    2. Script Location: s3://download.granulate.io/granulate_generic_env_wrapper.sh

    3. Optional Arguments (enter one argument per line, and fill in your TOKEN and SERVICE_ID values, preserving the TOKEN= and SERVICE= prefixes):

      https://s3.amazonaws.com/download.granulate.io/granulate_run_gprofiler.sh

      TOKEN=<TOKEN>

      SERVICE=<SERVICE_ID>

  5. The filled form will look like this:

  6. Click Add

  7. Proceed to create the cluster. Your bootstrap action(s) will be performed after the cluster has been provisioned and initialized.
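Before pasting the Optional Arguments into the console form, you may want to check that they have the expected shape. The following is a hedged sketch only: check_gprofiler_args is a hypothetical helper, and the checks reflect the three argument lines described in the steps above rather than any validation performed by AWS or gProfiler. It reads the arguments from standard input, one per line.

```shell
# Hypothetical helper: verify the three Optional Arguments lines read from stdin.
check_gprofiler_args() {
  local args
  args="$(cat)"
  # The runner script URL must appear on a line of its own.
  if ! printf '%s\n' "$args" | grep -q '^https://s3\.amazonaws\.com/download\.granulate\.io/granulate_run_gprofiler\.sh$'; then
    echo "missing gProfiler runner script URL" >&2
    return 1
  fi
  # TOKEN= and SERVICE= must keep their prefixes and have non-empty values.
  if ! printf '%s\n' "$args" | grep -q '^TOKEN=..*'; then
    echo "missing or empty TOKEN= argument" >&2
    return 1
  fi
  if ! printf '%s\n' "$args" | grep -q '^SERVICE=..*'; then
    echo "missing or empty SERVICE= argument" >&2
    return 1
  fi
  echo "ok"
}

# Placeholder token and service values for illustration only.
printf '%s\n' \
  "https://s3.amazonaws.com/download.granulate.io/granulate_run_gprofiler.sh" \
  "TOKEN=abc123" \
  "SERVICE=my-service" | check_gprofiler_args
```

A missing prefix (for example a bare token without TOKEN=) makes the helper return a non-zero status and print the reason to standard error.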
