Apache Superset Installation & Configuration

Updated: 8 Jun 2023

Introduction

If anyone thinks open source is free, it isn't. What you don't pay in $$$ you pay in time (not meant disparagingly). Most open source documentation is written by technical people who, likely unconsciously, assume that their readers already understand the subject.

Apache Superset is no different. I spent quite some time researching, figuring out, and cobbling together how to set up Apache Superset from various sources (see references). Although there are some good starting points (particularly Superset's own documentation), none of them got me to the finish line on its own: each was either missing something I was looking for or lacked helpful examples of what the code/config should look like. So, here's what I figured out and what worked for me. Hopefully, you find it helpful.

Dev/Test Setup

Helpful Notes

  • This method is neither meant for production nor secure. It should only be used internally and should not be accessible to the outside world.

  • Because this method runs Superset under your current user, Superset will stop when you log out or disconnect from your ssh session. To keep Superset running after you disconnect, use a screen or tmux session.

Dev/Test Environment

  • Ubuntu 20.04 LTS (fully updated) on an AWS EC2 t3.small instance

  • Running as a local user or root

  • Python virtual environment

  • Not running as a service

How To

  1. Install dependencies

    sudo apt install build-essential libssl-dev libffi-dev python3-dev python3-pip libsasl2-dev libldap2-dev default-libmysqlclient-dev

  2. Install the Python virtual environment package (recommended)

    sudo apt install python3.8-venv

  3. Create the Python virtual environment (this keeps all Python packages installed/uninstalled within this environment)

    python3 -m venv venv

    . venv/bin/activate (this puts you in the venv; note that the command starts with a ".")

    • To exit the venv, type deactivate on the command line

  4. Update pip (run this inside the venv so the upgrade applies to the venv's copy of pip)

    pip install --upgrade setuptools pip

  5. Install Superset

    pip install apache-superset

  6. I had to downgrade sqlparse; otherwise, there were errors during Superset's db upgrade (this took quite a while to figure out). Run this inside the venv, after installing Superset, so the pinned version is the one Superset actually uses:

    pip install sqlparse==0.4.3

  7. Init the DB

    superset db upgrade

  8. Before creating the admin user, you need to override the default secret key

    1. Create key:

      openssl rand -base64 42

    2. Copy the key (using your preferred copy/paste method)

    3. Create a file in <your user>/venv/bin called superset_config.py

    4. Enter the following as its own line in the config file:

      SECRET_KEY = 'paste key here' (note that the quotes are required)

  9. Create the admin user

    export FLASK_APP=superset

    superset fab create-admin

  10. Load some play data

    superset load_examples

  11. Create default roles (for other users)

    superset init

  12. (Optional) If you will be connecting to a MySQL db for your BI data, you'll need to install the MySQL drivers.

    1. Linux dependencies

      sudo apt install python3-dev default-libmysqlclient-dev build-essential

      sudo apt install mysql-client (for testing the connection to the db from the command line)

    2. Python dependencies (install these inside the venv; it's possible not all of them are needed)

      pip install pymysql

      pip install mysql-connector-python (this is the driver used by the mysql+mysqlconnector:// connection string below)

      pip install mysqlclient

      • Skip mysql-python: it only supports Python 2 and will not install on Python 3. mysqlclient is its Python 3 fork.

    3. Superset will automatically import/use these. You'll know it's working if you get the MySQL option in the database connection setup within Superset. If you've already started Superset, you'll need to restart it.

  13. Start the Superset server

    superset run -h 0.0.0.0 -p 5000

    1. -h: host ip. Using 0.0.0.0 binds to all IPs

    2. -p: port

  14. Connect to server

    • In your web browser, connect to the IP address of your server at port 5000 (example: http://192.168.1.123:5000)
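If you'd rather not copy and paste the key by hand in step 8, the short Python sketch below produces the same kind of value as openssl rand -base64 42 (42 random bytes, base64-encoded) and writes it to a superset_config.py. The function names are hypothetical helpers, not part of Superset, and the sketch overwrites any existing file at the path you give it, so only use it for a fresh config.

```python
import base64
import secrets
from pathlib import Path


def generate_secret_key(num_bytes: int = 42) -> str:
    """Return num_bytes of cryptographically random data, base64-encoded
    (the same shape of value as `openssl rand -base64 42`)."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def write_superset_config(path: Path, key: str) -> None:
    """Write a minimal superset_config.py that only sets SECRET_KEY.

    Hypothetical helper: it OVERWRITES the file if it already exists.
    """
    # repr() emits a correctly quoted Python string literal for the key
    path.write_text(f"SECRET_KEY = {key!r}\n", encoding="utf-8")
```

For example, assuming you created the venv in your home directory: write_superset_config(Path.home() / "venv" / "bin" / "superset_config.py", generate_secret_key()).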

Connect to Database (sdb)

Once in Superset using your web browser, navigate to:

  • Settings

  • +Database

  • Choose database type

  • Enter the required parameters (see examples below)

The following entry is what worked when connecting to a MySQL database on SingleStore (singlestore.com). It should also work for a local MySQL db.

mysql+mysqlconnector://<user>:<password>@<db url or IP>/<db name>

Example: mysql+mysqlconnector://readonlyuser:password123@mydb.example.com/mydatabase
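Passwords containing characters such as @, :, or / will break a connection string like the one above unless they are percent-encoded. The sketch below (hypothetical helper names; standard library only) builds the URI with encoded credentials and also reports which MySQL driver modules are importable in the current environment, which tells you which mysql+<driver> prefix can work. It assumes the usual module names: MySQLdb for mysqlclient, pymysql for PyMySQL, and mysql.connector for mysql-connector-python.

```python
import importlib.util
from urllib.parse import quote

# SQLAlchemy dialect suffix -> Python module that driver provides (assumed names)
DRIVER_MODULES = {
    "mysqldb": "MySQLdb",                 # pip install mysqlclient (used by plain mysql://)
    "pymysql": "pymysql",                 # pip install pymysql
    "mysqlconnector": "mysql.connector",  # pip install mysql-connector-python
}


def available_mysql_drivers() -> dict:
    """Map each dialect suffix to True/False depending on whether its module can be imported."""
    found = {}
    for dialect, module in DRIVER_MODULES.items():
        try:
            found[dialect] = importlib.util.find_spec(module) is not None
        except ModuleNotFoundError:
            # find_spec("mysql.connector") raises when the parent package "mysql" is missing
            found[dialect] = False
    return found


def build_mysql_uri(user, password, host, database, driver="mysqlconnector", port=None):
    """Build a SQLAlchemy MySQL URI with percent-encoded user and password."""
    hostport = f"{host}:{port}" if port else host
    # safe="" also encodes "/" and "+", so the password survives URL parsing intact
    creds = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"mysql+{driver}://{creds}@{hostport}/{database}"
```

Running available_mysql_drivers() inside the venv after step 12 should show True for the drivers you installed; paste the output of build_mysql_uri(...) into Superset's connection form.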

SingleStore note:

For production cloud environments, I can't say enough good things about SingleStore. It increased the performance of our web app (not related to Superset) quite literally 1000% over AWS RDS (which served us well until we outgrew it). Their customer service is top-notch. (No, I was not compensated for saying this!)

Database Connections (examples):

  • mysql://<user>:<password>@127.0.0.1:3306/<db name>

  • mysql://<user>:<password>@localhost:3306/<db name>

  • mysql+pymysql://<user>:<password>@127.0.0.1:3306/<db name>

  • mysql+pymysql://<user>:<password>@localhost:3306/<db name>

  • mysql+mysqlconnector://<user>:<password>@127.0.0.1/<db name>

  • mysql+mysqlconnector://<user>:<password>@<db url or IP>/<db name>

Dev Setup References/Credits:

Superset Production Setup

Notes:

  • This uses Docker with a "production" (read: more secure) configuration. That's not to say it's perfectly secure (nothing is). You will need to take appropriate steps to further protect it. If it's going to be made publicly available, I would recommend stuffing it behind a load balancer with a WAF and other appropriate protections.

  • Security software (antivirus) may break some Superset functionality. In my case, it prevented opening new tabs in SQL Lab and displaying query results, producing errors such as those shown below. This is an antivirus configuration issue that I still need to chase down (I'll update this when I do). If you see something similar, try temporarily disabling the antivirus to see if that fixes it.

    • "An error occurred while storing the latest query id in the backend."

    • "An error occurred while storing your query in the backend."

    • "An error occurred while setting the tab schema."

    • "An error occurred while setting the tab database ID."

    • "Unable to add a new tab to the backend."

Production Environment Used

  • Ubuntu 20.04 LTS (fully updated) on an AWS EC2 m6i.xlarge instance

    • 4 vCPU

    • 16 GB RAM

  • Docker

  • Superset git repo

  • PostgreSQL (default) for the Superset database

  • SingleStore for BI source data

  • SMTP sending service (for reports, alerts)

Install Docker

  1. Update repos & install certs:

    1. Update the package index and install the packages that allow apt to use a repository over HTTPS:

      sudo apt update

      sudo apt install ca-certificates curl gnupg

    2. Add Docker’s GPG key:

      sudo install -m 0755 -d /etc/apt/keyrings

      curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

      sudo chmod a+r /etc/apt/keyrings/docker.gpg

    3. Set up the Docker repo:

      echo "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

    4. Install Docker Engine:

      sudo apt update

      sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

    5. Verify Docker installation:

      sudo docker run hello-world

      • This will download a test image, run the container and print to screen a confirmation message.

Apache Superset Setup

  1. Clone Superset’s git repo

    1. cd into the directory where you want the repo to reside.

      • git clone will create a “superset” directory there, so if you create your own "superset" directory first, you'll end up with another one inside it (which will contain yet another as part of the repo). I just cloned it from within my user directory.
    2. Clone Superset

      git clone https://github.com/apache/superset.git

      • Note: if the repo already exists on the local machine (for example, if you're updating an existing repo), then:

        • cd into the repo directory

        • Do a pull to update it:

          git pull https://github.com/apache/superset.git

  2. Configure Superset

    1. Change the secret key in docker/.env-non-dev (inside the cloned superset directory)

    2. If you change any other passwords (for Redis, PostgreSQL, etc.), they must be changed consistently everywhere they are referenced.

      • Note that other config files are involved, so there's a high probability of breaking something - and yet, it's still wise to do.

      • First time through, best to leave the defaults.

  3. Launch Superset (from the superset directory)

    • docker compose -f docker-compose-non-dev.yml pull

      • Only needed first time
    • docker compose -f docker-compose-non-dev.yml up -d

      • -d: Detaches the run from your current shell.

        • If you omit the -d, then you'll see all the logs from all containers during the startup which can be handy to watch.

        • If you omit the -d, then when you quit the shell (ctrl-c), all containers will stop.

  4. Connect to Superset in your browser

    • http://<server ip>:<port>

      • default port is 8088
    • example: http://10.10.10.123:8088

    • The default login is username admin, password admin

      • You should change this ASAP
  5. You should now be at the Superset home screen with example dashboards and charts.
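To confirm the server is answering without opening a browser (handy from a script or a second machine), the sketch below requests a health endpoint. It assumes Superset exposes /health returning the text OK; treat that as an assumption and check your version, and if it differs, point the check at the login page instead. superset_is_healthy is a hypothetical helper name.

```python
import urllib.request


def superset_is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/health answers HTTP 200 with body "OK".

    Assumes a /health endpoint exists; any connection failure, timeout,
    or HTTP error status is reported as False.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and resp.read().strip() == b"OK"
    except OSError:
        # URLError, HTTPError (non-2xx), and socket timeouts are all OSError subclasses
        return False
```

For the Docker setup this would be superset_is_healthy("http://10.10.10.123:8088"); for the dev setup, use port 5000.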

Setup Your BI Data Connection

This is to set up a connection to your source BI/Analytics data. It is recommended to create a user in your db with read-only access and limit the access to only the data needed by your Superset users.

Out of the box, the Docker build of Superset supports PostgreSQL, MySQL, Presto, and SQLite-compatible databases, so there is no need to install extra drivers.

  1. In Superset, go to Settings, Database Settings

  2. Click “+ Database”

  3. Choose DB type (either from the panels, or the dropdown box)

  4. Enter all the appropriate fields (be sure to name it something recognizable to you and your users)

  5. Click Connect

  6. Click Finish

Test DB Connection (sample queries)

  1. Click the SQL dropdown menu, choose SQL Lab

  2. In the Database dropdown, choose the database you added previously

  3. In the query box, enter a sample query appropriate to your db (the one below is only an example) and click the RUN button

    SELECT * FROM users

  4. Run the query several times in a row to see if Redis caching is kicking in. The reported query time should decrease. Note: the time it takes to DISPLAY the results may not change, only the time it took to run the query.

Create Custom Superset Configuration

(superset_config.py)

Notes:

  • With docker compose, you can’t just create/insert your own superset_config.py file.

    • If the file's permissions/owner are incorrect, the container will not start; it will restart constantly.

    • When you take the server down (docker compose down), the config is lost, because down removes the containers and docker compose up recreates them from the image.

    • If you were to stop the server rather than take it down, you could potentially make it work, but you risk losing the file if you forget to use stop instead of down, or when you update the containers in the future.

Custom Configuration How To:

  1. On your server, in the superset directory (from git), go to docker/pythonpath_dev

    • The full path will look something like: <user>/superset/docker/pythonpath_dev

    • docker compose mounts this directory into the container, and the superset_config.py already in it imports superset_config_docker.py if that file exists. Superset's config.py then picks up those settings when Superset starts.

  2. Create a file called superset_config_docker.py

    1. Note - there should be an example file that you can copy/rename if desired.

      1. If you copy/rename the example file, be sure to comment out any existing configuration lines. There will be some lines for sqlalchemy.
  3. Add your configuration lines as usual.

    1. These will override anything in the default config.py file (located in superset/superset/config.py)

    2. For an initial test, simply increase the row limits for SQL Lab queries and charts:

      1. Note: these will impact performance; adjust as desired.

      2. Add these lines:

          SQL_MAX_ROW = 1000000
          DISPLAY_MAX_ROW = 1000000
          ROW_LIMIT = 1000000
          VIZ_ROW_LIMIT = 1000000
          QUERY_SEARCH_LIMIT = 1000000

    3. Any typos in this file will likely cause the superset container to fail.

      1. You can watch for that by using docker compose -f docker-compose-non-dev.yml ps to see if any containers are constantly restarting (run this several times in a row).

      2. If you see a container restarting, use docker compose -f docker-compose-non-dev.yml logs --tail=0 --follow to see the errors.

        • Note: this shows logs from ALL containers started with compose
  4. Bring your containers up (take them down first if they are already running), and your new config should be included.

  5. Consult this reference for additional config options: https://superset.apache.org/docs/installation/configuring-superset/
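Because a typo in superset_config_docker.py can leave the container in a restart loop, it can save time to parse the file on the host before bringing the containers up. The sketch below uses Python's ast module to catch syntax errors without executing the file, and flags assignments to names that aren't upper case, since Flask's config loading only picks up UPPER_CASE names. It cannot catch runtime errors such as a misspelled name on the right-hand side or a failed import, and check_config is a hypothetical helper, not part of Superset.

```python
import ast
from pathlib import Path


def check_config(path: str) -> list:
    """Parse a Superset config file without running it; return a list of likely problems."""
    source = Path(path).read_text(encoding="utf-8")
    try:
        tree = ast.parse(source, filename=path)
    except SyntaxError as exc:
        # A syntax error here would also stop Superset from importing the file
        return [f"line {exc.lineno}: syntax error: {exc.msg}"]

    problems = []
    for node in tree.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                # Lower-case names are not loaded as settings; treat these as warnings
                if isinstance(target, ast.Name) and not target.id.isupper():
                    problems.append(f"line {node.lineno}: '{target.id}' is not upper case")
    return problems
```

From the superset directory, check_config("docker/pythonpath_dev/superset_config_docker.py") returning an empty list means nothing obvious was found.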

SMTP setup

  1. In your custom config file above, add the following lines:

     # smtp server configuration
     EMAIL_NOTIFICATIONS = True  # when False (the default), all emails are sent using dryrun
     SMTP_HOST = "smtp.example.com"
     SMTP_STARTTLS = True
     SMTP_SSL = False
     SMTP_USER = "superset"
     SMTP_PORT = 25
     SMTP_PASSWORD = "superset"
     SMTP_MAIL_FROM = "superset@superset.com"
     # If True creates a default SSL context with ssl.Purpose.CLIENT_AUTH using the
     # default system root CA certificates.
     SMTP_SSL_SERVER_AUTH = False
     ENABLE_CHUNK_ENCODING = False
    
  2. Change the necessary lines for your SMTP server

    • SMTP_MAIL_FROM does not necessarily need to be a “real” email address (though some SMTP services only accept verified sender addresses), but a recognizable one helps you identify and/or filter this source in your email client and whitelist these emails in your spam filter.
  3. Also add the following lines to turn off dry-run mode (in dry-run mode nothing is actually sent; Superset only logs what it would have sent)

     # Enable alerts & notifications
     ALERT_REPORTS_NOTIFICATION_DRY_RUN = False
    
  4. Your complete SMTP config section will look like this:

     # smtp server configuration
     EMAIL_NOTIFICATIONS = True  # when False (the default), all emails are sent using dryrun
     SMTP_HOST = "smtp.example.com"
     SMTP_STARTTLS = True
     SMTP_SSL = False
     SMTP_USER = "superset"
     SMTP_PORT = 25
     SMTP_PASSWORD = "superset"
     SMTP_MAIL_FROM = "superset@superset.com"
     # If True creates a default SSL context with ssl.Purpose.CLIENT_AUTH using the
     # default system root CA certificates.
     SMTP_SSL_SERVER_AUTH = False
     ENABLE_CHUNK_ENCODING = False

     # Enable alerts & notifications
     ALERT_REPORTS_NOTIFICATION_DRY_RUN = False

  5. Bring your containers up (take them down first if they are already running), and your new config should be included. Again, double-check that no containers are doing the restart dance.
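Hard-coding SMTP_PASSWORD in superset_config_docker.py leaves the credential in a plain file on disk. One alternative, sketched here under assumptions, is to read the SMTP credentials from environment variables instead. It assumes SMTP_USER and SMTP_PASSWORD are made available to the Superset containers, for example by adding them to docker/.env-non-dev; check that your docker-compose-non-dev.yml passes that file to the Superset services via env_file before relying on this.

```python
import os

# Assumption: SMTP_USER / SMTP_PASSWORD are set in the containers' environment
# (e.g. via docker/.env-non-dev). Falls back to the example user and an empty password.
SMTP_USER = os.environ.get("SMTP_USER", "superset")
SMTP_PASSWORD = os.environ.get("SMTP_PASSWORD", "")
```

These two lines would replace the hard-coded SMTP_USER and SMTP_PASSWORD lines; the rest of the SMTP section stays the same.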

Test Email Notifications:

  1. Log in to Superset

  2. Go to Settings, Alerts & Reports

  3. Click the Reports tab

  4. Click the + Report button

  5. Fill out Report Name

  6. Set Report Schedule to every minute (this is just for testing; as soon as you know it's working, you'll want to change this)

  7. Choose any of the example Dashboards from the dropdown list

  8. Under Notification Method, click Add Notification method

  9. Choose email from the dropdown

  10. Add your email address

  11. Click the ADD button

  12. Wait 1 minute to see if you get an email

    1. You may need to check your spam and/or company quarantine to make sure the message does not get caught in one of those nets.
  13. If you do not receive any messages:

    1. Check logs

    2. Check antivirus / firewall / ACL / Security Group policies
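If no email arrives, it helps to separate SMTP problems from Superset problems. The sketch below sends a test message with Python's standard smtplib using the same host, port, credentials, and STARTTLS setting you put in the config; if it fails, the problem lies in the SMTP settings or the network path (firewall, Security Group) rather than in Superset. The helper names and message text are hypothetical, and the sketch only covers the STARTTLS case (SMTP_SSL = False); a server that expects implicit SSL would need smtplib.SMTP_SSL instead.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_test_message(mail_from: str, mail_to: str) -> EmailMessage:
    """Create a plain-text test message."""
    msg = EmailMessage()
    msg["From"] = mail_from
    msg["To"] = mail_to
    msg["Subject"] = "Superset SMTP test"
    msg.set_content("If you can read this, the SMTP settings work outside Superset.")
    return msg


def send_test_message(host, port, user, password, msg, starttls=True, timeout=30):
    """Send msg with settings mirroring SMTP_HOST, SMTP_PORT, SMTP_USER,
    SMTP_PASSWORD, and SMTP_STARTTLS."""
    with smtplib.SMTP(host, port, timeout=timeout) as server:
        if starttls:
            # Upgrade the plain connection to TLS, verifying the server certificate
            server.starttls(context=ssl.create_default_context())
        if user:
            server.login(user, password)
        server.send_message(msg)
```

For example: send_test_message("smtp.example.com", 25, "superset", "<your password>", build_test_message("superset@superset.com", "you@example.com")). Run it on the Superset server itself so it goes through the same firewall rules.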

Docker Helpful Commands:

  • Start Containers:

    • docker compose -f docker-compose-non-dev.yml up -d

      • -f - specifies the compose file

      • -d - detaches from shell (otherwise, when you exit the containers will stop)

  • Stop Containers:

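    • docker compose -f docker-compose-non-dev.yml stop <service name> <service name>...

    • (Preferred) docker compose -f docker-compose-non-dev.yml down

      • Data is not lost because it’s stored in external volumes (persistent storage), as long as you don't add the -v flag, which also removes the volumes.

      • After a “down”, use “up” to recreate and start the containers. “up” does not download newer images of images you already have; run the pull command first if you want updates.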

  • Connect to a container:

    docker exec -it <container name> /bin/bash

    • Note: containers have been generally hardened by removing most common packages (even including vim and similar text editors).
  • Access container file system when stopped (careful with this):

    • export LOCAL_DIR=$(docker inspect -f '{{ .GraphDriver.Data.UpperDir }}' <container_name>)

    • echo $LOCAL_DIR

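    • Then go to the directory shown (with the overlay2 storage driver, the UpperDir path already ends in /diff and holds only the files that differ from the image; you will likely need sudo to read it)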

  • List container volumes:

    docker volume ls

    docker volume inspect <volume name>

  • Container logs:

    docker compose -f docker-compose-non-dev.yml logs --tail=0 --follow

  • Docker Container Stats:

    docker stats
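Instead of re-running ps several times and eyeballing the output, the sketch below lists containers whose status starts with Restarting, using docker ps -a --format with the {{.Names}} and {{.Status}} template fields. The parsing is kept separate from the subprocess call so it can be checked against sample text; both function names are hypothetical. As with the manual check, run it a few times in a row, since a crash-looping container is only in the Restarting state part of the time, and run it with sudo unless your user is in the docker group.

```python
import subprocess


def restarting_containers(ps_output: str) -> list:
    """Return names of containers whose status begins with 'Restarting'.

    ps_output is expected to contain one container per line as NAME<TAB>STATUS.
    """
    names = []
    for line in ps_output.splitlines():
        if "\t" not in line:
            continue  # skip blank or unexpected lines
        name, status = line.split("\t", 1)
        if status.startswith("Restarting"):
            names.append(name)
    return names


def check_docker() -> list:
    """Ask Docker for all containers and return the ones currently restarting."""
    out = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return restarting_containers(out)
```

An empty list from check_docker() on several consecutive runs is a good sign that no container is stuck in a restart loop.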

Production Level Superset references: