Interactive sunburst plot of the distribution areas of land mammals

8 minute read

I have decided to explore the D3.js for generating plots. This library allows to generate a great variety of plots (see the gallery) and to add interactive features and animations to better explore the data behind the plot. I have a project in mind for future posts where I will need some graphical features as those provided by D3, so I’m starting to explore it and how to integrate the plots with this webpage.

In this post I briefly address how to make a sunburst plot. This is a derivation of a pie chart (a multilevel pie chart) where subdivisions of the data are packed in outer circles. It shows the relative proportions of categories at different levels of classification in a hierarchical structure. This is commonly used, for instance, to show the disk usage in the file system, by building the hierarchy of folders and sub-folders from a root folder. This directly relates to biological taxonomy where species are classified in nested ranks. I’m taking the advantage of species taxonomy to build a sunburst plot of the distribution areas of terrestrial mammals. For that I need the species’ distributions and they can be found at the spatial download section of the IUCN Red List as shapefiles. I need to extract the relevant information from the spatial polygons and build the plot.

But first, the plot:

It has some interactive features: you can use the mouse cursor to get a more detailed label, and click on each slice to zoom a level. By clicking in the center you zoom out to the previous level.

The data

I downloaded the distribution polygons for terrestrial mammals from the IUCN Red List webpage. These are provided in the shapefile format as a series of polygons per species. Each species might have more than one polygon indicating different attributes, such as presence or origin. I don’t filter for any of the attributes, so I’m mixing extant and extinct areas, etc.

The conversion

I’m using python with some modules to import the spatial data, extract the relevant information and build a JSON hierarchy that will be used by D3 to plot the sunburst. I tried to simplify the script as far as I could so it should be easy to follow for everyone starting to code in Python with spatial data. This script has a single purpose here of providing data in a suitable format to produce the plot with D3.js, but it can be useful elsewhere with simple modifications.

I’m using ogr and osr from osgeo module included in the python GDAL package. The ogr provides the necessary infrastructure to open spatial vector data and the osr provides tools to manipulate the spatial reference systems. The json module allows to convert a python dictionary to a JSON string which facilitates the importation later in the javascript code. There are modules with simpler interfaces (e.g. geopandas) that might make this process easier and with fewer lines of codes. I got used to the osgeo that allows a more basal access to the data.

The script opens the vector data in the shapefile and iterates over all available features. It extracts each polygon found in each feature, along with taxonomic information, and projects the original coordinates (geographic) to a Lambert Azimuthal Equal Area centered at polygon’s centroid before extracting the area in km². Note that some features are multipolygons (species with disjoint distributions), thus the area for that feature (species) is the sum of all areas found.

The script constructs the JSON hierarchy from the data collected as a python dictionary. The hierarchy is based on the taxonomic levels (Order, Family, Genus and Species binomial) and each node has a name and a list of children. The leaf nodes (species level) do not have children an store instead the area as value. The code to build the hierarchy is slight more complex than the rest but it basically iterates over the taxonomic levels, adding nodes and children when needed. For example Vulpes vulpes and Vulpes zerda share the same taxonomic hierarchy down to genus level. Thus if one is already added to the hierarchy, when adding the second there is no need to add other nodes in JSON except a leaf node children at Vulpes node.

The JSON is properly formatted at the end of script with the json module. The data is saved as “data.json”

import os
from osgeo import ogr
from osgeo import osr
import json

mammalsFile = "TERRESTRIAL_MAMMALS.shp"

driver = ogr.GetDriverByName('ESRI Shapefile')

dataSource = driver.Open(mammalsFile, 0)
layer = dataSource.GetLayer()
layerDefn = layer.GetLayerDefn()
srcRef = layer.GetSpatialRef()

proj = "+proj=laea +lon_0={} +lat_0={}"

n = layer.GetFeatureCount()

root = {"name":"Mammalia", "children":[]}
for feature in layer:
    print(n, end="      \r")
    taxa = [feature.GetField("order_").capitalize(),
            feature.GetField("family").capitalize(),
            feature.GetField("genus").capitalize(),
            feature.GetField("binomial").capitalize()]

    # Calculate Area
    tgtRef = osr.SpatialReference()
    geom = feature.GetGeometryRef()

    # Get all geometries
    geoms = []
    if geom.GetGeometryName() == 'MULTIPOLYGON':
        for geom_part in geom:
            geoms.append(geom_part.Clone())
    else:
        geoms.append(geom.Clone())

    # Transform geometries and get area in km²
    area = 0
    for g in geoms:
        ctr = g.Centroid()
        ok = tgtRef.ImportFromProj4(proj.format(*ctr.GetPoint()))
        transform = osr.CoordinateTransformation(srcRef, tgtRef)
        ok = g.Transform(transform)
        area += g.GetArea()  / 1000000

    # Create hierarchy
    node = root
    for i in range(len(taxa)):
        found = False
        for child in node["children"]:
            if child["name"] == taxa[i]:
                node = child
                found = True
                break
        if not found:
            if i == len(taxa) - 1:
                node["children"].append({"name":taxa[i], "value":area})
            else:
                node["children"].append({"name":taxa[i], "children":[]})
                node = node["children"][-1]
    n -= 1

outFile = "data.json"
datajson = json.dumps(root) #, indent=1)
with open(outFile, "w") as stream:
    stream.write(datajson)

The plot

I do not enjoy programming in javascript as I do in other languages but maybe this is just related to my lack of experience. However, not liking a programming language doesn’t make it less ubiquitous and a useful language to explore. And javascript seems to be nearly everywhere these days!

This experiment with javascript and D3.js had two purposes: first gain some working knowledge on how to draw with D3 and second, check if it is adequate to a project I have in mind. I did like the very raw access to the svg canvas. The code flow, however, did not seem so easy but this is not the simplest example to start with…

This is basically a copy of the original zoomable sunburst that I adapted. I’m using D3.js v.6 for this example. It starts by defining some canvas proprieties and some needed auxiliary functions. As we only computed areas at species level before we need to compute for all other taxonomic ranks for plotting the sunburst. The area for an higher rank is the sum of the areas of the species within. This, of course, is a problem because many species have sympatric areas, thus the sum of individual areas does not correspond to the real area they occupy together. To consider it, I would have to perform spatial intersections on the python script because the JSON does not have any spatial information to perform geoprocessing tasks. The parsing of JSON data triggers the render function where all magic happens. The root object is the <div id="chart"> I have created in the beginning of the post. It then appends several groups to the <div> with the sunburst, labels and others. It then detects mouse clicks on the sunburst and proceeds with the change to display lower or higher levels with smooth transitions.

const width = 600,
  height = 600,
  radius = Math.min(width, height) / 6;

const color = d3.scaleOrdinal(d3.quantize(d3.interpolateRainbow, 27));

function partition(data) {
  const root = d3.hierarchy(data)
    .sum(d => d.value)
    .sort((a, b) => b.value - a.value);
  return d3.partition()
    .size([2 * Math.PI, root.height + 1])
    (root);
};

const arc = d3.arc()
  .startAngle(d => d.x0)
  .endAngle(d => d.x1)
  .padAngle(d => Math.min((d.x1 - d.x0) / 2, 0.005))
  .padRadius(radius * 1.5)
  .innerRadius(d => d.y0 * radius)
  .outerRadius(d => Math.max(d.y0 * radius, d.y1 * radius - 1));

const format = d3.format(",d");

// Parses json to build and renders
d3.json("https://ptarroso.github.io/assets/resources/MammalsArea.json")
  .then((data) => {
    render(data);
  });

function render(data) {
  const root = partition(data);

  root.each(d => d.current = d);

  const svg = d3.select("#chart").append("svg:svg")
    //.attr('viewBox', [0, 0, width, height])
    .attr("width", width)
    .attr("height", height)
    .style("font", "8px sans-serif");

  const g = svg.append("g")
    .attr("transform", `translate(${width / 2},${height / 2})`);

  const path = g.append("g")
    .selectAll("path")
    .data(root.descendants().slice(1))
    .join("path")
    .attr("fill", d => {
      while (d.depth > 1) d = d.parent;
      return color(d.data.name);
    })
    .attr("fill-opacity", d => arcVisible(d.current) ? (d.children ? 0.6 : 0.4) : 0)
    .attr("d", d => arc(d.current));

  path.filter(d => d.children)
    .style("cursor", "pointer")
    .on("click", clicked)

  path.append("title")
    .text(d => `${d.ancestors().map(d => d.data.name).reverse().join("/")}\n${format(d.value)}`);

  const ctext = g.append("g")
  ctext.append("text")
    .attr("text-anchor", "middle")
    .style("font", "14px sans-serif")
    .style("font-weight", "bold")
    .text(root.current.data.name);

  const label = g.append("g")
    .attr("pointer-events", "none")
    .attr("text-anchor", "middle")
    .attr("user-select", "none")
    .selectAll("text")
    .data(root.descendants().slice(1))
    .join("text")
    .attr("dy", "0.35em")
    .attr("fill-opacity", d => +labelVisible(d.current))
    .attr("transform", d => labelTransform(d.current))
    .text(d => d.data.name);

  const parent = g.append("circle")
    .datum(root)
    .attr("r", radius)
    .attr("fill", "none")
    .attr("pointer-events", "all")
    .on("click", clicked);

  function clicked(event, p) {
    parent.datum(p.parent || root);

    root.each(d => d.target = {
      x0: Math.max(0, Math.min(1, (d.x0 - p.x0) / (p.x1 - p.x0))) * 2 * Math.PI,
      x1: Math.max(0, Math.min(1, (d.x1 - p.x0) / (p.x1 - p.x0))) * 2 * Math.PI,
      y0: Math.max(0, d.y0 - p.depth),
      y1: Math.max(0, d.y1 - p.depth)
    });

    const t = g.transition().duration(750);

    path.transition(t)
      .tween("data", d => {
        const i = d3.interpolate(d.current, d.target);
        return t => d.current = i(t);
      })
      .filter(function(d) {
        return +this.getAttribute("fill-opacity") || arcVisible(d.target);
      })
      .attr("fill-opacity", d => arcVisible(d.target) ? (d.children ? 0.6 : 0.4) : 0)
      .attrTween("d", d => () => arc(d.current));

    label.filter(function(d) {
        return +this.getAttribute("fill-opacity") || labelVisible(d.target);
      }).transition(t)
      .attr("fill-opacity", d => +labelVisible(d.target))
      .attrTween("transform", d => () => labelTransform(d.current));

    ctext.selectAll("text")
      .transition(t)
      .text(p.data.name);
  };

  function arcVisible(d) {
    return d.y1 <= 3 && d.y0 >= 1 && d.x1 > d.x0;
  };

  function labelVisible(d) {
    return d.y1 <= 3 && d.y0 >= 1 && (d.y1 - d.y0) * (d.x1 - d.x0) > 0.03;
  };

  function labelTransform(d) {
    const x = (d.x0 + d.x1) / 2 * 180 / Math.PI;
    const y = (d.y0 + d.y1) / 2 * radius;
    return `rotate(${x - 90}) translate(${y},0) rotate(${x < 180 ? 0 : 180})`;
  };
};

Take-home message: I still have much D3.js to explore…