I have decided to explore D3.js for generating plots. This library can generate a great variety of plots (see the gallery) and add interactive features and animations that help explore the data behind the plot. I have a project in mind for future posts where I will need graphical features like those provided by D3, so I’m starting to explore it and how to integrate the plots with this webpage.
In this post I briefly address how to make a sunburst plot. This is a variant of the pie chart (a multilevel pie chart) where subdivisions of the data are packed in outer rings. It shows the relative proportions of categories at different levels of a hierarchical classification. It is commonly used, for instance, to show disk usage in a file system, by building the hierarchy of folders and sub-folders from a root folder. The same idea maps directly onto biological taxonomy, where species are classified in nested ranks. I’m taking advantage of species taxonomy to build a sunburst plot of the distribution areas of terrestrial mammals. For that I need the species’ distributions, which are available as shapefiles in the spatial download section of the IUCN Red List. I then need to extract the relevant information from the spatial polygons and build the plot.
But first, the plot:
It has some interactive features: hover the mouse cursor over a slice to get a more detailed label, and click on a slice to zoom in one level. Clicking in the center zooms out to the previous level.
I downloaded the distribution polygons for terrestrial mammals from the IUCN Red List webpage. These are provided in the shapefile format as a series of polygons per species. Each species might have more than one polygon indicating different attributes, such as presence or origin. I don’t filter for any of the attributes, so I’m mixing extant and extinct areas, etc.
I’m using Python with some modules to import the spatial data, extract the relevant information and build a JSON hierarchy that D3 will use to plot the sunburst. I tried to simplify the script as far as I could, so it should be easy to follow for anyone starting to code in Python with spatial data. Here the script has the single purpose of providing data in a suitable format for the D3.js plot, but with simple modifications it can be useful elsewhere.
The script opens the vector data in the shapefile and iterates over all available features. It extracts each polygon found in each feature, along with the taxonomic information, and projects the original geographic coordinates to a Lambert Azimuthal Equal Area projection centered at the polygon’s centroid before computing the area in km². Note that some features are multipolygons (species with disjoint distributions), so the area for that feature (species) is the sum of the areas of all its polygons.
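To make the area step concrete, here is a minimal sketch in pure Python. It is not the script used for this post, which relies on spatial modules: it assumes a spherical Earth, uses the mean of the vertices as a stand-in for the true centroid, and ignores holes and polygons crossing the antimeridian. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumption: spherical Earth with the mean radius


def laea_project(lon, lat, lon0, lat0, radius=EARTH_RADIUS_KM):
    """Spherical Lambert Azimuthal Equal Area projection centered at (lon0, lat0)."""
    lam, phi = math.radians(lon), math.radians(lat)
    lam0, phi0 = math.radians(lon0), math.radians(lat0)
    cos_c = (math.sin(phi0) * math.sin(phi)
             + math.cos(phi0) * math.cos(phi) * math.cos(lam - lam0))
    k = math.sqrt(2.0 / (1.0 + cos_c))
    x = radius * k * math.cos(phi) * math.sin(lam - lam0)
    y = radius * k * (math.cos(phi0) * math.sin(phi)
                      - math.sin(phi0) * math.cos(phi) * math.cos(lam - lam0))
    return x, y


def ring_area_km2(ring):
    """Area of one polygon ring given as a list of (lon, lat) pairs in degrees."""
    # Assumption: the vertex mean approximates the centroid well enough
    # to serve as the projection center.
    lon0 = sum(lon for lon, _ in ring) / len(ring)
    lat0 = sum(lat for _, lat in ring) / len(ring)
    pts = [laea_project(lon, lat, lon0, lat0) for lon, lat in ring]
    # Shoelace formula on the projected (planar) coordinates.
    total = 0.0
    for i, (x1, y1) in enumerate(pts):
        x2, y2 = pts[(i + 1) % len(pts)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def feature_area_km2(polygons):
    """A multipolygon feature is the sum of the areas of its polygons."""
    return sum(ring_area_km2(ring) for ring in polygons)
```

Each polygon gets its own projection center, so distortion stays low even for species with widely separated ranges. A geodesic or ellipsoidal calculation from a GIS library would be more accurate than this spherical approximation.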
The script constructs the JSON hierarchy from the collected data as a Python dictionary. The hierarchy is based on the taxonomic levels (Order, Family, Genus and Species binomial) and each node has a name and a list of children. The leaf nodes (species level) do not have children and instead store the area as their value. The code to build the hierarchy is slightly more complex than the rest, but it basically iterates over the taxonomic levels, adding nodes and children when needed. For example, Vulpes vulpes and Vulpes zerda share the same taxonomic hierarchy down to genus level. If one is already in the hierarchy, adding the second only requires a new leaf node among the children of the Vulpes node.
At the end of the script, the JSON is properly formatted with the json module and saved as “data.json”.
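The insertion logic can be sketched as below. This is an illustration under assumptions, not the post's script: the function name `add_species`, the root name "Mammalia" and the key names `name`, `children` and `value` are my own choices, and the two areas are placeholders rather than real measurements.

```python
def add_species(root, ranks, area_km2):
    """Insert one species into the hierarchy.

    ranks is (order, family, genus, binomial); intermediate nodes are
    created only when they do not exist yet.
    """
    node = root
    for name in ranks[:-1]:
        # Reuse the child with this name if it is already in the tree.
        child = next((c for c in node["children"] if c["name"] == name), None)
        if child is None:
            child = {"name": name, "children": []}
            node["children"].append(child)
        node = child
    # Leaf node: no children, the area is stored as the value.
    node["children"].append({"name": ranks[-1], "value": area_km2})


# Hypothetical root node and placeholder areas, for illustration only.
tree = {"name": "Mammalia", "children": []}
add_species(tree, ("Carnivora", "Canidae", "Vulpes", "Vulpes vulpes"), 1.0)
add_species(tree, ("Carnivora", "Canidae", "Vulpes", "Vulpes zerda"), 2.0)
```

After the two calls the tree holds a single Carnivora > Canidae > Vulpes branch with two leaves. The linear search over children is fine at this scale; a dictionary keyed by name would be faster for very large trees.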
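Writing the file takes only a few lines with the standard json module. The sketch below assumes the hierarchy is a nested dictionary; the helper names and the indentation width are my own choices, and only the file name "data.json" comes from the script.

```python
import json


def hierarchy_to_json(tree):
    """Serialize the nested dictionary as indented JSON text."""
    # ensure_ascii=False keeps any non-ASCII characters in names readable.
    return json.dumps(tree, indent=2, ensure_ascii=False)


def save_hierarchy(tree, path="data.json"):
    """Write the hierarchy to disk as UTF-8 encoded JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(hierarchy_to_json(tree))
```

Calling `save_hierarchy(tree)` produces the file that the D3 code later loads.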
This is basically a copy of the original zoomable sunburst that I adapted. I’m using D3.js v6 for this example. It starts by defining some canvas properties and a few auxiliary functions.
As we only computed areas at the species level, we still need to compute them for all the other taxonomic ranks before plotting the sunburst. The area for a higher rank is the sum of the areas of the species within it. This, of course, is a problem because many species are sympatric and their ranges overlap, so the sum of the individual areas does not correspond to the real area they occupy together. To account for this, I would have to perform spatial intersections in the Python script, because the JSON does not carry any spatial information for geoprocessing tasks.
The parsing of the JSON data triggers the render function, where all the magic happens. The root object is the <div id="chart"> I created at the beginning of the post. The function appends several groups to the <div>, holding the sunburst, the labels and other elements. It then listens for mouse clicks on the sunburst and switches the display to lower or higher levels with smooth transitions.
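The plot does the summing of species areas into higher ranks on the JavaScript side, but the logic is easy to show in Python, the language of the data script. The sketch below assumes nodes with `name`, `children` and `value` keys. The function name is hypothetical and the areas are placeholders.

```python
def total_value(node):
    """Own value (if any) plus the totals of all descendants."""
    own = node.get("value", 0.0)
    return own + sum(total_value(child) for child in node.get("children", []))


# Placeholder areas, for illustration only.
vulpes = {
    "name": "Vulpes",
    "children": [
        {"name": "Vulpes vulpes", "value": 1.0},
        {"name": "Vulpes zerda", "value": 2.0},
    ],
}
genus_total = total_value(vulpes)
```

Here the genus total is simply 1.0 + 2.0. Any area shared by the two species is counted twice, which is exactly the overlap problem with sympatric species.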
Take-home message: I still have much D3.js to explore…