## Tutorial: Circular Sankey Diagram in Tableau

So you are probably tired of regular old sankey diagrams. What If I told you there is a way to take your Sankey’s to the next level by combining it with something everyone loves: circles.

Before I start the tutorial, I want to provide a bit of code to prep the data (and a link to the data itself it you don’t care about the prep. I used superstore data and summarized it to show the total number of products ordered by customer. Also you may want to go visit my previous tutorial on chord diagrams because some of the same principles apply and I am only going to reference them.

library(tidyverse) library(magrittr) product <- rio::import("/2018.3/en_US-US/Sample - Superstore.xls") %>% as_tibble() product %>% group_by(`Customer Name`, `Product Name`) %>% summarise(Quantity = sum(Quantity)) %>% left_join( product %>% group_by(`Customer Name`) %>% summarise(cust_quantity = sum(Quantity)) %>% ungroup() %>% arrange(desc(cust_quantity)) %>% mutate(cust_rank = row_number(-cust_quantity)) ) %>% left_join( product %>% group_by(`Product Name`, Category) %>% summarise(prod_quantity = sum(Quantity)) %>% ungroup() %>% arrange(desc(prod_quantity)) %>% mutate(prod_rank = row_number(-prod_quantity)) %>% group_by(Category) %>% mutate( prod_cat_rank = row_number(-prod_quantity), tmp1 = max(ifelse(Category == "Office Supplies", prod_cat_rank, 0)), tmp2 = max(ifelse(Category == "Furniture", prod_cat_rank, 0)) ) %>% ungroup() %>% mutate( tmp1 = max(tmp1), tmp2 = max(tmp2), prod_cat_rank = ifelse(Category == "Office Supplies", 0, tmp1) + prod_cat_rank, prod_cat_rank = ifelse(Category == "Technology", tmp2, 0) + prod_cat_rank, tmp1 = NULL, tmp2 = NULL, one = 1 ) ) %>% write_csv("/cust_prod_mapping.csv") data_frame( one = 1, t = c(seq(0,1, by = .01), seq(2,7)) ) %>% write_csv("/generic_data_model.csv")

Here is what the data will look like:

# A tibble: 9,941 x 10 # Groups: Customer Name [?] `Customer Name` `Product Name` Quantity cust_quantity cust_rank Category prod_quantity prod_rank prod_cat_rank one1 Aaron Bergman Akro Stacking Bins 2 13 752 Office Su… 13 1295 777 1 2 Aaron Bergman "Carina 42\"Hx23 3/4\"W … 3 13 752 Office Su… 22 691 441 1 3 Aaron Bergman Global Push Button Manag… 1 13 752 Furniture 44 62 1070 1 4 Aaron Bergman Newell 330 3 13 752 Office Su… 11 1472 866 1 5 Aaron Bergman Samsung Convoy 3 2 13 752 Technology 9 1576 1759 1 6 Aaron Bergman Sauder Facets Collection… 2 13 752 Furniture 10 1529 1387 1 7 Aaron Hawkins "ACCOHIDE 3-Ring Binder,… 3 54 266 Office Su… 44 60 42 1 8 Aaron Hawkins Avery 51 3 54 266 Office Su… 16 1081 649 1 9 Aaron Hawkins Avery Fluorescent Highli… 7 54 266 Office Su… 24 577 369 1 10 Aaron Hawkins EcoTones Memo Sheets 2 54 266 Office Su… 15 1165 705 1 # ... with 9,931 more rows

The most important thing is that you have two sets of ranks: a source rank (customer) and a destination rank (product).

In order to make the Sankey we need to do some basic data modeling, so here’s my data model:

data_frame( one = 1, t = c(seq(0,1, by = .01), seq(2,3)) ) %>% write_csv("/generic_data_model.csv")

The data created has two columns, the first column, labeled one, is a column of the integer 1. We’ll use it for joining to the other dataset. The other column, t, counts up from 0 to 1 by .01, and then has additional values of 2 and 3.

Let’s join the two datasets together

`[separator]`

.// Separator IF [T] <= 1 THEN "chord" ELSEIF [T] = 2 THEN "start point" ELSEIF [T] = 3 THEN "end point" END

`[Select Rank]`

, it’s an integer that run from 0 to 400. Then we have this calculation:[Prod Rank] <= [Rank Select] AND [Cust Rank] <= [Rank Select]

We can put it on filters right away. Now let’s make our Circular Sankey Diagram.

First the setup. This formula will allow us to make the pretty curved lines everyone gets excited about.

// t (logodds) IF [T] > 0 AND [T] < 1 THEN LOG([T]/(1-[T])) END

`[path]`

and the other `[distance]`

. **We’re going to come back to this path formula and make updates so this isn’t final.**

//path (to be updated later) IF [T] <= 1 THEN (([Prod Rank] - [Cust Rank])*[T]) + [Cust Rank] ELSEIF [T] = 2 THEN [Cust Rank] ELSEIF [T] = 3 THEN [Prod Rank] END/[Select Rank]

//distance IF [T] <= 1 THEN 2.5 -((1/(1 + exp([2a. t (logodds)])))) * 2.5 ELSEIF [T] = 2 THEN .23 ELSEIF [T] = 3 THEN 2.3 END

This produces the following sankey diagram. This is nice and all, but I want **CIRCLES**!

`[path]`

is basically our angle. And `[distance]`

is our radius. So we can throw these formulas into a basic formula for a circle//x ([distance] + 1) * COS([path] * 2 * pi())

//y ([distance] + 1) * SIN([path] * 2 * pi())

`[size]`

calculation for sizing the circles and the chords.// size IF CONTAINS([1b. separator], "bar") THEN 20 ELSEIF CONTAINS([1b. separator], "start point") THEN {FIXED [Cust Rank]: SUM(IF [T] = .5 then [Cust Quantity] end)} * 3 ELSEIF CONTAINS([1b. separator], "end point") THEN {FIXED [Prod Rank]: SUM(IF [T] = .5 THEN [Prod Quantity] end)} * 3 ELSEIF [1b. separator] = "chord" THEN 100 ELSE 5 END

With this, its time to build!

Create a line chart. Add `[customer name]`

,

`[separator]`

, `[number of records]`

and `[customer rank]`

all to detail as dimensions. Add `[y]`

to rows and `[x]`

to columns as continuous values add `[t]`

to path and `[size]`

to size.Now let’s class it up. Let’s create a set action on customer rank. Create a set with customer rank, add it to color, make it an action. This will allow you to select a customer and see the products the customer has purchased. Its a nice touch, but if you are like me the one thing that’s bothering me is the “bite” in the chart and the fact that the path from customer to product isn’t taking the most efficient route.

First we need to identify the cases at issue. In some cases we have a path that goes counter-clockwise but it should be going clockwise via a shorter path. The second issue is the the opposite, your classic visa-versa!

So how do we idenifty these cases — its actually really easy. If

`([cust rank] - [prod rank])/[rank select]`

is greater than .5 or `([cust rank] - [prod rank])/[rank select]`

is less than -.5 then the paths should be flipped.Lets first indentify these cases with a

`[Rank Delta]`

calculation: //Rank Delta (([Prod Rank]/[1a. Rank Select]) - ([Cust Rank]/[1a. Rank Select]))

`[Rank Select]`

values to the product ranks. This is the beauty of circle, to change directions we just need to add or subtract the overall total.//path IF [T] <= 1 THEN IF [Rank > 50%] < -.5 THEN ((([Prod Rank] + [1a. Rank Select]) - [Cust Rank])*[T]) + [Cust Rank] ELSEIF [Rank > 50%] > .5 THEN ((([Prod Rank] - [1a. Rank Select]) - [Cust Rank])*[T]) + [Cust Rank] ELSE (([Prod Rank] - [Cust Rank])*[T]) + [Cust Rank] END ELSEIF [T] = 2 THEN [Cust Rank] ELSEIF [T] = 3 THEN [Prod Rank] END/[Select Rank Parameter]

`TRUE`

.//filter duplicated dots IF ([T] = 2 AND [Prod Rank] = {FIXED [Cust Rank] : MIN([Prod Rank])}) THEN TRUE ELSEIF (NOT [Cust Rank | Set] AND [T] = 3 AND [Cust Rank] != {FIXED [Prod Rank] : MIN([Cust Rank])}) THEN TRUE ELSEIF [1b. separator] = "chord" THEN TRUE ELSEIF [1b. separator] = "end point" AND [Cust Rank | Set] THEN TRUE END