Edward Tufte is a avid proponent of small multiple and sparkline graphs. While sparklines, which Tufte invented, are often in the data viz spotlight, small multiples might be the more versatile and helpful tool, because they excel at showing how a single measure, or quantitative variable, varies over multiple dimensions, or categories.

An excellent example is a recent Washington Post graphic on euro zone unemployment rates. Before looking at the Post's graph, consider how you might show the unemployment rate between 2007 and 2013 for 17 countries and 2 age groups (1 measure over 34 discrete dimensions).

My first thought was a line graph, coloring the lines by country. In the graph below, the main trend—rising unemployment rates for most countries, except a handful of northern European countries—is clear. Still, it's difficult to distinguish 17 lines in 17 different colors. The colors and overlap create a visual burden. Moreover, this graph wouldn't work at all for a color blind viewer or if it was printed in gray scale.


Now look at the Post's graph or my version of it below. By using multiple, small graphs, one for each country and age group, the country-specific trends are obvious. A few words and a light gray benchmark line for the US unemployment rate provide a compelling narrative about prospering and lagging countries in the euro zone. It also shows a sobering picture of youth unemployment.

 Small multiple graphs have several key design elements:

  • A consistent—and often a single—quantitative axis for all charts. A single axis connects the charts and reduces the need for viewers to ask, "Is this the same axis?" In the above example, the vertical axis for unemployment rate is shared and because of the labels, only the starting 0% point is necessary.Small multiple graphs have several key design elements:
  • A consistent design across all graphs.
  • A meaningful sequence. The Post arranged the graphs from lowest to highest current unemployment rates. This creates an implicit narrative for the graph, and makes it easier to compare and characterize countries.

Below are the steps I used to create this graph in Tableau.

The data

Getting the data can often be the most time-consuming part of creating a viz. Here it was somewhat easy, because the data was readily available in Excel format, but it still needed several steps to get it ready for Tableau.

Finding the data

Googling for "European unemployment statistics" leads to a nice write-up on the Eurostat web site.

After reading the intro and background, you'll find a link to the "data collected in the European Union Labor force survey (EU LFS), which is a continuous household survey carried out in all Member States in accordance with European legislation and on the basis of harmonized definitions." See the labour market section at
http://epp.eurostat.ec.europa.eu/portal/page/portal/euroindicators/data/database.

Conveniently, the data includes US employment numbers. The Washington Post points out that the US youth age group starts at age 16, where the EU data starts at age 15.

Get all the data now

While I only need two sets of data to reproduce the Post's analysis, I'm going to grab all nine combinations of age (all, less than 25 years, greater than 25 years) and gender (all, male, female). Perhaps the initial analysis will generate a follow-up question, and if I have the data it might just take a couple of clicks in Tableau to answer it.

Cleaning the data

Transform from wide to tall

The data includes the unemployment measure for four dimensions: country, date, gender, and age group. Like most human readable data, it's presented in a crosstab, with a row for each country and column for each date. Separate Excel worksheets segment the age and gender groups. Crosstabs are sometimes called "wide" data. They're also called "pivoted" data after the Microsoft Excel PivotTable feature that creates crosstabs.

Wide, crosstab data for human consumption

Tall data is required, or at least preferred, for most analysis software, including Tableau. Tall data has a single column for each dimension and a row for each member of the dimension's domain. Since this data set has 38 rows (countries) and 367 columns (monthly data from January 1983 to July 2013), the consolidated, tall version will have just two dimension columns for country and date, but 13,946 rows (38*367) for each of the nine members of the age and gender dimensions. Obviously not human-friendly, but perfect for Tableau, R, or Python.

Tall data for Tableau consumption

I used Tableau's Excel plugin to reshape the data from wide to tall. This is a manual tool, but perfect for one-time tasks. If I were updating the data on a regular basis, I could automate the reshaping with SQL or Python.

Finally, I added two columns, for age group and gender, and manually updated the values for these columns.

Download the Eurostat data in CSV format to get tall formatted data directly.

Other data transformations

  • Remove null rows (optional). Eurostat used ":" to indicate the unemployment rate is not available. There are two options: remove these null rows or replace the null character with an empty cell. Removing rows improves performance and simplifies some of the calculations. For example, if there are no null values, IF LAST() == 0 will reliably return the latest unemployment rate value. The main argument for keeping the null values is to break a line graph at points where there is no data. This might be helpful to explore data availability, and I'll keep the null values and filter them when opening the data source in Tableau.
  • Convert unemployment measures to percentages. Eurostat provides the unemployment rate as a percentage; for example, 25% is represented as 25 (rather than 0.25). I converted these back to decimal values, since Tableau does a good job of formatting decimal percentages with a percentage sign.
  • Dates. To get Tableau—and most other programs—to recognize the dates, I converted the year-month format to year-month-day. For example 1998M06 became 1998-06-30. While I could do this in Tableau, it's easy in Excel and I prefer to clean the data as much as possible before importing.

What I didn't change

The country names are all in good shape, except for Germany, which is listed as "Germany (until 1990 former territory of the FRG)." The regions, such as "Euro area (16 countries)" may be overly descriptive. While I could truncate these in Excel, it's just as easy to do it in Tableau and it's nice to have these notes in the data.

Final data format

The final Excel sheet looks like this. You can download it here: http://data.decidingwithdata.com/UnemploymentData.zip

Creating the graph in Tableau

Connecting to the data source

To create a new Tableau project, click Connect to data, select the Excel file created above, and then choose Import some data. This creates a Tableau Data Extract—Tableau's proprietary high-performance data store—with filter options to exclude rows with null unemployment rates.

After selecting Import Some Data, click Add... to add a filter, double-click Unemployment Rate and then select the blank value and click Exclude. (To explore the null values or to create a viz that requires null values, just create a second data connection.)


Checking measures and dimensions

After opening the data source in Tableau, check that Tableau accurately identified the dimensions and measures. I had to make a few corrections.

  • Unemployment Rate. Tableau identified the Unemployment Rate field as a string and, therefore, made it a dimension. To fix this

1. Right-click the Unemployment Rate field and select Change Data Type > Number. Right-click again and select Default Properties > Number Format > Percentage with one decimal point.

2. Move it to the Measures pane in the left Data window.

3. Right-click the measure again and change Default Properties > Aggregation to Average, since when there are multiple rates reported, we want to average them instead of sum them.

  • Date. Right-click Date and change Data Type to Date instead of Date/Time. This will simplify some of the dialog boxes when we do the fine-tuning below.
  • Germany. Right-click Country / Region and select > Default Properties > Aliases, and truncate "Germany (until 1990 former territory of the FRG)" to "Germany".

When you have multiple, similar measures, you can Ctrl-click to select them all and change the formatting simultaneously.

Initial setup

Below are a couple of setup tasks that are helpful regardless of the visualization technique.

Euro zone countries set

The data includes 33 countries and 5 region categories (variations on EU and euro zone). Since the goal is to look at just the 17 countries in the euro zone, I created a set in Tableau with just these countries: Austria, Belgium, Cyprus, Estonia, Finland, France, Germany, Greece, Ireland, Italy, Luxembourg, Malta, Netherlands, Portugal, Slovakia, Slovenia, and Spain.

I also included the United States, since the US is used as a benchmark in the graphs. To create the Euro Zone and US set, right-click Country / Region > Create Set and manually add these countries. Now you use this set instead of the Countries / Region set in the view. For example, you can drag the set to the Color card to add the colored lines, instead of dragging Countries / Region and filtering for the euro zone.

Global date filter

Since I'm focusing on unemployment rates between January 2007 to June 2013, I'll create a global date filter. Drag Date to the Filter shelf, select the date range, and click OK. Then right-click the green Date pill on the Filter shelf and select Apply To Worksheets > All Using This Data Source. The database symbol next to the date filter indicates a global, data source filter.

Small multiples

Small multiple graphs are also called trellis graphs, because they resemble the trellis grid on a window or other architectural element, but I like to say window, since Tableau uses "pane" to describe different discrete row or column dimensions. Therefore, our goal is to create panes for each country and age group and show the unemployment measure over time in each pane.

This is easy in Tableau:

  • Drag Country / Region and Date to the Columns shelf—when adding Date hold down the right mouse button and select the bottom, continuous QUARTER(Date). Since we have small graphs, we don't really need monthly data, and reducing the number of marks by two thirds will make the graphs more responsive, especially when publishing to Tableau Server or Tableau Public.
  • Drag the Euro Zone and US set to the Filter shelf.
  • Drag Age and Unemployment Rate to Rows shelf. Right-click the Over 25 years row and select Exclude.

Notice the pill colors. Country / Region and Age Group are blue, while QUARTER(Date) and AVG(Unemployment Rate) are green. Blue pills are discrete and create panes: in this case, two rows for age group and columns for each country. Green pills are continuous and create axes: the vertical axis for Unemployment Rate and horizontal axis for Date.

Adding a benchmark line

Looking at the charts next to each other is interesting, but it's difficult to compare countries separated by more than one or two columns. We'll follow the Post's example and add the US employment rate to each graph to provide both a reference point for comparison and context to help readers in the States.

Create a new calculated field Unemployment Rate US =

After entering this calculation, right-click and select Default Properties > Aggregation > Average.

Drag this to the view, next to Unemployment Rate. Again, since this is a continuous, green pill, it'll create a new horizontal axis. There's a problem, however: the line only appears in the US column, because Tableau is segmenting the data by each dimension. In the Spain column, for example, only the rows for Spain are available and the Unemployment Rate US calculation, therefore, returns null for all of these rows.

If you're used to SQL-based reporting tools, your first thought might be to add the benchmark as a new column in the data, which will make the data available to all segments. But Tableau's WINDOW_xxx() functions provide an alternative by enabling an aggregate of an aggregate calculation—here an average of the average unemployment rate for each country. Create a new calculated field Unemployment Rate US (TC) =

After entering the formula, click the blue Default Table Calculation in the upper right corner and set compute using to Country / Region. Now an average will be calculated for each dimension—the AVG(...) part of the formula—but these values will be averaged over the compute-using (also called addressing) window, in this case all of the countries.

Replace Unemployment Rate US pill with this new table calculation field. Now there is a benchmark line for every country.

To get both lines on the same graph in different colors, replace the two pills on the Rows shelf with Measure Values and put Measure Names on the Color card. I prefer the Post's gray scale color palette. Change the colors by clicking the pull-down menu in the color legend and selecting Edit Colors.

You may also want to reduce the thickness of the lines by clicking Size and moving the slider slightly to the left.

Finally, hide the column for the United States by right-clicking the column and selecting Hide, which will remove it from the visual layer but—unlike the Exclude option—it keeps the data in the worksheet for use by the other fields.

Sort Order

The sequence of the graphs in a small multiple design is essential to reducing the visual burden for the viewer. By default, the countries are sorted alphabetically. Sorting them from lowest to highest unemployment makes it easier on the reader and provides a nice visual narrative.

There are two ways to sort in Tableau:

  1. Specify the sort order in the dimension. This is the easiest approach, but you cannot sort by fields that include table calculations.
  2. Add a blue, discrete pill just left of the dimension you want to sort, which creates a new set of panes sorted from lowest to highest.

I'll show both below, but I used the second method, since I want to sort by the last unemployment value in the view, which requires the LAST() table calculation function.

Specifying sort order in dimension

Dimensions such as Country / Region can be sorted by the value of another field, such as the maximum unemployment rate. Right-click the blue pill Country / Region and select Sort and choose Ascending sort order and Sort by field: Unemployment Rate (Maximum).

Below is the result and shows the problem. While Italy has a higher current unemployment rate than Estonia, Estonia had a higher rate in 2010. The MAX() function returns this higher value. 

Sorting by a table calculation

When specifying the sort order above, you may have noticed that the fields with table calculations, such as Unemployment Rate Last, were not listed. Tableau doesn't support sorting by a field with a table calculation. But there is a workaround. By putting a discrete (blue pill) copy of the table calculation field just to the left of the dimension you want to sort, Tableau will create a new set of panes for this dimension, sorted from lowest to highest.

Before dragging Unemployment Rate Last to the Columns shelf, however, remember that we had to use a WINDOW_xxx() function above to get the US unemployment rate line for each column. We need to do something similar here, since Unemployment Rate Last will return null for every date, except the last. To fix this, create a new field Unemployment Rate Sort =

When adding this, click the Default Compute Using and move both Age and Date over to the addressing side. Now it will find the max value for each Country / Region over all age groups and dates. Click OK and then right-click this field and select Convert to Discrete.

Now add it to the view, just to the left of Country / Region. The pill will be red because Date is not in the view. Click the pill, select Edit Table Calculation > Advanced > Compute Using and move QUARTER of Date over to the addressing side, underneath Age. Click OK and the columns should now be sorted correctly.

Except for the header, the second set of panes is not conspicuous, because the size of this dimension is equivalent to the dimension just to the right. And we can hide the header by clicking Unemployment Rate Sort and deselecting Show Header.

Now we're getting close to the Post's graph, at least it has all of the components now.

Adding labels

Notice that the Post's graphic adds a circle and label for the latest unemployment value for each country. For most viewers, this is probably the most interesting piece of data. The labels also reduce the need for the y-axis labels and grid lines, which is particularly important in this wide graph.

Adding end points and labels would be easy if the graph only had one line—just click the Label card and select Show mark labels and Label end of line. But I have two lines and I don't want to label the gray US line. The workaround is to create a second graph with just the circle and label and combine this with the original graph using Tableau's dual axis feature.

We only want one mark per pane: Unemployment Rate Last =

When entering this formula, select Compute using > Date. Drag the new field to the Rows shelf. On the Marks card for this field, change the mark type to Circle. Click Color and select black for the circles and white for the border. Click Size to increase the size slightly. Finally, click Label and select Show mark labels and set the Alignment to Top Right.

Next, add the circles to the primary graph by right-clicking the Unemployment Rate Last pill and selecting Dual Axis. Finally, right-click the right axis and select Synchronize Axis and then right-click again and deselect Show Header. You may also need to reset the colors by clicking the color legend.

Finishing details

Getting this graph in Tableau takes maybe thirty minutes, which is one of the strengths of Tableau. You can and should try many variations on a theme before working on the fine details. These details may require more time than the initial draft, but can make a big difference in the final product. Below are the changes I made, organized by benefit over cost measured in terms of time and complexity.

  • Change font default to 9pt Segoe UI. For screen displays, I prefer 9pt Segoe UI to Tableau's default 8pt Arial. The easiest way to change the default font is from the top menu bar select Format > Font and on the left pane, select Sheet > Default > Worksheet. 

  • Change Country / Region header font to 9pt Segoe UI Semibold. While I'd like a larger font, this graph has a horizontal constraint, and I'll settle for semibold. Right-click one of the countries in the header and select Format > Header.
  • Hide the Age header. This takes up valuable horizontal space, and it'll work better to put the labels in the graph. Right-click the Age pill and deselect Show Header.
  • Add annotations for title and age groups. I placed these in the upper left corner of each graph. I also made the boxes very wide with white fill to hide the left part of the horizontal grid lines.
  • Remove the y-axis title. Given the graph's title of "Unemployment Rate" and the fact there is only one measure on the view, we can safely remove the label from the y-axis. Click the y-axis > Edit Axis and delete the contents in the Title box. In general, vertical labels are difficult to read, and I usually remove them or replace them with a horizontal label on the highest value.
  • Edit y-axis tick-mark format. If the y-axis tick marks are not in percentiles, click the right axis, select Format > Axis > Scale and change Numbers to percentage with no decimal point.
  • Edit y-axis range. To provide more white space between the rows and for the highest labels, right-click the y-axis and select Edit Axis > General and select Fixed Axis with Start and End values of 0 and 0.75, respectively.
  • Optional: Hide y-axis. The y-axis is not very effective in wide graphs, nor is it necessary, since we've labeled the end point of each graph. Instead, I added a 0% mark using a floating text object on the Tableau dashboard (described below under Publishing).
  • Remove Country / Region field label. Right-click "Country / Region" label on top of the viz and select Hide Field Labels for Columns.
  • Remove the x-axis title. Again, it's clear the x-axis shows the date.
  • Edit x-axis date format. Click the pull-down menu on the Date pill, select Format > Axis > Date format and select Custom: 'YY to generate '07, '08, ... Next, right-click the x-axis and select Edit Axis > Tick Marks and set the Major tick mark to Fixed, Every 6 Years, and Tick Origin of 2007-01-01. Now the axis will have a tick mark at '07 and '13.
  • Edit x-axis range. To get additional white space between the columns, right-click the x-axis > Edit Axis > General and select a Fixed Axis with Start and End dates of 2006-07-01 and 2014-07-01, respectively. As with the tip above, this only works if the date range is static for this view.
  • Remove borders around panes and vertical grid lines. The column dividers are not necessary given the vertical grid lines, and my preference was to remove the vertical grid lines for a cleaner look. Click Format > Borders > Columns and, at the bottom Column Divider section, select None for both Pane and Header. To remove the grid lines, click Format > Lines > Columns > Lines and select None for Grid Lines.
  • Reduce column width. All of the columns must be visible for efficient comparison (no horizontal scroll bars). I manually adjusted the width so that the titles for all countries (except the Netherlands and Luxembourg) fit on one line.
  • Hyphenate Netherlands and Luxembourg. Tableau doesn't hyphenate, but we can manually add a hyphen in a new Country / Region field. I could also edit the alias of Country / Region, but I want unhyphenated names for tool tips, etc. On the Columns shelf, replace the current Country / Region pill with Country / Region (hyphenated)
  • Post-hyphenation updates. After replacing the Country / Region with the hyphenated field, Unemployment Rate US (TC) will turn red; fix this by right-clicking and selecting Compute Using > Country / Region (hyphenated). You'll also need to right-click the new headers and select Formatting > 9pt Segoe UI Semibold and Alignment > Wrap > On. Finally, adjust the height of the header row to enable wrapping. 

  • Move mark labels. For countries with unemployment rates below the U.S. rate, the gray U.S. reference line obscures the label. Click these labels and drag them above the U.S. line.

  • Tooltips. I'm using minimal tooltips with just the date and unemployment rate, which simplifies the design and prevents the tooltips from getting in the way of each other. There is an issue, however: when adding QUARTER(Date) to the tooltip, I get '07 instead of 2007 Q2. This is a result of my x-axis formatting. To fix this, I drag Date to the Tooltip card and then right-click the pill > Measure (Minimum). Right-click again > Format > Pane and select 2001 Q1. Now edit the Tooltip with this new field: <MIN(Date)>: <Measure Values>.
  • Annotate mark with U.S. rate. Rather than add a legend to show that the gray line represents the U.S. unemployment rate, I added a gray text annotation. This is cleaner, and allows me to also show the U.S. rate. You could manually add a text annotation, but Tableau can annotate a mark using fields in the worksheet. Hover over the last mark on the gray U.S. Unemployment line where you want the label. Right-click the mark > Annotate > Mark. Edit the annotation—it may take a couple of tries to get the correct fields—and format to 9pt Segoe UI Bold. After adding the annotation, I removed the arrow by right-clicking the annotation > Format and selecting None for both Line and Line End.

  • Highlight Spain in orange. The Post's original graph highlighted Spain, which was the focus of an accompanying article. To do this,
    1. Create a new Boolean calculated field Spain? = 

    2. Ctrl-click both this new field and Measure Names and drag them to the Color card on the All marks cards. 
    3. You'll notice that the gray US benchmark line disappears for Spain. Fix this by clicking Unemployment Rate US (TC) in the Measure Values shelf and selecting Edit Table Calculation > Compute Using > Advanced and move Spain? to the right, Addressing box.
    4. Lastly, click the color legend to reset the various color combinations.


After making the above changes, we have our final graph:

Publishing

To be complete, the graph needs a title and footer with the data source information and, in this case, attribution to the original Post graphic.
Create a new dashboard and drop the final worksheet. I almost always fix the size of the dashboard to prevent browser sizing issues. Here I made the dashboard 876 pixels wide—as narrow as possible without causing the country names to wrap.

I also took advantage of Tableau's floating text objects to add 0% labels on the y-axis. Hold the Shift key and drag a Text object to the dashboard view.

Interactivity and click for details

Other than the tooltips, the electronic version of this graph is identical to the static version. Generally, this is a good design strategy—for most audiences, a data viz should not require interactivity.

We can take advantage of the fact we're presenting this over the web instead of in a newspaper, however. One simple idea is to allow users to investigate different time periods. Another is to explore countries in greater detail.

I'll leave these for another day. 

Link to this page
  • No labels

1 Comment

  1. Anonymous

    Just discovered your blog, hope to see more viz from you in the future. You are doing great!