How to Restructure Data in Alteryx: Files With Two Column Headers
As analysts and developers, we don’t always have control over the data we receive and how it is formatted. We still need to do our best to deliver what we can with what we get. Recently I was given the task of creating a Tableau dashboard off of a file that contained column headers split out into the first two rows. The first row of column headers contained a date field, and the second row of headers contained product hierarchy and measure name information. This is definitely not how reporting tools, such as Tableau or PowerBI would expect to see the information, so it’s going to take some Alteryx work to get where it needs to be.
1. Input the Two Header File
For better or for worse, we work with what we’ve got. Let’s input it into Alteryx and restructure our files so they can work their best in Tableau.
For the sake of this demonstration, I’ve created a sample file out of superstore data that emulates the structure of common files I’ve seen that contain two rows of headers. It looks a little something like this:
- The first row of headers: timeframe of the metric results: January 2020, February 2020, etc.
- The second row of headers: name of the dimension in the product hierarchy or the name of a measure within the timeframe from the first row.
The first thing I’m going to do is input this file into my workflow. I can do this by selecting it in the Input Tool’s file connector, or directly drag and drop the file from my file explorer.
As I pull in this file, my first row containing columns will become our initial headers. Our second row of headers will become the first row of inputted data. Because I have repetitive headers for the timeframes, Alteryx will also append numbers to each column to ensure no repeats. This is fine, and we’ll fix this in the future.
As I mentioned, our first row of inputted “data” is actually our second row of column headers. Let’s isolate those headers from our actual data into two separate streams of data. I’m going to use two sample tools: one that selects the first row, and one that skips the first row.
Our “First 1” Sample tool will now output our sample headers. We’re going to continue onward with that stream as we move forward, and will eventually revisit the “Skip 1st 1” stream, which contains all of our data.
2. Use the Transpose Data to Pivot Columns
Now we will pull in the Transpose tool. No configuration is required on this tool in this case, as we want to pivot all of the columns. I’m going to use this to turn both the column names and first row of data into rows that contain two fields:
- [Name]: the name of the original column, as it appears in our original data stream. This will be important to keep with us for future steps.
- [Value]: The first row of data we isolated.
3. Concatenate The Two Header Fields
Next, we want to concatenate our header fields to become one consolidated name for our columns moving forward. I’m going to use the Formula tool to edit our [Value] field to append our date columns to the front, with a “_” delimiter in the middle. We are also going to get rid of the appended “_2” that ended up in our repeated columns with a LEFT function.
In the case of the Category and Sub-Category field, I don’t want the F1, F2, etc. to come through, so I’ll use an IF clause to exclude those fields. The formula ends up looking like the below:
4. Use Dynamic Rename to Input New Field Names
I now have two fields that contain the original field name and the new field name. We’re perfectly set up to use one of my favorite tools: Dynamic Rename!
We are going to use both of our data streams to input this tool. We are going to connect our data stream without the headers to the L input anchor. The R input anchor is optional, but in this case, we are going to use the R anchor to input our new field names.
This will enable us to configure the tool in rename mode “Take Field Names from Right Input Rows”. We’ll take the old fields from the [Name] field, new field names as our concatenated [Value] field. This will result in our original data stream being renamed according to the new names we created. Boom – awesome!
5. Use Text-to-Columns Tool to Split Data (Optional)
Depending on what your two rows of headers indicate, you can likely stop at this step and continue on with your analysis. However, in my case, I’ll take this a step further by also turning our date portion of some of the headers into rows. It is important to be able to slice and dice by date when building a data product and getting the dates into the rows is the best way to do that.
I use the same method with the Transpose tool to turn the columns into rows, this time grouping by the key fields of [Category] and [Sub-Category]. This enables me to use the Text-to-Columns tool to split the date from the measure names on the “_” delimiter we added in our formula tool.
6. Use Cross-Tab to Group Data Fields (Optional)
After removing the original field with our Select tool and renaming our fields to be more clear ([Date] and [Measure]), we can then Cross-Tab the remaining fields, again grouping on [Category] and [Sub-Category] but this time also selecting [Date] to group by. We will use [Measure] as the column headers and [Value] as the new values. Your method for aggregation should not matter as we will not need to aggregate any further.
In the end, you should be left with a data source that has Columns for each dimension and measure, and dates for each row. This should be perfectly optimized for Tableau or any other reporting tool.
Want to Learn More?
We are a modern analytics consultancy. We enable and manage organizations’ analytics and self-service teams by educating people, optimizing technology, developing world-class products, and providing sustainable results. Curious to know how we can level up your organization’s analytics? Click here!
Interested in joining the Tessellation Team as a data analyst, Tableau expert, dashboard designer, or data scientist? We’re hiring! Check out our latest job listings on our website and on our Linkedin page!