APIs and Alteryx: The Basics

So you want to pull data into an Alteryx workflow, but there isn’t an ODBC driver or native connector. Don’t despair! The next best option is using a custom API call to pull data into Alteryx.

Getting Started

Requirements to start:

  1. Alteryx Designer
  2. Access to an API.
  3. Access to the API’s developer portal.

Note that many APIs require you to sign up and many require a subscription.

For this example I will be using the Marvel Comics API. This is a free to use API, but does require users to sign up and be approved before getting access. Please always be sure to review the terms of the API you are using.

Be a good citizen!

Create a working API call.

First, gather the following information from the developer’s portal:

  • Base Service Endpoint
  • API Keys
  • GET Statement Code
  • Required Parameters
  • Optional Parameters
  • Known the restrictions of the API

Note that there are different types of APIs. Please read the information on the API’s developer portal website to understand specific requirements.

My goal is to create a list of all new 2018 Marvel comics series.

There are a couple of restrictions that I need to work around. First, I can only do 3000 or less API calls a day. That shouldn’t be an issue due to the volume I am expecting. The second issue is that I can only return 100 records per call. This will be a problem, as Marvel has released hundreds of new comic series a year. I will need to work around this problem

My workaround will be doing 26 calls, one for each letter in the alphabet. Each call will only return series that start with the specified letter.

Now, this is not a full-proof method. Theoretically, Marvel could have released 250 new series that start with the letter S and the limit would be 100 per call. Alternatively, and series title could start with a numeric character. For this post’s example, we will ignore these issues for simplicity.

Here is the required information I collected form Marvel’s developer website:

Base Service Endpoint:

https://gateway.marvel.com/

This is the base URL in which all API calls for Marvel will be made against.

API Keys:

When subscribed to the API portal, Marvel provides two API keys for each user. A public API key and a private API key. Each key is a long string of characters.  For security, I won’t be providing my actual keys in this post. We will use these dummy keys for example.

Public Key: 123456789

Private Key: 987654321

GET Statement:

The most common statement we give an API is the GET statement. This grabs information for us. For me, I want to grab series titles.

Code for grabbing all series: /v1/public/series

Additional Required Parameters:

Required parameters will vary from API to API. For Marvel, the additional required values are:

Timestamp (TS)

A unique string key that can change for every set of calls. Usually represents a date or date-time. value For this example, we will use a static timestamp value that represents December 31, 2018 as a string text value.

ts=20181231

Hash

Hash is a little more complicated. For Marvel, it is the text concatenation of three fields (ts + prviateKey + publicKey).

Concatenated string:
“20181231987654321123456789”.

Once concatenated, we need to encrypt the new string using the MD5 algorithm.

Marvel requires us to use MD5. MD5 is one (of many) methods for encrypting things like passwords and user names. This means our completed formula is md5(ts+privateKey+publicKey).

Luckily, Alteryx has a function that can convert to MD5. The function is MD5_ASCII().

MD5_ASCII(20181231987654321123456789) = 45bc05a748f50f9d7a8351acae3991e3

This returned number may seem like gibberish, but the Marvel API will know what it means. The API will un-encrypt the value and look at the following:

  • Make sure the Public and Private APIs exist and are valid.
  • Make sure the timestamp

Additional Optional Parameters:

There are some additional optional parameters we are going to use. These are to confirm we are getting only the data we need and that we are not going over the 100 record limit with each API call.

First Character in Title

titleStartsWith=A

We are going to start with A and iterate through the other letters.

First Year Published

startyear=2018

This won’t change, as we are only looking for data from 2018 for comic series that launched.

Record Limit

limit=100

With the combination of filter parameters above, it would be rare to go over the 100 record limit. However, we need to be safe just in case. If we don’t do this there could be an error.

Building URLs via an Alteryx Workflow

Our goal is to build 26 URLs, one for each letter in the alphabet, that will pull comic series data from the API. Each pull will get new 2018 comic series that start with that letter of the alphabet.

Step 1

To start, I would recommend using a text input tool that takes our static inputs.

Workflow so far:

Configuration of Text Input tool:

Note that startswith, timestamp (ts), and hash are not included as they are dynamic and need to be calculated by Alteryx.

Step 2

We need a list of all alphabetical characters. I would recommend the Text Input Tool again.

Workflow so far:

Configuration of second Text Input tool:

Step 3: Append

We can use the append tool in Alteryx to create a record with all of the static variables in text input 1 with the 26 letter inputs in text input 2.

Workflow so far:

Note that the first text input, which contains the static variables, goes to the S input for Source and the letter text goes to the T input for Target. If reversed, Alteryx will give us an error.

Configuration:

Output so far:

Step 4: Calculate Timestamp Value

We need to calculate the timestamp (ts) value dynamically. This example was made on 12/31/2018, so it should return  ts=20181231. Note this is dynamic and will return a ts value that changes for the run date of the workflow.

Workflow so far:

Configuration:

Results so far:

Step 5: Confirm data types.

For the next calculation we need to do, we will have a data type issue. We need to put a seelct between the append tool and the formula tool. All data types need to be a text string.

Workflow so far:

Configuration:

Step 6: Create the hash value.

Remember, to create the hash value we need to do the calculation MD5_ASCII(20181231987654321123456789), assuming a run date of 12/31/2018. This concatenates the ts value, private key, and public key values. We then convert it to MD5, which is a form of simple encryption.

We will use the same formula tool as before:

Configuration:

Results so far:

Step 7: Create URL

We now have all of the values we need for the URL. We will use a formula tool to combined them together.

We will use the same formula tool as before:

Configuration:

Results for far:

Step 8: Download Data from API

We are finally ready to download data from the API using the download tool.

Workflow so far:

Configuration:

Final Results:

Your final results now contain the Downloaded data column. This contains all of the data we are looking for. However, it is in a difficult to read semi-structured form called JSON.

What’s Next?

Just because we have data from an API, it doesn’t mean that it is usable. Next time we will talk about parsing JSON files to compete the workflow.

Comments are closed.