<iframe src="https://www.googletagmanager.com/ns.html?id=GTM-T93P62" height="0" width="0" style="display:none;visibility:hidden">

Agilos tech blog

header.jpg

Dismiss the VLOOKUP's

Posted by Jeremy Jean on 02-Mar-2016 14:50:00
Jeremy Jean

Since we started the Alteryx workshops at our prospects and customers, I realized the majority of them was using several Excel spreadsheets linked together by the most famous Excel function: VLOOKUP.

For those who don’t know that function, Youtube resarch for ‘VLOOKUP’ gives 128 000 results, some of them are really nice explained. Here’s a simple view of how it works

vlookup1.png

 

This function is mainly use in the BI area to complete fact tables with dimensional data. For example, fact data has a ProductID, but we would like to filter on the ProductCategoryName that is saved in the Product table. Another use of the function is to retrieve the corresponding value in a mapping table.

But, Excel has its limitations and questions can be raised:

  • What happens if there are two product with the same product ID?
  • What happens if no corresponding value?
  • What happens if I have a fact table with 10 million rows?
  • Why is it so long when I apply the function on a big dataset?

All these questions cannot be answered with Excel. I saw a data analyst who was forced to split the data by month to fit with the limitation of row number and had to do the same function on each sheet.

And then we try to do the same job with Alteryx:

 

     vlookup2.png

 

One of the most used tool in Alteryx is the join tool. Not really like a SQL join, not like a VLOOKUP neither…it’s all that together: it joins data coming from two streams either by position, either by a field. In my example, I join the header to the details of orders.4

vlookup3.png

As you can see on the screenshot (grey background), there are 29 910 orders and 113 957 lines. The three outputs are visually explained in the configuration tab, just below the mapping configuration.

No row in both Left and Right outputs is information about data quality: no order header without lines, and vice versa.

Next step is to append the customer information (orange background). Here, the counters on the outputs are revealing important information: 40404 order lines have no associated customer information. The data coming from my customer database may be not updated.

In the meantime, it gave the information that 2234 registered customer didn’t order anything.

The Join tool is one of the several tools available in Alteryx. For more information, see our webpage or download the whitepaper.

Topics: Technical, Alteryx, Data Preparation

Through this blog we want to share expertise and thoughts about technical aspects in the BI industry or related to our solution offerings. 

Visit also our business blog for more business and market related articles.

  

Subscribe to Monthly Email Updates

Recent Posts