Parse Cell Value in Excel Using Regular Expressions

Using Excel formulas to parse a substring from a cell value is usually difficult especially if the cell value or pattern is not easily defined. I have found that it requires various combinations of SEARCH, FIND, LEFT, RIGHT, MID, etc. The resulting formula is both unreadable and it does not successfully handle all cell values thus resulting in manual clean up. It would be nice if the SEARCH or FIND functions were able to use regular expressions by default.

The following VBA function allows you to parse a cell value based on a regular expression. To use this function, you must enable Microsoft VBScript Regular Expressions 5.5.

  1. On the Developer ribbon, click Visual Basic
  2. Once the Visual Basic window opens, go to Tools -> References…
  3. From the References dialog box, check/enable Microsoft Regular Expressions 5.5
  4. Click the OK button

Now that the regular expression reference is activated, insert a new module into the workbook if one doesn’t already exist. The code will not work if added directly to a worksheet object.

Add the below code to the module. You can then use it as part of formulas in the workbook, e.g. =RegExParse(A1,”ID[0-9]+”). This would parse the value in cell A1 returning the substring matching the pattern ID[0-9]+, e.g. ID0, ID123, ID183274917234.

Creating Cascading Drop-down Lists in Excel

If you’re using Microsoft Excel to capture and track data, one of the challenges is maintaining good data quality when more than one person is updating the workbook. The data validation features in Excel help by allowing the user to select data based on pre-defined options in a list. This feature works well on individual cells. However, if you have a column that depends on the value in a different column, you will need to get a little more creative. This post describes the steps for creating cascading drop-down lists in Excel using a combination of data validation and named ranges.

Step 1 – Define the Reference Data

Let’s assume that you are collecting data that includes Organization and Department attributes. In this example, an Organization is the parent to one or more Departments. We’ll set up the reference data for the data validation drop-down lists as follows on its own worksheet.

  • Column A (“ORGANIZATION”) represents the list of valid Organizations that will appear in the Organization drop-down list.
  • Column B (“NAMED RANGE MAPPING”) represents the names given to ranges that will be defined in a later step.
  • Columns D through E (“ORG_X_DEPARTMENTS”) represent the lists of valid Departments that will appear in the Department drop-down list depending upon which Organization is selected.
Excel Cascading Drop-down List Reference Data
Excel Cascading Drop-down List Reference Data

Step 2 – Define the Named Ranges

Now that we have the reference data established, we can define the named ranges that refer to these lists. The named range entry defined as ORGS_TO_DEPTS_MAPPINGS is the key to creating the cascading drop-down list functionality. It is important that the values in Column B match the names given to the ORG_X_DEPARTMENTS ranges. This value will be used in a subsequent step with an indirect and vlookup data validation formula.

Excel Cascading Drop-down List Reference Data Named Ranges
Excel Cascading Drop-down List Reference Data Named Ranges

Step 3 – Define the Organization Data Validation (Parent Column)

In this step, we establish the standard data validation on the parent data in Column A (“ORGANIZATIONS”) on a new data collection worksheet.

Excel Cascading Drop-down List Top Level Validation
Excel Cascading Drop-down List Top Level Validation

Step 4 – Define the Department Data Validation (Child / Dependent Column)

Here we establish the data validation rule that performs the cascading drop-down list function. Once the user selects a valid Organization, this formula performs a vlookup against the ORGS_TO_DEPTS_MAPPINGS named range and returns the name of the Department named range associated with the selected Organization. The indirect function then converts that name, a text value, into the range reference.

Excel Cascading Drop-down List Validation Formula
Excel Cascading Drop-down List Validation Formula

When you click the OK button, Excel will display the following error message. Click the Yes button to continue.

Excel Cascading Drop-down List Error Message
Excel Cascading Drop-down List Error Message

Step 5 – Test the Cascading Drop-down Lists

Now that the data validation rules are set up, we can test the cascading drop-down list functionality. Selecting “Org 1” will cause the validation drop-down list in the Department column to reflect only those Departments associated with “Org 1”. Selecting “Org 2” causes the Department drop-down list to show those Departments associated with “Org 2”.

Excel Cascading Drop-down List Example 1
Excel Cascading Drop-down List Example 1
Excel Cascading Drop-down List Example 2
Excel Cascading Drop-down List Example 2

Step 6 – Macro to Validate Parent / Dependent Relationship

Now that the worksheet is functioning as expected, you release it for users to update and someone will inevitably enter data in a way that breaks the parent / dependent relationship. As an example, “Org 1” is selected in the Organization column and “Dept 1” is selected in the Department column. The user then returns to the Organization column and changes the entry to “Org 2”. The Department column retains the value of “Dept 1” which is not valid for the “Org 2” selection in the Organization column. To help avoid these errors, you can save the file as a macro enabled workbook and add the following code to the data entry worksheet. If this macro had been enabled, the “Dept 1” value would have been removed as soon as the user selected “Org 2”. This code assumes that the parent column is the first column (Column A) in the worksheet and the dependent column is immediately to the right (Column B). Please revise the code to meet your specific requirements.

Microsoft Power Query for Excel Tips and Tricks

In this post, I’ve documented a few Microsoft Power Query for Excel tips and tricks that I’ve learned from a recent project. For all of the following examples, I have an Excel table named TEST_DATA containing two columns (Description and Amount). The Power Query add-in for Excel can be downloaded from Microsoft.

Test Data in Microsoft Power Query for Excel
Test Data in Microsoft Power Query for Excel

Improving Performance

Ignore Privacy Level Settings

On the Power Query ribbon, click the Options button. The Query Options dialog box is displayed. Click the Privacy option, select Always ignore Privacy Level settings, and click the OK button.

Unless you have a specific reason to keep privacy settings active, you may gain some performance improvements by ignoring them. I didn’t notice much of an improvement, but it may depend on the volume and complexity of the data set involved.

Privacy Level Settings in Microsoft Power Query for Excel
Privacy Level Settings in Microsoft Power Query for Excel

Retrieve Source Data Once

When I started using Power Query, I created an initial query and then duplicated it each time I needed to start a new query. I would then modify the duplicated query to meet the new reporting requirement. All of these queries used the same source data which was a table on another worksheet. These weren’t complicated queries, but around the 15th query created in this manner, Power Query became sluggish during refreshes. The application would eventually throw Out of Memory exceptions and Excel would crash. The data source was a simple 100+ row worksheet in the same workbook. Again, nothing complicated.

Each of those queries started with a number of common steps, e.g. define the source data, remove some columns, and filter some data. After realizing my sloppiness, I defined a new query that handled all of these common steps. I then modified the existing queries to remove those common steps and instead refer to this base query as the starting point. Let’s assume that I named the base query as BASE_DATA. In each of the other queries, the source data is then established with the following statement: = #”BASE_DATA”

Once these common steps were pulled into a single, standalone connection query, then the Out of Memory exceptions stopped and the refreshes were substantially faster.

The BASE_DATA query is defined as:

Tips and Tricks

I’ve started a new query named EXAMPLE_QUERY to demonstrate the following tips and tricks. Again, it will use the TEST_DATA table data by way of the BASE_DATA query defined above.

Our first step is to establish the source data: = #”BASE_DATA”

Creating Total Rows

I know that a total row can be easily added to an Excel table through the standard Excel table options. However, what if we wanted to add one using Power Query? Through the Table.Combine and List.Sum functions we can combine all of the records from the source data with a calculated total row created using Table.FromRecords.

This step combines all of the rows from #”BASE_DATA” and, by using the Table.FromRecords function, creates an in-line table with one record. This in-line table sets the Description column to the text Total and the Amount column to the sum of all entries in the #”BASE_DATA”[Amount] column. I found this useful when creating a query where the base data was used in a waterfall / bridge chart.

Creating a Total Row Using Microsoft Power Query for Excel
Creating a Total Row Using Microsoft Power Query for Excel

Creating Running Totals

The next example adds a column to calculate a running or accumulating total for each of the rows. The first step is to create an Index column starting from zero. If you create an Index column starting from 1, then you would need to adjust the second step accordingly. The #”Create Total Row” is the name of the previous step.

Next, we add another column to contain the running total. Notice that #”BASE_DATA”[Amount] is used in the List.FirstN function. This is to avoid including the total row in the running total. If you used #”Added Index” (the default step name from adding the Index column), then the running total would incorrectly include the total row. Also, if you set your index to start at 1, then you would need to change the [Index]+1 parameter to [Index].

Creating a Running Total Column Using Microsoft Power Query for Excel
Creating a Running Total Column Using Microsoft Power Query for Excel

Using Data from the Previous Row

In this example, I create a new column that adds the value of the Amount column from the previous row to the value of the Amount column in the current row. The if conditional handles the first row (Index equals zero) where there is no previous row. This is for demonstration purposes only so the formula doesn’t particularly make sense to do in real life. The previous Amount column value is obtained with the statement #”Added Running Total”{[Index] – 1}[Amount].

Accessing Previous and Current Row Values in Microsoft Power Query for Excel
Accessing Previous and Current Row Values in Microsoft Power Query for Excel

Filtering Based on a Named Cell Value

Let’s say that you want to filter the data based on the value of a cell somewhere in your workbook. As the first step, you would need to define a named range for that cell. In this example, I’ll define a cell named FILTER_VALUE. To get the value of that cell from within a query, you can use a custom Power Query function which we’ll create called GetNamedCellValue. This query function has one parameter named rangeName and it is created as a connection only query.

Once defined, you can use this function in another query as follows:

Handling an Empty Table

Continuing with the same query, I add a filter to intentionally return an empty table to demonstrate this example. The source data doesn’t have any entries with an amount greater than 1,000 so I filter the column to show only entries with a greater amount.

If you’re using the custom GetNamedCellValue function defined earlier, you can set the FILTER_VALUE cell to 1,000 and use the following step instead:

As expected, the filter step returns an empty table. In the next step, I check if I have an empty table using Table.IsEmpty and then combine the empty table with an in-line table where the Description column is set to No entries found. If the table is not empty, I return the results from the previous step with no modification. I normally include this step at the end of most queries so that when a query is loaded to a worksheet, it is clear to users that the table is empty because of the data and not because of an error.

Adding a Message to an Empty Table in Microsoft Power Query for Excel
Adding a Message to an Empty Table in Microsoft Power Query for Excel

Summary

The full EXAMPLE_QUERY query used in the above examples is: