
SEO for Beginners: 10 Things You Need to Know Now
SEO is one of the most important marketing methods for businesses. Its value continues to increa...
Read More
By: Aaron Bertrand | Updated: 2022-05-25 | Comments (3) | Related: More SQL Server 2022
Every SQL Server release has new capabilities that are exciting to some group of customers – sometimes a change is introduced to please sysadmins, sometimes it's for finance, sometimes it's for customers of other platforms, and sometimes it's for developers. Now that the first public preview of SQL Server 2022 is available, I'll talk about some of the other new features in another post, but today I wanted to share a few of my favorite new features that will excite anyone who writes Transact-SQL.
I like to talk early about T-SQL enhancements because, unlike major features, they are largely baked in by the time the first public beta versions hit the shelves. They also aren't prone to further changes or renames by other Microsoft business units (like marketing).
A few of the most useful changes I've been able to play with in SQL Server 2022 so far:
In this tip, I'll explain each one, and show some practical use cases.
Having covered them before, these functions are basically MAX and MIN, but across columns instead of across rows. A quick demonstration:
In this simple example, the logic is a lot like a CASE expression. Taking just the first one:
When evaluating two expressions, it really is that simple: is the first one bigger than the second, or not? (However, keep NULL handling in mind; like MIN and MAX, both new functions also ignore NULLs.)
When you introduce a third value, though, it becomes much more complex. While the new syntax will offer:
How we would write this as a CASE expression in current and older versions is tedious:
And it gets much worse from there, as you can imagine. I'm not even going to try typing out what that spiderweb of CASE expressions would look like when comparing 4 or more values.
But let's look at a real problem, because it's hard to feign complexity when we're talking about things we can easily do in our head. Picture a table like this, that holds a row for each year, and monthly sales figure columns (we'll just have three months to keep it simple):
If we want to return the lowest and highest sales figure for each year, we could write nasty CASE expressions (again, just imagine that if we had all the months):
Output:
There are a couple of other ways to solve this problem, that at least scale better without complicating the code exponentially. Here's one way using UNPIVOT:
And here's one using CROSS APPLY:
Those are easier to expand to cover more columns, but they're still tedious, and I don't like that both use transpose and grouping operations. Now we can perform this kind of task with ease:
I wrote about the enable_ordinal enhancement to this function in this tip, but I wanted to mention it again because, at the time, I could not confirm the change would make it into SQL Server 2022. Now I can, and I wanted to mention a few use cases where having the ordinal is beneficial:
I've seen many requests to return the 2 nd or 3 rd item in a list, which was cumbersome to do with STRING_SPLIT before because the output order was not guaranteed. Instead, you'd see more verbose solutions with OPENJSON or tricks with PARSENAME. With this new parameter, I can simply say:
Let's say you want to assign new listings to salespeople based on past performance. You have this table:
And now you have a set of new listings that have come in, ranked by preference:
In this case, we'd want to assign the most preferential listing (81) to salesperson 6, the second listing (76) to salesperson 3, and the third (80) to salesperson 2. Having a meaningful and reliable ordinal output makes this easy:
Output:
Another scenario I've dealt with in a tedious way is reconstructing a string to remove duplicates. Let's say we have a string like this:
We want to remove duplicates from the list, but also maintain the original order, making the desired output:
With this new functionality we can accurately rebuild the string in a very direct way by taking the first instance of any string in the list, and then using its overall ordinal position to define the ordering used by STRING_AGG:
Output:
That is a much simpler approach than any of the awkward solutions I've used in the past.
That all said…
STRING_SPLIT is unfortunately still limited by a single-character delimiter, which I have addressed in this previous tip. But the new enable_ordinal argument does simplify some of the more frequent use cases that have traditionally required tiresome workarounds. It also adds a performance benefit compared to current methods, because the optimizer recognizes that the data is returned sorted. Meaning it won't always need to explicitly add a sort operator in the plan if the data needs to be ordered by ordinal. While the complex example above does require sorting, the following example does not:
Here is the plan:
This function collapses a date/time to a fixed interval, eliminating the need to round datetime values, extract date parts, perform wild conversions to and from other types like float, or make elaborate and unintuitive dateadd/datediff calculations (sometimes using magic dates from the past).
The arguments are:
The output is a date/time type (based on the input), but at an interval governed by the datepart and bucket_width. For example, if I wanted to simplify the output of a particular column so it just gave me the month boundaries, I might have done this in the past:
Or this, on SQL Server 2012 or better:
Now, in SQL Server 2022, I can do this:
All three of the above queries give me identical results:
Additionally, this might have better performance in some cases; since the function is order-preserving, there are cases where a sort can be avoided. Getting out of the system objects business, let's create a simpler table, and compare the plans generated by grouping:
Here are the plans:
More importantly, the function allows me to do much more elaborate things, like segmenting data into 5-minute intervals:
Output:
Want 10-minute intervals? No problem. We can even pass a parameter or variable so we can adjust on the fly:
Output:
Another thing I can do is vastly simplify week boundary calculations. Here is a completely unintuitive and cryptic way to get the previous Saturday (regardless of a user's SET DATEFIRST or SET LANGUAGE settings):
If we know any Saturday in the past, like January 1 st , 2000, we can simplify this as follows, by passing that date into the origin parameter:
This gives the same answer (at the time of writing – Tuesday, May 24 th , 2022 – this returned Saturday, May 21 st , 2022). And, like above, if we have a bunch of data, we can use this same technique to filter or group based on any known weekday.
Output:
This is a much simpler way to segment data based on a non-standard work week. As another example, our team's on-call schedule at Stack Overflow cycles on Wednesdays, and I've already used this function to map out our future schedule.
This function produces a set-based sequence of numeric values. It supplants cumbersome numbers tables, recursive CTEs, and other on-the-fly sequence generation techniques we've all used at one point or another.
The arguments are:
A couple of simple examples:
Output:
(Note that it won't include the STOP value, or anything near it, if the next STEP pushes past it.)
In previous versions, to generate a sequence of numbers like this, you would probably use a numbers table, or a recursive CTE like this:
GENERATE_SERIES has a clear advantage here in terms of simplicity.
One downside is that this is an operator, not a table-valued function, so there is an unintuitive outcome if you don't name the parameters explicitly:
The error message you'll see is a bit misleading:
This can imply a variety of issues, including that the function isn't there, that they need a dbo. prefix, that they connected to the wrong server, that the upgrade failed, or that it's actually a scalar function.
Another downside is that – at least in current builds – the function is not order-preserving. This means that if you try to sort by value, there is a sort in the plan, whereas that has been guarded against in many cases with the other functions mentioned above. For this example:
Here is the plan:
Still, this is a versatile function that will simplify code and – even with the sort – performs no worse than existing methods that typically require sorts as well. Microsoft is aware of the issue so, when they fix that, even better!
We can combine DATE_BUCKET and GENERATE_SERIES to build a contiguous series of date/time values. I often see people struggle to build a full data set when they are reporting on intervals where not all intervals are populated. For example, I want hourly sales figures across a day but, if we're selling something like cars, not every hour will always contain a sale. Let's say we have this data:
If I want to find the hourly sales for business hours on May 1 st , I might write this query:
What I get:
What I actually want is a row for each hour, even if there were no sales:
The typical way we'd start is with a simple recursive CTE that builds out all the possible rows in the range, and then performs a left join against the populated data.
The thing I like least about this solution is the awkward dateadd/datediff expression to normalize date/time data to the top of the hour. Functions like SMALLDATETIMEFROMPARTS are clearer in their intent, but even more hassle to construct. Instead, I wanted to use DATE_BUCKET and GENERATE_SERIES to turn this whole query pattern on its head:
I see great potential in both functions to help simplify logic and reduce dependencies on helper objects.
There are a few other T-SQL enhancements coming in SQL Server 2022, but I'm going to leave them for Itzik Ben-Gan to tell you about in this article:
Note that many of the features that eventually make it into a major release of SQL Server first appear in Azure SQL Database and/or Azure SQL Edge. In fact, most of the functions above were available in those flavors months ago. You don't necessarily have to wait for a public preview to kick the tires on new syntax. But for SQL Server 2022 specifically, the first builds available for CTP 2.0 are only for Windows.
Now that it's here, though, go grab the CTP, see the "What's New" documentation, and grab Bob Ward's demos.
In the meantime, see these tips and other resources:
Aaron Bertrand (@AaronBertrand) is a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com, and also blogs at sqlblog.org.
View all my tips
Article Last Updated: 2022-05-25
SEO is one of the most important marketing methods for businesses. Its value continues to increa...
Read MoreKisha Gulley was once kicked out of a Facebook group for mothers with autistic children after a c...
Read More