Using Triggers to Replace Scalar UDFs on Computed Columns

T-SQL
46 Comments

Your database is riddled with computed columns whose definition includes a scalar user-defined function. Even up to & including SQL Server 2019, which boasts faster scalar function processing, any table that includes a scalar function cause all access to that table to go single-threaded. In that case, a trigger can actually be a great replacement.

No, wait, come back. Triggers have a pretty bad reputation amongst the database community because:

  • You can pile a lot of work into them, leading to surprise slowdowns at scale
  • You can hold locks open for a long time as you work through a lot of tables
  • Troubleshooting can be tricky since authors rarely log errors generated in triggers
  • They almost always have a bug involving multi-record processing

But there are some places where a trigger is actually the least-worst fit, and this is one of ’em. To show the problem, I’ll take the mid-size 50GB Stack Overflow database and do a simple count of the number of rows:

The execution plan goes parallel:

But add a computed column to the table, and then run our count again:

And the query can no longer go parallel, even though it’s not calling the function:

If we look at the query plan’s properties, we can see that SQL Server was unable to build a parallel execution plan, although it’s not obvious as to why:

We can fix that with a trigger.

Instead of basing the Seniority column off a computed, user-defined function, we can:

  • Remove the computed column
  • Add a real column
  • Keep it up to date with a trigger

Here’s how:

When I insert or update the users table, the trigger goes single-threaded as it updates the Seniority column – but that’s it. Now, when I run a count again, the count can go multi-threaded:

Sure, I would rather have the application owners recompile their app and keep the Seniority column up to date themselves as they do inserts & updates, but I often don’t have that luxury – especially when lots of applications are touching the Users table, and they’re all accustomed to the Seniority computed column just being constantly updated, automatically.

This post isn’t meant to be a start-to-finish tutorial on how to write the fastest triggers – it’s just a quick 500 words to remind you that in this scenario, triggers are better than the alternative of having scalar UDFs as computed columns.

Previous Post
Building SQL ConstantCare®: What Database Growths Do You Care About?
Next Post
If Your Trigger Uses UPDATE(), It’s Probably Broken.

46 Comments. Leave new

  • Since the Seniority value should not change if the Reputation column isn’t updated, you can avoid running the Seniority update when users are changing their profile data, etc. Just put the update in a block protected by the conditional “IF ( UPDATE (Reputation) ) ” to only run the update if the Reputation column was inserted/updated. This would be most beneficial if you have a heavyweight function that’s dependent on an oft-unchanged column.

    Reply
    • Dave_W – make sure you read the linked post about the bug I find in most triggers. You’re in for a nasty surprise. 😉

      Reply
      • Totally get the set-based SQL issue – something I’ve found in code handed to me as well. Just saying that if someone updates half the rows in the table, but not the Reputation column, there’s no need to take the performance hit of running the Seniority update for the updated rows. Nothing in the BOL for UPDATE() says it doesn’t work for a trigger invoked by a multi-row update.

        Reply
      • Okay, I’ve read your new post about UPDATE(). I’ll forgive you for not realizing I’m one of the 0.01% of the population who know what UPDATE() actually does. You thought I was trying to horizontally partition the trigger – only run it for records with changed values. In actuality, I was trying to vertically partition the trigger – don’t run it if the Reputation column is not in the UPDATE SQL statement. Or am I not as smrt as I thunk I are?

        Reply
  • Since in-line table valued functions aren’t parallelism inhibitors, would converting the function also work?

    Reply
    • @Viking,

      I’m pretty sure that you cannot use an iTVF in a computed column. Only scalar expressions are allowed for computed columns. To use an iTVF, you’d need to us a subquery, and that’s not allowed. I don’t know of a way to use an iTVF without it being part of a FROM clause.

      If you know a way, man… please share it because I’d love to use it.

      Reply
      • Ah…sorry to disappoint. I haven’t needed to use a function in a calculated column and wasn’t aware that you couldn’t use an iTVF.

        Reply
  • Depending on the definition of the computed column, could you alternatively mark it as persisted and enable trace flag 176?

    Reply
    • Cool tip. I’m going to give that a try on my test box and see what digs. Thanks.

      Reply
    • Ok… I just read up on trace flag 176 and, while it does deal with computed columns, it doesn’t appear to have anything to do with the problem that Brent described in the article. I’m not sure why you think it might be the fix for a computed column that’s based on a Scalar UDF denying the ability for queries against the table to go parallel. Please explain?

      Reply
      • If the column is persisted, presumably the function needn’t be evaluated at run time and so its property as a parallelism inhibitor would seem moot. Something to test I suppose but it seems reasonable to me.

        Reply
      • From what I’d been reading, trace 176 disabled the expansion of persisted computed column definitions. This is supposed to stop the Scalar UDF definition from making its way into the query plan, which in turn prevents the query plan from being serialised.

        There’s an SO answer with a bit more detail and a link to an older article describing the behaviour: https://dba.stackexchange.com/a/187370.

        I’ve only got access to an Azure SQL instance at the moment, so can’t set the trace flag to test this though.

        Ben – I was able to confirm while gathering the first query plan that any reference to a Scalar UDF (even if the computed column is persisted) makes the query fall back to serial execution with a NonParallelPlanReason of TSQLUserDefinedFunctionsNotParallelizable.

        Reply
        • I’ve been in search of more info on why people think that Trace Flag 176 could be a magic bullet for this and found much more detail, including some of the pitfalls of using it as well as why it can work in the following article.

          https://sqlperformance.com/2017/05/sql-plan/properly-persisted-computed-columns

          I’m going to give it a whirl on my laptop and see what happens insofar as parallelism goes. Unfortunately, that’s not going to be a great test for the negative aspects that Paul White writes about in the referenced article.

          Reply
  • Ok… this is one of those things that I’ven somehow never heard about before. High praise to you posting about it because I’ve got tables that have computed columns that use Scalar UDFs. After finding this good article, I also did the test with a properly PERSISTED Scalar UDF and no difference.

    The funny part was that I originally couldn’t get your code to go parallel at all. It turns out that’s because I have the Cost Threshold for Parallelism set to 50 on my boxes. After I (temporarily) took it down to 25, then it would go parallel until the Scalar UDF was added as a computed column, persisted or otherwise.

    You really had me worried about computed columns and so I also checked to make sure that a Scalar expression other than a Scalar UDF would still allow parallelism. Thankfully, it does.

    The good part of all this is that now I know and I learned something new today. The bad part is, now I’ve got a shedload of computed columns to fix. Thank you Microsoft for continuing to give me things to do.

    Lordy… to recoin and old prayer…
    “Dear Lord, I’m a DBA. Please give me patience because, if you give me strength, I’m going to need bail money to go with it”. 😀

    Reply
  • Silviu Spataru
    October 21, 2020 12:32 am

    I replaced such a table with a view (with the same name), plus another table to hold the non-computed columns, plus an INSTEAD OF DELETE trigger.
    But using triggers – I could’ve saved several hours of work. And SELECTs would go even faster. Thanks, Brent.

    Reply
  • Wojciech Sawicki
    October 21, 2020 1:31 am

    My version:
    CREATE OR ALTER TRIGGER [Users_Seniority] ON [dbo].[Users] AFTER INSERT, UPDATE AS
    BEGIN
    IF @@ROWCOUNT = 0 –prevent from update Users … where 1=0;
    RETURN;
    SET NOCOUNT ON; –rows affected, careful, this will reset the @@ROWCOUNT variable

    IF UPDATE([Seniority]) –prevent from update Users set Seniority=’Unknown’
    BEGIN
    –THROW?
    ROLLBACK;
    END;
    IF UPDATE([Reputation]) –prevent from modify [Reputation] if update Users set Age=Age+1
    BEGIN
    UPDATE [u]
    SET
    [Seniority] = [dbo].[fn_GetSeniority]([u].[Reputation])
    FROM [dbo].[Users] [u]
    INNER JOIN [inserted] [i] ON [u].[Id] = [i].[Id]
    WHERE NOT EXISTS
    (
    SELECT *
    FROM –prevent from update Users set Reputation=Reputation
    [deleted] [d]
    WHERE [d].[Id] = [i].[Id]
    AND [d].[Reputation] = [i].[Reputation]
    );
    END;
    END;
    GO

    Reply
    • Bad news – read the post carefully about the part “bug involving multi-record processing”. Your trigger has that bug.

      Reply
      • Wojciech Sawicki
        October 21, 2020 2:32 am

        Bug with one record without any variable?

        Reply
        • No, the IF UPDATE([Seniority]) won’t work the way you expect with multiple rows. For more details, go ahead and work through the code in the related post – I put a ton of work into that. Thanks!

          Reply
          • Wojciech Sawicki
            October 21, 2020 4:57 am

            I did it and my trigger work correctly.

          • I guess I don’t understand here, Brent (probably low caffeine levels for both of us). I understand the variable thing and totally agree with the RBAR thing in triggers being a problem but I’ve never had a problem with IF UPDATE() in a proper set based trigger. And, yeah, I read the article you’ve provided the link for.

            Also, I’m not seeing the “bug” in the code Wojciech posted above. Are you sure that you’re not mistaking the IF @@ROWCOUNT = 0 RETURN early out of the trigger as “the bug”? That’s definitely not “the bug” we all know and love and it’s actually a pretty good practice, especially if you have code in the trigger that does row detection in the INSERTED/DELETED logical tables to calculate which of the 3 basic triggering events caused the trigger to fire.

            The only thing that I don’t like about IF UPDATE() is that it’s just an indication that a column was part of an update (or insert/delete) rather than whether or not a value on a row actually changed to something different.

          • No problemo! I’ve made a note to write a longer blog post about the use of “IF UPDATE(colname)” in the future.

          • Ok. Looking forward to that. Thanks, Brent. I’m still confused because I didn’t see IF UPDATE() in the trigger article you provided the link to. Have I simply looked at the wrong article?

          • To be sure, here’s the article that your “bug involving multi-record processing” link led to.
            https://www.brentozar.com/archive/2019/05/the-silent-bug-i-find-in-most-triggers/

            If there’s something in there about a bug with IF UPDATE() then I’m not finding it. And, no… despite appearances in the written word, I’m not trying to be argumentative. I really want to know because I’ve 1) never experienced a problem with IF UPDATE() and 2) we use it a whole lot at work and so if there is an actual bug with IF UPDATE() itself, then I really want to know what it is.

          • BWAAA-HAAA!!! Good Sir, you’re amazing. You just cranked out that article. I’m humbled.

            To be clear, from the discussion, I was thinking that IF UPDATE() itself had a problem. It doesn’t. I’m much relieved.

            I totally agree with your new article on the subject of how IF UPDATE() can be so severely and incorrectly used. Looking back at Wojciech’s trigger code, it definitely has the IF UPDATE() problem especially with the IF UPDATE([Seniority]), which looks like it will actually prevent inserts from occurring.

            The point that you’re trying to make is the IF UPDATE() doesn’t only apply to UPDATEs. It also applies to INSERTS/DELETES, both of which will qualify IF UPDATE(somecolumn) because both of those actions “update” the “somecolumn” being tested.

            That’s also why I’ve never run into a problem with IF UPDATE()… I made that good realization long ago and, especially on multi-triggers (handles more than one triggering event), I first calculate what the triggering event is and write the code so the fault you so very adequately describe in your good article never happens. I’m a bit ashamed that I missed what you were getting at in Wojciech’s trigger code… it turns out the apparent caffeine deprivation was all on my part.

            Thanks for your attention to detail and the article you cranked out so nasty fast. Again, truly amazing.

          • Hahaha, thank YOU for sticking through it, sir. I wouldn’t have cranked it out so quickly if you hadn’t asked.

          • Wojciech Sawicki
            October 21, 2020 10:19 am

            ok i understood the misunderstanding.
            update should not be understood as a column that changes value, but a column appearing in the update or insert list. In my case, I want to prevent any changes to the Seniority column. It just can’t follow a set or insert statement. Prevention from changes set Reputation = Reputation is in a condition where not exists

        • Brian Boodman
          October 21, 2020 6:54 am

          “Bug with one record without any variable?”

          Admittedly, this bug probably won’t be triggered if you never modify more than one row at a time.
          However, I’m not a fan of leaving time bombs in your code.

          Reply
          • Nah… From what I can see in the trigger he posted, his code doesn’t have “the bug”. He was asking about the IF UPDATE() thing, which isn’t a problem no matter how many or how few rows are processed by a trigger.

          • Ok… correction to the above. I missed what Brent was getting at. IF UPDATE() itself doesn’t have a problem… people’s perception of what it does is the problem. Despite its name, it doesn’t only fire on UPDATEs of a given column. It also fires on all INSERTS and DELETEs because those also cause an “update” of EVERY column in a row or rows.

            It’s not good excuse but the reason why I missed the IF UPDATE() bug in Wojciech’s trigger code is I was concentrating on the wrong thing (I should have been drinking more coffee instead of posting). Although I’m still a bit timid this morning for the well deserved proverbial snap in the ass with a rolled up towel that Brent gave me, I’ll say that it looks like Wojciech’s trigger code will prevent inserts from actually happening.

            Ok… I’m going back to sucking my thumb and twiddling my hair while in search of more coffee now. 😀

  • Reply
  • Joel Coehoorn
    October 21, 2020 9:48 am

    What about:

    ALTER TABLE dbo.Users
    ADD Seniority AS
    CASE WHEN Reputation > 1000 THEN ‘Senior’
    ELSE ‘Junior’ END

    Reply
  • I had to do a quick look through our many tables and only encountered a couple computed columns. Simple first last name concatenation stuff. Seems harmless. And I can see the benefit to moving it to a trigger, in these cases. However, the table is a temporal table. Which because I can’t use INSTEAD trigger, I would now have 2 updates going to the history table instead of one, if I implemented the trigger. So now, the trick is, weigh the parallelism performance hit versus the row count hit on the history table, and both are queried pretty evenly. This will be fun to test.

    Reply
  • Hi Brent,
    Thanks for such a great post. What about adding a computed column as below:
    [CreatedDate] AS (todatetimeoffset(dateadd(minute,[CreatedDateOFFSET],[CreatedDateUTC]),[CreatedDateOFFSET])) PERSISTED

    Also if the table has an index will it go parallel against Index and not just against the table?

    Reply
  • Its going parallel in both cases, that’s why I was confused how todatetimeoffset is going in parallel.

    Reply
  • Sorry and thanks.
    I did tried both it also forces all Index’s to go single threaded too.

    Reply
  • Wow, taught Jeff Moden something! I bow to the masters.

    Reply
  • In this particular case, you could write the computed column directly as a CASE expression. If a scalar function was really necessary you should have added WITH SCHEMABINDING.

    But people tend to use functions to add info gleaned from other tables, for example a total of child items. In this case I would use an indexed view.

    Reply
    • Charlieface – I’m using a simplified version of the computed column in order to quickly teach the topic. Thanks!

      Reply
      • True, but I still think there is no almost no use case for scalar functions in computed columns, as the two options I mention are better.
        If you really must, defo use SCHEMABINDING.

        Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.