Aggregating Data Into a Single SQL Server Column

2018.04.13 EDIT:  I made a slight change to this to fix the filtering for the Start Date and End Date for the Promotions in the query below.

This week, I was working with a client using nopCommerce to try and fix a performance issue relating to syncing data between their legacy product management system and their nopCommerce system used for their front-facing website.  A plugin was built in nopCommerce that would utilize a SQL Server staging database to stage data from the legacy system and import into nopCommerce on a daily basis.

Specifically, this issue relates to trying to add a promotion to a product, in which promotions and products have a many-to-many relationship with each other, and each product has a one-to-many relationship with a series of product tabs that are used to display information on the product.  The entity relationship looks like this:

When the client attempted to add a promotion to 3000 products, it slowed the sync process from about 30 minutes to 8 hours – causing those products to not reflect the new promotion until mid-day and making for an unhappy client.

Digging into the code, the major slowness issue was relating to creating the product tab page – the system was searching to see if a tab page existed for the product, and then either created a new one or adding to the existing with the appropriate data, forcing iteration through a large dataaset and giving a poorly scaling solution.

Creating a stored procedure for this make the most sense – since it’s just copying data from a few tables into another, but with database programming, I cannot truly iterate through each mapping to create the Promotion product tab correctly, so I needed to use aggregation to add the content to the product tab for each product.

I found out about the STUFF() operator and came up with this:

    'Promotions' as title,
        (SELECT '<br />' + p2.Name
         FROM Promotions p2
         JOIN ProductPromoMappings ppm2 on ppm2.Promo_Id = p2.Id
             and p2.StartDate <= GETDATE() AND GETDATE() < p2.EndDate
         WHERE ppm2.Product_Id = ppm.Product_Id
         FOR XML PATH(''), TYPE)
             .value('.','varchar(max)'),1,6, '') AS content,

 FROM ProductPromoMappings ppm
 JOIN Promos p ON p.Id = ppm.Promo_Id
     and p.StartDate <= GETDATE() AND GETDATE() < p.EndDate

Here’s what is happening – we are going through the ProductPromoMappings and returning the content for the product tab pages for use.  For the content in the product tabs, we are going through each promotion in an inner query and getting the content for each promotion.  After that, then concatenating it into a single value to be used by the outer query.

Here’s a little more details on some of the commands that may be new to you:

  • WITH TIES – Addition to the ‘TOP’ keyword that allows for returning ‘ties’, or multiple rows in last place with the limited data set.
  • STUFF() – Inserts a string into another string.  Specifically, this allows us to insert the ‘<br />’ element into each subsequent string without inserting the line break in the first element.
  • FOR XML PATH(”) – Outputs the results of a query in XML – for us, this means taking the contents and making them into a single string for use.
  • .value() – Converts the XML node passed from FOR XML PATH into a varchar,  allowing us to escape all characters that would cause issue with XML conversion (such as ‘<‘ and ‘>’).
  • ROW_NUMBER() – Numbers the result rows – this is uses with the OVER and PARTITION keywords to filter each product tab page by the Product ID.

Putting this command into a stored procedure reduced the runtime from 8 hours to about 15 seconds, as the data is now just copying directly into the database.




Leave a Reply

Your email address will not be published. Required fields are marked *