2007/02/06

How to convert several rows into a CSV string

Scenario

I recently posted just about the opposite (see INNER JOIN with a comma separated values CSV field ) and now I want to show how to create a CSV list when you have several rows to merge, grouping by some criteria. Let's suppose you have a set of data such as:

F1  F2  Item
-----------------
1  4    A
1  4    B
3  2    A
3  3    C
5  1    B
5  4    A
5  4    E
5  4    F
6  1    E
6  1    W

And you need to generate something like:

F1  F2  ItemsList
-----------------
1  4    A, B
3  2    A
3  3    C
5  1    B
5  4    A, E, F
6  1    E, W

Solution

This procedure is based on the information I found on Converting Multiple Rows into a CSV String (Set Based Method) but tuned to be able to group by a set of fields, not only one:

IF NOT OBJECT_ID('tempdb..#Source') IS NULL
  DROP TABLE #Source
IF NOT OBJECT_ID('tempdb..#t') IS NULL
  DROP TABLE #t

SELECT 1 F1, 4 F2, 'A' Item INTO #Source
UNION ALL SELECT 1, 4, 'B'
UNION ALL SELECT 3, 2, 'A'
UNION ALL SELECT 3, 3, 'C'
UNION ALL SELECT 5, 1, 'B'
UNION ALL SELECT 5, 4, 'A'
UNION ALL SELECT 5, 4, 'E'
UNION ALL SELECT 5, 4, 'F'
UNION ALL SELECT 6, 1, 'E'
UNION ALL SELECT 6, 1, 'W'

CREATE TABLE #t (
  F1 int not null,
  F2 int not null,
  Item varchar(7) not null,
  ItemsList varchar(8000)
)

INSERT INTO #t (F1, F2, Item)
  SELECT TOP (100) PERCENT F1, F2, Item
  FROM #Source
  ORDER BY F1, F2, Item

DECLARE
    @List varchar(8000),
    @F1_Last int,
    @F2_Last int

SELECT
    @List = '',
    @F1_Last = -1,
    @F2_Last = -1

UPDATE #t
  SET
    @List = ItemsList =
      CASE
        WHEN (@F1_Last <> F1) OR
             (@F2_Last <> F2)
          THEN Item
        ELSE @List + ', ' + Item
      END,
    @F1_Last = F1,
    @F2_Last = F2

SELECT TOP (100) PERCENT F1, F2,
  CASE
    WHEN LEN(MAX(ItemsList)) > 512
      THEN CONVERT(varchar(512), LEFT(MAX(ItemsList), 509) + '...')
    ELSE CONVERT(varchar(512), MAX(ItemsList))
    END AS ItemsList
FROM #t
GROUP BY F1, F2
ORDER BY F1, F2
GO

Regarding efficency I must say that using this approach in my production environment, where the original internal query (real table instead of #Source temporary table) would return near 5800 rows, 5 columns, needed grouping by 4 of them (integers) and creating a list combining the fifth column (varchar(512)), this procedure does the job in just 3 seconds, whereas our old iterative-cursor-based procedure took more than 23 seconds to complete.

No comments: