在数据库操作中,经常需要将多个字符串拼接成一个字符串。遗憾的是,标准的T-SQL并不支持在聚合函数SUM中使用字符串数据类型,这会导致错误消息:"Msg 8117, Level 16, State 1, Line 1 Operand data type char is invalid for sum operator."。为了解决这类问题,MySQL引入了GROUP_CONCAT函数,Oracle引入了LISTAGG函数,而SQL Server没有内置这样的功能。但这并不是一个缺点,因为T-SQL的能力允许通过使用其他构造来更灵活和有效地执行字符串拼接,下面将揭示这些构造。
假设需要将几个用逗号分隔的字符串拼接在一起,使用以下表中的数据:
IF OBJECT_ID('dbo.Chars', 'U') IS NOT NULL DROP TABLE dbo.Chars;
GO
CREATE TABLE dbo.Chars ([Char] CHAR(1) PRIMARY KEY);
INSERT INTO dbo.Chars ([Char]) VALUES ('A'), ('B'), ('C'), ('F'), ('D');
最明显的解决方案是使用游标:
DECLARE @Chars VARCHAR(100), @Char CHAR(1);
DECLARE cur CURSOR LOCAL READ_ONLY FAST_FORWARD FOR SELECT [Char] FROM dbo.Chars;
OPEN cur;
FETCH NEXT FROM cur INTO @Char;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @Chars = ISNULL(@Chars + ', ' + @Char, @Char);
FETCH NEXT FROM cur INTO @Char;
END;
CLOSE cur;
DEALLOCATE cur;
SELECT @Chars;
使用游标本身会导致查询执行效率降低,而且看起来并不优雅。为了避免使用游标,可以通过变量赋值来拼接字符串:
DECLARE @Chars VARCHAR(100);
SELECT @Chars = ISNULL(@Chars + ', ' + [Char], [Char]) FROM dbo.Chars;
SELECT @Chars;
一方面,这个构造相当简单;另一方面,它在大量数据抓取时的性能表现并不理想。因此,如果需要在SELECT块内进行拼接,这种方法将无法胜任。
在这种情况下,可以使用XML特性,使用以下构造:
SELECT Chars = STUFF((SELECT ', ' + [Char] FROM dbo.Chars FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'), 1, 2, '');
如果在dbForge Studio forSQL Server中查看执行计划,可以注意到value方法的查询成本很高。为了避免这个操作,可以使用XQuery属性重写查询:
SELECT Chars = STUFF(CAST((SELECT [text()] = ', ' + [Char] FROM dbo.Chars FOR XML PATH(''), TYPE) AS VARCHAR(100)), 1, 2, '');
结果将得到一个非常简单且快速的执行计划。
原则上,通过列进行字符串拼接并不会造成特别的困难。更有趣的情况是需要拼接多个列。例如,有以下表:
IF OBJECT_ID('dbo.EntityValues', 'U') IS NOT NULL DROP TABLE dbo.EntityValues;
GO
CREATE TABLE dbo.EntityValues (
EntityID INT, Value1 CHAR(1), Value2 CHAR(1)
);
CREATE NONCLUSTERED INDEX IX_WorkOut_EntityID ON dbo.EntityValues (EntityID);
GO
INSERT INTO dbo.EntityValues (EntityID, Value1, Value2)
VALUES (1, 'A', 'X'), (2, 'B', 'Y'), (2, 'C', 'Z'), (2, 'F', 'H'), (1, 'D', 'R');
在其中,数据必须以以下方式分组:
SELECT ev.EntityID, Values1 = STUFF(CAST((SELECT [text()] = ', ' + ev2.Value1 FROM dbo.EntityValues ev2 WHERE ev2.EntityID = ev.EntityID FOR XML PATH(''), TYPE) AS VARCHAR(100)), 1, 2, ''), Values2 = STUFF(CAST((SELECT [text()] = ', ' + ev2.Value2 FROM dbo.EntityValues ev2 WHERE ev2.EntityID = ev.EntityID FOR XML PATH(''), TYPE) AS VARCHAR(100)), 1, 2, '') FROM (SELECT DISTINCT EntityID FROM dbo.EntityValues) ev;
这可以通过查看执行计划来轻松证明。为了减少重新读取,可以使用一个小的XML技巧:
SELECT ev.EntityID, Values1 = STUFF(REPLACE(CAST([XML].query('for $a in /a return xs:string($a)') AS VARCHAR(100)), ',', ','), 1, 1, ''), Values2 = STUFF(REPLACE(CAST([XML].query('for $b in /b return xs:string($b)') AS VARCHAR(100)), ',', ','), 1, 1, '') FROM (SELECT DISTINCT EntityID FROM dbo.EntityValues) ev CROSS APPLY (SELECT [XML] = CAST((SELECT [a] = ', ' + ev2.Value1, [b] = ', ' + ev2.Value2 FROM dbo.EntityValues ev2 WHERE ev2.EntityID = ev.EntityID FOR XML PATH('')) AS XML) FROM (SELECT DISTINCT EntityID FROM dbo.EntityValues) ev);
但这个查询也不会是最优的,因为多次调用了query方法。另一种方式是使用游标:
IF OBJECT_ID('tempdb.dbo.#EntityValues') IS NOT NULL DROP TABLE #EntityValues;
GO
SELECT DISTINCT EntityID, Values1 = CAST(NULL AS VARCHAR(100)), Values2 = CAST(NULL AS VARCHAR(100)) INTO #EntityValues FROM dbo.EntityValues;
DECLARE @EntityID INT, @Value1 CHAR(1), @Value2 CHAR(1);
DECLARE cur CURSOR LOCAL READ_ONLY FAST_FORWARD FOR SELECT EntityID, Value1, Value2 FROM dbo.EntityValues;
OPEN cur;
FETCH NEXT FROM cur INTO @EntityID, @Value1, @Value2;
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE #EntityValues SET Values1 = ISNULL(Values1 + ', ' + @Value1, @Value1), Values2 = ISNULL(Values2 + ', ' + @Value2, @Value2) WHERE EntityID = @EntityID;
FETCH NEXT FROM cur INTO @EntityID, @Value1, @Value2;
END;
CLOSE cur;
DEALLOCATE cur;
SELECT * FROM #EntityValues;
IF OBJECT_ID('tempdb.dbo.#EntityValues') IS NOT NULL DROP TABLE #EntityValues;
GO
DECLARE @Values1 VARCHAR(100), @Values2 VARCHAR(100);
SELECT EntityID, Value1, Value2, RowNum = ROW_NUMBER() OVER (PARTITION BY EntityID ORDER BY 1/0), Values1 = CAST(NULL AS VARCHAR(100)), Values2 = CAST(NULL AS VARCHAR(100)) INTO #EntityValues FROM dbo.EntityValues;
UPDATE #EntityValues SET @Values1 = Values1 = CASE WHEN RowNum = 1 THEN Value1 ELSE @Values1 + ', ' + Value1 END, @Values2 = Values2 = CASE WHEN RowNum = 1 THEN Value2 ELSE @Values2 + ', ' + Value2 END;
SELECT EntityID, Values1 = MAX(Values1), Values2 = MAX(Values2) FROM #EntityValues GROUP BY EntityID;