灵活高效的字符串拼接技术

在数据库操作中,经常需要将多个字符串拼接成一个字符串。遗憾的是,标准的T-SQL并不支持在聚合函数SUM中使用字符串数据类型,这会导致错误消息:"Msg 8117, Level 16, State 1, Line 1 Operand data type char is invalid for sum operator."。为了解决这类问题,MySQL引入了GROUP_CONCAT函数,Oracle引入了LISTAGG函数,而SQL Server没有内置这样的功能。但这并不是一个缺点,因为T-SQL的能力允许通过使用其他构造来更灵活和有效地执行字符串拼接,下面将揭示这些构造。

使用代码示例

假设需要将几个用逗号分隔的字符串拼接在一起,使用以下表中的数据:

IF OBJECT_ID('dbo.Chars', 'U') IS NOT NULL DROP TABLE dbo.Chars; GO CREATE TABLE dbo.Chars ([Char] CHAR(1) PRIMARY KEY); INSERT INTO dbo.Chars ([Char]) VALUES ('A'), ('B'), ('C'), ('F'), ('D');

最明显的解决方案是使用游标:

DECLARE @Chars VARCHAR(100), @Char CHAR(1); DECLARE cur CURSOR LOCAL READ_ONLY FAST_FORWARD FOR SELECT [Char] FROM dbo.Chars; OPEN cur; FETCH NEXT FROM cur INTO @Char; WHILE @@FETCH_STATUS = 0 BEGIN SET @Chars = ISNULL(@Chars + ', ' + @Char, @Char); FETCH NEXT FROM cur INTO @Char; END; CLOSE cur; DEALLOCATE cur; SELECT @Chars;

使用游标本身会导致查询执行效率降低,而且看起来并不优雅。为了避免使用游标,可以通过变量赋值来拼接字符串:

DECLARE @Chars VARCHAR(100); SELECT @Chars = ISNULL(@Chars + ', ' + [Char], [Char]) FROM dbo.Chars; SELECT @Chars;

一方面,这个构造相当简单;另一方面,它在大量数据抓取时的性能表现并不理想。因此,如果需要在SELECT块内进行拼接,这种方法将无法胜任。

在这种情况下,可以使用XML特性,使用以下构造:

SELECT Chars = STUFF((SELECT ', ' + [Char] FROM dbo.Chars FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'), 1, 2, '');

如果在dbForge Studio forSQL Server中查看执行计划,可以注意到value方法的查询成本很高。为了避免这个操作,可以使用XQuery属性重写查询:

SELECT Chars = STUFF(CAST((SELECT [text()] = ', ' + [Char] FROM dbo.Chars FOR XML PATH(''), TYPE) AS VARCHAR(100)), 1, 2, '');

结果将得到一个非常简单且快速的执行计划。

原则上,通过列进行字符串拼接并不会造成特别的困难。更有趣的情况是需要拼接多个列。例如,有以下表:

IF OBJECT_ID('dbo.EntityValues', 'U') IS NOT NULL DROP TABLE dbo.EntityValues; GO CREATE TABLE dbo.EntityValues ( EntityID INT, Value1 CHAR(1), Value2 CHAR(1) ); CREATE NONCLUSTERED INDEX IX_WorkOut_EntityID ON dbo.EntityValues (EntityID); GO INSERT INTO dbo.EntityValues (EntityID, Value1, Value2) VALUES (1, 'A', 'X'), (2, 'B', 'Y'), (2, 'C', 'Z'), (2, 'F', 'H'), (1, 'D', 'R');

在其中,数据必须以以下方式分组:

SELECT ev.EntityID, Values1 = STUFF(CAST((SELECT [text()] = ', ' + ev2.Value1 FROM dbo.EntityValues ev2 WHERE ev2.EntityID = ev.EntityID FOR XML PATH(''), TYPE) AS VARCHAR(100)), 1, 2, ''), Values2 = STUFF(CAST((SELECT [text()] = ', ' + ev2.Value2 FROM dbo.EntityValues ev2 WHERE ev2.EntityID = ev.EntityID FOR XML PATH(''), TYPE) AS VARCHAR(100)), 1, 2, '') FROM (SELECT DISTINCT EntityID FROM dbo.EntityValues) ev;

这可以通过查看执行计划来轻松证明。为了减少重新读取,可以使用一个小的XML技巧:

SELECT ev.EntityID, Values1 = STUFF(REPLACE(CAST([XML].query('for $a in /a return xs:string($a)') AS VARCHAR(100)), ',', ','), 1, 1, ''), Values2 = STUFF(REPLACE(CAST([XML].query('for $b in /b return xs:string($b)') AS VARCHAR(100)), ',', ','), 1, 1, '') FROM (SELECT DISTINCT EntityID FROM dbo.EntityValues) ev CROSS APPLY (SELECT [XML] = CAST((SELECT [a] = ', ' + ev2.Value1, [b] = ', ' + ev2.Value2 FROM dbo.EntityValues ev2 WHERE ev2.EntityID = ev.EntityID FOR XML PATH('')) AS XML) FROM (SELECT DISTINCT EntityID FROM dbo.EntityValues) ev);

但这个查询也不会是最优的,因为多次调用了query方法。另一种方式是使用游标:

IF OBJECT_ID('tempdb.dbo.#EntityValues') IS NOT NULL DROP TABLE #EntityValues; GO SELECT DISTINCT EntityID, Values1 = CAST(NULL AS VARCHAR(100)), Values2 = CAST(NULL AS VARCHAR(100)) INTO #EntityValues FROM dbo.EntityValues; DECLARE @EntityID INT, @Value1 CHAR(1), @Value2 CHAR(1); DECLARE cur CURSOR LOCAL READ_ONLY FAST_FORWARD FOR SELECT EntityID, Value1, Value2 FROM dbo.EntityValues; OPEN cur; FETCH NEXT FROM cur INTO @EntityID, @Value1, @Value2; WHILE @@FETCH_STATUS = 0 BEGIN UPDATE #EntityValues SET Values1 = ISNULL(Values1 + ', ' + @Value1, @Value1), Values2 = ISNULL(Values2 + ', ' + @Value2, @Value2) WHERE EntityID = @EntityID; FETCH NEXT FROM cur INTO @EntityID, @Value1, @Value2; END; CLOSE cur; DEALLOCATE cur; SELECT * FROM #EntityValues; IF OBJECT_ID('tempdb.dbo.#EntityValues') IS NOT NULL DROP TABLE #EntityValues; GO DECLARE @Values1 VARCHAR(100), @Values2 VARCHAR(100); SELECT EntityID, Value1, Value2, RowNum = ROW_NUMBER() OVER (PARTITION BY EntityID ORDER BY 1/0), Values1 = CAST(NULL AS VARCHAR(100)), Values2 = CAST(NULL AS VARCHAR(100)) INTO #EntityValues FROM dbo.EntityValues; UPDATE #EntityValues SET @Values1 = Values1 = CASE WHEN RowNum = 1 THEN Value1 ELSE @Values1 + ', ' + Value1 END, @Values2 = Values2 = CASE WHEN RowNum = 1 THEN Value2 ELSE @Values2 + ', ' + Value2 END; SELECT EntityID, Values1 = MAX(Values1), Values2 = MAX(Values2) FROM #EntityValues GROUP BY EntityID;
沪ICP备2024098111号-1
上海秋旦网络科技中心:上海市奉贤区金大公路8218号1幢 联系电话:17898875485