How to Strip HTML Tags from a String in SQL Server

How to strip HTML tags from a string in SQL Server?

There is a UDF that will do that described here:

User Defined Function to Strip HTML

CREATE FUNCTION [dbo].[udf_StripHTML] (@HTMLText VARCHAR(MAX))
RETURNS VARCHAR(MAX) AS
BEGIN
DECLARE @Start INT
DECLARE @End INT
DECLARE @Length INT
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
WHILE @Start > 0 AND @End > 0 AND @Length > 0
BEGIN
SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'')
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
END
RETURN LTRIM(RTRIM(@HTMLText))
END
GO

Edit: note this is for SQL Server 2005, but if you change the keyword MAX to something like 4000, it will work in SQL Server 2000 as well.

SQL - Remove all HTML tags in a string

DECLARE @x XML = '<html><head></head><body><p>My text.</p></body></html>'

SELECT t.c.value('.', 'NVARCHAR(MAX)')
FROM @x.nodes('*') t(c)

Update - For strings with unclosed tags:

DECLARE @x NVARCHAR(MAX) = '<html><head></head><body><p>My text.<br>More text.</p></body></html>'

SELECT x.value('.', 'NVARCHAR(MAX)')
FROM (
SELECT x = CAST(REPLACE(REPLACE(@x, '>', '/>'), '</', '<') AS XML)
) r

Remove html tags from a column

please try this

DECLARE @txt NVARCHAR(MAX) = '<p>I like it.</p>this is my job.<main>current.</main>'

SELECT x.value('.', 'NVARCHAR(MAX)') FROM ( SELECT x =
CAST(REPLACE(REPLACE(@txt, '>', '/>'), '</', '<') AS XML) ) r

this will help to remove all tags

Remove HTML tags in SQL Server

It is doable without the creation of a Table-Valued Function but is NOT pretty

Cross Apply B will "Clean the String" this may require some maintenance.

Cross Apply C will Split/Parse the String, reduce it down to values between > ... < and then aggregate the string.

You may notice that char(13) and char(10) are NOT removed, but replaced with a space. This is to ensure the replace is not destructive.

Example

Declare @YourTable table (ID int, [Message] varchar(max))
Insert Into @YourTable values
(1,'<br /><h1><span style="font-size: x-small"><span style="font-family: Verdana"><u><i><b>test</b></i></u></span></span><u><i><b><br /></b></i></u></h1><div contenteditable="false"><a href="dsi://OpenPatientFolder.671">George C Test2</a> </div><br /><br />')
,(2,'<div><h1>This is the Title</hr><br >The content<br >Extra Spaces will be trapped. The End</div>')
,(3,'No HTML Tags')

Select A.id
,NewValue = ltrim(rtrim(replace(replace(replace(TxtOnly,' ','†‡'),'‡†',''),'†‡',' ')))
From @YourTable A
Cross Apply ( values ( replace(
replace(
replace('<fake>'+[Message]+'</fake>'
,char(13),' ')
,char(10),' ')
,' ',' ') -- May want to add items like — as - and ™ as TM etc...
)
) B(CleanString)
Cross Apply (
Select TxtOnly = stuff((Select ' ' +RetVal
From (
Select RetSeq
,RetVal = left(RetVal,charindex('<',RetVal+RetVal)-1)
From (
Select RetSeq = row_number() over (order by 1/0)
,RetVal = ltrim(rtrim(B.i.value('(./text())[1]', 'varchar(max)')))
From ( values (cast('<x>' + replace((Select replace(CleanString,'>','§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml).query('.'))) as A(x)
Cross Apply x.nodes('x') AS B(i)
) C1
Where charindex('<',RetVal)>1
) C2
Order By RetSeq
For XML Path(''),TYPE).value('(./text())[1]','varchar(max)')
,1,1,'')
) C

Returns

id  NewValue
1 test George C Test2
2 This is the Title Extra Spaces will be trapped. The End
3 No HTML Tags

How do I remove html tags in one of the column upon SELECT in sql server 2008?

First create a user defined function (UDF) like this:

CREATE FUNCTION [dbo].[udf_StripHTML] (@HTMLText VARCHAR(MAX))
RETURNS VARCHAR(MAX) AS
BEGIN
DECLARE @Start INT
DECLARE @End INT
DECLARE @Length INT
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
WHILE @Start > 0 AND @End > 0 AND @Length > 0
BEGIN
SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'')
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
END
RETURN LTRIM(RTRIM(@HTMLText))
END
GO

This is taken from Best way to strip html tags from a string in sql server?.

Then, run your SQL statement like this:

SELECT  DISTINCT 
[column1],
[column2],
[column3],
[column4],
[column5],
[column6],
udf_StripHTML([column7]) as col7, -- this is the column containing HTML you want to remove
[column8],
[column9]
FROM SAMPLETABLE

I don't have access to SQL Server right now so you'll just have to try it out.

how to strip all html tags and special characters from string using sql server

The best I can suggest is to use a .net HTML parser or such which is wrapped in a SQL CLR function. Or to wrap the regex in SQL CLR if you want.

Note regex limitations: http://www.codinghorror.com/blog/2008/06/regular-expressions-now-you-have-two-problems.html

Raw SQL language won't do it: it is not a string (or HTML) processing language

Regex to remove HTML Tags, empty lines and blank spaces in sql query

Not all can be combined.

But some can, via a regex OR |

And better replace those first then.

As removing them could cause extra empty lines.

SELECT
REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(DBMS_LOB.SUBSTR(feedback, 4000)
, '( )|(<[/[:alpha:]]+>)')
, '[[:space:]]+$','',1,0,'m')
, '(['||chr(13)||']?['||chr(10)||']){2,}','\1') AS feedback
FROM your_table
ORDER BY feedback DESC
OFFSET 0 ROWS FETCH NEXT 1 ROWS ONLY

A SQL Query to select a string between two html tags

SELECT SUBSTRING(ColumnName,CHARINDEX('html_tag',ColumnName)+LEN('html_tag'),CHARINDEX('html_close_tag',ColumnName)-LEN('html_close_tag')) FROM TableName

please replace html_tag,html_close_tag with your tags.



Related Topics



Leave a reply



Submit