How to strip HTML tags from a string in SQL Server?
There is a UDF that will do that described here:
User Defined Function to Strip HTML
CREATE FUNCTION [dbo].[udf_StripHTML] (@HTMLText VARCHAR(MAX))
RETURNS VARCHAR(MAX) AS
BEGIN
DECLARE @Start INT
DECLARE @End INT
DECLARE @Length INT
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
WHILE @Start > 0 AND @End > 0 AND @Length > 0
BEGIN
SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'')
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
END
RETURN LTRIM(RTRIM(@HTMLText))
END
GO
Edit: note this is for SQL Server 2005, but if you change the keyword MAX to something like 4000, it will work in SQL Server 2000 as well.
SQL - Remove all HTML tags in a string
DECLARE @x XML = '<html><head></head><body><p>My text.</p></body></html>'
SELECT t.c.value('.', 'NVARCHAR(MAX)')
FROM @x.nodes('*') t(c)
Update - For strings with unclosed tags:
DECLARE @x NVARCHAR(MAX) = '<html><head></head><body><p>My text.<br>More text.</p></body></html>'
SELECT x.value('.', 'NVARCHAR(MAX)')
FROM (
SELECT x = CAST(REPLACE(REPLACE(@x, '>', '/>'), '</', '<') AS XML)
) r
Remove html tags from a column
please try this
DECLARE @txt NVARCHAR(MAX) = '<p>I like it.</p>this is my job.<main>current.</main>'
SELECT x.value('.', 'NVARCHAR(MAX)') FROM ( SELECT x =
CAST(REPLACE(REPLACE(@txt, '>', '/>'), '</', '<') AS XML) ) r
this will help to remove all tags
Remove HTML tags in SQL Server
It is doable without the creation of a Table-Valued Function but is NOT pretty
Cross Apply B will "Clean the String" this may require some maintenance.
Cross Apply C will Split/Parse the String, reduce it down to values between > ... <
and then aggregate the string.
You may notice that char(13) and char(10) are NOT removed, but replaced with a space. This is to ensure the replace is not destructive.
Example
Declare @YourTable table (ID int, [Message] varchar(max))
Insert Into @YourTable values
(1,'<br /><h1><span style="font-size: x-small"><span style="font-family: Verdana"><u><i><b>test</b></i></u></span></span><u><i><b><br /></b></i></u></h1><div contenteditable="false"><a href="dsi://OpenPatientFolder.671">George C Test2</a> </div><br /><br />')
,(2,'<div><h1>This is the Title</hr><br >The content<br >Extra Spaces will be trapped. The End</div>')
,(3,'No HTML Tags')
Select A.id
,NewValue = ltrim(rtrim(replace(replace(replace(TxtOnly,' ','†‡'),'‡†',''),'†‡',' ')))
From @YourTable A
Cross Apply ( values ( replace(
replace(
replace('<fake>'+[Message]+'</fake>'
,char(13),' ')
,char(10),' ')
,' ',' ') -- May want to add items like — as - and ™ as TM etc...
)
) B(CleanString)
Cross Apply (
Select TxtOnly = stuff((Select ' ' +RetVal
From (
Select RetSeq
,RetVal = left(RetVal,charindex('<',RetVal+RetVal)-1)
From (
Select RetSeq = row_number() over (order by 1/0)
,RetVal = ltrim(rtrim(B.i.value('(./text())[1]', 'varchar(max)')))
From ( values (cast('<x>' + replace((Select replace(CleanString,'>','§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml).query('.'))) as A(x)
Cross Apply x.nodes('x') AS B(i)
) C1
Where charindex('<',RetVal)>1
) C2
Order By RetSeq
For XML Path(''),TYPE).value('(./text())[1]','varchar(max)')
,1,1,'')
) C
Returns
id NewValue
1 test George C Test2
2 This is the Title Extra Spaces will be trapped. The End
3 No HTML Tags
How do I remove html tags in one of the column upon SELECT in sql server 2008?
First create a user defined function (UDF) like this:
CREATE FUNCTION [dbo].[udf_StripHTML] (@HTMLText VARCHAR(MAX))
RETURNS VARCHAR(MAX) AS
BEGIN
DECLARE @Start INT
DECLARE @End INT
DECLARE @Length INT
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
WHILE @Start > 0 AND @End > 0 AND @Length > 0
BEGIN
SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'')
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
END
RETURN LTRIM(RTRIM(@HTMLText))
END
GO
This is taken from Best way to strip html tags from a string in sql server?.
Then, run your SQL statement like this:
SELECT DISTINCT
[column1],
[column2],
[column3],
[column4],
[column5],
[column6],
udf_StripHTML([column7]) as col7, -- this is the column containing HTML you want to remove
[column8],
[column9]
FROM SAMPLETABLE
I don't have access to SQL Server right now so you'll just have to try it out.
how to strip all html tags and special characters from string using sql server
The best I can suggest is to use a .net HTML parser or such which is wrapped in a SQL CLR function. Or to wrap the regex in SQL CLR if you want.
Note regex limitations: http://www.codinghorror.com/blog/2008/06/regular-expressions-now-you-have-two-problems.html
Raw SQL language won't do it: it is not a string (or HTML) processing language
Regex to remove HTML Tags, empty lines and blank spaces in sql query
Not all can be combined.
But some can, via a regex OR |
And better replace those first then.
As removing them could cause extra empty lines.
SELECT
REGEXP_REPLACE(
REGEXP_REPLACE(
REGEXP_REPLACE(DBMS_LOB.SUBSTR(feedback, 4000)
, '( )|(<[/[:alpha:]]+>)')
, '[[:space:]]+$','',1,0,'m')
, '(['||chr(13)||']?['||chr(10)||']){2,}','\1') AS feedback
FROM your_table
ORDER BY feedback DESC
OFFSET 0 ROWS FETCH NEXT 1 ROWS ONLY
A SQL Query to select a string between two html tags
SELECT SUBSTRING(ColumnName,CHARINDEX('html_tag',ColumnName)+LEN('html_tag'),CHARINDEX('html_close_tag',ColumnName)-LEN('html_close_tag')) FROM TableName
please replace html_tag,html_close_tag with your tags.
Related Topics
With Css, Use "..." For Overflowed Block of Multi-Lines
Are There Any Style Options For the Html5 Date Picker
Why Does This Inline-Block Element Have Content That Is Not Vertically Aligned
How Do Search Engines Deal With Angularjs Applications
Scraping HTML Tables into R Data Frames Using the Xml Package
Do Checkbox Inputs Only Post Data If They'Re Checked
Height of an HTML Select Box (Dropdown)
How to Set Subject/Content of Email Using Mailto:
How to Write a CSS Selector Selecting Elements Not Having a Certain Class or Attribute
A Space Between Inline-Block List Items
Css Display an Image Resized and Cropped
Are Iframes Considered 'Bad Practice'
How to Center Elements on the Last Row in CSS Grid
Is It Wrong to Change a Block Element to Inline With CSS If It Contains Another Block Element