Mssql Cast( [Varcharcolumn] to Int) in Select Gets Executed Before Where Clause Filters Out Bad Values

MSSQL cast( [varcharColumn] to int) in SELECT gets executed before WHERE clause filters out bad values

First, this is not a "glaring design issue". SQL is a descriptive language of the output, not a procedural language that specifies how prcoessing is being done. There is no guarantee of the order of processing, in general, and this is an advantage. I might say there is a design issue, but it is around the general handling of exceptions in SQL statements.

According to SQL Server documentation (http://msdn.microsoft.com/en-us/library/ms181765.aspx), you can depend on the order of evauation for a CASE statement for scalar expressions. So, the following should work:

select (case when isnumeric(c.varcharColumn) = 1 then cast(c.varcharColumn as int) end)

Or, to get closer to an "int" expression:

select (case when isnumeric(c.varcharColumn) = 1 and c.varcharColumn not like '%.%' and c.varcharColumn not like '%e%'
then cast(c.varcharColumn as int)
end)

At least your code is doing an explicit CAST. This situation is much nastier when the casts are implicit (and there are hundreds of columns).

Why is it that a change in query plan suddenly seems to break a query

Because filters and expressions may be evaluated in a different order (e.g. before or after values that break get filtered away). Think about it more simply, imagine you are a barber with a lobby full of customers:

  • cut one inch of hair from every customer
  • filter out the bald guys

vs.

  • filter out the bald guys
  • cut one inch of hair from the remaining customers

A simpler example in SQL Server is:

  • get the month name from a string
  • filter out the strings that aren't valid dates

vs.

  • filter out the strings that aren't valid dates
  • get the month name from the remaining strings which are valid dates

One way you get around this in SQL Server is to use a CASE expression to only evaluate rows that don't break the expression (or, in this example, to alter the expression so it doesn't break):

substring(ext_inv_ref, 1, len(ext_inv_ref) - CASE 
WHEN len(ext_inv_ref) >= 3 THEN 3 ELSE 0 END)

In plain English this just says remove the right-most 3 characters but only if the string is already at least 3 characters long.

Or you can use COALESCE/NULLIF or a variety of other methods depending on whether you want an empty string or NULL when the input is < 3 characters.

Some other questions for more insight:

  • Understanding why my CAST to INT is not working
  • Why would YEAR fail with a conversion error from a Date?
  • MSSQL cast( [varcharColumn] to int) in SELECT gets executed before WHERE clause filters out bad values
  • Conversion failed when converting date and/or time from character string. SQL query
  • invalid length parameter passed to the left or substring function - Error is not happening consistently for the same data

CAST to int used with LEFT on an nvarchar field not working properly?

I see it now.

It's seeing 'A/P in' in your values. It can't CAST the '/' character to INT

MSSQL cast( [varcharColumn] to int) in SELECT gets executed before WHERE clause filters out bad values

In short you will want to do this:

select (case when isnumeric(TO.LineMemo) = 1 then cast(TO.LineMemo as int) end)

Why would YEAR fail with a conversion error from a Date?

I assume that RValues is a string column of some type, for some reason. You should fix that and store date data using a date data type (obviously in a separate column than this mixed bag).

If you can't fix that, then you can prevent what Damien described above by:

CASE WHEN ISDATE(RValues) = 1 THEN CONVERT(Date, RValues) END AS FechaFirma 

(Which will make the "date" NULL if SQL Server can't figure out how to convert it to a date.)

You can't prevent this simply by adding a WHERE clause, because SQL Server will often try to attempt the conversion in the SELECT list before performing the filter (all depends on the plan). You also can't force the order of operations by using a subquery, CTE, join order hints, etc. There is an open Connect item about this issue - they are "aware of it" and "hope to address it in a future version."

Short of a CASE expression, which forces SQL Server to evaluate the ISDATE() result before attempting to convert (as long as no aggregates are present in any of the branches), you could:

  • dump the filtered results into a #temp table, and then subsequently select from that #temp table, and only apply the convert then.
  • just return the string, and treat it as a date on the client, and pull YEAR/MONTH etc. parts out of it there
  • just use string manipulation to pull YEAR = LEFT(col,4) etc.
  • use TRY_CONVERT() since I just noticed you're on SQL Server 2012:

    TRY_CONVERT(DATE, RValues) AS FechaFirma

Conversion to datetime fails only on WHERE clause?

If the conversion is in the WHERE clause it may be evaluated for many more records (values) than it would be if it appears in the projection list. I have talked about this before in different context, see T-SQL functions do no imply a certain order of execution and On SQL Server boolean operator short-circuit. Your case is even simpler, but is similar, and ultimately the root cause is the same: do not an assume an imperative execution order when dealing with a declarative language like SQL.

Your best solution, by a far and a large margin, is to sanitize the data and change the column type to a DATETIME or DATETIME2 type. All other workarounds will have one shortcoming or another, so you may be better to just do the right thing.

Update

After a closer look (sorry, I'm @VLDB and only peeking SO between sessions) I realize you have an EAV store with inherent type-free semantics (the attribute_value can bea string, a date, an int etc). My opinion is that your best bet is to use sql_variant in storage and all the way up to the client (ie. project sql_variant). You can coherce the type in the client, all client APIs have methods to extract the inner type from a sql_variant, see Using sql_variant Data (well, almost all client APIs... Using the sql_variant datatype in CLR). With sql_variant you can store multiple types w/o the problems of going through a string representations, you can use SQL_VARIANT_PROPERTY to inspect things like the BaseType in the stored values, and you can even do thinks like check constraints to enforce data type correctness.

SQL Server - Compare ordered columns from multiple tables

As long as item names are consistent between years, you just need to join to the other tables

with counts as (
select e.ItemName, e.ItemCount, row_number() over (partition by e.ItemName order by cast(e.ItemCount as int) desc) as rk
from Table2015 e where e.ItemCount <> 'X')

select s.ItemName
, s.ItemCount
, CASE WHEN CAST(t15.ItemCount AS INT) > ISNULL(CAST(t14.ItemCount AS INT), 0) THEN 1 ELSE 0 END AS GreaterThan2014
, CASE WHEN CAST(t15.ItemCount AS INT) > ISNULL(CAST(t13.ItemCount AS INT), 0) THEN 1 ELSE 0 END AS GreaterThan2013
from counts s
inner join counts t15 ON s.ItemName = t15.ItemName and t15.rk = 1
left join (
select ItemName, MAX(CASE WHEN IsNumeric(ItemCount) = 1 THEN CAST(ItemCount AS INT) ELSE -1 END)
from Table2014
where ItemCount <> 'X'
group by ItemName
) t14 on s.ItemName = t14.ItemName
left join (
select ItemName, MAX(CASE WHEN IsNumeric(ItemCount) = 1 THEN CAST(ItemCount AS INT) ELSE -1 END)
from Table2013
where ItemCount <> 'X'
group by ItemName
) t13 on s.ItemName = t13.ItemName

where s.rk<4
order by s.ItemName,s.rk;

Also, you really shouldn't have an 'X' as a possibility in a count field. If you're counting something it should be typed as an INT.



Related Topics



Leave a reply



Submit