Performance of String Comparison VS Int Join in SQL

Compared with the other operations the query performs, there is unlikely to be much performance difference between the two approaches. If you have only a handful of colors (up to a few hundred), the color table fits on a single page in most databases. An index on the color would make lookups quite fast and incur no I/O activity (after the first run loads the page).

A string comparison depends on the database, but it does involve a function and reading the data from the page. So, it is not free. Different databases, of course, might have different performance characteristics for a string function.

Where the value is stored should depend on your application. Say you have an application where the color is going to be presented to the user. You might, one day, want to show the name of the color in Spanish, Swahili, or Chinese. If so, having a separate table makes such internationalization much easier. More prosaically, you might want to prevent "Grene" from being entered; if so, having such a table makes a selection list easier.
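A minimal sketch of the lookup-table idea, run through Python's built-in sqlite3 module so it is self-contained. The table and column names (color, product, name_en, name_es) are hypothetical, and other database engines will differ in syntax details. It shows a foreign key rejecting a color that is not in the list, and a translated name being fetched with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: one row per color, with one column per display language.
conn.executescript("""
CREATE TABLE color (
    color_id INTEGER PRIMARY KEY,
    name_en  TEXT NOT NULL UNIQUE,
    name_es  TEXT NOT NULL
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    label      TEXT NOT NULL,
    color_id   INTEGER NOT NULL REFERENCES color(color_id)
);
INSERT INTO color (color_id, name_en, name_es) VALUES
    (1, 'Green', 'Verde'),
    (2, 'Red',   'Rojo');
INSERT INTO product (label, color_id) VALUES ('mug', 1);
""")

# Showing the color in another language is a join, not a data migration.
spanish = conn.execute(
    "SELECT p.label, c.name_es "
    "FROM product p JOIN color c ON c.color_id = p.color_id"
).fetchall()

# A color id that is missing from the lookup table is rejected by the foreign key.
try:
    conn.execute("INSERT INTO product (label, color_id) VALUES ('cup', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(spanish, rejected)
```

The same list of valid colors can feed a selection list in the UI, which is what keeps a misspelling such as "Grene" out of the data in the first place.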

On the other hand, if performance is your only concern, it doesn't make a difference. In other cases, it is actually possible for a lookup table to be faster than a denormalized table. This occurs when the strings are long, increasing the length of every record in a larger table. Larger tables mean more pages, which take longer to load into memory.

Performance Difference in comparing integers and comparing strings

Interesting question. I had always taken it as read that integers were quicker, but never actually tested it. I took 1M random surnames and contact types from a list of contacts in my data and loaded them into a scratch database with no indices or primary key, just raw data. I did not measure the range of values in either column, and the data has not been standardised, so it reflects the reality of my database rather than a pure statistical set.

select top 100 * from tblScratch where contactsurname = '<TestSurname>' order by NEWID()
select top 100 * from tblScratch where contacttyperef = <TestTypeRef> order by NEWID() -- <TestTypeRef>: a type reference between 1 and 22

The NEWID() is there to randomise the returned rows each time. I quickly ran this for 20 surnames and 20 types, alternating the queries: surname, then ref, then surname. Searching by reference number was almost 4x quicker and used about half the CPU, so the books were right all those years ago.

String -

SELECT TOP 100 * FROM tblScratch WHERE contactsurname = 'hoare' ORDER BY NEWID()

Duration 430ms
Reads 902
CPU 203

Integer -

SELECT TOP 100 * FROM tblScratch WHERE contacttyperef = 3 ORDER BY NEWID()

Duration 136ms
Reads 902
CPU 79
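A rough way to repeat this kind of measurement on your own machine, sketched with Python's built-in sqlite3 module rather than SQL Server. The table name scratch, the generated surnames, and the row count are all assumptions made for illustration, and the timings depend heavily on engine, hardware, and data, so the printed numbers will not match the figures above.

```python
import random
import sqlite3
import time

random.seed(0)
# Made-up data: 200 distinct surnames and type references 1 to 22.
surnames = [f"surname{i:03d}" for i in range(200)]
rows = [(random.choice(surnames), random.randint(1, 22)) for _ in range(100_000)]

conn = sqlite3.connect(":memory:")
# No indexes and no primary key, so every filter is a full table scan.
conn.execute("CREATE TABLE scratch (contactsurname TEXT, contacttyperef INTEGER)")
conn.executemany("INSERT INTO scratch VALUES (?, ?)", rows)


def timed(sql, param, repeats=5):
    """Return (best wall-clock seconds, matching row count) for one filter query."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        count = conn.execute(sql, (param,)).fetchone()[0]
        best = min(best, time.perf_counter() - start)
    return best, count


str_time, str_count = timed(
    "SELECT COUNT(*) FROM scratch WHERE contactsurname = ?", "surname042")
int_time, int_count = timed(
    "SELECT COUNT(*) FROM scratch WHERE contacttyperef = ?", 3)

print(f"string filter: {str_time * 1000:.2f} ms, {str_count} rows")
print(f"int filter:    {int_time * 1000:.2f} ms, {int_count} rows")
```

Taking the best of several repeats reduces noise from caching and other processes, but it is still only an indication for this one engine and data set.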

SQL - performance in varchar vs. int

Should I create a new column with a numeric data type in both tables and join on it to reduce the time taken by the SQL query?

If you're in a position where you can change the design of the database with ease then yes, your primary key should be an integer. Unless there is a really good reason to have an FK as a varchar, foreign keys should be integers as well.

If you can't change the PK or FK fields, then make sure they're indexed properly. This will eventually become a bottleneck though.
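One way to check whether a varchar key column is "indexed properly" is to look at the query plan before and after adding the index. The sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN. The orders table, the customer_code column, and the index name are hypothetical, and other engines expose plans through their own commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table whose foreign key column is a string and cannot be changed.
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_code TEXT, amount REAL)"
)


def plan(sql):
    """Return the human-readable detail strings from EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT * FROM orders WHERE customer_code = 'C042'"

before = plan(query)  # no index on customer_code yet: a full table scan
conn.execute("CREATE INDEX idx_orders_customer_code ON orders(customer_code)")
after = plan(query)   # the planner can now search the index instead

print("before:", before)
print("after: ", after)
```

If the plan still shows a scan after the index exists, the predicate or join condition is probably written in a way the index cannot serve.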

Is it better to have int joins instead of string columns?

On most systems it makes little or no difference to performance. Personally I'd use a short string for clarity and join that to a table with more detail as you suggest.

create table intLookup
(
pk integer primary key,
value varchar(20) not null
)
insert into intLookup (pk, value) values
(1,'value 1'),
(2,'value 2'),
(3,'value 3'),
(4,'value 4')

create table stringLookup
(
pk varchar(4) primary key,
value varchar(20) not null
)

insert into stringLookup (pk, value) values
('1','value 1'),
('2','value 2'),
('3','value 3'),
('4','value 4')

create table masterData
(
stuff varchar(50),
fkInt integer references intLookup(pk),
fkString varchar(4)references stringLookup(pk)
)
create index i on masterData(fkInt)
create index s on masterData(fkString)

insert into masterData
(stuff, fkInt, fkString)
select COLUMN_NAME, (ORDINAL_POSITION %4)+1,(ORDINAL_POSITION %4)+1 from INFORMATION_SCHEMA.COLUMNS
go 1000

This results in 300K rows.

select 
*
from masterData m inner join intLookup i on m.fkInt=i.pk

select
*
from masterData m inner join stringLookup s on m.fkString=s.pk

On my system (SQL Server):
- the query plans, I/O and CPU are identical
- execution times are identical
- the lookup table is read and processed once (in either query)

There is NO difference using an int or a string.

In MySQL, is it faster to compare with integer or string of integer?

MySQL ultimately runs on some processor, and in general an integer comparison can be done in a single CPU cycle, while string comparisons will generally take multiple cycles, perhaps one cycle per character. See Why is integer comparison faster then string comparison? for more information.

SQL SELECT speed int vs varchar

Int comparisons are faster than varchar comparisons, for the simple fact that ints take up much less space than varchars.

This holds true both for unindexed and indexed access. The fastest way to go is an indexed int column.


As I see you've tagged the question postgresql, you might be interested in the space usage of different data types:

  • int fields occupy between 2 and 8 bytes, with 4 being usually more than enough ( -2147483648 to +2147483647 )
  • character types occupy the actual string plus a length header (1 byte for short strings of up to 126 bytes, 4 bytes for longer ones).

Performance gain by converting to INTEGER type in SQL for JOIN?

You should convert the string representation of a number to a number. Your references are not appropriate for two reasons:

  1. They seem to be more centered on MySQL (although that doesn't matter).
  2. They talk about primary keys, rather than joins.

I know of no reference that is going to say that having joins with different types is a good idea. There might be some situations where it doesn't matter, but you should settle on a single type and a number is better than a string:

  • A number is fixed in size. Strings vary in size. This adds a wee bit of overhead to indexes and comparison operations. Not a big deal, but stick with the better one if you have a choice.
  • The mixing of data types can preclude the use of indexes.
  • The mixing of data types requires conversion operations for each comparison.
  • The optimizer statistics for numbers and strings may not be directly comparable (depends on the optimizer).
  • You cannot declare foreign key relationships if the types are not the same.
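The point about conversions and indexes can be illustrated with a small sketch using Python's built-in sqlite3 module. The item table and its index are hypothetical, and SQLite's type-affinity rules differ from those of SQL Server or MySQL, so treat this as an illustration of the general effect and not as a description of any particular engine: once the indexed column has to be converted before it is compared, the plain index on that column can no longer be used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: a numeric-looking code stored as text, with an index on it.
conn.executescript("""
CREATE TABLE item (item_code TEXT, label TEXT);
CREATE INDEX idx_item_code ON item(item_code);
""")


def plan(sql):
    """Return the human-readable detail strings from EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Same type on both sides: the index on item_code can be searched.
same_type = plan("SELECT label FROM item WHERE item_code = '42'")

# Converting the column for every row: the index on item_code no longer applies.
converted = plan("SELECT label FROM item WHERE CAST(item_code AS INTEGER) = 42")

print("same type:", same_type)
print("converted:", converted)
```

Storing the value with one type on both sides of the join avoids the conversion entirely, which is the recommendation in the list above.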

So, stick with the same types. That is most important. Integer is marginally better than string, so use that.

Will joining on integer be quicker than joining on nvarchar?

Please take a look at the following post.

JOIN ON varchar VS join on int

Basically the int is the best option.


