SQL Indexing on Varchar

Index on a Varchar?

  1. Yes, absolutely! Go right ahead and add an index. Clustering the index is probably unnecessary here, and will not be possible anyway if you already have another clustered index (such as the primary key) on the table.

  2. Changing the column to a CHAR(10) might save some storage if the values are always (or nearly always) 10 characters long, but it's unlikely to make a particularly great difference in index performance. I'd skip it for now.
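A rough way to see the first point in action is to ask a query planner whether it uses the new index. This minimal sketch uses Python's built-in sqlite3 module as a stand-in engine. That is an assumption, since the question doesn't say which database is involved. The table, column, and index names are hypothetical.

```python
import sqlite3

def plan(conn, sql):
    """Return the query planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # the last column holds the detail text

conn = sqlite3.connect(":memory:")
# Hypothetical table with a short VARCHAR code column.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, code VARCHAR(10), name TEXT)")
query = "SELECT name FROM customers WHERE code = 'AB12345678'"

before = plan(conn, query)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX ix_customers_code ON customers (code)")  # plain non-clustered index
after = plan(conn, query)   # now the index can be searched directly

print(before)
print(after)
```

The exact wording of the plan text differs between SQLite versions, but the change from a SCAN to a SEARCH using the index is the part that matters.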

SQL indexing on varchar

Keys on VARCHAR columns can be very long, which results in fewer records per page and more depth (more levels in the B-tree). Longer index keys also increase the cache miss ratio.
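Here is a back-of-the-envelope sketch of that effect. Purely for illustration, it assumes 8 KB pages completely filled with (key, 8-byte pointer) entries, and a table of 10 million rows. Real engines add page headers, row overhead, fill factors, and sometimes prefix compression, so only the trend is meaningful, not the exact numbers.

```python
import math

def btree_levels(rows, key_bytes, page_bytes=8192, pointer_bytes=8):
    """Rough B-tree depth, assuming every page is completely full of (key, pointer) entries."""
    fanout = page_bytes // (key_bytes + pointer_bytes)  # entries that fit on one page
    pages = math.ceil(rows / fanout)                     # pages needed at the leaf level
    levels = 1
    while pages > 1:                                     # stack parent levels until one root page remains
        pages = math.ceil(pages / fanout)
        levels += 1
    return fanout, levels

for key_bytes in (4, 16, 200):
    fanout, levels = btree_levels(10_000_000, key_bytes)
    print(f"{key_bytes:>3}-byte key: {fanout} entries/page, {levels} levels")
# 4-byte key: 682 entries/page, 3 levels
# 16-byte key: 341 entries/page, 3 levels
# 200-byte key: 39 entries/page, 5 levels
```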

How many strings on average map to each integer?

If there are relatively few, you can create an index only on the integer column and let PostgreSQL do the fine filtering on the matching records:

CREATE INDEX ix_mytable_assoc ON mytable (assoc);

SELECT floatval
FROM mytable
WHERE assoc = givenint       -- givenint / givenstring stand for the values you are looking up
AND phrase = givenstring;
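The same idea can be sketched against SQLite through Python's sqlite3 module rather than PostgreSQL. This is an assumption made so the example is self-contained, and the sample rows are invented. The planner searches ix_mytable_assoc, then checks phrase only on the few rows that match the integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (assoc INTEGER, phrase VARCHAR(200), floatval REAL)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "red apple", 0.5), (1, "green apple", 0.7), (2, "red apple", 0.9)],  # invented sample rows
)
conn.execute("CREATE INDEX ix_mytable_assoc ON mytable (assoc)")  # index on the integer only

sql = "SELECT floatval FROM mytable WHERE assoc = ? AND phrase = ?"
params = (1, "green apple")

# The plan should show an index search on assoc; phrase is then checked row by row.
detail = " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))
result = conn.execute(sql, params).fetchall()
print(detail)
print(result)  # [(0.7,)]
```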

You can also consider creating the index on the string hashes:

CREATE INDEX ix_mytable_md5 ON mytable (DECODE(MD5(phrase), 'HEX'));

SELECT floatval
FROM mytable
WHERE DECODE(MD5(phrase), 'HEX') = DECODE(MD5(givenstring), 'HEX')
AND phrase = givenstring; -- recheck the full string in case of an MD5 collision

Each hash is only 16 bytes long, so the index keys will be much shorter while preserving selectivity almost perfectly.

Does an index on VARCHAR make a performance difference?

Does an index on a varchar column make the query run slower?

No, it does not.

If the optimizer decides to use the index, the query will run faster. INSERTs, UPDATEs, and DELETEs on that table will be slightly slower, because the index has to be maintained along with the table, but probably not enough to notice.

I don't need to do the LIKE % comparison

Be aware that using:

LIKE '%whatever%'

...cannot use an index to narrow the search, but the following can:

LIKE 'whatever%'

The key point is that a wildcard at the start (left-hand side) of the pattern means the index on the column can't be used to jump to the matching rows.
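A quick way to check this claim is to ask the planner. The sketch below uses SQLite through Python's sqlite3 module. That is an assumption, and other engines have their own rules. In SQLite the prefix optimization also depends on collation: with the default case-insensitive LIKE, the column must be indexed with COLLATE NOCASE, so the hypothetical table declares it that way.

```python
import sqlite3

def plan(conn, sql):
    """Return the planner's detail text for `sql`."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

conn = sqlite3.connect(":memory:")
# COLLATE NOCASE matches SQLite's default case-insensitive LIKE, which its LIKE optimization requires.
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT COLLATE NOCASE, body TEXT)")
conn.execute("CREATE INDEX ix_articles_title ON articles (title)")

prefix_plan = plan(conn, "SELECT body FROM articles WHERE title LIKE 'whatever%'")
infix_plan = plan(conn, "SELECT body FROM articles WHERE title LIKE '%whatever%'")
print(prefix_plan)  # a range search on ix_articles_title
print(infix_plan)   # a full scan: there is no starting point in the sorted index
```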

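Also note that MySQL limits the length of index keys: up to 1000 bytes for MyISAM tables, and 767 bytes for InnoDB tables using the COMPACT or REDUNDANT row format (the DYNAMIC and COMPRESSED row formats raise the InnoDB limit to 3072 bytes with the default 16KB page size).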
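Because these limits are in bytes, the number of characters you can index depends on the character set. Here is a small sketch of the arithmetic. It assumes the worst case MySQL reserves per character: 3 bytes for utf8 (utf8mb3) and 4 for utf8mb4. It also assumes the MyISAM 1000-byte and InnoDB 767-byte limits quoted above. The 767 / 4 = 191 result is why VARCHAR(191) shows up so often in utf8mb4 schemas.

```python
def max_indexable_chars(limit_bytes, max_bytes_per_char):
    """Longest VARCHAR prefix, in characters, that fits in an index key of `limit_bytes`."""
    return limit_bytes // max_bytes_per_char

for engine, limit in [("MyISAM", 1000), ("InnoDB (767)", 767)]:
    for charset, width in [("utf8mb3", 3), ("utf8mb4", 4)]:
        print(f"{engine:>12} {charset}: {max_indexable_chars(limit, width)} chars")
# MyISAM: 333 / 250 chars; InnoDB (767): 255 / 191 chars
```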

Indexing VARCHAR in MySQL

Indexes on VARCHAR() columns are indeed slightly less efficient than indexes on fixed-length fields like INT or BIGINT, but not significantly so.

The only conceivable situation where you would want to use a second table containing a numbered list of text strings is this: the number of distinct text strings in your application is much smaller than the number of rows in your tables. Why might that be true? For example, the text strings might be words in a so-called "controlled vocabulary." Music tracks, for instance, have a genre like "rock", "classical", or "hiphop". It's pointless to allow arbitrary genres like "southern california alt-surf-rock" in such an application.
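For the controlled-vocabulary case, the second table is just an ordinary lookup table. Here is a minimal sketch in SQLite via Python. The engine choice is an assumption, and the table and column names are hypothetical. Genres are stored once, tracks reference them by integer id, a join brings the name back, and a foreign key rejects genres outside the vocabulary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE genres (genre_id INTEGER PRIMARY KEY, name VARCHAR(50) NOT NULL UNIQUE)")
conn.execute("""CREATE TABLE tracks (
    track_id INTEGER PRIMARY KEY,
    title    VARCHAR(200) NOT NULL,
    genre_id INTEGER NOT NULL REFERENCES genres (genre_id))""")

conn.executemany("INSERT INTO genres (name) VALUES (?)", [("rock",), ("classical",), ("hiphop",)])
conn.executemany(
    "INSERT INTO tracks (title, genre_id) SELECT ?, genre_id FROM genres WHERE name = ?",
    [("Track A", "rock"), ("Track B", "classical"), ("Track C", "rock")],  # id looked up by name
)

rock_titles = [row[0] for row in conn.execute(
    "SELECT t.title FROM tracks AS t JOIN genres AS g ON g.genre_id = t.genre_id "
    "WHERE g.name = ? ORDER BY t.title", ("rock",))]

try:
    conn.execute("INSERT INTO tracks (title, genre_id) VALUES ('Track D', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # 99 is not a known genre, so the foreign key refuses the row

print(rock_titles, rejected)  # ['Track A', 'Track C'] True
```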

Don't overthink this. Keep in mind that database server developers have spent a great deal of time optimizing the performance of their indexes. It's very unlikely that you can do better than they have, especially if you have to introduce extra tables and constraints to your system.

Put indexes on your VARCHAR() columns as needed.

(Another factor: collations get baked into indexes on VARCHAR() columns. If you build a custom indexing scheme like the one you propose, you have to deal with that complexity in your code. It's a notorious pain in the neck.)
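To see why collation matters, note that the same strings sort and compare differently under different collations, and an index stores them in exactly one of those orders. Here is a small sketch using SQLite's built-in BINARY and NOCASE collations via Python. This is an assumption, since other engines offer far richer collation sets. The table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genres_bin (name TEXT COLLATE BINARY)")  # byte-wise, case-sensitive
conn.execute("CREATE TABLE genres_ci (name TEXT COLLATE NOCASE)")   # ASCII case-insensitive
for table in ("genres_bin", "genres_ci"):  # fixed table names, safe to format into SQL
    conn.execute(f"CREATE INDEX ix_{table}_name ON {table} (name)")  # index inherits the column collation
    conn.executemany(f"INSERT INTO {table} VALUES (?)", [("rock",), ("Jazz",), ("blues",)])

binary_order = [r[0] for r in conn.execute("SELECT name FROM genres_bin ORDER BY name")]
nocase_order = [r[0] for r in conn.execute("SELECT name FROM genres_ci ORDER BY name")]
binary_hits = conn.execute("SELECT count(*) FROM genres_bin WHERE name = 'ROCK'").fetchone()[0]
nocase_hits = conn.execute("SELECT count(*) FROM genres_ci WHERE name = 'ROCK'").fetchone()[0]

print(binary_order, nocase_order)  # ['Jazz', 'blues', 'rock'] ['blues', 'Jazz', 'rock']
print(binary_hits, nocase_hits)    # 0 1
```

A hand-rolled string-to-id scheme would have to reproduce these ordering and equality rules itself.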

Fun fact to know and tell: Systems in the olden days of computing (when all the cool kids had T1 lines) offered objects called "atoms." These were text strings referred to with id numbers. Atoms showed up in the X Window System (for example) in the xlib function call XInternAtom() and related functions. Why? Partly to save memory and network bandwidth, which were scarcer then than now, and partly for the "controlled vocabulary" purpose mentioned earlier in this post.
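The atom idea is simple string interning: each distinct string gets a small integer the first time it is seen, and the same integer every time after. Below is a hypothetical Python sketch of the concept. The class and method names are made up and are not the Xlib API. The only_if_exists flag mirrors the parameter of the same name on XInternAtom(), and id 0 is left unused, much as X reserves None.

```python
class AtomTable:
    """Toy string-interning table in the spirit of X11 atoms (hypothetical names)."""

    def __init__(self):
        self._ids = {}    # string -> atom id
        self._names = []  # atom id - 1 -> string

    def intern(self, name, only_if_exists=False):
        """Return the id for `name`; create one unless only_if_exists is set (then return None)."""
        if name in self._ids:
            return self._ids[name]
        if only_if_exists:
            return None
        self._names.append(name)
        self._ids[name] = len(self._names)  # ids start at 1; 0 means "no atom"
        return self._ids[name]

    def name_of(self, atom):
        """Map an atom id back to its string."""
        return self._names[atom - 1]

atoms = AtomTable()
rock = atoms.intern("rock")
jazz = atoms.intern("jazz")
rock_again = atoms.intern("rock")  # same string, same id
missing = atoms.intern("polka", only_if_exists=True)
print(rock, jazz, rock_again, missing, atoms.name_of(jazz))  # 1 2 1 None jazz
```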

Will indexing improve varchar(max) query performance, and how to create index

It's not worthwhile creating a regular index if you're doing LIKE '%keyword%' searches. The reason is that an index works like a dictionary: to find a word you open it in the middle, then keep halving the remaining range until you find it. That wildcard query is like asking you to look up every word that contains the text "to"; the only way to find the matches is to scan the whole dictionary.
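The dictionary analogy maps directly onto binary search over a sorted list. This small Python sketch illustrates the idea; it is not how any particular engine is implemented. A prefix search jumps to the first candidate with bisect and stops as soon as the prefix no longer matches. A substring search has to examine every entry, because matches such as "atom" or "vector" are scattered across the sorted order.

```python
import bisect

def prefix_search(sorted_words, prefix):
    """Binary-search to the first word >= prefix, then walk forward while words still match."""
    start = bisect.bisect_left(sorted_words, prefix)
    matches, examined = [], 0
    for word in sorted_words[start:]:
        examined += 1
        if not word.startswith(prefix):
            break  # sorted order guarantees no later word can match
        matches.append(word)
    return matches, examined

def substring_search(sorted_words, fragment):
    """No usable starting point: every word has to be checked."""
    matches = [w for w in sorted_words if fragment in w]
    return matches, len(sorted_words)

words = sorted(["atom", "stone", "today", "tomato", "toner", "tower", "vector"])
print(prefix_search(words, "to"))     # (['today', 'tomato', 'toner', 'tower'], 5)
print(substring_search(words, "to"))  # all 7 words match, and all 7 had to be examined
```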

You might consider full-text search instead, which is designed for this kind of scenario.

SQL Server ignores index on varchar column and does table scan when queried from Java

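ANSWER: The problem is that Java (the JDBC driver) passes string query parameters to SQL Server as Unicode (NVARCHAR). Because NVARCHAR has a higher data type precedence than VARCHAR, SQL Server has to convert the varchar column to compare it with the parameter, and so it will not seek on the varchar index.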

If you want the column to stay varchar (or cannot change it) and have access to the Java code, set the sendStringParametersAsUnicode connection string property to "false" (it defaults to "true"). Search for "International Features of the JDBC Driver" on MSDN for more details; the setting also applies to CHAR, VARCHAR, and LONGVARCHAR columns.

If you don't have access to the Java code but can change the database, changing the varchar column to nvarchar will fix the problem, at the cost of roughly doubling the storage needed for that column.

EXAMPLE

jdbc:sqlserver://localhost:1433;databaseName=mydb;sendStringParametersAsUnicode=false
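The underlying rule is that an index on a column can only be used for a seek when the predicate compares the column itself. Once the engine must convert the column first (as SQL Server does when a VARCHAR column meets an NVARCHAR parameter), the converted value has to be computed for every row. SQL Server isn't available in a self-contained example, so this sketch shows the analogous effect in SQLite through Python. It uses an explicit CAST on the column as a stand-in for SQL Server's implicit conversion. That is an assumption, since SQLite has no NVARCHAR type or implicit conversion of this kind. The table and index names are hypothetical.

```python
import sqlite3

def plan(conn, sql):
    """Return the planner's detail text for `sql`."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, username VARCHAR(50), email TEXT)")
conn.execute("CREATE INDEX ix_accounts_username ON accounts (username)")

# Comparing the column as stored: the index can be searched.
direct = plan(conn, "SELECT email FROM accounts WHERE username = 'alice'")
# Converting the column first (stand-in for an implicit conversion): every row must be scanned.
converted = plan(conn, "SELECT email FROM accounts WHERE CAST(username AS BLOB) = CAST('alice' AS BLOB)")
print(direct)
print(converted)
```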

How is a varchar index stored?

Your varchar(150) column values will be sorted in the order defined by the collation you use (roughly alphabetically), as whole strings (not as arrays of chars). So, in the end, you have one long list of sorted strings.

This list is then arranged as a balanced B-tree. Each level of the index points down to the next level, and the values of the index entries define which range of values is contained on each lower-level index page.

With this arrangement, SQL Server reaches the leaf level of the index with only a few page reads and can then get to your data. So the strings really are treated as atomic values, not as compounds of characters.

Basically, the structure of the index will look a lot like the one shown in SQL Server Index Basics, only with string values instead of numerical values in your index pages.
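A toy two-level version of that structure in Python may make it concrete. The page size of 3 entries and the helper names are made up for illustration. Real pages hold hundreds of entries ordered by the collation, whereas this sketch uses Python's plain code-point ordering. The leaf pages hold the sorted strings, and the root holds the first key of each leaf page, so a lookup reads one root page and then exactly one leaf page.

```python
import bisect

def build_index(values, page_size=3):
    """Sort the strings and cut them into leaf pages; the root stores each page's first key."""
    ordered = sorted(values)
    leaves = [ordered[i:i + page_size] for i in range(0, len(ordered), page_size)]
    root = [page[0] for page in leaves]  # page i covers keys from root[i] up to root[i + 1]
    return root, leaves

def lookup(index, key):
    """Return (leaf page number, found?) after reading the root and one leaf page."""
    root, leaves = index
    page_no = max(bisect.bisect_right(root, key) - 1, 0)  # last page whose first key <= key
    page = leaves[page_no]
    pos = bisect.bisect_left(page, key)
    return page_no, pos < len(page) and page[pos] == key

idx = build_index(["delta", "alpha", "echo", "charlie", "golf", "bravo", "foxtrot", "hotel"])
print(idx[0])               # ['alpha', 'delta', 'golf']
print(lookup(idx, "echo"))  # (1, True)
print(lookup(idx, "cobra")) # (0, False)
```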

Is it good to create a nonclustered index on a column of type varchar?

There is nothing whatever wrong with creating an index on a VARCHAR column, or set of columns.

Regarding the performance of VARCHAR/INT, as with everything in an RDBMS, it depends on what you are doing. What you may be thinking of is the fact that clustering a table on a VARCHAR key is (in SQL Server) marginally less efficient than clustering on a monotonically increasing numeric key, and can introduce fragmentation.
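Here is a toy model of why a monotonically increasing key fragments less. With increasing keys, every new row goes at the end of the index. Arbitrary VARCHAR keys land in the middle, which in a real engine can mean splitting pages that are already full. Here the arbitrary keys are random strings from a seeded generator, an assumption made for illustration. The model only counts insert positions; it does not simulate SQL Server's page allocation.

```python
import bisect
import random
import string

def middle_inserts(keys):
    """Count keys that land before the current largest key (i.e. are not simply appended)."""
    ordered, count = [], 0
    for key in keys:
        pos = bisect.bisect_right(ordered, key)
        if pos < len(ordered):
            count += 1  # lands mid-index: in a real B-tree this can force a page split
        ordered.insert(pos, key)
    return count

rng = random.Random(42)          # seeded so the run is repeatable
int_keys = list(range(1, 1001))  # identity-style, monotonically increasing
str_keys = ["".join(rng.choices(string.ascii_lowercase, k=8)) for _ in range(1000)]

print(middle_inserts(int_keys))  # 0
print(middle_inserts(str_keys))  # almost all of the 1000 inserts land mid-index
```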

Or you may be thinking of what you have heard about writing JOINs on VARCHAR columns. It is true that they are a little less efficient than JOINs on numeric types, but only a little: nothing that should lead you to never join on VARCHAR columns.

None of this means that you should not create indexes on VARCHAR columns. A needed index on a VARCHAR column will boost query performance, often by orders of magnitude. If you need an index on a VARCHAR column, create it. It makes no sense to look for an integer column to create the index on instead; the engine will never use it for queries that filter on the VARCHAR column.
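To illustrate the JOIN point, here is a sketch in SQLite via Python that joins two tables on a VARCHAR column. The engine choice is an assumption, and the table names and rows are hypothetical. With a non-clustered index on the VARCHAR column of one table, the planner can search that index for each row of the other table instead of scanning.

```python
import sqlite3

def plan(conn, sql):
    """Return the planner's detail text for `sql`."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_code VARCHAR(20), total REAL)")
conn.execute("CREATE TABLE customers (customer_code VARCHAR(20), name TEXT)")
conn.execute("CREATE INDEX ix_customers_code ON customers (customer_code)")  # index on the VARCHAR join key

conn.executemany("INSERT INTO customers VALUES (?, ?)", [("ACME-01", "Acme"), ("GLOBEX-7", "Globex")])
conn.executemany("INSERT INTO orders (customer_code, total) VALUES (?, ?)",
                 [("ACME-01", 10.0), ("GLOBEX-7", 5.0), ("ACME-01", 2.5)])

join_sql = ("SELECT o.order_id, c.name FROM orders AS o "
            "JOIN customers AS c ON c.customer_code = o.customer_code")
detail = plan(conn, join_sql)  # expect a search on ix_customers_code for each order
rows = conn.execute(join_sql + " ORDER BY o.order_id").fetchall()
print(detail)
print(rows)  # [(1, 'Acme'), (2, 'Globex'), (3, 'Acme')]
```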


