Comma Separated Values in a Database Field

Is storing a delimited list in a database column really that bad?

In addition to violating First Normal Form because of the repeating group of values stored in a single column, comma-separated lists have a lot of other more practical problems:

  • Can’t ensure that each value is the right data type: no way to prevent 1,2,3,banana,5
  • Can’t use foreign key constraints to link values to a lookup table; no way to enforce referential integrity.
  • Can’t enforce uniqueness: no way to prevent 1,2,3,3,3,5
  • Can’t delete a value from the list without fetching the whole list.
  • Can't store a list longer than what fits in the string column.
  • Hard to search for all entities with a given value in the list; you have to use an inefficient table-scan. May have to resort to regular expressions, for example in MySQL:

    idlist REGEXP '[[:<:]]2[[:>:]]' or in MySQL 8.0: idlist REGEXP '\\b2\\b'
  • Hard to count elements in the list, or do other aggregate queries.
  • Hard to join the values to the lookup table they reference.
  • Hard to fetch the list in sorted order.
  • Hard to choose a separator that is guaranteed not to appear in the values

To solve these problems, you have to write tons of application code, reinventing functionality that the RDBMS already provides much more efficiently.
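For comparison, here is a minimal sketch of the normalized alternative, using Python's built-in sqlite3 module as a stand-in for whatever RDBMS you actually run. The table names (tags, item_tags) and the sample data are hypothetical. Note that SQLite is looser about column types than most engines, so the data-type point from the list is better demonstrated on MySQL, PostgreSQL, or SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE tags (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- One row per (item, tag) pair instead of a comma-separated list
    CREATE TABLE item_tags (
        item_id INTEGER NOT NULL,
        tag_id  INTEGER NOT NULL REFERENCES tags(id),
        PRIMARY KEY (item_id, tag_id)
    );
    INSERT INTO tags (id, name) VALUES (1, 'red'), (2, 'green'), (3, 'blue');
    INSERT INTO item_tags (item_id, tag_id) VALUES (10, 3), (10, 1), (11, 2), (11, 3);
""")

def rejected(sql):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Referential integrity: tag 99 does not exist in the lookup table
fk_rejected = rejected("INSERT INTO item_tags VALUES (10, 99)")
# Uniqueness: the pair (10, 1) is already present
duplicate_rejected = rejected("INSERT INTO item_tags VALUES (10, 1)")

# Search by value: a plain equality test, which an index can serve
items_with_blue = [r[0] for r in conn.execute(
    "SELECT item_id FROM item_tags WHERE tag_id = 3 ORDER BY item_id")]

# Count elements per list
counts = conn.execute(
    "SELECT item_id, COUNT(*) FROM item_tags GROUP BY item_id ORDER BY item_id"
).fetchall()

# Join to the lookup table and fetch in sorted order
names_for_10 = [r[0] for r in conn.execute(
    "SELECT t.name FROM item_tags it JOIN tags t ON t.id = it.tag_id "
    "WHERE it.item_id = 10 ORDER BY t.name")]

# Delete one value without fetching the rest of the list
conn.execute("DELETE FROM item_tags WHERE item_id = 11 AND tag_id = 2")
tags_for_11 = [r[0] for r in conn.execute(
    "SELECT tag_id FROM item_tags WHERE item_id = 11")]
```

Each bullet that was "can't" or "hard" with a delimited string becomes a constraint or a one-line query here.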

Comma-separated lists are wrong enough that I made this the first chapter in my book: SQL Antipatterns, Volume 1: Avoiding the Pitfalls of Database Programming.

There are times when you need to employ denormalization, but as @OMG Ponies mentions, these are exception cases. Any non-relational “optimization” benefits one type of query at the expense of other uses of the data, so be sure you know which of your queries need to be treated so specially that they deserve denormalization.

Getting values of comma separated fields in SQL Server

The easy way is to convert the CSV values to rows for each Id, join those rows with the CITY table, and convert the result back to CSV values. The logic is explained in comments inside the query.

;WITH CTE1 AS
(
    -- Convert CSV to rows
    SELECT Id, LTRIM(RTRIM(Split.a.value('.', 'VARCHAR(100)'))) AS [NAME]
    FROM
    (
        -- To use a different delimiter, change the ',' passed to REPLACE
        SELECT Id, CAST('<M>' + REPLACE(Name, ',', '</M><M>') + '</M>' AS XML) AS Data
        FROM #TEMP
    ) AS A
    CROSS APPLY Data.nodes('/M') AS Split(a)
)
, CTE2 AS
(
    -- Join each split value with the matching Id in the CITY table
    SELECT T.ID, T.NAME, C.CITYNAME
    FROM CTE1 T
    JOIN #CITY C ON T.NAME = C.ID
)
-- Convert back to CSV format
SELECT DISTINCT ID,
    SUBSTRING(
        (SELECT ', ' + CITYNAME
         FROM CTE2 I
         WHERE I.Id = O.Id
         FOR XML PATH('')), 3, 200000) AS [VALUES]  -- start at 3 to skip the leading ', '
FROM CTE2 O

Checking whether a value exists in a comma-separated database column

You should fix your table design and never store data as comma-separated values.

Until then, you could use FIND_IN_SET:

SELECT * FROM colleges where FIND_IN_SET(1, Courses);

If there may be spaces before or after the commas, you could use:

SELECT * FROM colleges where FIND_IN_SET(1, REPLACE(REPLACE(Courses, ', ', ','), ' ,', ','));
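To see why the stray spaces matter, here is a rough Python emulation of what FIND_IN_SET does. It is only an illustration of the matching rule (whole elements, 1-based position, 0 when absent), not MySQL's implementation; it ignores NULL handling and collation, and the sample strings are made up.

```python
def find_in_set(needle: str, strlist: str) -> int:
    """Emulate MySQL FIND_IN_SET: 1-based position of needle, or 0 if absent."""
    if strlist == "":
        return 0
    items = strlist.split(",")
    return items.index(needle) + 1 if needle in items else 0

# Whole-element matching: '1' is found in '1,2,3' but not inside '11,21'
exact = find_in_set("1", "1,2,3")          # 1
no_partial = find_in_set("1", "11,21")     # 0

# With spaces around the commas the element is ' 2 ', not '2', so the lookup fails
with_spaces = find_in_set("2", "1, 2 ,3")  # 0

# Same cleanup as the nested REPLACE calls in the query above
cleaned = "1, 2 ,3".replace(", ", ",").replace(" ,", ",")
after_cleanup = find_in_set("2", cleaned)  # 2
```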

Matching a comma-separated string against a database field and sorting the results in the same order as the string

You can use ORDER BY FIELD:

ORDER BY field(email_address, 'test@test.com','test@test2.com','test@test3.com');

Reference: https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_field
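One detail worth knowing: FIELD returns 0 when the value is not in the argument list, so unmatched rows sort ahead of the matched ones unless a WHERE ... IN clause filters them out first. The following Python sketch emulates that ordering; the field helper and the extra address 'other@test.com' are hypothetical and exist only for illustration.

```python
def field(value, *candidates):
    """Emulate MySQL FIELD: 1-based index of value in candidates, 0 if not found."""
    for position, candidate in enumerate(candidates, start=1):
        if value == candidate:
            return position
    return 0

wanted = ("test@test.com", "test@test2.com", "test@test3.com")
rows = ["test@test3.com", "other@test.com", "test@test.com", "test@test2.com"]

# Equivalent of ORDER BY FIELD(email_address, ...) with no filter:
# the unmatched address gets 0 and therefore sorts first.
ordered = sorted(rows, key=lambda email: field(email, *wanted))

# Equivalent of adding WHERE email_address IN (...) before sorting.
filtered = sorted((e for e in rows if e in wanted),
                  key=lambda email: field(email, *wanted))
```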

MySQL: How to check if a field value exists in a comma separated field in the same table

You can do it with this:

select *
from tablename
where
concat(',', col2, ',') like
concat('%,aaa:000.', substring_index(col1,'aaa:',-1), '/,%')

If there may or may not be a / at the end, then:

select *
from tablename
where
concat(',', col2, ',') like
concat('%,aaa:000.', substring_index(col1,'aaa:',-1), '/,%')
or
concat(',', col2, ',') like
concat('%,aaa:000.', substring_index(col1,'aaa:',-1), ',%')
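The trick in both queries is wrapping the column and the search term in the delimiter, so that only whole elements match. Here is a stripped-down sketch of just that part, run against SQLite through Python's sqlite3 module (an assumption made for runnability; SQLite concatenates with || instead of concat). The table t and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, col2 TEXT);
    INSERT INTO t VALUES (1, '12,2,30'), (2, '12,20,30'), (3, '2'), (4, '1,22');
""")

# Naive substring match: '2' also matches inside 12, 20 and 22
naive = [r[0] for r in conn.execute(
    "SELECT id FROM t WHERE col2 LIKE '%' || ? || '%' ORDER BY id", ("2",))]

# Delimiter-wrapped match: ',2,' can only match a whole element,
# including the first and last ones thanks to the added outer commas
wrapped = [r[0] for r in conn.execute(
    "SELECT id FROM t WHERE ',' || col2 || ',' LIKE '%,' || ? || ',%' ORDER BY id",
    ("2",))]
```

Here naive returns every row while wrapped returns only rows 1 and 3. If the search term can contain % or _, those characters still act as LIKE wildcards and would need escaping.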

Data separated by commas inside a field vs new table

Never, ever, ever choose the separate-by-commas solution. It is a violation of every principle of database design. Create a separate table instead.

In your particular case, create the table with the PRIMARY KEY on (article_id, user_id). The database will then prohibit the entry of duplicate records. Depending on your SQL engine, you can additionally use INSERT OR IGNORE (or equivalent) to avoid throwing exceptions.

The other solution requires you to enforce uniqueness in every application that touches the data.
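As a minimal sketch of that table, assuming SQLite (driven here through Python's sqlite3 module) and a hypothetical table name article_users:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE article_users (
        article_id INTEGER NOT NULL,
        user_id    INTEGER NOT NULL,
        PRIMARY KEY (article_id, user_id)  -- the database rejects duplicate pairs
    )
""")

conn.execute("INSERT INTO article_users VALUES (1, 7)")

# A plain INSERT of the same pair raises an error
try:
    conn.execute("INSERT INTO article_users VALUES (1, 7)")
    plain_insert_failed = False
except sqlite3.IntegrityError:
    plain_insert_failed = True

# INSERT OR IGNORE silently skips the duplicate and still inserts new pairs
conn.execute("INSERT OR IGNORE INTO article_users VALUES (1, 7)")
conn.execute("INSERT OR IGNORE INTO article_users VALUES (1, 8)")

rows = conn.execute(
    "SELECT article_id, user_id FROM article_users ORDER BY user_id"
).fetchall()
```

INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE, and PostgreSQL uses INSERT ... ON CONFLICT DO NOTHING.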

When to use comma-separated values in a DB Column?

You already know the answer.

First off, your PHP code isn't even close to working, because it only works if user 2 has a single value in LookingFor or Drugs. If either of these columns contains multiple comma-separated values, then IN won't work even if those values are in the exact same order as user 1's values. What do you expect IN to do if the right-hand side has one or more commas?

Therefore, it's not "easy" to do what you want in PHP. It's actually quite a pain and would involve splitting user 2's fields into single values, writing dynamic SQL with many ORs to do the comparison, and then doing an extremely inefficient query to get the results.

Furthermore, the fact that you even need to write PHP code to answer such a relatively simple question about the intersection of two sets means that your design is badly flawed. This is exactly the kind of problem (relational algebra) that SQL exists to solve. A correct design allows you to solve the problem in the database and then simply implement a presentation layer on top in PHP or some other technology.

Do it correctly and you'll have a much easier time.
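As a rough illustration of what "correctly" could look like, the sketch below stores one row per user and value and finds the overlap with a self-join. It assumes SQLite via Python's sqlite3 module; the table name user_looking_for and the sample values are hypothetical, and the Drugs column would get a table of the same shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_looking_for (
        user_id INTEGER NOT NULL,
        value   TEXT    NOT NULL,
        PRIMARY KEY (user_id, value)
    );
    INSERT INTO user_looking_for VALUES
        (1, 'friendship'), (1, 'dating'),
        (2, 'dating'),     (2, 'networking'),
        (3, 'networking');
""")

# Set intersection as a self-join: which other users share a value with user 1?
shared = conn.execute("""
    SELECT b.user_id, b.value
    FROM user_looking_for a
    JOIN user_looking_for b
      ON b.value = a.value AND b.user_id <> a.user_id
    WHERE a.user_id = ?
    ORDER BY b.user_id, b.value
""", (1,)).fetchall()
```

With this sample data, shared contains a single row, (2, 'dating'). No string splitting or dynamic SQL is needed, and the application layer only has to present the result.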
