Sql Design Approach for Searching a Table with an Unlimited Number of Bit Fields

SQL design approach for searching a table with an unlimited number of bit fields

something like this may work for you:

define tables:

CREATE TABLE #Apartments
(
ApartmentID int not null primary key identity(1,1)
,ApartmentName varchar(500) not null
,Status char(1) not null default ('A')
--....
)

CREATE TABLE #AttributeTypes
(
AttributeType smallint not null primary key
,AttributeDescription varchar(500) not null
)

CREATE TABLE #Attributes --boolean attributes, if row exists apartment has this attribute
(
ApartmentID int not null --FK to Apartments.ApartmentID
,AttributeID int not null primary key identity(1,1)
,AttributeType smallint not null --fk to AttributeTypes
)

insert sample data:

SET NO COUNT ON
INSERT INTO #Apartments VALUES ('one','A')
INSERT INTO #Apartments VALUES ('two','A')
INSERT INTO #Apartments VALUES ('three','I')
INSERT INTO #Apartments VALUES ('four','I')

INSERT INTO #AttributeTypes VALUES (1,'dishwasher')
INSERT INTO #AttributeTypes VALUES (2,'deck')
INSERT INTO #AttributeTypes VALUES (3,'pool')
INSERT INTO #AttributeTypes VALUES (4,'pets allowed')
INSERT INTO #AttributeTypes VALUES (5,'washer/dryer')
INSERT INTO #AttributeTypes VALUES (6,'Pets Alowed')
INSERT INTO #AttributeTypes VALUES (7,'No Pets')

INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (1,1)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (1,2)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (1,3)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (1,4)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (1,5)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (1,6)

INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (2,1)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (2,2)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (2,3)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (2,4)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (2,7)

INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (3,1)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (3,2)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (3,3)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (3,4)

INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (4,1)
INSERT INTO #Attributes (ApartmentID, AttributeType) VALUES (4,2)
SET NOCOUNT OFF

sample search query:

;WITH GetMatchingAttributes AS
(
SELECT
ApartmentID,COUNT(AttributeID) AS CountOfMatches
FROM #Attributes
WHERE AttributeType IN (1,2,3) --<<change dynamically or split a CSV string and join in
GROUP BY ApartmentID
HAVING COUNT(AttributeID)=3--<<change dynamically or split a CSV string and use COUNT(*) from resulting table
)
SELECT
a.*
FROM #Apartments a
INNER JOIN GetMatchingAttributes m ON a.ApartmentID=m.ApartmentID
WHERE a.Status='A'
ORDER BY m.CountOfMatches DESC

OUTPUT:

ApartmentID ApartmentName 
----------- --------------
1 one
2 two

(2 row(s) affected)

In the search query above, I just included a CSV string of atribute IDs to search for. In reality, you could create a Search stored procedure where you pass in a CSV parameter containing the IDs to search on. You can look at this answer to learn about loop free splitting of that CSV strings into table which you can join to. This would result in not needing to use any dynamic SQL.

EDIT based on the many comments:

if you add a few columns to the #AttributeTypes table you could dynamically build the search page. Here are a few suggestions:

  • Status: "A"ctive "I"nactive
  • ListOrder: can use this to sort by to build the screen
  • ColumnNumber: can help organize fields on the same screen row
  • AttributeGroupID: to group fields, see below
  • etc.

You could make all the fields checkboxes, or add another table called #AttributesGroups, and group some together and use radio buttons. For example, since "Pets Allowed" and "No Pets" are exclusive, add a row in the #AttributesGroups table "Pets". The application would group the attributes in the interface. Attributes in Groups would work the same as regular ungrouped attributes, just collect the selected IDs and pass it in to the search procedure. However, for each group you'll need to have the application include a "no preference" radio button and default it on. This option will not have an attribute ID and is not passed in, since you don't want to consider the attribute.

In my example, I do show an example of a "super attribute" that is in the #Apartments
table, "Status". You should only consider major attributes for this table. If you start using these, you may want to alter the CTE to be FROM #Apartments with filtering on these fields and then join to #Attributes. However you will run into issues of Dynamic Search Conditions, so read this article by Erland Sommarskog.

EDIT on latest comments:

here is code to have a list of exclude attributes:

;WITH GetMatchingAttributes AS
(
SELECT
ApartmentID,COUNT(AttributeID) AS CountOfMatches
FROM #Attributes
WHERE AttributeType IN (1,2,3) --<<change dynamically or split an include CSV string and join in
GROUP BY ApartmentID
HAVING COUNT(AttributeID)=3--<<change dynamically or split a CSV string and use COUNT(*) from resulting include table
)
, SomeRemoved AS
(
SELECT
m.ApartmentID
FROM GetMatchingAttributes m
LEFT OUTER JOIN #Attributes a ON m.ApartmentID=a.ApartmentID
AND a.AttributeType IN (5,6) --<<change dynamically or split an exclude CSV string and join in
WHERE a.ApartmentID IS NULL
)
SELECT
a.*
FROM #Apartments a
INNER JOIN SomeRemoved m ON a.ApartmentID=m.ApartmentID
WHERE a.Status='A'

I don't think I would go this way though. I'd go with the approach I outlined in my previous EDIT above. When include/exclude of an attribute is necessary, I'd just add an attribute for each: "Pets allowed" and "No Pets".

I updated the sample data from the original post to show this.

Run the original query with:

  • (..,..,6,..) to find apartments that allow pets
  • (..,..,7,..) to find apartments where no pets are allowed
  • (..,..,..) if there is no preference.

I think this is the better approach. When combined with the grouping idea and dynamically built search page described in the last edit, I think this would be better and would run faster.

Approach to generic database design

  1. Since users can enter their own property names, i guess every query is going to involve a scan of the properties table (in your example i need to find the propertyid of [inspectionDate]). If the properties table is large, your join would also take a long time. You could try and optimize by denormalizing and storing name with propertyID. This would be a denaormalized column in the MaterialsProperties table.
  2. You could try adding a property type (int, char etc) to the materialsproperty table and partition the table on the type.
  3. Look at Object Relational Mapping/Entity Attribute Value Model techniques for query optimization.
  4. Since you already have a lot of data (2 million records) do some data mining as see if there are repeating groups of properties for many materials. You can them put them in one schema and the rest as the EAV table. Look here for details: http://portal.acm.org/citation.cfm?id=509015&dl=GUIDE&coll=GUIDE&CFID=49465839&CFTOKEN=33971901

Implementing custom fields in a database for large numbers of records

Microsoft introduced sparse columns exactly for this type of problems. Tha point is that in a "classic" design you end up with large number of columns, most of the NULLs for any particular row. Same here with sparse columns, but NULLs don't require any storage. Moreover, you can create sets of columns and modify sets with XML.

Performance- and storage-wise, sparse columns are the winner.

http://technet.microsoft.com/en-us/library/cc280604.aspx



Related Topics



Leave a reply



Submit