Opinions on sensor / reading / alert database design
Revised 01 Jan 11 21:50 UTC
Data Model
I think your Data Model should look like this: Sensor Data Model. (Page 2 relates to your other question re History.)
Readers who are unfamiliar with the Relational Modelling Standard may find the IDEF1X Notation document useful.
Business Rules (Developed in the Commentary)
I did identify some early Business Rules, which are now obsolete, so I have deleted them.
These can be "read" in the Relations (read adjacent to the Data Model). The Business Rules and all implied Referential and Data Integrity can be implemented in, and thus guaranteed by, RULES and CHECK Constraints, in any ISO SQL database. This is a demonstration of IDEF1X, in the development of both the Relational keys, and the Entities and Relations. Note the Verb Phrases are more than mere flourish.
Apart from three Reference tables, the only static, Identifying entities are Location, NetworkSlave, and User. Sensor is central to the system, so I have given it its own heading.
Location
- A Location contains one-to-many Sensors
- A Location may have one Logger
NetworkSlave
- A NetworkSlave collects Readings for one-to-many NetworkSensors
User
- An User may maintain zero-to-many Locations
- An User may maintain zero-to-many Sensors
- An User may maintain zero-to-many NetworkSlaves
- An User may perform zero-to-many Downloads
- An User may make zero-to-many Acknowledgements, each on one Alert
- An User may take zero-to-many Actions, each of one ActionType
Sensor
- A SensorType is installed as zero-to-many Sensors
- A Logger (houses and) collects Readings for one LoggerSensor
- A Sensor is either one NetworkSensor or one LoggerSensor
- A NetworkSensor records Readings collected by one NetworkSlave
- A Logger is periodically Downloaded one-to-many times
- A LoggerSensor records Readings collected by one Logger
- A Reading may be deemed in Alert, of one AlertType
- An AlertType may happen on zero-to-many Readings
- An Alert may be one Acknowledgement, by one User
- An Acknowledgement may be closed by one Action, of one ActionType, by one User
- An ActionType may be taken on zero-to-many Actions
Responses to Comments
Sticking Id columns on everything that moves interferes with the determination of Identifiers, the natural Relational keys that give your database relational "power". They are Surrogate Keys, which means an additional Key and Index, and they hinder that relational power, which results in more joins than otherwise necessary. Therefore I use them only when the Relational key becomes too cumbersome to migrate to the child tables (and accept the imposed extra join).

Nullable keys are a classic symptom of an Unnormalised database. Nulls in the database are bad news for performance; but Nulls in FKs mean each table is doing too many things, has too many meanings, and that results in very poor code. Good for people who like to "refactor" their databases; completely unnecessary for a Relational database.

Resolved: An Alert may be Acknowledged; an Acknowledgement may be Actioned.

The columns above the line are the Primary Key (refer Notation document).
SensorNo is a sequential number within LocationId; refer Business Rules, it is meaningless outside a Location; the two columns together form the PK. When you are ready to INSERT a Sensor (after you have checked that the attempt is valid, etc), it is derived as follows. This excludes LoggerSensors, which are zero:

INSERT Sensor VALUES (
    @LocationId,
    ( SELECT ISNULL(MAX(SensorNo), 0) + 1
        FROM  Sensor
        WHERE LocationId = @LocationId ),
    @SensorCode
    )

For accuracy or improved meaning, I have changed NetworkSlave monitors NetworkSensor to NetworkSlave collects Readings from NetworkSensor.
Check Constraints. The NetworkSensor and LoggerSensor are exclusive subtypes of Sensor, and their integrity can be set by CHECK constraints. Alerts, Acknowledgements and Actions are not subtypes, but their integrity is set by the same method, so I will list them together.

Every Relation in the Data Model is implemented as a CONSTRAINT in the child (or subtype), as FOREIGN KEY (child_FK_columns) REFERENCES Parent (PK_columns).
- A Discriminator is required to identify which subtype a Sensor is. This is SensorNo = 0 for LoggerSensors; and non-zero for NetworkSensors.
- The existence of NetworkSensors and LoggerSensors is constrained by the FK CONSTRAINTS to NetworkSlave and Logger, respectively; as well as to Sensor.
- In NetworkSensor, include a CHECK constraint to ensure SensorNo is non-zero.
- In LoggerSensor, include a CHECK constraint to ensure SensorNo is zero.
- The existence of Acknowledgements and Actions is constrained by the identified FK CONSTRAINTS (an Acknowledgement cannot exist without an Alert; an Action cannot exist without an Acknowledgement). Conversely, an Alert with no Acknowledgement is in an unacknowledged state; an Alert with an Acknowledgement but no Action is in an acknowledged but un-actioned state.
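A sketch of those subtype constraints in DDL (the datatypes and additional columns are placeholders, not taken from the model; the FK to NetworkSlave or Logger is declared the same way as the FK to Sensor shown here):

CREATE TABLE NetworkSensor (
    LocationId  ...,
    SensorNo    ...,
    CONSTRAINT UC_PK
        PRIMARY KEY (LocationId, SensorNo),
    CONSTRAINT Sensor_NetworkSensor_fk
        FOREIGN KEY (LocationId, SensorNo)
        REFERENCES Sensor (LocationId, SensorNo),
    -- Discriminator: NetworkSensors are the non-zero SensorNos
    CONSTRAINT NetworkSensor_SensorNo_ck
        CHECK (SensorNo > 0)
    )

CREATE TABLE LoggerSensor (
    LocationId  ...,
    SensorNo    ...,
    CONSTRAINT UC_PK
        PRIMARY KEY (LocationId, SensorNo),
    CONSTRAINT Sensor_LoggerSensor_fk
        FOREIGN KEY (LocationId, SensorNo)
        REFERENCES Sensor (LocationId, SensorNo),
    -- Discriminator: the LoggerSensor is SensorNo zero
    CONSTRAINT LoggerSensor_SensorNo_ck
        CHECK (SensorNo = 0)
    )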
Alerts. The concept in a design for this kind of (live monitoring and alert) application is many small programs, running independently; all using the database as the single version of the truth. Some programs insert rows (Readings, Alerts); other programs poll the db for the existence of such rows (and send SMS messages, etc; or hand-held units pick up Alerts relevant to the unit only). In that sense, the db may be described as a message box (one program puts rows in, which another program reads and actions).

The assumption is, Readings for Sensors are being recorded "live" by the NetworkSlave, and every minute or so, a new set of Readings is inserted. A background process executes periodically (every minute or whatever); this is the main "monitor" program, and it will have many functions within its loop. One such function will be to monitor Readings and produce Alerts that have occurred since the last iteration (of the program loop).

The following code segment will be executed within the loop, once for each AlertType. It is a classic Projection:
-- Assume @LoopDtm contains the DateTime of the last iteration
INSERT Alert
    SELECT s.LocationId,
           s.SensorNo,
           r.ReadingDtm,
           "L"    -- AlertType "Low"
        FROM  Sensor  s,
              Reading r
        WHERE s.LocationId = r.LocationId
        AND   s.SensorNo   = r.SensorNo
        AND   r.ReadingDtm > @LoopDtm
        AND   r.Value      < s.LowerLimit

INSERT Alert
    SELECT s.LocationId,
           s.SensorNo,
           r.ReadingDtm,
           "H"    -- AlertType "High"
        FROM  Sensor  s,
              Reading r
        WHERE s.LocationId = r.LocationId
        AND   s.SensorNo   = r.SensorNo
        AND   r.ReadingDtm > @LoopDtm
        AND   r.Value      > s.UpperLimit

Alert
is definitely a fact, that exists as a row in the database. Subsequently that may be Acknowledged by an User (another row/fact), and Actioned with an ActionType by an User.

Other than this (the creation by Projection act), ie. the general and unvarying case, I would refer to Alert only as a row in Alert; a static object after creation.

Concerns re Changing
Users. That is taken care of already, as follows. At the top of my (revised yesterday) Answer, I state that the major Identifying elements are static. I have re-sequenced the Business Rules to improve clarity.

For the reasons you mention, User.Name is not a good PK for User, although it remains an Alternate Key (Unique) and the one that is used for human interaction. User.Name cannot be duplicated: there cannot be more than one Fred in terms of User.Name; there can be two Fred Bloggs in terms of FirstName-LastName, but not in terms of User.Name. Our second Fred needs to choose another User.Name. Note the identified Indices.

UserId is the permanent record, and it is already the PK. Never delete User, it has historical significance. In fact the FK constraints will stop you (never use CASCADE in a real database, that is pure insanity). No need for code or triggers, etc.

Alternately (to delete Users who never did anything, and thus release User.Name for use), allow Delete as long as there are no FK violations (ie. UserId is not referenced in Download, Acknowledgement, Action).
To ensure that only Users who are Current perform Actions, add an IsObsolete boolean in User (DM Updated), and check that column when that table is interrogated for any function (except reports). You can implement a View UserCurrent which returns only those Users.

Same goes for Location and NetworkSlave. If you need to differentiate current vs historical, let me know, I will add IsObsolete to them as well.

I don't know: you may purge the database of ancient Historical data periodically, delete rows that are (eg) over 10 years old. That has to be done from the bottom (tables) first, working up the Relations.
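The UserCurrent View mentioned above is a one-liner; a sketch, assuming IsObsolete is implemented as a BIT/BOOLEAN with 0 meaning current (the table name User may need quoting on some platforms):

CREATE VIEW UserCurrent
AS
    SELECT *
        FROM  User
        WHERE IsObsolete = 0

Functions (other than reports) then interrogate UserCurrent rather than User.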
Feel free to ask Questions.
Note the IDEF1X Notation document has been expanded.
Database design too repetitive?
I'd say it's repetitive if each Sensor (for example) has exactly the same properties (columns).
If they differ at all, they should have different tables.
I'd try capturing this using NORMA or a similar tool to validate the design.
How far to go with database constraints?
should I be bothered that I haven't constrained that somehow?
Yes.
You have made two basic mistakes.
Sticking Id-iot keys on everything that moves. That has hindered your ability to model the data, as data (not as rows that have no meaning, but with an artificially enforced uniqueness), and to expose Identifiers and Dependencies (eg. a Sensor is Dependent on a Location). You are modelling spreadsheets, with pre-set Row_Ids, containing data. You need to Normalise the data, as data.
This has resulted in the problem you have identified, but there are other problems as well.
If you model the data, the Identifiers will be clear, and the Index and FK constraints will prevent this: what data is independent; what data belongs to (is dependent on) what other data; what data does what to other data, and the basis of those actions.
Then (the major issues having been addressed) you are left with only minor constraints to address minor areas.
Otherwise you are stuck with adding constraints all over the place to try and get what you want, but never quite getting there. You know you need them, so you are looking for them.
Wrong place. We need to back up to (1).
I have answered your other question, and included a Sensor Data Model. That does not address the deficiencies you identify here. However, I just saw this question; I will update the DM tomorrow and include these tables and columns.
Link to IDEF1X Notation for anyone who is unfamiliar with the Standard for modelling Relational databases.
Questions
- It looks like you need a reference table for Sensors, the shelf item, to hold UpperLimit and LowerLimit, rather than repeating them for every Location. Or are they set, localised, for each Location?
- Think about the Logger being SensorNo zero.
- Why don't the Sensors have an RFID?
- At each Location, is the Logger optional; is it 1::0-1?
Database supertable vs more tables vs generic table
Some thoughts (ideas and opinions, not answers) on this:
The "supertable" (type/subtype) model is compelling, but can be tricky to implement and support. A few tricks:
ALERT
AlertId PK 1/2
AlertType PK 2/2 Check constraint (1 or 2, or better L or S)
...that is, compound primary key, where "type" must always be L)og or S)ensor.
LOGALERT
LogAlertId PK 1/2 FK 1/2
AlertType PK 2/2 FK 2/2
(and again for SENSORALERT)
...that is, same compound primary key, and the foreign key is on both columns. Done this way, there can only be one subtype table for a given type table, and the top table clearly shows which subtype is involved. No way to enforce the existence of a row in the subtype table, so set up your data carefully. And much of the querying complexity can be dealt with (covered up?) using views.
The downside is, it is complex, confusing to those not (yet) familiar with it, and will require extra support and effort. The real question is, is it worth it?
- How often must you deal with all alerts, not only Log or only Sensor? If most of the time you only have to deal with one or the other, it's probably not worth it.
- How much Log- or Sensor-specific detail do you have to deal with? Beyond the actual events related to an individual alert, how similar across both types are the myriad attributes (details in columns) you'll be tracking? If users, acknowledgements, and corrective actions are (sufficiently) identical, you can make them attributes (columns) of ALERT, but if not then you have to make them attributes of the appropriate subtype, and you lose the consolidating advantage of the supertype.
- And you have to get it correct now, during design time. Research, ask questions, gaze into crystal balls (i.e. ponder what might happen in the future to invalidate everyone's current assumptions), because if you get it wrong now you and your successors may have to live with it forever.
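The compound-key arrangement described above can be sketched in DDL (names and datatypes are illustrative only, not from any existing schema):

CREATE TABLE ALERT (
    AlertId    INT     NOT NULL,
    AlertType  CHAR(1) NOT NULL,
    CONSTRAINT Alert_pk
        PRIMARY KEY (AlertId, AlertType),
    -- the type must always be L)og or S)ensor
    CONSTRAINT Alert_AlertType_ck
        CHECK (AlertType IN ('L', 'S'))
    )

CREATE TABLE LOGALERT (
    LogAlertId INT     NOT NULL,
    AlertType  CHAR(1) NOT NULL,
    CONSTRAINT LogAlert_pk
        PRIMARY KEY (LogAlertId, AlertType),
    -- FK on both columns: a LOGALERT row must match an ALERT row
    CONSTRAINT Alert_LogAlert_fk
        FOREIGN KEY (LogAlertId, AlertType)
        REFERENCES ALERT (AlertId, AlertType),
    -- and this subtype's rows can only ever be L)og
    CONSTRAINT LogAlert_AlertType_ck
        CHECK (AlertType = 'L')
    )

SENSORALERT is identical, with CHECK (AlertType = 'S').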
Database design problem
How about something like this?
Second structure with subclass relationship...
Building
BuildingId pk
BuildingName
etc
Area
AreaId pk
AreaName
BuildingId fk -> Building
etc
Location
LocationId pk
LocationType (LiveMonitor, Logger, Handprobe) pk
LocationName
AreaId fk -> Area
etc
LiveMonitorLocation
LocationId pk, fk -> Location
LocationType ConstantValue = LiveMonitor fk -> Location
LoggerLocation
LocationId pk, fk -> Location
LocationType ConstantValue = Logger fk -> Location
HandprobeLocation
LocationId pk, fk -> Location
LocationType ConstantValue = Handprobe fk -> Location
Logger
LoggerId pk
LocationId fk -> LoggerLocation
SensorId fk -> Sensor
Handprobe
HandProbeId pk
Locationid fk -> HandprobeLocation
Sensor
SensorId pk
LiveMonitorSensors
SensorId pk, fk -> Sensor
LocationId pk, fk -> LiveMonitorLocation
SensorReadings
SensorId pk, fk -> Sensor
ReadingUtc pk
ReadingValue data
Historical / auditable database
Revised 01 Jan 11
Ok, so there is a gap between where I sit (deliver fully auditable databases; yours being a particular requirement of that) and where you sit: based on your questions and comments. Which we will probably work out in the commentary. Here's a position to start from.
To provide this requirement, there is no need at all for: triggers; mass duplication; broken integrity; etc.
This is not a classic Temporal requirement, either, so no need for the "period" capability, but you can.
ValidFrom and ValidTo is a Normalisation error: the ValidTo is data that is easily derived; ValidTo in any row is duplicated, in the ValidFrom of the next row; you have an Update Anomaly (when you update one column in one row, you additionally have to update the other column in the next row); you have to use a dummy value for "current".
All unnecessary, use ValidFrom only, and keep the db clean and pure 5NF.
The Caveat is, if PostgreSQL can't perform Subqueries without falling in a heap (à la Oracle), then fine, keep ValidTo.
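To illustrate the point (a sketch only; SensorHistory, ValidFrom and the key columns are assumed names following the discussion): the end of any period is derivable as the beginning of the next period for the same key, so it need not be stored:

-- Derive ValidTo as the ValidFrom of the next row for the key;
-- no such row (NULL) means the period is still current
SELECT LocationId,
       SensorNo,
       ValidFrom,
       ValidTo = ( SELECT MIN(ValidFrom)
           FROM  SensorHistory sh_next
           WHERE sh_next.LocationId = sh.LocationId
           AND   sh_next.SensorNo   = sh.SensorNo
           AND   sh_next.ValidFrom  > sh.ValidFrom )
    FROM SensorHistory sh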
All of these things are editable in the system by users, and deletable.
Well, no. It is a database holding important information, with Referential Integrity; not a scratchpad, so the user cannot just walk up to it and "delete" something. That would contradict the same users' requirement for maintaining historical data (in the Reading; Alert; Ack; Action; Download).
Cascading deletes are not allowed. Those functions are check boxes for non-databases, MS Access types. For real databases, the RI constraints stop parents with children from being deleted.
Primary Keys cannot (should not) be changed. Eg. UserId; LocationId; NetworkSlaveCode never change; remember, they are carefully considered Identifiers. One characteristic of PKs is that they are stable.
You can add new Users; you can change a current User's name; but you cannot delete an User who has entries in Download, Acknowledgement, Action.
Basically if it's editable then it has to be historical (so that excludes readings and alerts).
Also excludes: Downloads; Acknowledgements; Actions.
And the Reference tables: SensorType; AlertType; ActionType.
And the new History tables: they are inserted into, but they cannot be updated or deleted.
The problem I find with the IsObsolete flag is... Say you change the Location: the Sensor foreign key will now point to an obsolete record, meaning you will have to duplicate every Sensor record. This problem gets exponentially worse as the hierarchy gets bigger.
Ok, so now do you understand that the LocationId (FK) in Sensor will not change; there is no mass duplication, etc? There is no problem in the first place (and there is in that stupid book!) that gets exponentially worse in the second place. (Refer below.)

IsObsolete is inadequate for your requirement. The UpdatedDtm in any real row (Reading, etc) identifies the Parent (FK to Sensor) History row (its AuditedDtm) that was in effect at the time. Full Relational capability; Declarative Referential Integrity, etc.
Maintain the IDEF1X, Relational concept of strong Identifiers ... There is only one Current parent row (eg. Location)
The rows in the History are Images of the current row, before it was changed, at the stated AuditedDtm. The Current row (non-history) shows the one last UpdatedDtm, when the row was changed. The AuditedDtm shows the entire series of UpdatedDtms for any given key; and thus I have used it to "partition" the real key in a temporal sense.
All that is required is a History table for each changeable table. I have provided the History tables for four Identifying tables: Location; Sensor; NetworkSlave; and User.
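A sketch of that identification (using the model's names; the semantics are those stated above, ie. each History row is a before-image taken at its AuditedDtm): the Sensor version in effect at a given time is the History row with the earliest AuditedDtm at or after that time; if no such row exists, the Current row was in effect:

-- Find the Sensor version in effect at @ReadingDtm
SELECT *
    FROM  SensorHistory sh
    WHERE sh.LocationId = @LocationId
    AND   sh.SensorNo   = @SensorNo
    AND   sh.AuditedDtm = ( SELECT MIN(AuditedDtm)
        FROM  SensorHistory
        WHERE LocationId = @LocationId
        AND   SensorNo   = @SensorNo
        AND   AuditedDtm >= @ReadingDtm )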
Please read this for understanding Auditable in the accounting sense.
Data Model
Link to Sensor Data Model with History (Page 2 contains the History tables and context).
Readers who are not familiar with the Relational Modelling Standard may find IDEF1X Notation useful.
Response to Comments
(1) My first issue is that of referential integrity with the historic data, in that I'm not sure there is any, and if there is I'm not sure how it works. For instance, in SensoryHistory it would be possible to add a record that had an UpdatedDtm indicating a date time before the location itself existed, if you see what I mean. Whether this is actually an issue I'm not sure - enforcing that might be over the top.
(You raised a similar issue in the other question.) It may be that the dbs you have experienced did not actually have the Referential Integrity in place; that the Relation lines were there just for documentation; that the RI was "implemented in app code" (which means there is no RI).
This is an ISO/IEC/ANSI Standard SQL database. That allows Declarative Referential Integrity. Every Relation line is implemented as a PK::FK Reference, an actual Constraint that is Declared. Eg:
CREATE TABLE Location
    ...
    CONSTRAINT UC_PK
        PRIMARY KEY (LocationId)
    ...
CREATE TABLE Sensor
    ...
    CONSTRAINT UC_PK
        PRIMARY KEY (LocationId, SensorNo)
    CONSTRAINT Location_Sensor_fk
        FOREIGN KEY (LocationId)
        REFERENCES Location (LocationId)
    ...
CREATE TABLE SensorHistory
    ...
    CONSTRAINT UC_PK
        PRIMARY KEY (LocationId, SensorNo, UpdatedDtm)
    CONSTRAINT Sensor_SensorHistory_fk
        FOREIGN KEY (LocationId, SensorNo)
        REFERENCES Sensor (LocationId, SensorNo)
    ...
Those Declared Constraints are enforced by the server; not via triggers; not in app code. That means:
- A Sensor with a LocationId that does not exist in Location cannot be inserted
- A LocationId in Location that has rows in Sensor cannot be deleted
- A SensorHistory with a LocationId+SensorNo that does not exist in Sensor cannot be inserted
- A LocationId+SensorNo in Sensor that has rows in SensorHistory cannot be deleted.
(1.1) All columns should have RULEs and CHECK Constraints to constrain their range of values. That is in addition to the fact that all INSERT/UPDATE/DELETEs are programmatic, within stored procs; therefore accidents do not happen, and people do not walk up to the database and run commands against it (except SELECTs).
Generally I stay away from triggers. If you are using stored procs, and the normal permissions, then this:
in SensoryHistory it would be possible to add a record that had an UpdatedDtm indicating a date time before the Location itself existed, if you see what I mean
is prevented. So is inserting a SensorHistory with an UpdatedDtm earlier than the Sensor itself. But procs are not Declarative Rules. However, if you want to be doubly sure (and I mean doubly, because the INSERTs are all via a proc, not direct commands by users), then sure, you have to use a trigger. For me, that is over the top.
(2) how do I indicate deletion? I could just add a flag to the non-historical version of the table I guess.
Not sure yet. Eg. Do you accept that when a Sensor is deleted, it is final ... (yes, history is maintained) ... and then when a new Sensor is added to the Location, it will have a new SensorNo ... there is no Sensor being logically replaced with the new one, with or without a gap in time?
From an end-user's point of view, via the software they should be able to add, edit and delete sensors at will with no limitation. But yes, once deleted it is deleted and cannot be undeleted. There's nothing to stop them re-adding a sensor later though with the exact same parameters.
And "delete" Locations, NetworkSlaves, and Users as well.
Ok. Then the new Sensor with the same parameters is truly new; it has a new SensorNo, and is independent of any previous logical Sensor. We can add an IsObsolete BOOLEAN to the four identifying tables; it is now identified as adequate. The Delete is now a Soft Delete.
(2.1) For NetworkSensor and LoggerSensor, which are actually dependent on two parents: they are obsolete if either of their parents is obsolete. So there is no point giving them an IsObsolete column, which would have a dual meaning, and which can be derived from the applicable parent.
(2.2) Just to be clear, users cannot delete any rows from any Transaction and History tables, right?
(3) When updating a table, what method would be best to insert the new row in the historical table and update the main table? Just normal SQL statements inside a transaction maybe?
Yes. That is the classic use of a Transaction, as per ACID Properties, it is Atomic; it either succeeds in toto or fails in toto (to be retried later when the problem is fixed).
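A minimal sketch of such a Transaction (T-SQL/Sybase style to match the other code here; the column lists are elided with "..." because they depend on the table, and GETDATE() is an assumption for the current timestamp):

BEGIN TRAN
    -- capture the before-image of the current row into History
    INSERT SensorHistory
        SELECT LocationId,
               SensorNo,
               AuditedDtm = GETDATE(),
               ...   -- remaining columns of the before-image
            FROM  Sensor
            WHERE LocationId = @LocationId
            AND   SensorNo   = @SensorNo
    -- then change the current row
    UPDATE Sensor
        SET   ...   -- the changed columns
        WHERE LocationId = @LocationId
        AND   SensorNo   = @SensorNo
COMMIT TRAN

If either statement fails, ROLLBACK; the pair succeeds or fails in toto.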
(4) Referenced Book
The definitive and seminal text is Temporal Data and the Relational Model (C J Date, H Darwen, N A Lorentzos). As in, those of us who embrace the RM are familiar with the extensions, and what is required in the successor to the RM; rather than some other method.
The referenced book is horrible, and free. The PDF isn't a PDF (no search; no indexing). Opening my MS and Oracle is telling; a few good bits couched in lots of fluff. Many misrepresentations. Not worth responding to in detail (if you want a proper review, open a new question).
(4.1) ValidTo in addition to ValidFrom. Serious mistake (as identified at the top of my answer) which the book makes; then laboriously solves. Don't make the mistake in the first place, and you have nothing to solve in the second place. As I understand it, that will eliminate your triggers.
(4.2) Simple rules, taking both Normalisation and Temporal requirements into account. First and foremost, you need to deeply understand (a) the temporal requirement and (b) the DataTypes, correct usage and limitations. Always store:
Instant as DATETIME, eg. UpdatedDtm
Interval as INTEGER, clearly identifying the Unit in the column name, eg. IntervalSec
Period. Depends on conjunct or disjunct.
- For conjunct, which this requirement is, (4.1) applies: use one DATETIME; the end of the period can be derived from the beginning of the period of the next row.
- For disjunct periods, yes, you need 2 x DATETIMEs, eg. a RentedFrom and a RentedTo, with gaps in-between.
(4.3) They mess with the "Temporal Primary Key", which complicates code (in addition to requiring triggers to control the Update Anomaly). I have already delivered a clean (tried and tested) Temporal Primary Key.
(4.4) They mess with dummy values, non-real values, and Nulls for "Now". I do not allow such things in a database. Since I am not storing the duplicated ValidTo, I do not have the problem; there is nothing to solve.
(4.5) One has to wonder why a 528 page "textbook" is available free on the web, in poor PDF form.
(5) I [an User] could quite happily delete all the LocationHistory rows for instance (leaving only the current version in the Location table), even though there may exist a SensorHistory row that conceptually "belongs" to a previous version of the Location, if that makes sense.
It does not make sense to me, there is still a gap in the communication we have to close. Please keep interacting until it is closed.
In a real (standard ISO/IEC/ANSI SQL) database, we do not GRANT INSERT/UPDATE/DELETE permission to users. We GRANT SELECT and REFERENCES only (to chosen users). All INSERT/UPDATE/DELETEs are coded in Transactions, which means stored procs. Then we GRANT EXEC on each stored proc to selected users (use ROLES to reduce administration).
Therefore no one can delete from any table without executing a proc.
Do not write a proc to delete from any History table. These rows should not be deleted. In this case, the non-permission and the non-existence of code is the Constraint.
Technically, all History rows are valid, there is no Period to concern yourself with. The oldest LocationHistory row contains the before-image of the original Location row before it was changed. The youngest LocationHistory row is the before-image of the current Location row. Every LocationHistory row in-between is thusly valid and applies to the Period in-between.
No need to "prune" or look for a few LocationHistory rows that can be deleted on the basis that they apply to a Period that is not used: they are all used. (Definitively, without the need for checking for any mapping of Location children to any LocationHistory row(s), to prove it.)
Bottom line: an User cannot delete from any History (or Transaction) table.
Or do you mean something different again ?
Note I have added (1.1) above.
(6) Corrected one mistake in the DM. An Alert is an expression of Reading, not Sensor.
(7) Corrected the Business Rules in the other question/answer to reflect that; and the new rules exposed in this question.
(8) Do you understand/appreciate, that since we have a fully IDEF1X compliant model, re Identifiers: