Opinions on Sensor/Reading/Alert Database Design

Opinions on sensor / reading / alert database design

Revised 01 Jan 11 21:50 UTC

Data Model

I think your Data Model should look like this:▶Sensor Data Model◀. (Page 2 relates to your other question re History).

Readers who are unfamiliar with the Relational Modelling Standard may find ▶IDEF1X Notation◀ useful.

Business (Rules Developed in the Commentary)

I did identify some early business Rules, which are now obsolete, so I have deleted them

These can be "read" in the Relations (read adjacent to the Data Model). The Business Rules and all implied Referential and Data Integrity can be implemented in, and thus guaranteed by, RULES, CHECK Constraints, in any ISO SQL database. This is a demonstration of IDEF1X, in the development of both the Relational keys, and the Entities and Relations. Note the Verb Phrases are more than mere flourish.

Apart from three Reference tables, the only static, Identifying entities are Location, NetworkSlave, and User. Sensor is central to the system, so I ahve given it its own heading.

Location

A Location contains one-to-many Sensors
A Location may have one Logger

NetworkSlave

A NetworkSlave collects Readings for one-to-many NetworkSensors

User

An User may maintain zero-to-many Locations
An User may maintain zero-to-many Sensors
An User may maintain zero-to-many NetworkSlaves
An User may perform zero-to-many Downloads
An User may make zero-to-many Acknowledgements, each on one Alert
An User may take zero-to-many Actions, each of one ActionType

Sensor

A SensorType is installed as zero-to-many Sensors
A Logger (houses and) collects Readings for one LoggerSensor
A Sensor is either one NetworkSensor or one LoggerSensor
- A NetworkSensor records Readings collected by one NetworkSlave
  .
A Logger is periodically Downloaded one-to-many times
- A LoggerSensor records Readings collected by one Logger
  .
A Reading may be deemed in Alert, of one AlertType
- An AlertType may happen on zero-to-many Readings
  .
An Alert may be one Acknowledgement, by one User
.
An Acknowledgement may be closed by one Action, of one ActionType, by one User
- An ActionType may be taken on zero-to-many Actions

Responses to Comments

Sticking Id columns on everything that moves, interferes with the determination of Identifiers, the natural Relational keys that give your database relational "power". They are Surrogate Keys, which means an additional Key and Index, and it hinders that relational power; which results in more joins than otherwise necessary. Therefore I use them only when the Relational key becomes too cumbersome to migrate to the child tables (and accept the imposed extra join).
Nullable keys are a classic symptom of an Unnormalised database. Nulls in the database is bad news for performance; but Nulls in FKs means each table is doing too many things, has too many meanings, and results is very poor code. Good for people who like to "refactor" their databases; completely unnecessary for a Relational database.
Resolved: An Alert may be Acknowledged; An Acknowledgement may be Actioned.
The columns above the line are the Primary Key (refer Notation document). SensorNo is a sequential number within LocationId; refer Business Rules, it is meaningless outside a Location; the two columns together form the PK. When you are ready to INSERT a Sensor (after you have checked that the attempt is valid, etc), it is derived as follows. This excludes LoggerSensors, which are zero:
```
INSERT Sensor VALUES (
    @LocationId,
    SensorNo = ( SELECT ISNULL(MAX(SensorNo), 0) + 1
        FROM Sensor
        WHERE LocationId = @LocationId
        )
    @SensorCode
    )
```
For accuracy or improved meaning, I have changed NetworkSlave monitors NetworkSensor to NetworkSlave collects Readings from NetworkSensor.
Check Constraints. The NetworkSensor and LoggerSensor are exclusive subtypes of Sensor, and their integrity can be set by CHECK constraints. Alerts, Acknowledgements and Actions are not subtypes, but their integrity is set by the same method, so I will list them together.
- Every Relation in the Data Model is implemented as a CONSTRAINT in the child (or subtype) as FOREIGN KEY (child_FK_columns) REFERENCES Parent (PK_columns)
- A Discriminator is required to identify which subtype a Sensor is. This is SensorNo = 0 for LoggerSensors; and non-zero for NetworkSensors.
- The existence of NetworkSensors and LoggerSensors are constrained by the FK CONSTRAINTS to NetworkSlave and Logger, respectively; as well as to Sensor.
- In NetworkSensor, include a CHECK constraint to ensure SensorNo is non-zero
- In LoggerSensor, include a CHECK constraint to ensure SensorNo is zero
- The existence of Acknowledgements and Actions are constrained by the identified FK CONSTRAINTS (An Acknowledgement cannot exist without an Alert; an Action cannot exist without an Acknowledgement). Conversely, an Alert with no Acknowledgement is in an unacknowledged state; an Alert with and Acknowledgementbut no Action is in an acknowledged but un-actioned state.
  .
Alerts. The concept in a design for this kind of (live monitoring and alert) application is many small programs, running independently; all using the database as the single version of the truth. Some programs insert rows (Readings, Alerts); other programs poll the db for existence of such rows (and send SMS messages, etc; or hand-held units pick up Alerts relevant to the unit only). In that sense, the db is a may be described as an message box (one program puts rows in, which another program reads and actions).
The assumption is, Readings for Sensors are being recorded "live" by the NetworkSlave, and every minute or so, a new set of Readings is inserted. A background process executes periodically (every minute or whatever), this is the main "monitor" program, it will have many functions within its loop. One such function will be to monitor Readings and produce Alerts that have occurred since the last iteration (of the program loop).
The following code segment will be executed within the loop, one for each AlertType. It is a classic Projection:
```
-- Assume @LoopDateTime contains the DateTime of the last iteration
INSERT Alert
    SELECT LocationId,
           SensorNo,
           ReadingDtm,
           "L"          -- AlertType "Low"
        FROM Sensor  s,
             Reading r
        WHERE s.LocationId = r.LocationId
        AND   s.SensorNo   = r.SensorNo
        AND   r.ReadingDtm > @LoopDtm
        AND   r.Value      < s.LowerLimit
INSERT Alert
    SELECT LocationId,
           SensorNo,
           ReadingDtm,
           "H"          -- AlertType "High"
        FROM Sensor  s,
             Reading r
        WHERE s.LocationId = r.LocationId
        AND   s.SensorNo   = r.SensorNo
        AND   r.ReadingDtm > @LoopDtm
        AND   r.Value      > s.UpperLimit
```
So an Alert is definitely a fact, that exists as a row in the database. Subsequently that may be Acknowledged by an User (another row/fact), and Actioned with an ActionType by an User.
Other that this (the creation by Projection act), ie. the general and unvarying case, I would refer to Alert only as a row in Alert; a static object after creation.
Concerns re Changing Users. That is taken care of already, as follows. At the top of my (revised yesterday) Answer, I state that the major Identifying elements are static. I have re-sequenced the Business Rules to improve clarity.
- For the reasons you mention, User.Name is not a good PK for User, although it remains an Alternate Key (Unique) and the one that is used for human interaction.
- User.Name cannot be duplicated, there cannot be more than one Fred; there can be in terms of FirstName-LastName; two Fred Bloggs, but not in terms of User.Name. Our second Fred needs to choose another User.Name. Note the identified Indices.
- UserId is the permanent record, and it is already the PK. Never delete User, it has historical significance. In fact the FK constraints will stop you (never use CASCADE in a real database, that is pure insanity). No need for code or triggers, etc.
- Alternately (to delete Users who never did anything, and thus release User.Name for use) allow Delete as long as there are no FK violations (ie. UserId is not referenced in Download, Acknowledgement, Action).
To ensure that only Users who are Current perform Actions, add an IsObsolete boolean in User (DM Updated), and check that column when that table is interrogated for any function (except reports) You can implement a View UserCurrent which returns only those Users.
Same goes for Location and NetworkSlave. If you need to differentiate current vs historical, let me know, I will add IsObsolete to them as well.
I don't know: you may purge the database of ancient Historical data periodically, delete rows that are (eg) over 10 years old. That has to be done from the bottom (tables) first, working up the Relations.

Feel free to ask Questions.

Note the IDEF1 Notation document has been expanded.

Database design too repetitive?

I'd say it's repetitive if each Sensor (for example) has exactly the same properties (columns).
If they differ at all, they should have different tables.

I'd try capturing this using NORMA or similar too to validate the design.

How far to go with database constraints?

should I be bothered that I haven't constrained that somehow?

Yes.

You have made two basic mistakes.

Sticking Idiot keys on everything that moves.
That has hindered your ability to model the data, as data (not as rows that have no meaning, but with an artificially enforced uniqueness), and expose Identifers; and Dependdencies (eg. a Sensor is Dependent o a Location). You are modelling spreadsheets, with pre-set Row_Ids, containing data. You need to Normalise the data, as data.
This has resulted in the problem you have identified , but there are other problems as well.
If you model the data, the Identifiers will be clear, and the Index and FK constraints will prevent this. What data is independent; what data belongs (is dependent on) what other data; what data does what to other data, and the basis of those actions.
Then (the major issues having been addressed) you are left with only minor constraints to address minor areas.
Otherwise you are stuck with adding constraints all over the place to try and get what you want, but never quite getting there. You know you need them, so you are looking for them.

Wrong place. We need to back up to (1).

I have answered your other question, and included a ▶Sensor Data Model◀. That does not address the deficiencies you identify here. However, I just saw this question, I will update the DM tomorrow and include these tables and columns.

▶Link to IDEF1X Notation◀ for anyone who is unfamiliar with the Standard for modelling Relational databases.

Questions

It looks like you need a reference table for Sensors, the shelf item, to hold UpperLimit and LowerLimit; rather than repeating it for every Location. Or are they set, localised, for each Location.
Think about the Logger being SensorNo zero.
Why don't the Sensors have an RFID ?
At each Location, is the Logger optional, is it 1::0-1 ?,

Database supertable vs more tables vs generic table

Some thoughts (ideas and opinions, not answers) on this:

The "supertable" (type/subtype) model is compelling, but can be tricky to implement and support. A few tricks:

ALERT
  AlertId    PK 1/2
  AlertType  PK 2/2  Check constraint (1 or 2, or better L or S)

...that is, compound primary key, where "type" must always be L)og or S)ensor.

LOGALERT
  LogAlertId  PK 1/2  FK 1/2
  AlertType   PK 2/2  FK 2/2

(and again for SENSORALERT)

...that is, same compound primary key, and the foreign key is on both columns. Done this way, there can only be one subtype table for a given type table, and the top table clearly shows which subtype is involved. No way to enforce the existance of a row in the subtype table, so set up your data carefully. And much of the querying complexity can be dealt with (covered up?) using views.

The downside is, it is complex, confusing to those not (yet) familiar with it, and will require extra support and effort. The real question is, is it worth it?

How often must you deal with all alerts, not only Log or only Sensor? If most of the time you only have to deal with one or the other, it's probably not worth it.
How much Log- or Sensor-specific details do you have to deal with? Beyond the actual events related to an individual alert, how similar across both types are the myriad attributes (details in columns) you'll be tracking? If users, aknowledgements, and corrective actions are (sufficiently) identicial, you can make them attributes (columns) of ALERT, but if not then you have to make them atttributes of the appropriate subtype, and you lose the consolidating advantage of the supertype.
And you have to get it correct now, during design time. Research, ask questions, gaze into crystal balls (i.e. ponder what might happen in the future to invalidate everyone's current assumptions), because if you get it wrong now you and your successors may have to live with it forever.

Database design problem

How abnout something like this?

Second structure with subclass relationship...

Building
  BuildingId   pk
  BuildingName
  etc

Area
  AreaId      pk
  AreaName
  BuildIngId  fk -> Building  
  etc

Location
  LocationId         pk
  LocationType (LiveMonitor, Logger, Handprobe)  pk
  LocationName
  AreaId       fk -> Area
  etc

LiveMonitorLocation
  LocationId  pk, fk -> Location
  LocationType ConstantValue = LiveMonitor  fk ->  Location    

LoggerLocation
  LocationId   pk, fk -> Location
  LocationType ConstantValue = Logger  fk ->  Location    

HandprobeLocation
  LocationId   pk, fk -> Location
  LocationType ConstantValue = Handprobe fk ->  Location    

Logger
  LoggerId     pk
  LocationId   fk -> LoggerLocation
  SensorId     fk -> Sensor

Handprobe
  HandProbeId  pk
  Locationid  fk -> HandprobeLocation

Sensor
  SensorId     pk

LiveMonitorSensors
  SensorId     pk, fk -> Sensor
  LocationId   pk, fk -> LiveMonitorLocation

SensorReadings
  SensorId     pk, fk -> Sensor
  ReadingUtc   pk
  ReadingValue data

Historical / auditable database

Revised 01 Jan 11

Ok, so there is a gap between where I sit (deliver fully auditable databases; yours being a particular requirement of that) and where you sit: based on your questions and comments. Which we will probably work out in the commentary. Here's a position to start from.

To provide this requirement, there is no need at all for: triggers; mass duplication; broken integrity; etc.
This is not a classic Temporal requirement, either, so no need for the "period" capability, but you can.
ValidFrom and ValidTo is a Normalisation error: the ValidTo is data that is easily derived; ValidTo in any row is duplicated, in the ValidFrom of the next row; you have an Update Anomaly (when you update one column in one row, you additionally have to update the other column in the next row); you have to use a dummy value for "current".
- All unnecessary, use ValidFrom only, and keep the db clean and pure 5NF.
- The Caveat is, if PostgreSQL can't perform Subqueries without falling in a heap (ala Oracle), then fine, kep ValidTo.

All of these things are editable in the system by users, and deletable.

Well, no. It is a database holding important information; with Referential Integrity, not a scratchpad, so the user cannot just walk up to it and "delete" something. It will contradict the same users requirement for maintaining historical data (in the Reading; Alert; Ack; Action; Download).

Cascading deletes are not allowed. Those functions are check boxes for non-databases, MS Access types. For real databases, the RI constraints stop parents with children from being deleted.
Primary Keys cannot (should not) be changed. Eg. UserId; LocationId; NetworkSlaveCode never change; remember, they are carefully considered Identifiers. One characteristic of PKs is that they are stable.
You can add new Users; you can change a current User's name; but you cannot delete an User who has entries in Download, Acknowledgement, Action.

Basically if it's editable then it has to be historical (so that excludes readings and alerts).

Also excludes: Downloads; Acknowledgements; Actions.

And the Reference tables: SensorType; AlertType; ActionType.

And the new History tables: they are inserted into, but they cannot be updated or deleted.

The problem I find with the isObselete flag is.. Say if you change the Location, the Sensor foreign key will now point to an obselete record, meaning you will have to duplicate every sensor record. This problem gets exponentially worse as the hierachy gets bigger.

Ok, so now do you understand the LocationId (FK) in Sensor will not change; there is no mass duplication, etc ? There is no problem in the first place (and there is in that stupid book!) that gets exponentially worse in the second place.
~~IsObsolete is inadequate for your requirement.~~ (Refer below)
The UpdatedDtm in any real row (Reading, etc) identifies the Parent (FK to Sensor) History row (its AuditedDtm) that was in effect at the time.
Full Relational capability; Declarative Refential Integrity, etc.
Maintain the IDEF1X, Relational concept of strong Identifiers ... There is only one Current parent row (eg. Location)
The rows in the History are Images of the current row, before it was changed, at the stated AuditedDtm. The Current row (non-history) shows the one last UpdatedDtm, when the row was changed.
The AuditedDtm shows the entire series of UpdatedDtms for any given key; and thus I have used it to "partition" the real key in a temporal sense.

All that is required is a History table for each changeable table. I have provided the Hiistory tables for four Identifying tables: Location; Sensor; NetworkSlave; and User.

Please read this for understanding Auditable in the accounting sense.

Data Model

Link to Sensor Data Model with History (Page 2 contains the History tables and context).

Readers who are not familiar with the Relational Modelling Standard may find IDEF1X Notation useful.

Response to Comments

(1) My first issue is that of referential integrity with the historic data, in that I'm not sure there is any, and if there is I'm not sure how it works. For instance, in SensoryHistory it would be possible to add a record that had an UpdatedDtm indicating a date time before the location itself existed, if you see what I mean. Whether this is actually an issue I'm not sure - enforcing that might be over the top.

(You raised a similar issue in the other question.) It may be that the dbs you have experienced did not actually have the Referential Integrity in place; that the Relation lines were there just for documentation; that the RI was "implemented in app code" (which means there is no RI).

This is an ISO/IEC/ANSI Standard SQL database. That allows Declarative Referential Integrity. Every Relation line is implemented as a PK::FK Reference, an actual Constraint that is Declared. Eg:

CREATE TABLE Location
    ...
    CONSTRAINT UC_PK
        PRIMARY KEY (LocationId)
    ...
CREATE TABLE Sensor
    ...
    CONSTRAINT UC_PK
        PRIMARY KEY (LocationId, SensorNo)
    CONSTRAINT Location_Sensor_fk
        FOREIGN KEY (LocationId)
        REEFERENCES Location(LocationId)
    ...
CREATE TABLE SensorHistory
    ...
    CONSTRAINT UC_PK
        PRIMARY KEY (LocationId, SensorNo, UpdatedDtm))
    CONSTRAINT Sensor_SensorHistory_fk
        FOREIGN KEY (LocationId, SensorNo)
        REEFERENCES Sensor (LocationId, SensorNo)
    ...

Those Declared Constraints are enforced by the server; not via triggers; not in app code. That means:

A Sensor with a LocationId that does not exist in Location cannot be inserted
A LocationId in Location that has rows in Sensor cannot be deleted
A SensorHistory with a LocationId+SensorNo that does not exist in Sensor cannot be inserted
A LocationId+SensorNo in Sensor that has rows in SensorHistory cannot be deleted.

(1.1) All columns should have RULEs and CHECK Constraints to Constrain their range of values. That in addition to the fact that all INSERT/UPDATE/DELETEs are programmatic, within stored procs, therefore accidents do not happen, and people do not walk up to the database and run commands against it (excepts SELECTS).

Generally I stay away from triggers. If you are using stored procs, and the normal permissions, then this:

in SensoryHistory it would be possible to add a record that had an UpdatedDtm indicating a date time before the Location itself existed, if you see what I mean

is prevented. So is inserting a SensorHistory with an UpdatedDtm earlier than the Sensor itself. But procs are not Declarative Rules. However if you want to be doubly sure (and I mean doubly, because the INSERTS are all via a proc, direct command by users), then sure, you have to use a trigger. For me, that is over the top.

(2) how do I indicate deletion? I could just add a flag to the non-historical version of the table I guess.

Not sure yet. Eg. Do you accept that when a Sensor is deleted, it is final ... (yes, history is maintained) ... and then when a new Sensor is added to the Location, it will have a new SensorNo ... there is no Sensor being logically replaced with the new one, with or without a gap in time ?

From a end-user's point of view, via the software they should be able to add, edit and delete sensors at will with no limitation. But yes, once deleted it is deleted and cannot be undeleted. There's nothing to stop them re-adding a sensor later though with the exact same parameters.

And "delete" Locations, NetworkSlaves, and Users as well.

Ok. Then the new Sensor with the same parameters, is truly new, it has a new SensorNo, and is independent of any previous logical Sensor. We can add an IsObsolete BOOLEAN to the four identifying tables; it is now identified as adequate. The Delete is now a Soft Delete.

(2.1) For NetworkSensor and LoggerSensor, which are actually dependent on two parents: they are obsolete if either of their parents are obsolete. So there is no point giving them an IsObsolete column, which has a dual meaning, which can be derived from the applicable parent.

(2.2) Just to be clear, users cannot delete any rows from any Transaction and History tables, right?

(3) When updating a table, what method would be best to insert the new row in the historical table and update the main table? Just normal SQL statements inside a transaction maybe?

Yes. That is the classic use of a Transaction, as per ACID Properties, it is Atomic; it either succeeds in toto or fails in toto (to be retried later when the problem is fixed).

(4) Referenced Book

The definitive and seminal text is Temporal Data and the Relational Model C J Date, H Darwen, N A Lorentzos. As in, those of us who embrace the RM are familiar with the extensions, and what is required in the successor to the RM; rather than some other method.

The referenced book is horrible, and free. The PDF isn't a PDF (no search; no indexing). Opening my MS and Oracle is telling; a few good bits couched in lots of fluff. Many misrepresentations. Not worth responding to in detail (if you want a proper review, open a new question).

(4.1) ValidTo in addition to ValidFrom. Serious mistake (as identified at the top of my answer) which the book makes; then laboriously solves. Don't make the mistake in the first place, and you have nothing to solve in the second place. As I understand it, that will eliminate your triggers.

(4.2) Simple rules, taking both Normalisation and Temporal requirements into account. First and foremost, you need to deeply understand (a) the temporal requirement and (b) the DataTypes, correct usage and limitations. Always store:

Instant as DATETIME, eg. UpdatedDtm
Interval as INTEGER, clearly identifying the Unit in the column name, eg. IntervalSec
Period. Depends on conjunct or disjunct.
- For conjunct, which this requirement is, (4.1) applies: use one DATETIME; the end of the period can be derived from the beginning of the period of the next row.
- For disjunct periods, yes, you need 2 x DATETIMEs, eg, RentedFrom and a RentedTo with gaps in-between.

(4.3) They mess with the "Temporal Primary Key", which complicates code (in addition to requiring triggers to control the Update Anomaly). I have already delivered a clean (tried and tested) Temporal Primary Key.

(4.4) They mess with dummy values, non-real values, and Nulls for "Now". I do not allow such things in a database. Since I am not storing the duplicated ValidTo, I do not have the problem, there is nothing to solve.

(4.5) One has to wonder why a 528 page "textbook" is available free on the web, in poor PDF form.

(5) I [an User] could quiet happily delete all the LocationHistory rows for instance, (leaving only the current version in the Location table) - even though there may exist a SensorHistory row that conceptually "belongs" to a previous version of the Location, if that makes sense.

It does not make sense to me, there is still a gap in the communication we have to close. Please keep interacting until it is closed.

In a real (standard ISO/IEC/ANSI SQL) database, we do not GRANT INSERT/UPDATE/DELETE permission to users. We GRANT SELECT and REFERENCES only (to chosen users) All INSERT/UPDATE/DELETEs are coded in Transactions, which means stored procs. Then we GRANT EXEC on each stored proc to selected users (use ROLES to reduce administration).
- Therefore no one can delete from any table without executing a proc.
- Do not write a proc to delete from any History table. These rows should not be deleted. In this case, the non-permission and the non-existence of code is the Constraint.
- Technically, all History rows are valid, there is no Period to concern yourself with. The oldest LocationHistory row contains the before-image of the original Location row before it was changed. The youngest LocationHistory rows is the before-image of the current Location row. Every LocationHistory row in-between is thusly valid and applies to the Period in-between.
- No need to "prune" or look for a few LocationHistory rows that can be deleted on the basis that they apply to a Period that is not used: they are all used. (Definitively, without the need for checking for any mapping of Location children to any LocationHistory row(s), to prove it.)
- Bottom line: an User cannot delete from any History (or Transaction) table.
- Or do you mean something different again ?
- Note I have added (1.1) above.

(6) Corrected one mistake in the DM. An Alert is an expression of Reading, not Sensor.

(7) Corrected the Business Rules in the other question/answer to reflect that; and the new rules exposed in this question.

(8) Do you understand/appreciate, that since we have a fully IDEF1X compliant model, re Identifiers:

Opinions on Sensor/Reading/Alert Database Design