Generating Sequential Numbers in a Multi-User SaaS Application

Generating sequential numbers in a multi-user SaaS application

A typical solution for any relational database could be a table like

user_invoice_numbers (user_id int primary key clustered, last_id int)

and a stored procedure or a SQL query like

update user_invoice_numbers set last_id = last_id + 1 where user_id = @user_id
select last_id from user_invoice_numbers where user_id = @user_id

This will work at the user level (as long as each user has only a few simultaneously running transactions), but it will not work at the company level (for example, when you need companies_invoice_numbers), because transactions from different users in the same company may block each other and this table becomes a performance bottleneck.
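
One practical detail about the snippet above: the UPDATE and SELECT need to run inside a single transaction, otherwise concurrent sessions can read each other's increments. On MySQL specifically, the documented LAST_INSERT_ID(expr) trick makes the pair effectively atomic per connection. A minimal sketch, using the table above (MySQL syntax):

UPDATE user_invoice_numbers
   SET last_id = LAST_INSERT_ID(last_id + 1)
 WHERE user_id = @user_id;

-- returns the value this connection just wrote, with no race against other sessions
SELECT LAST_INSERT_ID();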

The most important functional requirement to check is whether your system is allowed to have gaps in invoice numbering or not. When you use a standard auto_increment column, you allow gaps, because in most databases I know of, when you roll back a transaction, the consumed number is not given back. With this in mind, you can improve performance using one of the following guidelines.

1) Exclude the procedure that you use for getting new numbers from long-running transactions. Suppose that inserting an invoice is a long-running transaction with complex server-side logic. In that case, first acquire a new id, and then insert the new invoice in a separate transaction (see the sketch below). If the second transaction is rolled back, the auto-number will not decrease, but user_invoice_numbers is not locked for long, so many simultaneous users can insert invoices at the same time.
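
A minimal sketch of this split, assuming the user_invoice_numbers table above and a hypothetical invoices table (MySQL syntax):

-- transaction 1: short, only reserves the next number
START TRANSACTION;
UPDATE user_invoice_numbers SET last_id = last_id + 1 WHERE user_id = @user_id;
SELECT last_id INTO @new_invoice_no FROM user_invoice_numbers WHERE user_id = @user_id;
COMMIT;

-- transaction 2: the long-running insert; if it rolls back, @new_invoice_no is simply lost (a gap)
START TRANSACTION;
INSERT INTO invoices (user_id, invoice_no) VALUES (@user_id, @new_invoice_no);
-- ... complex server-side logic ...
COMMIT;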

2) Do not use a traditional transactional database to store the last id for each user. When you need to maintain a simple list of keys and values, there are a lot of small but fast database engines that can do that work for you (see any list of key/value databases); memcached is probably the most popular. In the past I have seen projects where simple key/value storage was implemented using the Windows Registry or even the file system: there was a directory where each file name was the key and each file contained the last id. Even this rough solution was better than using a SQL table, because locks were acquired and released very quickly and were not part of the transaction scope.

If this optimization seems overcomplicated for your project, forget about it until you actually run into performance issues. In most projects, the simple method with an additional table will work fast enough.

Best way to generate (and save) incremental invoice numbers in a multi-tenant MySQL database

Regardless of whether you're using these values as database IDs, re-using IDs is virtually guaranteed to cause problems at some point. Even if you're not re-using IDs, you're going to run into the case where two invoice-creation requests run at the same time and get the same MAX()+1 result.

To get around all this, you need to implement your own simple sequence generator that locks its storage while a value is being issued. E.g.:

CREATE TABLE client_invoice_serial (
    -- note: also FK this back to the client record
    client_id INTEGER UNSIGNED NOT NULL PRIMARY KEY,
    serial INTEGER UNSIGNED NOT NULL DEFAULT 0
);
$dbh = new PDO('mysql:...');

/* PDO::ATTR_AUTOCOMMIT defaults to 'on', making every statement an implicit transaction.
   It needs to be off for this. You may or may not want to set it globally, or just turn
   it off before this block and back on at the end. */
$dbh->setAttribute(PDO::ATTR_AUTOCOMMIT, 0);

// Simple best practice: ensures that SQL errors MUST be dealt with.
// The try/catch below assumes this is enabled.
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$dbh->beginTransaction();
try {
    // the SELECT ... FOR UPDATE below will lock the selected row
    $select = $dbh->prepare("SELECT * FROM client_invoice_serial WHERE client_id = ? FOR UPDATE;");
    $select->execute([$client_id]);

    if ($select->rowCount() === 0) {
        $insert = $dbh->prepare("INSERT INTO client_invoice_serial (client_id, serial) VALUES (?, 1);");
        $insert->execute([$client_id]);
        $invoice_id = 1;
    } else {
        $invoice_id = $select->fetch(PDO::FETCH_ASSOC)['serial'] + 1;
        $update = $dbh->prepare("UPDATE client_invoice_serial SET serial = serial + 1 WHERE client_id = ?");
        $update->execute([$client_id]);
    }
    $dbh->commit();
} catch (\PDOException $e) {
    // make sure that the transaction is cleaned up ASAP, then let the exception
    // bubble up into your general error handling
    $dbh->rollBack();
    throw $e; // or throw a more pertinent error/exception of your choosing
}
// both committing and rolling back will release the lock

At a very basic level, this is what MySQL does in the background for AUTO_INCREMENT columns.

Should I allow my users to update their registered email in a SaaS application

Assuming you have a SaaS application: if you are using the registered email id as the username for authenticating users, then when updating to a new email id make sure that no other account already exists with the new email address.
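
A rough sketch of that check, assuming a hypothetical users table keyed by user_id with an email column (all names here are illustrative):

-- a unique constraint lets the database reject a duplicate address even under concurrent updates
ALTER TABLE users ADD CONSTRAINT uq_users_email UNIQUE (email);

-- optional application-side pre-check before attempting the update
SELECT COUNT(*) AS already_taken
  FROM users
 WHERE email = :new_email
   AND user_id <> :current_user_id;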

From my point of view, there won't be any problem as long as you have a logging facility that tracks all the emails sent to a particular customer.

Generally, a SaaS application will not force customers to stick with one particular email address throughout the lifetime of the user in the application.

What is the proper way to manually sequence a column in Postgres?

I don't think there is a "cheap" solution to this problem. The only solution that is safe (but not necessarily fast) in a multi-user environment is to have a "counter" table with one row for each customer.

Each transaction has to first lock the customer's entry before inserting a new ticket, something like this:

UPDATE cust_numbers
SET current_number = current_number + 1
WHERE cust_id = 42
RETURNING current_number;

That will do three things in one step

  1. increase the current "sequential" number for that customer
  2. lock the row so other transactions doing the same will have to wait for a lock
  3. return the new value of that column.

With that new number you can now insert the new ticket. When the transaction is committed, it also releases the lock on the cust_numbers row, so other transactions "waiting for a number" can proceed.

You could wrap the two steps (the UPDATE ... RETURNING and the INSERT) into a single stored function so that the logic behind this is centralized. Your application would then only call SELECT insert_ticket(...) without knowing how the ticket number is generated.
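
A minimal sketch of such a function, assuming a hypothetical tickets table with cust_id, ticket_number and title columns (only cust_numbers comes from the example above):

CREATE OR REPLACE FUNCTION insert_ticket(p_cust_id integer, p_title text)
RETURNS integer AS $$
DECLARE
    v_number integer;
BEGIN
    -- lock the customer's counter row and take the next number
    UPDATE cust_numbers
       SET current_number = current_number + 1
     WHERE cust_id = p_cust_id
    RETURNING current_number INTO v_number;

    -- the assumed tickets table; adjust to your actual schema
    INSERT INTO tickets (cust_id, ticket_number, title)
    VALUES (p_cust_id, v_number, p_title);

    RETURN v_number;
END;
$$ LANGUAGE plpgsql;

The application then just runs SELECT insert_ticket(42, 'some title'); and gets the per-customer ticket number back.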

You might also want to create a trigger on the customer table to automatically insert a row into the cust_numbers table when a new customer is created.
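
A sketch of such a trigger, assuming the customer table is called customers and keyed by cust_id (PostgreSQL 11+ syntax; use EXECUTE PROCEDURE on older versions):

CREATE OR REPLACE FUNCTION create_cust_number_row()
RETURNS trigger AS $$
BEGIN
    -- start each new customer at 0 so the first ticket gets number 1
    INSERT INTO cust_numbers (cust_id, current_number) VALUES (NEW.cust_id, 0);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_create_cust_number
AFTER INSERT ON customers
FOR EACH ROW EXECUTE FUNCTION create_cust_number_row();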

The disadvantage of this is that you effectively serialize the transactions that insert new tickets for the same customer. Depending on the volume of inserts in your system, this might turn out to be a performance problem.

Edit

Another disadvantage is that nothing forces tickets to be inserted that way, which might lead to problems if, for example, a new developer forgets about this.

How to create a multi-tenant database with shared table structures?

However there are some companies of course who fear that their data might be compromised, so we are evaluating other solutions.

This is unfortunate, as customers sometimes suffer from a misconception that only physical isolation can offer enough security.

There is an interesting MSDN article, titled Multi-Tenant Data Architecture, which you may want to check. This is how the authors addressed the misconception about the shared approach:

A common misconception holds that only physical isolation can provide an appropriate level of security. In fact, data stored using a shared approach can also provide strong data safety, but requires the use of more sophisticated design patterns.

As for technical and business considerations, the article makes a brief analysis on where a certain approach might be more appropriate than another:

The number, nature, and needs of the tenants you expect to serve all affect your data architecture decision in different ways. Some of the following questions may bias you toward a more isolated approach, while others may bias you toward a more shared approach.

  • How many prospective tenants do you expect to target? You may be nowhere near being able to estimate prospective use with authority, but think in terms of orders of magnitude: are you building an application for hundreds of tenants? Thousands? Tens of thousands? More? The larger you expect your tenant base to be, the more likely you will want to consider a more shared approach.

  • How much storage space do you expect the average tenant's data to occupy? If you expect some or all tenants to store very large amounts of data, the separate-database approach is probably best. (Indeed, data storage requirements may force you to adopt a separate-database model anyway. If so, it will be much easier to design the application that way from the beginning than to move to a separate-database approach later on.)

  • How many concurrent end users do you expect the average tenant to support? The larger the number, the more appropriate a more isolated approach will be to meet end-user requirements.

  • Do you expect to offer any per-tenant value-added services, such as per-tenant backup and restore capability? Such services are easier to offer through a more isolated approach.


UPDATE: Further to the update about the expected number of tenants.

That expected number of tenants (10k) should rule out the multi-database approach for most, if not all, scenarios. I don't think you'll fancy the idea of maintaining 10,000 database instances, and having to create hundreds of new ones every day.

From that parameter alone, it looks like the shared-database, single-schema approach is the most suitable. The fact that you'll be storing just about 50 MB per tenant, and that there will be no per-tenant add-ons, makes this approach even more appropriate.
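
In practice, the shared-database, single-schema approach boils down to a tenant_id column on every tenant-owned table. A minimal sketch, with purely illustrative table and column names:

CREATE TABLE tenants (
    tenant_id INT NOT NULL PRIMARY KEY,
    name      VARCHAR(200) NOT NULL
);

CREATE TABLE invoices (
    invoice_id INT NOT NULL PRIMARY KEY,
    tenant_id  INT NOT NULL,
    number     INT NOT NULL,
    amount     DECIMAL(12,2) NOT NULL,
    -- per-tenant invoice numbering; the unique key also serves as the tenant-scoped index
    UNIQUE (tenant_id, number),
    FOREIGN KEY (tenant_id) REFERENCES tenants (tenant_id)
);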

The MSDN article cited above mentions three security patterns that tackle security considerations for the shared-database approach:

  • Trusted Database Connections
  • Tenant View Filter (sketched below)
  • Tenant Data Encryption
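
As a rough illustration of the Tenant View Filter idea (not the article's exact code), each tenant's database login sees only its own rows through a view; all names below are assumptions:

-- maps database logins to tenants
CREATE TABLE tenant_logins (
    login_name VARCHAR(128) NOT NULL PRIMARY KEY,
    tenant_id  INT NOT NULL
);

-- tenants are granted access to the view, never to the base table
CREATE VIEW tenant_invoices AS
SELECT i.invoice_id, i.number, i.amount
  FROM invoices i
  JOIN tenant_logins tl ON tl.tenant_id = i.tenant_id
 WHERE tl.login_name = CURRENT_USER;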

When you are confident in your application's data safety measures, you will be able to offer your clients a Service Level Agreement that provides strong data safety guarantees. In your SLA, apart from the guarantees, you could also describe the measures that you would take to ensure that data is not compromised.

UPDATE 2: Apparently the Microsoft guys moved / made a new article on this subject; the original link is gone and this is the new one: Multi-tenant SaaS database tenancy patterns (kudos to Shai Kerer).

SaaS: Single-instance vs Multi-instance vs Single-tenant vs Multi-tenant?

I'm going to write from my direct experience:

1) simple answer is 'yes'.

2) nearly yes: there will probably be refined access rights, say an administrator or two, and general users.

3) they're providing you with just one instance of that module, which will be single tenant.

4) they're providing you with multiple instances of that module, which will be single tenant.

5) I would use single-tenant to mean that the server hosting the instances is used by only one tenant. This might be done for perceived security benefits, or because the server runs in a time zone that is non-standard for the SaaS provider, such as staying on UTC all year round.

6) I would use multi-tenant to mean that the server hosting the instances is used by more than one tenant. This tends to be more cost-effective and probably just as secure as single-tenant.

7) yes, no, yes, yes.


