Reliable Data Serving

Reliable data serving

If those are static files, just link to them directly. All decent servlet containers and application servers have a well-developed DefaultServlet. If the static files are located outside the web application you'd link to them from, you can also simply add their root folder as another context. It's unclear which server you're using, but if it were Tomcat, you could just add a new <Context> to server.xml:

<Context docBase="/path/to/static/files" path="/files" />

This way the files are accessible at http://example.com/files/....

If those are dynamically generated files or files coming from a database, then you need to develop a servlet that does the IO job efficiently. That means not holding the entire data in memory unnecessarily (e.g. in a ByteArrayInputStream or byte[]) before emitting it to the output. Just write the bytes to the output immediately as they come in. You may find these examples of a basic fileservlet and a more advanced fileservlet (supporting resumes and so on) useful.
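The streaming approach described above can be sketched in plain Java, so it runs without a servlet container. This is a minimal sketch, not the linked fileservlets. In a real servlet, `out` would be `response.getOutputStream()` and `in` whatever stream your generator or database driver provides. The class name `StreamCopy` and the 8 KB buffer size are arbitrary choices for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamCopy {
    // Arbitrary fixed buffer size (an assumption); memory use does not grow with payload size.
    private static final int BUFFER_SIZE = 8 * 1024;

    private StreamCopy() {}

    /** Copies in to out chunk by chunk and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        int read;
        // Write each chunk as soon as it is read instead of collecting everything first.
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }
}
```

On Java 9 and later, `InputStream.transferTo(OutputStream)` performs the same kind of buffered loop for you.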

Service Fabric - A Web API in a cluster whose only job is to serve data from a Reliable Collection

After reading around and looking at other people's issues and Azure's samples, I have implemented a solution. I'm posting the gotchas I ran into here, hoping they will help other developers who are new to Azure Service Fabric (disclaimer: I am still a newbie in Service Fabric, so comments and suggestions are highly appreciated):

First, the architecture is pretty simple: I ended up with a stateful service and a stateless Web API service in an Azure Service Fabric application:

DataStoreService - a stateful service that reads the large XMLs and stores them in a Reliable Dictionary (this happens in the RunAsync method).
Web API - a stateless service that provides an /api/query endpoint, which filters the collection of XElements stored in the Reliable Dictionary and serializes the result back to the requester.
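The load-once, filter-per-query split described above can be sketched in plain Java. This is only an analogy for the C# services. A `ConcurrentHashMap` stands in for the Reliable Dictionary and `org.w3c.dom.Element` for `XElement`. The names `XmlStore`, `load` and `query` are hypothetical. Serializing the result back to the caller is left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class XmlStore {
    // Stand-in for the Reliable Dictionary: document name -> its top-level child elements.
    private final Map<String, List<Element>> store = new ConcurrentHashMap<>();

    /** Loading step (RunAsync in the original): parse once and keep the elements. */
    void load(String name, String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList children = doc.getDocumentElement().getChildNodes();
        List<Element> elements = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            // Skip whitespace text nodes; keep only elements.
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                elements.add((Element) node);
            }
        }
        store.put(name, elements);
    }

    /** Query step (the /api/query endpoint in the original): filter the stored elements. */
    List<Element> query(String name, Predicate<Element> filter) {
        List<Element> result = new ArrayList<>();
        for (Element element : store.getOrDefault(name, List.of())) {
            if (filter.test(element)) {
                result.add(element);
            }
        }
        return result;
    }
}
```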

3 Gotchas

1) How to get hold of the Reliable Dictionary data from the stateless service, i.e. how to get an instance of the stateful service from the stateless one:

// Build fabric:/<application>/DataStoreService and create a typed remoting proxy for the given partition
ServiceUriBuilder builder = new ServiceUriBuilder("DataStoreService");
IDataStoreService dataStoreServiceClient = ServiceProxy.Create<IDataStoreService>(builder.ToUri(), new ServicePartitionKey("Your.Partition.Name"));

The code above already gives you the instance; in other words, you need to use a service proxy. For that you need to:

define an interface that your stateful service implements (IDataStoreService here) and use it as the type argument when calling ServiceProxy.Create
pass the correct partition key to the Create method. This article gives a very good introduction to Azure Service Bus partitions
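An analogy may help explain why the interface is required. A client-side proxy is generated from the interface, and it forwards each call to whichever replica owns the requested partition. Service Fabric's remoting proxies are likewise generated from an interface. The plain-Java sketch below illustrates the idea with `java.lang.reflect.Proxy`, which can only proxy interfaces. It is not the Service Fabric API. `IDataStoreService` (with a `get` method), `PartitionedProxy`, and the in-memory partition map are assumptions made for illustration.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;

// Hypothetical contract shared by client and service (in C#: IDataStoreService : IService).
interface IDataStoreService {
    String get(String key);
}

final class PartitionedProxy {
    private PartitionedProxy() {}

    /**
     * Creates a proxy for iface that forwards every call to the implementation registered
     * for partitionKey. A real runtime resolves a network endpoint; a map stands in for that.
     */
    static <T> T create(Class<T> iface, Map<String, ? extends T> partitions, String partitionKey) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] {iface},
                (self, method, args) -> {
                    T target = partitions.get(partitionKey);
                    if (target == null) {
                        // Roughly what happens when no replica has registered an address.
                        throw new IllegalStateException("No replica for partition " + partitionKey);
                    }
                    try {
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // rethrow the service's own exception
                    }
                });
        return iface.cast(proxy);
    }
}
```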

2) Registering replica listeners. To avoid errors such as

The primary or stateless instance for the partition 'a67f7afa-3370-4e6f-ae7c-15188004bfa1' has invalid address, this means that right address from the replica/instance is not registered in the system

you need to register replica listeners, as described in this post:

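    public DataStoreService(StatefulServiceContext context)
        : base(context)
    {
        configurationPackage = Context.CodePackageActivationContext.GetConfigurationPackageObject("Config");
    }

    // Register a remoting listener so ServiceProxy clients can resolve this replica's address
    protected override IEnumerable<ServiceReplicaListener> CreateServiceReplicaListeners()
        => new[] { new ServiceReplicaListener(context => this.CreateServiceRemotingListener(context)) };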

3) Service Fabric naming and referencing services. I took the ServiceUriBuilder class from the service-fabric-dotnet-web-reference-app. Basically, you need something that generates a URI of the form:

new Uri("fabric:/" + this.ApplicationInstance + "/" + this.ServiceInstance);

where ServiceInstance is the name of the service you want to get an instance of (DataStoreService in this case).
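Here is the same URI construction in Java, for illustration only. The application name is passed in explicitly here. `FabricUris`, `serviceUri` and the example name `MyApp` are hypothetical. The real ServiceUriBuilder is a C# class from the reference app.

```java
import java.net.URI;

final class FabricUris {
    private static final String PREFIX = "fabric:/";

    private FabricUris() {}

    /** Builds fabric:/{applicationInstance}/{serviceInstance}. */
    static URI serviceUri(String applicationInstance, String serviceInstance) {
        // Accept an application name that already carries the fabric:/ prefix.
        String app = applicationInstance.startsWith(PREFIX)
                ? applicationInstance.substring(PREFIX.length())
                : applicationInstance;
        return URI.create(PREFIX + app + "/" + serviceInstance);
    }
}
```

For example, `FabricUris.serviceUri("MyApp", "DataStoreService")` yields `fabric:/MyApp/DataStoreService`.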

Is there an established pattern for paging in Service Fabric ReliableCollections?

One way to build secondary indices is to use Notifications. Using notifications with reference-type TKey and TValue, you can maintain a secondary index without creating any copies of your TKey or TValue.
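The notification idea can be sketched in plain Java. The primary map invokes a callback on every add, update and remove. A listener keeps a secondary index that holds references to the same key objects rather than copies. This is an analogy only: Reliable Dictionary exposes change notifications through its C# `DictionaryChanged` event. `NotifyingMap`, `Listener` and `SecondaryIndex` are hypothetical names. There are no transactions or snapshot isolation here, and values are assumed to be non-null.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;

/** Primary store that reports every change to a listener (hypothetical stand-in). */
final class NotifyingMap<K, V> {
    /** oldValue is null on add; newValue is null on remove. */
    interface Listener<K, V> {
        void changed(K key, V oldValue, V newValue);
    }

    private final Map<K, V> data = new HashMap<>();
    private final Listener<K, V> listener;

    NotifyingMap(Listener<K, V> listener) { this.listener = listener; }

    void put(K key, V value) {
        V old = data.put(key, value);
        listener.changed(key, old, value);
    }

    void remove(K key) {
        V old = data.remove(key);
        if (old != null) listener.changed(key, old, null);
    }

    V get(K key) { return data.get(key); }
}

/** Secondary index: attribute -> primary keys; stores the same key references, no copies. */
final class SecondaryIndex<K, V, A extends Comparable<A>> implements NotifyingMap.Listener<K, V> {
    private final Function<V, A> attribute;
    private final TreeMap<A, Set<K>> index = new TreeMap<>();

    SecondaryIndex(Function<V, A> attribute) { this.attribute = attribute; }

    @Override
    public void changed(K key, V oldValue, V newValue) {
        if (oldValue != null) {
            // Unlink the key from the entry for its old attribute value.
            A oldAttr = attribute.apply(oldValue);
            Set<K> keys = index.get(oldAttr);
            keys.remove(key);
            if (keys.isEmpty()) index.remove(oldAttr);
        }
        if (newValue != null) {
            index.computeIfAbsent(attribute.apply(newValue), a -> new LinkedHashSet<>()).add(key);
        }
    }

    /** Read-only view of the primary keys whose value has the given attribute. */
    Set<K> lookup(A attr) {
        return Collections.unmodifiableSet(index.getOrDefault(attr, Collections.emptySet()));
    }
}
```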

If you need the secondary index to provide snapshot isolation, then the data structure chosen for the secondary index must implement Multi-Version Concurrency Control.

If you do not have such a data structure to host the secondary index, another option is to keep the transaction and the enumeration alive across the paged client calls. This way you can use Reliable Dictionary's built-in snapshot support to provide a paged, consistent scan over the data without blocking writes. The paging token in this case would be the TransactionId, which allows your service to find the relevant enumerator to call MoveNextAsync on. The disadvantage of this option is that Reliable Dictionary will not be able to trim old versions of the values that are kept visible by the potentially long-running snapshot transactions.

To mitigate the above disadvantage, you would probably want to throttle the number of in-flight snapshot transactions and how long a client has to complete the paged enumeration before your service disposes the enumeration and the relevant read transaction.
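The keep-the-enumeration-alive option, together with the throttling and timeout suggested above, can be sketched in plain Java. Several stand-ins are assumptions, not Reliable Dictionary behavior:

- A copied list of entries stands in for the snapshot (it is not MVCC).
- A random UUID stands in for the TransactionId used as the paging token.
- `PagedScans`, `maxInFlight` and `ttl` are hypothetical names.
- The class is single-threaded for brevity.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.UUID;

final class PagedScans<K, V> {
    /** One page of results plus the token for the next call (null when the scan is done). */
    record Page<K, V>(List<Map.Entry<K, V>> items, String nextToken) {}

    /** An open scan: its cursor and the deadline after which it is disposed. */
    private record Scan<K, V>(Iterator<Map.Entry<K, V>> cursor, Instant expires) {}

    private final Map<String, Scan<K, V>> inFlight = new HashMap<>();
    private final int maxInFlight; // throttle: cap on concurrently open scans
    private final Duration ttl;    // how long a client has before its scan is dropped

    PagedScans(int maxInFlight, Duration ttl) {
        this.maxInFlight = maxInFlight;
        this.ttl = ttl;
    }

    Page<K, V> start(SortedMap<K, V> data, int pageSize) {
        evictExpired();
        if (inFlight.size() >= maxInFlight) {
            throw new IllegalStateException("too many open scans");
        }
        // Stand-in for a snapshot: copy entries so later writes to data are not seen.
        List<Map.Entry<K, V>> snapshot = new ArrayList<>();
        for (Map.Entry<K, V> e : data.entrySet()) {
            snapshot.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        return nextPage(UUID.randomUUID().toString(), snapshot.iterator(), pageSize);
    }

    Page<K, V> resume(String token, int pageSize) {
        evictExpired();
        Scan<K, V> scan = inFlight.remove(token);
        if (scan == null) {
            throw new IllegalArgumentException("unknown or expired token");
        }
        return nextPage(token, scan.cursor(), pageSize);
    }

    private Page<K, V> nextPage(String token, Iterator<Map.Entry<K, V>> cursor, int pageSize) {
        List<Map.Entry<K, V>> items = new ArrayList<>();
        while (items.size() < pageSize && cursor.hasNext()) {
            items.add(cursor.next());
        }
        if (!cursor.hasNext()) {
            return new Page<>(items, null); // finished: nothing left to keep alive
        }
        inFlight.put(token, new Scan<>(cursor, Instant.now().plus(ttl)));
        return new Page<>(items, token);
    }

    private void evictExpired() {
        Instant now = Instant.now();
        inFlight.values().removeIf(scan -> scan.expires().isBefore(now));
    }
}
```

A finished scan is dropped immediately. A scan abandoned by its client stays open only until its deadline, which bounds how long old versions would need to be kept.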

When CreateEnumerableAsync is used with a key filter, Reliable Dictionary invokes the filter for every key to see whether it satisfies the custom filter. Since TKeys are always kept in memory today, we have not seen issues here for most key filters. The most expensive part of an enumeration tends to be retrieving paged-out values from disk.
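A small model shows why key filters are usually cheap. The filter runs against every key, but the keys are already in memory. Values, the expensive part, are loaded only for keys that pass. `LazyValueStore` and its loader function are hypothetical. The loader stands in for reading a paged-out value from disk, and the counters exist only to make the cost visible.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.function.Predicate;

final class LazyValueStore<K extends Comparable<K>, V> {
    private final TreeSet<K> keys = new TreeSet<>(); // keys: always resident in memory
    private final Function<K, V> loadValue;          // stand-in for reading a paged-out value
    int filterCalls = 0;                             // how often the key filter ran
    int valueLoads = 0;                              // how many "disk reads" happened

    LazyValueStore(Iterable<K> initialKeys, Function<K, V> loadValue) {
        initialKeys.forEach(keys::add);
        this.loadValue = loadValue;
    }

    /** Runs the filter on every key, then loads values only for the keys that matched. */
    List<Map.Entry<K, V>> enumerate(Predicate<K> keyFilter) {
        List<Map.Entry<K, V>> result = new ArrayList<>();
        for (K key : keys) {
            filterCalls++;                 // the filter sees every key...
            if (!keyFilter.test(key)) {
                continue;
            }
            valueLoads++;                  // ...but only matches pay for a value load
            result.add(new AbstractMap.SimpleImmutableEntry<>(key, loadValue.apply(key)));
        }
        return result;
    }
}
```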

How reliable is a TCP connection?

The ordering of the packets in a TCP transmission is reliable.

For example, suppose your TCP message is split into three packets: A, B and C.

Your client receives A, packet B gets lost, and then the client receives C. At first the stream delivers only packet A; packet C is held back. As soon as packet B is retransmitted and received by your client, the stream delivers packet B and then C.

The same happens if packet B is routed along a different path and is therefore received after packet C.

The Sequence Number field in the TCP header is what makes this mechanism possible.
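The following is a simplified model of the receiver side. Segments are buffered by sequence number and delivered to the stream only once they are contiguous with what has already been delivered. Real TCP sequence numbers count bytes and wrap around at 2^32, and the receiver also sends acknowledgements. This sketch ignores wraparound, acknowledgements and overlapping segments, and `Reassembler` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.TreeMap;

final class Reassembler {
    private final TreeMap<Long, byte[]> pending = new TreeMap<>();          // seq -> held-back payload
    private final ByteArrayOutputStream stream = new ByteArrayOutputStream(); // what the application reads
    private long nextExpected;                                               // byte offset expected next

    Reassembler(long initialSequence) { this.nextExpected = initialSequence; }

    /** Accepts a segment whose first byte has sequence number seq. */
    void receive(long seq, byte[] payload) {
        if (seq < nextExpected) {
            return; // duplicate of data already delivered
        }
        pending.put(seq, payload);
        // Deliver every held-back segment that is now contiguous with the stream.
        byte[] next;
        while ((next = pending.remove(nextExpected)) != null) {
            stream.write(next, 0, next.length);
            nextExpected += next.length;
        }
    }

    String delivered() { return new String(stream.toByteArray(), StandardCharsets.US_ASCII); }
}
```

Receiving A, then C, then the retransmitted B yields "AAA" after C arrives and "AAABBBCCC" once B fills the gap. This matches the example above.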

Is storing data on the NodeJs server reliable?

Congrats on your learning so far! I hope you're having fun with it.

Is data stored on the server reliable? Does the data always stay the way it is intended?

No, storing data on the server is generally not reliable enough, unless you manage your server in its entirety. With managed services, storing data on the server should never be done because it could easily be wiped by the party managing your server.

Is it advisable to even store data in the server? I am thinking of a scenario where there are millions of users.

It is not advisable at all; you need a database of some sort.

Is it that there is always one instance of the server running even when the app is served from different locations? If not, will storing data in the server bring up inconsistencies between the different server instances?

Typically, the server is always running and stores some basic information about its configuration locally. When scaling, hosted services can increase processing capacity automatically and handle load balancing in the background. Whenever the server retrieves data for you, it requests it from the database, and the data is then loaded into RAM (memory). In the user example, you would store the user data in a table or a document (relational vs. document-oriented databases) and then load it into memory to manipulate it using 'functions'.
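A toy model illustrates the inconsistency the question asks about. Two server instances that each keep state in their own memory disagree as soon as requests are spread across them. Instances that read and write one shared store agree. `ServerInstance` and the shared map are hypothetical stand-ins for server processes and a database. A real database also provides durability and concurrency control, which a map does not. `Map.merge` is atomic on a `ConcurrentHashMap`, a small example of the kind of guarantee the race-condition reading below is about.

```java
import java.util.HashMap;
import java.util.Map;

final class ServerInstance {
    // Either this instance's private memory or a store shared by all instances.
    private final Map<String, Integer> store;

    ServerInstance(Map<String, Integer> store) { this.store = store; }

    /** An instance that keeps data only in its own memory. */
    static ServerInstance withLocalMemory() { return new ServerInstance(new HashMap<>()); }

    void addPoints(String user, int points) { store.merge(user, points, Integer::sum); }

    int points(String user) { return store.getOrDefault(user, 0); }
}
```

With local memory, a write handled by one instance is invisible to the other. With a shared store, both instances see every write.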

Additionally, to learn more about your 'data inconsistency' concern, look up concurrency as it pertains to databases, and data race conditions.

Hope that helps!
