How to Create a Linux Cluster for Running Physics Simulations in Java

How do I create a Linux cluster for running physics simulations in Java?

I would very highly recommend the Java Parallel Processing Framework (JPPF), especially since your computations are already independent. I did a good bit of work with it as an undergraduate and it works very well. The implementation work is already done for you, so I think this is a good way to achieve the goal in "number 2."

http://www.jppf.org/
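JPPF distributes exactly this kind of embarrassingly parallel work across nodes. As a local, single-machine sketch of the same pattern (plain JDK only, no JPPF; `simulate` is a hypothetical stand-in for your physics kernel), independent jobs can be fanned out over an `ExecutorService` and collected afterwards:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class IndependentJobs {
    // Hypothetical stand-in for one independent simulation run.
    static double simulate(long seed) {
        // Toy "physics": a deterministic function of the seed.
        return Math.sin(seed) * Math.sin(seed);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Double>> results = new ArrayList<>();
        for (long seed = 0; seed < 8; seed++) {
            final long s = seed;
            results.add(pool.submit(() -> simulate(s)));  // jobs run concurrently
        }
        double total = 0;
        for (Future<Double> f : results) total += f.get(); // gather results
        pool.shutdown();
        System.out.println("total = " + total);
    }
}
```

With JPPF, the thread pool is conceptually replaced by a grid of nodes, but the "submit independent tasks, gather results" shape stays the same.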

Buying Cluster/Grid/Cloud Time?

You might want to check out Amazon's EC2 service:

http://aws.amazon.com/ec2/

Some people have already done some work in regards to clustering with EC2:

http://www.google.com/search?q=cluster+computing+amazon+ec2&rls=com.microsoft:*&ie=UTF-8&oe=UTF-8&startIndex=&startPage=1

Additionally, Microsoft offers Windows Azure, which has native hooks for .NET but really lets you run anything (Java, PHP), provided you can load a runtime and your code from storage (or deploy them with your app, which has its own set of pros and cons).

Is there any scenario where an application instance runs across multiple computers?

If you want to learn how to realize such a scenario (a single instance across multiple computers), I think you should read some articles about MPI.

it has become a de facto standard for communication among processes that model a parallel program running on a distributed memory system.

Regarding your worries: obviously, you'll need to consciously change your program to run as one instance across several computers. Otherwise no sharing takes place and, as Shy writes, there is nothing to worry about; this kind of thing doesn't happen automatically.
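MPI programs follow the SPMD style: every process (rank) runs the same code and branches on its rank id, exchanging data with explicit sends and receives. Here is a local sketch of that idea using JDK threads and blocking queues in place of real MPI processes and channels (the real API would come from an MPI binding such as MPJ Express; everything below is a stand-in):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Local sketch of MPI's SPMD style: each "rank" runs the same code,
// branching on its rank id; rank 0 gathers partial results from the others.
public class SpmdSketch {
    static final int SIZE = 4;
    // One inbox per rank stands in for MPI's point-to-point channels.
    @SuppressWarnings("unchecked")
    static final BlockingQueue<Double>[] inbox = new BlockingQueue[SIZE];

    public static double run() throws InterruptedException {
        for (int i = 0; i < SIZE; i++) inbox[i] = new ArrayBlockingQueue<>(SIZE);
        Thread[] ranks = new Thread[SIZE];
        final double[] result = new double[1];
        for (int r = 0; r < SIZE; r++) {
            final int rank = r;
            ranks[r] = new Thread(() -> {
                try {
                    double partial = rank * 10.0;      // each rank's local work
                    if (rank != 0) {
                        inbox[0].put(partial);          // like an MPI send to rank 0
                    } else {
                        double sum = partial;
                        for (int i = 1; i < SIZE; i++)
                            sum += inbox[0].take();     // like an MPI receive
                        result[0] = sum;
                    }
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            ranks[r].start();
        }
        for (Thread t : ranks) t.join();
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("sum = " + run()); // 0 + 10 + 20 + 30 = 60
    }
}
```

The point of the sketch is the programming model: you write one program, and the rank-dependent branch is what makes a single logical application span many machines.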

Running application on a cluster

__________Edit__________

Based on your comment, you can put all your jobs from stage 0 into a queue and start processing them. You can also add logic that checks whether only a few jobs are left and, if so, starts adding new jobs from stage 1. This would speed up your calculation a bit and give you better resource usage, but it's optional and makes your system more complex.
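That staged-refill idea can be sketched in a few lines with a JDK blocking queue (the stage names and the refill threshold of 2 are illustrative assumptions, not part of any AWS API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: fill the queue with stage-0 jobs, drain it,
// and top it up with stage-1 jobs once only a few remain.
public class StagedQueue {
    public static List<String> process() throws InterruptedException {
        BlockingQueue<String> jobs = new LinkedBlockingQueue<>();
        for (int i = 0; i < 5; i++) jobs.put("stage0-" + i);

        boolean stage1Added = false;
        List<String> done = new ArrayList<>();
        while (!jobs.isEmpty() || !stage1Added) {
            // Refill logic: when few stage-0 jobs are left, enqueue stage 1.
            if (!stage1Added && jobs.size() <= 2) {
                for (int i = 0; i < 3; i++) jobs.put("stage1-" + i);
                stage1Added = true;
            }
            String job = jobs.take();
            done.add(job); // "processing" is just recording the job here
        }
        return done;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(process());
    }
}
```

In the real system the queue would be SQS and the refill check would live in whatever process monitors queue depth.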

I suggest using SQS (or SWF) for storing the jobs, S3 for storing the files, and an auto-scaling group of spot instances for the worker nodes.

Unfortunately, Lambda doesn't support C++ at the moment (Node.js and Java are supported).

________Original________

AWS supports several concepts which you may consider:

Decoupling: You can use SQS (Simple Queue Service) for job queuing, which gives you a redundant and fault-tolerant job queue. You can have a fleet of worker instances that request jobs from the queue, run them, and delete the job from the queue when finished. If an instance hangs or crashes during the execution of a job, the job goes back to the queue after the visibility timeout expires and another instance will execute it.
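The receive/process/delete cycle and the visibility timeout are the core of this pattern. Here is a toy, in-memory model of it in plain Java (the class and method names are made up for illustration; real code would use the AWS SDK's receive-message and delete-message calls):

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy model of SQS's visibility timeout: a received job becomes invisible;
// if the worker never deletes (acknowledges) it, it is requeued on expiry.
public class VisibilityQueue {
    private final Queue<String> visible = new ArrayDeque<>();
    private final Map<String, Long> inFlight = new HashMap<>(); // job -> expiry tick
    private final long timeoutTicks;
    private long now = 0;

    VisibilityQueue(long timeoutTicks) { this.timeoutTicks = timeoutTicks; }

    void send(String job) { visible.add(job); }

    // Advance simulated time; expired in-flight jobs return to the queue.
    void tick() {
        now++;
        inFlight.entrySet().removeIf(e -> {
            if (e.getValue() <= now) { visible.add(e.getKey()); return true; }
            return false;
        });
    }

    String receive() {        // like SQS ReceiveMessage
        String job = visible.poll();
        if (job != null) inFlight.put(job, now + timeoutTicks);
        return job;
    }

    void delete(String job) { // like SQS DeleteMessage after a successful run
        inFlight.remove(job);
    }

    int visibleCount() { return visible.size(); }

    public static void main(String[] args) {
        VisibilityQueue q = new VisibilityQueue(2);
        q.send("job-1");
        q.receive();                 // a worker takes the job, then "crashes"
        q.tick(); q.tick();          // the visibility timeout elapses
        System.out.println(q.visibleCount()); // job-1 is visible again
    }
}
```

The takeaway: workers only delete a job after it succeeds, so a crash simply means another worker picks the job up later.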

Another service is SWF (Simple Workflow Service). It uses SQS queues internally; with it, you may need less glue scripting to tie your entire workflow together.

Redundant storage: I would definitely use AWS S3 for storage because it's cheap and redundant. Beyond basic reads, I don't think you need any advanced, file-system-like features (for example, locking).

Spot instances: For the worker nodes, I would use spot instances, which are much cheaper. The only issue with them is if you always need a really fast answer to your tasks. (If you are generating daily reports, spot instances are a perfect solution.)

+1: You could use AWS Lambda functions to run your jobs, triggered by S3 events (for example, when you upload a new *.data file). Lambda functions cannot run for very long, however. But if you are able to use them, your entire environment would consist only of S3 buckets and Lambda functions. Both are AWS-managed services, so your system would be extremely flexible and fault tolerant. I can't give exact pricing details, but I assume it would be cheaper than running EC2 instances.

Summary: If you can run your estimations in parallel, AWS gives you a lot of power and speed (for a price), especially if your load changes during the day.

A good source: White Paper on ‘Cloud Architectures’ and Best Practices of Amazon S3, EC2, SimpleDB, SQS


