Doing Calculations in MySQL VS PHP

I'd play to the strengths of each system.

Aggregating, joining and filtering logic obviously belongs on the data layer. It's faster, not only because most DB engines have 10+ years of optimisation for doing just that, but also because you minimise the data moved between your DB and web server.

On the other hand, most DB platforms I've used have very poor functionality for working with individual values. Things like date formatting and string manipulation just suck in SQL; you're better off doing that work in PHP.

Basically, use each system for what it's built to do.

In terms of maintainability, as long as the division between what happens where is clear, separating these two types of logic shouldn't cause much of a problem, and certainly not enough to outweigh the benefits. In my opinion, code clarity and maintainability are more about consistency than about putting all the logic in one place.


Re: specific examples...

  1. I know this isn't what you're referring to, but dates are almost a special case. You want to make sure that all dates generated by the system are created either on the web server OR the database. Doing otherwise will cause some insidious bugs if the DB server and web server are ever configured for different timezones (I've seen this happen). Imagine, for example, you've got a createdDate column with a default of getDate() that is applied on insert by the DB. If you were to insert a record and then select records created in the last hour using a date generated in PHP (e.g. date("Y-m-d H:i:s", time() - 3600)), you might not get what you expect. As for which layer you should do this on, I'd favour the DB because, as in the example, it lets you use column defaults.

  2. For most apps I'd do this in PHP. Combining first name and surname sounds simple until you realise you sometimes need salutations, titles and middle initials in there too. Plus you're almost definitely going to end up in a situation where you want a user's first name, surname AND a combined salutation + first name + surname. Concatenating them DB-side means you end up moving more data, although really, it's pretty minor.

  3. Depends. As above, if you ever want to use them separately, you're better off performance-wise pulling them out separately and concatenating when needed. That said, unless the datasets you're dealing with are huge, there are probably other factors (like, as you mention, maintainability) that have more bearing.
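The timezone pitfall from the first example can be reproduced without a database. The following is only a sketch: the zone names (UTC for the DB server, Asia/Tokyo for the web server), the fixed timestamp and the helper formatInZone() are assumptions chosen for illustration, not part of the original question.

```php
<?php
// Sketch only: zone names and the fixed timestamp are illustrative assumptions.

/** Render a Unix timestamp as a 'Y-m-d H:i:s' string in the given timezone. */
function formatInZone(int $timestamp, string $zone): string
{
    // A '@<timestamp>' string always creates the instant in UTC.
    $instant = new DateTimeImmutable('@' . $timestamp);
    return $instant->setTimezone(new DateTimeZone($zone))->format('Y-m-d H:i:s');
}

$now = 1700000000; // fixed instant so the run is reproducible

// Value a DB server running in UTC would store through its column default.
$createdByDb = formatInZone($now, 'UTC');

// "One hour ago" as generated by a web server running in Asia/Tokyo (UTC+9).
$cutoffMixed = formatInZone($now - 3600, 'Asia/Tokyo');

// The same cutoff generated in the DB's own zone.
$cutoffSameZone = formatInZone($now - 3600, 'UTC');

// Zone-less date strings compare as plain text, so with mismatched zones
// a row inserted just now looks older than the "one hour ago" cutoff.
$foundWithMixedZones = strcmp($createdByDb, $cutoffMixed) >= 0;
$foundWithSameZone   = strcmp($createdByDb, $cutoffSameZone) >= 0;

var_dump($foundWithMixedZones, $foundWithSameZone);
```

With both values generated in the same zone the freshly inserted row is found; with mixed zones it is silently skipped, which is why picking one layer for all generated dates matters.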

A few rules of thumb:

  • Generating incremental ids should happen in the DB.
  • Personally, I like column defaults to be applied by the DB.
  • When selecting, anything that reduces the number of records should be done by the DB.
  • It's usually good to do things that reduce the size of the dataset DB-side (like with the strings example above).
  • And as you say; ordering, aggregation, sub-queries, joins, etc. should always be DB-side.
  • Also, we haven't talked about them but triggers are usually bad/necessary.

There are a few core trade-offs you're facing here, and the balance really depends on your application.

Some things should definitely-everytime-always be done in SQL. But, excluding some exceptions (like the dates thing), for a lot of tasks SQL can be very clunky and can leave you with logic in out-of-the-way places. When searching your codebase for references to a specific column (for example), it is easy to miss those contained in a view or stored procedure.

Performance is always a consideration but, depending on your app and the specific example, maybe not a big one. Your concerns about maintainability are probably very valid, and some of the performance benefits I've mentioned are very slight, so beware of premature optimisation.

Also, if other systems are accessing the DB directly (e.g. for reporting, or imports/exports), you'll benefit from having more logic in the DB. For example, if you want to import users from another data source directly, something like an email validation function would be reusable if implemented in SQL.

Short answer: it depends. :)

Calculations in php or mysql data

There is no blanket rule for all situations. Many factors affect the performance and efficiency of websites. So there's no single 'Best'.

If you look at something like Magento, it does it both ways. On the one hand it has a full EAV structure with every piece of data abstracted out and normalised to the nth degree. On the other hand, it also aggregates pre-calculated values in flat tables for performance reasons. This includes discount amounts, base prices, tax quantities (in base and chosen currency), etc. The former situation is best in terms of flexibility and robustness, the flat table is better in terms of performance.

A flat table obviously makes it faster when dealing with bulk calculations, as everything has already been worked out. But it does, as kernelpanic pointed out, mean that any changes to settings may require a bulk recalculation of every value. In the case of historical data such as order history, you probably won't want to recalculate the actual amounts people ended up paying, but the possibility of having to do so does need to be taken into consideration when determining the best solution.

If performance is paramount and the calculations are expensive to run, then knowing that you may have to refresh the values in bulk from time-to-time allows you to make an informed decision to cache it or not.

But if it's not a performance critical aspect, or the calculations are expensive but not run often, it's cleaner to leave them out of the database as they really belong in the business logic processing part of an application i.e. the code.
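To make the trade-off concrete, here is a minimal sketch of a stored (pre-calculated) total versus one recomputed on demand after a settings change. The function orderTotalCents(), the line-item layout and the tax rates are all hypothetical and are not taken from Magento.

```php
<?php
// Hypothetical sketch: names, data layout and tax rates are assumptions.

/** Total of an order in cents, including tax, computed from its line items. */
function orderTotalCents(array $lines, float $taxRate): int
{
    $net = 0;
    foreach ($lines as $line) {
        $net += $line['unit_cents'] * $line['qty'];
    }
    return (int) round($net * (1 + $taxRate));
}

$lines = [
    ['unit_cents' => 1000, 'qty' => 2],
    ['unit_cents' => 500,  'qty' => 1],
];

// Value written to a flat table when the order was placed (tax rate 20%).
$storedTotal = orderTotalCents($lines, 0.20);

// Value obtained by recalculating later, after the tax setting changed to 25%.
$recomputedTotal = orderTotalCents($lines, 0.25);

printf("stored: %d, recomputed: %d\n", $storedTotal, $recomputedTotal);
```

The stored value is cheap to read and preserves what the customer actually paid, while the recomputed value follows the current settings; which behaviour you want decides where the calculation should live.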

Again there is more than one way of defining "best", so it depends on the circumstances. It is really just a matter of balancing requirements - speed, cleanliness, memory usage, processor requirements, disk space usage, the need to fit into some arbitrary data structure defined by development managers - your decision will need to account for these factors.

Without a real-world problem to address, speculation is really all that can be given. If you do have a more complex situation, I'd be happy to take a look and offer my thoughts.

edit: From my own observations, a Magento catalog page with flat data and over 200k products loads in about 10 - 20 seconds with no page caching enabled. When flat data was disabled and the EAV structure was used, it would take minutes. I'm not at work right now so I don't have my profiling data handy, but it's a testament to the fact that in real world applications there is no single best solution.

What's faster? php calculation or mysql query

For an individual operation, the way to know is to test it, and to be aware that performance on both sides can vary between versions and configurations.
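A small timing helper is usually enough for such a test. The sketch below assumes PHP 7.3+ (for hrtime()); the helper name bestTime() and the sample workload are made up for illustration. In a real comparison the callable would also run the query, so that transfer time is included.

```php
<?php
/**
 * Run a callable several times and return the best wall-clock time in seconds.
 * Taking the best of several runs reduces noise from other processes.
 */
function bestTime(callable $fn, int $runs = 5): float
{
    $best = INF;
    for ($i = 0; $i < $runs; $i++) {
        $start = hrtime(true);                      // monotonic clock, nanoseconds
        $fn();
        $elapsed = (hrtime(true) - $start) / 1e9;   // convert to seconds
        $best = min($best, $elapsed);
    }
    return $best;
}

// Stand-in workload: the PHP-side part of a calculation over 10,000 values.
$dates = array_fill(0, 10000, '1980-06-15');

$phpSide = bestTime(function () use ($dates) {
    foreach ($dates as $dob) {
        $age = date_diff(date_create($dob), date_create('today'))->y;
    }
});

printf("PHP side: %.4f s\n", $phpSide);
```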

At the system level, keep the following in mind:

  • If you transfer data from the database to PHP and then do the calculation there, you probably pay an extra cost for networking, so calculating in SQL has benefits.
  • Logic can be put into the database using virtual columns, views or stored procedures/functions, so multiple applications can share it.
  • However, for performance at scale it is simpler to add a new PHP host in front of a database than to add an extra database host.

For this specific question, also keep in mind:

  • If you have to do the calculation every time, maybe you can do it once while storing the data, taking more disk space but saving calculation time.
  • Depending on the amount of data, those costs could be quite negligible, and you should rather put the calculation where it makes logical sense. (Did you measure and see any problem at all, or are you doing premature optimization?) Is the calculation more like "data retrieval" or "business logic"? This is a subjective choice.

where should I do the calculating stuff,PHP or Mysql?

Anything that can be done using an RDBMS (GROUPING, SUMMING, AVG), where the data can be filtered on the server side, should be done in the RDBMS.

If the calculation would be better suited in PHP then fine, go with that, but otherwise don't try to do in PHP what a RDBMS was made for. YOU WILL LOSE.

Are date calculations faster in PHP or MySQL?

The overhead of talking to the database would negate any and all advantages it may or may not have. It's simple: if you're in PHP anyway, do the calculations in PHP. If the data you want to do calculations on is in the database, do it in the database. Don't transition between systems just for the sake of it unless you can really prove that it saves you a ton of time to do so (most likely it doesn't). What you're showing is child's play in either system; it hardly gets any faster than it already is.

Do a calculation in SQL query or PHP?

If you want to sort by the results, then do the calculation in SQL.

Otherwise, you are just using the database as a "file store" and not taking advantage of the functionality that it offers. In addition, by doing the ordering in the database, you can limit the number of rows being returned -- another optimization.

PHP vs MySQL Performance ( if , functions ) in query

Here is a nice description of your question:
Doing calculations in MySQL vs PHP

In the case of the second example, the speed issue can be significant.
First of all, you do not know how big your comments are, so in the case of

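// $db is assumed to be an open PDO connection; the original used the
// mysql_* functions, which were removed in PHP 7.
$stmt = $db->query("SELECT * FROM comments");

while ($res = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // Only 10 characters are used, yet the full comment was transferred.
    $min_comment = substr($res['comment'], 0, 10);
}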

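you ask the server to return everything (that is, the full length of every comment), and this can be significant. Multiplied by the number of rows in the table, it can be a large amount of data to transfer between MySQL and PHP. In the second case,

SELECT *, SUBSTR(comment, 1, 10) AS min_comment FROM comments

the truncation is already done on the server and requires no additional memory in PHP. (MySQL string positions start at 1; a position of 0 returns an empty string.) Note that SELECT * still returns the full comment column as well, so to actually reduce the transfer you should list only the columns you need instead of *.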

In the case of the first example, I think it is also better to do it on the SQL side, because otherwise you will still need an additional loop afterwards. Apart from this, people reading your code might be confused about why exactly you need that code.

Calculate age: PHP vs MySQL, which method is better?

Here is a test:

Create a table with 100K random dates

drop table if exists birthdays;
create table birthdays (
id int auto_increment primary key,
dob date
);
insert into birthdays (dob)
select '1950-01-01' + interval floor(rand(1)*68*365) day as dob
from information_schema.COLUMNS c1
, information_schema.COLUMNS c2
, information_schema.COLUMNS c3
limit 100000
;

Run this PHP script

<?php
header('Content-type: text/plain');
$db = new PDO("mysql:host=localhost;dbname=test", "test","");

### SQL
$starttime = microtime(true);

$stmt = $db->query("SELECT id, dob, TIMESTAMPDIFF(YEAR, dob, CURDATE()) AS age FROM birthdays");
$data = $stmt->fetchAll(PDO::FETCH_OBJ);

$runtime = microtime(true) - $starttime;
echo "SQL: $runtime \n";

### PHP
$starttime = microtime(true);

$stmt = $db->query("SELECT id, dob FROM birthdays");
$data = $stmt->fetchAll(PDO::FETCH_OBJ);
foreach ($data as $row) {
    $row->age = date_diff(date_create($row->dob), date_create('today'))->y;
}

$runtime = microtime(true) - $starttime;
echo "PHP: $runtime \n";

Result:

SQL: 0.19094109535217 
PHP: 1.203684091568

It looks like the SQL solution is 6 times faster. But that is not quite true. If we remove the code which calculates the age from both solutions, we will get something like 0.1653790473938. That means the overhead for SQL is 0.025 sec, while for PHP it is 1.038 sec. So SQL is 40 times faster in this test.

Note: There are faster ways to calculate the age in PHP. For example:

$d = date('Y-m-d');
$row->age = substr($d, 0, 4) - substr($row->dob, 0, 4) - (substr($row->dob, 5) > substr($d, 5) ? 1 : 0);

is roughly four times faster, and even then date('Y-m-d') consumes more than 80% of the time. If you find a way to avoid any date function, you might get close to the performance of MySQL.
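One way to act on that observation is to compute today's date once, outside the loop, and pass it in. The sketch below wraps the string-based shortcut in a function; the name ageFromStrings() and the sample rows are hypothetical, and it assumes both dates are zero-padded 'YYYY-MM-DD' strings.

```php
<?php
/**
 * Age in whole years from two 'YYYY-MM-DD' strings, using only string operations.
 * Assumes both inputs are zero-padded, as MySQL DATE values are.
 */
function ageFromStrings(string $dob, string $today): int
{
    $years = (int) substr($today, 0, 4) - (int) substr($dob, 0, 4);
    // Zero-padded 'MM-DD' parts compare correctly as plain strings.
    $birthdayStillAhead = strcmp(substr($dob, 5), substr($today, 5)) > 0;
    return $years - ($birthdayStillAhead ? 1 : 0);
}

// Today's date is computed once, not once per row.
$today = date('Y-m-d');

// Stand-in for rows fetched with PDO::FETCH_OBJ.
$rows = [
    (object) ['id' => 1, 'dob' => '1950-01-01'],
    (object) ['id' => 2, 'dob' => '1999-12-31'],
];

foreach ($rows as $row) {
    $row->age = ageFromStrings($row->dob, $today);
}

// With a fixed reference date the result is reproducible:
echo ageFromStrings('1990-06-15', '2024-06-14'), "\n"; // 33 (birthday is tomorrow)
```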


