Simple, Fast SQL Queries for Flat Files

I never managed to find a satisfying answer to my question, but I did at least find a solution to my toy problem using uniq's -f option, which I had been unaware of:

sort -t " " -k1,1 -k2,2nr animals.txt \
| awk '{print $2, $1}' | uniq -f 1

The awk portion above could, obviously, be skipped entirely if the input file were created with columns in the opposite order.

I'm still holding out hope for a SQL-like tool, though.
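In the meantime, the same toy problem can be expressed as SQL by loading the file into an in-memory SQLite database. A sketch in Python (the "name value" column layout and the column names are assumptions about animals.txt; a parsed list stands in for the file):

```python
import sqlite3

# Assumed layout of animals.txt: space-separated "name value" pairs.
rows = [("cat", 3), ("cat", 7), ("dog", 5)]  # stand-in for the parsed file

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE animals (name TEXT, value INTEGER)")
con.executemany("INSERT INTO animals VALUES (?, ?)", rows)

# Highest value per name -- the same result the sort/awk/uniq pipeline produces.
for value, name in con.execute(
    "SELECT MAX(value), name FROM animals GROUP BY name ORDER BY name"
):
    print(value, name)  # prints: 7 cat, then 5 dog
```

SQLite's bare-column-with-MAX behavior guarantees the selected row is the one holding the maximum, which makes this GROUP BY a direct translation of the pipeline.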

Random results: which is faster, a SQL query or a flat file?

If your table has an incremental unique ID, then just

SELECT * FROM table WHERE id = $r

where $r is the random number obtained as suggested above.
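If the IDs have gaps (e.g. from deleted rows), a common refinement is to pick a random value in the ID range and take the first row at or above it. A sketch using SQLite (table and column names are invented for illustration):

```python
import random
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO items VALUES (?, ?)",
                [(1, "a"), (2, "b"), (5, "c")])  # note the gap at 3-4

lo, hi = con.execute("SELECT MIN(id), MAX(id) FROM items").fetchone()
r = random.randint(lo, hi)

# Gap-safe: if r falls in a hole, take the next existing id instead.
row = con.execute(
    "SELECT * FROM items WHERE id >= ? ORDER BY id LIMIT 1", (r,)
).fetchone()
print(row)
```

Note that this slightly biases selection toward rows that sit just after large gaps; for uniform sampling over sparse IDs you would need a different scheme.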

What is faster, flat files or a MySQL RAM database?

Flat files? Nooooooo...

Use a good DB engine (MySQL, SQLite, etc). Then, for maximum performance, use memcached to cache content.


In this way, you have the ease and reliability of sharing data between processes using proven server software that handles concurrency, etc... But you get the speed of having your data cached.
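The pattern being described is cache-aside: check the cache, fall back to the database on a miss, then populate the cache. A sketch in which a plain dict stands in for the memcached client (real clients such as pymemcache expose a similar get/set interface; the schema is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("INSERT INTO users VALUES (1, 'alice')")

cache = {}  # stand-in for a memcached client

def get_user(user_id):
    key = f"user:{user_id}"
    if key in cache:                   # 1. try the cache first
        return cache[key]
    row = con.execute(                 # 2. cache miss: query the database
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is not None:
        cache[key] = row[0]            # 3. populate the cache for next time
        return row[0]
    return None

print(get_user(1))  # hits the database
print(get_user(1))  # served from the cache
```

With a real memcached deployment you would also set an expiry and invalidate the key on writes, which is where most of the subtlety lives.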

Keep in mind a couple things:

  1. MySQL has a query cache. If you are issuing the same queries repeatedly, you can gain a lot of performance without adding a caching layer.
  2. MySQL is really fast anyway. Have you load-tested to demonstrate it is not fast enough?

How can I make MySQL as fast as a flat file in this scenario?

Telling MySQL to ignore the primary (and only) index speeds both queries up.

For InnoDB it shaves about a second off the queries. On MyISAM it keeps the query time consistently at the minimum time seen.

The change is to add

ignore index(`PRIMARY`)

after the table name in the query.

EDIT:
I appreciate all the input, but much of it was of the form "you shouldn't do this", "do something completely different", etc. None of it addressed the question at hand:

"So what's the best way I can have
MySQL behave like itself most of the
time, yet win over a flat file in the
above scenario?"

So far, the solution I have posted (use MyISAM and ignore the index) seems to be closest to flat-file performance for this use case, while still giving me a database when I need one.

Is it faster to access data from files or a database server?

I'll add to the "it depends" crowd.

This is the kind of question that has no generic answer but is heavily dependent on the situation at hand. I even recently moved some data from a SQL database to a flat file system because the overhead of the DB, combined with some DB connection reliability issues, made using flat files a better choice.

Some questions I would ask myself when making the choice include:

  1. How am I consuming the data? For example, will I just be reading rows from beginning to end, in the order they were entered? Or will I be searching for rows that match multiple criteria?

  2. How often will I be accessing the data during one program execution? Will I go once to get all books with Salinger as the author or will I go several times to get several different authors? Will I go more than once for several different criteria?

  3. How will I be adding data? Can I just append a row to the end and that's perfect for my retrieval, or will it need to be re-sorted?

  4. How logical will the code look in six months? I emphasize this because I think it is too often forgotten in designing things (not just code; this hobby horse is actually from my days as a Navy mechanic cursing mechanical engineers). In six months, when I have to maintain your code (or you do, after working another project), which way of storing and retrieving data will make more sense? If going from flat files to a DB results in a 1% efficiency improvement but adds a week of figuring things out when you have to update the code, have you really improved things?
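On point 2, the difference between "go once" and "go several times" is concrete: one query with an IN clause versus one round trip per author. A sketch with SQLite (the books schema is invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE books (title TEXT, author TEXT)")
con.executemany("INSERT INTO books VALUES (?, ?)", [
    ("The Catcher in the Rye", "Salinger"),
    ("Franny and Zooey", "Salinger"),
    ("1984", "Orwell"),
])

authors = ["Salinger", "Orwell"]

# One round trip for all authors, instead of len(authors) separate queries.
placeholders = ", ".join("?" for _ in authors)
rows = con.execute(
    f"SELECT title, author FROM books WHERE author IN ({placeholders})",
    authors,
).fetchall()
print(rows)
```

Against a remote database server, collapsing N queries into one like this often matters more than any flat-file-versus-DB distinction, because each round trip pays network latency.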
