[oclug]PostgreSQL Hardware

Brad Barnett bb at L8R.net
Sat Oct 26 20:15:32 EDT 2002


On Sat, 26 Oct 2002 19:05:20 -0400
Rod Giffin <rod at giffinscientific.com> wrote:

> On Saturday 26 October 2002 17:13, Brad Barnett wrote:
> > > Anyway, Rod is right. This does not look like a huge amount of data
> > > for the hardware being dedicated to the task. The important thing is
> > > that
> > 
> > Again, I find it amusing when you have no idea how often this server
> > is being read from.  What if these 700 clients reading are always
> > reading at once, and constantly pulling down massive records.. up to
> > 70k of them?
> 
> That would be a design flaw in the applicaiton.  70 thousand records is
> a report, not a transaction.  See final point below.

What you call 70 thousand records does not change what it is ;)

> 
> > Pretending to be able to guess what our friend needs, without enough
> > data to be able to guess at the load is just plain silly.
> 
> The fact of the matter is, the web server will experience loading issues
> before the database server will - it is simply doing more work.  This is
> actually a purposful design feature of n-tier client/server technologies
> like web application servers.  I am not saying excess loading will
> happen in this case - I simply do not have enough information - just
> that server loading will happen on the middle tier before it occurs on
> either end.  It is designed to work that way.

I am glad that you at least quantify that statement above.  I can think of
many ways that the web server would _not_ run out of resources before the
database server.  Again, and this is the entire point I am stressing here,
we don't even know what this pair will be used for.  The entire _purpose_
of many of the features in databases today is to take the load off of the
web server.  An example is the simple WHERE statement you list below. 
While a minor example of what I am talking about, you obviously would not
get an entire column of data, and then use PHP or PERL to perform the same
function that your WHERE could do.  Offload that task to the database.

Most of what a database DOES is to remove such a load from the client, as
well as reduce amount of data transfered.  The client can even (as you
know) program their own functions into most database, including MySQL and
PostgresSQL now, thus removing the load from the client.

For you to indicate that the web server is going to run out of load first
is not sound reasoning, not with the lack of information we have.  On top
of that, databases are _very_ efficient, as long as the queries are
correctly written, in doing things _with_ their data.

In an ideal situation, the web server will have a lower load than the
database server.

> 
> > > the columns that appear in the "WHERE" clauses in your SQL
> > > statements be properly indexed, and even there there are potential
> > > tradeoffs in terms of faster reads vs. slower writes. But even the
> > > worst-case scenario you proposed should not be catastrophic.
> >
> > Again, you simply have no idea until you know about the frequency of
> > the writes and reads, the amount of data transfered with each
> > operation on average, and the consistancy of the requests.
> 
> No actually there is a general rule of thumb at play here though. 
> Resource savings on the order of 1000 fold or better are not uncommon
> from performing two simple database tasks - I say simple, except you
> have to be an expert in the data, and what the data is being used for in
> order to (1)normalize the database and (2) index the data correctly.  It
> is not usually an intuitive exercise and, generally this is done
> incorrectly.

Naturally indexing is important.  This doesn't take away from the fact
that the more ram you have in a box, the more the database will cache the
indexes, as well as the responses to queries.

> 
> However, just for fun - you contradicted yourself twice in your message.
>  
> First of all, it was you who suggested increasing the amount of RAM in
> the database server - in fact you suggested that at the expense of the
> web server, which by design is supposed to be taking more of the load
> than the database server will.  It was also you who suggested db caching
> without knowing how dynamic the data was.  
> 

You've got it backwords.  The database is supposed to take more of the
load than the web server.  The database is more efficient at taking data
based requests, and processing those requests.  

As for my suggestions, they were just that.  I made comments about some of
the things you can do to optimised a database.  Whether or not those
things are perfect for this particular application is moot.  Heck, the
only thing I said I would do (no matter what) is to move some of the ram
from the web server to the database server.  The rest were, again,
examples of some things I had done in the past.  I'm sure our friend can
easily decide if these things are best for him or not.


> Besides, db caching is usually query cashing, not table caching like you
> suggested.  It's far more efficient to cache the result of common
> -static- queries (say, language table queries.)  It is problematic,
> risky, inefficient and resource intensive to cache tables that are also
> stored on the disk. (However, if you do implement table caching, that
> would explain why you think the database would be under more of a load
> than the web server!)

Heh.  The best thing to have cached are the indexes, which are often the
first thing a database will leave cached in ram.  Again, that depends on
the type of queries.  If there are a lot of simular queries, index caching
won't help as much as query caching.

However. table caching is not inefficient, it once again depends on the
resources and what you are going to do with the data.  MySQL, for example,
will often cache the contents of a table in ram if it can.  I've seen it
happen.  This isn't inefficent, it's logical, and it makes sense if you
have extremely dynamic queries on a table that fits in ram.  The database
can obviously serve up more data without the head thrashing associated
with a hard drive, or even a raid.

Of course, once the raid becomes powerful enough, with enough drives and
speed, it does become less adventageous to have a table in ram.  A two
disk SCSI raid may not get to that point, however, even with the drive
speed listed above, if there are primarily random queries that are made.

You seem to think yourself quite the authority of databases.  Nothing
wrong with that, but I'd love to have a design contest with you sometime,
from the raw data stage to hardware.  I have a feeling I'd beat the pants
off of you, but this is just friendly compeition speaking.  Nothing
personal ;)

You are aware, btw, that google doesn't even store their data on hard
disks, but has everything stored in ram?  They don't care if their data is
lost, because in a worst case scenario, if all of their replicated
machines with the same data are lost at the same time, they will snag that
data back within 30 days.

You don't think google has a flawed policy, do you?  Maybe you should talk
to them.


> 
> Rod.












More information about the OCLUG mailing list