[oclug]PostgreSQL Hardware

Rod Giffin rod at giffinscientific.com
Sun Oct 27 13:27:23 EST 2002


On Sunday 27 October 2002 07:23, Brad Barnett wrote:
> On Sun, 27 Oct 2002 01:48:35 -0400
>
> Rod Giffin <rod at giffinscientific.com> wrote:
> "Did that."  "What came first."  I'm glad you agree with me, because these
> days the database server can often hit the wall first.  Of course, as I've
> said all along, this depends on the circumstances.  It is quite possible
> to write code on a web server that will hit that bottleneck first.

The thing is, I don't agree with you.  It's more likely that the web
application server will come under load before the database does.  Yes, it's
possible for it to happen the other way around, but that is usually a symptom
of poor application design, poor database design, underpowered database
hardware, or some combination of the three.
 
Since I'm not assuming that the system architect is an idiot, I'm assuming
that the design of both the database and the application is appropriate for
the requirement, and that the tables are properly indexed.  The original
question also assumed this, in that it was essentially, "Is this hardware
configuration enough to handle this load?"

The answer is, on the database: Given the data we currently have, chances are 
you're in overkill mode there, but don't change anything at the moment 
because it's certainly better to have extra horsepower than not enough.

The answer for the web application server: We don't know enough about the
application.  However, you can typically connect several web application
servers to one database; the technology is designed this way.  Also,
typically, the load on the web server is higher than the load on the
database, simply because the application server is doing more work, to wit:

The database server maintains part of the data layer.
The application server, on the other hand, maintains the rest of the data
layer as well as all of the business logic, the application logic, and the
presentation layer.  It also maintains a pool of dynamic connections to the
database, connections to as many as 700 simultaneously connected clients, and
all of their session management information.  While the database may
implement a security layer, its only user is the web server.  The application
server, on the other hand, may be implementing a security layer for 700
simultaneous users in this case.
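
To illustrate the connection pooling mentioned above, here is a minimal
Python sketch, not a description of any particular application server.  It
assumes the standard library's sqlite3 module as a stand-in for the real
database, and the names ConnectionPool and handle_request are hypothetical.
The point is only that a small, fixed number of database connections can
serve many more client requests than there are connections.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """A fixed-size pool: many client requests share a few connections."""

    def __init__(self, size, database=":memory:"):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets worker threads share connections.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks (up to timeout seconds) while every connection is in use.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the request failed.
            self._idle.put(conn)

    def idle_count(self):
        return self._idle.qsize()


def handle_request(pool, client_id):
    """Hypothetical handler: borrow a connection, query, format the result."""
    with pool.connection() as conn:
        (value,) = conn.execute("SELECT ? * 2", (client_id,)).fetchone()
    # Formatting happens here, in the application layer, not in the database.
    return f"client {client_id}: {value}"
```

With a pool of two connections, calling handle_request for 700 different
client ids still opens only two database connections; the rest of the work
(session handling, formatting) stays on the application side.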

Every time a user does something in the application, the application server
has to respond to it.  It has some work to do in all of its layers.  The
database, on the other hand, only has to respond when a user wants to read or
write data.  This might happen on every page, but don't forget that the same
data that leaves the database has to be processed by the application server
as well.  The data has to be parsed, formatted, and validated before it can
be sent on to the client or to the database.  In any case, this happens
inside the application server, not in the client and not in the database.

> > Question for you... what the hell is the "raw data stage" in software
> > design?
>
> I don't know.  I was referring to taking a chunk of raw data, and then
> building the database for it, not about software design.  You've mentioned
> in the past that you need to design a database _for_ the data, so you
> obviously know what I'm talking about in that respect.

Well, you've got the software design process backwards... well, maybe not
quite backwards, but you've got the cart before the horse, so to speak.  Other
than reference data, which you might already have (although it's not a
prerequisite for database application design), the purpose of building the
application is to gather, store, retrieve and process data.  You don't
normally start out with data, you end up with data.  The reason you might need
a database for the data is so that you have some place convenient to put it
once you get it, and a way to access that data again.

It's also entirely possible that you can use flat files in your application,
which would almost negate your requirement for a database... although the
indexing capabilities of the database software can come in awfully handy
there too.
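
As a rough sketch of why indexing matters even with flat files, the Python
below compares a plain scan of a flat file with a lookup through a small
key-to-offset index.  The "key,value" line format and all of the function
names are assumptions made up for this example; a real database's indexes
are far more capable than this.

```python
import os
import tempfile


def write_flat_file(path, records):
    """Store one 'key,value' record per line."""
    with open(path, "w", newline="\n") as f:
        for key, value in records:
            f.write(f"{key},{value}\n")


def scan_lookup(path, wanted):
    """Without an index: read line by line until the key turns up."""
    with open(path) as f:
        for line in f:
            key, value = line.rstrip("\n").split(",", 1)
            if key == wanted:
                return value
    return None


def build_index(path):
    """Map each key to the byte offset where its line starts."""
    index = {}
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            index[line.split(b",", 1)[0].decode()] = offset
    return index


def indexed_lookup(path, index, wanted):
    """With an index: seek straight to the record and read one line."""
    offset = index.get(wanted)
    if offset is None:
        return None
    with open(path, "rb") as f:
        f.seek(offset)
        line = f.readline().decode().rstrip("\n")
    return line.split(",", 1)[1]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "users.txt")
        write_flat_file(path, [(f"user{i}", f"name{i}") for i in range(1000)])
        index = build_index(path)
        print(scan_lookup(path, "user742"))
        print(indexed_lookup(path, index, "user742"))
```

Both lookups return the same value; the difference is that the scan reads
every line up to the match, while the indexed lookup reads one line.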

You know, I've never met Milan in person, as far as I know, but I have seen
some of the results of his work.  I can tell you that he knows something of what
he does for a living.  You'd do well to sit back and learn something from 
people who do this sort of stuff for a living, then maybe someday you can sit 
back and say, "I'm not an authority, I just do what I do."

> > > You are aware, btw, that google doesn't even store their data on hard
> > > disks, but has everything stored in ram?  They don't care if their
> > > data is lost, because in a worst case scenario, if all of their
> > > replicated machines with the same data are lost at the same time, they
> > > will snag that data back within 30 days.
> >
> > Google's cluster architecture is a rather well documented and not very
> > fancy affair.  They have at the moment something over 10,000 single CPU
> > systems in their cluster, with between 256 MB and 1 GB of RAM, and two 40
> > or 80 GB (Maxtor seems to be their favourite) hard disks.  Does that
> > sound like a RAM based system to you?  Oh.  They also run RedHat Linux.
>
> This is old news.  Last I heard they were up to 18k machines.  Not
> surprising with all of the extra features they've added lately, and the
> doubling in the number of sites they have indexed, over the years.

I got that info from Google press releases and interviews done within the last
18 months.  In May 2001 they had, according to this information, 8,000
servers.  The most recent hardware information from their press releases and 
other information on their web site says 10,000 servers.  Obviously they're 
growing.

I think I remember an article that said the Google search appliance (sort of a
small scale internal Google for large enterprises) uses a RAM disk to serve
its index when you search, but a RAM disk is not a disk cache, and the
appliance is not Google.

And in case you didn't catch it, I do know some of the folks at Google.  It
has nothing to do with being a peacock.  Netscape too.  And Apple,
Microsoft, Oracle, Sun, AOL.  I even know a fella who worked at RedHat, got
rich and moved on to start his own business.  Some of them are family (a
couple of cousins, a couple of cousins' husbands) in fact.  Funny thing about
this industry is, you never know who you're going to rub shoulders with
next.

Rod.


