[oclug] PostgreSQL Hardware
Rod Giffin
rod at giffinscientific.com
Sun Oct 27 13:27:23 EST 2002
On Sunday 27 October 2002 07:23, Brad Barnett wrote:
> On Sun, 27 Oct 2002 01:48:35 -0400
>
> Rod Giffin <rod at giffinscientific.com> wrote:
> "Did that." "What came first." I'm glad you agree with me, because these
> days the database server can often hit the wall first. Of course, as I've
> said all along, this depends on the circumstances. It is quite possible
> to write code on a web server that will hit that bottleneck first.
The thing is, I don't agree with you. It's more likely that the web
application server will come under load before the database does. Yes, it's
possible for it to happen the other way around, but when it does, that is
usually a symptom of poor application design, and/or poor database design,
and/or perhaps underpowered database hardware.
Since I'm not assuming that the system architect is an idiot, I'm assuming
that the design of both the database and the application is appropriate for
the requirement, and that the tables are properly indexed. The original
question also assumed this, in that the original question was essentially "Is
this hardware configuration enough to handle this load?"
The answer is, on the database: Given the data we currently have, chances are
you're in overkill mode there, but don't change anything at the moment
because it's certainly better to have extra horsepower than not enough.
The answer for the web application server: We don't know enough about the
application. However, typically you can use several web application servers
to connect to one database; the technology is designed this way. Also,
typically, the load on the web server is higher than the load on the
database, simply by virtue of the fact that the application server is doing
more work, to wit:
The database server maintains part of the data layer.
The application server, on the other hand, is maintaining the rest of the data
layer as well as all of the business logic, the application logic, and the
presentation layer, as well as maintaining a pool of dynamic connections to
the database, connections to as many as 700 clients that are
simultaneously connected, and all of their session management information.
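A minimal sketch of the connection-pool idea mentioned above, assuming a fixed pool size and a caller-supplied `connect` function. The names `ConnectionPool`, `connect` and `size` are hypothetical and not taken from any particular application server; real servers add things like reconnecting and health checks.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: many client requests share a few database connections."""

    def __init__(self, connect, size):
        # `connect` is any zero-argument callable that returns a new connection
        # (hypothetical; in practice it would wrap the database driver).
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Wait until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the request failed.
            self._idle.put(conn)
```

With something like `pool = ConnectionPool(connect, size=20)`, each request handler would run its queries inside `with pool.connection() as conn:`, so hundreds of client sessions end up sharing a much smaller number of database connections.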
While the database may implement a security layer, its only user is the web
server. The application server, on the other hand, may be implementing a
security layer for 700 simultaneous users - in this case.
Every time the users do something in the application, the application server
has to respond to it. It has some work to do in all of its layers. The
database, on the other hand, only has to respond when a user wants to read or
write data from or to the database. This might happen on every page, but
don't forget that the same data that leaves the database has to be processed
by the application server as well. The data has to be parsed, formatted, and
validated before it can be sent on to the client, or to the database. In any
case, this happens inside of the application server, not in the client, and not
in the database.
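As a rough illustration of that parse/validate/format work, here is a small sketch. The function names and the "quantity" and "price" fields are made up for the example; they stand in for whatever the real application handles.

```python
from html import escape


def validate_quantity(raw):
    """Parse a form field sent by the client; accept only a positive integer."""
    value = int(raw.strip())  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError("quantity must be positive")
    return value


def format_row(name, price_cents):
    """Format one row read from the database as an HTML table row."""
    # Escape the name so data from the database cannot inject markup.
    return "<tr><td>{}</td><td>${:.2f}</td></tr>".format(
        escape(name), price_cents / 100
    )
```

Both steps run on the application server: the first before data goes to the database, the second before data goes to the client.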
> > Question for you... what the hell is the "raw data stage" in software
> > design?
>
> I don't know. I was referring to taking a chunk of raw data, and then
> building the database for it, not about software design. You've mentioned
> in the past that you need to design a database _for_ the data, so you
> obviously know what I'm talking about in that respect.
Well, you've got the software design process backwards... well, maybe not quite
backwards, but you've got the cart before the horse, so to speak. Other than
reference data which you might already have (although it's not a prerequisite
for database application design), the purpose of building the application is
to gather, store, retrieve and process data. You don't normally start out
with data, you end up with data. The reason you might need a database for
the data is so you have some place convenient to put it once you get it -
and a way to access that data again.
It's also entirely possible that you can use flat files in your application -
which would almost negate your requirement for a database... although the
indexing capabilities of the database software can come in awfully handy there
too.
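To make the flat file versus index trade-off concrete, here is a small sketch. It assumes a tiny made-up "people" data set, and it uses Python's built-in sqlite3 module as a stand-in for a database server such as PostgreSQL so that the example is self-contained.

```python
import csv
import io
import sqlite3

# Hypothetical flat file contents, kept in a string for the example.
FLAT_FILE = "id,name\n1,alice\n2,bob\n3,carol\n"


def find_in_flat_file(text, wanted_id):
    # Flat file: read records one by one until the match turns up.
    for row in csv.DictReader(io.StringIO(text)):
        if int(row["id"]) == wanted_id:
            return row["name"]
    return None


def load_into_database(text):
    db = sqlite3.connect(":memory:")
    # The primary key lets the database look up a row by id
    # without scanning every record.
    db.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
    rows = [(int(r["id"]), r["name"]) for r in csv.DictReader(io.StringIO(text))]
    db.executemany("INSERT INTO people VALUES (?, ?)", rows)
    return db


def find_in_database(db, wanted_id):
    row = db.execute(
        "SELECT name FROM people WHERE id = ?", (wanted_id,)
    ).fetchone()
    return row[0] if row else None
```

For three records the difference is invisible; it starts to matter as the file grows, since the scan cost rises with the number of records while the keyed lookup stays cheap.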
You know I've never met Milan in person that I know of, but I have seen some
of the results of his work. I can tell you that he knows something of what
he does for a living. You'd do well to sit back and learn something from
people who do this sort of stuff for a living, then maybe someday you can sit
back and say, "I'm not an authority, I just do what I do."
> > > You are aware, btw, that google doesn't even store their data on hard
> > > disks, but has everything stored in ram? They don't care if their
> > > data is lost, because in a worst case scenario, if all of their
> > > replicated machines with the same data are lost at the same time, they
> > > will snag that data back within 30 days.
> >
> > Google's cluster architecture is a rather well documented and not very
> > fancy affair. They have at the moment something over 10,000 single CPU
> > systems in their cluster, with between 256 MB and 1 GB of RAM, and two 40
> > or 80 GB (Maxtor seems to be their favourite) hard disks. Does that
> > sound like a RAM based system to you? Oh. They also run RedHat Linux.
>
> This is old news. Last I heard they were up to 18k machines. Not
> surprising with all of the extra features they've added lately, and the
> doubling in the number of sites they have indexed, over the years.
I got that info from Google press releases and interviews done within the last
18 months. In May 2001, they had, according to this information, 8,000
servers. The most recent hardware information from their press releases and
other information on their web site says 10,000 servers. Obviously they're
growing.
I think I remember an article that said the Google search appliance (sort of a
small scale internal Google for large enterprises) uses a RAM disk to serve
its index when you search, but a RAM disk is not a disk cache, and the
appliance is not Google.
And in case you didn't catch it, I do know some of the folks at Google. It
has nothing to do with being a peacock. Netscape too. And Apple,
Microsoft, Oracle, Sun, AOL. I even know a fella who worked at RedHat, got
rich and moved on to start his own business. Some of them are family (a
couple of cousins, a couple of cousins' husbands) in fact. Funny thing about
this industry is, you never know who you're going to rub shoulders with
next.
Rod.