[oclug]PostgreSQL Hardware
Brad Barnett
bb at L8R.net
Sat Oct 26 17:13:29 EDT 2002
On Sat, 26 Oct 2002 14:35:44 -0400
Milan Budimirovic <milan.budimirovic at sympatico.ca> wrote:
> Brad Barnett wrote:
> >
> > On Sat, 26 Oct 2002 12:04:17 -0400
> > Rod Giffin <rod at giffinscientific.com> wrote:
> >
> > > On Friday 25 October 2002 21:40, Timothy Brier wrote:
> > > So let me throw in one new question to
> > > > the mix, I have found that PostgreSQL is very fast with the
> > > > fsync() turned off. (As the manuals say it will be) What is the
> > > > opinion of having this off when you have file systems like ext3
> > > > available? Is tar or cpio a good way to do a complete backup?
> > >
> > > It depends on whether or not your application can be down for long
> > > enough to complete a manual backup. You can also do hardware
> > > mirroring, and back up the mirror drive. One of the nice things
> > > about allowing the database to back up itself is that in the event
> > > of data corruption, you can rebuild only part of the database rather
> > > than the entire table set. That is a little harder to do from a
> > > tarball.
> > >
> > > I'm sorry Timothy, I don't agree with the sentement expressed by
> > > everyone else here so far with regard to your server issues. You
> > > are not talking about a really serious load there. 600,000 new
> > > records/month is something like 1.04 new records/second over the
> > > course of an 8 hour day. Your web server log file would be growing
> > > faster than that. In fact, from a system engineering perspective,
> > > all that number should tell you is how much disk space you're going
> > > to need. It doesn't say anything about the load on the server.
> > >
> > > Rod.
> >
> > Actually Rod, no one was responding to his statement about the 600000
> > records per month _created_. He didn't post that until _after_ most
> > people responded to his initial statement. I'm still getting email
> > messages in a weird order too, so I feel for ya ;)
> >
> > People were responding to his initial query about 700 clients
> > connected at once, making queries that could be as high as 60k-70k
> > records each. He didn't specify the length of these records either.
> >
> > This also doesn't specify the frequency of the queries or the
> > consistency of them. You really shouldn't base your response above on
> > the amount of queries that may be written alone. Load is also
> > effected by read as well as write queries.
> >
> > Especially when the worst case scenario for the above is 700 people
> > reading at once, all doing queries of 70k records each, with some
> > writes thrown in. Since we don't know precisely what this is to be
> > used for, it is entirely possible that the specific application has a
> > consistent load.
>
> Not necessarily. A lot depends on how the indexing and caching is set
The above statements I made can be summarized into the following:
- you can not predict the load this server will be under by looking only
at the amount of commits/writes it will perform
- you can not predict the load this server will have unless you know more
about the specific situation, how large the queries are, and how often
they are made
How do you find this to be a "not necessarily" case?
> up. If you don't have a lot of people using the system you will end up
> with all of your data being read from disk instead of from memory,
> simply because nothing has been cached yet. A somewhat-busy server is
> often faster than one that's idling.
It depends on how randomized the queries are. I'm glad you used the word
"often", but often isn't the right word. What about servers with
terabytes of data, and random queries? What about servers with 100 megs
of ram, and a gig of data, and random queries.
There are probably just as many databases out there that fit the above
scenario, as ones that fit into the "always polling the same data" branch.
>
> Anyway, Rod is right. This does not look like a huge amount of data for
> the hardware being dedicated to the task. The important thing is that
Again, I find it amusing when you have no idea how often this server is
being read from. What if these 700 clients reading are always reading at
once, and constantly pulling down massive records.. up to 70k of them?
Pretending to be able to guess what our friend needs, without enough data
to be able to guess at the load is just plain silly.
> the columns that appear in the "WHERE" clauses in your SQL statements be
> properly indexed, and even there there are potential tradeoffs in terms
> of faster reads vs. slower writes. But even the worst-case scenario you
> proposed should not be catastrophic.
Again, you simply have no idea until you know about the frequency of the
writes and reads, the amount of data transfered with each operation on
average, and the consistancy of the requests.
More information about the OCLUG
mailing list