[oclug] A set of Linux/XML/database/Python questions
Marcin Kolbuszewski
marcink at magma.ca
Sun Feb 13 17:06:37 EST 2005
Hello,
I've just registered, and hope that this forum is appropriate for the questions I have.
If not, please gently tell me to go away...
Essentially, I want to set up a largish database that will be populated from XML files and
will have a web interface to put XML in, take XML out, and do some searching. I'd like as
much of the design as possible to be 'automatic', and where it has to be manual, then in
Python, because that is something I am familiar with.
Naturally I do not want to force an open door, so I would appreciate your pointers and
advice. In some sense I know how to do what I want at the low level, i.e. how to
parse XML (SAX, DOM), how to build SQL, how to connect to a database from Python, and how
to write HTML, but there must be better ways :-)
Below is a description of the problem:
First I will have approx 100,000 XML files that look more or less like this. Each
contains roughly a dozen '<inner>' tags. So the total number
of '<outerelement>' 'objects' will be 100,000. In the final version I'd like to try with
roughly a million...
<outerelement>
<outer1>AA</outer1>
<outer2>BB</outer2>
...
<outer10>CC</outer10>
<data>10 kilobytes ascii here</data>
<inner>
<inner1>a</inner1>
<inner2>b</inner2>
<inner3>c</inner3>
</inner>
<inner>
<inner1>aa</inner1>
<inner2>bbb</inner2>
<inner3>ccc</inner3>
</inner>
...
<inner>
<inner1>aaaaaaaa</inner1>
<inner2>bbbbbbbb</inner2>
<inner3>cccccccc</inner3>
</inner>
</outerelement>
I need to set up a database to hold it. The <inner> fields and the <outer?> fields have
to be searchable, and '<data>' has to be free-text searchable, but only after narrowing
the results to a sensible number (<500). Performance is secondary.
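To make the requirement concrete, here is a minimal sketch of one possible layout, not a recommendation. It assumes SQLite (via Python's standard sqlite3 module) purely for illustration; the table names (outerelement, inner_item), the choice of indexed columns, and the search() helper are all hypothetical. It narrows on the indexed fields first and only then scans '<data>' in the few rows that remain, capped at 500.

```python
import sqlite3

# Hypothetical schema: one row per <outerelement>, one row per <inner>.
# Only outer1/outer2 are shown; the remaining <outer?> fields would follow the same pattern.
SCHEMA = """
CREATE TABLE outerelement (
    id     INTEGER PRIMARY KEY,
    outer1 TEXT,
    outer2 TEXT,
    data   TEXT
);
CREATE TABLE inner_item (
    id       INTEGER PRIMARY KEY,
    outer_id INTEGER NOT NULL REFERENCES outerelement(id),
    inner1   TEXT,
    inner2   TEXT,
    inner3   TEXT
);
CREATE INDEX idx_outer1 ON outerelement(outer1);
CREATE INDEX idx_inner1 ON inner_item(inner1);
"""


def search(conn, outer1, inner1, text, limit=500):
    """Narrow by indexed fields, then substring-match <data> on the remaining rows."""
    rows = conn.execute(
        "SELECT DISTINCT o.id, o.data FROM outerelement o "
        "JOIN inner_item i ON i.outer_id = o.id "
        "WHERE o.outer1 = ? AND i.inner1 = ? LIMIT ?",
        (outer1, inner1, limit),
    ).fetchall()
    # Plain Python substring test; acceptable because at most `limit` rows reach here.
    return [row_id for row_id, data in rows if text in data]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO outerelement (id, outer1, outer2, data) "
    "VALUES (1, 'AA', 'BB', 'some long ascii text')"
)
conn.execute(
    "INSERT INTO inner_item (outer_id, inner1, inner2, inner3) "
    "VALUES (1, 'a', 'b', 'c')"
)
print(search(conn, "AA", "a", "ascii"))  # prints [1]
```

Since performance is secondary, a plain substring test over fewer than 500 rows may well be enough; a database with built-in full-text indexing would be the alternative if it is not.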
So, I need something to: suggest a database schema, possibly create the database, parse
the XML, create the SQL, insert into the database, and help create a web search interface,
all with the least amount of effort.
If I were to do it myself I'd either pay Oracle to do it, or reinvent the wheel and:
parse the files using SAX or DOM, manually code the creation of SQL, open the DB from
Python, run the SQL, etc.
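For reference, the 'reinvent the wheel' route would look roughly like the sketch below. It is only an illustration under assumptions: it uses xml.etree.ElementTree instead of raw SAX or DOM, SQLite instead of a server database, and hypothetical table names (outerelement, inner_item) and a hypothetical load_document() helper. Only two of the <outer?> fields are handled.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Shortened sample in the shape described above (two <inner> blocks only).
SAMPLE = """<outerelement>
  <outer1>AA</outer1>
  <outer2>BB</outer2>
  <data>10 kilobytes ascii here</data>
  <inner><inner1>a</inner1><inner2>b</inner2><inner3>c</inner3></inner>
  <inner><inner1>aa</inner1><inner2>bbb</inner2><inner3>ccc</inner3></inner>
</outerelement>"""

# Hypothetical two-table layout: parent row per file, child row per <inner>.
SCHEMA = """
CREATE TABLE outerelement (
    id INTEGER PRIMARY KEY, outer1 TEXT, outer2 TEXT, data TEXT
);
CREATE TABLE inner_item (
    id INTEGER PRIMARY KEY,
    outer_id INTEGER NOT NULL REFERENCES outerelement(id),
    inner1 TEXT, inner2 TEXT, inner3 TEXT
);
"""


def load_document(conn, xml_text):
    """Parse one <outerelement> document and insert it with its <inner> rows."""
    root = ET.fromstring(xml_text)
    cur = conn.execute(
        "INSERT INTO outerelement (outer1, outer2, data) VALUES (?, ?, ?)",
        (root.findtext("outer1"), root.findtext("outer2"), root.findtext("data")),
    )
    outer_id = cur.lastrowid
    # Parameterized inserts, so no SQL strings are assembled by hand.
    conn.executemany(
        "INSERT INTO inner_item (outer_id, inner1, inner2, inner3) "
        "VALUES (?, ?, ?, ?)",
        [
            (outer_id, e.findtext("inner1"), e.findtext("inner2"), e.findtext("inner3"))
            for e in root.findall("inner")
        ],
    )
    return outer_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
doc_id = load_document(conn, SAMPLE)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM inner_item").fetchone()[0])  # prints 2
```

For 100,000 files, the same function would be called in a loop over the files, committing in batches rather than once per file.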
So, I'd appreciate pointers to resources that would help me do it as painlessly as possible.
Cheers,
Marcin Kolbuszewski Email: marcink at magma.ca