[oclug] really basic perl help
Brian's Linux Box
b_mckee at myrealbox.com
Tue Jan 7 09:46:00 EST 2003
> To the original poster: I did not get the gist that you were parsing
> HTML. Otherwise I would have let you know this little secret sooner:
> Parsing out html comments by hand is a pain in the @$$
Ya, it's looking that way isn't it...
> There are a lot of HTML parsing modules on CPAN. I suggest you check
> them out.
...
> Seems to suggest that 'HTML::Parser' is what you want to install, as it
> comes with a slew of HTML::... classes to help you pull stuff out of
> html...
...
I hear you - and if I were self-hosting I would do so - but the end point of
this 'project' is a series of non-profit web pages hosted by a local ISP at
reduced rates. I know that perl will be available to me, but I don't know
what modules they currently have installed and how receptive they will be to
me asking for more. As a result I was trying to do this with the modules
normally included in perl. Maybe I'm just being stubborn.
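
One way to find out what the ISP already has, without asking them, is to try loading the module and see whether it works. This is only a rough sketch: have_module is a made-up helper name, and the module list is just an example (Fcntl ships with perl, HTML::Parser may or may not be there).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the named module can be loaded here, else 0.
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # HTML::Parser -> HTML/Parser.pm
    return eval { require $file; 1 } ? 1 : 0;  # eval traps the "can't locate" die
}

# Example list only; put whatever modules you care about here.
for my $module (qw(HTML::Parser Fcntl)) {
    printf "%-14s %s\n", $module,
        have_module($module) ? "available" : "not installed";
}
```

Run on the ISP's machine, this should show whether sticking to the core modules is really necessary.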
I do understand that my matching script can blow up in certain cases, but I
don't expect any of those cases to occur. [ (famous last words :-) ]
As long as the source HTML (which is generated by another perl script I
don't control) stays relatively the same I should be ok. I believe it has
been stable for over a year now. The comment tags should always be
separated by many lines, and I am searching for just the start of each
comment rather than the whole thing, so a line wrap inside the comment is
less likely to break the match. If a comment start and data ever end up on
the same line, it will cause me trouble. If it becomes a problem I will revisit it (with chants of
'I told you so' in my ears), but I think it's good enough for now. It's not
mission critical stuff.
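
For anyone following along, the approach described above could look roughly like the sketch below. The marker strings and the extract_between name are invented for illustration; the real comments come from the upstream script. It matches only the start of each comment, line by line, and it has exactly the weakness mentioned: anything sharing a line with a marker is skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical marker text; only the START of each comment is matched,
# so a wrap further along the comment line does not matter.
my $start_marker = '<!-- BEGIN DATA';
my $end_marker   = '<!-- END DATA';

# Return the lines found between the start-marker line and the end-marker line.
sub extract_between {
    my ($fh) = @_;
    my @kept;
    my $inside = 0;
    while (my $line = <$fh>) {
        if (index($line, $start_marker) >= 0) { $inside = 1; next; }
        if (index($line, $end_marker)   >= 0) { $inside = 0; next; }
        push @kept, $line if $inside;
    }
    return @kept;
}

# Demo on an in-memory sample (needs perl 5.8 for the \$sample filehandle).
my $sample = <<'END_HTML';
<html>
<!-- BEGIN DATA -->
row one
row two
<!-- END DATA -->
</html>
END_HTML

open my $fh, '<', \$sample or die "cannot open sample: $!";
print extract_between($fh);    # prints the two "row" lines
close $fh;
```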
- sidebar - how do you define mission critical? Important enough to get
you fired if you screw it up :-)
Thanks to everybody for your comments - I've learned quite a bit just
reading the comments and examples.
Next step - download the input as required and store the output (with file
locking)
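
For the file locking part, flock with the constants from Fcntl is available in a stock perl, so no extra modules should be needed. A minimal sketch, assuming the output is one file that gets rewritten whole; store_output and read_output are made-up names, and the temp file is only there for the demo. Note that flock is advisory: it only helps if every reader and writer of the file also uses it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);          # LOCK_SH, LOCK_EX
use File::Temp qw(tempfile);   # only used for the demo path below

# Replace the contents of $path while holding an exclusive lock.
# Opening with '>>' avoids truncating the file before the lock is held.
sub store_output {
    my ($path, $content) = @_;
    open my $fh, '>>', $path or die "cannot open $path: $!";
    flock($fh, LOCK_EX)      or die "cannot lock $path: $!";
    truncate($fh, 0)         or die "cannot truncate $path: $!";
    print {$fh} $content     or die "cannot write $path: $!";
    close $fh                or die "cannot close $path: $!";  # close drops the lock
    return;
}

# Read the whole file while holding a shared lock.
sub read_output {
    my ($path) = @_;
    open my $fh, '<', $path or die "cannot open $path: $!";
    flock($fh, LOCK_SH)     or die "cannot lock $path: $!";
    local $/;                  # slurp mode
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Demo with a throwaway file; a real script would use its own output path.
my (undef, $path) = tempfile(UNLINK => 1);
store_output($path, "<p>generated page</p>\n");
print read_output($path);
```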
Brian