=============================
The sgmlop accelerator module
=============================

sgmlop contains an optimized SGML/XML parser, designed as an add-on to
the sgmllib/htmllib and xmllib modules shipped with Python 1.5.

using empty callbacks, the replacement xmllib driver is about 6 times
faster than the original xmllib implementation.  when using sgmlop
directly, it can be more than 50 times faster.

for more information on benchmarking sgmlop, see below.

Enjoy /F

fredrik@pythonware.com
http://www.pythonware.com

--------------------------------------------------------------------
Copyright (c) 1998 by Secret Labs AB.

Permission to use, copy, modify, and distribute this software and its
associated documentation for any purpose and without fee is hereby
granted.  This software is provided as is.
--------------------------------------------------------------------

release info
------------

This is the third public release.  Changes include:

- added a starttag attribute parser written in C.  this gives a
  considerable speedup on files using lots of tag attributes

- the callback object can now have an sgmllib/xmllib interface
  (finish/handle) *or* a saxlib interface (see saxhack.py for an
  example).

contents
--------

README          this file

sgmllib.py      a drop-in replacement for the sgmllib.py module
                distributed with Python 1.5

xmllib.py       a drop-in replacement for the xmllib.py module
                distributed with Python 1.5

saxhack.py      illustrates how to implement the SAX DocumentHandler
                interface directly with native sgmlop.  this is over
                30 times faster than a corresponding parser based on
                the original xmllib.

sgmlop.dll      a precompiled version for Python 1.5 on win32

sgmlop.c        accelerator source code

sgmlop.mak      makefile for MSVC++ 5.0 generated by opal/pymake.
                make sure to change the directory names before you
                use it on your own machine.
bench*.py       various test files and benchmarks
test*.py

benchmarks
----------

benchmarking the sgmlop parser is non-trivial; if you don't install
any callbacks, it's some 300 times faster than the original xmllib
(it can parse more than 10 MB/s on a fast Pentium II).  this means
that in a typical test, far more time is lost on the Python method
call overhead than on the parsing proper.

my earlier benchmarks used a 'collecting' parser, which stored all
tags and elements in a list.  with that setup, sgmlop is roughly 5
times faster than the original implementation.

the benchxml.py script provided with this release uses empty parsers
instead (that is, all callbacks exist, but each contains only a
'pass' statement), in order to measure the parser and Python call
overhead only.

here's a typical test run (with the time for the original xmllib
implementation set to 1):

parser              time
--------------------------------------------------------------------
slow xmllib         1.0
fast xmllib         0.156   (6.4x)
sgmlop dummy        0.019   (53.5x)
sgmlop null         0.003   (297.8x)

the null time is obtained by running the parser without any callbacks
installed.
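
the three setups compared above (no callbacks, empty callbacks, and
collecting callbacks) can be sketched in plain Python.  this is only
an illustration under stated assumptions: drive() is a hypothetical
regex-based stand-in for a parser driver, not sgmlop itself, and the
callback names start/end/data are made up for the example rather than
taken from the sgmllib/xmllib or saxlib interfaces.  relative_times()
shows how a 'time' column like the one above is normalized against a
baseline set to 1.

```python
import re
import time

# Hypothetical stand-in tokenizer; this is NOT how sgmlop works internally.
_TOKEN = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^>]*>|([^<]+)")


def drive(text, target):
    """Call whichever of start/end/data the target defines (assumed names)."""
    start = getattr(target, "start", None)
    end = getattr(target, "end", None)
    data = getattr(target, "data", None)
    for match in _TOKEN.finditer(text):
        closing, tag, chars = match.groups()
        if chars is not None:
            if data:
                data(chars)
        elif closing:
            if end:
                end(tag)
        elif start:
            start(tag)


class NullTarget:
    """No callbacks installed: only the tokenizing cost remains."""


class DummyTarget:
    """Every callback exists but does nothing: adds pure call overhead."""

    def start(self, tag):
        pass

    def end(self, tag):
        pass

    def data(self, chars):
        pass


class CollectingTarget:
    """Stores every event in a list, like the earlier 'collecting' setup."""

    def __init__(self):
        self.events = []

    def start(self, tag):
        self.events.append(("start", tag))

    def end(self, tag):
        self.events.append(("end", tag))

    def data(self, chars):
        self.events.append(("data", chars))


def bench(text, factories, repeat=5):
    """Return the best wall-clock time per target over `repeat` runs."""
    timings = {}
    for name, factory in factories.items():
        best = float("inf")
        for _ in range(repeat):
            target = factory()
            t0 = time.perf_counter()
            drive(text, target)
            best = min(best, time.perf_counter() - t0)
        timings[name] = best
    return timings


def relative_times(timings, baseline):
    """Scale all timings so that the baseline entry equals 1.0."""
    base = timings[baseline]
    return {name: t / base for name, t in timings.items()}


if __name__ == "__main__":
    sample = "<doc>" + "<item id='1'>text</item>" * 2000 + "</doc>"
    factories = {
        "collecting": CollectingTarget,
        "dummy": DummyTarget,
        "null": NullTarget,
    }
    rel = relative_times(bench(sample, factories), "collecting")
    for name, value in rel.items():
        print("%-12s %.3f" % (name, value))
```

the absolute numbers from such a sketch say nothing about sgmlop; it
only shows why the choice of callbacks dominates the measurement.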