SGMLExtractor
=============

SGMLExtractor
-------------

Classes:

SGMLExtractorHandle
    File object that strips tags and returns content from specified tag blocks.

SGMLExtractor
    File handle decorator that scans for specified SGML tag pairs, removes any inner tags and returns the raw content.

For example, applied to the sample file shown below:

    handle = open( filename )
    record_handle = SGMLExtractorHandle( handle, [ 'h1', ] )

would return 'House that Jack Built'

    handle = open( filename )
    record_handle = SGMLExtractorHandle( handle, [ 'dt', ] )

would return 'ratcatdogcowmaiden'

    handle = open( filename )
    record_handle = SGMLExtractorHandle( handle, [ 'dt', 'dd' ] )

would return 'rat that ate the malttcat ate the rat' etc

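Sample input file::

    <h1>House that Jack Built</h1>
    <dl>
      <dt>rat</dt>
      <dd>ate the malt</dd>
      <dt>cat</dt>
      <dd>that ate the rat</dd>
      <dt>dog</dt>
      <dd>that worried the cat</dd>
      <dt>cow</dt>
      <dd>with crumpled horn</dd>
      <dt>maiden</dt>
      <dd>all forlorn</dd>
    </dl>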
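The behaviour described above (scan for the requested tag pairs, drop any inner tags, return the raw content) can be illustrated with a minimal sketch. This is not the SGMLExtractor implementation: it is a hypothetical stand-in built on the standard library's `html.parser.HTMLParser`, and the names `TagContentExtractor`, `extract` and `SAMPLE`, as well as the `<big>` inner tags in the sample text, are assumptions made for illustration only.

```python
from html.parser import HTMLParser


class TagContentExtractor(HTMLParser):
    """Hypothetical stand-in: collect the text found inside the requested tags."""

    def __init__(self, tags):
        super().__init__()
        self.wanted = {tag.lower() for tag in tags}
        self.depth = 0    # number of requested tags currently open
        self.chunks = []  # text pieces collected so far

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.wanted and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Inner tags (here the assumed <big>) never reach this method,
        # only the text they enclose does.
        if self.depth > 0:
            self.chunks.append(data)


def extract(text, tags):
    """Return the concatenated text found inside any of the given tags."""
    parser = TagContentExtractor(tags)
    parser.feed(text)
    parser.close()
    return "".join(parser.chunks)


# Shortened sample; the <big> wrappers are an assumption used to show
# that inner tags are removed.
SAMPLE = """<h1>House that Jack Built</h1>
<dl>
  <dt><big>rat</big></dt>
  <dd>ate the malt</dd>
  <dt><big>cat</big></dt>
  <dd>that ate the rat</dd>
</dl>
"""

print(extract(SAMPLE, ["h1"]))  # House that Jack Built
print(extract(SAMPLE, ["dt"]))  # ratcat
```

In this sketch, text from consecutive blocks is joined with no separator, which matches the style of the 'ratcatdogcowmaiden' result quoted above. Whether SGMLExtractorHandle inserts any whitespace between blocks is not something this sketch can confirm.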