Class BlastXMLParser

  • All Implemented Interfaces:
    StAXContentHandler

    public class BlastXMLParser
    extends StAXContentHandlerBase
    This class parses NCBI Blast XML output.

    It has two modes:- i) single output document mode: this takes a document containing a single BlastOutput element and parses it. NCBI BLAST generates such a document when a single query is searched against a sequence database.

    ii) multiple query document mode: unfortunately, NCBI BLAST concatenates the results of multiple searches in one file. This leads to an ill-formed document that violates every XML format known to the human race and other nearby civilisations. This parser will take a bowdlerised version of this output that is wrapped in a blast_aggregate element.

    The massaged form is generated by stripping the XML declaration and DOCTYPE declaration lines and wrapping all the BlastOutput elements in a single blast_aggregate element. On Linux, this can be done with:-

     #!/bin/sh
     # Converts a Blast XML output to something vaguely well-formed
     # for parsing.
     # Use: blast_aggregate <XML output> <edited output>
    
     # strips all <?xml> and <!DOCTYPE> tags
     # encapsulates the multiple <BlastOutput> elements into <blast_aggregate>
    
     sed '/<?xml/d' $1 | sed '/<!DOCTYPE/d' | sed '1i\
     <blast_aggregate>
     $a\
     </blast_aggregate>' > $2
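
    For environments without sed, the same massaging can be sketched in plain Java. This is a minimal illustration, not part of BioJava: the class name BlastAggregateWrapper and its wrap method are hypothetical, and, like the script above, it assumes each XML declaration and DOCTYPE declaration sits on a line of its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a BioJava class) mirroring the sed pipeline. */
public class BlastAggregateWrapper {

    /**
     * Drops every line containing an XML declaration or a DOCTYPE
     * declaration and wraps the remaining lines in a single
     * blast_aggregate element.
     */
    public static String wrap(String concatenated) {
        List<String> kept = new ArrayList<>();
        kept.add("<blast_aggregate>");
        for (String line : concatenated.split("\\R")) {
            // Line-based filtering, as in the sed version: a declaration
            // spread over several lines would not be removed completely.
            if (line.contains("<?xml") || line.contains("<!DOCTYPE")) {
                continue;
            }
            kept.add(line);
        }
        kept.add("</blast_aggregate>");
        return String.join("\n", kept) + "\n";
    }

    public static void main(String[] args) {
        // Two tiny stand-ins for concatenated BLAST reports.
        String twoReports =
              "<?xml version=\"1.0\"?>\n"
            + "<BlastOutput>first</BlastOutput>\n"
            + "<?xml version=\"1.0\"?>\n"
            + "<BlastOutput>second</BlastOutput>\n";
        System.out.print(wrap(twoReports));
    }
}
```

    The result is a single well-formed document whose root is blast_aggregate, which is the form the multiple query document mode expects.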
    
    Author:
    David Huen
    • Field Detail

      • staxenv

        public org.biojava.bio.program.sax.blastxml.StAXFeatureHandler staxenv
        Nesting class that provides callback interfaces to nested class
    • Method Detail

      • setContentHandler

        public void setContentHandler​(org.xml.sax.ContentHandler listener)
        Sets the ContentHandler for this object.
      • startElement

        public void startElement​(java.lang.String nsURI,
                                 java.lang.String localName,
                                 java.lang.String qName,
                                 org.xml.sax.Attributes attrs,
                                 DelegationManager dm)
                          throws org.xml.sax.SAXException
        Overrides the superclass startElement method to determine the type of the start tag and use it to set up delegation for the superclass.
        Specified by:
        startElement in interface StAXContentHandler
        Parameters:
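        nsURI - the namespace URI of the element
        localName - the local name of the element (without prefix)
        qName - the qualified name of the element (with prefix, if any)
        attrs - the attributes attached to the element
        dm - the DelegationManager used to delegate handling of the element to another handler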
        Throws:
        org.xml.sax.SAXException - if the start tag cannot be processed
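
        As an illustration of what "determining the start tag type" can look like, the sketch below uses only the JDK's SAX API rather than BioJava's StAX-style classes. The class RootTagSniffer and its describe method are hypothetical names, and the logic is an assumption about the general idea (distinguish a BlastOutput root from a blast_aggregate root), not this parser's actual implementation.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that reports which document mode an input is in. */
public class RootTagSniffer extends DefaultHandler {
    private String rootName;       // qualified name of the first start tag
    private int blastOutputCount;  // number of BlastOutput elements seen

    @Override
    public void startElement(String nsURI, String localName,
                             String qName, Attributes attrs) {
        // The default JDK factory is not namespace-aware, so localName may
        // be empty; qName is the dependable value here.
        if (rootName == null) {
            rootName = qName;
        }
        if ("BlastOutput".equals(qName)) {
            blastOutputCount++;
        }
    }

    /** Returns "single", "aggregate:N", or "unknown" for the given XML. */
    public static String describe(String xml) {
        RootTagSniffer sniffer = new RootTagSniffer();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), sniffer);
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException(e);
        }
        if ("BlastOutput".equals(sniffer.rootName)) {
            return "single";
        }
        if ("blast_aggregate".equals(sniffer.rootName)) {
            return "aggregate:" + sniffer.blastOutputCount;
        }
        return "unknown";
    }
}
```

        A caller could use describe(xml) to check that an input is in one of the two supported modes before handing it to the parser.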
      • endElementHandler

        public void endElementHandler​(java.lang.String nsURI,
                                      java.lang.String localName,
                                      java.lang.String qName,
                                      StAXContentHandler handler)
                               throws org.xml.sax.SAXException
        Throws:
        org.xml.sax.SAXException
      • addHandler

        protected void addHandler​(ElementRecognizer rec,
                                  org.biojava.bio.program.sax.blastxml.StAXHandlerFactory handler)
        Registers a handler factory to be used for elements matched by the given recognizer.
        Parameters:
        rec - the ElementRecognizer that selects the elements to be handled
        handler - the StAXHandlerFactory that supplies handlers for the matched elements
      • getListener

        public org.xml.sax.ContentHandler getListener()
        Gets the ContentHandler listener for this parser.
      • endElement

        public void endElement​(java.lang.String nsURI,
                               java.lang.String localName,
                               java.lang.String qName,
                               StAXContentHandler handler)
                        throws org.xml.sax.SAXException
        Handles basic exit processing.
        Specified by:
        endElement in interface StAXContentHandler
        Overrides:
        endElement in class StAXContentHandlerBase
        Parameters:
        nsURI - the namespace URI of the element
        localName - the local name of the element (without prefix)
        qName - the qualified name of the element (with prefix, if any)
        handler - the handler that processed the element
        Throws:
        org.xml.sax.SAXException - if the end tag cannot be processed