Class OpticalDuplicateFinder

    • Field Detail

      • opticalDuplicatePixelDistance

        public int opticalDuplicatePixelDistance
      • DEFAULT_OPTICAL_DUPLICATE_DISTANCE

        public static final int DEFAULT_OPTICAL_DUPLICATE_DISTANCE
        See Also:
        Constant Field Values
      • DEFAULT_BIG_DUPLICATE_SET_SIZE

        public static final int DEFAULT_BIG_DUPLICATE_SET_SIZE
        See Also:
        Constant Field Values
      • DEFAULT_MAX_DUPLICATE_SET_SIZE

        public static final int DEFAULT_MAX_DUPLICATE_SET_SIZE
        See Also:
        Constant Field Values
    • Constructor Detail

      • OpticalDuplicateFinder

        public OpticalDuplicateFinder(String readNameRegex,
                                      int opticalDuplicatePixelDistance,
                                      htsjdk.samtools.util.Log log)
        Parameters:
        readNameRegex - see ReadNameParser.DEFAULT_READ_NAME_REGEX.
        opticalDuplicatePixelDistance - the maximum pixel offset between two reads for them to be considered optical duplicates
        log - the log to which to write messages.
      • OpticalDuplicateFinder

        public OpticalDuplicateFinder(String readNameRegex,
                                      int opticalDuplicatePixelDistance,
                                      long maxDuplicateSetSize,
                                      htsjdk.samtools.util.Log log)
        Parameters:
        readNameRegex - see ReadNameParser.DEFAULT_READ_NAME_REGEX.
        opticalDuplicatePixelDistance - the maximum pixel offset between two reads for them to be considered optical duplicates
        maxDuplicateSetSize - the size of a set that is too big to process
        log - the log to which to write messages.
    • Method Detail

      • setBigDuplicateSetSize

        public void setBigDuplicateSetSize(int bigDuplicateSetSize)
        Sets the size of a set that is big enough to log progress about. Defaults to 1000.
        Parameters:
        bigDuplicateSetSize - the size of a set that is big enough to log progress about
      • setMaxDuplicateSetSize

        public void setMaxDuplicateSetSize(long maxDuplicateSetSize)
        Sets the size of a set that is too big to process. Defaults to 300000.
        Parameters:
        maxDuplicateSetSize - the size of a set that is too big to process
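
The two thresholds above can be read as a three-way decision on the size of a duplicate set. The following is a minimal sketch, not Picard's code: the class, enum, and method names are hypothetical, and whether each boundary is inclusive or exclusive is an assumption. Only the default values 1000 and 300000 come from the descriptions above.

```java
// Hypothetical illustration of the two set-size thresholds; not part of Picard.
final class SetSizeGuardSketch {
    // Defaults as stated in the documentation for the two setters.
    static final int DEFAULT_BIG = 1000;
    static final long DEFAULT_MAX = 300000L;

    enum Action { PROCESS_QUIETLY, PROCESS_WITH_PROGRESS_LOG, SKIP }

    // Assumption: a set strictly larger than max is skipped, and a set of at
    // least big elements is processed with progress logging.
    static Action classify(long setSize, int big, long max) {
        if (setSize > max) {
            return Action.SKIP;
        }
        if (setSize >= big) {
            return Action.PROCESS_WITH_PROGRESS_LOG;
        }
        return Action.PROCESS_QUIETLY;
    }

    public static void main(String[] args) {
        long[] sizes = {10L, 5000L, 400000L};
        for (long size : sizes) {
            System.out.println(size + " -> " + classify(size, DEFAULT_BIG, DEFAULT_MAX));
        }
    }
}
```

Raising the "big" threshold only reduces log output, while raising the "max" threshold changes which sets are examined at all, so the two setters have different consequences for results.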
      • findOpticalDuplicates

        public boolean[] findOpticalDuplicates(List<? extends PhysicalLocation> list,
                                               PhysicalLocation keeper)
        Finds which reads within the list of duplicates are likely to be optical/co-localized duplicates of one another. Within each cluster of optical duplicates found, one read remains unflagged and the rest are flagged as optical duplicates. A read is marked as an optical duplicate by "true" at the same index in the returned boolean[] as the read occupies in the input list of physical locations.
        Parameters:
        list - a list of reads that are determined to be duplicates of one another
        keeper - the single PhysicalLocation being kept as the non-duplicate, which is therefore never annotated as an optical duplicate. May in some cases be null, or a PhysicalLocation not contained within the list.
        Returns:
        a boolean[] of the same length as the incoming list marking which reads are optical duplicates
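
The contract described above (one unflagged read per cluster, the keeper never flagged, flags aligned with input indices) can be illustrated with a small self-contained sketch. This is not Picard's implementation. `Spot` is a hypothetical stand-in for PhysicalLocation, and the closeness rule (same tile, with x and y each within the pixel distance) and the choice of the lowest index as the fallback representative are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for PhysicalLocation; not the Picard interface.
final class Spot {
    final int tile, x, y;
    Spot(int tile, int x, int y) { this.tile = tile; this.x = x; this.y = y; }
}

final class OpticalDuplicateSketch {
    // Assumed closeness rule: same tile, and both coordinates within the distance.
    static boolean close(Spot a, Spot b, int distance) {
        return a.tile == b.tile
                && Math.abs(a.x - b.x) <= distance
                && Math.abs(a.y - b.y) <= distance;
    }

    // Union-find root lookup with path halving.
    static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];
            i = parent[i];
        }
        return i;
    }

    static boolean[] flagOpticalDuplicates(List<Spot> list, Spot keeper, int distance) {
        int n = list.size();
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;

        // Merge every pair of close reads into one cluster.
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (close(list.get(i), list.get(j), distance)) {
                    parent[find(parent, i)] = find(parent, j);
                }
            }
        }

        // Spot does not override equals, so indexOf compares by identity;
        // the result is -1 when keeper is absent from the list.
        int keeperIndex = (keeper == null) ? -1 : list.indexOf(keeper);

        // One representative per cluster: the keeper if present, else the lowest index.
        int[] representative = new int[n];
        Arrays.fill(representative, -1);
        for (int i = 0; i < n; i++) {
            int root = find(parent, i);
            if (representative[root] == -1 || i == keeperIndex) representative[root] = i;
        }

        // Everything in a cluster except its representative is flagged.
        boolean[] flags = new boolean[n];
        for (int i = 0; i < n; i++) {
            flags[i] = representative[find(parent, i)] != i;
        }
        return flags;
    }

    public static void main(String[] args) {
        Spot a = new Spot(1, 100, 100);
        Spot b = new Spot(1, 105, 102);   // close to a
        Spot c = new Spot(1, 5000, 5000); // same tile, far away
        Spot d = new Spot(2, 100, 100);   // different tile
        boolean[] flags = flagOpticalDuplicates(Arrays.asList(a, b, c, d), b, 10);
        System.out.println(Arrays.toString(flags)); // prints [true, false, false, false]
    }
}
```

Passing `null` as the keeper in this sketch leaves `a` unflagged and flags `b` instead, which mirrors the documented allowance for a null or absent keeper.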