Class CollectIndependentReplicateMetrics
- java.lang.Object
  - picard.cmdline.CommandLineProgram
    - picard.analysis.replicates.CollectIndependentReplicateMetrics
@DocumentedFeature
@ExperimentalFeature
public class CollectIndependentReplicateMetrics extends CommandLineProgram
A CLP that, given a BAM and a VCF with genotypes of the same sample, estimates the rate of independent replication of reads within the BAM. That is, it estimates the fraction of reads that look like duplicates (in the MarkDuplicates sense of the word) but are actually independent observations of the data. In the presence of Unique Molecular Identifiers (UMIs), various metrics are collected regarding the utility of the UMIs for the purpose of increasing coverage.
The estimation is based on duplicate sets of size 2 and 3 and gives a separate estimate from each. The assumption is that the duplication rate (biological or otherwise) is independent of the duplicate-set size. A significant difference between the two rates may indicate that this assumption is incorrect.
The duplicate sets are found using the mate-cigar tag (MC), which is added by MergeBamAlignment or FixMateInformation. This program will not work without the MC tag.
An explanation of the calculation behind the estimation can be found in the IndependentReplicateMetric class.
The calculation assumes a diploid organism (more accurately, it assumes that only two alleles can appear at a HET site and that these two alleles appear with equal probability). It requires as input a VCF with genotypes for the sample in question.
NOTE: This class is very much in alpha stage and still under heavy development (feel free to join!)
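The two-allele assumption above suggests a simple estimate for duplicate sets of size 2, sketched below. This is an illustration derived only from that assumption, not the calculation documented in IndependentReplicateMetric, which may differ. The class name, method name, and parameters are hypothetical, and sequencing errors are ignored. If two reads in a set are independent observations at a HET site, they show different alleles half the time; true duplicates always show the same allele. The independent fraction is therefore roughly twice the fraction of sets whose reads disagree.

```java
// Hypothetical sketch: not part of Picard. Names are illustrative only.
final class DoubletonEstimateSketch {

    /**
     * Estimates the fraction of size-2 duplicate sets that are independent
     * observations, under the assumption that both alleles at a HET site
     * are equally likely and that sequencing errors can be ignored.
     *
     * @param sameAllele      sets in which both reads show the same allele
     * @param differentAllele sets in which the two reads show different alleles
     * @return the estimated independent fraction, or NaN if no sets were seen
     */
    public static double independentFraction(long sameAllele, long differentAllele) {
        long total = sameAllele + differentAllele;
        if (total == 0) {
            return Double.NaN; // nothing observed, so no estimate
        }
        // Fraction of sets whose two reads disagree at the HET site.
        double heterogeneity = (double) differentAllele / total;
        // Independent pairs disagree with probability 1/2, duplicates never do,
        // so heterogeneity is about half the independent fraction.
        // The clamp to 1.0 is a choice made for this sketch.
        return Math.min(1.0, 2.0 * heterogeneity);
    }

    public static void main(String[] args) {
        // 50 disagreeing sets out of 1000 observed.
        System.out.println(independentFraction(950, 50));
    }
}
```

A comparable estimate from sets of size 3 needs more cases (three independent reads, two independent reads plus a duplicate, or three duplicates) and is not sketched here.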
-
-
Field Summary
Fields
Modifier and Type    Field
String               BARCODE_BQ
String               BARCODE_TAG
boolean              FILTER_UNPAIRED_READS
File                 INPUT
File                 MATRIX_OUTPUT
Integer              MINIMUM_BARCODE_BQ
Integer              MINIMUM_BQ
Integer              MINIMUM_GQ
Integer              MINIMUM_MQ
File                 OUTPUT
int                  PROGRESS_STEP_INTERVAL
String               SAMPLE
Integer              STOP_AFTER
File                 VCF
-
Fields inherited from class picard.cmdline.CommandLineProgram
COMPRESSION_LEVEL, CREATE_INDEX, CREATE_MD5_FILE, GA4GH_CLIENT_SECRETS, MAX_ALLOWABLE_ONE_LINE_SUMMARY_LENGTH, MAX_RECORDS_IN_RAM, QUIET, REFERENCE_SEQUENCE, referenceSequence, specialArgumentsCollection, SYNTAX_TRANSITION_URL, TMP_DIR, USE_JDK_DEFLATER, USE_JDK_INFLATER, VALIDATION_STRINGENCY, VERBOSITY
-
-
Constructor Summary
Constructors
Constructor                             Description
CollectIndependentReplicateMetrics()
-
Method Summary
Modifier and Type    Method      Description
protected int        doWork()    Do the work after command line has been parsed.
-
Methods inherited from class picard.cmdline.CommandLineProgram
checkRInstallation, customCommandLineValidation, getCommandLine, getCommandLineParser, getCommandLineParserForArgs, getDefaultHeaders, getFaqLink, getMetricsFile, getPGRecord, getStandardUsagePreamble, getStandardUsagePreamble, getVersion, hasWebDocumentation, instanceMain, instanceMainWithExit, makeReferenceArgumentCollection, parseArgs, requiresReference, setDefaultHeaders, useLegacyParser
-
-
-
-
Field Detail
-
INPUT
@Argument(shortName="I", doc="Input (indexed) BAM/CRAM file.") public File INPUT
-
OUTPUT
@Argument(shortName="O", doc="Write metrics to this file") public File OUTPUT
-
MATRIX_OUTPUT
@Argument(shortName="MO", doc="Write the confusion matrix (of UMIs) to this file", optional=true) public File MATRIX_OUTPUT
-
VCF
@Argument(shortName="V", doc="Input VCF file") public File VCF
-
MINIMUM_GQ
@Argument(shortName="GQ", doc="minimal value for the GQ field in the VCF to use variant site.", optional=true) public Integer MINIMUM_GQ
-
MINIMUM_MQ
@Argument(shortName="MQ", doc="minimal value for the mapping quality of the reads to be used in the estimation.", optional=true) public Integer MINIMUM_MQ
-
MINIMUM_BQ
@Argument(shortName="BQ", doc="minimal value for the base quality of a base to be used in the estimation.", optional=true) public Integer MINIMUM_BQ
-
SAMPLE
@Argument(shortName="ALIAS", doc="Name of sample to look at in VCF. Can be omitted if VCF contains only one sample.", optional=true) public String SAMPLE
-
STOP_AFTER
@Argument(doc="Number of sets to examine before stopping.", optional=true) public Integer STOP_AFTER
-
BARCODE_TAG
@Argument(doc="Barcode SAM tag.", optional=true) public String BARCODE_TAG
-
BARCODE_BQ
@Argument(doc="Barcode Quality SAM tag.", optional=true) public String BARCODE_BQ
-
MINIMUM_BARCODE_BQ
@Argument(shortName="MBQ", doc="minimal value for the base quality of all the bases in a molecular barcode, for it to be used.", optional=true) public Integer MINIMUM_BARCODE_BQ
-
FILTER_UNPAIRED_READS
@Argument(shortName="FUR", doc="Whether to filter unpaired reads from the input.", optional=true) public boolean FILTER_UNPAIRED_READS
-
PROGRESS_STEP_INTERVAL
@Argument(fullName="PROGRESS_STEP_INTERVAL", doc="The interval between which progress will be displayed.", optional=true) public int PROGRESS_STEP_INTERVAL
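As a rough illustration of how the arguments above fit together, the sketch below assembles a command line for this tool. It assumes Picard's legacy KEY=VALUE argument syntax; newer Picard releases also accept dash-prefixed arguments. The class name, the method name, and all file names are hypothetical placeholders. The sketch only builds the argument list and does not launch a process.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of Picard. It only assembles arguments.
final class ReplicateMetricsCommandSketch {

    /**
     * Builds a command line using the required INPUT, VCF and OUTPUT
     * arguments, plus SAMPLE when the VCF holds more than one sample.
     *
     * @param picardJar  path to the Picard jar (placeholder)
     * @param bam        indexed BAM/CRAM file whose reads carry the MC tag
     * @param vcf        VCF with genotypes for the same sample
     * @param metricsOut file to write the metrics to
     * @param sample     sample name in the VCF, or null to omit SAMPLE
     */
    public static List<String> buildCommand(String picardJar, String bam, String vcf,
                                            String metricsOut, String sample) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(picardJar);
        cmd.add("CollectIndependentReplicateMetrics");
        cmd.add("INPUT=" + bam);
        cmd.add("VCF=" + vcf);
        cmd.add("OUTPUT=" + metricsOut);
        if (sample != null) {
            // SAMPLE is optional when the VCF contains only one sample.
            cmd.add("SAMPLE=" + sample);
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand("picard.jar", "sample.bam", "sample.vcf",
                                        "metrics.txt", null);
        System.out.println(String.join(" ", cmd));
    }
}
```

The returned list can be handed to java.lang.ProcessBuilder if the tool should actually be run. Optional thresholds such as MINIMUM_GQ, MINIMUM_MQ and MINIMUM_BQ would be appended in the same KEY=VALUE form.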
-
-
Method Detail
-
doWork
protected int doWork()
Description copied from class: CommandLineProgram
Do the work after command line has been parsed. RuntimeExceptions may be thrown by this method and are reported appropriately.
- Specified by:
doWork in class CommandLineProgram
- Returns:
program exit status.
-
-