Package picard.illumina
Class ExtractBarcodesProgram
java.lang.Object
  picard.cmdline.CommandLineProgram
    picard.illumina.ExtractBarcodesProgram
Direct Known Subclasses:
ExtractIlluminaBarcodes, IlluminaBasecallsToFastq, IlluminaBasecallsToSam
public abstract class ExtractBarcodesProgram extends CommandLineProgram
-
-
Field Summary
Fields
Modifier and Type, Field - Description
static String BARCODE_COLUMN - Column header for the first barcode sequence (preferred).
static String BARCODE_NAME_COLUMN - Column header for the barcode name.
static Set<String> BARCODE_PREFIXES
static String BARCODE_SEQUENCE_COLUMN
protected Map<String,BarcodeMetric> barcodeToMetrics
File BASECALLS_DIR
protected BclQualityEvaluationStrategy bclQualityEvaluationStrategy
boolean COMPRESS_OUTPUTS
DistanceMetric DISTANCE_MODE
File INPUT_PARAMS_FILE
protected ReadStructure inputReadStructure - The read structure of the actual Illumina Run, i.e. the readStructure of the input data
List<Integer> LANE
static String LIBRARY_NAME_COLUMN - Column header for the library name.
int MAX_MISMATCHES
int MAX_NO_CALLS
File METRICS_FILE
int MIN_MISMATCH_DELTA
int MINIMUM_BASE_QUALITY
int MINIMUM_QUALITY
protected BarcodeMetric noMatchMetric
String READ_STRUCTURE
-
Fields inherited from class picard.cmdline.CommandLineProgram
COMPRESSION_LEVEL, CREATE_INDEX, CREATE_MD5_FILE, GA4GH_CLIENT_SECRETS, MAX_ALLOWABLE_ONE_LINE_SUMMARY_LENGTH, MAX_RECORDS_IN_RAM, QUIET, REFERENCE_SEQUENCE, referenceSequence, specialArgumentsCollection, SYNTAX_TRANSITION_URL, TMP_DIR, USE_JDK_DEFLATER, USE_JDK_INFLATER, VALIDATION_STRINGENCY, VERBOSITY
-
-
Constructor Summary
Constructors
Constructor - Description
ExtractBarcodesProgram()
-
Method Summary
Modifier and Type, Method - Description
protected String[] collectErrorMessages(List<String> messages, String[] superErrors)
protected BarcodeExtractor createBarcodeExtractor()
protected String[] customCommandLineValidation() - Parses all barcodes from input files and validates that all barcodes are the same length and unique.
static void finalizeMetrics(Map<String,BarcodeMetric> barcodeToMetrics, BarcodeMetric noMatchMetric)
protected void outputMetrics()
protected static htsjdk.samtools.util.Tuple<Map<String,BarcodeMetric>,List<String>> parseInputFile(File inputFile, ReadStructure readStructure) - Parses any one of the following types of files: ExtractIlluminaBarcodes BARCODE_FILE, IlluminaBasecallsToFastq MULTIPLEX_PARAMS, IlluminaBasecallsToSam LIBRARY_PARAMS. This validates the file format and populates a Map of barcodes to metrics.
-
Methods inherited from class picard.cmdline.CommandLineProgram
checkRInstallation, doWork, getCommandLine, getCommandLineParser, getCommandLineParserForArgs, getDefaultHeaders, getFaqLink, getMetricsFile, getPGRecord, getStandardUsagePreamble, getStandardUsagePreamble, getVersion, hasWebDocumentation, instanceMain, instanceMainWithExit, makeReferenceArgumentCollection, parseArgs, requiresReference, setDefaultHeaders, useLegacyParser
-
Field Detail
-
DISTANCE_MODE
@Argument(doc="The distance metric that should be used to compare the barcode-reads and the provided barcodes for finding the best and second-best assignments.")
public DistanceMetric DISTANCE_MODE
-
MAX_MISMATCHES
@Argument(doc="Maximum mismatches for a barcode to be considered a match.")
public int MAX_MISMATCHES
-
MIN_MISMATCH_DELTA
@Argument(doc="Minimum difference between number of mismatches in the best and second best barcodes for a barcode to be considered a match.")
public int MIN_MISMATCH_DELTA
-
MAX_NO_CALLS
@Argument(doc="Maximum allowable number of no-calls in a barcode read before it is considered unmatchable.")
public int MAX_NO_CALLS
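The three thresholds above (MAX_MISMATCHES, MIN_MISMATCH_DELTA, MAX_NO_CALLS) can be read together as one match decision. The following is a minimal sketch of that reading, not Picard's implementation: the class `BarcodeMatchSketch`, its method names, the choice to exclude no-call positions from the mismatch count, and the threshold values in `main` are all assumptions made for illustration. It also assumes the read is at least as long as each barcode.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of the three matching thresholds; not part of Picard. */
final class BarcodeMatchSketch {

    /** Assumption: 'N' and '.' denote a no-call. */
    static boolean isNoCall(char base) {
        return base == 'N' || base == 'n' || base == '.';
    }

    static int countNoCalls(String read) {
        int noCalls = 0;
        for (char c : read.toCharArray()) {
            if (isNoCall(c)) noCalls++;
        }
        return noCalls;
    }

    /** Counts differing positions; no-calls are skipped here (an assumption). */
    static int countMismatches(String read, String barcode) {
        int mismatches = 0;
        for (int i = 0; i < barcode.length(); i++) {
            char r = read.charAt(i);
            if (!isNoCall(r)
                    && Character.toUpperCase(r) != Character.toUpperCase(barcode.charAt(i))) {
                mismatches++;
            }
        }
        return mismatches;
    }

    /** Returns the matched barcode, or null when any threshold is violated. */
    static String bestMatch(String read, List<String> barcodes,
                            int maxMismatches, int minMismatchDelta, int maxNoCalls) {
        if (countNoCalls(read) > maxNoCalls) return null; // too many no-calls
        String best = null;
        int bestMm = Integer.MAX_VALUE;
        int secondMm = Integer.MAX_VALUE;
        for (String barcode : barcodes) {
            int mm = countMismatches(read, barcode);
            if (mm < bestMm) {
                secondMm = bestMm;
                bestMm = mm;
                best = barcode;
            } else if (mm < secondMm) {
                secondMm = mm;
            }
        }
        if (best == null) return null; // no barcodes supplied
        boolean closeEnough = bestMm <= maxMismatches;
        boolean unambiguous = secondMm - bestMm >= minMismatchDelta;
        return closeEnough && unambiguous ? best : null;
    }

    public static void main(String[] args) {
        List<String> barcodes = Arrays.asList("ACGTACGT", "TTTTAAAA");
        // Illustrative thresholds only: 1 mismatch, delta of 1, 2 no-calls.
        System.out.println(bestMatch("ACGTACGA", barcodes, 1, 1, 2));
    }
}
```

When two barcodes are equally close to the read, the delta between best and second best is zero, so the read is left unassigned for any positive MIN_MISMATCH_DELTA.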
-
MINIMUM_BASE_QUALITY
@Argument(shortName="Q", doc="Minimum base quality. Any barcode bases falling below this quality will be considered a mismatch even if the bases match.")
public int MINIMUM_BASE_QUALITY
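The rule described for MINIMUM_BASE_QUALITY can be sketched as a mismatch counter that also penalizes low-quality bases. This is a hedged illustration only: `QualityAwareMismatchSketch` and its method are hypothetical names, and the sketch assumes bases and qualities arrive as parallel byte arrays at least as long as the barcode.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of quality-aware mismatch counting; not part of Picard. */
final class QualityAwareMismatchSketch {

    /**
     * A position counts as a mismatch when the bases differ, or when the
     * read base quality is below minimumBaseQuality even if the bases match.
     */
    static int countMismatches(byte[] readBases, byte[] readQuals,
                               byte[] barcodeBases, int minimumBaseQuality) {
        int mismatches = 0;
        for (int i = 0; i < barcodeBases.length; i++) {
            boolean basesDiffer = readBases[i] != barcodeBases[i];
            boolean lowQuality = readQuals[i] < minimumBaseQuality;
            if (basesDiffer || lowQuality) mismatches++;
        }
        return mismatches;
    }

    public static void main(String[] args) {
        byte[] read = "ACGT".getBytes(StandardCharsets.US_ASCII);
        byte[] barcode = "ACGT".getBytes(StandardCharsets.US_ASCII);
        byte[] quals = {30, 30, 5, 30}; // third base has low quality
        System.out.println(countMismatches(read, quals, barcode, 10));
    }
}
```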
-
MINIMUM_QUALITY
@Argument(doc="The minimum quality (after transforming 0s to 1s) expected from reads. If qualities are lower than this value, an error is thrown. The default of 2 is what the Illumina\'s spec describes as the minimum, but in practice the value has been observed lower.")
public int MINIMUM_QUALITY
-
LANE
@Argument(doc="Lane number. This can be specified multiple times. Reads with the same index in multiple lanes will be added to the same output file.", shortName="L")
public List<Integer> LANE
-
READ_STRUCTURE
@Argument(doc="A description of the logical structure of clusters in an Illumina Run, i.e. a description of the structure IlluminaBasecallsToSam assumes the data to be in. It should consist of integer/character pairs describing the number of cycles and the type of those cycles (B for Sample Barcode, M for molecular barcode, T for Template, and S for skip). E.g. If the input data consists of 80 base clusters and we provide a read structure of \"28T8M8B8S28T\" then the sequence may be split up into four reads:\n* read one with 28 cycles (bases) of template\n* read two with 8 cycles (bases) of molecular barcode (ex. unique molecular barcode)\n* read three with 8 cycles (bases) of sample barcode\n* 8 cycles (bases) skipped.\n* read four with 28 cycles (bases) of template\nThe skipped cycles would NOT be included in an output SAM/BAM file or in read groups therein.", shortName="RS")
public String READ_STRUCTURE
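To make the integer/character pair format concrete, here is a small sketch that splits a read structure string such as "28T8M8B8S28T" into its segments and sums the cycles. It is a standalone illustration, not Picard's ReadStructure class: `ReadStructureSketch`, `Segment`, `parse` and `totalCycles` are hypothetical names, and the accepted type letters (T, B, M, S) are taken from the argument description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for read structure strings; not Picard's ReadStructure. */
final class ReadStructureSketch {

    /** One integer/character pair: a cycle count and a cycle type. */
    static final class Segment {
        final int cycles;
        final char type; // T = template, B = sample barcode, M = molecular barcode, S = skip

        Segment(int cycles, char type) {
            this.cycles = cycles;
            this.type = type;
        }
    }

    private static final Pattern PAIR = Pattern.compile("(\\d+)([TBMS])");

    static List<Segment> parse(String readStructure) {
        List<Segment> segments = new ArrayList<>();
        Matcher m = PAIR.matcher(readStructure);
        int end = 0;
        while (m.find()) {
            // Each pair must start where the previous one ended.
            if (m.start() != end) {
                throw new IllegalArgumentException("Unexpected text at offset " + end);
            }
            segments.add(new Segment(Integer.parseInt(m.group(1)), m.group(2).charAt(0)));
            end = m.end();
        }
        if (segments.isEmpty() || end != readStructure.length()) {
            throw new IllegalArgumentException("Malformed read structure: " + readStructure);
        }
        return segments;
    }

    static int totalCycles(List<Segment> segments) {
        int total = 0;
        for (Segment s : segments) total += s.cycles;
        return total;
    }

    public static void main(String[] args) {
        List<Segment> segments = parse("28T8M8B8S28T");
        for (Segment s : segments) {
            System.out.println(s.cycles + " cycles of type " + s.type);
        }
        System.out.println("total cycles: " + totalCycles(segments));
    }
}
```

For the example string, the five segments add up to the 80 cycles mentioned in the description (28 + 8 + 8 + 8 + 28).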
-
COMPRESS_OUTPUTS
@Argument(shortName="GZIP", doc="Compress output FASTQ files using gzip and append a .gz extension to the file names.")
public boolean COMPRESS_OUTPUTS
-
BASECALLS_DIR
@Argument(doc="The Illumina basecalls directory. ", shortName="B")
public File BASECALLS_DIR
-
METRICS_FILE
@Argument(doc="Per-barcode and per-lane metrics written to this file.", shortName="M", optional=true)
public File METRICS_FILE
-
INPUT_PARAMS_FILE
@Argument(doc="The input file that defines parameters for the program. This is the BARCODE_FILE for `ExtractIlluminaBarcodes` or the MULTIPLEX_PARAMS or LIBRARY_PARAMS file for `IlluminaBasecallsToFastq` or `IlluminaBasecallsToSam`", optional=true)
public File INPUT_PARAMS_FILE
-
BARCODE_COLUMN
public static final String BARCODE_COLUMN
Column header for the first barcode sequence (preferred).
- See Also:
- Constant Field Values
-
BARCODE_SEQUENCE_COLUMN
public static final String BARCODE_SEQUENCE_COLUMN
- See Also:
- Constant Field Values
-
BARCODE_NAME_COLUMN
public static final String BARCODE_NAME_COLUMN
Column header for the barcode name.
- See Also:
- Constant Field Values
-
LIBRARY_NAME_COLUMN
public static final String LIBRARY_NAME_COLUMN
Column header for the library name.
- See Also:
- Constant Field Values
-
barcodeToMetrics
protected Map<String,BarcodeMetric> barcodeToMetrics
-
bclQualityEvaluationStrategy
protected final BclQualityEvaluationStrategy bclQualityEvaluationStrategy
-
noMatchMetric
protected BarcodeMetric noMatchMetric
-
inputReadStructure
protected ReadStructure inputReadStructure
The read structure of the actual Illumina Run, i.e. the readStructure of the input data
-
-
Method Detail
-
createBarcodeExtractor
protected BarcodeExtractor createBarcodeExtractor()
-
customCommandLineValidation
protected String[] customCommandLineValidation()
Parses all barcodes from input files and validates that all barcodes are the same length and unique.
- Overrides:
customCommandLineValidation in class CommandLineProgram
- Returns:
- null if the command line is valid. If the command line is invalid, returns an array of error messages to be written to the appropriate place.
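The contract above (null when valid, otherwise an array of error messages) combined with the two checks named in the description can be sketched as follows. This is an illustration under stated assumptions, not the method's actual body: `BarcodeValidationSketch` and `validate` are hypothetical names, barcodes are assumed to be already parsed into a list of strings, and the message wording is invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical same-length and uniqueness check; not part of Picard. */
final class BarcodeValidationSketch {

    /** Returns null when all barcodes are valid, otherwise the error messages. */
    static String[] validate(List<String> barcodes) {
        List<String> errors = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        // The first barcode sets the expected length for all others.
        int expectedLength = barcodes.isEmpty() ? 0 : barcodes.get(0).length();
        for (String barcode : barcodes) {
            if (barcode.length() != expectedLength) {
                errors.add("Barcode " + barcode + " has length " + barcode.length()
                        + ", expected " + expectedLength);
            }
            if (!seen.add(barcode)) {
                errors.add("Barcode " + barcode + " is specified more than once");
            }
        }
        return errors.isEmpty() ? null : errors.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] errors = validate(Arrays.asList("ACGT", "ACG", "ACGT"));
        if (errors != null) {
            for (String e : errors) System.out.println(e);
        }
    }
}
```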
-
collectErrorMessages
protected String[] collectErrorMessages(List<String> messages, String[] superErrors)
-
outputMetrics
protected void outputMetrics()
-
finalizeMetrics
public static void finalizeMetrics(Map<String,BarcodeMetric> barcodeToMetrics, BarcodeMetric noMatchMetric)
-
parseInputFile
protected static htsjdk.samtools.util.Tuple<Map<String,BarcodeMetric>,List<String>> parseInputFile(File inputFile, ReadStructure readStructure)
Parses any one of the following types of files:
- ExtractIlluminaBarcodes BARCODE_FILE
- IlluminaBasecallsToFastq MULTIPLEX_PARAMS
- IlluminaBasecallsToSam LIBRARY_PARAMS
This validates the file format and populates a Map of barcodes to metrics.
- Parameters:
inputFile - The input file that is being parsed
readStructure - The read structure for the reads of the run
-
-