Package picard.illumina.parser
Class IlluminaDataProviderFactory
java.lang.Object
    picard.illumina.parser.IlluminaDataProviderFactory
public class IlluminaDataProviderFactory extends Object
IlluminaDataProviderFactory accepts options for parsing Illumina data files for a lane and creates an IlluminaDataProvider, an iterator over the ClusterData for that lane, which uses these options.
Note: Because IlluminaDataProviderFactory tends to be used in multithreaded environments (e.g. IlluminaBasecallsToSam calls makeDataProvider in a different thread per tile), it is essentially immutable. makeDataProvider and getAvailableTiles are idempotent as far as IlluminaDataProviderFactory itself is concerned, although each call to makeDataProvider opens many file handles and other resources. In the future, dataTypes may be passed to the makeDataProvider factory methods so that client code does not repeat configuration for the same basecallDirectory.
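The thread-per-tile pattern mentioned in the note can be sketched roughly as follows. This is a minimal, self-contained illustration and not Picard code: the `makeDataProvider` function argument is a hypothetical stand-in for a call such as `factory.makeDataProvider(tile)`, and `Iterator<String>` stands in for a provider that iterates over ClusterData.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class PerTileSketch {
    /**
     * Counts clusters per tile, creating one provider per tile inside a worker thread.
     * makeDataProvider is a hypothetical stand-in for factory.makeDataProvider(tile).
     */
    static Map<Integer, Integer> countClustersPerTile(List<Integer> tiles,
                                                      Function<Integer, Iterator<String>> makeDataProvider,
                                                      int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per tile; each task builds its own provider.
            Map<Integer, Future<Integer>> pending = new LinkedHashMap<>();
            for (Integer tile : tiles) {
                pending.put(tile, pool.submit(() -> {
                    Iterator<String> clusters = makeDataProvider.apply(tile);
                    int n = 0;
                    while (clusters.hasNext()) {
                        clusters.next();
                        n++;
                    }
                    return n;
                }));
            }
            // Collect results in the order the tiles were requested.
            Map<Integer, Integer> counts = new LinkedHashMap<>();
            for (Map.Entry<Integer, Future<Integer>> e : pending.entrySet()) {
                try {
                    counts.put(e.getKey(), e.getValue().get());
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(ex);
                } catch (ExecutionException ex) {
                    throw new IllegalStateException(ex.getCause());
                }
            }
            return counts;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the shared factory is only read from in this pattern, the worker threads need no extra synchronization around it; each thread owns the provider it created.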
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
protected Map<IlluminaFileUtil.SupportedIlluminaFormat,Set<IlluminaDataType>> formatToDataTypes
    A Map of file formats to the dataTypes they will provide for this run.
-
Constructor Summary
Constructors (Constructor, Description):
IlluminaDataProviderFactory(File basecallDirectory, int lane, ReadStructure readStructure, BclQualityEvaluationStrategy bclQualityEvaluationStrategy, Set<IlluminaDataType> dataTypes)
    Create factory with the specified options, one that favors using QSeqs over all other files.
IlluminaDataProviderFactory(File basecallDirectory, File barcodesDirectory, int lane, ReadStructure readStructure, BclQualityEvaluationStrategy bclQualityEvaluationStrategy, Set<IlluminaDataType> dataTypes)
    Create factory with the specified options, one that favors using QSeqs over all other files.
-
Method Summary
Methods (Modifier and Type, Method, Description):
static Map<IlluminaFileUtil.SupportedIlluminaFormat,Set<IlluminaDataType>> determineFormats(Set<IlluminaDataType> requestedDataTypes, IlluminaFileUtil fileUtil)
    For all requestedDataTypes, return a map of file format to set of provided data types that covers as many requestedDataTypes as possible and chooses the most preferred available formats possible.
static IlluminaFileUtil.SupportedIlluminaFormat findPreferredFormat(IlluminaDataType dt, IlluminaFileUtil fileUtil)
    Given a data type, find the most preferred file format even if files are not available.
static Set<IlluminaDataType> findUnmatchedTypes(Set<IlluminaDataType> requestedDataTypes, Map<IlluminaFileUtil.SupportedIlluminaFormat,Set<IlluminaDataType>> formatToMatchedTypes)
    Given a map of formats to the data types they provide, find any requested data types that do not have a format associated with them and return them.
List<Integer> getAvailableTiles()
    Return the list of tiles available for this flowcell and lane.
ReadStructure getOutputReadStructure()
    Sometimes (in the case of skipped reads) the logical read structure of the output cluster data is different from the input readStructure.
BaseIlluminaDataProvider makeDataProvider()
BaseIlluminaDataProvider makeDataProvider(Integer requestedTile)
BaseIlluminaDataProvider makeDataProvider(List<Integer> requestedTiles)
    Call this method to create a ClusterData iterator over the specified tiles.
void setApplyEamssFiltering(boolean applyEamssFiltering)
    Sets whether or not EAMSS filtering will be applied if parsing BCL files for bases and quality scores.
-
-
-
Field Detail
-
formatToDataTypes
protected final Map<IlluminaFileUtil.SupportedIlluminaFormat,Set<IlluminaDataType>> formatToDataTypes
A Map of file formats to the dataTypes they will provide for this run.
-
-
Constructor Detail
-
IlluminaDataProviderFactory
public IlluminaDataProviderFactory(File basecallDirectory, int lane, ReadStructure readStructure, BclQualityEvaluationStrategy bclQualityEvaluationStrategy, Set<IlluminaDataType> dataTypes)
Create factory with the specified options, one that favors using QSeqs over all other files.
Parameters:
basecallDirectory - The baseCalls directory of a complete Illumina directory. Files are found by searching relative to this folder (some of them higher up in the directory tree).
lane - Which lane to iterate over.
readStructure - The read structure to which output clusters will conform. When not using QSeqs, EAMSS masking (see BclParser) is run on individual reads as found in the readStructure. If the readStructure specified does not match the readStructure implied by the sequencer's output, then the quality scores output may differ from what would be found in a run's QSeq files.
dataTypes - Which data types to read.
-
IlluminaDataProviderFactory
public IlluminaDataProviderFactory(File basecallDirectory, File barcodesDirectory, int lane, ReadStructure readStructure, BclQualityEvaluationStrategy bclQualityEvaluationStrategy, Set<IlluminaDataType> dataTypes)
Create factory with the specified options, one that favors using QSeqs over all other files.
Parameters:
basecallDirectory - The baseCalls directory of a complete Illumina directory. Files are found by searching relative to this folder (some of them higher up in the directory tree).
barcodesDirectory - The directory with barcode files extracted by 'ExtractIlluminaBarcodes'. If null, basecallDirectory is used instead.
lane - Which lane to iterate over.
readStructure - The read structure to which output clusters will conform. When not using QSeqs, EAMSS masking (see BclParser) is run on individual reads as found in the readStructure. If the readStructure specified does not match the readStructure implied by the sequencer's output, then the quality scores output may differ from what would be found in a run's QSeq files.
bclQualityEvaluationStrategy - The basecall quality evaluation strategy that is applied to decoded base calls.
dataTypes - Which data types to read.
-
-
Method Detail
-
getOutputReadStructure
public ReadStructure getOutputReadStructure()
Sometimes (in the case of skipped reads) the logical read structure of the output cluster data is different from the input readStructure.
Returns:
The ReadStructure describing the output cluster data.
-
getAvailableTiles
public List<Integer> getAvailableTiles()
Return the list of tiles available for this flowcell and lane. These are in ascending numerical order.
Returns:
List of all tiles available for this flowcell and lane.
-
setApplyEamssFiltering
public void setApplyEamssFiltering(boolean applyEamssFiltering)
Sets whether or not EAMSS filtering will be applied if parsing BCL files for bases and quality scores.
-
makeDataProvider
public BaseIlluminaDataProvider makeDataProvider()
-
makeDataProvider
public BaseIlluminaDataProvider makeDataProvider(Integer requestedTile)
-
makeDataProvider
public BaseIlluminaDataProvider makeDataProvider(List<Integer> requestedTiles)
Call this method to create a ClusterData iterator over the specified tiles.
Returns:
An iterator for reading the Illumina basecall output for the lane specified in the constructor.
-
findUnmatchedTypes
public static Set<IlluminaDataType> findUnmatchedTypes(Set<IlluminaDataType> requestedDataTypes, Map<IlluminaFileUtil.SupportedIlluminaFormat,Set<IlluminaDataType>> formatToMatchedTypes)
Given a map of formats to the data types they provide, find any requested data types that do not have a format associated with them and return them.
Parameters:
requestedDataTypes - Data types that need to be provided.
formatToMatchedTypes - A map of file formats to the data types they will provide.
Returns:
The data types that go unsupported by the formats found in formatToMatchedTypes.
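The behavior described above amounts to a set difference: start from the requested types and remove everything that some format provides. The sketch below illustrates that idea only; the `DataType` and `Format` enums are hypothetical stand-ins for IlluminaDataType and IlluminaFileUtil.SupportedIlluminaFormat, and this is not necessarily how Picard implements the method.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

class UnmatchedTypesSketch {
    // Hypothetical stand-ins for Picard's enums; names are illustrative only.
    enum DataType { BASE_CALLS, QUALITY_SCORES, POSITION, PF, BARCODES }
    enum Format { BCL, LOCS, FILTER, BARCODE }

    /** Returns the requested types that no format in formatToMatchedTypes provides. */
    static Set<DataType> findUnmatchedTypes(Set<DataType> requestedDataTypes,
                                            Map<Format, Set<DataType>> formatToMatchedTypes) {
        // Copy first so the caller's set is never modified.
        Set<DataType> unmatched = EnumSet.noneOf(DataType.class);
        unmatched.addAll(requestedDataTypes);
        // Remove every type that at least one format covers.
        for (Set<DataType> provided : formatToMatchedTypes.values()) {
            unmatched.removeAll(provided);
        }
        return unmatched;
    }
}
```

An empty result means every requested type is covered; a caller could treat a non-empty result as a configuration error and report the missing types.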
-
determineFormats
public static Map<IlluminaFileUtil.SupportedIlluminaFormat,Set<IlluminaDataType>> determineFormats(Set<IlluminaDataType> requestedDataTypes, IlluminaFileUtil fileUtil)
For all requestedDataTypes, return a map of file format to set of provided data types that covers as many requestedDataTypes as possible and chooses the most preferred available formats possible.
Parameters:
requestedDataTypes - Data types to be provided.
fileUtil - A file util for the lane/directory we wish to provide data for.
Returns:
A Map of file format to the set of requested data types that format will provide.
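One way to read this description is as a per-type lookup: for each requested data type, walk its formats in preference order and take the first one that is available, grouping the results by format. The sketch below shows that reading under stated assumptions. The enums, the `preference` map, and the `available` set are all hypothetical; in Picard, availability and preference come from IlluminaFileUtil, whose logic is not reproduced here.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DetermineFormatsSketch {
    // Hypothetical stand-ins for Picard's enums; names are illustrative only.
    enum DataType { BASE_CALLS, QUALITY_SCORES, POSITION }
    enum Format { BCL, QSEQ, LOCS }

    /**
     * Maps each available format to the requested types it will provide.
     * preference lists candidate formats per type, most preferred first (assumed input).
     * Types with no available format are left out of the result.
     */
    static Map<Format, Set<DataType>> determineFormats(Set<DataType> requestedDataTypes,
                                                       Map<DataType, List<Format>> preference,
                                                       Set<Format> available) {
        Map<Format, Set<DataType>> result = new EnumMap<>(Format.class);
        for (DataType dt : requestedDataTypes) {
            List<Format> candidates = preference.getOrDefault(dt, Collections.<Format>emptyList());
            for (Format f : candidates) {
                if (available.contains(f)) {
                    // First available format in preference order wins for this type.
                    result.computeIfAbsent(f, k -> EnumSet.noneOf(DataType.class)).add(dt);
                    break;
                }
            }
        }
        return result;
    }
}
```

Any requested type missing from the returned map's values is one that no available format covers, which is the gap findUnmatchedTypes is described as reporting.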
-
findPreferredFormat
public static IlluminaFileUtil.SupportedIlluminaFormat findPreferredFormat(IlluminaDataType dt, IlluminaFileUtil fileUtil)
Given a data type, find the most preferred file format even if files are not available.
Parameters:
dt - Type of desired data.
fileUtil - Util for the lane/directory in which we will find data.
Returns:
The file format that is "most preferred" (i.e. fastest to parse/smallest in memory).
-
-