PyConcat package
Submodules
PyConcat.Extractor module
PyConcat.Graphing module
PyConcat.Graphing.buildNetworkGraph(costMatrix)
PyConcat.Graphing.createD3Diagram(costMatrix, outputPath)
PyConcat.HMM module
class PyConcat.HMM.HMM(targetFeatures, corpusFeatures)
Bases: object
A modified HMM, adapted from the teaching materials at http://www.langmead-lab.org/teaching-materials/
viterbi()
Given a sequence of emissions, return the most probable path along with the log2 of its probability. The computation is carried out in the log2 domain.
Returns: the optimal path
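As an illustration of what a log2-domain Viterbi decoder does, here is a minimal sketch for a generic discrete HMM. It is not PyConcat's implementation: the function name viterbi_log2 and its arguments (start_p, trans_p, emit_p) are hypothetical, and the class above builds its model from targetFeatures and corpusFeatures instead. Working with log2 probabilities turns products into sums, which avoids numerical underflow on long sequences.

```python
import math


def _log2(p):
    """log2 that maps a zero probability to -inf instead of raising."""
    return math.log2(p) if p > 0 else float("-inf")


def viterbi_log2(obs, states, start_p, trans_p, emit_p):
    """Return (path, log2 probability) of the most probable state path."""
    # scores[t][s]: best log2 probability of any path ending in state s at time t
    scores = [{s: _log2(start_p[s]) + _log2(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        scores.append({})
        back.append({})
        for s in states:
            # Best predecessor: products of probabilities become sums of logs.
            prev = max(states, key=lambda p: scores[t - 1][p] + _log2(trans_p[p][s]))
            scores[t][s] = (
                scores[t - 1][prev] + _log2(trans_p[prev][s]) + _log2(emit_p[s][obs[t]])
            )
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: scores[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, scores[-1][last]


# Hypothetical toy model with two hidden states and three possible emissions.
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
path, log_prob = viterbi_log2(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
```

Raising 2 to the returned log2 value recovers the ordinary path probability.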
PyConcat.UnitSelection module
PyConcat.UnitSelection.computeDistanceMatrix(matrixA, matrixB)
Compute the distance matrix quickly using cdist.
Parameters:
- matrixA – a 2D matrix
- matrixB – a 2D matrix
Returns: a distance matrix between matrixA and matrixB
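To make the shape of the result concrete, the following pure-Python sketch computes the same kind of pairwise matrix that cdist produces with its default Euclidean metric. The helper name distance_matrix is hypothetical, and the Euclidean metric is an assumption about how PyConcat calls cdist; the real function is presumably much faster on large arrays.

```python
import math


def distance_matrix(matrix_a, matrix_b):
    """Entry [i][j] is the Euclidean distance between row i of A and row j of B."""
    return [[math.dist(a, b) for b in matrix_b] for a in matrix_a]


matrix_a = [[0.0, 0.0], [3.0, 4.0]]
matrix_b = [[0.0, 0.0], [6.0, 8.0]]
dists = distance_matrix(matrix_a, matrix_b)
```

The result has one row per row of matrixA and one column per row of matrixB.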
PyConcat.UnitSelection.computeDistanceWithWeights(targetFeatures, corpusFeatures)
Compute distances using a different set of weights for the target and the corpus. A more flexible way of doing this still needs to be worked out.
Parameters:
- targetFeatures – the target features
- corpusFeatures – the corpus features
Returns: the weighted distance matrices
PyConcat.UnitSelection.fixDistanceMatrix(mat, type='min')
Replace identical positions in the distance matrix with the max or min distance.
Parameters:
- mat – the matrix to fix
- type – 'min' or 'max': which distance to use as the replacement value
Returns: the fixed distance matrix
PyConcat.UnitSelection.kdTree(targetFeatures, corpusFeatures)
k-d tree search; faster than linearSearch.
Parameters:
- targetFeatures – the target features
- corpusFeatures – the corpus features
Returns: the best sequence
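The idea behind a k-d tree lookup can be sketched in pure Python as below, assuming that "best sequence" means the index of the closest corpus unit for each target frame. The names build_kd_tree, nearest and kd_tree_select are hypothetical and not part of PyConcat, which may well rely on a library implementation instead.

```python
import math


def build_kd_tree(items, depth=0):
    """Recursively build a k-d tree over (index, point) pairs."""
    if not items:
        return None
    axis = depth % len(items[0][1])  # cycle through the feature dimensions
    items = sorted(items, key=lambda item: item[1][axis])
    mid = len(items) // 2
    return {
        "item": items[mid],
        "axis": axis,
        "left": build_kd_tree(items[:mid], depth + 1),
        "right": build_kd_tree(items[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return (distance, index) of the stored point closest to query."""
    if node is None:
        return best
    index, point = node["item"]
    dist = math.dist(point, query)
    if best is None or dist < best[0]:
        best = (dist, index)
    diff = query[node["axis"]] - point[node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Only visit the far side if the splitting plane is closer than the best match.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


def kd_tree_select(target_features, corpus_features):
    """Index of the nearest corpus unit for every target frame."""
    tree = build_kd_tree(list(enumerate(corpus_features)))
    return [nearest(tree, target)[1] for target in target_features]


corpus = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [9.0, 9.0]]
target = [[0.9, 1.2], [8.0, 8.0], [0.1, -0.2]]
sequence = kd_tree_select(target, corpus)
```

The speed-up over a linear scan comes from the pruning step, which skips whole subtrees that cannot contain a closer point.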
PyConcat.UnitSelection.linearSearch(targetFeatures, corpusFeatures)
Brute-force linear search, simplified by using cdist to precompute the distance matrices.
Parameters:
- targetFeatures – the target features
- corpusFeatures – the corpus features
Returns: the best sequence found by brute force
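A brute-force search of this kind might look like the sketch below: precompute every target-to-corpus distance, then pick the smallest entry in each row. This assumes the "best sequence" is simply the nearest corpus unit per target frame; the name linear_search is hypothetical, and a pure-Python loop stands in for cdist.

```python
import math


def linear_search(target_features, corpus_features):
    """Return, for each target frame, the index of the closest corpus unit."""
    # Precompute the full target-by-corpus distance matrix.
    matrix = [
        [math.dist(target, unit) for unit in corpus_features]
        for target in target_features
    ]
    # The best match for each target is the column with the smallest distance.
    return [row.index(min(row)) for row in matrix]


corpus = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [9.0, 9.0]]
target = [[0.9, 1.2], [8.0, 8.0], [0.1, -0.2]]
sequence = linear_search(target, corpus)
```

Every target frame is compared against every corpus unit, so the cost grows with the product of the two sizes.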
PyConcat.UnitSelection.normalise(array, method)
Normalise the array using min/max scaling or standardisation (standard deviation).
Parameters:
- array – the array to normalise
- method – whether to normalise (min/max) or standardise
Returns: the normalised array
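The two options can be illustrated on a single feature column as follows. This is a sketch under assumptions: the function name normalise_column is hypothetical, 'MinMax' is taken from the default shown for unitSelection, and 'SD' is an invented label for the standard-deviation option, whose real name is not given here.

```python
import statistics


def normalise_column(values, method="MinMax"):
    """Rescale one feature column with min/max scaling or standardisation."""
    if method == "MinMax":
        low, high = min(values), max(values)
        span = high - low
        # Map the smallest value to 0 and the largest to 1.
        return [(v - low) / span if span else 0.0 for v in values]
    if method == "SD":  # hypothetical label for standardisation
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        # Shift to zero mean and scale to unit standard deviation.
        return [(v - mean) / sd if sd else 0.0 for v in values]
    raise ValueError(f"unknown method: {method}")


scaled = normalise_column([2.0, 4.0, 6.0], "MinMax")
standardised = normalise_column([2.0, 4.0, 6.0], "SD")
```

Either option puts features with different units on a comparable scale before distances are computed.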
PyConcat.UnitSelection.unitSelection(targetFeatures, corpusFeatures, method='kdtree', normalise='MinMax', topK=30)
Optionally normalise the features, then use one of the available methods to return a sequence of indices.
Parameters:
- targetFeatures – the target features
- corpusFeatures – the corpus features
- method – one of linearSearch, kdTree, viterbi, kViterbiExhaustive, kViterbiParallel, kViterbiGraph
- normalise – the normalisation method
- topK – the number of paths to return (if using k-best decoding)
Returns: the sequence path(s)
PyConcat.UnitSelection.viterbiOld(obs, states)
A modified version of the Wikipedia Viterbi implementation, adapted to use costs.
Parameters:
- obs – the target features
- states – the corpus features
Returns: the optimal state sequence
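One way to adapt Viterbi to costs is to minimise a sum of costs instead of maximising a product of probabilities. The sketch below is only an illustration of that idea, not the code behind viterbiOld. It assumes the target cost is the Euclidean distance between a target frame and a corpus unit, and the concatenation cost is the Euclidean distance between consecutive corpus units. Both cost definitions and the name viterbi_costs are hypothetical.

```python
import math


def viterbi_costs(obs, states):
    """Return the sequence of state indices with the lowest total cost."""
    n = len(states)
    # cost[j]: cheapest total cost of any path that ends in corpus unit j
    cost = [math.dist(obs[0], s) for s in states]
    back = []
    for t in range(1, len(obs)):
        new_cost, pointers = [], []
        for j in range(n):
            # Cheapest predecessor, counting the (assumed) concatenation cost.
            i_best = min(range(n), key=lambda i: cost[i] + math.dist(states[i], states[j]))
            pointers.append(i_best)
            new_cost.append(
                cost[i_best]
                + math.dist(states[i_best], states[j])  # concatenation cost
                + math.dist(obs[t], states[j])  # target cost
            )
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest final unit.
    last = min(range(n), key=lambda j: cost[j])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


states = [[0.0], [5.0]]
obs = [[0.1], [0.2], [4.9]]
path = viterbi_costs(obs, states)
```

In this toy example a greedy nearest-unit match would jump to the second unit for the last frame, whereas the cost-based path stays on the first unit because the assumed concatenation cost outweighs the gain.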