PyConcat package

Submodules

PyConcat.Extractor module

PyConcat.Graphing module

PyConcat.Graphing.buildNetworkGraph(costMatrix)
PyConcat.Graphing.createD3Diagram(costMatrix, outputPath)

PyConcat.HMM module

class PyConcat.HMM.HMM(targetFeatures, corpusFeatures)

Bases: object

A modified HMM, adapted from the teaching materials at http://www.langmead-lab.org/teaching-materials/

viterbi()
Given a sequence of emissions, return the most probable path
along with the log2 of its probability. Just like viterbi(...) but in the log2 domain.
Returns: the optimal path

PyConcat.UnitSelection module

PyConcat.UnitSelection.computeDistanceMatrix(matrixA, matrixB)

Compute the distance matrix quickly using SciPy's cdist

Parameters:
  • matrixA – a 2D matrix
  • matrixB – a 2D matrix
Returns:

a distance matrix between matrixA and matrixB
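
As a rough illustration of the cdist approach described above, the following sketch computes pairwise distances between two small 2D matrices. The function name compute_distance_matrix and the choice of the Euclidean metric are assumptions made for this example; the metric actually used by PyConcat is not stated here.

```python
import numpy as np
from scipy.spatial.distance import cdist


def compute_distance_matrix(matrix_a, matrix_b):
    """Hypothetical stand-in: pairwise distances, shape (len(a), len(b))."""
    # Euclidean is an assumption; cdist supports many other metrics.
    return cdist(matrix_a, matrix_b, metric="euclidean")


target = np.array([[0.0, 0.0], [1.0, 1.0]])
corpus = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])

# Row i holds the distances from target row i to every corpus row.
distances = compute_distance_matrix(target, corpus)
```

Each row of the result corresponds to one row of the first matrix, and each column to one row of the second.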

PyConcat.UnitSelection.computeDistanceWithWeights(targetFeatures, corpusFeatures)

Compute distances using a different set of weights for the target and the corpus

TODO: find a more flexible way of doing this.

Parameters:
  • targetFeatures
  • corpusFeatures
Returns:

the weighted distance matrices

PyConcat.UnitSelection.fixDistanceMatrix(mat, type='min')

Replace identical positions in the distance matrix with the max or min distance

Parameters:
  • mat – the matrix to fix
  • type – whether to replace those positions with the min or the max distance (default 'min')
Returns:

the fixed distance matrix

PyConcat.UnitSelection.kdTree(targetFeatures, corpusFeatures)

Nearest-neighbour search using a k-d tree. Faster than linearSearch.

Parameters:
  • targetFeatures
  • corpusFeatures
Returns:

the best sequence
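
A minimal sketch of a k-d tree lookup, assuming the search picks the single nearest corpus unit for each target frame. It uses SciPy's cKDTree; the function name kd_tree_search is hypothetical, and PyConcat's own implementation may differ.

```python
import numpy as np
from scipy.spatial import cKDTree


def kd_tree_search(target_features, corpus_features):
    """Hypothetical sketch: index of the nearest corpus row per target row."""
    # Build the tree once over the corpus, then query every target row.
    tree = cKDTree(corpus_features)
    _, indices = tree.query(target_features, k=1)
    return indices


corpus = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
target = np.array([[4.0, 6.0], [9.0, 1.0], [0.5, 0.5]])

sequence = kd_tree_search(target, corpus)
```

Building the tree has an up-front cost, so the benefit over a brute-force search tends to show when the corpus is large relative to the feature dimensionality.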

PyConcat.UnitSelection.linearSearch(targetFeatures, corpusFeatures)

Brute-force linear search, simplified by using SciPy's cdist to precompute the distance matrices

Parameters:
  • targetFeatures
  • corpusFeatures
Returns:

the best sequence found by brute force
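
The brute-force idea can be sketched as follows, assuming the "best sequence" is the corpus index with the smallest distance for each target frame. The function name linear_search and the default Euclidean metric are assumptions for illustration only.

```python
import numpy as np
from scipy.spatial.distance import cdist


def linear_search(target_features, corpus_features):
    """Hypothetical sketch: closest corpus row for each target row."""
    # Precompute every target-to-corpus distance in one call.
    all_distances = cdist(target_features, corpus_features)
    # Pick the column (corpus index) with the smallest distance per row.
    return np.argmin(all_distances, axis=1)


corpus_units = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
target_frames = np.array([[4.0, 6.0], [9.0, 1.0], [0.5, 0.5]])

best_indices = linear_search(target_frames, corpus_units)
```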

PyConcat.UnitSelection.normalise(array, method)

Normalise the array using min/max scaling or standardisation (standard deviation)

Parameters:
  • array – the array to normalise
  • method – whether to normalise (min/max) or standardise
Returns:

the normalised array
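
The two options can be sketched as below. Scaling is applied per feature column, which is an assumption, as is the method name "SD" ("MinMax" appears as the default in unitSelection). The function here is a hypothetical stand-in, not PyConcat's code.

```python
import numpy as np


def normalise_features(array, method="MinMax"):
    """Hypothetical sketch of per-column min/max scaling or standardisation."""
    array = np.asarray(array, dtype=float)
    if method == "MinMax":
        low = array.min(axis=0)
        span = array.max(axis=0) - low
        # Avoid division by zero for constant columns.
        span = np.where(span == 0, 1.0, span)
        return (array - low) / span
    if method == "SD":  # assumed name for the standardisation option
        std = array.std(axis=0)
        std = np.where(std == 0, 1.0, std)
        return (array - array.mean(axis=0)) / std
    raise ValueError("unknown method: %r" % (method,))


features = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaled = normalise_features(features, "MinMax")      # each column in [0, 1]
standardised = normalise_features(features, "SD")    # zero mean, unit variance
```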

PyConcat.UnitSelection.unitSelection(targetFeatures, corpusFeatures, method='kdtree', normalise='MinMax', topK=30)

Optionally normalise the features, then use one of the search methods to return a sequence of indices

Parameters:
  • targetFeatures
  • corpusFeatures
  • method – linearSearch, kdTree, viterbi, kViterbiExhaustive, kViterbiParallel, kViterbiGraph
  • normalise – normalisation method
  • topK – the number of paths to return (if using k-Best decoding)
Returns:

the sequence path(s)

PyConcat.UnitSelection.viterbiOld(obs, states)

Modified version of the Viterbi implementation from Wikipedia, adjusted to work with costs rather than probabilities

Parameters:
  • obs – the target features
  • states – the corpus features
Returns:

the optimal state sequence
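
To illustrate what a cost-based Viterbi looks like, here is a minimal sketch that minimises summed costs instead of maximising probabilities. It assumes two precomputed inputs that are not part of the documented signature: a target cost matrix (frames by units) and a concatenation cost matrix (units by units). The function name viterbi_costs is hypothetical.

```python
import numpy as np


def viterbi_costs(target_costs, concat_costs):
    """Hypothetical sketch: cheapest unit path through a cost lattice.

    target_costs: (n_frames, n_units) cost of using unit j at frame t.
    concat_costs: (n_units, n_units) cost of moving from unit i to unit j.
    """
    n_frames, n_units = target_costs.shape
    best = np.empty((n_frames, n_units))
    back = np.zeros((n_frames, n_units), dtype=int)
    best[0] = target_costs[0]
    for t in range(1, n_frames):
        # totals[i, j]: cheapest cost of ending at i, then moving to j.
        totals = best[t - 1][:, None] + concat_costs
        back[t] = np.argmin(totals, axis=0)
        best[t] = totals[back[t], np.arange(n_units)] + target_costs[t]
    # Trace back from the cheapest final unit.
    path = [int(np.argmin(best[-1]))]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return path, float(best[-1].min())


target_costs = np.array([[0.0, 5.0], [5.0, 0.0], [0.0, 5.0]])
cheap_switch = np.array([[0.0, 1.0], [1.0, 0.0]])
costly_switch = np.array([[0.0, 10.0], [10.0, 0.0]])

path_a, cost_a = viterbi_costs(target_costs, cheap_switch)
path_b, cost_b = viterbi_costs(target_costs, costly_switch)
```

With cheap transitions the path follows the best-matching unit at every frame; with expensive transitions it stays on one unit and accepts a worse match in the middle frame.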

Module contents