Apache-spark – Matrix Transpose on RowMatrix in Spark

Suppose I have a RowMatrix.

  1. How can I transpose it? The API documentation does not seem to have a transpose method.
  2. The local Matrix class has a transpose() method, but it is not distributed. If I have a matrix that is larger than memory, how can I transpose it?
  3. I have converted a RowMatrix to a DenseMatrix as follows:

    DenseMatrix Mat = new DenseMatrix(m,n,MatArr);
    

    which requires converting the RowMatrix to a JavaRDD and then converting the JavaRDD to an array.

Is there any other convenient way to do the conversion?

Thanks in advance

Best Solution

If anybody is interested, I've implemented the distributed version that @javadba proposed.

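  import org.apache.spark.mllib.linalg.{Vector, Vectors}
  import org.apache.spark.mllib.linalg.distributed.RowMatrix

  // Transposes a RowMatrix: every element is emitted keyed by its column index,
  // the elements are grouped per column, and each group is rebuilt as a row.
  def transposeRowMatrix(m: RowMatrix): RowMatrix = {
    val transposedRowsRDD = m.rows.zipWithIndex.map{case (row, rowIndex) => rowToTransposedTriplet(row, rowIndex)}
      .flatMap(x => x) // now we have triplets (newRowIndex, (newColIndex, value))
      .groupByKey
      .sortByKey().map(_._2) // sort rows and remove row indexes
      .map(buildRow) // restore order of elements in each row and remove column indexes
    new RowMatrix(transposedRowsRDD)
  }


  // Turns one original row into (colIndex, (rowIndex, value)) entries,
  // so the column index becomes the row index of the transposed matrix.
  def rowToTransposedTriplet(row: Vector, rowIndex: Long): Array[(Long, (Long, Double))] = {
    val indexedRow = row.toArray.zipWithIndex
    indexedRow.map{case (value, colIndex) => (colIndex.toLong, (rowIndex, value))}
  }

  // Rebuilds one transposed row from its (originalRowIndex, value) pairs.
  // The whole row is held in a local array, so the original row count must fit in an Int.
  def buildRow(rowWithIndexes: Iterable[(Long, Double)]): Vector = {
    val resArr = new Array[Double](rowWithIndexes.size)
    rowWithIndexes.foreach{case (index, value) =>
        resArr(index.toInt) = value
    }
    Vectors.dense(resArr)
  }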
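
To make the data flow easier to follow, here is a minimal sketch of the same steps (emit entries keyed by column, group, sort, rebuild) on plain Scala collections, with no Spark involved. The names `LocalTransposeSketch` and `transposeLocal` are made up for this illustration and are not part of any library, and the sketch assumes every input row has the same length.

```scala
// Spark-free illustration of the emit / group / sort / rebuild steps.
// LocalTransposeSketch and transposeLocal are hypothetical names for this example only.
object LocalTransposeSketch {

  // Assumes all rows have the same length (a rectangular matrix).
  def transposeLocal(rows: Seq[Array[Double]]): Seq[Array[Double]] = {
    // Step 1: emit (colIndex, (rowIndex, value)) for every element,
    // the local counterpart of rowToTransposedTriplet.
    val entries: Seq[(Int, (Int, Double))] =
      rows.zipWithIndex.flatMap { case (row, rowIndex) =>
        row.toSeq.zipWithIndex.map { case (value, colIndex) =>
          (colIndex, (rowIndex, value))
        }
      }

    entries
      .groupBy(_._1) // Step 2: collect the elements of each original column
      .toSeq
      .sortBy(_._1)  // Step 3: order the new rows by original column index
      .map { case (_, grouped) =>
        // Step 4: place each value at its original row index, like buildRow.
        val newRow = new Array[Double](rows.size)
        grouped.foreach { case (_, (rowIndex, value)) => newRow(rowIndex) = value }
        newRow
      }
  }
}
```

For example, `LocalTransposeSketch.transposeLocal(Seq(Array(1.0, 2.0, 3.0), Array(4.0, 5.0, 6.0)))` returns the three rows (1.0, 4.0), (2.0, 5.0) and (3.0, 6.0). The Spark version above does the same thing, except that the grouping and sorting are shuffles across the cluster.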