
Answer by Amy M for best way to transpose data.table

Here's a solution that uses a wrapper to tidy up the output of the data.table transpose function.

With really large data sets this seems to be more efficient than the dcast/melt approach. I tested it on an 8,000-row by 29,000-column data set: the function below ran in about 3 minutes, whereas dcast/melt crashed R.

# Function to clean up output of data.table transpose:

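library(data.table)

transposedt <- function(dt, varlabel) {
  dtrows <- names(dt)               # old column names become the row labels
  dtcols <- as.character(dt[[1]])   # first-column values become the new column names
  dtt <- transpose(dt)              # a character first column makes every value character
  dtt[, (varlabel) := dtrows]       # add the label column by reference
  setnames(dtt, old = names(dtt), new = c(dtcols, varlabel))
  dtt <- dtt[-1]                    # drop the row that held the old first column
  setcolorder(dtt, c(varlabel, names(dtt)[seq_len(ncol(dtt) - 1L)]))  # label column first
  return(dtt)
}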

# Some dummy data
mydt <- data.table(col0 = paste0("row", 1:100),
                   col01 = sample(100),
                   col02 = sample(100),
                   col03 = sample(100),
                   col04 = sample(100),
                   col05 = sample(100),
                   col06 = sample(100),
                   col07 = sample(100),
                   col08 = sample(100),
                   col09 = sample(100),
                   col10 = sample(100))


# Apply the function:
mydtt <- transposedt(mydt, "myvariables")

# View the results:
> mydtt[,1:10]
    myvariables row1 row2 row3 row4 row5 row6 row7 row8 row9
 1:       col01   58   53   14   96   51   30   26   15   68
 2:       col02    6   72   46   62   69    9   63   32   78
 3:       col03   21   36   94   41   54   74   82   64   15
 4:       col04   68   41   66   30   31   78   51   67   26
 5:       col05   49   30   52   78   73   71    5   66   44
 6:       col06   89   35   79   67    6   88   62   97   73
 7:       col07   66   15   27   29   58   40   35   82   57
 8:       col08   55   47   83   30   23   65   48   56   87
 9:       col09   41   10   21   33   55   81   94   25   34
10:       col10   35   17   41   44   21   66   69   61   46
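
One thing the printed output does not show is the column type. Because the first column of the input is character, transpose() coerces every value to character, so the numbers above are likely stored as strings. Here is a minimal sketch of checking for that and converting back, using a tiny hypothetical table (`small`, `tt`, and `vals` are names made up for this illustration):

```r
library(data.table)

# Tiny hypothetical table: a character id column plus two integer columns
small <- data.table(id = c("row1", "row2", "row3"), a = 1:3, b = 4:6)

# Plain transpose: the character id column forces every value to character
tt <- transpose(small)
col_classes <- vapply(tt, function(x) class(x)[1], character(1))
print(col_classes)

# Drop the row holding the old id values, then convert the rest back to integer
vals <- tt[-1]
cols <- names(vals)
vals[, (cols) := lapply(.SD, as.integer), .SDcols = cols]
print(vals)
```

The same lapply(.SD, ...) conversion should work on the value columns returned by the wrapper, leaving the label column out of .SDcols.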

What is also useful is that the new columns (formerly rows) keep their original order, and you can give the column of variable names a meaningful name.
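
For comparison, newer data.table releases can do this labelling inside transpose() itself through its keep.names and make.names arguments (added, as far as I know, in version 1.12.4). The sketch below assumes such a version is installed and uses a small hypothetical table; I have not compared its speed against the wrapper on very large inputs.

```r
library(data.table)

# Small hypothetical table standing in for the real data
small <- data.table(id = c("row1", "row2", "row3"), a = 1:3, b = 4:6)

# keep.names: name for the new column that holds the old column names
# make.names: column whose values become the new column names
tt <- transpose(small, keep.names = "myvariables", make.names = "id")
print(tt)
```

Since the make.names column is taken out before transposing, the remaining values keep their common type (integer here) rather than being coerced to character.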

