Introduction
Saving your R dataframe to a .csv
can be useful; being able to view the data all at once can help to see the bigger picture. Often though, multiple dataframes, all pieces of the same project, need to be viewed this way and related back to one another. In this case viewing becomes far easier when these dataframes are written to .xlsx
across multiple sheets in a single workbook. Not to mention the time and energy saved when you no longer have to find and open multiple files.
Four packages in R are available to do just this. I generated some test data (a 30000 x 40 dataframe with sampled values between 1 and 100) and tested each one with varying levels of success.
Package use
The xlsx package
I ran into problems very early on with this package. The installation was dependent on the rJava
package which gave an error during installation. Given that we’re using a Unix platform, the fix to this was to run sudo apt-get install r-cran-rjava
in console. Executing this command succesfully installed the package and after restarting R-studio I was able to install and load the xlsx
package.
The next problem I ran into was a java out of memory error, specifically java.lang.OutOfMemoryError
. This time the fix was allocating more memory to Java trough the options(java.parameters = "-Xmx40000m")
setting. The memory increase amount (and therefore the amount of data that can be written) is dependent on the amount of memory your computer can allocate to R. In this case I have allocated 40gig, which most of us do not have laying around. Once this was done I was able write up to 10 worksheets without any problems, but again, I ran into memory problems when I tried to scale up.
The XLConnect package
This package had all the same problems as the previous with one added extra. During testing, the writing process was done many times; each time it was done the previous .xlsx
file saved as the same name needed to be deleted. If this wasn’t done R would continue attempting to write to the file indefinitely with no indications of stopping. There seems to be a fix for this using the createNames()
function but it didn’t seem worth the effort.
The WriteXLS package
This package was easy enough to use with no apparent errors. The problem was that because this method writes the data directly into a workbook, instead of first creating it locally within the R environment, it was extremely slow.
The openxlsx package
Finally, I looked to the opnexlsx
package. Again, no problems here; the most user friendly of the four.
Microbenchmark test findings
After using each of the four packages I decided, taking usability, memory issues and speed into account, not to continue further with the WriteXLS
and the XLConnect
packages. I then tested the speed of the remaining two, openxlsx
and xlsx
, using the microbenchmark
package with an evaluation number of 5 and varying sheet numbers.
1 Sheet with a 30000 x 40 dataframe run 5 times:
3 Sheets with a 30000 x 40 dataframe run 5 times:
5 Sheets with a 30000 x 40 dataframe run 5 times:
10 Sheets with a 30000 x 40 dataframe run 5 times:
Graphing the mean, median, min and max times for both the openxlsx
and the xlsx
packages show that other than min time, the openxlsx
package is faster.
Conclusion
After testing each of the four packages using a dataframe of 30000 x 40 random numbers bewteen 1 and 100, I found that 2 of the packages, the XLConnect
package and the WriteXLS
package, were either not very user friendly or were simply too slow. The remaining two, the openxlsx
package and the xlsx
package, were then tested using the microbenchmark
package with an evaluation number of 5 and various sheet numbers. The tests concluded that the openxlsx
package was overall faster. With this in mind and with the ease of use for this package, it comes in as the favorite.