Friday, January 13, 2012

PowerShell (v2) - PSCX : Write-Zip (Mass Zipping Files and Lots of Them)

After nearly killing myself trying to write my own zip solution, I decided to revisit the Write-Zip cmdlet in the PSCX module. As I started working with it, I found the second example in its help useless, as it contains switches that don't even exist. That left me with one working example. It took me a little while to figure out how it worked, but I finally did. Next was getting it to work on large data sets. At present I have a project with a few million files I need to zip as one of the intermediary steps, and this seemed like the perfect opportunity to figure out how to pull this off. To test and be sure I got the syntax correct, I made a dummy directory with *.txt files. To process it I wrote this command:
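PS C:\testing\powershell\testfiles> foreach ($dir in (Get-ChildItem . | Where-Object { $_.PSIsContainer })) {
  # Only directories are processed; FullName is absolute, so each cd works from anywhere
  Set-Location $dir.FullName
  # Zip every *.tif in this directory into <directory name>.zip
  Write-Zip .\*.tif "$($dir.Name).zip"
}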
This example assumes PSCX is imported. As it loops through the subdirectories, it looks for files matching the extension (in this case, *.tif) and adds them to a .zip file named after the current directory. For my production run this works perfectly, as I named the folders sequentially to indicate the project and data set I was working on. Unfortunately, the switch that used to let you remove the original files is no longer available. Nonetheless, I can accomplish the same thing by running one more line after the previous command:
dir .\*.tif | Remove-Item
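If you would rather zip and clean up in one pass, something along these lines might work. This is only a sketch, not what I ran in production: it assumes PSCX is imported, the same folder layout as above, and that an existing non-empty zip is good enough evidence that Write-Zip succeeded. The $zipName variable is just a name I made up for illustration.

```powershell
# Sketch only: assumes PSCX is imported and each subdirectory holds *.tif files.
foreach ($dir in (Get-ChildItem . | Where-Object { $_.PSIsContainer })) {
    Push-Location $dir.FullName
    try {
        $zipName = "$($dir.Name).zip"   # hypothetical helper variable
        Write-Zip .\*.tif $zipName

        # Delete the originals only if the archive exists and is not empty
        if ((Test-Path $zipName) -and ((Get-Item $zipName).Length -gt 0)) {
            Remove-Item .\*.tif
        }
    }
    finally {
        # Always return to the starting directory, even if something fails
        Pop-Location
    }
}
```

Note that a leftover zip from an earlier run would also pass the check, so clear out old archives first if you are re-running over the same folders.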
I had thought about running jobs to accelerate the process, but wanted to be really sure this works as intended before adding another layer of complication/abstraction.

In my first actual production run I found I was consistently zipping 1000 files in 14 seconds. With my own concoction, not only did I end up with corrupted zips if the process got interrupted, but it also took much, much longer (read: hours). This way I might be able to finish my work in 1/20th of the time it took me before. Thank you again, PowerShell and PSCX crew.
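If you want to check the timing on your own data, the built-in Measure-Command cmdlet is one way to do it. A minimal sketch, assuming PSCX is imported and you are sitting in a directory of *.tif files; the test.zip name is only a placeholder:

```powershell
# Sketch only: times a single Write-Zip call over the *.tif files in the current directory.
$elapsed = Measure-Command { Write-Zip .\*.tif "test.zip" }

# Report the elapsed time in seconds
"{0:N1} seconds" -f $elapsed.TotalSeconds
```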
