Enable output compression in Scalding

I just wanted to enable final output compression in one of my Scalding jobs, because I needed to reorganize a multi-terabyte data set.

Unfortunately Scalding always produced uncompressed files. After some googling, I came across a GitHub issue that addressed exactly this problem. Following a few links, I found sample code in this repo that can be used to write compressed TSVs.

Solution:

  1. Set the parameters correctly as stated in the docs. Beware of your Hadoop version (YARN vs. MR1), since the property names differ; the MR1 names are shown below:
    // http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.5.0/CDH4-Installation-Guide/cdh4ig_topic_23_3.html
    // MR1
    // Compress Map output
    set("mapred.compress.map.output", "true")
    set("mapred.map.output.compression.codec", "org.apache.hadoop.io.compress.SnappyCodec")
    // compress final output
    set("mapred.output.compress", "true")
    set("mapred.output.compression.codec", "org.apache.hadoop.io.compress.SnappyCodec")
    
  2. Get the CompressedDelimitedScheme and CompressedTsv from https://github.com/morazow/WordCount-Compressed
  3. Pipe your output to a compressed TSV:
    myPipe.write(CompressedTsv("/tmp/foo"))
    
  4. Check your output and the content:
     hadoop fs -ls /tmp/foo

    It should list part files like /tmp/foo/part-00000.snappy

    hadoop fs -text /tmp/foo/part-00000.snappy
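
Putting the steps above together, a minimal Scalding job could look like the following sketch. The job name, input path, and argument name are hypothetical; CompressedTsv comes from the WordCount-Compressed repo linked above, and the compression properties are merged into the job configuration via Scalding's `config` override (MR1 property names, as in step 1):

```scala
import com.twitter.scalding._

// Hypothetical job; assumes CompressedTsv from the
// WordCount-Compressed repo is on the classpath.
class ReorganizeJob(args: Args) extends Job(args) {

  // Merge the MR1 compression properties into the Hadoop job config.
  override def config: Map[AnyRef, AnyRef] =
    super.config ++ Map(
      // Compress intermediate map output
      "mapred.compress.map.output" -> "true",
      "mapred.map.output.compression.codec" ->
        "org.apache.hadoop.io.compress.SnappyCodec",
      // Compress the final job output
      "mapred.output.compress" -> "true",
      "mapred.output.compression.codec" ->
        "org.apache.hadoop.io.compress.SnappyCodec"
    )

  Tsv(args("input"))
    .read
    .write(CompressedTsv("/tmp/foo")) // Snappy-compressed TSV output
}
```

This is a sketch of the wiring only, not a drop-in job; on a YARN (MRv2) cluster the property names in the map would need to be swapped for their MRv2 equivalents.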