Tag: Scala

Scalding hiding NPEs in “operator Each failed executing operation”

Yesterday I was surprised by a failing Scalding task. Everything worked fine locally and all I git was like “job failed, see cluster log”. In the cluster log I saw the following:

2014-10-24 14:38:41,222 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201410101555_2230_m_000005_3: cascading.pipe.OperatorException: [com.twitter.scalding.T…][com.twitter.scalding.RichPipe.each(RichPipe.scala:471)] operator Each failed executing operation
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:107)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:39)
at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:80)
at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:145)
at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:133)
at cascading.operation.Identity$2.operate(Identity.java:137)
at cascading.operation.Identity.operate(Identity.java:150)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:99)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:39)
at cascading.flow.stream.SourceStage.map(SourceStage.java:102)
at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:130)

(more…)

26. October 2014
Enable output compression in Scalding

I just wanted to enable final output compression in one of my Scalding jobs (because I needed to reorganize a some-TB-data set).

Unfortunately scalding always produced uncompressed files. After some googling, I came across a github issue that adressed exactly this problem. Via some links I got the sample code from this repo which can be used to write compressed TSVs.

(more…)

22. September 2014

Scalding Exception: diverging implicit expansion for type com.twitter.algebird.Semigroup[T]

I was just doing a again some scalding jobs and again got an .. interesting exception:

In a groupBy operation, I wanted to sum something up using:

.groupBy('a) {
  _.sum('a -> 'c)
}

And was rewarded with this one:

[error] example.scala:20: diverging implicit expansion for type com.twitter.algebird.Semigroup[T]
[error] starting with method eitherSemigroup in object Semigroup
[error]       _.sum('a -> 'c)
[error]            ^
[error] one error found
[error] (compile:compile) Compilation failed

WTF??

Solution:

Spot the mistake? It’s the missing type hint at sum:

.groupBy('a) {
  _.sum<strong>[Int]</strong>('a -> 'c)  //  <-- [Int]
}

12. September 2014

Tag: Scala

Scalding hiding NPEs in “operator Each failed executing operation”

Enable output compression in Scalding

Scalding Exception: diverging implicit expansion for type com.twitter.algebird.Semigroup[T]

Solution: