Tag: Hadoop

  • Scalding hiding NPEs in “operator Each failed executing operation”

    Yesterday I was surprised by a failing Scalding task. Everything worked fine locally, and all I got was a message like “job failed, see cluster log”. In the cluster log I saw the following:

    2014-10-24 14:38:41,222 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201410101555_2230_m_000005_3: cascading.pipe.OperatorException: [com.twitter.scalding.T…][com.twitter.scalding.RichPipe.each(RichPipe.scala:471)] operator Each failed executing operation
    at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:107)
    at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:39)
    at cascading.flow.stream.FunctionEachStage$1.collect(FunctionEachStage.java:80)
    at cascading.tuple.TupleEntryCollector.safeCollect(TupleEntryCollector.java:145)
    at cascading.tuple.TupleEntryCollector.add(TupleEntryCollector.java:133)
    at cascading.operation.Identity$2.operate(Identity.java:137)
    at cascading.operation.Identity.operate(Identity.java:150)
    at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:99)
    at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:39)
    at cascading.flow.stream.SourceStage.map(SourceStage.java:102)
    at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
    at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:130)
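
    The title points to a NullPointerException hidden behind the generic OperatorException. One general way to surface such a cause is to follow the exception's cause chain down to the innermost entry. The following is only a minimal Java sketch of that idea, not the fix from the post: the class name, the method name, and the sample messages are hypothetical, and a plain RuntimeException stands in for cascading.pipe.OperatorException so the example has no dependencies.

```java
class RootCauseDemo {
    // Follows getCause() links until the innermost exception is reached.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in for the wrapped failure (assumption: the real wrapper would be
        // cascading.pipe.OperatorException; RuntimeException is used instead).
        Throwable wrapped = new RuntimeException(
            "operator Each failed executing operation",
            new NullPointerException("hypothetical null field"));

        Throwable root = rootCause(wrapped);
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
    }
}
```

    In a cluster log the same information usually shows up as “Caused by:” lines further down the stack trace, so it is worth scrolling past the first exception.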


  • Enable output compression in Scalding

    I just wanted to enable final output compression in one of my Scalding jobs (because I needed to reorganize a data set of several TB).

    Unfortunately, Scalding always produced uncompressed files. After some googling, I came across a GitHub issue that addressed exactly this problem. Via some links I got the sample code from this repo, which can be used to write compressed TSVs.


  • Compiling Cascading: FAILURE: Build failed with an exception.

    Today I ran into a really stupid error message when I tried to recompile cascading-jdbc:

    Evaluating root project 'cascading-jdbc' using build file '/home/…/cascading-jdbc/build.gradle'.

    FAILURE: Build failed with an exception.

    * Where:
    Build file '/home/…/cascading-jdbc/build.gradle' line: 68

    * What went wrong:
    A problem occurred evaluating root project 'cascading-jdbc'.
    > Could not find method create() for arguments [fatJarPrepareFiles, class eu.appsatori.gradle.fatjar.tasks.PrepareFiles] on task set.

    * Try:
    Run with --stacktrace option to get the stack trace. Run with --debug option to get more log output.

    BUILD FAILED

    Total time: 5.355 secs

    Solution

    Check your Gradle version. I was running a brand-new Ubuntu install with the Gradle version it ships, 1.4. The Cascading README states that Gradle 1.8 is required, and it really is.
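
    The mismatch above (1.4 installed, 1.8 required) comes down to comparing dotted version numbers. As a rough illustration, here is a minimal Java sketch of such a comparison. The class and method names are hypothetical and not part of Gradle or Cascading, and the sketch assumes purely numeric parts, so a version string with a suffix such as “-rc-1” would not parse.

```java
class GradleVersionCheck {
    // Compares dotted numeric versions part by part; missing parts count as 0.
    // Returns a negative number, zero, or a positive number, like Integer.compare.
    static int compareVersions(String a, String b) {
        String[] partsA = a.split("\\.");
        String[] partsB = b.split("\\.");
        int length = Math.max(partsA.length, partsB.length);
        for (int i = 0; i < length; i++) {
            int x = i < partsA.length ? Integer.parseInt(partsA[i]) : 0;
            int y = i < partsB.length ? Integer.parseInt(partsB[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        String installed = "1.4"; // version shipped with the Ubuntu install in the post
        String required = "1.8";  // version the Cascading README asks for
        if (compareVersions(installed, required) < 0) {
            System.out.println("Gradle " + installed + " is older than the required " + required);
        }
    }
}
```

    Comparing the parts as integers rather than as strings matters once a minor version reaches two digits: 1.10 is newer than 1.8, although it sorts earlier as text.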