What to do in case of org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes

I’m currently gathering my first experiences with Apache Spark and in particular Spark SQL.

While I was playing a bit with Spark SQL Joins I suddenly faced an exception like Exception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: foo.
Followed by the parsed SQL statement etc …

Well, in MySQL the error message would have been
"Unknown column 'foo' in field list"
Aka: You are accessing a column/field foo where this field does not exist.
I was already a bit too close to the problem in order to see it at once – and I only found descriptions dealing with nested structures etc (which wasn’t the case in my situation). So it took me a couple of minutes to realize what Spark want to tell me.

Maybe this helps someone else, too.

Scalding Exception: diverging implicit expansion for type com.twitter.algebird.Semigroup[T]

I was just doing a again some scalding jobs and again got an .. interesting exception:

In a groupBy operation, I wanted to sum something up using:

.groupBy('a) {
  _.sum('a -> 'c)
}

And was rewarded with this one:

[error] example.scala:20: diverging implicit expansion for type com.twitter.algebird.Semigroup[T]
[error] starting with method eitherSemigroup in object Semigroup
[error]       _.sum('a -> 'c)
[error]            ^
[error] one error found
[error] (compile:compile) Compilation failed

WTF??

Solution:

Spot the mistake? It’s the missing type hint at sum:

.groupBy('a) {
  _.sum<strong>[Int]</strong>('a -> 'c)  //  <-- [Int]
}

Scalding: unable to compare stream elements in position: 0

I’m currently working quite a bit with Twitter’s Scalding.
Recently I split up a job into sub-jobs and suddenly got an Exception in my join:

Caused by: cascading.CascadingException: unable to compare stream elements in position: 0

If I had remembered the Fields API in detail, I would have thought about this paragraph (it’s about sorting, but the consequence is the same):

Note: When reading from a CSV, the data types are set to String,hence the sorting will be alphabetically, therefore to sort by age, an int, you need to convert it to an integer. For example …

Solution:

Ensure you are joining the correct data types and possibly convert them before. For example:

.map ('myField-> 'myField) {x:Int => x}

Compiling Cascading: FAILURE: Build failed with an exception.

Today I ran into a really stupid error message when I tried to recompile cascading-jdbc:

Evaluating root project ‘cascading-jdbc’ using build file ‘/home/…/cascading-jdbc/build.gradle’.

FAILURE: Build failed with an exception.

* Where:
Build file ‘/home/…/cascading-jdbc/build.gradle’ line: 68

* What went wrong:
A problem occurred evaluating root project ‘cascading-jdbc’.
> Could not find method create() for arguments [fatJarPrepareFiles, class eu.appsatori.gradle.fatjar.tasks.PrepareFiles] on task set.

* Try:
Run with –stacktrace option to get the stack trace. Run with –debug option to get more log output.

BUILD FAILED

Total time: 5.355 secs

Solution

Check your gradle version … I ran a brand new Ubuntu with the shipped gradle version 1.4. Well the cascading readme states that gradle 1.8 is required … and it really is.