
Building and running Spark 1.0 on Ubuntu

This article describes, step by step, how to build and run Apache Spark 1.0.0-SNAPSHOT. I use a virtual machine for trying out different big data software (Hadoop, Spark, Hive, etc.), and for this post I used Linux Mint 16 on VirtualBox 4.3.10.

Install JDK 7

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer
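If you are scripting the setup, the Oracle installer's interactive license prompt can be answered in advance via debconf (a sketch; assumes the standard webupd8team package name shown above):

```shell
# Pre-accept the Oracle license so apt-get can run unattended
echo oracle-java7-installer shared/accepted-oracle-license-v1-1 select true | sudo /usr/bin/debconf-set-selections
sudo apt-get install -y oracle-java7-installer
```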

Verify the Java installation:

$ java -version
java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)

Create a symlink for easier configuration later:

$ cd /usr/lib/jvm/
$ sudo ln -s java-7-oracle jdk
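The symlink can then anchor JAVA_HOME, so a future JDK upgrade only needs the link repointed. A sketch for ~/.bashrc, assuming the link created above:

```shell
# JAVA_HOME follows the jdk symlink rather than a specific JDK version
export JAVA_HOME=/usr/lib/jvm/jdk
export PATH=$JAVA_HOME/bin:$PATH
```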

Download Spark

Note: parambirs is both my user name and my group name on the Ubuntu machine; replace it with your own user and group name.

$ cd ~/Downloads
$ git clone https://github.com/apache/spark.git
$ sudo mv spark /usr/local
$ cd /usr/local
$ sudo chown -R parambirs:parambirs spark

Build

$ cd /usr/local/spark
$ sbt/sbt clean assembly
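The assembly build needs a sizable JVM heap. If it aborts with a native memory allocation error, the sbt launcher's -mem flag (value in MB) caps the heap to fit your machine, for example:

```shell
# Limit sbt's JVM to 1024 MB of heap during the build
sbt/sbt -mem 1024 clean assembly
```

On a low-memory VM, the other option is allocating more RAM to the instance itself.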

Run an Example

$ cd /usr/local/spark
$ ./bin/run-example org.apache.spark.examples.SparkPi 
...
Pi is roughly 3.1399
...
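SparkPi estimates π by Monte Carlo sampling: random points are thrown into a square, and the fraction landing inside the inscribed circle approaches π/4. The same idea in plain Scala, no cluster needed (PiSketch is an illustrative name, not part of Spark):

```scala
object PiSketch {
  // Sample n random points in [-1, 1] x [-1, 1] and count hits inside the unit circle
  def estimatePi(n: Int): Double = {
    val rnd = new scala.util.Random(42) // fixed seed for a repeatable estimate
    val inside = (1 to n).count { _ =>
      val x = rnd.nextDouble() * 2 - 1
      val y = rnd.nextDouble() * 2 - 1
      x * x + y * y <= 1
    }
    4.0 * inside / n // circle area / square area = pi / 4
  }

  def main(args: Array[String]): Unit =
    println(estimatePi(100000))
}
```

Spark's version distributes the sampling across partitions, which is why the printed estimate varies slightly from run to run.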

Run Spark Shell

$ ./bin/spark-shell

Try out some commands in the Spark shell:

scala> val textFile = sc.textFile("README.md")
textFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12
scala> textFile.count
res0: Long = 126
scala> textFile.filter(_.contains("the")).count
res1: Long = 28
scala> exit
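The RDD methods used above deliberately mirror Scala's collection API, so a transformation chain can be tried on a plain local Seq before pointing it at real data (a sketch; the sample lines are made up):

```scala
// Local stand-in for textFile: a Seq of lines
val lines = Seq("Apache Spark", "the fast engine", "runs on the JVM")

val total   = lines.size                      // analogous to textFile.count
val withThe = lines.count(_.contains("the"))  // analogous to textFile.filter(_.contains("the")).count
```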

10 responses to “Building and running Spark 1.0 on Ubuntu”

  1. Running Spark-1.0.0-SNAPSHOT on Hadoop/YARN 2.4.0 | Param Gyaan

    […] Building and running Spark 1.0.0-SNAPSHOT on Ubuntu […]

  2. al

    Hi, I hit an error when executing this command: $ sbt/sbt clean assembly. From the logs:
    # There is insufficient memory for the Java Runtime Environment to continue.
    # Native memory allocation (malloc) failed to allocate 1431699456 bytes for committing reserved memory.

    Mind to advise on how to overcome this?
    Thanks

    1. Param

      Hi,

      It seems your system doesn't have enough memory. How much RAM do you have? On my Mac, sbt easily uses more than 1 GB of RAM while building. So you might need to try this on a machine with more memory, or, if you're using a VM, allocate more memory to the instance.

      Thanks
      Param

      1. danielsack

        Running sbt/sbt clean assembly -mem 512 should do the trick.

  3. al

    Hi, I hit the error below when executing sbt/sbt clean assembly. Mind to assist? Thanks.

    [warn] ::::::::::::::::::::::::::::::::::::::::::::::
    [warn] :: UNRESOLVED DEPENDENCIES ::
    [warn] ::::::::::::::::::::::::::::::::::::::::::::::
    [warn] :: org.scala-lang#scala-library;2.10.2: not found
    [warn] ::::::::::::::::::::::::::::::::::::::::::::::
    sbt.ResolveException: unresolved dependency: org.scala-lang#scala-library;2.10.2: not found

  4. Spark : Futur of the past | BigData, Synthesis and Algorithmic
  5. frank

    Thanks for sharing these clear and uncluttered steps! Worked great for me on a new Ubuntu 14.04 LTS VM instance.

  6. Spark : Futur of the past | BigData Synthesis and Algorithmic
  7. Cennet

    Hi, thanks for sharing, but it doesn't build:
    [error] (streaming-kafka-assembly/*:assembly) java.util.zip.ZipException: duplicate entry: META-INF/MANIFEST.MF
    [error] (streaming-flume-sink/avro:generate) org.apache.avro.SchemaParseException: Undefined name: “strıng”
    [error] (assembly/*:assembly) java.util.zip.ZipException: duplicate entry: META-INF/MANIFEST.MF
    How can I solve this problem?

  8. redcat34

    org.apache.avro.SchemaParseException: Undefined name: “strıng”
    at org.apache.avro.Schema.parse(Schema.java:1075)
    at org.apache.avro.Schema.parse(Schema.java:1158)
    at org.apache.avro.Schema.parse(Schema.java:1116)
    at org.apache.avro.Protocol.parseTypes(Protocol.java:438)
    at org.apache.avro.Protocol.parse(Protocol.java:400)
    at org.apache.avro.Protocol.parse(Protocol.java:390)
    at org.apache.avro.Protocol.parse(Protocol.java:380)
    at sbtavro.SbtAvro$$anonfun$sbtavro$SbtAvro$$compile$2.apply(SbtAvro.scala:81)
    at sbtavro.SbtAvro$$anonfun$sbtavro$SbtAvro$$compile$2.apply(SbtAvro.scala:78)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at sbtavro.SbtAvro$.sbtavro$SbtAvro$$compile(SbtAvro.scala:78)
    at sbtavro.SbtAvro$$anonfun$sourceGeneratorTask$1$$anonfun$1.apply(SbtAvro.scala:112)
    at sbtavro.SbtAvro$$anonfun$sourceGeneratorTask$1$$anonfun$1.apply(SbtAvro.scala:111)
    at sbt.FileFunction$$anonfun$cached$1.apply(Tracked.scala:186)
    at sbt.FileFunction$$anonfun$cached$1.apply(Tracked.scala:186)
    at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3$$anonfun$apply$4.apply(Tracked.scala:200)
    at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3$$anonfun$apply$4.apply(Tracked.scala:196)
    at sbt.Difference.apply(Tracked.scala:175)
    at sbt.Difference.apply(Tracked.scala:157)
    at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3.apply(Tracked.scala:196)
    at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3.apply(Tracked.scala:195)
    at sbt.Difference.apply(Tracked.scala:175)
    at sbt.Difference.apply(Tracked.scala:151)
    at sbt.FileFunction$$anonfun$cached$2.apply(Tracked.scala:195)
    at sbt.FileFunction$$anonfun$cached$2.apply(Tracked.scala:193)
    at sbtavro.SbtAvro$$anonfun$sourceGeneratorTask$1.apply(SbtAvro.scala:114)
    at sbtavro.SbtAvro$$anonfun$sourceGeneratorTask$1.apply(SbtAvro.scala:108)
    at scala.Function5$$anonfun$tupled$1.apply(Function5.scala:35)
    at scala.Function5$$anonfun$tupled$1.apply(Function5.scala:34)
    at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
    at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
    at sbt.std.Transform$$anon$4.work(System.scala:63)
    at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
    at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
    at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
    at sbt.Execute.work(Execute.scala:235)
    at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
    at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
    at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:159)
    at sbt.CompletionService$$anon$2.call(CompletionService.scala:28)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
    [error] (streaming-flume-sink/avro:generate) org.apache.avro.SchemaParseException: Undefined name: “strıng”
    [error] Total time: 1555 s, completed 09.Tem.2015 09:32:57

    Any suggestions, please?
    Thanks.
