Building and running Spark 1.0 on Ubuntu

This article describes a step-by-step approach to building and running Apache Spark 1.0.0-SNAPSHOT. I personally use a virtual machine for trying out different big data software (Hadoop, Spark, Hive, etc.), and for this blog post I’ve used Linux Mint 16 (an Ubuntu-based distribution) on VirtualBox 4.3.10.

Install JDK 7

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer

Verify the Java installation:

$ java -version
java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)

Create a symlink for easier configuration later:

$ cd /usr/lib/jvm/
$ sudo ln -s java-7-oracle jdk
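
With the jdk symlink in place, JAVA_HOME can point at a stable path. As a small sketch (assuming you use bash and want this set in every new shell), you could add the following to ~/.bashrc:

$ echo 'export JAVA_HOME=/usr/lib/jvm/jdk' >> ~/.bashrc
$ echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bashrc
$ source ~/.bashrc
$ echo $JAVA_HOME
/usr/lib/jvm/jdk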

Download Spark

Note: parambirs is my user name as well as my group name on the Ubuntu machine. Please replace it with your own user/group name.

$ cd ~/Downloads
$ git clone https://github.com/apache/spark.git
$ sudo mv spark /usr/local
$ cd /usr/local
$ sudo chown -R parambirs:parambirs spark
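
Cloning master like this gives you whatever 1.0.0-SNAPSHOT happens to be at that moment. If you’d rather build from the 1.0 release branch, you can check it out before building; branch-1.0 exists in the apache/spark repository, but verify the exact branch or tag you want first:

$ cd /usr/local/spark
$ git branch -a          # list available branches
$ git checkout branch-1.0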

Build

$ cd /usr/local/spark
$ sbt/sbt clean assembly
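
This builds against Spark’s default Hadoop version. Around the time of 1.0.0-SNAPSHOT the sbt build also picked up the SPARK_HADOOP_VERSION and SPARK_YARN environment variables, so if you plan to run on a specific Hadoop/YARN cluster you could build along these lines (treat the version number as a placeholder for your own):

$ SPARK_HADOOP_VERSION=2.4.0 SPARK_YARN=true sbt/sbt clean assembly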

Run an Example

$ cd /usr/local/spark
$ ./bin/run-example org.apache.spark.examples.SparkPi 
...
Pi is roughly 3.1399
...
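
In the 1.0 version of the example, SparkPi also takes an optional argument for the number of slices (parallel tasks) to use; more slices generally gives a slightly better estimate. For example (the value below is just an illustration):

$ ./bin/run-example org.apache.spark.examples.SparkPi 10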

Run Spark Shell

$ ./bin/spark-shell

Try out some commands in the Spark shell:

scala> val textFile=sc.textFile("README.md")
textFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12
scala> textFile.count
res0: Long = 126
scala> textFile.filter(_.contains("the")).count
res1: Long = 28
scala> exit
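
If you want to try something a bit more interesting than counting lines, a classic word count also fits in a couple of shell commands. This is just a sketch using the standard RDD API (flatMap, map, reduceByKey, sortByKey); the exact output depends on your copy of README.md:

scala> val counts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> counts.map(_.swap).sortByKey(false).take(5).foreach(println)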

10 thoughts on “Building and running Spark 1.0 on Ubuntu”

  1. Pingback: Running Spark-1.0.0-SNAPSHOT on Hadoop/YARN 2.4.0 | Param Gyaan

  2. Hi, I hit an error when executing this command: $ sbt/sbt clean assembly. From the logs:
    # There is insufficient memory for the Java Runtime Environment to continue.
    # Native memory allocation (malloc) failed to allocate 1431699456 bytes for committing reserved memory.

    Could you advise how to overcome this?
    Thanks

    • Hi,

      It seems your system doesn’t have enough memory. How much RAM do you have? On my Mac, sbt easily uses more than 1 GB of RAM while building, so you might need to try this on another machine. If you’re using a VM, try allocating more memory to the instance.

      Thanks
      Param
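
      For example, with VirtualBox you can check how much memory the guest actually has and then grow it from the host (the VM name "mint16" below is just a placeholder for your own VM, and the VM has to be powered off first):

      $ free -m                                      # inside the guest: available RAM
      $ VBoxManage modifyvm "mint16" --memory 4096   # on the host: give the VM 4 GB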

  3. Hi, I hit the error below when executing sbt/sbt clean assembly. Could you assist? Thanks

    [warn] ::::::::::::::::::::::::::::::::::::::::::::::
    [warn] :: UNRESOLVED DEPENDENCIES ::
    [warn] ::::::::::::::::::::::::::::::::::::::::::::::
    [warn] :: org.scala-lang#scala-library;2.10.2: not found
    [warn] ::::::::::::::::::::::::::::::::::::::::::::::
    sbt.ResolveException: unresolved dependency: org.scala-lang#scala-library;2.10.2: not found

  4. Pingback: Spark : Futur of the past | BigData, Synthesis and Algorithmic

  5. Pingback: Spark : Futur of the past | BigData Synthesis and Algorithmic

  6. Hi, thanks for sharing, but it doesn’t build:
    [error] (streaming-kafka-assembly/*:assembly) java.util.zip.ZipException: duplicate entry: META-INF/MANIFEST.MF
    [error] (streaming-flume-sink/avro:generate) org.apache.avro.SchemaParseException: Undefined name: “strıng”
    [error] (assembly/*:assembly) java.util.zip.ZipException: duplicate entry: META-INF/MANIFEST.MF
    How can I solve this problem?

  7. org.apache.avro.SchemaParseException: Undefined name: “strıng”
    at org.apache.avro.Schema.parse(Schema.java:1075)
    at org.apache.avro.Schema.parse(Schema.java:1158)
    at org.apache.avro.Schema.parse(Schema.java:1116)
    at org.apache.avro.Protocol.parseTypes(Protocol.java:438)
    at org.apache.avro.Protocol.parse(Protocol.java:400)
    at org.apache.avro.Protocol.parse(Protocol.java:390)
    at org.apache.avro.Protocol.parse(Protocol.java:380)
    at sbtavro.SbtAvro$$anonfun$sbtavro$SbtAvro$$compile$2.apply(SbtAvro.scala:81)
    at sbtavro.SbtAvro$$anonfun$sbtavro$SbtAvro$$compile$2.apply(SbtAvro.scala:78)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at sbtavro.SbtAvro$.sbtavro$SbtAvro$$compile(SbtAvro.scala:78)
    at sbtavro.SbtAvro$$anonfun$sourceGeneratorTask$1$$anonfun$1.apply(SbtAvro.scala:112)
    at sbtavro.SbtAvro$$anonfun$sourceGeneratorTask$1$$anonfun$1.apply(SbtAvro.scala:111)
    at sbt.FileFunction$$anonfun$cached$1.apply(Tracked.scala:186)
    at sbt.FileFunction$$anonfun$cached$1.apply(Tracked.scala:186)
    at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3$$anonfun$apply$4.apply(Tracked.scala:200)
    at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3$$anonfun$apply$4.apply(Tracked.scala:196)
    at sbt.Difference.apply(Tracked.scala:175)
    at sbt.Difference.apply(Tracked.scala:157)
    at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3.apply(Tracked.scala:196)
    at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3.apply(Tracked.scala:195)
    at sbt.Difference.apply(Tracked.scala:175)
    at sbt.Difference.apply(Tracked.scala:151)
    at sbt.FileFunction$$anonfun$cached$2.apply(Tracked.scala:195)
    at sbt.FileFunction$$anonfun$cached$2.apply(Tracked.scala:193)
    at sbtavro.SbtAvro$$anonfun$sourceGeneratorTask$1.apply(SbtAvro.scala:114)
    at sbtavro.SbtAvro$$anonfun$sourceGeneratorTask$1.apply(SbtAvro.scala:108)
    at scala.Function5$$anonfun$tupled$1.apply(Function5.scala:35)
    at scala.Function5$$anonfun$tupled$1.apply(Function5.scala:34)
    at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
    at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
    at sbt.std.Transform$$anon$4.work(System.scala:63)
    at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
    at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
    at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
    at sbt.Execute.work(Execute.scala:235)
    at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
    at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
    at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:159)
    at sbt.CompletionService$$anon$2.call(CompletionService.scala:28)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
    [error] (streaming-flume-sink/avro:generate) org.apache.avro.SchemaParseException: Undefined name: “strıng”
    [error] Total time: 1555 s, completed 09.Tem.2015 09:32:57

    Any suggestions, please?
    Thank you.
