This article describes a step-by-step approach to building and running Apache Spark 1.0.0-SNAPSHOT. I personally use a virtual machine for testing out different big data software (Hadoop, Spark, Hive, etc.), and I've used Linux Mint 16 on VirtualBox 4.3.10 for this blog post.
Install JDK 7
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer
Verify the Java installation:
$ java -version
java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
Create a symlink for easier configuration later:
$ cd /usr/lib/jvm/
$ sudo ln -s java-7-oracle jdk
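One way to take advantage of the symlink is to point JAVA_HOME at it, so that a future JDK upgrade only requires updating the link. A minimal sketch of the lines you might add to ~/.bashrc, assuming the /usr/lib/jvm/jdk symlink created above (these lines are not part of the original steps):

```shell
# Hypothetical ~/.bashrc additions; assumes the /usr/lib/jvm/jdk symlink above.
export JAVA_HOME=/usr/lib/jvm/jdk
export PATH="$JAVA_HOME/bin:$PATH"
```

After sourcing the file, tools that respect JAVA_HOME will resolve the JDK through the symlink.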
Download Spark
Note: parambirs is my user name as well as my group name on the Ubuntu machine. Please replace it with your own user/group name.
$ cd ~/Downloads
$ git clone https://github.com/apache/spark.git
$ sudo mv spark /usr/local
$ cd /usr/local
$ sudo chown -R parambirs:parambirs spark
Build
$ cd /usr/local/spark
$ sbt/sbt clean assembly
Run an Example
$ cd /usr/local/spark
$ ./bin/run-example org.apache.spark.examples.SparkPi
...
Pi is roughly 3.1399
...
Run Spark Shell
$ ./bin/spark-shell
Try out some commands in the Spark shell:
scala> val textFile = sc.textFile("README.md")
textFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12
scala> textFile.count
res0: Long = 126
scala> textFile.filter(_.contains("the")).count
res1: Long = 28
scala> exit
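The two counts above can be sanity-checked outside Spark with ordinary shell tools, since they are just "lines in a file" and "lines containing a substring". A minimal sketch on a hypothetical sample file (the file name and its contents are made up for illustration; they are not the Spark README):

```shell
# Create a small sample file (hypothetical contents, for illustration only).
printf 'Apache Spark\nthe fast engine\nfor the JVM\n' > sample.txt

# Analogous to textFile.count: the number of lines in the file.
wc -l < sample.txt

# Analogous to textFile.filter(_.contains("the")).count:
# the number of lines containing the substring "the".
grep -c 'the' sample.txt
```

Running the same two commands against README.md in the Spark source tree should reproduce the 126 and 28 reported by the shell session above.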
Pingback: Running Spark-1.0.0-SNAPSHOT on Hadoop/YARN 2.4.0 | Param Gyaan
Hi, I hit an error when executing this command: $ sbt/sbt clean assembly. From the logs:
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 1431699456 bytes for committing reserved memory.
Any advice on how to overcome this?
Thanks
Hi,
It seems your system doesn't have enough memory. How much RAM do you have? On my Mac, sbt easily uses more than 1 GB of RAM during the build, so you might need to try this on a machine with more memory. If you're using a VM, try allocating more RAM to the instance.
Thanks
Param
Running $ sbt/sbt clean assembly -mem 512 should do the trick.
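An alternative to the -mem flag is to cap the JVM heap through an environment variable before building. This is a sketch under the assumption that the sbt/sbt wrapper honours SBT_OPTS the way the standard sbt launcher script does; the 512 MB values are illustrative, not a recommendation:

```shell
# Assumes the sbt launcher reads SBT_OPTS (true of the standard sbt script;
# verify for your sbt/sbt wrapper). Heap sizes here are illustrative.
export SBT_OPTS="-Xms512m -Xmx512m"
sbt/sbt clean assembly
```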
Hi, I hit the error below when executing sbt/sbt clean assembly. Mind assisting? Thanks.
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: UNRESOLVED DEPENDENCIES ::
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: org.scala-lang#scala-library;2.10.2: not found
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
sbt.ResolveException: unresolved dependency: org.scala-lang#scala-library;2.10.2: not found
Pingback: Spark : Futur of the past | BigData, Synthesis and Algorithmic
Thanks for sharing these clear and uncluttered steps! Worked great for me on a new Ubuntu 14.04 LTS VM instance.
Hi, thanks for sharing, but it doesn't build:
[error] (streaming-kafka-assembly/*:assembly) java.util.zip.ZipException: duplicate entry: META-INF/MANIFEST.MF
[error] (streaming-flume-sink/avro:generate) org.apache.avro.SchemaParseException: Undefined name: "strıng"
[error] (assembly/*:assembly) java.util.zip.ZipException: duplicate entry: META-INF/MANIFEST.MF
How can I solve this problem?
org.apache.avro.SchemaParseException: Undefined name: "strıng"
at org.apache.avro.Schema.parse(Schema.java:1075)
at org.apache.avro.Schema.parse(Schema.java:1158)
at org.apache.avro.Schema.parse(Schema.java:1116)
at org.apache.avro.Protocol.parseTypes(Protocol.java:438)
at org.apache.avro.Protocol.parse(Protocol.java:400)
at org.apache.avro.Protocol.parse(Protocol.java:390)
at org.apache.avro.Protocol.parse(Protocol.java:380)
at sbtavro.SbtAvro$$anonfun$sbtavro$SbtAvro$$compile$2.apply(SbtAvro.scala:81)
at sbtavro.SbtAvro$$anonfun$sbtavro$SbtAvro$$compile$2.apply(SbtAvro.scala:78)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at sbtavro.SbtAvro$.sbtavro$SbtAvro$$compile(SbtAvro.scala:78)
at sbtavro.SbtAvro$$anonfun$sourceGeneratorTask$1$$anonfun$1.apply(SbtAvro.scala:112)
at sbtavro.SbtAvro$$anonfun$sourceGeneratorTask$1$$anonfun$1.apply(SbtAvro.scala:111)
at sbt.FileFunction$$anonfun$cached$1.apply(Tracked.scala:186)
at sbt.FileFunction$$anonfun$cached$1.apply(Tracked.scala:186)
at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3$$anonfun$apply$4.apply(Tracked.scala:200)
at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3$$anonfun$apply$4.apply(Tracked.scala:196)
at sbt.Difference.apply(Tracked.scala:175)
at sbt.Difference.apply(Tracked.scala:157)
at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3.apply(Tracked.scala:196)
at sbt.FileFunction$$anonfun$cached$2$$anonfun$apply$3.apply(Tracked.scala:195)
at sbt.Difference.apply(Tracked.scala:175)
at sbt.Difference.apply(Tracked.scala:151)
at sbt.FileFunction$$anonfun$cached$2.apply(Tracked.scala:195)
at sbt.FileFunction$$anonfun$cached$2.apply(Tracked.scala:193)
at sbtavro.SbtAvro$$anonfun$sourceGeneratorTask$1.apply(SbtAvro.scala:114)
at sbtavro.SbtAvro$$anonfun$sourceGeneratorTask$1.apply(SbtAvro.scala:108)
at scala.Function5$$anonfun$tupled$1.apply(Function5.scala:35)
at scala.Function5$$anonfun$tupled$1.apply(Function5.scala:34)
at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
at sbt.std.Transform$$anon$4.work(System.scala:63)
at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:226)
at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
at sbt.Execute.work(Execute.scala:235)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:226)
at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:159)
at sbt.CompletionService$$anon$2.call(CompletionService.scala:28)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[error] (streaming-flume-sink/avro:generate) org.apache.avro.SchemaParseException: Undefined name: "strıng"
[error] Total time: 1555 s, completed 09.Tem.2015 09:32:57
Any suggestions, please?
Thank you.