I'm running this simple program on a 16-core multicore system. I launch it
by issuing the following command.
spark-submit --master local[*] pi.py
And the code of that program is the following.
    """pi.py"""
    from pyspark import SparkContext
    import random

    N = 12500000

    def sample(p):
        x, y = random.random(), random.random()
        return 1 if x*x + y*y < 1 else 0

    sc = SparkContext("local", "Test App")
    count = sc.parallelize(xrange(0, N)).map(sample) \
              .reduce(lambda a, b: a + b)
    print "Pi is roughly %f" % (4.0 * count / N)
When I use top to watch CPU consumption, only 1 core is being utilized. Why
is that? Secondly, the Spark documentation says that the default level of
parallelism is controlled by the property spark.default.parallelism. How can
I read this property from within my program?