I have an issue similar to many other posted questions regarding PySpark, but none of those solutions seem applicable to my problem, so I'm posting a new question.
I'm following this tutorial https://github.com/emiljdd/Tweepy-SparkTwitterI, but can't seem to get step 7 in Phase II to work.
Running this code:
count = 0
while count < 10:
time.sleep( 3 )
top_10_tweets = sqlContext.sql( 'Select tag, count from tweets' )
top_10_df = top_10_tweets.toPandas() # Dataframe library
display.clear_output(wait=True) #Clears the output, if a plot exists.
sns.plt.figure( figsize = ( 10, 8 ) )
sns.barplot( x="count", y="tag", data=top_10_df)
sns.plt.show()
count = count + 1
I get the following error:
Py4JJavaError: An error occurred while calling o24.sql.
: org.apache.spark.sql.AnalysisException: Table or view not found: tweets; line 1 pos 23
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:47)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:665)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:617)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:647)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:640)...
I can of course post more code if it would help, but I really am just following the tutorial without any changes.
The streaming-setup from Phase I seems fine, as I can see the Tweets being sent.
Any suggestions?
Thanks!