akaanirban/KafkaSparkEdge

A container for deploying Apache Kafka+Spark on the edge

All Python files are in /PythonCodes

Start the ZooKeeper and Kafka servers as usual.

To start the dummy producer, first run pip3 install kafka-python to install the necessary Python library, then run:

python3 PythonCodes/dummyProducer.py localhost:9092 <topicname> <filename>
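
The contents of dummyProducer.py are not shown here. As a rough sketch of what such a producer could look like with kafka-python, the following reads a file and publishes each non-empty line as one message. The function names `encode_lines` and `publish_file`, and the one-line-per-message behavior, are assumptions made for illustration and may differ from the actual script.

```python
import sys


def encode_lines(lines):
    """Strip line endings and skip blank lines, yielding UTF-8 payloads."""
    for line in lines:
        text = line.rstrip("\r\n")
        if text:
            yield text.encode("utf-8")


def publish_file(bootstrap_server, topic, filename):
    """Send each non-empty line of `filename` to `topic` (hypothetical helper)."""
    # Imported here so encode_lines works even without kafka-python installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=bootstrap_server)
    sent = 0
    with open(filename, encoding="utf-8") as handle:
        for payload in encode_lines(handle):
            producer.send(topic, payload)
            sent += 1
    producer.flush()  # wait for buffered messages to be sent
    producer.close()
    return sent


# Same argument order as the command above: <bootstrap_server> <topicname> <filename>
if __name__ == "__main__" and len(sys.argv) == 4:
    count = publish_file(sys.argv[1], sys.argv[2], sys.argv[3])
    print("sent %d messages" % count)
```

Keeping the encoding step separate from the Kafka call makes it possible to check the message payloads without a running broker.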

Once that is done, run the Spark Streaming program with:

spark-submit --jars PythonCodes/spark-streaming-kafka-0-8-assembly_2.11-2.0.0-preview.jar PythonCodes/pythonSpark.py localhost:2181 <topicname>

or to redirect the output to a text file:

spark-submit --jars PythonCodes/spark-streaming-kafka-0-8-assembly_2.11-2.0.0-preview.jar PythonCodes/pythonSpark.py localhost:2181 <topicname> > output.txt
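
For orientation, a minimal sketch of a Spark Streaming job that consumes a Kafka topic through ZooKeeper is shown below. It assumes Spark 2.x with the Kafka 0-8 assembly jar on the classpath, which provides `pyspark.streaming.kafka.KafkaUtils`. The group id `sketch-consumer-group`, the 2-second batch interval, and the choice to simply print message values are illustrative assumptions, not a description of what pythonSpark.py actually does.

```python
import sys


def parse_args(argv):
    """Return (zookeeper_quorum, topic) from argv, e.g. localhost:2181 mytopic."""
    if len(argv) != 3:
        raise ValueError("usage: consumer_sketch.py <zookeeper:port> <topicname>")
    return argv[1], argv[2]


def run(zk_quorum, topic, batch_seconds=2):
    # Imported lazily: these modules need Spark 2.x plus the 0-8 assembly jar.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="KafkaStreamSketch")
    ssc = StreamingContext(sc, batch_seconds)
    # Receiver-based stream; the group id and one thread per topic are assumed.
    stream = KafkaUtils.createStream(ssc, zk_quorum, "sketch-consumer-group", {topic: 1})
    # Each record is a (key, value) pair; print a sample of values per batch.
    stream.map(lambda record: record[1]).pprint()
    ssc.start()
    ssc.awaitTermination()


if __name__ == "__main__" and len(sys.argv) == 3:
    run(*parse_args(sys.argv))
```

Because the job prints to standard output, redirecting with `> output.txt` as in the command above captures the printed batches in a file.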

TODO:

  • [] Add the jar to the Docker container
  • [] Install pip3 for Python 3 and kafka-python in the Docker container

Python Dependencies to install

  • pip3 install tweepy
  • pip3 install kafka-python
  • pip3 install python-twitter
  • pip3 install pyspark

To run locally, enter bootstrap_server=localhost:9092 for the Kafka server address and zookeeper:port=localhost:2181 for the ZooKeeper address.
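
Before starting the producer or the Spark job locally, it can help to confirm that both services are listening on those addresses. The following is a small sketch using only the Python standard library; the helper names `parse_endpoint` and `is_reachable` are hypothetical and not part of this repository.

```python
import socket


def parse_endpoint(endpoint):
    """Split 'host:port' into (host, int(port))."""
    host, _, port = endpoint.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError("expected host:port, got %r" % endpoint)
    return host, int(port)


def is_reachable(endpoint, timeout=1.0):
    """Return True if a TCP connection to the endpoint can be opened."""
    host, port = parse_endpoint(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Default local addresses from the note above.
    for name, endpoint in [("kafka", "localhost:9092"), ("zookeeper", "localhost:2181")]:
        state = "reachable" if is_reachable(endpoint) else "not reachable"
        print(name, endpoint, state)
```

A successful TCP connection only shows that something is listening on the port; it does not verify that the service is a healthy Kafka broker or ZooKeeper node.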
