KafkaSparkEdge

A container for deploying Apache Kafka and Spark on the edge

Helpful Docker tips

Find the container ID with docker ps

Copy a file into the container with docker cp <local file> <container id>:<container filepath>

Open another terminal inside the container: docker exec -it <container id> bash

You may need to run these commands as root.

Cleanup

Delete all containers: docker rm $(docker ps -a -q)

Delete all images: docker rmi $(docker images -q)

Running Kafka with Spark Streaming

Download: docker pull ferria/kafkaspark

Run: docker run -p 2181:2181 -p 9092:9092 -it ferria/kafkaspark

Leave this terminal running and open two more terminals side by side with docker exec -it <container id> bash.

Ports

ZooKeeper: 2181

Kafka: 9092

Shell Scripts

Terminal producer: ./produce.sh <topic>

Terminal consumer: ./consume.sh <topic>

Run a program with Spark Streaming: ./run.sh <program> <args...>

Python Scripts

Spark Streaming (Consumer)

Word Count: wc.py localhost:2181 <topic> (a minimal sketch of such a consumer follows this list)

Top Hashtags: state-tweet-count.py localhost:2181 <topic>

User Count Demo: state-user-demo.py localhost:2181 <topic> <username>

  • A completed demo is available in state-user-count.py

Realtime Twitter Stream: realtimeTwitterStream.py

The above scripts are passed to run.sh. For example: ./run.sh wc.py localhost:2181 <topic>
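For reference, here is a minimal sketch of what a consumer like wc.py could look like. It is not the repository's script: it assumes the ZooKeeper-based KafkaUtils.createStream receiver from the spark-streaming-kafka-0-8 package (which matches the localhost:2181 argument), a 5-second batch interval, and an arbitrary consumer group name.

```python
# Hedged sketch only; not the repo's wc.py.
import sys

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

if __name__ == "__main__":
    zk_quorum, topic = sys.argv[1], sys.argv[2]   # e.g. localhost:2181 <topic>

    sc = SparkContext(appName="KafkaWordCount")
    ssc = StreamingContext(sc, 5)                 # 5-second micro-batches

    # Receiver-based stream via ZooKeeper; messages arrive as (key, value) pairs.
    stream = KafkaUtils.createStream(ssc, zk_quorum, "wc-consumer", {topic: 1})
    lines = stream.map(lambda kv: kv[1])

    counts = (lines.flatMap(lambda line: line.split(" "))
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()
```

A script like this would be launched through the wrapper, e.g. ./run.sh wc.py localhost:2181 <topic>, with run.sh presumably supplying the spark-submit options (such as the Kafka streaming package) that a bare python invocation would lack.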

Producer

Tweeter: python tweeter.py <topic> <include username?>
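The real tweeter.py is not reproduced here; the sketch below only illustrates the general shape of such a producer, assuming the kafka-python package, the broker at localhost:9092 from the Ports section, a placeholder message body, and a simple yes/no reading of the <include username?> flag.

```python
# Hedged sketch only; not the repo's tweeter.py.
import random
import sys
import time

from kafka import KafkaProducer

# Demo usernames listed in the section below.
USERNAMES = ["Emma", "Noah", "Olivia", "Liam", "Ava", "William"]

if __name__ == "__main__":
    topic = sys.argv[1]
    include_username = len(sys.argv) > 2 and sys.argv[2].lower() in ("y", "yes", "true", "1")

    producer = KafkaProducer(bootstrap_servers="localhost:9092")

    # Publish one placeholder message per second; the real script would
    # stream actual tweet text instead.
    while True:
        tweet = "hello from the edge #kafka #spark"
        if include_username:
            tweet = "{}: {}".format(random.choice(USERNAMES), tweet)
        producer.send(topic, tweet.encode("utf-8"))
        time.sleep(1)
```

When the username flag is enabled, each message carries one of the demo usernames below, which is presumably what the user-count consumer scripts key on.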

Usernames for Demo

  • Emma
  • Noah
  • Olivia
  • Liam
  • Ava
  • William
