Scalable Distributed Systems: Tutorial
Miguel Carcamo Vasquez, Daniel Wladdimiro Cottet
Professors: Erika Rosas Olivos, Nicolas Hidalgo Castillo
Departamento de Ingeniería Informática, Universidad de Santiago de Chile
November, 2014
M. Carcamo & D. Wladdimiro (USACH) Kafka & Storm November, 2014 1 / 31
Kafka: What is Kafka?
Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.
• Fast
  • A single broker can handle hundreds of megabytes of reads and writes per second
• Scalable
  • Expands elastically and transparently, with no downtime
• Durable
  • Messages are persisted on disk
Kafka: Architecture
It is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.
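As an illustration of the commit-log idea (not the Kafka API), the sketch below models a single topic partition: records are only ever appended, each record gets a sequential offset, and every consumer tracks its own read position, so slow consumers never block fast ones and nothing is removed when read.

```python
# Illustrative model of one Kafka topic partition as an append-only
# commit log. In real Kafka the records are persisted on disk; here
# a plain list stands in for the log segment.

class PartitionLog:
    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record and return the offset assigned to it."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset):
        """Reading is non-destructive: any consumer can re-read any offset."""
        return self._records[offset]

log = PartitionLog()
for msg in ["a", "b", "c"]:
    log.append(msg)

# Two consumers hold independent offsets into the same log.
consumer_a, consumer_b = 0, 0
assert log.read(consumer_a) == "a"
consumer_a += 1                       # consumer A advances
assert log.read(consumer_a) == "b"
assert log.read(consumer_b) == "a"    # consumer B still sees the first record
```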
Kafka: Architecture
A two-server Kafka cluster hosting four partitions (P0-P3) with two consumer groups. Consumer group A has two consumer instances and group B has four.
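The scenario above can be sketched in a few lines. The round-robin assignment below is a simplification (real Kafka assignment is negotiated through the group coordinator), but it shows the key invariant: within a group, each partition is consumed by exactly one instance, so a group with as many instances as partitions gets one partition each.

```python
# Simplified model of partition assignment inside a consumer group.
# Real Kafka uses a coordinator and pluggable assignors; round-robin
# here is just enough to illustrate the one-partition-one-consumer rule.

def assign(partitions, consumers):
    mapping = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        mapping[consumers[i % len(consumers)]].append(p)
    return mapping

partitions = ["P0", "P1", "P2", "P3"]
group_a = assign(partitions, ["C1", "C2"])              # two instances
group_b = assign(partitions, ["C3", "C4", "C5", "C6"])  # four instances

print(group_a)  # {'C1': ['P0', 'P2'], 'C2': ['P1', 'P3']}
print(group_b)  # {'C3': ['P0'], 'C4': ['P1'], 'C5': ['P2'], 'C6': ['P3']}
```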
Kafka: ZooKeeper
zookeeperServer.sh
bin/zookeeper-server-start.sh ../config/zookeeper.properties
Configuration
• dataDir
• clientPort
• maxClientCnxns
Kafka: Kafka Server
kafkaServer.sh
bin/kafka-server-start.sh ../config/server.properties
Mandatory configuration
• broker.id
• log.dirs
• zookeeper.connect
Optional configuration
• Log basics
• num.partitions
• Log Retention Policy
• log.retention.hours
• log.flush.interval.messages
• log.flush.interval.ms
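As a sketch, the optional settings above might appear in server.properties like this (the values are illustrative examples, not recommendations):

```properties
# Keep log segments for 7 days before deleting them
log.retention.hours=168
# Force a flush to disk after this many messages...
log.flush.interval.messages=10000
# ...or after this much time, whichever comes first
log.flush.interval.ms=1000
# Default number of partitions for new topics
num.partitions=2
```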
Kafka: Create Topics
createTopics.sh
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic $1
Parameters
• replication-factor
• partitions
• topic
Configuration –config
• max.message.bytes
• index.interval.bytes
• flush.messages
• flush.ms
Kafka: Check Topics
checkTopics.sh
bin/kafka-topics.sh --list --zookeeper localhost:2181
Kafka: Producer
createProducer.sh
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic $1
Mandatory configuration
• metadata.broker.list
• request.required.acks
• producer.type
• serializer.class
Optional configuration
• compression.codec
• request.timeout.ms
Kafka: Consumer
createConsumer.sh
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic $1 --from-beginning
Mandatory configuration
• group.id
• zookeeper.connect
Optional configuration
• fetch.message.max.bytes
• consumer.id
Kafka: Clients
• Producer Daemon
• Storm
• Python
• Scala DSL
• Go (AKA golang)
• HTTP REST
• C
• JRuby
• C++
• Perl
• .NET
• Clojure
• Ruby
• Node.js
Kafka: Multi-Broker
createMultiBroker.sh
cp config/server.properties config/server-1.properties
cp config/server.properties config/server-2.properties
config/server-1.properties:
broker.id=1
port=9093
log.dir=/tmp/kafka-logs-1
config/server-2.properties:
broker.id=2
port=9094
log.dir=/tmp/kafka-logs-2
Kafka: Create Kafka Server
Kafka Server 1
../bin/kafka-server-start.sh config/server-1.properties &
Kafka Server 2
../bin/kafka-server-start.sh config/server-2.properties &
Topic with replication
Create new topic
../bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic my-replicated-topic
Show topic
../bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Fault Tolerance
Kill replication
ps -ef | grep server-1.properties
kill -9 <pid>
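Why does killing a broker not lose the topic? Each partition has a leader plus in-sync replicas on other brokers, and when the leader dies a surviving replica takes over. The sketch below is a conceptual model of that failover, not Kafka's actual election protocol (which runs through ZooKeeper and the controller).

```python
# Conceptual model of leader failover for one replicated partition.
# The replica list names the brokers holding copies; election simply
# means promoting a surviving replica when the current leader dies.

def elect_leader(replicas, alive):
    """Return the first replica that is still alive, or None if the
    whole replica set is down (the partition becomes unavailable)."""
    for broker in replicas:
        if broker in alive:
            return broker
    return None

replicas = [1, 2, 0]      # brokers holding copies, in preference order
alive = {0, 1, 2}
assert elect_leader(replicas, alive) == 1   # broker 1 currently leads

alive.discard(1)          # kill -9 the leader, as above
assert elect_leader(replicas, alive) == 2   # a surviving replica takes over
```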
Storm
What is Storm?
• Computation platform for stream data processing
• Fault tolerant
• Scalable
• Distributed
• Reliable
• Learn, code and run
Architecture
Fig. 1: Storm Cluster
Spouts & Bolts
Fig. 2: Spouts & Bolts
Physical & Logical
Fig. 3: Physical & Logical Architecture
Before coding
• Install Maven or Gradle
• Install Eclipse (optional)
Coding a Spout
Structure
• import libraries
• public class "SpoutName" extends BaseRichSpout
• class variables
• public void open(Map conf, TopologyContext topologyContext, SpoutOutputCollector collector)
• public void nextTuple()
• public void declareOutputFields(OutputFieldsDeclarer declarer)
• Your methods
Coding a Bolt
Structure
• import libraries
• public class "BoltName" extends BaseRichBolt
• class variables
• public "BoltName"() (constructor)
• public void prepare(Map map, TopologyContext topologyContext, OutputCollector collector)
• public void execute(Tuple input)
• public void declareOutputFields(OutputFieldsDeclarer declarer)
• Your methods
Coding a Topology
Structure
• import libraries
• public class Topology
• class variables
• public static void main(String[] args)
  • Config config = new Config()
  • TopologyBuilder b = new TopologyBuilder()
  • b.setSpout("SpoutName", new SpoutName())
  • b.setBolt("BoltName", new BoltName()).shuffleGrouping("SpoutName")
  • final LocalCluster cluster = new LocalCluster()
  • cluster.submitTopology("TopologyName", config, b.createTopology())
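Real Storm topologies are written in Java against the classes above; as a conceptual aid only, the sketch below mirrors the same spout -> bolt -> bolt flow in plain Python, with a generator playing the role of nextTuple() and functions playing the role of execute(). The word-count pipeline is a hypothetical example, not from the tutorial's project.

```python
# Plain-Python model of a spout -> splitter bolt -> counter bolt chain.
# No Storm semantics (tasks, acking, groupings) are modeled here, only
# the flow of tuples through the components.

def sentence_spout():
    """Stands in for nextTuple(): emits a stream of sentence tuples."""
    for sentence in ["to be or not to be"]:
        yield sentence

def split_bolt(sentence):
    """Stands in for execute(Tuple input): emits one word per tuple."""
    return sentence.split()

def count_bolt(words):
    """Aggregates word counts, like a counting bolt would."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts

# "Wiring" analogous to setSpout/setBolt in the TopologyBuilder:
for tup in sentence_spout():
    counts = count_bolt(split_bolt(tup))

print(counts)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```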
Compile & Run
• Download a Storm release, unpack it, and put the unpacked bin/ directory on your PATH.
• cd myapp
• mvn package
• storm jar target/my-app-1.0-SNAPSHOT.jar com.mycompany.app.App
Grouping
• Shuffle: stream tuples are randomly distributed so that each bolt task is guaranteed to receive a roughly equal number of tuples.
• Fields: stream tuples are partitioned by the fields specified in the grouping; tuples with the same field values always go to the same task.
• All grouping: stream tuples are replicated across all the bolt tasks.
• Global grouping: the entire stream goes to a single bolt task.
• Direct grouping: the source decides which component will receive the tuple.
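The difference between the first two groupings can be sketched in a few lines. This is not Storm's internal hashing, just the idea: fields grouping routes by a hash of the grouped field, so equal values always reach the same task, while shuffle grouping only promises an even spread.

```python
# Illustrative routing functions for two Storm groupings.
import random

def fields_grouping(field_value, n_tasks):
    """Deterministic: the same field value always maps to the same task."""
    return hash(field_value) % n_tasks

def shuffle_grouping(n_tasks):
    """Random: tuples are spread evenly, with no per-value guarantee."""
    return random.randrange(n_tasks)

n_tasks = 4
words = ["storm", "kafka", "storm", "storm"]
tasks = [fields_grouping(w, n_tasks) for w in words]

# Every occurrence of "storm" is routed to the same task:
assert tasks[0] == tasks[2] == tasks[3]

# Shuffle grouping only guarantees a valid, evenly distributed target:
assert 0 <= shuffle_grouping(n_tasks) < n_tasks
```

This determinism is why a word-count bolt must use fields grouping on the word: with shuffle grouping, counts for the same word would be scattered across tasks.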
Project Topology
Fig. 5: Project Topology
Web Services: Node.js
Install Node.js
https://github.com/joyent/node/archive/master.zip
./configure
make
make install
Run web services
node server.js
Kafka: Server Start
Stages
1 zookeeperServer.sh
2 kafkaServer.sh
3 createTopics.sh voteLog
Web Services: Kafka Connection
Install API Kafka-Python
pip install ./kafka-python
runKafkaLogs.sh
./tail2kafka/tail2kafka -l ../logs/vote-info.log -t voteLog -s localhost -p 9092 -d 5
Final stage
createProducer.sh voteLog