Zookeeper

Deepak Pant
Jan 19, 2020

Coordination service, service discovery, leader election (multi-master)

In a classical distributed system, the master is a single point of failure (e.g., GFS). Zookeeper is one way of making such a system fault tolerant (HBase does it this way).

The coordination service can be imagined as a Zookeeper cluster.

Leader election: choosing between the primary and its backups.

Dynamic configuration: for service discovery, <service_name> → IP:Port

Status monitoring: monitor processes and see who is dead.

Critical sections: deciding which process enters the critical section.

Zookeeper gives you a basic set of APIs to build all of the above features. It does not give you these features out of the box; you have to build them using its API. It is not a database, so don't use it as one! It holds only a very small amount of data.
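For instance, the classic leader-election recipe can be built from ephemeral sequential znodes. Here is a minimal sketch in Java, assuming a hypothetical /election parent znode that already exists (the connection string is a placeholder too):

```java
import org.apache.zookeeper.*;
import java.util.Collections;
import java.util.List;

public class LeaderElection {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string; assumes /election already exists.
        ZooKeeper zk = new ZooKeeper("zk1:2181", 5000, event -> {});

        // Each candidate creates an ephemeral, sequential znode.
        String me = zk.create("/election/n_", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

        List<String> children = zk.getChildren("/election", false);
        Collections.sort(children);

        if (me.endsWith(children.get(0))) {
            System.out.println("I am the leader");
        } else {
            // Watch only the candidate just ahead of us to avoid a herd effect.
            int idx = children.indexOf(me.substring("/election/".length()));
            zk.exists("/election/" + children.get(idx - 1),
                    event -> System.out.println("predecessor gone; re-check"));
        }
    }
}
```

The candidate holding the lowest sequence number is the leader; everyone else watches only the node just ahead of them, so a leader's death wakes exactly one successor.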

Where to use Zookeeper: where the data is small, performance sensitive, dynamic (it keeps changing), and critical for your application's uptime.

One such use case is configuration data about which service lives where.

You need an odd number of Zookeeper nodes

You need an odd number of Zookeeper servers (>1) to be fault tolerant; the odd number is needed for quorum. With 2f+1 servers, the ensemble tolerates f failures (e.g., 5 servers tolerate 2). The client library connects to the Zookeeper service, and a client can connect to any Zookeeper server [no central bottleneck].
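For concreteness, a minimal zoo.cfg for a three-server ensemble might look like the following; the hostnames and dataDir are placeholders:

```
# basic time unit in milliseconds
tickTime=2000
# where snapshots and the transaction log are stored (placeholder path)
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=10
syncLimit=5
# the ensemble: each server gets an id plus peer and election ports
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```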

Zookeeper stores things like a file system

Imagine it as an in-memory DB whose changes get logged to disk. Everything is a path (e.g., /app1/config), with a file-system-like API.

Connecting a client to Zookeeper ==> Creating a Session

When you create a client (where you want to use the failover feature, like multiple GFS masters), it creates a session with ZK. The client then creates an ephemeral node. A persistent znode stays put; ephemeral nodes come and go as the client joins or dies.

Zookeeper uses a configurable timeout (specified when you create the session). If Zookeeper does not hear from the client within that time window, it assumes the client is dead and removes the ephemeral node.
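A minimal sketch of creating a session and an ephemeral znode with the Java client; the connection string, path, and address data are all placeholders:

```java
import org.apache.zookeeper.*;

public class SessionDemo {
    public static void main(String[] args) throws Exception {
        // Connect to any server in the ensemble; 5000 ms session timeout.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 5000,
                event -> System.out.println("session event: " + event));

        // An ephemeral znode: removed automatically if this session
        // times out or the client disconnects.
        zk.create("/live-master-gfs-1", "10.0.0.5:9000".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        Thread.sleep(Long.MAX_VALUE); // keep the session (and znode) alive
    }
}
```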

Notification service: if anything changes, a notification is sent to the clients.

Notifications are the key, and they don't block writes.
Updates go through one leader (serialized through the leader). The leader only responds to the client once a majority of servers have acknowledged.
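Watches are one-shot, so a client that wants continuous notifications must re-register the watch each time it fires. A small sketch, assuming a hypothetical /config znode and an already-connected handle:

```java
import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;

public class ConfigWatcher implements Watcher {
    private final ZooKeeper zk;

    ConfigWatcher(ZooKeeper zk) throws Exception {
        this.zk = zk;
        read(); // set the initial watch
    }

    private void read() throws Exception {
        Stat stat = new Stat();
        // Passing 'this' as the watcher re-arms the one-shot watch.
        byte[] data = zk.getData("/config", this, stat);
        System.out.println("config v" + stat.getVersion() + ": " + new String(data));
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDataChanged) {
            try { read(); } catch (Exception e) { e.printStackTrace(); }
        }
    }
}
```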

Zookeeper API

create: creates a path; the data should not be too big. A flag tells whether the node is persistent or ephemeral.

getData: gets the data at a path, and can set a watch for notification.

exists: checks whether a path exists; with a watch set, "if the path comes up, notify me."

getChildren: watches for child nodes being added or removed; used for group membership / elastic expansion of the cluster.

sync: used when clients communicate with each other directly (without involving the master); it ensures a subsequent read reflects all committed updates.

multi: updates multiple paths atomically.
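A short tour of these calls in Java; the /app paths and payloads are hypothetical, and zk is a connected handle as in the session example above:

```java
import org.apache.zookeeper.*;
import java.util.Arrays;
import java.util.List;

public class ApiTour {
    static void tour(ZooKeeper zk) throws Exception {
        // exists: returns null if the path is absent; the watch fires when it appears.
        if (zk.exists("/app", event -> System.out.println("/app: " + event)) == null) {
            zk.create("/app", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
                    CreateMode.PERSISTENT);
        }

        // getChildren: list members and watch for joins/leaves.
        List<String> members = zk.getChildren("/app",
                event -> System.out.println("membership changed: " + event));
        System.out.println("members: " + members);

        // multi: both operations commit atomically, or neither does.
        zk.multi(Arrays.asList(
                Op.create("/app/config", "v1".getBytes(),
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT),
                Op.setData("/app", "updated".getBytes(), -1)));
    }
}
```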

Use case #1. Config storage (service directory)

Yahoo search stores its shard metadata in Zookeeper.

It can also be used for adding new nodes to the cluster; a sketch of a simple service directory follows.
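A minimal service-directory sketch, assuming a hypothetical /services/search parent znode: each instance registers itself as an ephemeral node holding its address, and consumers discover live instances with getChildren.

```java
import org.apache.zookeeper.*;
import java.util.List;

public class ServiceDirectory {
    // Register this instance; the ephemeral node vanishes if we die.
    static String register(ZooKeeper zk, String hostPort) throws Exception {
        return zk.create("/services/search/instance-", hostPort.getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    }

    // Discover live instances and watch for membership changes.
    static List<String> discover(ZooKeeper zk) throws Exception {
        return zk.getChildren("/services/search",
                event -> System.out.println("instances changed: " + event));
    }
}
```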

The number of requests coming from clients needs to be handled so that they are served fairly.

Upcoming:

Making notification subscriptions permanent; streaming updates on a znode.

Session creation is a quorum protocol, which makes session creation heavy.

The local-session concept was added by FB: thousands of clients coming up create local sessions (not global ones).

In Practice

How Zookeeper is configured in multi-master mode (Spark + Zookeeper)

A Spark Standalone cluster is a master-slave architecture, so a single standalone master is a single point of failure. To remedy this, we will use one of the available HA options: Standby Masters with ZooKeeper.

Zookeeper provides the mechanism for leader election: you can configure multiple masters in the cluster for HA, all connected to the same Zookeeper instance. One master instance takes the role of active master and the others stand by. If the current master dies, Zookeeper elects one of the standby instances as master, recovers the old master's state, and then resumes scheduling. The process usually takes around 1–2 minutes. Since the information has been persisted to the filesystem, including Worker, Driver, and Application information, this only impacts the scheduling of new jobs; currently running jobs are unaffected.

Configuration

First we need an established Zookeeper cluster. Start the Spark Master on multiple nodes and ensure that these nodes have the same Zookeeper configuration (ZooKeeper URL and directory). Masters can be added or removed at any time. If a failover of the active master occurs, the newly elected master contacts all previously registered Applications and Workers to inform them that a new master has been elected, and they re-register with it.
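Concretely, each master is started with recovery mode set to ZooKeeper, e.g. via SPARK_DAEMON_JAVA_OPTS in conf/spark-env.sh (the ZooKeeper hosts and the znode directory below are placeholders):

```
# conf/spark-env.sh on every master node
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"
```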
