Cassandra Installation and Cluster Setup

Cassandra Introduction

Apache Cassandra is a free and open-source distributed NoSQL database management system. It is designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.

Cassandra Prerequisites

Minimum hardware requirements:

CPU: 2 cores

RAM: 8 GB

Software requirements:

Java 8
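
Before installing, it can help to confirm that a server meets these minimums. The following is a minimal sketch, assuming a Linux host. The names java_major and meets_minimums are hypothetical helpers introduced here for illustration; they are not part of Cassandra or Java.

```shell
# Hypothetical helper: extract the major Java version from a version string.
# "1.8.0_292" -> 8 (legacy 1.x numbering), "11.0.2" -> 11.
java_major() {
  case "$1" in
    1.*) v=${1#1.}; printf '%s\n' "${v%%.*}" ;;
    *)   printf '%s\n' "${1%%.*}" ;;
  esac
}

# Hypothetical helper: succeed only if the values meet the minimums above
# (2 cores, 8 GB RAM, Java 8).
# Usage: meets_minimums <cores> <ram_gb> <java_version_string>
meets_minimums() {
  cores=$1
  ram_gb=$2
  java=$3
  [ "$cores" -ge 2 ] && [ "$ram_gb" -ge 8 ] && [ "$(java_major "$java")" -eq 8 ]
}

# Example (assumes nproc, /proc/meminfo, and java are available on the host):
#   cores=$(nproc)
#   ram_gb=$(awk '/MemTotal/ {print int($2/1024/1024 + 0.5)}' /proc/meminfo)
#   java=$(java -version 2>&1 | awk -F'"' '/version/ {print $2}')
#   meets_minimums "$cores" "$ram_gb" "$java" && echo "OK" || echo "Below minimums"
```

The RAM figure is rounded to the nearest GB because the kernel usually reports slightly less than the installed amount.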

Installation and Cluster Setup Steps

To set up a Cassandra cluster, we need to install Cassandra on each server that will be part of the cluster.

Here, we are setting up a 4-node Cassandra cluster.

Suppose we have 4 nodes, namely,

XXX.XX.XX.1

XXX.XX.XX.2

XXX.XX.XX.3

XXX.XX.XX.4

Note: In order to set up a cluster, perform the following steps on each involved server.

Installation

  1. Install Java 8 on each server using the following commands in the terminal. (Skip this step if Java 8 is already installed.)
    1. To make the OpenJDK package available, add a Personal Package Archive (PPA) using this command.
      sudo add-apt-repository ppa:openjdk-r/ppa
    2. Update the package database.
      sudo apt-get update
    3. Now install Java 8.
      sudo apt-get install openjdk-8-jdk
  2. Install Cassandra on each server using the following commands in the terminal. Here we are installing Cassandra version 3.10.
    1. echo "deb http://www.apache.org/dist/cassandra/debian 310x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list

      Note: 310x means we are installing the Cassandra 3.10 series. You can change this value if you want to install a different version of Cassandra.

    2. Add the Apache Cassandra repository keys using the following command.
      curl https://www.apache.org/dist/cassandra/KEYS | sudo apt-key add -
    3. Update the package database using the following command.
      sudo apt-get update

      Note: You may encounter a public key error such as "The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A278B781FE4B2BDA".

      In that case, add the public key A278B781FE4B2BDA as follows:

      sudo apt-key adv --keyserver pool.sks-keyservers.net --recv-key A278B781FE4B2BDA

      Now repeat 'sudo apt-get update'. The actual key may be different; take it from the error message itself.

    4. Now install Cassandra using the following command.
      sudo apt-get install cassandra

      By now, you have installed Cassandra on each involved server.
      Go through the following steps to set up the cluster.
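
The repository line in the installation steps encodes the Cassandra version as a series name (310x for 3.10). As a minimal sketch, the two hypothetical helpers below (cassandra_series and cassandra_repo_line are names introduced here, not Apache tools) build that line from a version number. They assume the version is given as major.minor or major.minor.patch.

```shell
# Hypothetical helper: map a Cassandra version to its Debian repository
# series: the major and minor numbers followed by "x" (3.10 -> 310x).
# Assumes the version contains at least "major.minor".
cassandra_series() {
  major=${1%%.*}   # text before the first dot
  rest=${1#*.}     # text after the first dot
  minor=${rest%%.*}
  printf '%s\n' "${major}${minor}x"
}

# Hypothetical helper: print the APT source line used in the steps above.
cassandra_repo_line() {
  printf 'deb http://www.apache.org/dist/cassandra/debian %s main\n' \
    "$(cassandra_series "$1")"
}

# Example (writes to the same file as the installation step above):
#   cassandra_repo_line 3.10 | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
```

Generating the line this way keeps the version in one place if the same steps are repeated on several servers.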

Cluster Setup Steps

  1. Stop the Cassandra service on each server (XXX.XX.XX.1, XXX.XX.XX.2, XXX.XX.XX.3, XXX.XX.XX.4) using the following command.
    sudo service cassandra stop
  2. Delete the default Cassandra data from the data directory on each server using the following command.
    sudo rm -rf /var/lib/cassandra/data/system/*
  3. Now open the cassandra.yaml file located in the /etc/cassandra directory on each server, one after another, to configure the cluster.
    Note: The following parameters in cassandra.yaml need to be configured in order to set up the cluster.
    cluster_name : The name of the cluster. Changing it is optional, but the value must be the same on every node.
    - seeds : A comma-delimited list of the IP addresses of the cluster's seed nodes. In this setup, we list every node.
    listen_address : The IP address that other nodes in the cluster will use to connect to this one. It defaults to localhost and needs to be changed to the IP address of the node.
    rpc_address : The IP address for remote procedure calls. It defaults to localhost. If the server's hostname is properly configured, leave this as is. Otherwise, change it to the server's IP address or the loopback address (127.0.0.1).
    endpoint_snitch : The name of the snitch, which tells Cassandra what its network looks like. This defaults to SimpleSnitch, which is used for networks in one datacenter. In our case, we'll change it to GossipingPropertyFileSnitch, which is preferred for production setups.
    auto_bootstrap : This directive is not in the configuration file, so it has to be added and set to false. When enabled, it makes new nodes automatically fetch the right data. Setting it is optional if you're adding nodes to an existing cluster, but required when you're initializing a fresh cluster, that is, one with no data.

    1. Open cassandra.yaml on the XXX.XX.XX.1 node.
      sudo vi /etc/cassandra/cassandra.yaml

      Search for the following parameters and set them as shown:

      cluster_name: 'Helical'
      - seeds: "XXX.XX.XX.1,XXX.XX.XX.2,XXX.XX.XX.3,XXX.XX.XX.4"
      listen_address: XXX.XX.XX.1
      rpc_address: XXX.XX.XX.1
      endpoint_snitch: GossipingPropertyFileSnitch
      auto_bootstrap: false
    2. Open cassandra.yaml on the XXX.XX.XX.2 node.
      sudo vi /etc/cassandra/cassandra.yaml

      Search for the following parameters and set them as shown:

      cluster_name: 'Helical'
      - seeds: "XXX.XX.XX.2,XXX.XX.XX.1,XXX.XX.XX.3,XXX.XX.XX.4"
      listen_address: XXX.XX.XX.2
      rpc_address: XXX.XX.XX.2
      endpoint_snitch: GossipingPropertyFileSnitch
      auto_bootstrap: false
    3. Open cassandra.yaml on the XXX.XX.XX.3 node.
      sudo vi /etc/cassandra/cassandra.yaml

      Search for the following parameters and set them as shown:

      cluster_name: 'Helical'
      - seeds: "XXX.XX.XX.3,XXX.XX.XX.1,XXX.XX.XX.2,XXX.XX.XX.4"
      listen_address: XXX.XX.XX.3
      rpc_address: XXX.XX.XX.3
      endpoint_snitch: GossipingPropertyFileSnitch
      auto_bootstrap: false
    4. Open cassandra.yaml on the XXX.XX.XX.4 node.
      sudo vi /etc/cassandra/cassandra.yaml

      Search for the following parameters and set them as shown:

      cluster_name: 'Helical'
      - seeds: "XXX.XX.XX.4,XXX.XX.XX.1,XXX.XX.XX.2,XXX.XX.XX.3"
      listen_address: XXX.XX.XX.4
      rpc_address: XXX.XX.XX.4
      endpoint_snitch: GossipingPropertyFileSnitch
      auto_bootstrap: false
  4. Start the Cassandra daemon on each server.
    sudo service cassandra start

    Note: You can check the status of Cassandra using the following command.

    sudo service cassandra status
  5. Check the status of the Cassandra cluster using the nodetool utility. Run the command on one node, say XXX.XX.XX.1, as follows:
    sudo nodetool status

    You will get output like this:

    Datacenter: dc1
    ===============
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    -- Address     Load       Tokens Owns (effective) Host ID                              Rack
    UN XXX.XX.XX.1 30.94 GiB  256    100.0%           ae45c3c5-30e3-4d60-9e5c-3f0004bbfea4 rack1

    Note: You'll find that only the local node is listed, because it is not yet able to communicate with the other nodes.

  6. Allow communication by opening the following network ports on each node using the following commands.
    sudo ufw allow 7000

    7000 is the TCP port for commands and data.

    sudo ufw allow 9042

    9042 is the TCP port for the native transport server. cqlsh, the Cassandra command line utility, will connect to the cluster through this port.
    Note: You can change these port numbers as required by editing the following settings in cassandra.yaml:
    storage_port: 7000
    native_transport_port: 9042

  7. Now check the status of the Cassandra cluster again.
    sudo nodetool status

    The output will look like this:

    Datacenter: dc1
    ===============
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    -- Address     Load       Tokens Owns (effective) Host ID                              Rack
    UN XXX.XX.XX.3 31.3 GiB   256    100.0%           20412cb7-b9bf-4373-bf6f-d0958a993c95 rack1
    UN XXX.XX.XX.4 30.78 GiB  256    100.0%           a49ee892-f816-4e0d-84b4-3b3bf1421f33 rack1
    UN XXX.XX.XX.1 30.94 GiB  256    100.0%           ae45c3c5-30e3-4d60-9e5c-3f0004bbfea4 rack1
    UN XXX.XX.XX.2 32.5 GiB   256    100.0%           f9084fd5-06f4-4408-ba90-32c9afd60ede rack1

You can see that all the nodes are up and normal.
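
This check can also be scripted rather than read by eye. The following is a minimal sketch: summarize_nodes is a hypothetical helper introduced here, not a Cassandra tool. It reads nodetool status output on standard input and prints the total number of nodes followed by the number that are not in the UN (Up/Normal) state. It assumes each node line begins with a two-letter status code, as in the output above.

```shell
# Hypothetical helper (an illustration, not a Cassandra tool):
# reads `nodetool status` output on stdin and prints "<total> <not_UN>".
summarize_nodes() {
  awk '
    # Node lines start with a two-letter code: U or D (up/down)
    # followed by N, L, J or M (normal/leaving/joining/moving).
    $1 ~ /^[UD][NLJM]$/ {
      total++
      if ($1 != "UN") bad++
    }
    END { printf "%d %d\n", total, bad }
  '
}

# Example:
#   sudo nodetool status | summarize_nodes
```

For a healthy 4-node cluster like the one above, the second number should be 0.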

In this way, you can install and set up a Cassandra cluster.
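
The per-node settings in step 3 of the cluster setup differ only in the node's own IP address, which also comes first in the seed list. As a minimal sketch, the hypothetical helper node_settings below (a name introduced here for illustration) prints those settings for one node. It uses the cluster name from this post and sets auto_bootstrap to false, as the parameter notes describe for a fresh cluster. The output is meant to be reviewed and copied into cassandra.yaml by hand, since the seeds entry sits inside the existing seed_provider section of that file.

```shell
# Hypothetical helper: print the cassandra.yaml settings for one node.
# Usage: node_settings <this_node_ip> <ip_of_every_node...>
node_settings() {
  self=$1
  shift
  # Put this node first in the seed list, followed by the other nodes,
  # mirroring the per-node examples in step 3.
  seeds=$self
  for ip in "$@"; do
    [ "$ip" = "$self" ] || seeds="$seeds,$ip"
  done
  printf '%s\n' \
    "cluster_name: 'Helical'" \
    "- seeds: \"$seeds\"" \
    "listen_address: $self" \
    "rpc_address: $self" \
    "endpoint_snitch: GossipingPropertyFileSnitch" \
    "auto_bootstrap: false"
}

# Example (the 10.0.0.x addresses are placeholders, not from this setup):
#   node_settings 10.0.0.2 10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.4
```

Printing the values instead of editing the file in place keeps the sketch safe to run and leaves the final edit under your control.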

Conclusion

In this blog, we have discussed how to install and set up a Cassandra cluster.

Thanks,

Vishal
