Thursday, August 14, 2014

Script to manage MaxScale

MaxScale 1.0 from SkySQL is now in Beta and there are some cool features in it, I guess some adventurous people has already put it into production. There are still some rough edges and stuff to be fixed, but it is clearly close to GA. One thing missing though are something to manage starting and stopping MaxScale in a somewhat controlled way, which is what this blog is all about.

I have developed two simple scripts that should help you manage MaxScale in a reasonable way, but before we go into the actual scripts, there are a few things I need to tell you. To begin with, if you haven't yet downloaded MaxScale 1.0 beta, you can get it from MariaDB.com, just go to Resources->MaxScale and to get to the downloads you first need to register (which is free). Here are downloads to rpms and source, but if you are currently looking for a tarball, there seems to be none, well actually there is, the first link under "Source Tarball" actually is a binary tarball. I have reported this so by the time you read this, this might have been fixed. Of course you can always get the source from github and build it yourself.

Anyway, for MaxScale to start, you need a configuration file and you have to set the MaxScale home directory. If you are on CentOS or RedHat and install the rpms (which is how I set it us), MaxScale is installed in /usr/local/skysql/maxscale, and this is also what MAXSCALE_HOME needs to be set to. MaxScale can take a configuration file argument, but if this isn't passed, then MAXSCALE_HOME/etc/MaxScale.cnf will be used. In addition, you will probably have to add MAXSCALE_HOME/lib to your LD_LIBRARY_PATH variable.

All in all, there some environment variables to set before we can start MaxScale, and this is the job of the first script, maxenv.sh. For this to work, it has to be placed in the bin directory under MAXSCALE_HOME. In this script we also set MAXSCALE_USER, and this is used by MaxScale start / stop script to be explained later, and this is the linux user that will run MaxScale. You can set this to an empty string to run maxscale as the current user, which is the normal way that MaxScale runs, but in that case you need to make sure that the user in question has write access to MAXSCALE_HOME and subdirectories.

So, here we go, here is maxenv.sh, and you can copy this into the bin directory under your MAXSCALE_HOME directory and use it like that (note that I set the user to run MaxScale to mysql, so if you don't have that user, then create that or modify maxenv.sh accordingly):
#!/bin/bash
#
export MAXSCALE_HOME=$(cd `dirname ${BASH_SOURCE[0]}`/..; pwd)
export MAXSCALE_USER=mysql
PATH=$PATH:$MAXSCALE_HOME/bin
# Add MaxScale lib directory to LD_LIBRARY_PATH, unless it is already there.
if [ `echo $LD_LIBRARY_PATH | awk -v RS=: '{print $0}' | grep -c "^$MAXSCALE_HOME/lib$"` -eq 0 ]; then
   export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MAXSCALE_HOME/lib
fi
export MAXSCALE_PIDFILE=$MAXSCALE_HOME/log/maxscale.pid


Now it's time for maxctl, which is the script that starts and stops MaxScale, and this also must be placed in the MAXSCALE_HOME/bin directory, and note that this script relies on maxenv.sh above, so to use maxctl you also need maxenv.sh as above. The script is rather long, but there are probably a thing or two missing anyway, but for me this has been useful:

#!/bin/bash
#
# Script to start and stop MaxScale.
#
# Set up the environment
. $(cd `dirname $0`; pwd -P)/maxenv.sh

# Set default variables
NOWAIT=0
HELP=0
QUIET=0
MAXSCALE_PID=0

# Get pid of MaxScale if it is running.
# Check that the pidfile exists.
if [ -e $MAXSCALE_PIDFILE ]; then
   MAXSCALE_PID=`cat $MAXSCALE_PIDFILE`
# Check if the process is running.
   if [ `ps --no-heading -p $MAXSCALE_PID | wc -l` -eq 0 ]; then
      MAXSCALE_PID=0
   fi
fi


# Function to print output
printmax() {
   if [ $QUIET -eq 0 ]; then
      echo $* >&2
   fi
}

# Function to print help
helpmax() {
    echo "Usage: $0 start|stop|status|restart"
    echo "Options:"
    echo "-f - MaxScale config file"
    echo "-h - Show this help"
    echo "-n - Don't wait for operation to finish before exiting"
    echo "-q - Quiet operation"
}


# Function to start maxscale
startmax() {
# Check if MaxScale is already running.
   if [ $MAXSCALE_PID -ne 0 ]; then
      printmax "MaxScale is already running"
      exit 1
   fi

# Check that we are running as root if a user to run as is specified.
   if [ "x$MAXSCALE_USER" != "x" -a `id -u` -ne 0 ]; then
      printmax "$0 must be run as root"
      exit 1
   fi

# Check that we can find maxscale
   if [ ! -e $MAXSCALE_HOME/bin/maxscale ]; then
      printmax "Cannot find MaxScale executable ($MAXSCALE_HOME/bin/maxscale)"
      exit 1
   fi

# Check that the config file exists, if specified.
   if [ "x$MAXSCALE_CNF" != "x" -a ! -e "$MAXSCALE_CNF" ]; then
      printmax "MaxScale configuration file ($MAXSCALE_CNF) not found"
      exit 1
   fi

# Start MaxScale
   if [ "x$MAXSCALE_USER" == "x" ]; then
      $MAXSCALE_HOME/bin/maxscale -c $MAXSCALE_HOME ${MAXSCALE_CNF:+-f $MAXSCALE_CNF}
   else
      su $MAXSCALE_USER -m -c "$MAXSCALE_HOME/bin/maxscale -c $MAXSCALE_HOME ${MAXSCALE_CNF:+-f $MAXSCALE_CNF}"
   fi
}


# Function to stop maxscale
stopmax() {
   NOWAIT=1
   if [ "x$1" == "-n" ]; then
      NOWAIT=0
   fi

# Check that we are running as root if a user to run as is specified.
   if [ "x$MAXSCALE_USER" != "x" -a `id -u` -ne 0 ]; then
      printmax "$0 must be run as root"
      exit 1
   fi

# Check that the pidfile exists.
   if [ ! -e $MAXSCALE_PIDFILE ]; then
      printmax "Can't find MaxScale pidfile ($MAXSCALE_PIDFILE)"
      exit 1
   fi
   MAXSCALE_PID=`cat $MAXSCALE_PIDFILE`

# Kill MaxScale
   kill $MAXSCALE_PID

   if [ $NOWAIT -ne 0 ]; then
# Wait for maxscale to die.
      while [ `ps --no-heading -p $MAXSCALE_PID | wc -l` -ne 0 ]; do
      usleep 100000
      done
      MAXSCALE_PID=0
   fi
}


# Function to show the status of MaxScale
statusmax() {
# Check that the pidfile exists.
   if [ $MAXSCALE_PID -ne 0 ]; then
      printmax "MaxScale is running (pid: $MAXSCALE_PID user: `ps -p $MAXSCALE_PID --no-heading -o euser`)"
      exit 0
   fi
   printmax "MaxScale is not running"
   exit 1
}

# Process options.
while getopts ":f:hnq" OPT; do
   case $OPT in
      f)
         MAXSCALE_CNF=$OPTARG
         ;;
      h)
         helpmax
         exit 0
         ;;
      n)
         NOWAIT=1
         ;;
      q)
         QUIET=1
         ;;
      \?)
         echo "Invalid option: -$OPTARG"
         ;;
   esac
done

# Process arguments following options.
shift $((OPTIND - 1))
OPER=$1

# Check that an operation was passed
if [ "x$1" == "x" ]; then
   echo "$0: your must enter an operation: start|stop|restart|status" >&2
   exit 1
fi


# Handle the operations.
case $OPER in
   start)
      startmax
      ;;
   stop)
      stopmax
      ;;
   status)
      statusmax
      ;;
   restart)
      if [ $MAXSCALE_PID -ne 0 ]; then
         NOWAITSAVE=$NOWAIT
         NOWAIT=0
         stopmax
         NOWAIT=$NOWAITSAVE
      fi
      startmax
      ;;
   *)
      echo "Unknown operation: $OPER. Use start|stop|restart|status"
      exit 1
esac


To use this script, you call it with one of the 4 main operations: start, stop, restart or status. If you are to run as a specific user, you have top run it as root (or using sudo):
sudo /usr/local/skysql/maxscale/maxctl start
Also, there are some options you can pass:
  • -h - Print help
  • -f - Run with the spcified configuration file instead of MAXSCALE_HOME/etc/MaxScale.cnf
  • -n - Don't want for stop to finish before returning. Stopping MaxScale by sending a SIGTERM is how it is to be done, but it takes a short time for MaxScale to stop completely. By default maxctl will wait until MaxScale is stopped before returning, by passing this option MaxScale will return immediately though.
  • -q - Quiet operation
/Karlsson

Monday, August 4, 2014

What is HandlerSocket? And why would you use it? Part 1

HandlerSocket is included with MariaDB and acts like a simple NoSQL interface to InnoDB, XtraDB and Spider and I will describe it a bit more in this and a few upcoming blogs.

So, what is HandlerSocket? Adam Donnison wrote a great blog on how to get started with it, but if you are developing MariaDB applications using C, C++, PHP or Java what good does HandlerSocket do you?

HandlerSocket in itself is a MariaDB plugin, of a type that is not that common as is is a daemon plugin. Adam shows in his blog how to enable it and install it, so I will not cover that here. Instead I will describe what it does, and doesn't do.

A daemon plugin is a process that runs "inside" the MariaDB. A daemon plugin can implement anything really, as long as it is relevant to MariaDB. One daemon plugin is for examples the Job Queue Daemon and another then is HandlerSocket. A MariaDB Plugin has access to the MariaDB internals, so there is a lot of things that can be implemented in a daemon plugin, even if it is a reasonable simple concept.

Inside MariaDB there is an internal "handler" API which is used as an interface to the MariaDB Storage Engines, although not all engines supports this interface. MariaDB then has a means to "bypass" the SQL layer and access the Handler interface directly, and this is done by using the HANDLER commands with MariaDB. Note that we are not talking about a different API to uise the handler commands, instead they are executed using the same SQL interface as you are used it's, it's just that you use a different set of commands. One limitation of these commands is that they only allow reads, even though the internal Handler interface supports writes as well as reads.

When you use the HANDLER commands, you have to know not only the tables name that you access, but also the index you will be using when you access that table (assuming then you want to use an index, but you probably do). A simple example of using the HANDLER commands follows here:

# Open the orders table, using the alias o
HANDLER orders OPEN AS o;

# Read an order record using the primary key.
HANDLER o READ `PRIMARY` = (156);

# Close the order table.
HANDLER o CLOSE;

In the above example, I guess this is just overcomplication something basically simple, as all this does is the same as
SELECT * FROM orders WHERE id = 156;

The advantage of using the handler interface though is performance, for large datasets using the Handler interface is much faster.

All this brings up three questions:
  • First, if we are bypassing the SQL layer, by using the HANDLER Commands, would it not be faster to bypass the SQL level protocol altogether, and just use a much simple protocol?
  • Secondly, while we are at it, why don't we allow writes, as this is the biggest issues, we can always speed reads by scaling out anyway?
  • And last, how much faster is this, really?
The answer to the first two questions then is the Handler Socket plugin, which was the whole deal with this blog, as this use a separate, very simple, protocol and allows writes! For the third question, this is where I come in, I have done some simple INSERT style benchmarks using HandlerSocket, and I have some results for you in my next blog. So don't touch that dial, I'll be right back!

/Karlsson