Tuesday, October 11, 2016

Getting data out of SQL Server in a MariaDB friendly way.

I know what you are thinking: "Oh my, it's that time again, Karlsson insists that data should get out of some other database and into MariaDB. What is he up to this time?" And that is exactly right: this time data is coming out of SQL Server, out of that expensive, closed source, Windows-only software, and into the lightweight, fast and cost effective MariaDB.

I have already shown how this works when getting data out of Oracle when I released MyOraDump, so in line with that, the corresponding program to get data out of SQL Server is called MyMSSQLDump, and it has features along the same lines as MyOraDump. To connect to SQL Server (or Sybase, but this I haven't tested) I use FreeTDS, which is a nice Open Source SQL Server / Sybase driver. The output formats supported are MySQL (i.e. a file with INSERT INTO statements), MSSQL (same as the MySQL format but aligned for MSSQL), JSON, JSON Array (similar to JSON but with all data in one array) and CSV. All formats are very flexible and the common SQL Server / Sybase datatypes are supported.
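
To make the formats a bit more concrete: the MySQL format produces one INSERT statement per row, the JSON format one JSON object per row, and the JSON Array format a single array holding all the rows. Roughly along these lines (an illustration only, with made-up table and column names; the exact output may differ):
INSERT INTO emp VALUES(1, 'Anders');
{"empno": 1, "ename": "Anders"}
[{"empno": 1, "ename": "Anders"}, {"empno": 2, "ename": "John"}]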

As usual, the code is written in C and uses autotools for building. I have only tested building it on CentOS (6 and 7) so far, but there is nothing magic to it, so it should work on other platforms too. Download it from Sourceforge.
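
Building should then be the usual autotools routine, something like this (assuming you are standing in the unpacked source directory and have the FreeTDS development files installed):
$ ./configure
$ make
$ sudo make install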

Happy SQL'ing
/Karlsson

Monday, September 19, 2016

Replication from Oracle to MariaDB - Part 3

This is the third part of a series of blogs on how to do Change Data Capture from Oracle in an effort to be able to replicate that data to MariaDB. Part 1 dealt with some basics on the Oracle side of things regarding the Oracle redo log, which we use as a source for our CDC efforts, and Part 2 was about how to get this to work. We left part 2 with an example of how to replicate INSERTs from Oracle to MariaDB, but that is hardly good enough; we also have to deal with UPDATE and DELETE to make it at least somewhat complete, and this is what we will have a look at in this post.

And I know what you are asking yourselves now: you spent all that time explaining the Oracle ROWID and then you never used it. That was an awful waste of time. And no, it wasn't a waste, we will put it to good use now when we look at UPDATEs.

For an INSERT, things are reasonably simple: we just add a row with some columns to the table. For an UPDATE or a DELETE though, we need a means to identify the row to be changed. To reiterate one thing, the Oracle redo log is not like the MariaDB binlog here, and here I am assuming that the binlog format is STATEMENT. Let's look at an example, and for that let's insert some data into our EMP table (see the previous post for details):
SQL> INSERT INTO emp VALUES(2, 'John', 'Developer', 12000, 20);
SQL> INSERT INTO emp VALUES(3, 'Georgina', 'Design', 11000, 20);
SQL> INSERT INTO emp VALUES(4, 'Anne', 'Assistant', 8000, 30);
SQL> INSERT INTO emp VALUES(5, 'Marge', 'HR Mgr', 14000, 30);
SQL> COMMIT;
With that in place, let's now ensure that this is also replicated to MariaDB by running the script from the last post:
$ sqlplus -S / as sysdba @cdcemp.sql | mysql -u anders -panders anders
And to ensure that we got this right, let's check this out in MariaDB now:
$ mysql -u anders -panders anders
MariaDB [anders]> SELECT * FROM emp;
+-------+----------+-----------+----------+--------+
| empno | ename    | job       | sal      | deptno |
+-------+----------+-----------+----------+--------+
|     1 | Anders   | Sales     | 10000.00 |     10 |
|     2 | John     | Developer | 12000.00 |     20 |
|     3 | Georgina | Design    | 11000.00 |     20 |
|     4 | Anne     | Assistant |  8000.00 |     30 |
|     5 | Marge    | HR Mgr    | 14000.00 |     30 |
+-------+----------+-----------+----------+--------+
5 rows in set (0.00 sec)
OK, seems like we got it right so far. Then let's try an UPDATE and raise the salary for the two employees in department 20:
$ sqlplus anders/anders
SQL> UPDATE emp SET sal = sal + 100 WHERE deptno = 20;
Now, what does this result in? If this was MariaDB with STATEMENT binlog format, the binlog would have something like this in it:
# at 9535
#160919 13:00:44 server id 112  end_log_pos 9652 CRC32 0x987f9dea       Query   thread_id=101   exec_time=0     error_code=0
SET TIMESTAMP=1474282844/*!*/;
UPDATE emp SET sal = sal + 100 WHERE deptno = 20
/*!*/;
We've seen this before, no surprises there. Now, let's try this from SQL*Plus and see what the SQL_REDO column in V$LOGMNR_CONTENTS looks like.
update "ANDERS"."EMP" set "SAL" = '12100' where "SAL" = '12000' and ROWID = 'AAAE7RAAEAAAAK9AAB';
update "ANDERS"."EMP" set "SAL" = '11100' where "SAL" = '11000' and ROWID = 'AAAE7RAAEAAAAK+AAA';

Whoa, that doesn't look like our UPDATE at all, what's going on? Well, I already told you that the Oracle redo log doesn't work like the binlog. In this case we get an SQL statement from it, but it is not the same SQL statement that we asked Oracle to execute for us. No, this SQL is reconstructed from the redo log record, which logs updates row by row, and it uses the ROWID to identify the row that was changed. We have already discussed that the ROWID is unique in a table and to an extent even across tables. The issue now is that we don't have a ROWID in MariaDB. But the ROWID in the redo log is available in the ROW_ID column in the V$LOGMNR_CONTENTS table. And if we assume that all tables that we replicate have to have a PRIMARY KEY, then we can use the ROWID to look up that PRIMARY KEY and use that for UPDATE operations in MariaDB, right? Something like this in our script:
SELECT empno INTO v_empno FROM anders.emp WHERE rowid = v_row_id;
That works, but there is one issue with it. We get our data from the redo log, which is historic, so to speak, but we look up the PRIMARY KEY using the ROWID based on how data looks right now, which means that if you are in the habit of updating your PRIMARY KEYs (don't do this, by the way, this is a bad habit even if we exclude the specific use case we look at here), then this isn't going to fly. Sorry, I will not fix that for you: if you update a row and then update its primary key, the first update will fail.

Then let's look at DELETEs. Can we use the same method as for UPDATEs? It's tempting, I know, but it is not going to work at all. Think about it: we just determined that we get the ROWID of a row that is historic, but in the case of a DELETE there is no "current row", as we DELETEd it, right? Whoa, how do we solve that? Are we stuck now, out in the desert, the hot Oracle sun shining on our unprotected pale skin and getting hotter and hotter? Our old Studebaker has broken down by the side of the abandoned dirt road, and you stare into the unforgiving sun and you know this is it?

Nope. This can be fixed, luckily! The Oracle redo log contains redo data, that is what we have been using so far. The thing is that it also contains undo data! Hey! And then we think about this amazing concept a bit and consider what is necessary to undo a DELETE? Frankly, the values of ALL columns in the deleted row, right? Our friendly DBMS_LOGMNR.MINE_VALUE function comes to the rescue, but this time not to operate on the redo value but on the undo value!
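
You can actually see this in SQL*Plus too: besides SQL_REDO, V$LOGMNR_CONTENTS has a SQL_UNDO column with a reconstructed statement that would undo the operation, and for a DELETE that undo is an INSERT carrying all the column values. Roughly like this (illustrative output; the exact values depend on what was deleted):
SQL> SELECT sql_undo FROM v$logmnr_contents WHERE seg_owner = 'ANDERS' AND table_name = 'EMP' AND operation = 'DELETE';
insert into "ANDERS"."EMP"("EMPNO","ENAME","JOB","SAL","DEPTNO") values ('2','John','Developer','12100','20');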

Armed with all this, we can now revisit our cdcemp.sql script and add stuff to handle UPDATE and DELETE operations too:
SET serveroutput ON
SET linesize 4000
SET feedback off
DECLARE
   v_scn NUMBER;
   v_scnstart NUMBER;
   v_scnend NUMBER;
   v_redo_value NUMBER;
   v_undo_value NUMBER;
   v_operation VARCHAR2(32);
   v_row_id VARCHAR2(18);
   v_sqlstmt VARCHAR2(4000);
   v_firstcol BOOLEAN;
   v_empno NUMBER;
   CURSOR cur1 IS
     SELECT scn, redo_value, undo_value, operation, row_id
       FROM v$logmnr_contents
       WHERE seg_owner = 'ANDERS' AND table_name = 'EMP'
         AND operation IN('INSERT', 'UPDATE', 'DELETE');
BEGIN
-- Start LogMiner.
   SELECT NVL(MIN(l.scn) + 1, MIN(e.ora_rowscn)) INTO v_scnstart
     FROM anders.emp e, scnlog l
     WHERE l.table_name = 'EMP';
   SELECT MAX(e.ora_rowscn) INTO v_scnend
     FROM anders.emp e;
   DBMS_OUTPUT.PUT_LINE('-- SCN Range: ' || v_scnstart || ' - ' || v_scnend);
   DBMS_LOGMNR.START_LOGMNR(STARTSCN => v_scnstart,
     ENDSCN => v_scnend,
     OPTIONS => DBMS_LOGMNR.COMMITTED_DATA_ONLY + DBMS_LOGMNR.CONTINUOUS_MINE +
     DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
-- Dummy update to insert a more recent SCN to ensure that we don't have to wait
-- for the next update in the following select.
   UPDATE scnlog SET table_name = table_name WHERE table_name = 'EMP';
   COMMIT;

-- Open cursor to get data from LogMiner.
   OPEN cur1;

-- Loop for all the rows in the redo log since the last time we ran this.
   LOOP
      FETCH cur1 INTO v_scn, v_redo_value, v_undo_value, v_operation, v_row_id;
      EXIT WHEN cur1%NOTFOUND;
      IF v_operation = 'INSERT' THEN
         v_sqlstmt := 'INSERT INTO emp(EMPNO, ENAME, JOB, SAL, DEPTNO) VALUES(';
         v_sqlstmt := v_sqlstmt || DBMS_LOGMNR.MINE_VALUE(v_redo_value, 'ANDERS.EMP.EMPNO') || ', ';
         v_sqlstmt := v_sqlstmt || '''' || DBMS_LOGMNR.MINE_VALUE(v_redo_value, 'ANDERS.EMP.ENAME') || ''', ';
         v_sqlstmt := v_sqlstmt || '''' || DBMS_LOGMNR.MINE_VALUE(v_redo_value, 'ANDERS.EMP.JOB') || ''', ';
         v_sqlstmt := v_sqlstmt || DBMS_LOGMNR.MINE_VALUE(v_redo_value, 'ANDERS.EMP.SAL') || ', ';
         v_sqlstmt := v_sqlstmt || DBMS_LOGMNR.MINE_VALUE(v_redo_value, 'ANDERS.EMP.DEPTNO') || ')';
      ELSIF v_operation = 'UPDATE' THEN
         v_sqlstmt := 'UPDATE emp SET ';
         v_firstcol := TRUE;

-- Build UPDATE statement using only the columns present in the redo log record.
         IF DBMS_LOGMNR.COLUMN_PRESENT(v_redo_value, 'ANDERS.EMP.EMPNO') = 1 THEN
            v_sqlstmt := v_sqlstmt || 'EMPNO = ' || DBMS_LOGMNR.MINE_VALUE(v_redo_value, 'ANDERS.EMP.EMPNO');
            v_firstcol := FALSE;
         END IF;
         IF DBMS_LOGMNR.COLUMN_PRESENT(v_redo_value, 'ANDERS.EMP.ENAME') = 1 THEN
            v_sqlstmt := v_sqlstmt || CASE WHEN NOT v_firstcol THEN ', ' ELSE '' END;
            v_sqlstmt := v_sqlstmt || 'ENAME = ''' || DBMS_LOGMNR.MINE_VALUE(v_redo_value, 'ANDERS.EMP.ENAME')
              || '''';
            v_firstcol := FALSE;
         END IF;
         IF DBMS_LOGMNR.COLUMN_PRESENT(v_redo_value, 'ANDERS.EMP.JOB') = 1 THEN
            v_sqlstmt := v_sqlstmt || CASE WHEN NOT v_firstcol THEN ', ' ELSE '' END;
            v_sqlstmt := v_sqlstmt || 'JOB = ''' || DBMS_LOGMNR.MINE_VALUE(v_redo_value, 'ANDERS.EMP.JOB')
              || '''';
            v_firstcol := FALSE;
         END IF;
         IF DBMS_LOGMNR.COLUMN_PRESENT(v_redo_value, 'ANDERS.EMP.SAL') = 1 THEN
            v_sqlstmt := v_sqlstmt || CASE WHEN NOT v_firstcol THEN ', ' ELSE '' END;
            v_sqlstmt := v_sqlstmt || 'SAL = ' || DBMS_LOGMNR.MINE_VALUE(v_redo_value, 'ANDERS.EMP.SAL');
            v_firstcol := FALSE;
         END IF;
         IF DBMS_LOGMNR.COLUMN_PRESENT(v_redo_value, 'ANDERS.EMP.DEPTNO') = 1 THEN
            v_sqlstmt := v_sqlstmt || CASE WHEN NOT v_firstcol THEN ', ' ELSE '' END;
            v_sqlstmt := v_sqlstmt || 'DEPTNO = ' || DBMS_LOGMNR.MINE_VALUE(v_redo_value, 'ANDERS.EMP.DEPTNO');
            v_firstcol := FALSE;
         END IF;

-- Get empno from emp using rowid.
         SELECT MAX(empno) INTO v_empno FROM anders.emp WHERE rowid = v_row_id;
         IF v_empno IS NULL THEN
           CONTINUE;
         END IF;
         v_sqlstmt := v_sqlstmt || ' WHERE empno = ' || v_empno;
      ELSIF v_operation = 'DELETE' THEN
-- Get the empno from the undo record.
         v_empno := DBMS_LOGMNR.MINE_VALUE(v_undo_value, 'ANDERS.EMP.EMPNO');
         IF v_empno IS NULL THEN
           CONTINUE;
         END IF;
         v_sqlstmt := 'DELETE FROM emp WHERE empno = ' || v_empno;
      END IF;

      DBMS_OUTPUT.PUT_LINE('-- SCN = ' || v_scn);
      DBMS_OUTPUT.PUT_LINE(v_sqlstmt || ';');
   END LOOP;
   IF v_scn IS NOT NULL THEN
     UPDATE scnlog SET scn = v_scn WHERE table_name = 'EMP';
     COMMIT;
   END IF;
   CLOSE cur1;
END;
/
EXIT

Let's try this script and see if the UPDATE to the sal column that I did at the beginning of this post is properly replicated to MariaDB:
$ sqlplus -S / as sysdba @cdcemp.sql | mysql -u anders -panders anders
And then we check what we have in MariaDB:
$ mysql -u anders -panders anders
MariaDB [anders]> SELECT * FROM emp;
+-------+----------+-----------+----------+--------+
| empno | ename    | job       | sal      | deptno |
+-------+----------+-----------+----------+--------+
|     1 | Anders   | Sales     | 10000.00 |     10 |
|     2 | John     | Developer | 12100.00 |     20 |
|     3 | Georgina | Design    | 11100.00 |     20 |
|     4 | Anne     | Assistant |  8000.00 |     30 |
|     5 | Marge    | HR Mgr    | 14000.00 |     30 |
+-------+----------+-----------+----------+--------+
5 rows in set (0.00 sec)
And then see if we can also handle a DELETE. In SQL*Plus we run this:
SQL> DELETE FROM emp WHERE ename = 'John';
SQL> COMMIT;
Then we run our script:
$ sqlplus -S / as sysdba @cdcemp.sql | mysql -u anders -panders anders
And then we check what we have in MariaDB:
$ mysql -u anders -panders anders
MariaDB [anders]> SELECT * FROM emp;
+-------+----------+-----------+----------+--------+
| empno | ename    | job       | sal      | deptno |
+-------+----------+-----------+----------+--------+
|     1 | Anders   | Sales     | 10000.00 |     10 |
|     3 | Georgina | Design    | 11100.00 |     20 |
|     4 | Anne     | Assistant |  8000.00 |     30 |
|     5 | Marge    | HR Mgr    | 14000.00 |     30 |
+-------+----------+-----------+----------+--------+
4 rows in set (0.00 sec)

Bazinga, it worked! Are we done then? I'm afraid not, we still have transactions to take care of, but that has to wait a day or two.

Happy SQL'ing
/Karlsson

Friday, September 16, 2016

Replication from Oracle to MariaDB - Part 2

In part 1 of this series of blogs on how to replicate from Oracle to MariaDB, we looked at how to get data out of Oracle, and in an effort to look more modern than I really am, let's say that what we were attempting was CDC (Change Data Capture). Hey, I feel like I'm 18 again! Modern, cool! Maybe I should start to wear a baseball cap backwards and have my jeans no higher than my knees. Or again, maybe not. But CDC it is.

When I left you in the previous post we had just started to get some data, reluctantly, out of Oracle. But the SQL statement we got from LogMiner was hardly useful. Fact is, we can fix that, but before we do we need to look at a few other functions in LogMiner. The two functions we want to look at are called DBMS_LOGMNR.COLUMN_PRESENT and DBMS_LOGMNR.MINE_VALUE. These two functions are simple enough; they both take two arguments, first a reference to a redo log record, which is the value of the redo_value column in the V$LOGMNR_CONTENTS table, and second the fully qualified name of the column in question. The table name of the redo log record is, surprisingly, in the table_name column. The reason we need two functions for this is that the second of them can return NULL both when the column in question isn't present in the record and when the value actually is NULL.
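
Used together, the pattern looks something like this (a minimal PL/SQL sketch; the variable names are my own, and we will see the real thing in the script below):
IF DBMS_LOGMNR.COLUMN_PRESENT(v_redo_value, 'ANDERS.EMP.SAL') = 1 THEN
   v_sal := DBMS_LOGMNR.MINE_VALUE(v_redo_value, 'ANDERS.EMP.SAL');
END IF;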

For the rest of this post I will assume that there exists an Oracle user called anders with the password anders, and that the same thing exists in MariaDB, where there is also a database called anders (you already guessed that, right?) to which the user anders has full access.

So with the table_name and the values of the involved columns, we are ready to go. Sure, let's look at an INSERT. First we need a table to test this on, and let's use the classic Oracle EMP sample table for this, but I have simplified it a bit to make my example easier to grasp without getting into too many details. So this is what it looks like on the Oracle side of things, and let's call this script cretabora.sql:
CREATE TABLE anders.emp(
  empno NUMBER NOT NULL PRIMARY KEY,
  ename VARCHAR2(10) NOT NULL,
  job VARCHAR2(9) NOT NULL,
  sal NUMBER(7,2) NOT NULL,
  deptno NUMBER NOT NULL) TABLESPACE users;
What we also need is some way to keep track of processing; in this case we want to keep track of the SCN that was last used, so we know where to start when we get the next chunk of CDC data (hey, I used it again. Boy am I young and hot today). So add this to the script above:
CREATE TABLE scnlog(
  table_name VARCHAR2(30) NOT NULL PRIMARY KEY,
  scn NUMBER) TABLESPACE users;
INSERT INTO scnlog
  SELECT 'EMP', current_scn FROM v$database;
I also allocated space for the EMP table and set the SCN to the current database SCN. As I am using the V$DATABASE virtual table, I have to run the above as a privileged user.
$ sqlplus / as sysdba @cretabora.sql

OK, now I have some tables to work with. Before we go on, let's also create the same objects in a MariaDB database. The minimum we need is something like this; let's call this script cretabmaria.sql:
CREATE TABLE emp(
  empno INTEGER NOT NULL PRIMARY KEY,
  ename VARCHAR(10) NOT NULL,
  job VARCHAR(9) NOT NULL,
  sal DECIMAL(7,2) NOT NULL,
  deptno INTEGER NOT NULL);
I will not explain this in detail, you get it already I think:
$ mysql -u root
MariaDB [(none)]> CREATE USER 'anders'@'%' IDENTIFIED BY 'anders';
MariaDB [(none)]> CREATE DATABASE anders;
MariaDB [(none)]> GRANT ALL ON anders.* TO 'anders'@'%';
MariaDB [(none)]> exit
$ mysql -u anders -panders anders < cretabmaria.sql
The above really shows that MariaDB is easier to deal with than Oracle, right? But that wasn't what we were looking for here, and we're getting closer now, aren't you excited?

Let's insert a single row of data in the EMP table, just to get started:
$ sqlplus anders/anders
SQL> INSERT INTO emp VALUES(1, 'Anders', 'Sales', 10000, 10);
SQL> COMMIT;

Now we are ready to extract data from the redo log. For this we will run a small script that runs LogMiner, extracts rows from the redo log and converts that to valid MariaDB SQL syntax. Create a script called cdcemp.sql with this content:
SET serveroutput ON
SET linesize 4000
SET feedback off
DECLARE
   v_scn NUMBER;
   v_scnstart NUMBER;
   v_scnend NUMBER;
   v_redo_value NUMBER;
   v_sqlstmt VARCHAR2(4000);
   CURSOR cur1 IS
     SELECT scn, redo_value
       FROM v$logmnr_contents
       WHERE seg_owner = 'ANDERS' AND table_name = 'EMP'
         AND operation = 'INSERT';
BEGIN
-- Start LogMiner.
   SELECT NVL(MIN(l.scn) + 1, MIN(e.ora_rowscn)) INTO v_scnstart
     FROM anders.emp e, scnlog l
     WHERE l.table_name = 'EMP';
   SELECT MAX(e.ora_rowscn) INTO v_scnend
     FROM anders.emp e;
   DBMS_OUTPUT.PUT_LINE('-- SCN Range: ' || v_scnstart || ' - ' || v_scnend);
   DBMS_LOGMNR.START_LOGMNR(STARTSCN => v_scnstart,
     ENDSCN => v_scnend,
     OPTIONS => DBMS_LOGMNR.COMMITTED_DATA_ONLY +
       DBMS_LOGMNR.CONTINUOUS_MINE +
       DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
-- Dummy update to insert a more recent SCN to ensure
-- that we don't have to wait for the next update in the
-- following select.
   UPDATE scnlog SET table_name = table_name WHERE table_name = 'EMP';
   COMMIT;

-- Open cursor to get data from LogMiner.
   OPEN cur1;

-- Loop for all the rows in the redo log since the last time we ran this.
   LOOP
      FETCH cur1 INTO v_scn, v_redo_value;
      EXIT WHEN cur1%NOTFOUND;
      v_sqlstmt := 'INSERT INTO emp(EMPNO, ENAME, JOB, SAL, DEPTNO)'
        || 'VALUES(';
      v_sqlstmt := v_sqlstmt || DBMS_LOGMNR.MINE_VALUE(v_redo_value,
         'ANDERS.EMP.EMPNO') || ', ';
      v_sqlstmt := v_sqlstmt || '''' || DBMS_LOGMNR.MINE_VALUE(v_redo_value,
        'ANDERS.EMP.ENAME') || ''', ';
      v_sqlstmt := v_sqlstmt || '''' || DBMS_LOGMNR.MINE_VALUE(v_redo_value,
        'ANDERS.EMP.JOB') || ''', ';
      v_sqlstmt := v_sqlstmt || DBMS_LOGMNR.MINE_VALUE(v_redo_value,
        'ANDERS.EMP.SAL') || ', ';
      v_sqlstmt := v_sqlstmt || DBMS_LOGMNR.MINE_VALUE(v_redo_value,
        'ANDERS.EMP.DEPTNO') || ')';

      DBMS_OUTPUT.PUT_LINE('-- SCN = ' || v_scn);
      DBMS_OUTPUT.PUT_LINE(v_sqlstmt || ';');
   END LOOP;
   IF v_scn IS NOT NULL THEN
      UPDATE scnlog SET scn = v_scn WHERE table_name = 'EMP';
      COMMIT;
   END IF;
   CLOSE cur1;
END;
/
EXIT

The first three lines in this script are SQL*Plus specific settings. Then we declare some variables and a cursor for the V$LOGMNR_CONTENTS table. Following that we get the starting and ending SCN that we want and use those to start LogMiner, and output some information on what we are doing. Then comes a wacko UPDATE. This seems to be necessary to ensure that the redo log is progressed past the last log record so that we can get the data. Without this, the SELECT from the cursor would wait until someone else generated some redo log content. This is a kludge, I know, but it seems to work. Also, Oracle isn't smart enough to skip these kinds of dummy updates.

Following this, the cursor is opened and we get the rows from it and build a suitable SQL statement for MariaDB. Finally, when the loop is over, I update the SCN log table so I know where to start the next run. This way the script can be run repeatedly to generate incremental updates from Oracle to be inserted into MariaDB. And with all that, let's now finally run the script, and here I will run it and pipe the data straight into MariaDB:
$ sqlplus -S / as sysdba @cdcemp.sql | mysql -u anders -panders anders

Before we finish, let's check if this worked:
$ mysql -u anders -panders anders
MariaDB [anders]> SELECT * FROM emp;
+-------+--------+-------+----------+--------+
| empno | ename  | job   | sal      | deptno |
+-------+--------+-------+----------+--------+
|     1 | Anders | Sales | 10000.00 |     10 |
+-------+--------+-------+----------+--------+
1 row in set (0.00 sec)

That's it for this time. There are a few more things to look into though, like managing transactions on the MariaDB side for example. And UPDATEs and DELETEs, those require some special care. But we are getting there now, right? So stay tuned for the third post in this series.

Happy SQL'ing
/Karlsson



Thursday, September 15, 2016

Replication from Oracle to MariaDB - Part 1

If you have read my blog before, you know that I have shown a couple of ways of getting data out of an Oracle database and into something more approachable, such as a text file or even, surprise, MariaDB (or MySQL for that matter). Not that the Oracle database really likes you doing this, and some tweaking is necessary. One way I have shown before is to use triggers, a UDF, UDP communication and a simple server; this was a series of blog posts starting with this one: Replication from Oracle to MariaDB the simple way. Another means is to export data from Oracle in some text format, something Oracle really doesn't like you doing either, it really likes to keep the data to itself, but yours truly worked around this, so now you can use the Oracle dump utility, and this has been developed quite a bit and now works really well for most standard Oracle datatypes.

But replicating properly, without triggers or anything, would be nice to do using only Open Source stuff, or maybe we can do it without any special tools at all? One way would be to put a proxy in front of Oracle. Useful, but it takes too much time to develop and requires that one figure out the Oracle communication protocol, which is not something I am prepared to do (I am lazy, I know).

So this time around I will try something different: a way to replicate straight out of Oracle. I will eventually show you a practical example of doing this, without too much code, but before we get there I'll explain how it works. The more difficult part here is getting the data replicated properly out of Oracle, so this first installment of a series of blog posts will be more about Oracle than MariaDB specifically, but this is all for a good cause (i.e. getting stuff done without having to pay), so let's get started.

To begin with, there is no binlog in Oracle (nor is there an update log, if you remember that puppy). What Oracle does have though is a redo log, which is also there in InnoDB / XtraDB by the way, and it serves the same purpose: to be able to recover the database should there be a crash. This log is, because of the purpose it serves, logging on a much lower level compared to the binlog. Looking at the Oracle redo log, there are log records for any modifying event in the database, including a lot of internal Oracle processing; Oracle does do a lot of stuff in the background and manages a lot more stuff that MariaDB instead leaves to the operating system, like file space management and such things. This is not to say Oracle is better than MariaDB in this respect, nor the other way around, but Oracle was born at a time when you had to support so many different operating systems that some layers were necessary to create some transparency. Also, Oracle was born at a time when having 5Mb of disk space was a lot of disk, and when disk real estate was something that you handled conservatively (which is a giveaway of my age. Yes, I was born before "My Boy Lollipop" was a hit for Millie Small. And contrary to popular belief, it is not Rod Stewart that plays the harmonica solo in that song. And that ends today's "useless knowledge" lesson).

Before we go into looking at the Oracle redo log in detail, there are a couple of things I want to explain first. There is another file in Oracle, a small but important one, called the control file. The control file is what keeps track of all the other files and related settings in the database. As for the redo log files, these can be used in two ways. The basic way is what is called NOARCHIVELOG mode, which means that the log files are reused. This is similar to how the InnoDB / XtraDB logfiles are used, and the purpose of this is to provide a means to recover after a crash. Thinking about it, you realize that these should then also be usable for, say, point in time recovery, assuming that we saved them, i.e. starting with a physical backup and then applying the saved files. With MariaDB we instead perform point in time recovery using the binlog. To do a point in time recovery with Oracle, and use the redo log for this purpose, we have to save the redo log files, and this is what happens when you run in ARCHIVELOG mode, in which case a separate Oracle process will "archive", which is Oracle lingo for "save in some other place", the redo log files when they are full and a switch is done to the next logfile. All of this, the current and archived redo log files, is recorded in the Oracle control file.

OK, a lot of Oracle stuff, and I warned you, but this is critical to understanding how the code I present later works. There is one more thing I want to discuss briefly, which is the Oracle ROWID. A ROWID in Oracle is a unique identifier of a row in a table. The ROWID is actually a physical address, and there are cases when the same ROWID can be used for two different tables, which is when the tables are clustered, but for a single table the ROWID uniquely identifies a row. To use the ROWID of a table, one can use the ROWID pseudocolumn, which can be used in the WHERE clause as well as in the SELECT list.
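
Just to show what that looks like in practice, here is the ROWID pseudocolumn in action on a table t1 that we will meet again later in this post (an illustration; the actual ROWID values will of course differ on your system):
SQL> SELECT ROWID, c1, c2 FROM anders.t1;
ROWID                      C1         C2
------------------ ---------- ----------
AAAE5KAAEAAAAFfAAA          1          3
AAAE5KAAEAAAAFfAAB          2          3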

So, with all those things explained, let's put it all together. The redo log of Oracle contains all the changes to the database, so it really should be possible to use it for the same purpose that we use the binlog in MariaDB, i.e. replication. To do that we first need to be able to read the redo log. We could read the redo log file itself, but this is rather complex and I'm not even sure it really is a good idea, but if you insist you can read this document for example. Instead we are going to use an Oracle utility. For the DML that interests us, all we need is the INSERT, UPDATE and DELETE statements that deal with the table we want to replicate, and then the COMMIT when the transaction in question ends. For these operations, some of the data we want from the redo log is:
  • The type of operation
  • An identifier of the table (not the actual table name though)
  • The changed columns and values
  • The ROWID of the row in question
  • The SCN (System Change Number).
And then I have to describe the SCN, I guess. The SCN is simple: it's a 48-bit integer value that identifies the transaction and that increments for each transaction. You can get the SCN of the last change to a row in a table, which is actually the SCN of the block where the row exists (but this you can change to track the actual row, using the ROWDEPENDENCIES option when creating the table. Oracle has sooo many strange options to cater for all the special needs of its users).
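
The per-row (or rather per-block, as just explained) SCN is exposed through the ORA_ROWSCN pseudocolumn, and the current SCN of the database is available from V$DATABASE, so getting at these is easy enough; we will use both later in this series:
SQL> SELECT current_scn FROM v$database;
SQL> SELECT ora_rowscn, c1 FROM anders.t1;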

Before I finish this blog post, I want to show some code for how to get data from the redo log. And before you ask, no, there is no oraredolog utility or something like that, this isn't MariaDB, this is Oracle. A utility in Oracle is mostly not a utility per se, but rather an Oracle PL/SQL package, and in this case it is LogMiner, or as a package, DBMS_LOGMNR. For LogMiner to work we need to run in ARCHIVELOG mode, so if you have a fresh Oracle installation, log in as a privileged user:
$ sqlplus / as sysdba
And run these commands:
SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP MOUNT;
SQL> ALTER DATABASE ARCHIVELOG;
SQL> ALTER DATABASE OPEN;
Also, to get all the data we want from LogMiner we have to add "supplemental data" to the redo log. What this means exactly I haven't looked into, but trust me, you want it. So from the same SQL*Plus session, run:
SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

With that in place we are ready to start mining the log. This consists of two steps: first we start LogMiner itself, and then we select from a table that LogMiner populates. Let's look at how we can start LogMiner with an SQL script, let's call this script startlog.sql:
column min_scn new_value startscn
column current_scn new_value endscn
SELECT MIN(FIRST_CHANGE#) min_scn FROM v$log;
SELECT current_scn FROM v$database;
EXECUTE DBMS_LOGMNR.START_LOGMNR(STARTSCN => &startscn, -
  ENDSCN => &endscn, -
  OPTIONS => DBMS_LOGMNR.COMMITTED_DATA_ONLY + -
  DBMS_LOGMNR.CONTINUOUS_MINE + -
  DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
SET ARRAYSIZE 1
And then we run this script, and as there is no explicit exit in it, you get the SQL*Plus prompt back after running it, which is not how the MariaDB command line works:
$ sqlplus / as sysdba @startlog.sql
And now we can get some log data. Let's say we have a table called T1 that we have done some changes to; then we can get the relevant records like this:
SQL> SELECT sql_redo FROM v$logmnr_contents WHERE table_name = 'T1' AND seg_owner = 'ANDERS' AND operation IN ('INSERT', 'UPDATE', 'DELETE');
SQL_REDO
--------------------------------------------------------------------------------
insert into "ANDERS"."T1"("C1","C2") values ('1','2');
insert into "ANDERS"."T1"("C1","C2") values ('2','2');
update "ANDERS"."T1" set "C2" = '3' where "C2" = '2' and ROWID = 'AAAE5KAAEAAAAFfAAA';
update "ANDERS"."T1" set "C2" = '3' where "C2" = '2' and ROWID = 'AAAE5KAAEAAAAFfAAB';

This hardly looks complete? But there is something in there that we could possibly make some sense of, right? Let's try that in the next blog post in this series. This post has been all about Oracle, more or less; in the next post we will get closer to replicating to MariaDB, which is where we want our data to reside anyway. Right?

Happy SQL'ing
/Karlsson

Monday, August 29, 2016

Creating a MariaDB MaxScale router module

I wanted to do some tests with MariaDB MaxScale and realized that the two existing routers (beyond the binlog router that is, which is a bit special) didn't do what I wanted them to do. What I was looking for was a simple round-robin feature, and neither readconnroute nor readwritesplit could be configured to do this; they are just too smart for my simple experiment.

Why would you want a round-robin router then? Well, one use case is when you are INSERTing a lot of data and you just want to persist it. You don't have the use case where you have to SELECT data from all servers, but in the case you need it, you just select from all servers until you find what you need. Think of log data that you don't care much about but that you for some reason need to retain, maybe for corporate policy or legal reasons. Using round-robin could, in theory, give you better performance, but that would require something way smarter than what I am proposing here. Rather, you get INSERT availability, i.e. you will always have some server to insert into, and secondly you get INSERT sharding, which is basic but useful: you only store so much data on each server.

So, let's get to work. To begin with you need the MaxScale source tree, place yourself in some directory where you want this and do this:
$ git clone https://github.com/mariadb-corporation/MaxScale.git
Now you should have a directory called MaxScale, so pop in there, create a build directory and then run cmake to configure MaxScale itself:
$ cd MaxScale
$ mkdir build
$ cd build
$ cmake ..
$ make
These are the quick instructions and you will probably find that you lack some dependencies. The full instructions for how to do this are available as part of the sample code presented later in this document, which is available from Sourceforge. Browse to https://sourceforge.net/projects/mxsroundrobin/files and then click on roundrobin 1.0, where you find a pdf with detailed instructions. Also there is a tgz there with all the source code presented later in this blog.

So, now we have something to work with, and the plan is to introduce a new router module in this tree. To begin with, pop over to where the router module code is and create a directory for our code there:
$ cd ../server/modules/routing
$ mkdir roundrobin
$ cd roundrobin
Before we can start building some code, let's look at the basics of what kind of code gets into a module.

A plugin is a shared object that is loaded by the MaxScale core when it starts. Early on when MaxScale starts it reads the configuration file, /etc/maxscale.cnf by default, and in there each service defines a router. Note that several services can use the same router, so the code we write later has to take this into account. Look at this extract of a service section for example:
[Read-Write Service]
type=service
router=readwritesplit
The router here tells MaxScale to look for a readwritesplit module, or in technical terms, it will load the shared library libreadwritesplit.so. After loading this library successfully, MaxScale has to figure out a few things about this module, like its name and version and, above all, the entry points for the functions that MaxScale will call when processing a connection. In addition we need to define a few structs that are passed around these different calls to give the different router functions some context. Let's start with a header file roundrobin.h in the roundrobin directory:
#ifndef ROUNDROBIN_H
#define ROUNDROBIN_H
#include <server.h>

typedef struct tagROUNDROBININST *PROUNDROBININST;

typedef struct tagROUNDROBIN_CLIENT_SES {
  SPINLOCK lock;
  bool bClosed;
  SERVER **pBackends;
  SESSION *pSession;
  DCB **pdcbClients;
  unsigned int nBackends;
  unsigned int nCurrBackend;
  PROUNDROBININST pRouter;
  struct tagROUNDROBIN_CLIENT_SES *pNext;
} ROUNDROBIN_CLIENT_SES, *PROUNDROBIN_CLIENT_SES;

typedef struct tagROUNDROBININST {
  SERVICE *pService;
  PROUNDROBIN_CLIENT_SES pConnections;
  SPINLOCK lock;
  SERVER **pBackends;
  unsigned int nBackends;
  struct tagROUNDROBININST *pNext;
} ROUNDROBININST;
#endif
As you can see, the main thing here is that I define and typedef two structs. As I said, I have mostly been looking at other existing routers and grabbed the stuff in there, so I can't explain all aspects of these structs, but let's look at a few members:
  • These structs are in a linked list and the pNext member is a pointer to the next element in this list.
  • The lock member is a reference to a spinlock associated with the struct.
  • The pBackends member is a pointer to an array of pointers to the database SERVERs that this service is attached to.
  • The pdcbClients member is an array of pointers to DCBs. A DCB is a Descriptor Control Block, which is a generic descriptor of a connection inside MaxScale, be it to a server or a client. In this case these are the DCBs to the SERVERs in pBackends.
  • The nBackends member is the number of elements in the pBackends and pdcbClients arrays.
  • The pRouter member is a pointer to the ROUNDROBININST of the connection.
That is most of that; the next step now is to start with the more exciting stuff, the actual code that makes up this module. The main source file we work with here is roundrobin.c, and we need a few basics in it. Let's have a look at the beginning of roundrobin.c:
#include <my_config.h>
#include <router.h>
#include <query_classifier.h>
#include <mysql_client_server_protocol.h>
#include "roundrobin.h"

/* Macros. */
#define ROUNDROBIN_VERSION "1.0.0"

/* Globals. */
MODULE_INFO info = {
  MODULE_API_ROUTER,
  MODULE_GA,
  ROUTER_VERSION,
  "A simple roundrobin router"
};
static PROUNDROBININST pInstances;

/* Function prototypes for API. */
static ROUTER *CreateInstance(SERVICE *service, char **options);
static void *CreateSession(ROUTER *pInstance, SESSION *session);
static void CloseSession(ROUTER *pInstance, void *session);
static void FreeSession(ROUTER *pInstance, void *session);
static int RouteQuery(ROUTER *pInstance, void *session, GWBUF *queue);
static void Diagnostic(ROUTER *pInstance, DCB *dcb);
static void ClientReply(ROUTER *pInstance, void *router_session,
  GWBUF *queue, DCB *backend_dcb);
static void HandleError(ROUTER *pInstance, void *router_session,
  GWBUF *errmsgbuf, DCB *backend_dcb, error_action_t action,
  bool *succp);
static int GetCapabilities();

static ROUTER_OBJECT RoundRobinRouter = {
  CreateInstance,
  CreateSession,
  CloseSession,
  FreeSession,
  RouteQuery,
  Diagnostic,
  ClientReply,
  HandleError,
  GetCapabilities
};
Let's now look at what is going on here. To begin with I include a few necessary files, including the roundrobin.h that we created above, and then a macro is defined. Then the MODULE_INFO struct follows. The information in this is used by MaxScale to get information on the router, but if you leave it out, currently MaxScale will start anyway. The command show modules in maxadmin will return the information in this struct for the module.


Then follows a number of function prototypes, and these are needed here before the ROUTER_OBJECT struct. This struct is the key to the router, as it provides the entry points that MaxScale itself will call. Again, I will not specify exactly what all of these do; I have mostly just grabbed code from other routers.

Following this we need some basic functions that all routers implement, to initialize the module, get the version and a function to return the ROUTER OBJECT defined above:
/*
 * Function: ModuleInit()
 * Initialize the Round Robin router module.
 */
void ModuleInit()
   {
   MXS_NOTICE("Initialise roundrobin router module version " ROUNDROBIN_VERSION ".");
   pInstances = NULL;
   } /* End of ModuleInit(). */


/*
 * Function: version()
 * Get the version of the roundrobin router
 */
char *version()
   {
   return ROUNDROBIN_VERSION;
   } /* End of version(). */


/*
 * Function: GetModuleObject()
 * Get the object that describes this module.
 */
ROUTER_OBJECT *GetModuleObject()
   {
   return &RoundRobinRouter;
   } /* End of GetModuleObject(). */
With that we have completed the housekeeping code and are ready to look at the functions that implement the actual functionality. We'll look at CreateInstance first which, as the name implies, creates an instance of RoundRobin. Note that within a running MaxScale there might well be more than one instance, one for each RoundRobin service.
/*
 * Function: CreateInstance()
 * Create an instance of the RoundRobin router.
 */
ROUTER *CreateInstance(SERVICE *pService, char **pOpts)
   {
   PROUNDROBININST pRet;
   PROUNDROBININST pTmp;
   SERVER_REF *pSvcRef;
   unsigned int i;

   MXS_NOTICE("Creating roundrobin router instance.");
/* Allocate the RoundRobin instance struct. */
   if((pRet = malloc(sizeof(ROUNDROBININST))) == NULL)
      return NULL;
   pRet->pService = pService;
   pRet->pConnections = NULL;
   pRet->pNext = NULL;
   pRet->nBackends = 0;

/* Count the number of backend servers we manage. */
   for(pSvcRef = pService->dbref; pSvcRef != NULL; pSvcRef = pSvcRef->next)
      pRet->nBackends++;

/* Allocate space for the backend servers and add to the instance struct. */
   if((pRet->pBackends = calloc(pRet->nBackends, sizeof(SERVER *))) == NULL)
      {
      free(pRet);
      return NULL;
      }

   spinlock_init(&pRet->lock);

/* Set up list of servers. */
   for(i = 0, pSvcRef = pService->dbref; pSvcRef != NULL; i++, pSvcRef = pSvcRef->next)
      pRet->pBackends[i] = pSvcRef->server;

/* Set up instance in list. */
   if(pInstances == NULL)
      pInstances = pRet;
   else
      {
      for(pTmp = pInstances; pTmp->pNext != NULL; pTmp = pTmp->pNext)
         ;
      pTmp->pNext = pRet;
      }

   MXS_NOTICE("Created roundrobin router instance.");
   return (ROUTER *) pRet;
   } /* End of CreateInstance(). */
Again, nothing really exciting is happening: I create a struct that defines the instance, initialize it and add it to the linked list of instances that I maintain. Also, I get references to the backend servers that this instance uses, set up the array for them, and initialize the spinlock. With that, we are done. Then there is the issue of creating a session, and this function gets called when a client connects to MaxScale through the port that is linked to RoundRobin.
/*
 * Function: CreateSession()
 * Create a session in the RoundRobin router.
 */
void *CreateSession(ROUTER *pInstance, SESSION *session)
   {
   PROUNDROBIN_CLIENT_SES pRet;
   PROUNDROBIN_CLIENT_SES pTmp;
   PROUNDROBININST pRoundRobinInst = (PROUNDROBININST) pInstance;
   unsigned int i;

/* Allocating session struct. */
   if((pRet = malloc(sizeof(ROUNDROBIN_CLIENT_SES))) == NULL)
      return NULL;
   spinlock_init(&pRet->lock);
   pRet->pNext = NULL;
   pRet->nCurrBackend = 0;
   pRet->pSession = session;
   pRet->pRouter = pRoundRobinInst;
   pRet->nBackends = pRoundRobinInst->nBackends;

/* Allocating backends and DCBs. */
   if((pRet->pBackends = calloc(pRet->nBackends, sizeof(SERVER *))) == NULL)
      {
      free(pRet);
      return NULL;
      }
   if((pRet->pdcbClients = calloc(pRet->nBackends, sizeof(DCB *))) == NULL)
      {
      free(pRet->pBackends);
      free(pRet);
      return NULL;
      }

/* Set servers and DCBs. */
   for(i = 0; i < pRet->nBackends; i++)
      {
      pRet->pBackends[i] = pRoundRobinInst->pBackends[i];
      pRet->pdcbClients[i] = NULL;
      }

/* Place connecting last in list of connections in instance. */
   spinlock_acquire(&pRoundRobinInst->lock);
   if(pRoundRobinInst->pConnections == NULL)
      pRoundRobinInst->pConnections = pRet;
   else
      {
      for(pTmp = pRoundRobinInst->pConnections; pTmp->pNext != NULL; pTmp = pTmp->pNext)
         ;
      pTmp->pNext = pRet;
      }
   spinlock_release(&pRoundRobinInst->lock);

   return (void *) pRet;
   } /* End of CreateSession(). */
This is also pretty basic stuff: the server pointers are copied from the instance (do I need to do this, you ask? Answer is, I don't know, but I do know that what I do here works). I also clear the DCB pointers; these are created on an as-needed basis later in the code.

Following this are a couple of basic housekeeping functions that I am not showing here; actually, I'm just going to show one more function, which is RouteQuery. This is, as the name implies, the function that gets called to do what we are actually writing this code for: routing queries. Before I show that code, I have to explain that this is very simplistic code. To begin with, it doesn't implement "session commands"; these are commands that really should be run on all backends, like setting the current database, handling transactions and such things. As I said, I do not implement this, and this is one of the major shortcomings of this code that makes it much less generally applicable. But it still has use cases. Secondly, I have tried to make sure that the code works, more than optimizing it to death, so maybe I grab the spinlock too often and maybe I am too picky with allocating/deallocating the DCBs; I let others answer that.

The role of the function at hand is to handle an incoming query and pass it along to one of the servers defined for the service in question. In the general case, the most complicated parts of this are the selection of which server to route the query to and the handling of session commands. I have simplified this by only having a very simple routing algorithm, where I store the index of the last used backend for a connection in the nCurrBackend member, and for each query this is incremented until nBackends is reached, at which point it is reset to 0. And for the complexity of session commands, I just don't implement them.

So, lets have a look at what the RouteQuery function looks like:
/*
 * Function: RouteQuery()
 * Route a query in the RoundRobin router.
 */
int RouteQuery(ROUTER *instance, void *session, GWBUF *queue)
   {
   PROUNDROBIN_CLIENT_SES pSession = (PROUNDROBIN_CLIENT_SES) session;
   DCB *pDcb;
   int nRet;
   unsigned int nBackend;

   MXS_NOTICE("Enter RoundRobin RouteQuery.");
   queue = gwbuf_make_contiguous(queue);

   spinlock_acquire(&pSession->lock);
/* Check for the next running backend. Set non-running backend DCBs to NULL. */
   for(nBackend = pSession->nCurrBackend; nBackend < pSession->nBackends; nBackend++)
      {
/* If this server is up, then exit this loop now. */
      if(!SERVER_IS_DOWN(pSession->pBackends[nBackend]))
         break;

/* If the server is down and the DCB is non-null, then free the DCB and NULL it now. */
      if(pSession->pdcbClients[nBackend] != NULL)
         {
         dcb_close(pSession->pdcbClients[nBackend]);
         pSession->pdcbClients[nBackend] = NULL;
         }
      }
/* If I couldn't find a backend after the current, then look through the ones before. */
   if(nBackend >= pSession->nBackends)
      {
      for(nBackend = 0; nBackend <= pSession->nCurrBackend; nBackend++)
         {
         if(!SERVER_IS_DOWN(pSession->pBackends[nBackend]))
            break;
         if(pSession->pdcbClients[nBackend] != NULL)
            {
            dcb_close(pSession->pdcbClients[nBackend]);
            pSession->pdcbClients[nBackend] = NULL;
            }
         }

/* Check that I really found a suitable backend. */
      if(nBackend > pSession->nCurrBackend)
         {
         spinlock_release(&pSession->lock);
         MXS_NOTICE("No suitable RoundRobin running server found in RouteQuery.");
         return 0;
         }
      }

   pDcb = pSession->pdcbClients[nBackend];
/* If backend DCB wasn't set, then do that now. */
   if(pDcb == NULL)
      pDcb = pSession->pdcbClients[nBackend] = dcb_connect(pSession->pBackends[nBackend],
        pSession->pSession,
        pSession->pBackends[nBackend]->protocol);
   spinlock_release(&pSession->lock);

/* Route the query. */
   nRet = pDcb->func.write(pDcb, queue);

/* Move to next dcb. */
   pSession->nCurrBackend = nBackend;
   if(++pSession->nCurrBackend >= pSession->nBackends)
      pSession->nCurrBackend = 0;

   MXS_NOTICE("Exit RoundRobin RouteQuery.");
   return 1;
   } /* End of RouteQuery(). */
So, what is going on here? First I look for a backend, starting with the current one (which is badly named; this is actually the one after the current) and continuing until I find a server that is running. If I find a non-running server I skip it, after having closed the associated DCB. If I can't find a server after the current one, I start again from the first, processing servers in the same way.

Following this I should have a server; then I check if the DCB is open, and if not, I open it now. After that I do the actual routing of the query, move on to the next backend and then return. Simple as that. As I have stated, this is a very simple router, but it does work, within the given limitations, and it should be good enough as a crude example.

Before I can test my code, I have to set it up for inclusion in the build process and do a few other mundane tasks, but that is all documented in the pdf that comes with the code; download the package from Sourceforge.
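
Once built and installed, pointing a service at the new router is just a matter of a maxscale.cnf section along these lines (a sketch only: the server names, user, password and port here are placeholders of mine, and the authoritative details are in that pdf):
[RoundRobin Service]
type=service
router=roundrobin
servers=server1,server2
user=myuser
passwd=mypwd

[RoundRobin Listener]
type=listener
service=RoundRobin Service
protocol=MySQLClient
port=4010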

Happy SQL'ing
/Karlsson

Wednesday, August 10, 2016

Installing MariaDB MaxScale the hard way

If you are like me (let's for everyone's sake hope you are not, though) you like to do things the hard way, in particular when it comes to testing things. For example, when installing things on your Linux box, just to try them out, you might not want to do a yum install, an rpm -ivh or an apt-get and have files spread all over your system; instead you want to tar xvf some tarball, and possibly, if you are in a good mood, or you want to be nice so you get some gifts for Christmas, or maybe because it is just that kind of day, you unpack that tarball in /usr/local instead of in /home/bofh/junk. And this will usually get you in some trouble, but as we have already determined that we are truly bad (maybe we should get a tattoo or two also, or is the right to death-metal antics reserved for IT security personnel only? Sure seems so) we can ignore that and get to work.

Here I will show you how to install MariaDB MaxScale from a tar-ball and get it running, without touching any system directories or anything if you want to test it or if you, even in production, want to install it in some non-standard location (like /usr/local. I actually like to have stuff there, I don't know what's so wrong with that. I'm a rebel, I know).

To begin with, let's download the MariaDB MaxScale tarball (rpm's are for wussies), for example from mariadb.com, where you should register and then go to "my portal"->"Downloads"->"MariaDB MaxScale" and download an appropriate .tar.gz for your operating system of choice. In my case I download it for CentOS 6 / RHEL 6, and as the current MariaDB MaxScale version is 1.4.3, I issue this command in my home directory (/home2/anders):
$ wget https://downloads.mariadb.com/enterprise/<my tag>/mariadb-maxscale/1.4.3/rhel/6/x86_64/maxscale-1.4.3-1.rhel.6.x86_64.tar.gz
With <my tag> replaced by a generated tag on mariadb.com. Following this we are stuck with a tarball named maxscale-1.4.3-1.rhel.6.x86_64.tar.gz and we unpack that as usual and then create a more readable link to the created directory:
$ tar xvfz maxscale-1.4.3-1.rhel.6.x86_64.tar.gz
$ ln -s maxscale-1.4.3-1.rhel.6.x86_64 maxscale143
So far nothing magic has happened. The next step is to create a few directories in our new maxscale143 directory where MariaDB MaxScale will keep temporary stuff, logs etc.:
$ cd maxscale143
$ mkdir cache data log

The next step after this is to create a MariaDB MaxScale configuration file. There is a template for this in the etc subdirectory so we just have to copy that:
$ cp etc/maxscale.cnf.template etc/maxscale.cnf
The supplied config file will start MariaDB MaxScale with just one server defined, and unless you have this server running on a non-standard port or on another machine than the one where MariaDB MaxScale itself is running, you can leave this configuration file alone; if not, you have to edit the [server1] section appropriately.
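
For reference, that section of the template looks something like this (from memory, so check the template itself; adjust the address and port to your setup):
[server1]
type=server
address=127.0.0.1
port=3306
protocol=MySQLBackend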

Another thing to look out for is your iptables / firewalld settings, but this you already know about, I guess. You might want to turn them off (which is not recommended at all) or configure them appropriately. As per the default configuration with MariaDB MaxScale 1.4.3, ports 4006, 4008 and 6603 will be listened on, so configure iptables / firewalld appropriately. And don't turn them off, do this the right way for once. I turned iptables off by the way, just to annoy you.
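
Doing it the right way would be something along these lines for iptables (or the corresponding firewall-cmd --add-port commands if you run firewalld):
$ sudo iptables -I INPUT -p tcp --dport 4006 -j ACCEPT
$ sudo iptables -I INPUT -p tcp --dport 4008 -j ACCEPT
$ sudo iptables -I INPUT -p tcp --dport 6603 -j ACCEPT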

Now, MariaDB MaxScale will connect to the server we defined in the configuration file above, and we need to allow it to connect and execute a few commands. There are two users that MariaDB MaxScale can use: one to connect and get authentication data, like usernames and passwords, and a separate one to monitor the state of the server. In the supplied configuration template these two users use the same account, namely myuser with mypwd as the password, and this is what I use in the following, where we set up the appropriate user and grants in the MariaDB server I am connecting to. Also note that I am assuming that MariaDB MaxScale and the MariaDB server in question run on the same node. So connect to MariaDB and issue the following commands:
MariaDB> CREATE USER 'myuser'@'localhost'  IDENTIFIED BY 'mypwd';
MariaDB> GRANT SELECT ON mysql.user TO 'myuser'@'localhost';
MariaDB> GRANT SELECT ON mysql.db TO 'myuser'@'localhost';
MariaDB> GRANT SELECT ON mysql.tables_priv TO 'myuser'@'localhost';
MariaDB> GRANT SHOW DATABASES ON *.* TO 'myuser'@'localhost';
MariaDB> GRANT REPLICATION SLAVE ON *.* TO 'myuser'@'localhost';
MariaDB> GRANT REPLICATION CLIENT ON *.* TO 'myuser'@'localhost';

With this in place we are ready to start MariaDB MaxScale, but this is an itsy bitsy more complex than you might think. The issue is that the default locations for a lot of stuff that MariaDB MaxScale wants to use are somewhere in the global file system, and they are also not relative to some basedir, as is conveniently the case with the MariaDB server itself. To support this, instead of putting all this in the global section in the MariaDB MaxScale config file, I'll put any necessary arguments to get MaxScale going on the command line, and for that I have created three scripts: one to set up the environment, one to start MariaDB MaxScale and one to stop it. Let's start with the environment one. This is placed in the MariaDB MaxScale home directory (maxscale143), is called maxenv.sh and has the following contents:
#!/bin/bash
#
MAXSCALE_HOME=$(cd $(dirname $BASH_SOURCE) ; pwd)
PATH=$PATH:$MAXSCALE_HOME/usr/bin
export LD_LIBRARY_PATH=$MAXSCALE_HOME/usr/lib64/maxscale


The next file to create is the script to start MariaDB MaxScale, this is called startmax.sh, is again placed in the MariaDB MaxScale root directory and has this content:
#!/bin/bash
#
. `dirname $0`/maxenv.sh

$MAXSCALE_HOME/usr/bin/maxscale \
  --config=$MAXSCALE_HOME/etc/maxscale.cnf \
  --logdir=$MAXSCALE_HOME/log \
  --language=$MAXSCALE_HOME/var/lib/maxscale \
  --datadir=$MAXSCALE_HOME/data \
  --libdir=$MAXSCALE_HOME/usr/lib64/maxscale \
  --piddir=$MAXSCALE_HOME --syslog=no \
  --cachedir=$MAXSCALE_HOME/cache

As you can see, this invokes maxenv.sh before going on to start MariaDB MaxScale. The only parameter that I really don't have to set here, but which I set anyway, again just to be annoying to the world in general, is --syslog=no, as we are only testing things here and logging to syslog is then not really appropriate (but it is the default).

All we need now is a script to stop MariaDB MaxScale, and for this create a file called stopmax.sh in the MariaDB MaxScale home directory with this content:
#!/bin/bash
#
. `dirname $0`/maxenv.sh

if [ -e "$MAXSCALE_HOME/maxscale.pid" ]; then
   kill -term `cat $MAXSCALE_HOME/maxscale.pid`
fi


Following this, the one thing that remains to be done is to make the scripts we just created executable:
$ chmod +x maxenv.sh startmax.sh stopmax.sh

Now we are ready to try things, let's start MariaDB MaxScale first:
$ ./startmax.sh
And then let's see if we can connect to the MariaDB server through MariaDB MaxScale:
$ mysql -h 127.0.0.1 -P 4006 -u myuser -pmypwd
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 6950
Server version: 10.0.0 1.4.3-maxscale MariaDB Server

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]>

As you can see, I am not connecting as root here, as this is not allowed by MariaDB MaxScale by default. Also, I am not connecting to localhost, as that would mean connecting over a socket, which is not what we want to do here.

This is all for today, now I'll need to start my Harley-Davidson and head down town to hang with the other tough guys (OK, I'm really taking my beaten-up Ford and pick up the kids from Kindergarten, I admit it).

Keep on SQL'ing
/Karlsson

Monday, August 8, 2016

MyQuery 3.5.6 released

I have released version 3.5.6 of MyQuery, and there are quite a number of new features and fixes in there. The #1 bugfix is that the annoying access warnings that popped up from Windows when saving to the registry are gone, as I have now moved the registry keys to a more Windows 10-acceptable place. Among the new features are:
  • JSON format output when saving results.
  • More flexible CSV format output with many new options.
  • Ability to save Dyncol as JSON in CSV and JSON output.
  • Nicer formatting of numbers in status dialogs.
  • Auto refresh of status dialogs.
As this is a version with many new features, I still consider this a Beta. I have built it on Windows 10 and tested it on Windows 10 and 7, 64-bit, although MyQuery itself is still a 32-bit Windows application.

Happy SQL'ing
/Karlsson

Tuesday, February 16, 2016

Loading JSON into MariaDB or even MySQL - mysqljsonimport 2.0 is available

It has been a long time since I updated mysqljsonimport or mysqljsonexport, and I had a few things I wanted to do with them. This release is significant enough for me to bump the version up to 2.0, and the same is in the works for mysqljsonexport. The one big thing that is now implemented is reasonably advanced support for MariaDB Dynamic Columns, and it is actually pretty flexible, allowing you to load a nested JSON object into a MariaDB Dynamic Column. But don't worry, it will still link and run with MySQL if that is what you want to do (but then you will not have the dynamic column features, for obvious reasons).
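
If MariaDB Dynamic Columns are new to you, here is a taste of them in plain MariaDB SQL (my own illustration, separate from mysqljsonimport itself):
MariaDB> CREATE TABLE docs(id INT PRIMARY KEY, attrs BLOB);
MariaDB> INSERT INTO docs VALUES(1, COLUMN_CREATE('name', 'Anders', 'job', 'Sales'));
MariaDB> SELECT COLUMN_JSON(attrs) FROM docs;
The last SELECT returns something like {"job":"Sales","name":"Anders"}, which hints at how a whole JSON object can live inside a single column.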

Download it from Sourceforge as usual (yes, I know I am old-fashioned and that I should have put it on github). Also as usual, the documentation in pdf format is downloadable separately.

/Karlsson