Monday, September 19, 2016

Replication from Oracle to MariaDB - Part 3

This is the third part of a series of blogs on how to do Change Data Capture (CDC) from Oracle in order to replicate that data to MariaDB. Part 1 covered some basics on the Oracle side of things, in particular the Oracle redo log, which we use as the source for our CDC efforts, and Part 2 was about how to get this to work. We left Part 2 with an example of how to replicate INSERTs from Oracle to MariaDB, but that is hardly good enough: we also have to deal with UPDATE and DELETE to make it at least somewhat complete, and that is what we will look at in this post.

And I know what you are asking yourselves now: "You spent all that time explaining the Oracle ROWID and then you never used it. What an awful waste of time." No, it wasn't a waste; we will put it to good use now when we look at UPDATEs.

For an INSERT, things are reasonably simple: we just add a row with some columns to the table. For an UPDATE or a DELETE, though, we need a means to identify the row to be changed. To reiterate, the Oracle redo log is not like the MariaDB binlog in this respect (here I am assuming that the binlog format is STATEMENT). Let's look at an example, and for that let's insert some data into our EMP table (see the previous post for details).
SQL> INSERT INTO emp VALUES(2, 'John', 'Developer', 12000, 20);
SQL> INSERT INTO emp VALUES(3, 'Georgina', 'Design', 11000, 20);
SQL> INSERT INTO emp VALUES(4, 'Anne', 'Assistant', 8000, 30);
SQL> INSERT INTO emp VALUES(5, 'Marge', 'HR Mgr', 14000, 30);
SQL> COMMIT;
With that in place, let's now ensure that this is also replicated to MariaDB by running the script from the last post:
$ sqlplus -S / as sysdba @cdcemp.sql | mysql -u anders -panders anders
And to ensure that we got this right, let's check this out in MariaDB now:
$ mysql -u anders -panders anders
MariaDB [anders]> SELECT * FROM emp;
+-------+----------+-----------+----------+--------+
| empno | ename    | job       | sal      | deptno |
+-------+----------+-----------+----------+--------+
|     1 | Anders   | Sales     | 10000.00 |     10 |
|     2 | John     | Developer | 12000.00 |     20 |
|     3 | Georgina | Design    | 11000.00 |     20 |
|     4 | Anne     | Assistant |  8000.00 |     30 |
|     5 | Marge    | HR Mgr    | 14000.00 |     30 |
+-------+----------+-----------+----------+--------+
5 rows in set (0.00 sec)
OK, seems like we got it right so far. Then let's try an UPDATE and raise the salary of the two employees in department 20:
$ sqlplus anders/anders
SQL> UPDATE emp SET sal = sal + 100 WHERE deptno = 20;
SQL> COMMIT;
Now, what does this result in? If this were MariaDB with the STATEMENT binlog format, the binlog would contain something like this:
# at 9535
#160919 13:00:44 server id 112  end_log_pos 9652 CRC32 0x987f9dea       Query   thread_id=101   exec_time=0     error_code=0
SET TIMESTAMP=1474282844/*!*/;
UPDATE emp SET sal = sal + 100 WHERE deptno = 20
/*!*/;
We've seen this before, no surprises there. Now let's see what the SQL_REDO column in V$LOGMNR_CONTENTS looks like for the UPDATE we ran in SQL*Plus:
update "ANDERS"."EMP" set "SAL" = '12100' where "SAL" = '12000' and ROWID = 'AAAE7RAAEAAAAK9AAB';
update "ANDERS"."EMP" set "SAL" = '11100' where "SAL" = '11000' and ROWID = 'AAAE7RAAEAAAAK+AAA';

Whoa, that doesn't look like our UPDATE at all, what's going on? Well, I already told you that the Oracle redo log doesn't work like the binlog. In this case we do get an SQL statement, but it is not the SQL statement that we asked Oracle to execute. This SQL is reconstructed from the redo log records, which log changes row by row and use the ROWID to identify the row that was changed: one UPDATE statement touching two rows results in two UPDATEs here. We have already discussed that the ROWID is unique within a table and, to an extent, even across tables. The issue now is that we don't have a ROWID in MariaDB. But the ROWID from the redo log is available in the ROW_ID column of the V$LOGMNR_CONTENTS view. And if we assume that all tables that we replicate have a PRIMARY KEY, then we can use the ROWID to look up that PRIMARY KEY and use it for UPDATE operations in MariaDB, right? Something like this in our script:
SELECT empno INTO v_empno FROM anders.emp WHERE rowid = v_row_id;
That works, but there is one issue with it. We get our data from the redo log, which is historic, so to speak, but we look up the PRIMARY KEY using the ROWID based on how the data looks right now. This means that if you are in the habit of updating your PRIMARY KEYs (don't do that, by the way; it is a bad habit even if we exclude the specific use case we look at here), then this isn't going to fly. Sorry, I will not fix that for you: if you update a row and then update its primary key before the changes are replicated, the first update will be looked up using the new primary key, which MariaDB doesn't have yet, so it will not be replicated correctly.
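
To make the reasoning above a bit more concrete, here is a toy model in Python. It is only a sketch: plain dicts stand in for the Oracle table and for the redo records, and make_emp, run_update and to_mariadb are hypothetical helpers, not anything from Oracle or from the cdcemp.sql script. It shows one UPDATE statement producing one record per row, the primary key being looked up through the row's current ROWID, and what happens when that key has changed or the row is gone.

```python
# Toy model only: dicts stand in for Oracle and LogMiner. The helper names
# (make_emp, run_update, to_mariadb) are hypothetical, not Oracle APIs.


def make_emp():
    """Current ANDERS.EMP rows on the Oracle side, keyed by ROWID.
    The two ROWIDs are the ones from the SQL_REDO output above."""
    return {
        "AAAE7RAAEAAAAK9AAB": {"empno": 2, "ename": "John", "sal": 12000, "deptno": 20},
        "AAAE7RAAEAAAAK+AAA": {"empno": 3, "ename": "Georgina", "sal": 11000, "deptno": 20},
    }


def run_update(table, col, delta, where_col, where_val):
    """Apply 'UPDATE emp SET col = col + delta WHERE where_col = where_val'
    and return one redo-like record per changed row, holding only the ROWID
    and the new value of the changed column (as in SQL_REDO above)."""
    redo = []
    for rowid, row in table.items():
        if row[where_col] == where_val:
            row[col] += delta
            redo.append({"rowid": rowid, "changes": {col: row[col]}})
    return redo


def to_mariadb(record, table):
    """Turn one redo-like record into a MariaDB UPDATE keyed on empno.
    The primary key is looked up through the ROWID in the table as it looks
    *now*, so None is returned when that row no longer exists."""
    row = table.get(record["rowid"])
    if row is None:
        return None
    sets = ", ".join(f"{c.upper()} = {v}" for c, v in record["changes"].items())
    return f"UPDATE emp SET {sets} WHERE empno = {row['empno']};"


if __name__ == "__main__":
    emp = make_emp()
    redo = run_update(emp, "sal", 100, "deptno", 20)
    for rec in redo:
        print(to_mariadb(rec, emp))

    # Pitfall: John's primary key changes before the CDC script runs, so the
    # lookup returns the new key, which MariaDB does not have yet.
    emp["AAAE7RAAEAAAAK9AAB"]["empno"] = 20
    print(to_mariadb(redo[0], emp))

    # And once the row is deleted there is nothing left to look up at all.
    del emp["AAAE7RAAEAAAAK9AAB"]
    print(to_mariadb(redo[0], emp))
```

Run as-is, it prints the UPDATEs for empno 2 and 3, then an UPDATE for empno 20, which matches no row in MariaDB, and finally None for the deleted row, which is the DELETE problem we turn to next.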

Then let's look at DELETEs. Can we use the same method as for UPDATEs? It's tempting, I know, but it is not going to work at all. Think about it: we just determined that the ROWID we get is for a row that is historic, but in the case of a DELETE there is no "current row", as we DELETEd it, right? Whoa, how do we solve that? Are we stuck now, out in the desert, with the hot Oracle sun shining on our unprotected pale skin and getting hotter and hotter? Our old Studebaker has broken down by the side of the abandoned dirt road, you stare into the unforgiving sun and you know this is it?

Nope. Luckily, this can be fixed! The Oracle redo log contains redo data, which is what we have been using so far. The thing is, it also contains undo data! Hey! Think about this amazing concept for a moment and consider what is needed to undo a DELETE. The values of ALL columns in the deleted row, right? Our friendly DBMS_LOGMNR.MINE_VALUE function comes to the rescue again, but this time operating not on the redo value but on the undo value!
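
The same idea as a toy Python model (again a sketch, not LogMiner code): the undo side of a DELETE record carries every column of the old row, so the primary key can be read from it even though the row is gone from the table. The dict standing in for UNDO_VALUE and the helpers mine_value and delete_from_undo are hypothetical; mine_value only imitates what DBMS_LOGMNR.MINE_VALUE returns, namely the value as a string, or NULL.

```python
# Toy model only: a dict stands in for the undo half of one LogMiner record.
# mine_value() and delete_from_undo() are hypothetical helpers, not Oracle APIs.


def mine_value(values, column):
    """Imitate DBMS_LOGMNR.MINE_VALUE: the column's value as a string,
    or None (NULL) when the column is absent or its value is NULL."""
    value = values.get(column)
    return None if value is None else str(value)


def delete_from_undo(undo_values):
    """Build the MariaDB DELETE for a deleted row, taking EMPNO from the undo
    data; return None (skip the record) when EMPNO cannot be found."""
    empno = mine_value(undo_values, "ANDERS.EMP.EMPNO")
    if empno is None:
        return None
    return f"DELETE FROM emp WHERE empno = {int(empno)};"


if __name__ == "__main__":
    # Undo data for deleting John after his raise: every column of the old row
    # is there, which is exactly what Oracle needs to undo the DELETE.
    john_undo = {
        "ANDERS.EMP.EMPNO": 2,
        "ANDERS.EMP.ENAME": "John",
        "ANDERS.EMP.JOB": "Developer",
        "ANDERS.EMP.SAL": 12100,
        "ANDERS.EMP.DEPTNO": 20,
    }
    print(delete_from_undo(john_undo))  # DELETE FROM emp WHERE empno = 2;
```

Reading the key from the undo side means the lookup no longer depends on the row still being in the Oracle table, which is exactly what the ROWID approach could not offer for DELETE.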

Armed with all this, we can now revisit our cdcemp.sql script and add handling for UPDATE and DELETE operations as well:
SET serveroutput ON
SET linesize 4000
SET feedback off
DECLARE
   v_scn NUMBER;
   v_scnstart NUMBER;
   v_scnend NUMBER;
   v_redo_value NUMBER;
   v_undo_value NUMBER;
   v_operation VARCHAR2(32);
   v_row_id VARCHAR2(18);
   v_sqlstmt VARCHAR2(4000);
   v_firstcol BOOLEAN;
   v_empno NUMBER;
   CURSOR cur1 IS
     SELECT scn, redo_value, undo_value, operation, row_id
       FROM v$logmnr_contents
       WHERE seg_owner = 'ANDERS' AND table_name = 'EMP'
         AND operation IN('INSERT', 'UPDATE', 'DELETE');
BEGIN
-- Start LogMiner.
   SELECT NVL(MIN(l.scn) + 1, MIN(e.ora_rowscn)) INTO v_scnstart
     FROM anders.emp e, scnlog l
     WHERE l.table_name = 'EMP';
   SELECT MAX(e.ora_rowscn) INTO v_scnend
     FROM anders.emp e;
   DBMS_OUTPUT.PUT_LINE('-- SCN Range: ' || v_scnstart || ' - ' || v_scnend);
   DBMS_LOGMNR.START_LOGMNR(STARTSCN => v_scnstart,
     ENDSCN => v_scnend,
     OPTIONS => DBMS_LOGMNR.COMMITTED_DATA_ONLY + DBMS_LOGMNR.CONTINUOUS_MINE +
     DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
-- Dummy update to insert a more recent SCN to ensure that we don't have to wait
-- for the next update in the following select.
   UPDATE scnlog SET table_name = table_name WHERE table_name = 'EMP';
   COMMIT;

-- Open cursor to get data from LogMiner.
   OPEN cur1;

-- Loop for all the rows in the redo log since the last time we ran this.
   LOOP
      FETCH cur1 INTO v_scn, v_redo_value, v_undo_value, v_operation, v_row_id;
      EXIT WHEN cur1%NOTFOUND;
      IF v_operation = 'INSERT' THEN
         v_sqlstmt := 'INSERT INTO emp(EMPNO, ENAME, JOB, SAL, DEPTNO) VALUES(';
         v_sqlstmt := v_sqlstmt || DBMS_LOGMNR.MINE_VALUE(v_redo_value, 'ANDERS.EMP.EMPNO') || ', ';
         v_sqlstmt := v_sqlstmt || '''' || DBMS_LOGMNR.MINE_VALUE(v_redo_value, 'ANDERS.EMP.ENAME') || ''', ';
         v_sqlstmt := v_sqlstmt || '''' || DBMS_LOGMNR.MINE_VALUE(v_redo_value, 'ANDERS.EMP.JOB') || ''', ';
         v_sqlstmt := v_sqlstmt || DBMS_LOGMNR.MINE_VALUE(v_redo_value, 'ANDERS.EMP.SAL') || ', ';
         v_sqlstmt := v_sqlstmt || DBMS_LOGMNR.MINE_VALUE(v_redo_value, 'ANDERS.EMP.DEPTNO') || ')';
      ELSIF v_operation = 'UPDATE' THEN
         v_sqlstmt := 'UPDATE emp SET ';
         v_firstcol := TRUE;

-- Build UPDATE statement using only the columns present in the redo log record.
         IF DBMS_LOGMNR.COLUMN_PRESENT(v_redo_value, 'ANDERS.EMP.EMPNO') = 1 THEN
            v_sqlstmt := v_sqlstmt || 'EMPNO = ' || DBMS_LOGMNR.MINE_VALUE(v_redo_value, 'ANDERS.EMP.EMPNO');
            v_firstcol := FALSE;
         END IF;
         IF DBMS_LOGMNR.COLUMN_PRESENT(v_redo_value, 'ANDERS.EMP.ENAME') = 1 THEN
            v_sqlstmt := v_sqlstmt || CASE WHEN NOT v_firstcol THEN ', ' ELSE '' END;
            v_sqlstmt := v_sqlstmt || 'ENAME = ''' || DBMS_LOGMNR.MINE_VALUE(v_redo_value, 'ANDERS.EMP.ENAME')
              || '''';
            v_firstcol := FALSE;
         END IF;
         IF DBMS_LOGMNR.COLUMN_PRESENT(v_redo_value, 'ANDERS.EMP.JOB') = 1 THEN
            v_sqlstmt := v_sqlstmt || CASE WHEN NOT v_firstcol THEN ', ' ELSE '' END;
            v_sqlstmt := v_sqlstmt || 'JOB = ''' || DBMS_LOGMNR.MINE_VALUE(v_redo_value, 'ANDERS.EMP.JOB')
              || '''';
            v_firstcol := FALSE;
         END IF;
         IF DBMS_LOGMNR.COLUMN_PRESENT(v_redo_value, 'ANDERS.EMP.SAL') = 1 THEN
            v_sqlstmt := v_sqlstmt || CASE WHEN NOT v_firstcol THEN ', ' ELSE '' END;
            v_sqlstmt := v_sqlstmt || 'SAL = ' || DBMS_LOGMNR.MINE_VALUE(v_redo_value, 'ANDERS.EMP.SAL');
            v_firstcol := FALSE;
         END IF;
         IF DBMS_LOGMNR.COLUMN_PRESENT(v_redo_value, 'ANDERS.EMP.DEPTNO') = 1 THEN
            v_sqlstmt := v_sqlstmt || CASE WHEN NOT v_firstcol THEN ', ' ELSE '' END;
            v_sqlstmt := v_sqlstmt || 'DEPTNO = ' || DBMS_LOGMNR.MINE_VALUE(v_redo_value, 'ANDERS.EMP.DEPTNO');
            v_firstcol := FALSE;
         END IF;

-- Get empno from emp using rowid.
         SELECT MAX(empno) INTO v_empno FROM anders.emp WHERE rowid = v_row_id;
         IF v_empno IS NULL THEN
           CONTINUE;
         END IF;
         v_sqlstmt := v_sqlstmt || ' WHERE empno = ' || v_empno;
      ELSIF v_operation = 'DELETE' THEN
-- Get the empno from the undo record.
         v_empno := DBMS_LOGMNR.MINE_VALUE(v_undo_value, 'ANDERS.EMP.EMPNO');
         IF v_empno IS NULL THEN
           CONTINUE;
         END IF;
         v_sqlstmt := 'DELETE FROM emp WHERE empno = ' || v_empno;
      END IF;

      DBMS_OUTPUT.PUT_LINE('-- SCN = ' || v_scn);
      DBMS_OUTPUT.PUT_LINE(v_sqlstmt || ';');
   END LOOP;
   IF v_scn IS NOT NULL THEN
     UPDATE scnlog SET scn = v_scn WHERE table_name = 'EMP';
     COMMIT;
   END IF;
   CLOSE cur1;
-- End the LogMiner session.
   DBMS_LOGMNR.END_LOGMNR;
END;
/
EXIT
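
For readers who want to poke at the logic without an Oracle instance, here is the loop from cdcemp.sql modelled in Python. It is a sketch under stated assumptions, not a replacement for the script: each change is a hypothetical (scn, operation, redo, undo, rowid) tuple in which redo and undo are dicts holding only the columns present in the record (standing in for COLUMN_PRESENT and MINE_VALUE), and rowid_to_empno stands in for the ROWID lookup against the current EMP table. The SCNs in the example are made up. One deliberate difference: literal() doubles embedded single quotes in string values, which the PL/SQL above does not do.

```python
# Toy model of the cdcemp.sql loop; every name here is hypothetical.
# A change is (scn, operation, redo, undo, rowid), where redo/undo map column
# names to the values present in that LogMiner record.

COLUMNS = ["EMPNO", "ENAME", "JOB", "SAL", "DEPTNO"]
STRING_COLUMNS = {"ENAME", "JOB"}


def literal(column, value):
    """Render a value as a SQL literal. Strings are quoted and, unlike the
    PL/SQL script, embedded single quotes are doubled."""
    if column in STRING_COLUMNS:
        return "'" + str(value).replace("'", "''") + "'"
    return str(value)


def translate(change, rowid_to_empno):
    """Return the MariaDB statement for one change, or None to skip it
    (the CONTINUE in the PL/SQL loop)."""
    _scn, operation, redo, undo, rowid = change
    if operation == "INSERT":
        values = ", ".join(literal(c, redo[c]) for c in COLUMNS)
        return f"INSERT INTO emp({', '.join(COLUMNS)}) VALUES({values});"
    if operation == "UPDATE":
        empno = rowid_to_empno.get(rowid)  # PK via the row's *current* ROWID
        if empno is None:
            return None
        # SET only the columns present in the redo record; join() takes care
        # of the commas that v_firstcol handles in the PL/SQL version.
        sets = ", ".join(f"{c} = {literal(c, redo[c])}" for c in COLUMNS if c in redo)
        return f"UPDATE emp SET {sets} WHERE empno = {empno};"
    if operation == "DELETE":
        empno = undo.get("EMPNO")  # PK from the undo data of the deleted row
        if empno is None:
            return None
        return f"DELETE FROM emp WHERE empno = {empno};"
    return None


def replicate(changes, rowid_to_empno):
    """Return the output lines ('-- SCN = n' plus a statement per change) and
    the last SCN fetched, which the script stores in scnlog."""
    lines, last_scn = [], None
    for change in changes:
        last_scn = change[0]  # set even for skipped changes, like v_scn
        stmt = translate(change, rowid_to_empno)
        if stmt is not None:
            lines.append(f"-- SCN = {change[0]}")
            lines.append(stmt)
    return lines, last_scn


if __name__ == "__main__":
    # Made-up SCNs. John's row has already been deleted in Oracle, so only
    # Georgina's ROWID still maps to a primary key.
    current = {"AAAE7RAAEAAAAK+AAA": 3}
    changes = [
        (1001, "UPDATE", {"SAL": 12100}, {"SAL": 12000}, "AAAE7RAAEAAAAK9AAB"),
        (1001, "UPDATE", {"SAL": 11100}, {"SAL": 11000}, "AAAE7RAAEAAAAK+AAA"),
        (1002, "DELETE", {}, {"EMPNO": 2, "ENAME": "John", "JOB": "Developer",
                              "SAL": 12100, "DEPTNO": 20}, "AAAE7RAAEAAAAK9AAB"),
    ]
    lines, last_scn = replicate(changes, current)
    print("\n".join(lines))
    print("-- last SCN:", last_scn)
```

With John's row already gone from Oracle, his salary UPDATE is skipped (just like the CONTINUE in the script) and only Georgina's UPDATE and John's DELETE are emitted; the last SCN seen, 1002, is what would be written back to scnlog.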

Let's try this script and see if the UPDATE to the sal column that I did earlier in this post is properly replicated to MariaDB:
$ sqlplus -S / as sysdba @cdcemp.sql | mysql -u anders -panders anders
And then we check what we have in MariaDB:
$ mysql -u anders -panders anders
MariaDB [anders]> SELECT * FROM emp;
+-------+----------+-----------+----------+--------+
| empno | ename    | job       | sal      | deptno |
+-------+----------+-----------+----------+--------+
|     1 | Anders   | Sales     | 10000.00 |     10 |
|     2 | John     | Developer | 12100.00 |     20 |
|     3 | Georgina | Design    | 11100.00 |     20 |
|     4 | Anne     | Assistant |  8000.00 |     30 |
|     5 | Marge    | HR Mgr    | 14000.00 |     30 |
+-------+----------+-----------+----------+--------+
5 rows in set (0.00 sec)
Now let's see if we can handle a DELETE as well. In SQL*Plus we run this:
SQL> DELETE FROM emp WHERE ename = 'John';
SQL> COMMIT;
Then we run our script:
$ sqlplus -S / as sysdba @cdcemp.sql | mysql -u anders -panders anders
And then we check what we have in MariaDB:
$ mysql -u anders -panders anders
MariaDB [anders]> SELECT * FROM emp;
+-------+----------+-----------+----------+--------+
| empno | ename    | job       | sal      | deptno |
+-------+----------+-----------+----------+--------+
|     1 | Anders   | Sales     | 10000.00 |     10 |
|     3 | Georgina | Design    | 11100.00 |     20 |
|     4 | Anne     | Assistant |  8000.00 |     30 |
|     5 | Marge    | HR Mgr    | 14000.00 |     30 |
+-------+----------+-----------+----------+--------+
4 rows in set (0.00 sec)

Bazinga, it worked! Are we done then? I'm afraid not: we still have transactions to take care of, but that will have to wait a day or two.

Happy SQL'ing
/Karlsson
