Migrating OwnCloud from SQLite to PostgreSQL

Introduction

SQLite is a lightweight, serverless database engine that stores everything in a single file. That design is fine for embedded use, but it can become a liability for a multi-user web application once the database grows. The best advice is to spend a bit more time learning how to set up a client-server database system like MySQL or PostgreSQL.

However, sometimes you started with an SQLite database and now want to migrate to a client-server database system. Let’s do that.

In this article, an SQLite 3 database owncloud.db will be migrated to a PostgreSQL 9.1 database (with username owncloud and database name owncloud). The use case is migrating the database of an OwnCloud 6.0.3 instance with basic functionality. Some applications have also been installed; the only noticeable one is News (which lets you subscribe to RSS feeds).

Export the SQLite database

This is a pretty easy step, since the sqlite3 tool can give you a dump of your database as SQL commands. Let’s put the output into a file dump.sql.

sqlite3 owncloud.db .dump > dump.sql

Seems easy, right? Let’s go to the next step.

Import the dump into PostgreSQL

The normal way

Since SQL is a standard, we can try to feed PostgreSQL this dump.sql file with the following command.

psql -U owncloud -d owncloud < dump.sql

If it works for you, great! However, as we will see, each database system has its own flavor of the SQL standard.

Converting the dump

There are a few things to adapt between SQLite and PostgreSQL. An entry point can be this article, which explains what should be adapted. Let’s sum up everything that needs changing:

  • Remove PRAGMA calls

  • Convert DATETIME into TIMESTAMP (DATETIME is the SQLite type, while TIMESTAMP is the PostgreSQL one, see documentation)

  • Remove UNSIGNED (PostgreSQL has no unsigned integer types, see documentation)

  • Remove AUTOINCREMENT, then replace INTEGER with SERIAL (SERIAL is the auto-incrementing integer type in PostgreSQL)

  • Convert LONGTEXT and CLOB into TEXT or VARCHAR (see documentation)

  • Convert BLOB into BYTEA (these may need more work)

  • Convert boolean values from 0 and 1 to '0' and '1' or TRUE and FALSE respectively (see documentation)

Except for the last two points, which require more attention, we can do everything with sed. Let’s write a sed script migrate.sed to automatically adapt our dump dump.sql.

# Remove every line starting with `PRAGMA'
/^PRAGMA/d
# Change types only in the scope of a table creation
/^CREATE TABLE/ {
	# Remove `UNSIGNED' keyword and spaces following
	s/\<UNSIGNED\> *//g
	# Replace `DATETIME' type (SQLite) with `TIMESTAMP' type (PostgreSQL)
	s/\<DATETIME\>/TIMESTAMP/g
	# `LONGTEXT' and `CLOB' replaced by `TEXT' (could use `VARCHAR' too)
	s/\<\(LONGTEXT\|CLOB\)\>/TEXT/g
	# Replace any integer with `AUTOINCREMENT' with `SERIAL'
	s/\<INTEGER\>\([^,]*\)[ ]*AUTOINCREMENT/SERIAL\1/g
	# Replace `BLOB' with `BYTEA' (binary blob in PostgreSQL)
	s/\<BLOB\>/BYTEA/g
}

Some notes about that script.

Note

PRAGMA statements are commands specific to a given database system. While they make sense for SQLite, they will not for PostgreSQL. There may be equivalent PostgreSQL settings, but I didn’t investigate.

Note

We want to replace types only in table creations. For example, we want to avoid removing UNSIGNED everywhere it appears in the file dump.sql, because it may also appear in the contents of the database. sed allows us to restrict modifications to specific lines.
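A tiny illustration of that scoping (table and data are hypothetical): the substitution only fires on the CREATE TABLE line, so the word UNSIGNED stored inside a row survives untouched.

```shell
# Two-line input: a schema line and a data line that both contain UNSIGNED.
scoped=$(printf '%s\n' \
	'CREATE TABLE "t" (n UNSIGNED INTEGER);' \
	"INSERT INTO \"t\" VALUES('UNSIGNED cargo');" \
	| sed '/^CREATE TABLE/ s/\<UNSIGNED\> *//g')
printf '%s\n' "$scoped"
# → CREATE TABLE "t" (n INTEGER);
# → INSERT INTO "t" VALUES('UNSIGNED cargo');
```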

Note

In SQLite, AUTOINCREMENT is a keyword attached to an INTEGER column, whereas in PostgreSQL the single type SERIAL replaces both.

Note

BLOB is a binary blob whose values may need to be decoded with the decode() function (something like decode(value,'hex')). This can be tricky, but see the next sections; you may find some help there.
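For what it’s worth, sqlite3 dumps blob columns as X'…' hex literals; if your dump looks like that, a simpler substitution can wrap them directly. This is only a sketch: it would also mangle any text value that happens to contain an X'…' pattern, which is exactly why the field-by-field approach below is safer.

```shell
# Hypothetical dumped row containing a blob literal.
row="INSERT INTO \"cache\" VALUES(1,X'48656c6c6f');"
# Wrap the hex literal in PostgreSQL's decode(...,'hex').
wrapped=$(printf '%s\n' "$row" | sed "s/X'\([0-9A-Fa-f]*\)'/decode('\1','hex')/g")
printf '%s\n' "$wrapped"
# → INSERT INTO "cache" VALUES(1,decode('48656c6c6f','hex'));
```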

After these few notes about the sed script, let’s apply it.

sed -f migrate.sed dump.sql > dump-adapted.sql

BLOB and BOOLEAN adaptations

This is a trickier problem, since we will need to replace or wrap values that are inserted into our database (lines that start with INSERT INTO). For this purpose, I wrote an awk script that parses the CREATE TABLE lines and transforms them, for each table, into sed replacement commands targeting only the needed fields (for example, the third one if the third one is a boolean, and the fifth one if it’s a BLOB).

#!/bin/awk -f

# Before program
BEGIN {
}

# Program
{
	print "# " $0;
	header = 0;
	lastpatt = "";
	for(k=1; k<=NF; k++) {
		if($k ~ /BLOB|BOOLEAN/) {
			subin="\\(^INSERT INTO \042\\?" ENVIRON["DB"] "\042\\? VALUES(";
			subout = "";
			can_be_null=0;
			subnum=1;
			for(i=1; i<=NF; i++) {
				patt = "";
				if(($i !~ /KEY/) && (i != 1)) {
					subin = subin ",";
				}
				if($i ~ /INTEGER/) {
					patt = "[[:digit:]]\\+";
				}
				if($i ~ /NUMERIC/) {
					patt = "[[:digit:].-]\\+";
				}
				if(($i ~ /VARCHAR|(LONG)?TEXT|BLOB/)) {
					patt = "\047.*\047";
				}
				if(($i !~ /NOT NULL|KEY/)) {
					patt = "\\(NULL\\|" patt "\\)";
					subnum = subnum + 1;
				}
				if($i ~ /BLOB/) {
					if (i < k) {
						patt = "decode(" patt ",\047hex\047)";
					}
					if(i == k) {
						if(patt ~ /NULL/) {
							bdecnum = 1;
							decnum = subnum;
							adecnum = decnum + 1;
							patt = "\\)" patt "\\(";
						} else {
							bdecnum = 1;
							decnum = subnum + 1;
							adecnum = decnum + 1;
							patt = "\\)\\(" patt "\\)\\(";
						}
						subout = "\\" bdecnum "decode(\\" decnum ",\047hex\047)\\" adecnum;
					}
				}
				if($i ~ /BOOLEAN/) {
					patt = "[01]";
					if(i == k) {
						bdecnum = 1;
						decnum = subnum + 1;
						adecnum = decnum + 1;
						patt = "\\)\\(" patt "\\)\\(";
						subout = "\\" bdecnum "\047\\" decnum "\047\\" adecnum;
					}
				}
				if(($i !~ /KEY/)) {
					lastpatt = patt;
				}
				subin = subin patt;
				can_be_null = 0;
			}
			subin = subin ");$\\)";
			if(header == 0) {
				startpatt = ENVIRON["START_PATT"];
				endpatt = lastpatt ENVIRON["END_PATT"];
				print "/" startpatt "/,/" endpatt "/ {";
				print "/" startpatt "/h";
				print "/" startpatt "/!H";
				print "/" endpatt "/ {";
				print "g";
				header = 1;
			}
			print "s/" subin "/" subout "/g";
		}
	}
	print "p";
	print "}";
	print "d";
	print "}";
}

# After program
END {
}

It’s a pretty unreadable awk and sed collaboration, so try to understand it only if you need to. Otherwise, you may want to use the shell script I wrote, which runs the awk script to produce the sed script, then runs that sed script to adapt the dump file.

#!/bin/sh
# The original database
SRC_FILE=owncloud.db
# A list of SQL commands that should create a copy or the source database
DUMP_FILE=dump.sql
# A list of `sed` actions to convert the SQLite dump into PostgreSQL
SED_FILE=migrate.sed
# Awk program to create sed actions for the BYTEA
AWK_FILE=migrate.awk

if [ -f $DUMP_FILE ]
then
	rm $DUMP_FILE
fi
if [ -f $SED_FILE ]
then
	rm $SED_FILE
fi

sqlite3 $SRC_FILE .dump > $DUMP_FILE

to_sed() {
  echo "$1" >> $SED_FILE
}

to_sed '#!sed -f'
to_sed '/^PRAGMA/d'
to_sed '/^CREATE TABLE/ {'
to_sed 's/\<UNSIGNED\> *//g'
to_sed 's/\<DATETIME\>/TIMESTAMP/g'
to_sed 's/\<LONGTEXT\>/TEXT/g'
to_sed 's/\<INTEGER\>\([^,]*\)[ ]*AUTOINCREMENT/SERIAL\1/g'
to_sed 's/\<CLOB\>/TEXT/g'
to_sed 's/\<BLOB\>/BYTEA/g'
to_sed '}'

cat $DUMP_FILE | grep '^CREATE TABLE.*\<\(BLOB\|BOOLEAN\)\>'| while read LINE
do
	DB=$( echo $LINE | awk '{print $3}' | sed -e 's/"//g' -e "s/'//g" )
	START_PATT='^INSERT INTO "\?\<'$DB'\>"\?'
	END_PATT=');$'
	export DB START_PATT END_PATT
	echo $LINE | sed -e 's/^CREATE TABLE "\?\<'$DB'\>"\? (//g' -e 's/);$//g' -e 's/([^)]*)//g' | awk -F ' *, *' -f $AWK_FILE >> $SED_FILE
done

to_sed "s/decode(NULL,'hex')/NULL/g"
to_sed 'p'

sed -n -f $SED_FILE -i.bak $DUMP_FILE

Remember, you only need these scripts if you have BOOLEAN and/or BLOB columns. If not, then you probably don’t want them; just use the simple sed script I provided in the previous section.
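Once the dump has been adapted, the missing last step is the same import attempted earlier. This sketch only echoes the command, since running it needs a live PostgreSQL server with the owncloud role and database already created:

```shell
# Not executed here: load the adapted dump into PostgreSQL.
import_cmd='psql -U owncloud -d owncloud -f dump.sql'
echo "$import_cmd"
```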

Conclusion

To conclude, even though I did a lot of research and it took me time to write these scripts, I did not fully succeed in migrating from SQLite to PostgreSQL. In the meantime, I learned a lot of interesting things about awk and sed scripting and about working with PostgreSQL. I also learned that you probably want to use PostgreSQL or MySQL from the start and avoid any migration. This article may help you understand some important parts of the migration process and find information, but it will probably not be enough on its own. It can only be a starting point. Good luck!
