Tuesday, December 22, 2015

Ruby SQLite3 memory tuning

0. Motivation

SQLite database is designed as an embedded db, the default settings of memory footprint, page size are biased to workload with moderate data sizes. In a recent project,  I used sqlite to implement some path generators on a graph dataset, in order to scale the program without worrying about out of memory issues, e.g. 3-hop paths in a 100K graph would go beyond 20G easily, external memory algorithms need to be implemented, writing a SQL and writing my own c code basically will have similar performance. However, both ruby and sqlite are not supposed to do this type of work under the default settings. The post contains some settings one needs to do in order to have better performance.

1. Ruby

One should set the GC using the following VARs. (ruby 1.9+, 2.0)

export RUBY_HEAP_MIN_SLOTS=8000000
export RUBY_FREE_MIN=327680
export RUBY_GC_MALLOC_LIMIT=512000000

On details of Ruby GC, and the effect of these environment vars, see this page:
http://tmm1.net/ruby21-rgengc/

Other useful links


2. SQLite

to use more memory and avoid hitting disk, one should set the following PRAGMA
cache_size: basically the number of b-tree pages can be held in memory.
temp_store: temporary results in memory db or in file

this post is nice in this topic:
http://www.codificar.com.br/blog/sqlite-optimization-faq/

My app is read-only on IDBs, so the synchronous PRAGMA is not useful. But even for R/W applications, be careful to set it.

In Ruby SQLite3 binding, the PRAGMAs are included in the Database class.
After getting the db instance, one can just set:

db.cache_size = 512000   # 500M
db.temp_store = 2            # memory

Other useful links on this topic:


No comments:

Post a Comment