0. Motivation
SQLite database is designed as an embedded db, the default settings of memory footprint, page size are biased to workload with moderate data sizes. In a recent project, I used sqlite to implement some path generators on a graph dataset, in order to scale the program without worrying about out of memory issues, e.g. 3-hop paths in a 100K graph would go beyond 20G easily, external memory algorithms need to be implemented, writing a SQL and writing my own c code basically will have similar performance. However, both ruby and sqlite are not supposed to do this type of work under the default settings. The post contains some settings one needs to do in order to have better performance.1. Ruby
One should set the GC using the following VARs. (ruby 1.9+, 2.0)export RUBY_HEAP_MIN_SLOTS=8000000
export RUBY_FREE_MIN=327680
export RUBY_GC_MALLOC_LIMIT=512000000
On details of Ruby GC, and the effect of these environment vars, see this page:
http://tmm1.net/ruby21-rgengc/
Other useful links
- reference setting in real world rails apps: https://gist.github.com/jjb/7389552
2. SQLite
to use more memory and avoid hitting disk, one should set the following PRAGMAcache_size: basically the number of b-tree pages can be held in memory.
temp_store: temporary results in memory db or in file
this post is nice in this topic:
http://www.codificar.com.br/blog/sqlite-optimization-faq/
In Ruby SQLite3 binding, the PRAGMAs are included in the Database class.
After getting the db instance, one can just set:
db.cache_size = 512000 # 500M
db.temp_store = 2 # memory
Other useful links on this topic:
- ruby sqlite3 doc:
- http://www.rubydoc.info/github/luislavena/sqlite3-ruby/SQLite3/Pragmas
- http://www.rubydoc.info/github/luislavena/sqlite3-ruby/SQLite3/Database
- sqlite3 pragma doc:
No comments:
Post a Comment