Transcript of PGroonga & Zulip
- 1. PGroonga & Zulip Powered by Rabbit 2.2.1 PGroonga &
Zulip Kouhei Sutou ClearCode Inc. Zulip & PGroonga Night
2017-09-06
- 2. PGroonga & Zulip Powered by Rabbit 2.2.1 PGroonga
Pronunciation: pz:ln PostgreSQL extension PostgreSQL Fast full text
search All languages are supported!
- 3. PGroonga & Zulip Powered by Rabbit 2.2.1 Fast? Need to
measure to confirm Targets textsearch (built-in) pg_bigm (third
party)
- 4. PGroonga & Zulip Powered by Rabbit 2.2.1 PGroona and
textsearch 0 0.2 0.4 0.6 0.8 1 1.2 1.4 PostgreSQL OR MySQL database
America Data: English Wikipedia (Many records and large docs) N
records: About 5.3millions Average text size: 6.4KiB
Elapsedtime(ms) (Shorterisbetter) Query PGroonga textsearch
- 5. PGroonga & Zulip Powered by Rabbit 2.2.1 As fast as
textsearch textsearch textsearch uses word based full text search
textsearch PostgreSQL has enough performance for the approach
PostgreSQL
- 6. PGroonga & Zulip Powered by Rabbit 2.2.1 textsearch and
Japanese textsearch Asian languages including Japanese aren't
supported Need plugin Plugin exists but isn't maintained
- 7. PGroonga & Zulip Powered by Rabbit 2.2.1 Japanese
support Need one of them N-gram approach support N-gram Japanese
specific word based approach support PGroonga supports both of
them
- 8. PGroonga & Zulip Powered by Rabbit 2.2.1 PostgreSQL and
N-gram PostgreSQLN-gram PostgreSQL is slow with N- gram approach
PostgreSQLN-gram N-gram approach: pg_trgm (contrib) Japanese isn't
supported by default pg_bigm (third-party)
- 9. PGroonga & Zulip Powered by Rabbit 2.2.1 PGroona and
pg_bigm 0 0.5 1 1.5 2 2.5 3 311 14706 20389 Data: Japanese
Wikipedia (Many records and large documents) N records: About
0.9millions Average text size: 6.7KiB Fast Fast Elapsedtime(sec)
(Lowerisbetter) N hits PGroonga pg_bigm
- 10. PGroonga & Zulip Powered by Rabbit 2.2.1 PGroonga is
fast stably PGroonga PostgreSQL needs "recheck" for N-gram approach
PostgreSQLN-gramrecheck Seq search after index search PGroonga
doesn't need PGroonga Only index search OK
- 11. PGroonga & Zulip Powered by Rabbit 2.2.1 Wrap up
textsearch is fast but Asian langs aren't supported textsearch
pg_bigm supports Japanese but is slow for large hits pg_bigm
PGroonga is fast and supports all languages PGroonga
- 12. PGroonga & Zulip Powered by Rabbit 2.2.1 FYI:
textsearch, PGroonga and Groonga 0 0.2 0.4 0.6 0.8 1 1.2 1.4
PostgreSQL OR MySQL database America Data: English Wikipedia (Many
records and large docs) N records: About 5.3millions Average text
size: 6.4KiB Groonga is 30x faster than others Elapsedtime(ms)
(Shorterisbetter) Query PGroonga Groonga textsearch
- 13. PGroonga & Zulip Powered by Rabbit 2.2.1 Zulip and
PGroonga Zulip uses textsearch by default Zuliptextsearch Japanese
isn't supported Zulip supports PGroonga as option ZulipPGroonga
Implemented by me
- 14. PGroonga & Zulip Powered by Rabbit 2.2.1 Zulip: full
text search Zulip Zulip is chat tool Zulip Latency is important for
UX UX Index update is heavy Delay index update
- 15. PGroonga & Zulip Powered by Rabbit 2.2.1 Delay index
update CREATE TABLE zerver_message ( rendered_content text, -- ...
Column for full text search search_tsvector tsvector ); -- Index
for full text search CREATE INDEX zerver_message_search_tsvector ON
zerver_message USING gin (search_tsvector);
- 16. PGroonga & Zulip Powered by Rabbit 2.2.1 Delay index
update -- Execute append_to_fts_update_log() on change CREATE
TRIGGER zerver_message_update_search_tsvector_async BEFORE INSERT
OR UPDATE OF rendered_content ON zerver_message FOR EACH ROW
EXECUTE PROCEDURE append_to_fts_update_log();
- 17. PGroonga & Zulip Powered by Rabbit 2.2.1 Delay index
update -- Insert ID to fts_update_log table CREATE FUNCTION
append_to_fts_update_log() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN INSERT INTO fts_update_log (message_id) VALUES (NEW.id);
RETURN NEW; END $$;
- 18. PGroonga & Zulip Powered by Rabbit 2.2.1 Delay index
update -- Keep ID to be updated CREATE TABLE fts_update_log ( id
SERIAL PRIMARY KEY, message_id INTEGER NOT NULL );
- 19. PGroonga & Zulip Powered by Rabbit 2.2.1 Delay index
update -- Execute do_notify_fts_update_log() -- on INSERT CREATE
TRIGGER fts_update_log_notify AFTER INSERT ON fts_update_log FOR
EACH STATEMENT EXECUTE PROCEDURE do_notify_fts_update_log();
- 20. PGroonga & Zulip Powered by Rabbit 2.2.1 Delay index
update -- NOTIFY to fts_update_log channel! CREATE FUNCTION
do_notify_fts_update_log() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN NOTIFY fts_update_log; RETURN NEW; END $$;
- 21. PGroonga & Zulip Powered by Rabbit 2.2.1 Delay index
update cursor.execute("LISTEN ftp_update_log") # Wait
cursor.execute("SELECT id, message_id FROM fts_update_log") ids =
[] for (id, message_id) in cursor.fetchall():
cursor.execute("UPDATE zerver_message SET search_tsvector = "
"to_tsvector('zulip.english_us_search', " "rendered_content) "
"WHERE id = %s", (message_id,)) ids.append(id)
cursor.execute("DELETE FROM fts_update_log WHERE id = ANY(%s)",
(ids,))
- 22. PGroonga & Zulip Powered by Rabbit 2.2.1 PGroonga:
index update PGroonga PGroonga's index update is fast too PGroonga
PGroonga's search while index update is still fast PGroonga
- 23. PGroonga & Zulip Powered by Rabbit 2.2.1 Perf
characteristics Searchthroughput Update throughput PGroonga
Searchthroughput Update throughput GIN Keep search performance
while many updates Decrease search performance while updating
- 24. PGroonga & Zulip Powered by Rabbit 2.2.1 Update and
lock Update without read locks Write locks are required
- 25. PGroonga & Zulip Powered by Rabbit 2.2.1 GIN:
Read/Write GIN Conn1 Conn2 INSERT start SELECT start Blocked INSERT
finish SELECT finish GIN Slow down!
- 26. PGroonga & Zulip Powered by Rabbit 2.2.1 PGroonga:
Read/Write PGroonga Conn1 Conn2 INSERT start SELECT start INSERT
finish SELECT finish PGroonga No slow down!
- 27. PGroonga & Zulip Powered by Rabbit 2.2.1 Wrap up Zulip:
Low latency for UX ZulipUX Delay index update PGroonga: Keeps fast
search with update PGroonga Chat friendly characteristics
- 28. PGroonga & Zulip Powered by Rabbit 2.2.1 More PGroonga
features PGroonga Query expansion Support synonyms Similar search
Find similar messages Fuzzy search Stemming