full text indexing of recursive nested jsonb structures

Lets say I have a book represented like this:

{ title: "Tis no book"
  chapter: 1,
  text: "Hello world this is a book chapter",
  subchapters: [
     {
     chapter: 1.1
     text: "Nested sub chapter"
     subchapters: [
       chapter: 1.1.1
       text: "Nested nested..."
       subchapters: [ ...etc...]
     },
     {
     chapter: 1.2
     text: "Nested sub chapter 2"
     subchapters: [ ...etc...]
     }
   ]
}

Can I use postgres' 9.4 new jsonb/gin (or anything else) to set up a full text index on the "text" field of this recursively nested data structure such that a library of books stored to the database can be searched by text (using the index)?


I have just started looking into full-text search and jsonb types. It looks like this is possible once you understand how the indexes work on JSONB types. I have found this blog series to be very helpful.

https://bibhas.in/blog/postgresql-swag-part-2-indexing-json-data-type-and-full-text-search/

Also, the documentation on the JSON type at Postgres contains some good insights. http://www.postgresql.org/docs/9.4/static/datatype-json.html


One option -- and likely the best -- is to create a recursive function (official documentation on functions) that combines the text values of the entire JSONB structure into a single string (which it returns). You can full-text index the output of that function (see @thomas-wayne-shelton's answer) as if it were just a string. So long as you use the same function in your WHERE conditions, pgSQL should recognize the index.

I just built/tested a similar "flattening" function so I'm confident that it can be done. Unfortunately, my case was very different so I suspect the code would confuse rather than enlighten. I can say that the function(s) must be marked as IMMUTABLE to support indexing.

===

The other path that looks promising (but I believe a dead-end) is a recursive CTE.

It is certainly possible to unwrap the recursive data using a recursive CTE. Here's the official documentation, a SO answer, and a blog example -- the last two for recursive nested JSONB specifically.

However, I don't believe you can index this output. It's my (tentative) understanding that a query using a recursive CTE is a lot like a view. It's a runtime optimization. I no longer have reference on-hand but recall seeing recent (as of Apr. 2018) discussion that indexing views was still a long way off (and I'm not even sure an index on a view would actually work for a CTE).


I won't answer the question; instead, I'm going to suggest a completely different approach.

Have you taken a look at Lucene, https://lucene.apache.org/core/? Implementing full-text search in the database is bad design. You should be using a separate full-text indexer such as Lucene. The Lucene documents should refer to the database keys which can be used to look up the real record in the database.

By using Lucene, you will most likely get much better performance than by using the full-text search features of the database. Furthermore, Lucene is far easier to scale out than your database.

链接地址: http://www.djcxy.com/p/84264.html

上一篇: 如何将代码注入iOS进程

下一篇: 递归嵌套jsonb结构的全文索引