insensitive indexes on Postgres string array
I'm using a varchar[]
column (varchar array) in Postgres 9.2 to store some tags. While retrieving rows by tags, I want the query to be case insensitive. However, I want to preserve the case to display in the UI (therefore I can't just store everything as lower case).
So, my question is how do I create a case-insensitive index in Postgres over a varchar array? One possible approach would be to create a functional GIN index on the column. How does one do that? Any other approaches?
@Saurabh Nanda: Similar to what you posted, you can also create a simple function to convert your varchar array to lowercase as follows:
CREATE OR REPLACE FUNCTION array_lowercase(varchar[]) RETURNS varchar[] AS
$BODY$
SELECT array_agg(q.tag) FROM (
SELECT btrim(lower(unnest($1)))::varchar AS tag
) AS q;
$BODY$
language sql IMMUTABLE;
Note that I'm also trimming the tags of spaces. This might not be necessary for you but I usually do for consistency.
Testing:
SELECT array_lowercase(array['Hello','WOrLD']);
array_lowercase
-----------------
{hello,world}
(1 row)
As noted by Saurabh, you can then create a GIN index:
CREATE INDEX ix_tags ON tagtable USING GIN(array_lowercase(tags));
And query:
SELECT * FROM tagtable WHERE ARRAY['mytag'::varchar] && array_lowercase(tags);
UPDATE: Performance of WHILE
vs array_agg/unnest
I created table of 100K 10 element text[]
arrays (12 character random mixed case strings) and tested each function.
The array_agg/unnest function returned:
EXPLAIN ANALYZE VERBOSE SELECT array_lowercase(data) FROM test;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.test (cost=0.00..28703.00 rows=100000 width=184) (actual time=0.320..3041.292 rows=100000 loops=1)
Output: array_lowercase((data)::character varying[])
Total runtime: 3174.690 ms
(3 rows)
The WHILE function returned:
EXPLAIN ANALYZE VERBOSE SELECT array_lowercase_while(data) FROM test;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.test (cost=0.00..28703.00 rows=100000 width=184) (actual time=5.128..4356.647 rows=100000 loops=1)
Output: array_lowercase_while((data)::character varying[])
Total runtime: 4485.226 ms
(3 rows)
UPDATE 2: FOREACH
vs. WHILE
As a final experiment, I changed the WHILE function to use FOREACH:
CREATE OR REPLACE FUNCTION array_lowercase_foreach(p_input varchar[]) RETURNS varchar[] AS $BODY$
DECLARE
el text;
r varchar[];
BEGIN
FOREACH el IN ARRAY p_input LOOP
r := r || btrim(lower(el))::varchar;
END LOOP;
RETURN r;
END;
$BODY$
language 'plpgsql'
Results appeared to be similar to WHILE
:
EXPLAIN ANALYZE VERBOSE SELECT array_lowercase_foreach(data) FROM test;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Seq Scan on public.test (cost=0.00..28703.00 rows=100000 width=184) (actual time=0.707..4106.867 rows=100000 loops=1)
Output: array_lowercase_foreach((data)::character varying[])
Total runtime: 4239.958 ms
(3 rows)
Though my tests are not by any means rigorous, I did run each version a number of times and found the numbers to be representative, suggesting that the SQL method (array_agg/unnest) is the fastest.
Found one possible approach using a custom pgplsql function:
First declare a custom function that takes a varchar[] array as input and returns a new array with all elements converted to lowercase. (This is the first time I'm writing PL/SQL, so this might be very inefficient code).
CREATE OR REPLACE FUNCTION array_lowercase(varchar[]) RETURNS varchar[] AS $$
DECLARE
i INTEGER;
l INTEGER;
r VARCHAR[];
inp ALIAS FOR $1;
BEGIN
i := 1;
l := array_length($1, 1);
WHILE i <= l LOOP
r[i] = lower(inp[i]);
i := i + 1;
END LOOP;
RETURN r;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
Next, create a GIN index on the expression using the newly defined array_lowercase
function:
create index hotel_bookings_tags on hotel_bookings using gin(array_lowercase(tags));
Now use it in a query (verify that it's using the index using EXPLAIN):
select * from posts where array[(varchar 'some_tag')] && array_lowercase(tags);
不知道它有帮助,但我正在寻找类似于text[]
东西,并使用了类型转换:
select id from product where lower(tags::text)::text[] && array['tat'];
链接地址: http://www.djcxy.com/p/87220.html
下一篇: 对Postgres字符串数组不敏感的索引