Solr does not highlight some words

I configured Solr 4.10 (also 5.3) with highlighting. It works fine for most words, but I found some words that do not get highlighted: Solr returns the matching docs, but does not highlight those words in them.

What can cause such an effect?

solrconfig.xml

 <requestHandler name="/select" class="solr.SearchHandler">
 <lst name="defaults">
   <str name="wt">json</str>
   <str name="indent">true</str>
   <str name="defType">edismax</str>
   <str name="bf">product(concount)</str>
   <str name="df">text bio text_syn text_syn_other</str>
   <str name="qf">
    text^25 bio^16 text_syn^8 text_syn_other^3
   </str>
   <str name="hl">on</str>
   <str name="hl.fl">text bio text_syn text_syn_other</str>
   <str name="hl.preserveMulti">true</str>
   <str name="hl.encoder">html</str>
   <str name="f.text.hl.fragsize">100</str>
   <str name="hl.snippets">20</str>
   <arr name="components">
     <str>highlight</str>
   </arr>
 </lst>
 </requestHandler>

schema.xml

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[sn,/]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_abbr.txt" ignoreCase="true" expand="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[sn,/]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_en_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[sn,/]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[sn,/]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_en_syn_other" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[sn,/]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_other.txt" ignoreCase="true" expand="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[sn,/]" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<field name="text" type="text_en" indexed="true" stored="true" multiValued="false" />
<field name="text_syn" type="text_en_syn" indexed="true" stored="false" multiValued="true" />
<field name="text_syn_other" type="text_en_syn_other" indexed="true" stored="false" multiValued="true" />

<field name="text_exact" type="string" indexed="true" stored="false" multiValued="false" />

<field name="bio" type="text_en" indexed="true" stored="true" multiValued="false" />

<field name="bio_exact" type="string" indexed="true" stored="false" multiValued="false" />

<field name="concount" type="long" indexed="true" stored="true" multiValued="false" />

<field name="concount_exact" type="long" indexed="true" stored="false" multiValued="false" />

<copyField source="text" dest="text_syn"/>
<copyField source="bio" dest="text_syn"/>
<copyField source="text" dest="text_syn_other"/>
<copyField source="bio" dest="text_syn_other"/>

For the query http://localhost:8983/solr/select?q=senior I get docs containing the word senior, but in the highlighting section of the Solr response that word is not highlighted.


UPDATE 1: I found out that the word senior appears in my synonyms_abbr.txt file, in the line senior,lead. When I commented out that line or swapped the order of the words to lead,senior, the word senior surprisingly started getting highlighted. Any ideas?


UPDATE 2: Words from synonyms.txt and synonyms_other.txt are highlighted normally, but words from synonyms_abbr.txt behave strangely. For example, with the line lead,head,senior in synonyms_abbr.txt:

  • the queries http://localhost:8983/solr/select?q=senior and http://localhost:8983/solr/select?q=head do not highlight any word,
  • the query http://localhost:8983/solr/select?q=lead highlights not only the word lead, but also head and senior.

  • From your UPDATE 2 it is clear that only the first word in the line lead,head,senior is actually used for synonym matching and highlighting.

    The Solr Wiki page https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters describes the effect of the expand parameter:

    The synonyms parameter names an external file defining the synonyms. If ignoreCase is true, matching will lowercase before checking equality. If expand is true, a synonym will be expanded to all equivalent synonyms. If it is false, all equivalent synonyms will be reduced to the first in the list.

    The page also gives an example:

    # If expand==true, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
    ipod, i-pod, i pod => ipod, i-pod, i pod
    # If expand==false, "ipod, i-pod, i pod" is equivalent to the explicit mapping:
    ipod, i-pod, i pod => ipod
    

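    This is consistent with the behaviour you are observing: with expand=false, head and senior are reduced to lead at index time, so a query for lead matches (and highlights) all three words, while a query for senior or head finds no matching token to highlight in that field. You should either change the synonym filter definition in schema.xml to use expand=true, or rewrite the entries in your synonyms file as explicit mappings.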

    Additionally, since the index analyzer runs at indexing time, you will need to reindex your documents for the change to take effect.
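
    As a minimal sketch of the first option, the index analyzer of the text_en field type from the question could look like the fragment below. Only the expand attribute on the synonym filter differs from the posted schema; everything else is copied from the question as posted, and the fragment has not been run against the asker's index.

    ```xml
    <analyzer type="index">
      <tokenizer class="solr.PatternTokenizerFactory" pattern="[sn,/]" />
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPossessiveFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
      <!-- expand="true": each of lead, head and senior is indexed as all three terms
           instead of being reduced to the first entry of the line (lead) -->
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms_abbr.txt" ignoreCase="true" expand="true"/>
    </analyzer>
    ```

    With this setting a document containing senior is also indexed under lead and head, so a query for any of the three words should find a matching token to highlight once the documents have been reindexed.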


    Some of your fields are not stored and therefore cannot be returned. Since they are indexed, they are still searchable. Change your schema to set stored="true" on every field you want to highlight.

    <field name="text_syn" type="text_en_syn" indexed="true" stored="true" multiValued="true" />
    <field name="text_syn_other" type="text_en_syn_other" indexed="true" stored="true" multiValued="true" />
    

    Looking at your config, I presume highlighting already works on the fields bio and text?


    You can try adding both senior,lead and lead,senior to the file synonyms_abbr.txt, and then run the highlighter again.
