User:Daveh/CirrusSearch

Notes involving the setup of CirrusSearch for the UESP wiki.

Setup

  • Install Java if not already installed on server (yum install java-1.7.0-openjdk.x86_64).
  • Download ElasticSearch and uncompress it, or install it via yum (?).
  • If using manual install, copy to /home/uesp/elasticsearch/.
  • Add a script.disable_dynamic: false line to config/elasticsearch.yml.
  • Set the amount of memory to use in bin/elasticsearch.in.sh with two lines like:
    ES_HEAP_SIZE=2g
    MAX_LOCKED_MEMORY=unlimited
  • Create or copy an init.d script to /etc/init.d/elasticsearch. Start ES as a daemon (/home/uesp/elasticsearch/bin/elasticsearch -d) as the user uesp.
  • Add elasticsearch to chkconfig startup.
    chkconfig --add elasticsearch
    chkconfig --level 345 elasticsearch on
  • Add the following to the MW config:
    require_once( "$IP/extensions/Elastica/Elastica.php" );
    require_once( "$IP/extensions/CirrusSearch/CirrusSearch.php" );
    # $wgDisableSearchUpdate = true;
    $wgCirrusSearchServers = array( '10.7.143.20' );
    $wgSearchType = 'CirrusSearch';
  • Create the ES index:
    php ./extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php
  • Force the index to be updated:
    php ./extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipLinks --indexOnSkip
    php ./extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipParse
Average index rate for the first step on content3 was ~3 pages/second over 200349 page IDs. Second index step averaged around 40 pages/second.
Average index rate for the final index on files1 was ~14 pages/second using 4 parallel indexing operations on content1/2/3.
  • Test search function.
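
As a rough sanity check on the index rates noted above, the time for a full pass can be estimated from the page count and the average rate. This is only a sketch: estimate_hours is a hypothetical helper (not part of CirrusSearch), it treats every page ID as one page, and it assumes the second pass covers the same 200349 page IDs as the first.

```shell
#!/bin/sh
# estimate_hours PAGES RATE
# Hypothetical helper: prints the estimated hours for an indexing pass,
# given a page count and an average rate in pages/second.
estimate_hours() {
    awk -v pages="$1" -v rate="$2" \
        'BEGIN { printf "%.1f\n", pages / rate / 3600 }'
}

# First pass (--skipLinks --indexOnSkip) at ~3 pages/second.
estimate_hours 200349 3     # prints 18.6

# Second pass (--skipParse) at ~40 pages/second, assuming the same page count.
estimate_hours 200349 40    # prints 1.4
```

So on a single server the first pass is the expensive one (most of a day), which is presumably why the final index on files1 was run as 4 parallel indexing operations.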

Benchmarking

  • dev.uesp.net
      • 600 MB RAM in use, index size 700 MB.
      • From localhost
          • Simple benchmark like ab -kc 10 -t 3 http://localhost:9200/uesp_net_wiki5_general_first/_search?q=something
          • Werewolf4: 902 req/sec (11 ms average, 99% at 55 ms)
          • Werewolf: 5560 req/sec (2 ms average, 99% at 13 ms)
          • Werewolf7: 1580 req/sec (6 ms average, 99% at 22 ms)
          • Vampire: 5050 req/sec (2 ms average, 99% at 13 ms)
          • Vampire+Werewolf: 1210 req/sec (8 ms average, 99% at 30 ms)
      • From content1
          • Vampire: 2140 req/sec (5 ms average, 99% at 10 ms)
  • files1.uesp.net
      • From content2
          • Vampire: 1550 req/sec (6.5 ms average, 99% at 14 ms)
          • Werewolf: 1070 req/sec (9.3 ms average, 99% at 17 ms)
          • Vampire+Werewolf: 150 req/sec (67 ms average, 99% at 92 ms)
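
The average times above line up with the request rates: with 10 concurrent connections (the -c 10 in the ab command), the mean time per request should be roughly the concurrency divided by the throughput. A minimal sketch of that check, where mean_latency_ms is a hypothetical helper name used only for illustration:

```shell
#!/bin/sh
# mean_latency_ms CONCURRENCY REQ_PER_SEC
# Hypothetical helper: mean time per request in ms, estimated as
# concurrency / throughput.
mean_latency_ms() {
    awk -v c="$1" -v rps="$2" \
        'BEGIN { printf "%.1f\n", c / rps * 1000 }'
}

mean_latency_ms 10 902    # Werewolf4 on dev: prints 11.1 (noted as 11 ms)
mean_latency_ms 10 150    # Vampire+Werewolf on files1: prints 66.7 (noted as 67 ms)
```

This only cross-checks the averages; the 99% figures come from the ab percentile table and cannot be derived this way.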

Search Highlighter

  • This section covers installing the ElasticSearch search-highlighter plugin, which is needed for the $wgCirrusSearchUseExperimentalHighlighter = true; feature in MW. Note that setting this to true without the plugin installed causes the search to crash.
  • The installation for v1.7 detailed at https://github.com/wikimedia/search-highlighter doesn't work as the JAR files are no longer available at the original location. Instead use the following command on your ElasticSearch setup:
      ./bin/plugin --install wikimedia/search-highlighter --url https://download.jar-download.com/cache_jars/org.wikimedia.search.highlighter/experimental-highlighter-elasticsearch-plugin/1.7.0/jar_files.zip
  • Restart ElasticSearch.
  • Check the ElasticSearch log and look for a line like:
      [plugins                 ] [Domina] loaded [experimental highlighter], sites []
to verify the plugin is installed and working.
  • You can also check the MW search query by adding &cirrusDumpQuery to the search URL and looking for "type":"experimental" in the result.
  • Set the following in the MW config and test search:
      $wgCirrusSearchUseExperimentalHighlighter = true;
      $wgCirrusSearchOptimizeIndexForExperimentalHighlighter = true;
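
The two verification steps above (log line and dumped query) could be scripted roughly as follows. This is a sketch only: both function names are hypothetical, and the log file location depends on the install, so the path is passed in as an argument rather than assumed.

```shell
#!/bin/sh
# plugin_loaded LOGFILE
# Succeeds if the ES log contains the "loaded [experimental highlighter]"
# line. -F matches the text literally so the brackets are not a regex.
plugin_loaded() {
    grep -qF 'loaded [experimental highlighter]' "$1"
}

# query_uses_experimental
# Reads a dumped search query (the &cirrusDumpQuery output) on stdin and
# succeeds if it contains "type":"experimental", allowing for whitespace
# around the colon.
query_uses_experimental() {
    grep -q '"type"[[:space:]]*:[[:space:]]*"experimental"'
}
```

For example, plugin_loaded /path/to/elasticsearch.log && echo "plugin OK" after a restart, or pipe the saved &cirrusDumpQuery output into query_uses_experimental. Doing both checks before setting $wgCirrusSearchUseExperimentalHighlighter = true; avoids the crash mentioned above.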

Ports

ElasticSearch services running on search1:

  • v2.4 -- Port 9202
  • v5.3 -- Port 9005
  • v5.6 -- Port 9004
  • v6.8 -- Port 9006
  • v7.10 -- Port 9007
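
With several versions running side by side it is easy to hit the wrong port, so a small lookup can help. The sketch below copies the version/port pairs from the list above; port_for and probe_all are hypothetical helper names, and probe_all assumes the hostname search1 resolves from wherever the script runs.

```shell
#!/bin/sh
# Version:port pairs copied from the list above.
ES_PORTS="2.4:9202 5.3:9005 5.6:9004 6.8:9006 7.10:9007"

# port_for VERSION
# Prints the port for the given ES version, or fails if it is not listed.
port_for() {
    for pair in $ES_PORTS; do
        if [ "${pair%%:*}" = "$1" ]; then
            echo "${pair##*:}"
            return 0
        fi
    done
    return 1
}

# probe_all
# Requests the root endpoint of each instance; ES answers with a JSON
# document that includes its version number. Not called automatically.
probe_all() {
    for pair in $ES_PORTS; do
        echo "v${pair%%:*} on port ${pair##*:}:"
        curl -s --max-time 5 "http://search1:${pair##*:}/" || echo "  no response"
    done
}

port_for 6.8    # prints 9006
```

Running probe_all on a host that can reach search1 gives a quick view of which instances are up and whether each port reports the expected version.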

Links