Demo Solr and PHP integration
Uses PHP 7.x, Solarium 5.2.0 and Solr 8.5.x (Java JDK 8 or 11)
See https://lucene.apache.org/solr/downloads.html
Download the zip from the Apache Solr website, unzip it, and run it locally as a non-root user on the standard port 8983. Make sure the server is only accessible via localhost (not exposed to the whole internet).
In the Solr directory:
bin/solr start -Djetty.host=127.0.0.1
Create a new 'core' with the default configuration:
bin/solr create_core -c igo
Then edit solrconfig.xml in server/solr/igo/conf/ to add Tika's extract handler (which extracts text from various document formats).
See also https://lucene.apache.org/solr/guide/8_5/uploading-data-with-solr-cell-using-apache-tika.html
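As a rough sketch of what this edit could look like, the fragment below follows the Solr Cell example from the 8.5 reference guide linked above. It goes inside the <config> element of solrconfig.xml. The lib paths assume the default layout of the unzipped Solr distribution, and the defaults (lowercased field names, extracted text mapped to the _text_ field) are assumptions rather than settings confirmed by this demo.

```xml
<!-- Load the Tika / Solr Cell jars shipped with the Solr distribution
     (paths assume the standard unzipped layout) -->
<lib dir="${solr.install.dir:../../..}/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../..}/dist/" regex="solr-cell-\d.*\.jar" />

<!-- Register the extract handler at /update/extract -->
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Assumption: lowercase the field names derived from document metadata -->
    <str name="lowernames">true</str>
    <!-- Assumption: send the extracted body text to the default search field -->
    <str name="fmap.content">_text_</str>
  </lst>
</requestHandler>
```

Mapping the content to _text_ means a plain query such as q=virtueel searches the extracted text, since _text_ is the default search field in the default configuration.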
Stop and restart Solr so that the new handler is loaded:
bin/solr stop -Djetty.host=127.0.0.1
bin/solr start -Djetty.host=127.0.0.1
Install Composer, see https://getcomposer.org/doc/00-intro.md#installation-linux-unix-macos
For the Solarium documentation, see https://solarium.readthedocs.io/en/stable/
Put a new composer.json in an empty directory, and let Composer download the required dependencies (a vendor subdirectory will be created).
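A minimal composer.json for this step could be written as sketched below. It assumes Solarium 5.2.0 is the only direct dependency; the composer.json that ships with this demo may list more.

```shell
# Write a minimal composer.json that pins Solarium to the version named above.
# Assumption: no other direct dependencies are needed.
cat > composer.json <<'EOF'
{
    "require": {
        "solarium/solarium": "5.2.0"
    }
}
EOF
```

Pinning the exact version keeps the demo on the Solarium 5.x API.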
composer.phar install
Let Composer generate the autoload file in vendor/autoload.php:
composer.phar dump-autoload
php test.php
This will add two PDF files to the Solr index. Tika/Solr will try to reuse as much metadata from the PDF as possible, and automatically create fields. Each document is also given two additional fields (not part of the PDF metadata): a custom ID and a date.
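To see what this indexing step amounts to at the HTTP level, the sketch below posts one file to the /update/extract handler with curl instead of Solarium. The helper names and the field name date are hypothetical; test.php may use different names.

```shell
# Base URL of the 'igo' core (assumed: local Solr on the standard port).
SOLR_CORE_URL="${SOLR_CORE_URL:-http://127.0.0.1:8983/solr/igo}"

# Build the extract URL for one document.
# $1 = custom ID, $2 = date in ISO 8601 UTC form (e.g. 2019-06-01T00:00:00Z).
# literal.<field> adds a field that is not taken from the file's own metadata;
# the field name 'date' is a hypothetical choice.
# Note: values are not URL-encoded here, so keep IDs free of special characters.
extract_url() {
  printf '%s/update/extract?literal.id=%s&literal.date=%s&commit=true' \
    "$SOLR_CORE_URL" "$1" "$2"
}

# Upload a file to Solr Cell (Tika) for text extraction and indexing.
# $1 = path to the file, $2 = custom ID, $3 = date.
solr_extract() {
  curl -s "$(extract_url "$2" "$3")" -F "myfile=@$1"
}
```

With the core running, a hypothetical file example.pdf could then be indexed with:
solr_extract example.pdf doc1 2019-06-01T00:00:00Z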
Then two queries are performed and the name + all fields are displayed:
- a full text search on the word virtueel (one result)
- a full text search on the word BOSA, which is mentioned in both documents, plus filtering on the date field (earlier than Jan 1st, 2020) (also one result)
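The two queries can also be tried directly against Solr's /select handler, which is a handy cross-check on the results. The sketch below uses curl. The helper names and the field name date are hypothetical, and it assumes the extracted text is searchable through the default search field.

```shell
# Base URL of the 'igo' core (assumed: local Solr on the standard port).
SOLR_CORE_URL="${SOLR_CORE_URL:-http://127.0.0.1:8983/solr/igo}"

# Build a range filter matching values strictly before midnight UTC on a date.
# $1 = field name (hypothetical), $2 = date as YYYY-MM-DD.
# '[* TO x}' means: no lower bound, upper bound x excluded.
before_date_filter() {
  printf '%s:[* TO %sT00:00:00Z}' "$1" "$2"
}

# Run a full text query against the core.
# $1 = query text; any further arguments are passed to curl unchanged.
solr_select() {
  query="$1"
  shift
  # -G sends the URL-encoded parameters as a query string on a GET request.
  curl -sG "$SOLR_CORE_URL/select" --data-urlencode "q=$query" "$@"
}
```

With the core running and the documents indexed, the two queries could then be run as:
solr_select virtueel
solr_select BOSA --data-urlencode "fq=$(before_date_filter date 2020-01-01)"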