Given a user-provided query table, a Search Join [1] finds related tables and integrates them with the query table. The result is an enriched query table with additional attributes.
To run the search join, you first need to create an index over a corpus of tables:
#!/bin/bash
JAR="insert path to jar file here"
CLS="de.uni_mannheim.informatik.dws.searchjoin.cli.TableIndexing"
IDX="index/"
WEB="insert path to tables here"
java -cp $JAR $CLS -index $IDX $WEB
Then run the search join against the newly created index:
#!/bin/bash
JAR="insert path to jar file here"
CLS="de.uni_mannheim.informatik.dws.searchjoin.cli.SearchJoin"
IDX="index/"
QUERY="insert path to query table(s) here"
RESULT="result/"
java -cp $JAR $CLS -index $IDX -out $RESULT $QUERY
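The two steps above could also be chained in a single script. The following is only a minimal sketch, not part of the project: the helper names `run`, `build_index`, and `search_join` and the `DRY_RUN` switch are hypothetical, and only the class names and the `-index` / `-out` options are taken from the scripts above. By default it prints the commands instead of executing them, so the paths can be checked first.

```shell
#!/bin/bash
# Sketch: build the index, then run the search join against it.
# Helper names and the DRY_RUN switch are illustrative assumptions.
set -eu

JAR="insert path to jar file here"
PKG="de.uni_mannheim.informatik.dws.searchjoin.cli"
IDX="index/"
WEB="insert path to tables here"
QUERY="insert path to query table(s) here"
RESULT="result/"

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# build_index <index dir> <tables path>
build_index() {
  run java -cp "$JAR" "$PKG.TableIndexing" -index "$1" "$2"
}

# search_join <index dir> <result dir> <query path>
search_join() {
  run java -cp "$JAR" "$PKG.SearchJoin" -index "$1" -out "$2" "$3"
}

build_index "$IDX" "$WEB"
search_join "$IDX" "$RESULT" "$QUERY"
```

Once the placeholder paths are filled in, running the script with `DRY_RUN=0` executes both steps in order; because of `set -e`, the search join is skipped if indexing fails.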
This project is a simplified implementation of the Mannheim Search Join Engine [1-3], which was developed at the Data and Web Science Group at the University of Mannheim. It is based on the WInte.r Framework and is designed to be used with the Web Data Commons Web Tables corpora. Tables from other sources can be used as long as they follow the same data format.
The Search Join code can be used under the Apache 2.0 License.
[1] Lehmberg, O., Ritze, D., Ristoski, P., Meusel, R., Paulheim, H., & Bizer, C. (2015). The Mannheim Search Join Engine. Web Semantics: Science, Services and Agents on the World Wide Web, 35, 159-166.
[2] http://searchjoins.webdatacommons.org/
[3] Bizer, C. (2014). Search Joins with the Web. Invited lecture at the 17th International Conference on Database Theory (ICDT 2014), Athens, Greece, March 2014.