Project experts-semantic-analysis
This project is part of a research effort on how to find experts in online communities.
It contains the analysis for the paper "Finding Topical Experts in Question & Answer Communities", published at the 16th IEEE International Conference on Advanced Learning Technologies (ICALT 2016): http://ieeexplore.ieee.org/document/7757009/
In other words, you will find here everything you need to reproduce this research.
If you use anything related to this research, please cite:
@inproceedings{topical-experts-2016,
title={Finding Topical Experts in Question \& Answer Communities},
author={Procaci, Thiago B and Nunes, Bernardo Pereira and Nurmikko-Fuller, Terhi and Siqueira, Sean WM},
booktitle={Advanced Learning Technologies (ICALT), 2016 IEEE 16th International Conference on},
pages={407--411},
year={2016},
organization={IEEE}
}
Comments about the database tables
The table names are in Portuguese.
Their English translations are listed below to make the schema easier to follow.
| Table name | Translation (English) |
|---|---|
| usuario | user |
| pergunta | question |
| resposta | answer |
| comentariopergunta | comments on question |
| comentarioresposta | comments on answer |
| forum | forum |
| anotacoes | annotations |
| entidades | entities |
| tag | tag |
| perguntatag | question tag |
We also translated some important table fields:
| Field name | Translation (English) |
|---|---|
| id | id |
| reputacao | reputation |
| nome | name |
| titulo | title |
| texto | text |
| usuarioID | user id |
| forumID | forum id |
| perguntaID | question id |
| respostaID | answer id |
| tagID | tag id |
| dataCriacao | creation date |
| votosPositivos | number of up votes |
| votosNegativos | number of down votes |
| numeroVisualizacao | number of views |
| tipo | type |
| titulo_pergunta | question title |
| texto_pergunta | question text |
| texto_resposta | answer text |
| comentario_pergunta | comment on question text |
| comentario_resposta | comment on answer text |
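When reading the SQL scripts, it can help to have these translations available as a lookup. The sketch below is one possible helper in Python, using only the standard library. The names `TABLES`, `FIELDS`, and `gloss`, and the example query, are illustrative assumptions and not part of this repository. The output is meant for reading only, since the translated names do not exist in the database.

```python
import re

# Portuguese table names and their English glosses.
TABLES = {
    "usuario": "user",
    "pergunta": "question",
    "resposta": "answer",
    "comentariopergunta": "comments on question",
    "comentarioresposta": "comments on answer",
    "forum": "forum",
    "anotacoes": "annotations",
    "entidades": "entities",
    "tag": "tag",
    "perguntatag": "question tag",
}

# Portuguese field names and their English glosses.
FIELDS = {
    "id": "id",
    "reputacao": "reputation",
    "nome": "name",
    "titulo": "title",
    "texto": "text",
    "usuarioID": "user id",
    "forumID": "forum id",
    "perguntaID": "question id",
    "respostaID": "answer id",
    "tagID": "tag id",
    "dataCriacao": "creation date",
    "votosPositivos": "number of up votes",
    "votosNegativos": "number of down votes",
    "numeroVisualizacao": "number of views",
    "tipo": "type",
    "titulo_pergunta": "question title",
    "texto_pergunta": "question text",
    "texto_resposta": "answer text",
    "comentario_pergunta": "comment on question text",
    "comentario_resposta": "comment on answer text",
}

GLOSSARY = {**FIELDS, **TABLES}


def gloss(sql: str) -> str:
    """Replace every known Portuguese identifier in `sql` with its English gloss."""
    return re.sub(r"\w+", lambda m: GLOSSARY.get(m.group(0), m.group(0)), sql)


# Hypothetical query, used only to show the helper.
print(gloss("SELECT reputacao FROM usuario"))
```

Identifiers that are not in the glossary, such as SQL keywords, are left unchanged.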
To build each analysis, follow the instructions:
TABLE I. OVERVIEW OF THE BQA REPUTATION SCORE
Go to folder table-I-reputation
Run reputation.sql
Export the SQL result to a CSV file (e.g. reputation.csv)
Run table-1-reputation.R
The results should look like:
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 1.0 | 17.0 | 101.0 | 193.5 | 136.0 | 16660.0 |
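The column labels above match what R's `summary()` prints for a numeric vector, so table-1-reputation.R presumably reports the minimum, quartiles, mean, and maximum of the exported reputation column. As a rough cross-check outside R, the sketch below computes the same six statistics in Python with the standard library. The function names, the CSV header `reputacao`, and the toy input are assumptions made for illustration; they are not taken from the repository.

```python
import csv
import io
import statistics


def read_column(csv_text, column):
    """Parse CSV text and return one column as a list of floats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[column]) for row in reader]


def summarize(values):
    """Return Min., 1st Qu., Median, Mean, 3rd Qu., and Max. for a list of numbers."""
    data = sorted(values)
    # method="inclusive" interpolates between data points the same way as
    # R's default quantile definition (type 7).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "Min.": data[0],
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(data),
        "3rd Qu.": q3,
        "Max.": data[-1],
    }


# Toy input; a real run would read the exported file instead, for example
# open("reputation.csv").read(), assuming it has a header named "reputacao".
toy_csv = "reputacao\n1\n2\n3\n4\n"
print(summarize(read_column(toy_csv, "reputacao")))
```

Note that a mean larger than the third quartile, as in the table above, indicates a strongly right-skewed distribution: a few users hold very high reputation scores.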
TABLES II and III. Correlation (Spearman and Kendall)
Go to folder table-II-III-correlation
Run correlation.sql
Export the SQL result to a CSV file (e.g. correlation.csv)
Run correlation.R
The results should look like:
| attribute | method | correlation | p-value | valid | desc |
|---|---|---|---|---|---|
| QUESTIONS | "spearman" | "0.190391450769118" | "0.185379772049513" | "No" | "weak" |
| ANSWERS | "spearman" | "0.765476070024332" | "0.0000000000951391018063494" | "Yes" | "strong" |
| COMMENTS_ON_QUESTION | "spearman" | "0.607756034714175" | "0.00000285881187179338" | "Yes" | "moderate" |
| COMMENTS_ON_ANSWERS | "spearman" | "0.717891463777467" | "0.00000000441627523848938" | "Yes" | "strong" |
| attribute | method | correlation | p-value | valid | desc |
|---|---|---|---|---|---|
| QUESTIONS | "kendall" | "0.136604275184825" | "0.172782763180915" | "No" | "weak" |
| ANSWERS | "kendall" | "0.577659514570134" | "0.00000000399321886668247" | "Yes" | "moderate" |
| COMMENTS_ON_QUESTION | "kendall" | "0.431902165608286" | "0.00000998873197555206" | "Yes" | "moderate" |
| COMMENTS_ON_ANSWERS | "kendall" | "0.521100053348757" | "0.00000010271813755125" | "Yes" | "moderate" |
TABLE IV. Users Grouped by Expertise
Go to folder table-IV-recommendation
Execute all SQL scripts in the folder sqls
Export the SQL results to CSV files (like the CSVs in the folder data)
Run table-IV.R
The results should look like:
| entity | VOTE_UP_MEAN | VOTE_DOWN_MEAN | PV | EFFECT_SIZE | NUMBER_USER |
|---|---|---|---|---|---|
| aerobic respiration | "5.7714+-4.70" | "0.08571+-0.28" | "< 0,001" | "97%" | "14" |
| amphibians | "2.1333+-1.68" | "0.1333+-0.35" | "< 0,001" | "93%" | "6" |
| bacteria | "4.6567+-4.34" | "0.0469+-0.21" | "< 0,001" | "98%" | "46" |
| blood | "3.0909+-2.07" | "0.0000+-0.00" | "< 0,001" | "93%" | "12" |
| brain | "4.2732+-3.49" | "0.1183+-0.41" | "< 0,001" | "98%" | "37" |
| cancer | "3.8389+-3.33" | "0.02778+-0.22" | "< 0,001" | "97%" | "26" |
| chromosomes | "4.2553+-3.68" | "0.1081+-0.31" | "< 0,001" | "96%" | "38" |
| DNA | "4.7837+-4.83" | "0.04171+-0.23" | "< 0,001" | "98%" | "46" |
| enzymes | "4.6010+-4.86" | "0.0146+-0.12" | "< 0,001" | "98%" | "38" |
| evolutionary | "5.3216+-4.83" | "0.0924+-0.36" | "< 0,001" | "97%" | "47" |
| gene | "3.9551+-3.28" | "0.04289+-0.26" | "< 0,001" | "97%" | "46" |
| genetic code | "6.0120+-6.97" | "0.04217+-0.20" | "< 0,001" | "99%" | "25" |
| genomes | "4.1137+-3.15" | "0.04046+-0.24" | "< 0,001" | "98%" | "45" |
| hormones | "4.9606+-4.70" | "0.1102+-0.65" | "< 0,001" | "95%" | "26" |
| humans | "4.8293+-4.60" | "0.08302+-0.37" | "< 0,001" | "97%" | "50" |
| muscle | "3.2138+-2.98" | "0.06207+-0.24" | "< 0,001" | "96%" | "23" |
| organisms | "4.9665+-5.12" | "0.09152+-0.33" | "< 0,001" | "97%" | "45" |
| plants | "5.7218+-8.35" | "0.06031+-0.37" | "< 0,001" | "98%" | "37" |
| protein | "4.3315+-3.69" | "0.04375+-0.25" | "< 0,001" | "98%" | "45" |
| ribosome | "3.7261+-3.62" | "0.03043+-0.17" | "< 0,001" | "96%" | "41" |
| RNA | "5.6694+-6.22" | "0.04918+-0.25" | "< 0,001" | "98%" | "39" |
| species | "4.3077+-4.47" | "0.07835+-0.30" | "< 0,001" | "98%" | "42" |
| vaccine | "5.9744+-5.80" | "0.01282+-0.11" | "< 0,001" | "95%" | "11" |
| Virus | "4.4777+-4.52" | "0.02077+-0.16" | "< 0,001" | "97%" | "31" |
TABLE V. Users Grouped by Expertise - No reputation
Go to folder table-V-recommendation
Execute all SQL scripts in the folder sqls
Export the SQL results to CSV files (like the CSVs in the folder data)
Run table-V.R
The results should look like:
| entity | VOTE_UP_MEAN | VOTE_DOWN_MEAN | PV | EFFECT_SIZE | NUMBER_USER |
|---|---|---|---|---|---|
| aerobic respiration | "5.2424+-4.06" | "0.06061+-0.24" | "< 0,001" | "97%" | "12" |
| amphibians | "2.2500+-1.69" | "0.1250+-0.34" | "< 0,001" | "94%" | "7" |
| bacteria | "4.0720+-3.78" | "0.04974+-0.22" | "< 0,001" | "97%" | "47" |
| blood | "2.7200+-2.07" | "0.0000+-0.00" | "< 0,001" | "94%" | "12" |
| brain | "4.0494+-3.55" | "0.07716+-0.31" | "< 0,001" | "97%" | "42" |
| cancer | "3.6755+-3.25" | "0.0266+-0.22" | "< 0,001" | "97%" | "30" |
| chromosomes | "3.6863+-2.75" | "0.1025+-0.30" | "< 0,001" | "95%" | "40" |
| DNA | "4.5462+-4.73" | "0.0375+-0.22" | "< 0,001" | "98%" | "45" |
| enzymes | "4.1162+-4.41" | "0.01695+-0.13" | "< 0,001" | "98%" | "39" |
| evolutionary | "5.1257+-4.54" | "0.08461+-0.35" | "< 0,001" | "97%" | "47" |
| gene | "3.7699+-3.20" | "0.04084+-0.25" | "< 0,001" | "97%" | "48" |
| genetic code | "5.8121+-6.88" | "0.04242+-0.20" | "< 0,001" | "99%" | "24" |
| genomes | "3.7837+-3.02" | "0.04436+-0.25" | "< 0,001" | "97%" | "45" |
| hormones | "4.7054+-4.90" | "0.1250+-0.57" | "< 0,001" | "94%" | "25" |
| humans | "4.6830+-4.66" | "0.06575+-0.30" | "< 0,001" | "97%" | "50" |
| muscle | "3.0855+-2.94" | "0.04605+-0.21" | "< 0,001" | "96%" | "24" |
| organisms | "4.6674+-5.04" | "0.07296+-0.30" | "< 0,001" | "97%" | "45" |
| plants | "5.5293+-8.37" | "0.05455+-0.36" | "< 0,001" | "98%" | "40" |
| protein | "4.1062+-3.60" | "0.03443+-0.22" | "< 0,001" | "97%" | "44" |
| ribosome | "3.2035+-3.10" | "0.01732+-0.13" | "< 0,001" | "95%" | "41" |
| RNA | "5.1458+-5.94" | "0.03836+-0.23" | "< 0,001" | "98%" | "38" |
| species | "4.1569+-4.48" | "0.07703+-0.30" | "< 0,001" | "97%" | "44" |
| vaccine | "5.5698+-5.67" | "0.01163+-0.11" | "< 0,001" | "95%" | "15" |
| Virus | "4.2332+-4.34" | "0.02145+-0.16" | "< 0,001" | "97%" | "33" |
TABLE VI. Recommendation Testing
Go to folder table-V-recommendation
Execute all SQL scripts in the folders sql-recommendation and sql-recommendation2
Export the SQL results to CSV files (like the CSVs in the folder recommedation)
Run recommedation-test.R
The results should look like:
| entity | "USER_REC" | "USER_ANSWERED_QUESTION" | "PERCENT" | "AVG_ANSWERS" | "AVG_VOTE_UP" | "AVG_VOTE_DOWN" | "PV" | "EFFECT_SIZE" |
|---|---|---|---|---|---|---|---|---|
| "aerobic respiration" | "12" | "4" | "33%" | "2+-0.58" | "4+-1.84" | "0.00+-0.00" | "0.0202" | "100%" |
| "amphibians" | "7" | "2" | "29%" | "1+-0.00" | "2+-3.54" | "0.00+-0.00" | "0.6171" | "75%" |
| "bacteria" | "47" | "40" | "85%" | "5+-5.60" | "4+-3.67" | "0.06+-0.18" | "< 0,001" | "97%" |
| "blood" | "12" | "5" | "42%" | "1+-0.89" | "4+-2.98" | "0.00+-0.00" | "0.0254" | "90%" |
| "brain" | "42" | "34" | "81%" | "3+-2.72" | "4+-2.96" | "0.05+-0.18" | "< 0,001" | "98%" |
| "cancer" | "30" | "22" | "73%" | "2+-1.37" | "4+-2.50" | "0.00+-0.00" | "< 0,001" | "100%" |
| "chromosomes" | "40" | "33" | "82%" | "2+-2.10" | "4+-3.59" | "0.06+-0.24" | "< 0,001" | "100%" |
| "dna" | "45" | "44" | "98%" | "7+-7.82" | "4+-2.63" | "0.03+-0.09" | "< 0,001" | "100%" |
| "enzymes" | "39" | "29" | "74%" | "3+-3.99" | "3+-2.34" | "0.01+-0.06" | "< 0,001" | "98%" |
| "evolutionary" | "47" | "44" | "94%" | "7+-8.97" | "5+-3.53" | "0.09+-0.19" | "< 0,001" | "100%" |
| "gene" | "48" | "44" | "92%" | "7+-7.27" | "3+-2.00" | "0.02+-0.07" | "< 0,001" | "100%" |
| "genetic code" | "24" | "15" | "62%" | "2+-0.72" | "4+-2.26" | "0.07+-0.26" | "< 0,001" | "100%" |
| "genomes" | "45" | "36" | "80%" | "4+-4.15" | "4+-3.34" | "0.04+-0.17" | "< 0,001" | "98%" |
| "hormones" | "25" | "15" | "60%" | "3+-2.37" | "4+-2.98" | "0.00+-0.00" | "< 0,001" | "97%" |
| "humans" | "50" | "50" | "100%" | "9+-9.50" | "5+-6.52" | "0.05+-0.13" | "< 0,001" | "100%" |
| "muscle" | "24" | "13" | "54%" | "2+-1.50" | "4+-2.06" | "0.08+-0.28" | "< 0,001" | "100%" |
| "organisms" | "45" | "42" | "93%" | "5+-5.76" | "5+-3.92" | "0.09+-0.27" | "< 0,001" | "100%" |
| "plants" | "40" | "38" | "95%" | "4+-4.49" | "4+-3.35" | "0.03+-0.10" | "< 0,001" | "99%" |
| "protein" | "44" | "39" | "89%" | "8+-9.02" | "4+-2.42" | "0.007+-0.03" | "< 0,001" | "96%" |
| "ribosome" | "41" | "30" | "73%" | "2+-2.09" | "3+-2.16" | "0.08+-0.32" | "< 0,001" | "96%" |
| "rna" | "38" | "25" | "66%" | "3+-3.34" | "4+-2.97" | "0.05+-0.20" | "< 0,001" | "100%" |
| "species" | "44" | "42" | "95%" | "6+-5.97" | "5+-3.57" | "0.12+-0.36" | "< 0,001" | "98%" |
| "vaccine" | "15" | "7" | "47%" | "1+-0.38" | "6+-8.32" | "0.00+-0.00" | "0.0037" | "93%" |
| "virus" | "33" | "25" | "76%" | "3+-2.33" | "4+-2.72" | "0.00+-0.00" | "< 0,001" | "98%" |