As seen in the European Journalism Observatory (in French)
The bylines of more than 1.8 million articles (blogs excluded), published by 16 of HuffPost's 18 versions, were scraped [1].
Each byline was categorized thus:
HP_yes
when the byline includes "HuffPost" or the name of an employee or freelancer, even if another media organization is mentioned.
HP_no
when the byline attributes authorship exclusively to another media organization (press agency, website, newspaper, etc.).
HP_unknown
when authorship cannot be established.
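The three rules above can be sketched as a small function. This is only an illustration of the decision order, not the code used in the notebooks: `STAFF_NAMES` and `EXTERNAL_ORGS` are hypothetical placeholder lists, and the simple substring matching is an assumption.

```python
import re

# Hypothetical reference lists, for illustration only; a real run would
# need a complete list of staff and freelancers for each edition.
STAFF_NAMES = {"jane doe", "john smith"}
EXTERNAL_ORGS = {"reuters", "associated press", "afp", "the canadian press"}


def classify_byline(byline: str) -> str:
    """Return HP_yes, HP_no or HP_unknown for a raw byline string."""
    # Normalize whitespace and case before matching.
    text = re.sub(r"\s+", " ", byline).strip().lower()
    if not text:
        return "HP_unknown"
    # HP_yes wins even when another media organization is also credited.
    if "huffpost" in text or any(name in text for name in STAFF_NAMES):
        return "HP_yes"
    # HP_no: only an outside organization is credited.
    if any(org in text for org in EXTERNAL_ORGS):
        return "HP_no"
    # Anything else cannot be attributed.
    return "HP_unknown"
```

Checking for HuffPost or a staff name first is what makes a shared byline such as "Reuters / HuffPost" count as HP_yes, as the first rule requires.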
The table below gives, for each edition, the number of articles in each category and the originality rate (HP_yes as a share of all articles).
Links are to Jupyter notebooks in French.
The raw data can be downloaded as a CSV file, scraping-nettoye.csv. Data for the Spanish edition is in a separate file, scraping-ES-2.csv.
version | launch date | articles | HP_yes | HP_no | HP_unknown | originality rate |
---|---|---|---|---|---|---|
US [2], site | 2005-05-09 | 550 955 | 250 528 | 210 226 | 90 201 | 45.5% |
Canada, site | 2011-05-26 | 265 153 | 40 809 | 222 950 | 1 394 | 15.4% |
United Kingdom, site | 2011-07-06 | 161 263 | 118 317 | 42 757 | 189 | 73.4% |
France, site | 2012-01-23 | 54 156 | 49 815 | 4 088 | 253 | 92.0% |
Québec, site | 2012-02-08 | 390 231 | 44 282 | 344 510 | 1 439 | 11.3% |
Spain, site | 2012-06-07 | 56 348 | 48 879 | 7 381 | 88 | 86.7% |
Italy, site | 2012-09-24 | 64 880 | 53 820 | 9 944 | 1 116 | 83.0% |
Japan, site | 2013-05-06 | 23 708 | 16 490 | 6 865 | 353 | 69.6% |
Maghreb, site | 2013-06-25 | 28 653 | 25 200 | 3 337 | 116 | 87.9% |
Germany, site | 2013-10-01 | 68 733 | 31 831 | 33 445 | 3 457 | 46.3% |
Brazil, site | 2014-01-29 | 20 831 | 14 543 | 5 745 | 543 | 69.8% |
South Korea, site | 2014-02-26 | 51 890 | 25 945 | 25 476 | 469 | 50.0% |
Greece, site | 2014-11-20 | 55 433 | 55 004 | 279 | 150 | 99.2% |
India, site | 2014-12-08 | 14 618 | 8 613 | 3 154 | 2 851 | 58.9% |
Australia, site | 2015-08-18 | 17 154 | 12 335 | 3 255 | 1 564 | 71.9% |
Mexico, site | 2016-09-01 | 2 168 | 1 916 | 102 | 150 | 88.4% |
All | | 1 826 174 | 798 327 | 923 514 | 104 333 | 43.7% |
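The originality rate in the last column is consistent with HP_yes divided by the total number of articles. That reading is inferred from the figures in the table, and the function name below is a hypothetical one chosen for this sketch, which recomputes three of the rows:

```python
def originality_rate(hp_yes: int, hp_no: int, hp_unknown: int) -> float:
    """Share of articles categorized HP_yes, as a percentage."""
    total = hp_yes + hp_no + hp_unknown
    if total == 0:
        raise ValueError("no articles to compute a rate from")
    return 100 * hp_yes / total


# Counts copied from the table: (HP_yes, HP_no, HP_unknown).
counts = {
    "France": (49_815, 4_088, 253),
    "Québec": (44_282, 344_510, 1_439),
    "All": (798_327, 923_514, 104_333),
}

for edition, (yes, no, unknown) in counts.items():
    print(f"{edition}: {originality_rate(yes, no, unknown):.1f}%")
```

Note that HP_unknown articles stay in the denominator, so an edition with many unattributable bylines gets a lower rate than it would if those articles were set aside.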