diff --git a/.gitignore b/.gitignore
index 2947144..ee09efe 100755
--- a/.gitignore
+++ b/.gitignore
@@ -38,4 +38,3 @@ vignettes/*.pdf
 # R Environment Variables
 .Renviron
 inst/doc
-docs
diff --git a/docs/LICENSE.html b/docs/LICENSE.html
new file mode 100644
index 0000000..8668c3f
--- /dev/null
+++ b/docs/LICENSE.html
@@ -0,0 +1,169 @@
+Copyright (c) 2021 Bryan Whiting
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+developers.Rmd
Starting a new package/one-time commands.
+
+devtools::create('generalconference')
+devtools::install()
+usethis::use_mit_license("My Name")
+# specify packages you want to use
+usethis::use_package('dplyr')
+usethis::use_package('rvest')
+# builds data-raw/ folder
+usethis::use_data_raw()
+# build functions in R and save, builds a test file
+usethis::use_r(name="new_func")
+usethis::use_test()
+
+# Add new package to DESCRIPTION as necessary
+usethis::use_package('xxxx')
+
+# Once function is written, load it. You'll run `load_all()` multiple times.
+devtools::load_all()
+devtools::test()
+devtools::check() # checks package
_pkgdown.yml
with documents
+# Add new documentation
+usethis::use_vignette('introduction') # add a vignette
+
+# (optional, one-off steps) Build individual files
+devtools::run_examples() # runs the examples in the Rd files
+devtools::build_vignettes() # builds the vignettes
+pkgdown::build_articles() # renders the vignettes as pkgdown articles
+pkgdown::build_reference() # edit reference in _pkgdown.yml reference: section
+
+# Prepare the package
+devtools::document() # generates NAMESPACE from documentation. Exports functions.
+covr::report() # run the coverage test
+devtools::test() # run unit tests
+devtools::check() # check the package
+devtools::build() # build the package
+pkgdown::build_site() # Build the r package documentation
The data are nested to minimize redundancy, but they can easily be unnested.
+Example of nested data:
+
+library(dplyr)
+#>
+#> Attaching package: 'dplyr'
+#> The following objects are masked from 'package:stats':
+#>
+#> filter, lag
+#> The following objects are masked from 'package:base':
+#>
+#> intersect, setdiff, setequal, union
+mtcars %>%
+ select(mpg, disp, am, vs) %>%
+ tidyr::nest(data = c(vs, c(mpg, disp)))
+#> # A tibble: 2 × 2
+#> am data
+#> <dbl> <list>
+#> 1 1 <tibble [13 × 3]>
+#> 2 0 <tibble [19 × 3]>
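To illustrate the "easily be unnested" claim, here is a minimal sketch that reverses the nesting above with `tidyr::unnest()`. It uses the same `mtcars` columns as the example; the object names `nested` and `flat` are chosen only for illustration.

```r
library(dplyr)
library(tidyr)

# Nest three columns into a list-column, one row per value of `am`
nested <- mtcars %>%
  select(mpg, disp, am, vs) %>%
  nest(data = c(vs, mpg, disp))

# Unnest the list-column to get back one row per car
flat <- nested %>%
  unnest(cols = data)

dim(nested) # 2 rows, 2 columns
dim(flat)   # 32 rows, 4 columns
```

The same pattern applies to each nested level of the package data: call `unnest()` once per list-column you want to expand.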
example-analysis.Rmd
+library(generalconference)
+#> Loading required package: dplyr
+#>
+#> Attaching package: 'dplyr'
+#> The following objects are masked from 'package:stats':
+#>
+#> filter, lag
+#> The following objects are masked from 'package:base':
+#>
+#> intersect, setdiff, setequal, union
+#> Loading required package: glue
+#>
+#> Attaching package: 'glue'
+#> The following object is masked from 'package:dplyr':
+#>
+#> collapse
+#> Loading required package: furrr
+#> Loading required package: future
+#> Loading required package: purrr
+#> Loading required package: stringr
+#> Loading required package: readr
+#> Loading required package: rvest
+#>
+#> Attaching package: 'rvest'
+#> The following object is masked from 'package:readr':
+#>
+#> guess_encoding
+#> Loading required package: tictoc
+#> Loading required package: tidyr
+#> Loading required package: xml2
+library(dplyr)
+data("genconf")
+head(genconf)
+#> # A tibble: 6 × 3
+#> year month sessions
+#> <dbl> <dbl> <list>
+#> 1 1971 4 <tibble [7 × 4]>
+#> 2 1971 10 <tibble [7 × 4]>
+#> 3 1972 4 <tibble [7 × 4]>
+#> 4 1972 10 <tibble [7 × 4]>
+#> 5 1973 4 <tibble [7 × 4]>
+#> 6 1973 10 <tibble [7 × 4]>
+df <- genconf
How many conferences have there been since 1971?
+
+df %>%
+ count()
+#> # A tibble: 1 × 1
+#> n
+#> <int>
+#> 1 101
How many sessions have there been?
+ +How many talks have there been since 1971?
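The two questions above have no code in this vignette. One possible approach, sketched here on a small hand-built tibble rather than the real `genconf` data, is to unnest one level per question and count rows. The toy values are invented for illustration; only the column names `sessions`, `talks`, `session_id`, and `title1` are taken from the package output shown elsewhere in this diff.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy stand-in for genconf: 2 conferences, 3 sessions, 6 talks (made-up data)
toy <- tibble(
  year = c(2020, 2020),
  month = c(4, 10),
  sessions = list(
    tibble(
      session_id = 1:2,
      talks = list(tibble(title1 = c("A", "B")), tibble(title1 = "C"))
    ),
    tibble(
      session_id = 1L,
      talks = list(tibble(title1 = c("D", "E", "F")))
    )
  )
)

# One row per session after the first unnest
n_sessions <- toy %>%
  unnest(cols = sessions) %>%
  nrow()

# One row per talk after the second unnest
n_talks <- toy %>%
  unnest(cols = sessions) %>%
  unnest(cols = talks) %>%
  nrow()

n_sessions # 3
n_talks    # 6
```

Assuming `genconf` follows the structure described in its reference page, replacing `toy` with `genconf` should answer the questions for the real data.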
+ +how-to-scrape.Rmd
+library(rvest)
+library(dplyr)
+#>
+#> Attaching package: 'dplyr'
+#> The following objects are masked from 'package:stats':
+#>
+#> filter, lag
+#> The following objects are masked from 'package:base':
+#>
+#> intersect, setdiff, setequal, union
+library(xml2)
+rv_doc <- rvest::read_html("https://www.churchofjesuschrist.org/study/liahona/2020/11/15cook?lang=eng")
+rv_doc %>%
+ html_elements(".body-block") %>%
+ xml2::html_structure()
+#> [[1]]
+#> <div.body-block>
+#> <p#p5 [data-aid]>
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> {text}
+#> <p#p6 [data-aid]>
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> <p#p7 [data-aid]>
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> <p#p8 [data-aid]>
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> <p#p9 [data-aid]>
+#> {text}
+#> <p#p42 [data-aid]>
+#> {text}
+#> <p#p10 [data-aid]>
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> <p#p43 [data-aid]>
+#> {text}
+#> <p#p44 [data-aid]>
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> {text}
+#> <p#p11 [data-aid]>
+#> {text}
+#> <span.page-break [data-page]>
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> <p#p12 [data-aid]>
+#> <em>
+#> {text}
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> <p#p13 [data-aid]>
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> <p#p14 [data-aid]>
+#> <em>
+#> {text}
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> <p#p15 [data-aid]>
+#> {text}
+#> <p#p16 [data-aid]>
+#> {text}
+#> <a.scripture-ref [href]>
+#> {text}
+#> {text}
+#> <p#p17 [data-aid]>
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> <p#p18 [data-aid]>
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> <p#p19 [data-aid]>
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> {text}
+#> <p#p38 [data-aid]>
+#> {text}
+#> <p#p39 [data-aid]>
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> <p#p20 [data-aid]>
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> {text}
+#> <p#p21 [data-aid]>
+#> {text}
+#> <span.page-break [data-page]>
+#> {text}
+#> <p#p22 [data-aid]>
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> {text}
+#> <p#p23 [data-aid]>
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> <p#p24 [data-aid]>
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> {text}
+#> <em>
+#> {text}
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> {text}
+#> <p#p40 [data-aid]>
+#> {text}
+#> <p#p41 [data-aid]>
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> <p#p25 [data-aid]>
+#> {text}
+#> <a.scripture-ref [href]>
+#> {text}
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> <p#p26 [data-aid]>
+#> {text}
+#> <p#p27 [data-aid]>
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> <p#p28 [data-aid]>
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> {text}
+#> <p#p29 [data-aid]>
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> {text}
+#> <p#p30 [data-aid]>
+#> {text}
+#> <a.scripture-ref [href]>
+#> {text}
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> <p#p31 [data-aid]>
+#> {text}
+#> <a.scripture-ref [href]>
+#> {text}
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> <p#p32 [data-aid]>
+#> {text}
+#> <p#p33 [data-aid]>
+#> {text}
+#> <span.page-break [data-page]>
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> <p#p34 [data-aid]>
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> {text}
+#> <p#p35 [data-aid]>
+#> {text}
+#> <p#p36 [data-aid]>
+#> {text}
+#> <a.note-ref [to, href, data-scroll-id]>
+#> <sup.marker>
+#> {text}
+#> <p#p37 [data-aid]>
+#> {text}
Explore node 1:
+
+rv_doc %>%
+ html_elements(".body-block") %>%
+ xml2::xml_child(1)
+#> {html_node}
+#> <p data-aid="144618637" id="p5">
+#> [1] <a class="note-ref" to="[object Object]" href="#note1" data-scroll-id="no ...
Explore node 2:
+
+rv_doc %>%
+ html_elements(".body-block") %>%
+ xml2::xml_child(2)
+#> {html_node}
+#> <p data-aid="144618639" id="p6">
+#> [1] <a class="note-ref" to="[object Object]" href="#note2" data-scroll-id="no ...
+rv_doc %>%
+ html_elements(".body-block") %>%
+ xml_contents()
+#> {xml_nodeset (40)}
+#> [1] <p data-aid="144618637" id="p5">Righteousness and unity are profoundly s ...
+#> [2] <p data-aid="144618639" id="p6">As a young man not of our faith, General ...
+#> [3] <p data-aid="144618644" id="p7">In 1872, General Kane, his talented wife ...
+#> [4] <p data-aid="144618648" id="p8">During the trip they stayed in Fillmore ...
+#> [5] <p data-aid="144618651" id="p9">Elizabeth wrote that as Matilda was prep ...
+#> [6] <p data-aid="144618656" id="p42">Matilda’s son’s reply was, “She said ‘T ...
+#> [7] <p data-aid="144618659" id="p10">Elizabeth asked, “Will she really do th ...
+#> [8] <p data-aid="144618665" id="p43">Matilda’s son answered, “Mother will se ...
+#> [9] <p data-aid="144618668" id="p44">And so she did, and “they ate with perf ...
+#> [10] <p data-aid="144618672" id="p11">As leaders, we are not under the illusi ...
+#> [11] <p data-aid="144618676" id="p12"><em>Righteousness</em> is a broad, comp ...
+#> [12] <p data-aid="144618679" id="p13">Being righteous is not dependent on eac ...
+#> [13] <p data-aid="144618685" id="p14"><em>Unity</em> is also a broad, compreh ...
+#> [14] <p data-aid="144618690" id="p15">The context for my message is the contr ...
+#> [15] <p data-aid="144618696" id="p16">It has been 200 years since the Father ...
+#> [16] <p data-aid="144618701" id="p17">The historical record we read in 4 Neph ...
+#> [17] <p data-aid="144618706" id="p18">With respect to unity, 4 Nephi reads, “ ...
+#> [18] <p data-aid="144618710" id="p19">Unfortunately, 4 Nephi then describes a ...
+#> [19] <p data-aid="144618715" id="p38">“But O my son, how can a people like th ...
+#> [20] <p data-aid="144618720" id="p39">“How can we expect that God will stay h ...
+#> ...
+
+rv_doc %>%
+ html_elements(".body-block p")
+#> {xml_nodeset (40)}
+#> [1] <p data-aid="144618637" id="p5">Righteousness and unity are profoundly s ...
+#> [2] <p data-aid="144618639" id="p6">As a young man not of our faith, General ...
+#> [3] <p data-aid="144618644" id="p7">In 1872, General Kane, his talented wife ...
+#> [4] <p data-aid="144618648" id="p8">During the trip they stayed in Fillmore ...
+#> [5] <p data-aid="144618651" id="p9">Elizabeth wrote that as Matilda was prep ...
+#> [6] <p data-aid="144618656" id="p42">Matilda’s son’s reply was, “She said ‘T ...
+#> [7] <p data-aid="144618659" id="p10">Elizabeth asked, “Will she really do th ...
+#> [8] <p data-aid="144618665" id="p43">Matilda’s son answered, “Mother will se ...
+#> [9] <p data-aid="144618668" id="p44">And so she did, and “they ate with perf ...
+#> [10] <p data-aid="144618672" id="p11">As leaders, we are not under the illusi ...
+#> [11] <p data-aid="144618676" id="p12"><em>Righteousness</em> is a broad, comp ...
+#> [12] <p data-aid="144618679" id="p13">Being righteous is not dependent on eac ...
+#> [13] <p data-aid="144618685" id="p14"><em>Unity</em> is also a broad, compreh ...
+#> [14] <p data-aid="144618690" id="p15">The context for my message is the contr ...
+#> [15] <p data-aid="144618696" id="p16">It has been 200 years since the Father ...
+#> [16] <p data-aid="144618701" id="p17">The historical record we read in 4 Neph ...
+#> [17] <p data-aid="144618706" id="p18">With respect to unity, 4 Nephi reads, “ ...
+#> [18] <p data-aid="144618710" id="p19">Unfortunately, 4 Nephi then describes a ...
+#> [19] <p data-aid="144618715" id="p38">“But O my son, how can a people like th ...
+#> [20] <p data-aid="144618720" id="p39">“How can we expect that God will stay h ...
+#> ...
+
+rv_doc %>%
+ html_elements(".body-block") %>%
+ html_children()
+#> {xml_nodeset (40)}
+#> [1] <p data-aid="144618637" id="p5">Righteousness and unity are profoundly s ...
+#> [2] <p data-aid="144618639" id="p6">As a young man not of our faith, General ...
+#> [3] <p data-aid="144618644" id="p7">In 1872, General Kane, his talented wife ...
+#> [4] <p data-aid="144618648" id="p8">During the trip they stayed in Fillmore ...
+#> [5] <p data-aid="144618651" id="p9">Elizabeth wrote that as Matilda was prep ...
+#> [6] <p data-aid="144618656" id="p42">Matilda’s son’s reply was, “She said ‘T ...
+#> [7] <p data-aid="144618659" id="p10">Elizabeth asked, “Will she really do th ...
+#> [8] <p data-aid="144618665" id="p43">Matilda’s son answered, “Mother will se ...
+#> [9] <p data-aid="144618668" id="p44">And so she did, and “they ate with perf ...
+#> [10] <p data-aid="144618672" id="p11">As leaders, we are not under the illusi ...
+#> [11] <p data-aid="144618676" id="p12"><em>Righteousness</em> is a broad, comp ...
+#> [12] <p data-aid="144618679" id="p13">Being righteous is not dependent on eac ...
+#> [13] <p data-aid="144618685" id="p14"><em>Unity</em> is also a broad, compreh ...
+#> [14] <p data-aid="144618690" id="p15">The context for my message is the contr ...
+#> [15] <p data-aid="144618696" id="p16">It has been 200 years since the Father ...
+#> [16] <p data-aid="144618701" id="p17">The historical record we read in 4 Neph ...
+#> [17] <p data-aid="144618706" id="p18">With respect to unity, 4 Nephi reads, “ ...
+#> [18] <p data-aid="144618710" id="p19">Unfortunately, 4 Nephi then describes a ...
+#> [19] <p data-aid="144618715" id="p38">“But O my son, how can a people like th ...
+#> [20] <p data-aid="144618720" id="p39">“How can we expect that God will stay h ...
+#> ...
+rv_doc %>%
+ html_elements("header")
+#> {xml_nodeset (7)}
+#> [1] <header class="panelHeader-2k7Jd backToAll-1PgB6"><a class="backText-1xON ...
+#> [2] <header class="panelHeader-2k7Jd contentHead-3F0ox"><button class="sc-1g7 ...
+#> [3] <header class="bookmarkHeader-2Bn20"><span class="bookmarkManagerTitle-1U ...
+#> [4] <header class="downloadHead-3O2wO">Downloads</header>
+#> [5] <header class="settingsHead-3iDND">Footnotes</header>
+#> [6] <header class="settingsHead-3iDND">Theme</header>
+#> [7] <header><span class="page-break" data-page="18"></span><div class="bvqtyr ...
+rv_doc %>%
+ html_elements(".body") %>%
+ html_elements("header") %>%
+ html_text2()
+#> [1] "Hearts Knit in Righteousness and Unity\n\nBy Elder Quentin L. Cook\n\nOf the Quorum of the Twelve Apostles\n\nAt this 200-year hinge point in our Church history, let us commit ourselves to live righteously and be united as never before."
Get specific paragraph by id:
+
+rv_doc %>%
+ html_elements("#p5")
+#> {xml_nodeset (1)}
+#> [1] <p data-aid="144618637" id="p5">Righteousness and unity are profoundly si ...
Get multiple things at the same time (headers and paragraphs):
+
+rv_doc %>%
+ html_elements(".body-block h2, .body-block p")
+#> {xml_nodeset (40)}
+#> [1] <p data-aid="144618637" id="p5">Righteousness and unity are profoundly s ...
+#> [2] <p data-aid="144618639" id="p6">As a young man not of our faith, General ...
+#> [3] <p data-aid="144618644" id="p7">In 1872, General Kane, his talented wife ...
+#> [4] <p data-aid="144618648" id="p8">During the trip they stayed in Fillmore ...
+#> [5] <p data-aid="144618651" id="p9">Elizabeth wrote that as Matilda was prep ...
+#> [6] <p data-aid="144618656" id="p42">Matilda’s son’s reply was, “She said ‘T ...
+#> [7] <p data-aid="144618659" id="p10">Elizabeth asked, “Will she really do th ...
+#> [8] <p data-aid="144618665" id="p43">Matilda’s son answered, “Mother will se ...
+#> [9] <p data-aid="144618668" id="p44">And so she did, and “they ate with perf ...
+#> [10] <p data-aid="144618672" id="p11">As leaders, we are not under the illusi ...
+#> [11] <p data-aid="144618676" id="p12"><em>Righteousness</em> is a broad, comp ...
+#> [12] <p data-aid="144618679" id="p13">Being righteous is not dependent on eac ...
+#> [13] <p data-aid="144618685" id="p14"><em>Unity</em> is also a broad, compreh ...
+#> [14] <p data-aid="144618690" id="p15">The context for my message is the contr ...
+#> [15] <p data-aid="144618696" id="p16">It has been 200 years since the Father ...
+#> [16] <p data-aid="144618701" id="p17">The historical record we read in 4 Neph ...
+#> [17] <p data-aid="144618706" id="p18">With respect to unity, 4 Nephi reads, “ ...
+#> [18] <p data-aid="144618710" id="p19">Unfortunately, 4 Nephi then describes a ...
+#> [19] <p data-aid="144618715" id="p38">“But O my son, how can a people like th ...
+#> [20] <p data-aid="144618720" id="p39">“How can we expect that God will stay h ...
+#> ...
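The talk above has no `h2` elements inside `.body-block`, so the combined selector returns only paragraphs. The following sketch uses a small inline HTML fragment (invented for illustration, not taken from the site) to show what a comma-separated selector does when both kinds of element are present: matches come back in document order, which keeps section headers next to their paragraphs.

```r
library(rvest)

# Made-up fragment with section headers interleaved with paragraphs
doc <- minimal_html('
  <div class="body-block">
    <h2 id="h1">Section one</h2>
    <p id="p1">First paragraph.</p>
    <h2 id="h2">Section two</h2>
    <p id="p2">Second paragraph.</p>
  </div>')

nodes <- doc %>%
  html_elements(".body-block h2, .body-block p")

html_name(nodes)       # "h2" "p" "h2" "p"
html_attr(nodes, "id") # "h1" "p1" "h2" "p2"
```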
+header_ids <- rv_doc %>%
+ html_elements(".body-block h2") %>%
+ html_attr("id")
+p_ids <- rv_doc %>%
+ html_elements(".body-block p") %>%
+ html_attr("id")
+xm_contents <- rv_doc %>%
+ html_elements(".body-block") %>%
+ xml_contents()
+rv_doc %>%
+ html_elements(".body-block") %>%
+ # html_children() %>%
+ xml_child(1) %>%
+ xml_contents() %>%
+ html_elements("p")
+#> {xml_nodeset (0)}
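The empty node set above is expected: `html_elements()` searches the descendants of each node it is given, and the contents of a single `<p>` are text and `<a>` nodes that contain no further `<p>` elements. A minimal sketch on an inline fragment (invented for illustration) shows the difference between searching inside one paragraph and searching inside the enclosing block.

```r
library(rvest)
library(xml2)

# Made-up fragment shaped like the talk body above
doc <- minimal_html('
  <div class="body-block">
    <p id="p5">First <a class="note-ref" href="#note1"><sup>1</sup></a> paragraph.</p>
    <p id="p6">Second paragraph.</p>
  </div>')

block <- html_element(doc, ".body-block")
first_p <- xml_child(block, 1)  # the <p id="p5"> element
inside <- xml_contents(first_p) # text nodes and the <a> inside that <p>

length(html_elements(inside, "p")) # 0: nothing inside a <p> is itself a <p>
length(html_elements(block, "p"))  # 2: both paragraphs are descendants of the block
```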
Scrape metadata for url
+
+rv_doc %>%
+ html_elements("head") %>%
+ html_elements("meta")
+#> {xml_nodeset (10)}
+#> [1] <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n
+#> [2] <meta charset="utf-8">\n
+#> [3] <meta name="viewport" content="width=device-width,initial-scale=1">\n
+#> [4] <meta data-react-helmet="true" name="Search.doc-aid" content="144618619">\n
+#> [5] <meta data-react-helmet="true" name="title" content="Hearts Knit in Righ ...
+#> [6] <meta data-react-helmet="true" name="description" content="Elder Cook en ...
+#> [7] <meta data-react-helmet="true" property="og:image" content="https://medi ...
+#> [8] <meta data-react-helmet="true" property="og:title" content="Hearts Knit ...
+#> [9] <meta data-react-helmet="true" property="og:type" content="website">\n
+#> [10] <meta data-react-helmet="true" property="og:url" content="https://www.ch ...
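The `meta` nodes can be turned into a small lookup table by pulling their attributes with `html_attr()`. This is a sketch on an inline document with made-up content; the attribute names (`name`, `property`, `content`) match those visible in the output above, while `meta_df` and `named` are illustrative names.

```r
library(rvest)

# Made-up page head with a few meta tags
page <- read_html('<html><head>
  <meta charset="utf-8">
  <meta name="title" content="Example Talk">
  <meta name="description" content="A short summary.">
  <meta property="og:type" content="website">
</head><body></body></html>')

meta <- html_elements(page, "head meta")

# One row per meta tag; attributes a tag lacks come back as NA
meta_df <- data.frame(
  name = html_attr(meta, "name"),
  property = html_attr(meta, "property"),
  content = html_attr(meta, "content"),
  stringsAsFactors = FALSE
)

# Keep only the tags that carry a name attribute
named <- meta_df[!is.na(meta_df$name), ]
named$content[named$name == "title"] # "Example Talk"
```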
new-sessions.Rmd
+library(generalconference)
+#> Loading required package: dplyr
+#>
+#> Attaching package: 'dplyr'
+#> The following objects are masked from 'package:stats':
+#>
+#> filter, lag
+#> The following objects are masked from 'package:base':
+#>
+#> intersect, setdiff, setequal, union
+#> Loading required package: glue
+#>
+#> Attaching package: 'glue'
+#> The following object is masked from 'package:dplyr':
+#>
+#> collapse
+#> Loading required package: furrr
+#> Loading required package: future
+#> Loading required package: purrr
+#> Loading required package: stringr
+#> Loading required package: readr
+#> Loading required package: rvest
+#>
+#> Attaching package: 'rvest'
+#> The following object is masked from 'package:readr':
+#>
+#> guess_encoding
+#> Loading required package: tictoc
+#> Loading required package: tidyr
+#> Loading required package: xml2
Use the following code to download a session one-off:
+
+# Define the file path
+year = 2021
+month = 4
+mo_str = "04"
+path=glue("/home/rstudio/generalconference/data/sessions/{year}{mo_str}.rds")
+generalconference::scrape_conference_talks(year, month, path)
+# Read the dataframe in
+df_conf <- readr::read_rds(path)
+df_conf %>%
+ unnest(sessions) %>%
+ unnest(talks)
+#> # A tibble: 37 × 13
+#> year month session_name session_id session_url talk_urls talk_session_id
+#> <dbl> <dbl> <chr> <int> <chr> <chr> <int>
+#> 1 2021 4 Saturday Mor… 1 /study/gener… /study/ge… 1
+#> 2 2021 4 Saturday Mor… 1 /study/gener… /study/ge… 2
+#> 3 2021 4 Saturday Mor… 1 /study/gener… /study/ge… 3
+#> 4 2021 4 Saturday Mor… 1 /study/gener… /study/ge… 4
+#> 5 2021 4 Saturday Mor… 1 /study/gener… /study/ge… 5
+#> 6 2021 4 Saturday Mor… 1 /study/gener… /study/ge… 6
+#> 7 2021 4 Saturday Mor… 1 /study/gener… /study/ge… 7
+#> 8 2021 4 Saturday Aft… 2 /study/gener… /study/ge… 1
+#> 9 2021 4 Saturday Aft… 2 /study/gener… /study/ge… 2
+#> 10 2021 4 Saturday Aft… 2 /study/gener… /study/ge… 3
+#> # … with 27 more rows, and 6 more variables: url <chr>, title1 <chr>,
+#> # author1 <chr>, author2 <chr>, kicker1 <chr>, paragraphs <list>
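The snippet above sets `mo_str` by hand. As a small sketch, the zero-padded month can instead be derived from `month` with `sprintf()`, so the two values cannot drift apart. The relative `data/sessions/` prefix here is an assumption for illustration; substitute whatever directory you write session files to.

```r
library(glue)

year <- 2021
month <- 4

# Zero-pad the month to two digits, e.g. 4 -> "04", 10 -> "10"
mo_str <- sprintf("%02d", month)

# Illustrative relative path, not the absolute path used above
path <- glue("data/sessions/{year}{mo_str}.rds")
path # data/sessions/202104.rds
```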
extract_element.Rd
rvest::read_html("https://www.churchofjesuschrist.org/study/general-conference/1971/04/kingdom-of-god?lang=eng")
+extract_element(rv_doc, element)
Argument | Description |
---|---|
rv_doc | rvest::read_html() document |
element | class you want to extract (use Selector Gadget) |
dataframe
extract_metadata.Rd
Extract the title, author, and kicker from a URL and return them as one row of a data frame.
+extract_metadata(html_document, url)
extract_session_hrefs.Rd
Extract Session hrefs
+extract_session_hrefs(html_docmap, session_id)
Argument | Description |
---|---|
html_docmap | An rvest docmap scrape from scrape_conference_html_doc_map() |
session_id | Integer for session you want to extract |
hrefs for the session, including the session href in addition to the talk hrefs.
+scrape_conference_html_doc_map(2019, 4) %>%
+ extract_session_hrefs(session_id = 1) %>%
+ parse_session_urls()
+#> # A tibble: 1 × 3
+#> session_name session_url session_talk_ur…
+#> <chr> <chr> <list>
+#> 1 Saturday Morning Session /study/general-conference/2019/04/s… <tibble [6 × 2]>
genconf.Rd
A dataset containing all general conference talks back to 1971.
+genconf
+
+
+ genconf: A 4-level nested data frame with nestings for conference, session, talk, and paragraph.
genconf A data frame with one row per conference (year + month)
Session year
Session month
List dataframe with one row per session.
sessions A data frame with one row per session (Saturday AM, PM, etc.)
Session name (e.g. Saturday Morning Session)
Session index within the conference
Suffix URL path to session (not the full URL)
List of dataframes, one row per talk in that session
talks A data frame with one row per talk
Stub urls for talk.
Talk index within session
Full url path to talk.
Title.
Author Name (typically, might be missing)
Author Role (typically, might be missing)
Talk kicker
List of data frames, one row per paragraph in that talk
paragraphs A data frame with one row per paragraph in the talk
If talk has sections, this would be the section number. Newer talks are more likely to have sections.
Paragraph number
Paragraph html tag (can be used to generate a url deep link). Might not be in order with p_num due to edge-case talks that use #p1-#p4 for title, author, kicker, etc.
If a talk contains sections, those sections have headers. Header content will be a few words.
Text of talk. <sup></sup> html tags (superscripts/footnotes) have been stripped out.
https://www.churchofjesuschrist.org/study/general-conference
scrape_conference_html_doc_map.Rd
Given a year and a month, pull the entire .doc-map class object from the Conference URL. This will be parsed by downstream functions.
+scrape_conference_html_doc_map(year, month)
Argument | Description |
---|---|
year | Year (integer) |
month | Month (integer) |
Rvest object
+scrape_conference_html_doc_map(2017, 4)
+scrape_conference_html_doc_map(1971, 10)
+scrape_conference_html_doc_map(1985, 10)
scrape_conference_talks.Rd
For one-off sessions or debugging, see new-sessions.Rmd.
+scrape_conference_talks(year, month, path, loop_method = 1)
Argument | Description |
---|---|
year | Year |
month | Month |
Writes out session to /data/sessions/<year><month>.rds
scrape_conference_urls.Rd
Main function to scrape all conference talk URLs. For a given year-month conference, return a nested tibble of all sessions, with a list-column holding the nested data frames.
+scrape_conference_urls(year, month)
Argument | Description |
---|---|
year | year |
month | month |
tibble
+scrape_conference_urls(2019, 10)
+#> # A tibble: 1 × 3
+#> year month sessions
+#> <dbl> <dbl> <list>
+#> 1 2019 10 <tibble [5 × 4]>
+scrape_conference_urls(1971, 4)
+#> # A tibble: 1 × 3
+#> year month sessions
+#> <dbl> <dbl> <list>
+#> 1 1971 4 <tibble [7 × 4]>