Clean and prepare tidy data with tidyr and dplyr.
- This lesson uses `fread` from the `data.table` package to read in spreadsheet files, rather than the tidyverse `read_csv` from `readr`. While there is little difference with the 60,000 row ACS table, there is a notable improvement with the >2M row CBP table (0.5 vs 7 seconds).
- Lesson should be updated to include `pivot_*` functions to replace `gather` and `spread`.
- In `str_detect`, the regex `[0-9]` matches any single digit, `{2}` means exactly 2 repetitions of the preceding element, and `----` matches four literal hyphens.
- In `str_remove`, the regex `-+` matches "1 or more `-`".
- For troubleshooting `select`, try specifying `dplyr::select` in case there are conflicting packages loaded.
The National Socio-Environmental Synthesis Center (SESYNC) curates and runs tutorials on using cyberinfrastructure in pursuit of the Center's scientific mission. Visit www.sesync.org to learn more about SESYNC and cyberhelp.sesync.org for more tutorials and ideas.