Week 4, October 16: Data, part II – Working with Messy Data
Pre-class reading
- A Guide to Bulletproofing Your Data, by Jennifer LaFleur, 2013
- Avoiding Mistakes When Cleaning Your Data, by Noah Veltman, 2013
- Dollars for Docs Mints a Millionaire, by Tracy Weber and Charles Ornstein, 2013
What we’ll cover
- “Regular expressions”: find-and-replace on steroids.
- Using OpenRefine to clean and transform data.
- Fact-checking your data.
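As a rough illustration of what "find-and-replace on steroids" can mean, here is a minimal Python sketch that uses regular expressions to standardize inconsistent company names. The function name `clean_name`, the sample names, and the cleaning rules are all hypothetical, chosen only for illustration; real datasets will need their own rules.

```python
import re

def clean_name(raw: str) -> str:
    """Normalize a messy company name (illustrative rules only)."""
    # Collapse runs of whitespace into one space and trim the ends.
    text = re.sub(r"\s+", " ", raw).strip()
    # Standardize a trailing "Inc" / "Incorporated" (any case, with or
    # without a preceding comma or trailing period) to " Inc."
    text = re.sub(
        r"[,.]?\s*\b(inc|incorporated)\b\.?$",
        " Inc.",
        text,
        flags=re.IGNORECASE,
    )
    return text

# Hypothetical variants of one company, as they might appear in raw data.
variants = ["Acme Widgets, Inc.", "Acme  Widgets Inc", "Acme Widgets Incorporated"]
distinct = sorted({clean_name(v) for v in variants})
print(distinct)
```

Counting the distinct values before and after a cleaning step like this is also a simple fact-check: if the count drops more than expected, the pattern may be merging names that are genuinely different.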
Post-class assignments
- Use regular expressions and/or OpenRefine to clean the dataset(s) you’ve chosen for your long-term project. Keep a log of the steps you took and the decisions you made. If you’re certain your datasets don’t need cleaning, explain the steps you took to be sure.
- Codecademy Web Fundamentals: CSS Selectors
- Codecademy Web Fundamentals: JavaScript