Data cleansing in Python using Pandas and the Jupyter Notebook
This two-hour workshop will consist of about an hour of presentation and discussion, followed by about an hour of in-class practice exercises. The presentation will cover reading data from text/CSV and Excel files, managing data type conversions (numbers, text, dates), handling NULL values, creating and deleting columns in Pandas data frames, indexing and re-indexing data frames, and identifying and managing incorrect values. Demonstration and practice data will be drawn from a variety of sources, including de-identified clinical data.
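As a rough preview of the kinds of steps listed above, the following is a minimal sketch and not the workshop's actual material. The sample records, the column names (`patient_id`, `visit_date`, `age`, `weight_kg`, `site`), the `clean` helper, and the 0 to 120 plausible-age range are all invented for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data, invented for illustration only.
RAW_CSV = """patient_id,visit_date,age,weight_kg,site
p01,2017-01-05,34,70.5,A
p02,2017-01-09,n/a,82.0,A
p03,not recorded,51,,A
p04,2017-02-14,430,64.2,A
"""


def clean(raw_csv):
    """Read CSV text and apply a few basic cleansing steps."""
    # Read every column as text so that type conversions are explicit.
    # Empty fields and "n/a" are treated as missing by read_csv's defaults.
    df = pd.read_csv(io.StringIO(raw_csv), dtype=str)

    # Type conversions: errors="coerce" turns unparseable entries
    # into missing values (NaN or NaT) instead of raising an error.
    df["age"] = pd.to_numeric(df["age"], errors="coerce")
    df["weight_kg"] = pd.to_numeric(df["weight_kg"], errors="coerce")
    df["visit_date"] = pd.to_datetime(
        df["visit_date"], format="%Y-%m-%d", errors="coerce"
    )

    # Incorrect values: treat ages outside an assumed plausible range
    # (0 to 120, chosen only for this example) as missing.
    implausible = (df["age"] < 0) | (df["age"] > 120)
    df.loc[implausible, "age"] = float("nan")

    # Column creation: flag rows where the weight is missing.
    df["weight_missing"] = df["weight_kg"].isnull()

    # Column deletion: drop a column that carries no information here.
    df = df.drop("site", axis=1)

    # Re-indexing: use the patient identifier as the row index.
    return df.set_index("patient_id")


cleaned = clean(RAW_CSV)
print(cleaned)
print(cleaned.dtypes)
```

Reading everything as text first is one possible design choice: it keeps each conversion visible as its own step, so values that fail to convert can be counted and inspected rather than being silently reinterpreted.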
Requirements: Participants should have a working knowledge of Python programming and should bring a laptop with Python 3, Pandas, and the Jupyter Notebook installed (Pandas and the Jupyter Notebook are both included in Anaconda Python). Python 3.5 or 3.6 and Anaconda 4.3 are recommended; Python 3.5.2 will be used in class demonstrations. Registrants will receive installation instructions and practice data sets before the session.
- Monday, March 27, 2017
- 1:00pm - 3:00pm
- Health Sciences Library Carter Classroom
- Jim Harrison, Public Health Sciences