Articles by Li-Lian Ang
- So You’ve Got a Really Big Dataset. Here’s How You Clean It.
Li-Lian provides a step-by-step guide to cleaning large datasets in Python using the Pandas and Matplotlib libraries. She explains how to filter data, standardize missing-data labels, clean dependent variables, remove duplicate entries, and check for missing values in each variable and row. She also recommends auditing variables by type, with suggestions for each: Boolean, datetime, numerical, categorical, and text.
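
As a rough illustration of a few of these steps, here is a minimal sketch in Pandas. It is not taken from the article: the toy data, the column names (`id`, `country`, `score`), the `MISSING_LABELS` list, and the `clean` helper are all hypothetical, chosen only to show what standardizing missing labels, dropping duplicates, and counting missing values might look like.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names and placeholder labels are assumptions.
raw = pd.DataFrame(
    {
        "id": [1, 2, 2, 3, 4],
        "country": ["SG", "SG", "SG", "N/A", "MY"],
        "score": ["10", "unknown", "unknown", "7", ""],
    }
)

# Assumed spellings that stand in for "no value" in this toy dataset.
MISSING_LABELS = ["N/A", "unknown", ""]


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize missing labels, drop duplicate rows, and fix the numeric type."""
    # Map every placeholder label to a single missing-value marker (NaN).
    out = df.replace(MISSING_LABELS, np.nan)
    # Remove rows that are exact duplicates of an earlier row.
    out = out.drop_duplicates().reset_index(drop=True)
    # Convert the numeric column; anything unparseable becomes NaN.
    out["score"] = pd.to_numeric(out["score"], errors="coerce")
    return out


cleaned = clean(raw)

# Count missing values in each variable (column) and in each row.
missing_per_column = cleaned.isna().sum()
missing_per_row = cleaned.isna().sum(axis=1)
```

Standardizing the labels before removing duplicates matters in a sketch like this: two rows that differ only in how they spell "missing" would otherwise survive deduplication.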