Data Cleaning
Learn Data Cleaning with the Practica AI Coach
The Practica AI Coach helps you get better at Data Cleaning by treating your current work challenges as opportunities to practice. The AI Coach asks you questions, teaches you concepts and tactics, and gives you feedback as you make progress.
Curated Learning Resources
- Data Cleaning IS Analysis, Not Grunt Work
  Cleaning data is considered by some people to be menial work that’s somehow “beneath” the sexy “real” data science work. Randy calls BS on this. The act of cleaning data imposes values/judgments/interpretations upon data intended to allow downstream analysis algorithms to function and give results. That’s exactly the same as doing data analysis. Data cleaning is a spectrum of reusable data transformations on the path towards doing a full data analysis. Once we accept that framework, the steps we need to take to clean data flow more naturally. We want to allow our analysis code to run, control useless variance, eliminate bias, and document for others to use, all in service to the array of potential analyses we want to run in the future.
- So You’ve Got a Really Big Dataset. Here’s How You Clean It.
  Li-Lian provides a step-by-step guide to cleaning large datasets in Python, using the Pandas and Matplotlib libraries. She explains how to filter data, standardize missing data labels, clean dependent variables, remove duplicate entries, and check for missing values in each variable and row. She recommends auditing variables by type and offers suggestions for each type: Boolean, datetime, numerical, categorical, and text.
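Several of the steps summarized above (standardizing missing data labels, removing duplicate entries, checking for missing values per variable and per row, and auditing variables by type) might look roughly like this in Pandas. This is a minimal sketch, not code from the article: the toy DataFrame, its column names, the `MISSING_LABELS` list, and the `clean` helper are all hypothetical and would need adjusting for a real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
raw = pd.DataFrame({
    "respondent_id": [1, 2, 2, 3, 4],
    "country": ["US", "N/A", "N/A", "DE", "unknown"],
    "age": ["34", "-", "-", "51", "29"],
    "signed_up": ["2021-01-05", "2021-02-30", "2021-02-30", "", "2021-03-11"],
})

# Labels this sketch treats as "missing" (an assumption; audit your own data).
MISSING_LABELS = ["N/A", "unknown", "-", ""]


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply a few basic cleaning steps and return a new DataFrame."""
    out = df.copy()
    # 1. Standardize the various missing-data labels to a single marker (NaN).
    out = out.replace(MISSING_LABELS, np.nan)
    # 2. Remove exact duplicate rows.
    out = out.drop_duplicates().reset_index(drop=True)
    # 3. Coerce columns to their intended types; values that cannot be
    #    parsed (such as the impossible date 2021-02-30) become NaN/NaT.
    out["age"] = pd.to_numeric(out["age"], errors="coerce")
    out["signed_up"] = pd.to_datetime(out["signed_up"], errors="coerce")
    return out


cleaned = clean(raw)

# 4. Check for missing values in each variable (column) and each row.
missing_per_column = cleaned.isna().sum()
missing_per_row = cleaned.isna().sum(axis=1)

# 5. Audit variables by type before deciding how to treat each one.
print(cleaned.dtypes)
print(missing_per_column)
print(missing_per_row)
```

Coercing with `errors="coerce"` is one design choice among several: it keeps the pipeline running and surfaces bad values as missing, but it also hides them, so comparing the missing counts before and after coercion is a reasonable sanity check.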
Related Skills
- ROI for Data Work
- MLOps Platforms
- Prioritization for Data Work
- Structuring Data Teams
- Effective Dashboards
- SQL
- Machine Learning
- Data Science Career Ladders
- Data Engineering
- Neural Networks
- Analytics
- Analysis Documentation
- Data Infrastructure
- Cohort Analysis
- Data Tools
- ETL
- Data Soft Skills
- Data Dictionary
- Data Governance
- Data Roadmaps
- Event Data
- Personalization
- OCR
- Data Warehouse
- Deep Learning
- Sampling Algorithms
- Data Intuition
- Linear Regressions