Data
Skills
Data Engineering
- Defining Data Intuition
Ryan proposes the following definition for data intuition: a resilience to misleading data and analyses.
- Quick Guide: Calculate Cohort Retention Analysis with SQL
Huy provides a step-by-step walk-through of how to generate cohort retention by month using two source tables - users and activities.
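A monthly cohort retention query of this kind can be sketched as follows. This is a minimal illustration and not the query from the article: the column names (`user_id`, `signup_date`, `activity_date`), the SQLite dialect, and the sample rows are all assumptions made for the example.

```python
import sqlite3

# Assumed schema: users(user_id, signup_date), activities(user_id, activity_date),
# with dates stored as ISO strings (YYYY-MM-DD).
RETENTION_SQL = """
WITH cohorts AS (
    SELECT user_id,
           strftime('%Y-%m', signup_date) AS cohort_month,
           CAST(strftime('%Y', signup_date) AS INTEGER) * 12
             + CAST(strftime('%m', signup_date) AS INTEGER) AS signup_idx
    FROM users
),
activity_months AS (
    -- One row per user per month offset since signup.
    SELECT DISTINCT a.user_id,
           c.cohort_month,
           CAST(strftime('%Y', a.activity_date) AS INTEGER) * 12
             + CAST(strftime('%m', a.activity_date) AS INTEGER)
             - c.signup_idx AS month_number
    FROM activities a
    JOIN cohorts c ON c.user_id = a.user_id
),
cohort_sizes AS (
    SELECT cohort_month, COUNT(*) AS cohort_size
    FROM cohorts
    GROUP BY cohort_month
)
SELECT m.cohort_month,
       m.month_number,
       s.cohort_size,
       COUNT(DISTINCT m.user_id) AS active_users,
       ROUND(100.0 * COUNT(DISTINCT m.user_id) / s.cohort_size, 1) AS retention_pct
FROM activity_months m
JOIN cohort_sizes s ON s.cohort_month = m.cohort_month
WHERE m.month_number >= 0
GROUP BY m.cohort_month, m.month_number, s.cohort_size
ORDER BY m.cohort_month, m.month_number
"""


def demo_connection():
    """Build an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER, signup_date TEXT)")
    conn.execute("CREATE TABLE activities (user_id INTEGER, activity_date TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(1, "2024-01-05"), (2, "2024-01-20"), (3, "2024-02-10")],
    )
    conn.executemany(
        "INSERT INTO activities VALUES (?, ?)",
        [
            (1, "2024-01-06"),
            (1, "2024-02-03"),
            (2, "2024-01-21"),
            (3, "2024-02-11"),
            (3, "2024-03-01"),
        ],
    )
    return conn


def cohort_retention(conn):
    """Return (cohort_month, month_number, cohort_size, active_users, retention_pct) rows."""
    return conn.execute(RETENTION_SQL).fetchall()
```

Calling `cohort_retention(demo_connection())` returns one row per cohort and month offset. Cohort size is computed from `users` rather than from month-0 activity, so users who sign up but never act still count in the denominator.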
- Practical SQL for Data Analysis
Haki provides an in-depth guide to using SQL for fast and efficient data analysis. He dives into specific tactics including descriptive statistics, subtotals, pivot tables, running and cumulative aggregation, linear regressions, interpolations, and binning.
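Two of the tactics listed above, cumulative aggregation and binning, can be sketched like this. The `sales` table, its columns, and the sample values are hypothetical, and the SQL is written for SQLite (window functions need SQLite 3.25 or later) rather than taken from the guide.

```python
import sqlite3


def demo_connection():
    """In-memory database with a hypothetical sales(day, amount) table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (day TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [
            ("2024-01-01", 10.0),
            ("2024-01-02", 25.0),
            ("2024-01-03", 5.0),
            ("2024-01-04", 40.0),
        ],
    )
    return conn


def running_total(conn):
    """Cumulative sum of amount, ordered by day, using a window function."""
    return conn.execute(
        """
        SELECT day,
               amount,
               SUM(amount) OVER (ORDER BY day) AS cumulative_amount
        FROM sales
        ORDER BY day
        """
    ).fetchall()


def binned_counts(conn, bin_width=20):
    """Count rows per fixed-width bin; assumes amounts are non-negative."""
    return conn.execute(
        """
        SELECT CAST(amount / ? AS INTEGER) * ? AS bin_start,
               COUNT(*) AS n
        FROM sales
        GROUP BY bin_start
        ORDER BY bin_start
        """,
        (bin_width, bin_width),
    ).fetchall()
```

With the sample rows, `running_total` yields cumulative amounts of 10, 35, 40, and 80, and `binned_counts` groups the amounts into bins starting at 0, 20, and 40.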
- Analytical Excellence Is All about Speed
Cassie provides a nuanced take on the value of speed in data analytics. This is a multi-article series that covers: • Software skills • Handling lots of data with ease • Immunity to data science bias • Understanding the analyst's career path • Refusing to be a data charlatan • Resistance to confirmation bias • Realistic expectations of data • Knowing how to add value • Thinking differently about time
- Building high-performing Research and Data Science teams with clear career paths
Karen describes the process and results of building a career ladder for the research, analytics, and data science team at Intercom. The IC track covers 6 levels and 6 skill groups, and the manager track covers 2 levels with 5 skill groups.
- Data Science Career Path & Progression
Julien explains the four skill groups (he calls them axes) of data career paths: • Data axis • Engineering axis • Business axis • Product axis
- Roadmap to Learn SQL
Arif lists the key concepts you'll need to learn SQL, along with the order in which to learn them.
- So You’ve Got a Really Big Dataset. Here’s How You Clean It.
Li-Lian provides a step-by-step guide to cleaning large datasets in Python, using the Pandas and Matplotlib libraries. She explains how to filter data, standardize missing data labels, clean dependent variables, remove duplicate entries, and check for missing values in each variable and row. She suggests auditing variables by type and offers recommendations for each type, including Boolean, datetime, numerical, categorical, and text.
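A few of these steps (standardizing missing labels, cleaning the dependent variable, removing duplicates, and checking missingness per variable and per row) might look like the sketch below. The list of missing-value labels, the function names, and the sample columns are assumptions for illustration, not the article's code.

```python
import numpy as np
import pandas as pd

# Hypothetical labels that should all be treated as missing.
MISSING_LABELS = ["", "N/A", "n/a", "NA", "unknown", "-"]


def clean(df, target):
    """Standardize missing labels, drop rows lacking the target, drop duplicates."""
    out = df.replace(MISSING_LABELS, np.nan)
    out = out.dropna(subset=[target])
    out = out.drop_duplicates().reset_index(drop=True)
    return out


def missing_by_column(df):
    """Share of missing values in each variable, highest first."""
    return df.isna().mean().sort_values(ascending=False)


def missing_by_row(df):
    """Number of missing values in each row."""
    return df.isna().sum(axis=1)


# Made-up sample data for the sketch.
raw = pd.DataFrame(
    {
        "age": ["34", "N/A", "34", "51"],
        "city": ["Oslo", "Lima", "Oslo", "unknown"],
        "churned": ["yes", "no", "yes", ""],
    }
)
cleaned = clean(raw, target="churned")
```

Here the last row is dropped because its dependent variable (`churned`) is missing, and the third row is dropped as a duplicate of the first, leaving two rows. The per-column report then shows which variables still need attention.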
- Prioritizing Data Science Work
As a data scientist, you are constantly deciding what tasks to prioritize. There are many requests from stakeholders but not all have the same impact or innovativeness. Jacqueline recommends prioritizing projects that are both innovative and impactful as they have the greatest potential to change the business. Projects that are not innovative but still provide useful proof can also be valuable. Jacqueline advises against getting stuck doing interesting but irrelevant work or only reporting, as these contribute less to the company. Data scientists should aim to do work that both affects the company and is innovative.
- Prioritising the Scientific Way
Shyam proposes a scientific framework for prioritization built on a first principles approach and second order thinking. The first principles approach breaks problems down into their fundamental components to remove biases. Second order thinking considers the consequences of consequences to uncover hidden impacts and complexities. Shyam outlines four types of complexity (structural, technical, temporal, and directional) and distinguishes complexities from complications. A scientific prioritization process documents the evidence behind each decision so that the methodology can improve over time. Consistency is key to letting positive effects compound, and documentation helps transfer accountability to the process rather than to individuals.