Articles by Paul Osman
- Solving a Murder Mystery
Paul recalls a challenging bug at Honeycomb that took teamwork and good observability to resolve. A customer reported that a specific query was failing with a timeout error, and the error message pointed to a problem with Honeycomb's datastore. The datastore was available, however; the problem lay in the data itself, since the same query run over different time windows produced no error. After further investigation, the team found that a segment was missing from S3, which was causing the error. The article gives technical details of how Honeycomb's systems work and how custom instrumentation helped identify the issue.