GitHub has released its June 2024 availability report on The GitHub Blog, detailing two significant incidents that degraded the performance of its services. The incidents affected GitHub Issues and the GitHub Migration service, causing disruptions and delays for users.
Incident on June 5, 2024
The first incident occurred on June 5, 2024, starting at 17:05 UTC and lasting 142 minutes. During this period, the GitHub Issues service experienced degraded performance: project-related events, such as adding an issue to a project, removing it from a project, or changing its status within a project, did not appear on issue timelines.
The root cause was a misconfiguration introduced during a scheduled secret rotation, part of an initiative to clean up and simplify service configurations for better automation. A bug in that implementation caused one of the configured services to use expired secrets, which degraded performance. GitHub mitigated the issue by correcting the service configuration and expects the simplified setup to prevent similar incidents in the future.
Incident on June 27, 2024
The second incident took place on June 27, 2024, from 20:39 UTC to 21:37 UTC, lasting 58 minutes. It affected the GitHub Migration service and caused all in-progress migrations to fail. After detecting the increased failure rate, GitHub paused new migrations to prevent further disruption. Although this lengthened migration times, it allowed the team to address the issue without additional failures.
The root cause was an invalid infrastructure credential that required manual intervention. Once it was identified, GitHub's first responders quickly mitigated the issue, resumed the paused migrations, and restored normal service levels. To prevent similar occurrences, GitHub is improving its monitoring and alerting for infrastructure credentials.
Future Prevention and Monitoring
GitHub has committed to improving its monitoring and alerting systems to reduce the likelihood of similar incidents. Users are encouraged to follow the GitHub status page for real-time updates and post-incident recaps. The GitHub Engineering Blog offers more detail on GitHub's ongoing projects and engineering efforts.