Status

Status of Foundries.io web services

2025-06-12 GCS + CloudFlare outage

We use third-party cloud services from CloudFlare and Google Cloud Platform. Both are currently experiencing an issue.

We use CloudFlare to make sure end users can access our web UI at, among others, https://app.foundries.io. We use Google Cloud Storage to store all our data in the cloud, including artifacts from our CI. The Google Cloud Storage outage is what has taken our core infrastructure down.

CloudFlare

The root cause analysis for CloudFlare seems to be:

Cloudflare’s critical Workers KV service went offline due to an outage of a 3rd party service that is a key dependency.

This could be because CloudFlare runs parts of its own infrastructure on Google Cloud Platform as well, or it could be a coincidence.

Google

The initial report states the root cause was related to:

Multiple Google Cloud and Google Workspace products experienced increased 503 errors in external API requests, impacting customers. From our initial analysis, the issue occurred due to an invalid automated quota update to our API management system which was distributed globally, causing external API requests to be rejected. To recover we bypassed the offending quota check, which allowed recovery in most regions within 2 hours. However, the quota policy database in us-central1 became overloaded, resulting in much longer recovery in that region.

Timeline of Events

Lessons Learned

We identified a non-essential part of our CI codebase that could cause major disruption if a third-party service is down or suffers a partial outage. A fix is now in place to make us more resilient when a similar outage happens again. Work is also ongoing to scan all of our services for similar points of failure and address them accordingly.
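
One common way to stop an optional step from taking a whole pipeline down with it is to isolate that step behind a wrapper that logs the failure and carries on. The sketch below illustrates that general pattern only; it is not the fix that was deployed, and `run_optional`, `upload_build_stats` and `run_pipeline` are hypothetical names made up for the example.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("ci")


def run_optional(step: Callable[[], T], fallback: T, name: str) -> T:
    """Run a non-essential step; on any failure, log it and return the fallback."""
    try:
        return step()
    except Exception as exc:  # deliberately broad: the step is optional
        log.warning("optional step %r failed, continuing: %s", name, exc)
        return fallback


def upload_build_stats() -> bool:
    """Hypothetical non-essential call to a third-party service that is down."""
    raise ConnectionError("503 Service Unavailable")


def run_pipeline() -> dict:
    """Hypothetical CI run: the build is essential, the stats upload is not."""
    result = {"build": "ok"}
    # The upload failure is contained here instead of aborting the whole run.
    result["stats_uploaded"] = run_optional(upload_build_stats, False, "stats")
    return result
```

In this sketch `run_pipeline()` still completes while the third-party call fails, and the failure stays visible in the logs rather than being silently dropped. Essential steps should not be wrapped this way, since their failures need to stop the run.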