We are investigating an issue with the HTML generation component
Incident Report for Beefree SDK
Postmortem

On July 9th, 2023 (Sunday), starting at 18:30 (CEST), the BEE HTML generation engine experienced a gradual degradation of performance that continued until 21:47 (CEST), when the server was restarted.

The engine experienced a second period of degraded performance on July 10th, 2023 (Monday), from 16:30 to 17:30 (CEST).

During the first episode on July 9th, an increasing number of requests timed out after 30 seconds with a 504 error response.

During the one-hour interval on July 10th, approximately 10% of calls returned a 504 error response.

While we were working to resolve this, HTML generation may have been briefly unavailable at times in the following days. However, we are confident that the issue is now resolved following an emergency release.

Our monitoring tools did not detect the first degraded performance issue immediately because our health checks reported all systems were OK, even on the unresponsive instances of the HTML Generation Engine.
We updated our monitoring tools after the two episodes of degraded performance so that we could prevent the issue and act accordingly. In the following days, the service experienced no further degraded performance.
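
As an illustration of why a health check can report OK on an unresponsive instance, and of one way to avoid it, here is a minimal sketch of a "deep" health check. It is not the Beefree implementation: the function name `deep_health_check`, the `render` callable, and the canary template are hypothetical. The idea is that the check runs a small real render with a deadline instead of only confirming that the process is alive.

```python
import concurrent.futures
import time
from typing import Callable


def deep_health_check(
    render: Callable[[dict], str],  # hypothetical: the engine's render entry point
    canary: dict,                   # hypothetical: a tiny known-good template
    timeout_s: float = 2.0,
) -> dict:
    """Report healthy only if a real render finishes within the deadline."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    started = time.monotonic()
    future = pool.submit(render, canary)
    try:
        html = future.result(timeout=timeout_s)
        healthy = bool(html)
        detail = "rendered" if healthy else "empty output"
    except concurrent.futures.TimeoutError:
        # The process is alive but not answering in time: treat as unhealthy.
        healthy, detail = False, "timeout"
    except Exception as exc:
        healthy, detail = False, f"error: {type(exc).__name__}"
    finally:
        # Do not block the health endpoint on a stuck worker thread.
        pool.shutdown(wait=False)
    return {
        "healthy": healthy,
        "detail": detail,
        "elapsed_s": round(time.monotonic() - started, 3),
    }
```

A load balancer or monitor polling an endpoint backed by a check like this would see a stuck instance as unhealthy, even though a simple liveness check would still pass.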

Root Cause

The issue was caused by a change in the way we handle the environment configuration settings of the HTML Generation Engine. This change was published on July 5th and did not cause any issues until five days later.
After troubleshooting, we identified that the problem occurred only when specific types of invalid JSON files were sent to the engine.

Action Items
* We improved our monitoring tools, adding additional parameters to keep the HTML Generation Engine under surveillance.
* We added new specific performance tests to our HTML Generation Engine deployment pipeline.
* We added a sample of the invalid JSON files causing the issue to the collection of JSON files that we test the HTML Generation Engine with before every public deployment.
* We developed a specific fix for the problem and deployed a new version of the HTML Generation Engine to production on July 18th, 2023.
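
The pre-deployment check on invalid JSON samples described above could look roughly like the sketch below. This is an assumption about the general shape of such a test, not the actual pipeline: `run_regression_samples`, `timed_out`, the `generate` callable, and the time budget are all hypothetical. Each sample must either render or be rejected within the budget; a sample that exceeds the budget is flagged so the pipeline can fail.

```python
import concurrent.futures
from typing import Callable, Dict, Iterable, List, Tuple


def run_regression_samples(
    generate: Callable[[str], str],      # hypothetical: engine entry point taking raw JSON text
    samples: Iterable[Tuple[str, str]],  # (sample name, raw JSON text)
    budget_s: float = 5.0,               # hypothetical per-sample time budget
) -> Dict[str, str]:
    """Classify each sample as 'rendered', 'rejected', or 'timeout'."""
    outcomes: Dict[str, str] = {}
    for name, payload in samples:
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(generate, payload)
        try:
            future.result(timeout=budget_s)
            outcomes[name] = "rendered"
        except concurrent.futures.TimeoutError:
            # The engine neither rendered nor rejected the input in time.
            outcomes[name] = "timeout"
        except Exception:
            # A fast, explicit rejection of bad input is acceptable.
            outcomes[name] = "rejected"
        finally:
            pool.shutdown(wait=False)
    return outcomes


def timed_out(outcomes: Dict[str, str]) -> List[str]:
    """Names of samples that should fail the deployment pipeline."""
    return [name for name, result in outcomes.items() if result == "timeout"]
```

In this sketch a quick error on invalid input counts as a pass, because the failure mode in this incident was slow or missing responses rather than explicit rejections.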

Posted Jul 20, 2023 - 16:02 PDT

Resolved
You could be experiencing intermittent errors while generating pages and emails.
Posted Jul 09, 2023 - 12:30 PDT