I recently had a weird performance issue which I hadnt seen before and thought it would be worth making a couple of notes.
The solution looks like below where we have a partner calling an API we expose for them. This forwards onto a function which does some work and then eventually pops a message on a queue and we do some downstream processing. It looks like below.
When I call the API to test it, it was working great but I was noticing in the App Insights logs there were some errors flagging up and some really long calls. I wasnt getting any errors at all and I worked out the partner was testing the API and getting errors.
The evidence I could see for the problem was that occasionally we would get a call which took 30-50 secs to run and would return a 502 response to the caller.
I looked into the trace of the requests in App Insights and I could see that we receive the request in APIM but there was nothing reporting for the function. Initially I thought there must be some dodgy code I wrote in the function but it when I went through the logs looking at the events which failed I couldnt correlate them to activity in the functions. It looked very much like the issue was between APIM and Functions.
My first thought was that a 500 range error code means its probably my code and maybe something to do with the start up, but then the requests I sent were working.
Using App Insights I turned on the request and response tracing and was able to create requests exactly like the partner was sending to see if I could recreate the issue.
Root Cause
Eventually I identified that the request from the partner had an additional header that my test requests did not. It had an accept header with a bunch of options below.
accept : text/plain,application/json,text/x-json,text/javascript,text/xml,application/xml
If I add this to the test request then I was able to replicate the behaviour I was seeing where the following happened:
- Request sent to APIM
- Request forwarded to Azure Function
- Nothing logged at the function side in app insights or function logs
- After 30-50 sec the APIM receives a 502 response from the function app
- The request is returned to the client
- The APIM app insights logs dont really have anything beyond the backend call to functions taking ages and returning a 502
I made the same request directly against the function app and it returns a 502 there too. It also seemed that in addition to throwing the error it also doesnt throw it immediately it takes ages to attempt to process the message and then throws an error after 30+ secs.
Fix
The fix for this is pretty straight forward. I just override the accept header in my APIM policy before forwarding the request to the function app.
The policy has this additional statement in the inbound policy:
<set-header name="Accept" exists-action="override">
<value>application/json</value>
</set-header>
Hopefully this saves someone else a few hours troubleshooting a similar issue.