After running for a few days, the application UI stops receiving data from all services.
All services are running just fine and I'm able to retrieve data straight from the services using Postman for example.
The UI itself is also up, but its calls to the services return no data. Looking at the UI log, I found this:
2016-01-25T14:09:28.64-0800 [App/0] OUT 2016/01/25 22:09:28 [warn] 38#0: *525 upstream server temporarily disabled while connecting to upstream, client: 10.72.11.13, server: localhost, request: "GET /api/excursions HTTP/1.1", upstream: "http://54.200.145.64:80/v1/excursions/", host: "demo-uo-poa.run.aws-usw02-pr.ice.predix.io", referrer: "https://demo-uo-poa.run.aws-usw02-pr.ice.predix.io/excursions"
2016-01-25T14:09:28.64-0800 [App/0] OUT 2016/01/25 22:09:28 [error] 38#0: *525 upstream timed out (110: Connection timed out) while connecting to upstream, client: 10.72.11.13, server: localhost, request: "GET /api/excursions HTTP/1.1", upstream: "http://54.200.145.64:80/v1/excursions/", host: "demo-uo-poa.run.aws-usw02-pr.ice.predix.io", referrer: "https://demo-uo-poa.run.aws-usw02-pr.ice.predix.io/excursions"
What could be causing this issue?
Answer by Greg Stroup · Feb 19, 2016 at 06:07 PM
We've done some more digging into this problem and found the root cause. This blog post describes it well, and includes a solution: http://tenzer.dk/nginx-with-dynamic-upstreams/
Basically, nginx resolves the upstream server's hostname to an IP address once, when the configuration is loaded, and caches it. When the IP address of that server later changes, nginx keeps pointing to the old IP.
We've tried adding a variable for the upstream server, and a "resolver" using Google's DNS server at 8.8.8.8. This seems to have helped with the problem. Still not sure if the problem is solved 100%, but it does seem to be behaving better now.
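For anyone who wants to try the same change, here is a minimal sketch of what the variable plus "resolver" approach can look like. It is not our exact configuration: the listen port, the variable name $excursions_upstream, and the 30 second cache time are assumptions for illustration, and the hostname and paths are taken from the log lines quoted in this thread. The fragment is meant to sit inside the http block of nginx.conf.

```nginx
# Illustrative fragment only; adapt names, port and paths to your own app.
server {
    listen 8080;  # assumed port, not taken from the thread

    # DNS server nginx uses for lookups at request time.
    # valid=30s (an assumed value) caps how long an answer is cached.
    resolver 8.8.8.8 valid=30s;

    location = /api/excursions {
        # Putting the hostname in a variable makes nginx resolve it at
        # request time via the resolver above, instead of once at startup.
        set $excursions_upstream "http://test-excursions.run.aws-usw02-pr.ice.predix.io";

        # With a variable in proxy_pass the URI is sent as written,
        # so the query string has to be appended explicitly.
        proxy_pass $excursions_upstream/v1/excursions/$is_args$args;
    }
}
```

The key point is that a literal hostname in proxy_pass is looked up only when the configuration is loaded, while a hostname held in a variable is looked up again whenever the cached answer expires.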
Another option some teams use is the Node Express web server instead of nginx. Express is widely used, has good documentation, and is easier to debug.
I've tried that before, using Google's DNS, but for some reason that IP was unreachable. Maybe it has been unblocked since then.
I've asked for the internal Predix DNS to use on that configuration but got no response.
Let me know if Google DNS is definitely working for you.
Cool, my apps have been working with no issues for almost 2 weeks now. They may have fixed something at the Predix level. If I face the same problem again, I will give the Google DNS another try.
Yes, the Google DNS is definitely working for us for an app running in the Predix Basic cloud. (VPC)
What I'm not yet sure about is the upstream server - I don't know if its IP address has changed recently or not. I will say that we have two apps that have been up for several days now with no upstream connection problems.
Answer by Greg Stroup · Jan 27, 2016 at 05:34 PM
Hi, Are you working with a UI application based on the seed app or reference app? Both use similar nginx configurations. We have seen a similar error pop up recently with the reference app, that was not happening before. It seems that something is going wrong in the nginx layer... perhaps fetching the token is failing silently? We're investigating, and will respond if we find an answer. As a band-aid fix, we found that restarting the UI app (nginx app) will fix the problem for a while.
I'm almost sure I used the seed app, but at this point I can't tell you for sure.
It's definitely related to the nginx layer, and it seems to be something related to the token. Could it be related to how the token is stored in Redis? Did you guys change anything in the Redis implementation?
It's actually something related to DNS.
I just checked my nginx log: it's calling another service by IP address, but that IP address is no longer bound to the service's DNS name, so the call fails or times out.
2016/01/27 19:41:37 [warn] 38#0: *222 upstream server temporarily disabled while connecting to upstream, client: 10.72.11.10, server: localhost, request: "GET /api/excursions HTTP/1.1", upstream: "http://54.186.133.36:80/v1/excursions/", host: "test-uo-poa.run.aws-usw02-pr.ice.predix.io", referrer: "https://test-uo-poa.run.aws-usw02-pr.ice.predix.io/excursions"
It's calling the IP 54.186.133.36 for the service test-excursions.run.aws-usw02-pr.ice.predix.io which is actually using the IP 54.200.145.64.
@gregory.stroup@ge.com did you find out anything else?
We are using the Predix Mobile Service with a custom UI and are facing the same issue. The issue can be replicated using their reference application.
Answer by sohil.vasa@ge.com · Feb 15, 2016 at 07:27 AM
I am facing the same issue in our application. We have raised it with Predix support, but there is no conclusion yet. Can you please post any update you have on this issue? Our apps are failing in VPC. It was working fine and started showing this issue 2 weeks ago; it resolved for a couple of days and is now happening again.