
Functions raise if port dies #20

Open
benwilson512 opened this issue Jan 24, 2018 · 10 comments · May be fixed by #61

@benwilson512

Hey folks,

Maybe I'm doing something wrong, but if my statsd agent restarts, all function calls end up raising:

> Sensetra.Statsd.increment("elixir.foo", 1)
** (ArgumentError) argument error
    :erlang.port_command(Sensetra.Statsd, [<<1, 31, 189, 172, 31, 22, 97>>, "elixir.foo", 58, "1", 124, "c"])
    (statix) lib/statix/conn.ex:30: Statix.Conn.transmit/2

This seems undesirable; monitoring functions shouldn't nuke the app in the event that the port goes down.

Ideally there'd also be some mechanism for reconnecting, but at a minimum we probably want to catch this error.

@thecodeboss
Contributor

@benwilson512 I found a solution to the second problem, getting the connection to reconnect. We use a GenServer to manage our Statix connection, which looks a bit like this:

defmodule Metrics do
  use Statix
  use GenServer

  require Logger

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts)
  end

  def init(_opts) do
    # Trap exits so the port closing arrives as a message instead of
    # silently killing this process.
    Process.flag(:trap_exit, true)
    connect()
    {:ok, current_conn()}
  end

  def handle_info({:EXIT, port, reason}, %Statix.Conn{sock: __MODULE__} = state) do
    Logger.error("Port #{inspect(port)} exited with reason #{inspect(reason)}")
    {:stop, :normal, state}
  end
end

Basically this Metrics module traps exits, so if the port gets closed it can stop and be restarted by its supervisor, which will cause a new connection to be opened.

This could lead to a restart loop, but appropriate supervisor settings (restart intensity and period) let you bound how aggressively it retries.

By default the process that calls connect() isn't notified when the port dies: :normal exits from linked processes are ignored and any other exit reason kills the process outright. By explicitly trapping exits we turn the port's death into a message we can act on, which lets us reconnect.
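
To make the restart behavior concrete, here is a minimal sketch (the option values are illustrative, not from this thread) of putting Metrics under a supervisor; max_restarts and max_seconds bound the restart loop if the agent stays unreachable:

children = [Metrics]

Supervisor.start_link(children,
  strategy: :one_for_one,
  max_restarts: 3,
  max_seconds: 5
)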

--

For the other problem of increment crashing, you can override increment in the GenServer and catch the ArgumentError when it happens, though it's not very elegant.
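
A rough sketch of that workaround, written as a plain wrapper rather than a true override (the module name is illustrative, and whether the Statix-generated functions are overridable isn't addressed here):

defmodule Metrics.Safe do
  # Delegate to the Metrics module above and swallow the ArgumentError
  # raised when the underlying port is gone.
  def increment(key, value \\ 1, options \\ []) do
    Metrics.increment(key, value, options)
  rescue
    ArgumentError -> {:error, :port_closed}
  end
end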

@benwilson512
Author

@thecodeboss that solves the issue of reconnecting, but any metric calls that happen between the failure and the restart will still raise. The inability to send a metric shouldn't take down a process.

@lexmag
Owner

lexmag commented Apr 16, 2018

Hey, do you have any reliable way to trigger the port dying?

@scrogson

scrogson commented May 2, 2018

@lexmag :erlang.port_close/1
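
For example, assuming the connection is registered under Sensetra.Statsd as in the original report, something like this in iex should reproduce the crash:

:erlang.port_close(Sensetra.Statsd)
Sensetra.Statsd.increment("elixir.foo", 1)
#=> ** (ArgumentError) argument error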

@scrogson

scrogson commented May 2, 2018

I think the problem brought up in this issue is related to the usage documentation. The problem with the way it is documented is that it tells users to call the connect/0 function directly in the application's start/2 function. As I understand it, this links the port to the application process.

When the port closes and you call any of the APIs that use :erlang.port_command/2, it raises an ArgumentError, which kills the application process.
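
The documented pattern being described looks roughly like this (module names are illustrative); the port opened by connect/0 ends up linked to the process running start/2:

defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    # Opens the UDP port and links it to this (application) process.
    MyApp.Statix.connect()

    Supervisor.start_link([], strategy: :one_for_one, name: MyApp.Supervisor)
  end
end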

@lexmag
Owner

lexmag commented May 2, 2018

@scrogson thanks, but I was wondering more about how (and why) the port gets closed in a non-manual way.

@keathley

keathley commented May 2, 2018

We've seen the port die when it can't communicate with the agent, for a variety of reasons: agent restarts, netsplits, etc.

@lexmag
Owner

lexmag commented May 2, 2018

@keathley 👍.

I think the ultimate fix for this would be to provide Statix.child_spec, which starts a GenServer that monitors the port and reopens and re-registers it in case of failure.
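
Purely as a hypothetical shape of that proposal (none of this exists in Statix today, and the names are made up), such a process might look like:

defmodule Statix.Monitor do
  use GenServer

  def start_link(statix_module) do
    GenServer.start_link(__MODULE__, statix_module)
  end

  def init(statix_module) do
    Process.flag(:trap_exit, true)
    statix_module.connect()
    {:ok, statix_module}
  end

  # When the linked port exits, open and register a fresh connection
  # instead of letting metric calls keep hitting a dead port.
  def handle_info({:EXIT, _port, _reason}, statix_module) do
    statix_module.connect()
    {:noreply, statix_module}
  end
end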

@benwilson512
Author

I do think that's a solution, but I also think wrapping the port operations in a try is critical as well. Even in the best case, reconnection isn't instant; in the worst case, reconnection fails entirely (maybe the agent is down). Nothing in the application should die just because monitoring is down.

@lexmag
Owner

lexmag commented May 2, 2018

> I also think wrapping the port operations in a try is critical as well.

That's definitely what we should do, of course.
I think we should also log the missed metric in the catch/rescue.
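
A sketch of what that wrapping could look like (this is not the actual Statix source; the real transmit/2 lives in lib/statix/conn.ex):

defmodule SafeTransmit do
  require Logger

  def transmit(%Statix.Conn{sock: sock}, packet) do
    :erlang.port_command(sock, packet)
    :ok
  rescue
    ArgumentError ->
      # The port is closed; log the dropped metric instead of crashing the caller.
      Logger.error("Statix port closed, dropping metric: #{inspect(packet)}")
      {:error, :port_closed}
  end
end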
