How to restart child with custom state using Erlang OTP supervisor behaviour?
I'm using OTP supervisor behaviour to supervise and restart child processes. However when the child dies I want to restart it with the same state it had before the crash.
If I write my own custom supervisor, I can just receive {EXIT,Pid,Reason} message and act upon it. When using OTP supervisor behaviour however it is all managed by OTP and I have no control over it. The only callback function I implement is init.
Is there any standard approach in case like this? How to customise the state of a child being restarted dynamically by the otp supervisor? How to get Pid of the terminating process using OTP? Or maybe its possible to get the state of the child just before termination, and then restore the child to the same state it had before it crashed?
Possibly restart with same state is not good idea. Probably wrong state lead process to crash and if you restart with same state, it will crash again. But if you want this, use external resource to keep it (like ets or mnesia).
Without knowing any details about what you are doing, I can imagine a world where the following makes sense:
So, if I had 12 child processes representing the 12 Tribes of Cobol each would use its name as the key to the ETS table to look for state left behind by a previous incarnate upon starting. And each process would update the table (again using its name as the key) whenever its state changed.
The supervisor will automatically restart a killed child and step 2, above, would be executed in the child's init method. Step 3 would be dealt with in a child's handle_call, handle_cast and handle_info methods (I am making some assumptions about the nature of your processes). There are a number of restart strategies available via the supervisor that can even restart siblings if desired.
Hope this gives you some thoughts.
I think this sort of customizations of the OTP supervisor behaviour can't be done easily. The way OTP supervisors are designed forces me to follow some strict design practices. Most important one in this case is that supervisor shouldn't do anything else apart from monitoring its children and restarting them in case of abnormal termination. There should be no additional logic in the supervisor to not introduce any bugs in the supervisors which are critical part of supervision tree and fault tolerance.
when the child dies I want to restart it with the same state it had before the crash - this is bad practice in general because child might've died because of the corrupted state it had before termination and restarting it with the same state in such case will surely cause problems
Is there any standard approach in case like this? Customizing the state of the children within the supervisor, before restarting them acts against supervisor good design practices. Therefore this kind of tasks are usually done differently, for example by introducing another process, for example gen_server which would be responsible for starting children via supervisor (supervisor:start_child) and maintaining monitors on all processes. This additional process could do any required customizations before starting new child.
How to get Pid of the terminating process using OTP? - in the additional process which starts children via supervisor:start_child you can monitor them and then listen to DOWN messages. For example in case of gen_server you would use handle_info function as below:
handle_info({'DOWN', Ref, process, _Pid, _}, S) ->
handle_down_worker(Ref, _Pid, S).
Or maybe its possible to get the state of the child just before termination, and then restore the child to the same state it had before it crashed? - Correct me if I'm wrong but I think it is not possible in Erlang to send, along with the 'DOWN' message, the state of the process which child had, just before the termination. If that would be possible then I could just handle message similar to {DOWN, Pid, Reason, State} and restart the process with the same state or part of it. But then, I'm thinking.. How could you preserve the state of the suddenly dying child which was for example killed with exit(Pid, kill) ? I doubt that would be possible.
链接地址: http://www.djcxy.com/p/38202.html