Add Watchdog timer reset in kill() #3077

Blue-Marlin · 2016-03-03T23:39:27Z

Add watchdog reset in the kill loop to simplify recovering with Arduino Mega based boards..

Fix for #3076

Add watchdog reset in the kill loop to simplify recovering.

thinkyhead · 2016-03-06T11:39:14Z

Does this do what we want in all cases? From what I can see the watchdog timer is set up to have the board reset if watchdog_reset is not called within a 4 second period. So it looks like the empty loop in kill() is intended to cause the board to reset after 4 seconds. The change you propose will cause the board to hang forever, until a manual reset. If that is what we're after, great.

Some interesting things keep coming up about resetting an Arduino with Watchdog enabled. This of course:

Another thing I found in the datasheet:

Quote:

Note: If the Watchdog is accidentally enabled, for example by a runaway pointer or brown-out condition, the device will be reset and the Watchdog Timer will stay enabled. If the code is not set up to handle the Watchdog, this might lead to an eternal loop of time-out resets. To avoid this situation, the application software should always clear the Watchdog System Reset Flag (WDRF) and the WDE control bit in the initialization routine, even if the Watchdog is not in use.

And… from this forum post at AVR Freaks we have:

The critical step missing in your original code is to reset the WDRF bit in MCUSR. From the ATmega328 data sheet:

WDE is overridden by WDRF in MCUSR. This means that WDE is always set when WDRF is set. To clear WDE, WDRF must be cleared first.

The sequence I use is:
MCUSR = 0;
wdt_disable( );
So that is why WDRF must be cleared before you can clear WDE.

The current procedure at startup for setting the Watchdog Timer is as follows:

// Initialize watchdog with a 4 sec interrupt time
void watchdog_init() {
  #if ENABLED(WATCHDOG_RESET_MANUAL)
    // We enable the watchdog timer, but only for the interrupt.
    // Take care, as this requires the correct order of operation, with interrupts disabled. See the datasheet of any AVR chip for details.
    wdt_reset();
    _WD_CONTROL_REG = _BV(_WD_CHANGE_BIT) | _BV(WDE);
    _WD_CONTROL_REG = _BV(WDIE) | WDTO_4S;
  #else
    wdt_enable(WDTO_4S);
  #endif
}

If this is equivalent to the above discussion, then all is well. I think the main concern is that the watchdog reset apparatus might not always be started, and that this condition should be avoided. Marlin avoids it by doing watchdog reset pretty frequently, and on every reset. I haven't heard any complaints where resetting a board caused endless resets, but it's something to watch out for.

Blue-Marlin · 2016-03-06T13:10:55Z

@thinkyhead
If we have had a watchdog reset, there is nothing we can do in Marlin. The reset loop is in the Arduino bootloader (at least for the Arduino Mega - all RAMPS boards, ++).
The only thing we can do is to avoid unnecessary watchdog resets.

The watchdog is ought to catch unexpected endless loops in the code. The kill() loop is not unexpected.

I suggest to make a simple test.
Take a machine with activated watchdog. Run it. Unconnect a thermistor. Mintemp is detected, kill called, the watchdog resets the machine 4 seconds later. Reconnect the thermistor. Try serial reset (unconnect-reconnect). Try the reset button. Power cycle the printer.
Now reflash without the activated watchdog and try the same.
Got it? Do you see the difference?

Lets wait for some days. The day where Octoprint users will begin to complain about having to run down 3 stairs, to power cycle their machines, is not far.

thinkyhead · 2016-03-06T14:13:46Z

@Blue-Marlin Customarily when users have different needs we will make such options configurable through Configuration.h and/or Configuration_adv.h. If users are connected to a remote host then it may be more desirable for them to leave the timeout loop in place. Consider packaging this as a new feature, as in:

#define MANUAL_RESET_ON_KILL // Enable this option to require a manual reset on kill()

jbrazio · 2016-03-06T15:44:44Z

This was exactly my suggestion on #3045 when this reset thing came to discussion.

thinkyhead · 2016-03-06T22:29:39Z

I'm seriously considering including this change even though it redefines the default behavior. I personally would prefer that if my printer locks up, it should stay locked up and not reboot on its own. Your points made in #3085 are compelling, especially surrounding the response of hosts and the AUTOSTART feature. So my new suggestion is something more along the lines of this:

In Configuration.h: a sub-option with USE_WATCHDOG disabled by default:

//#define AUTO_RESET_ON_KILL // Enable this option to automatically reset the board on kill()

In SanityCheck.h::

#if ENABLED(USE_WATCHDOG) && ENABLED(AUTO_RESET_ON_KILL) && ENABLED(AUTOSTART)
  #error AUTO_RESET_ON_KILL combined with AUTOSTART considered risky. Comment this line if you only use AUTOSTART for harmless things.
#endif

And finally in Marlin_main.cpp…

  #if ENABLED(USE_WATCHDOG) && DISABLED(AUTO_RESET_ON_KILL)
    watchdog_reset();
  #endif

Another thought occurs to me, which is of course "Why would anyone use USE_WATCHDOG without AUTO_RESET_ON_KILL?" It looks like the whole point of USE_WATCHDOG is to get that automatic reset in the first place. So perhaps no new option is needed. If you don't want an automatic reset, then there's no other reason to use USE_WATCHDOG, is there?

thinkyhead · 2016-03-06T22:58:06Z

After reviewing the code I see that, yes, in fact, the whole point of USE_WATCHDOG is to get an Automatic Reset, whether due to kill() or some uncontrolled situation. But I can also see why you might want an automatic reset in some situations — especially where there is an unknown hang potentially leaving heaters on — but you might not want it in the case of a proper kill(), since the heaters will (one hopes) be turned off. Leaving the machine hung forever might also help by leaving the last LCD screen up so you can see the state of the machine at the time of failure.

Hopefully few users will have to deal with these things in the first place, but it's good to look at them once in a while.

Indication of a failure state is always helpful. A hung machine that isn't responsive has to be prodded to determine the failed state. I would prefer to give some clear indication, if only toggling the configured LED pin as a visual alarm. (Audio alarms have been discussed and are considered … too alarming.) If it's immediately apparent by a flashing light that the machine has gone down then users can respond by resetting right away instead of trying the controller knob.

Hosts assume that a machine is hung if it stops responding long enough. They can also look for the Marlin startup message and by that assume that a machine has been reset. But these have not been formalized. We have received requests from makers of host software to send more messages out about Marlin's states ("busy", "awaiting input", etc.). So, even in the "precarious" position of a machine that has gone into the kill() loop, one could argue perhaps for sending a last "hung" or "resetting" message to the host and showing a 4-second countdown on the LCD screen before resetting, as well.

jbrazio · 2016-03-06T23:18:55Z

Sounds sane your suggestion but I would still prefer to ship with USE_WATCHDOG active; if for some reason the AVR locks up and the heater pin is active.. it will never shut it down until everything is on fire. USE_WATCHDOG is the security feature.

The issue is not Marlin to go into a panic mode and need reset, the issue is when the watchdog kicks in people are obliged to power cycle the printer and for some people this may mean remove the plug from mains and plug it back again..

As it is today without USE_WATCHDOG active when kill() is called Marlin locks up in a while (1) {} waiting for any type of reset. The user can then soft-reset or hard-reset it by power cycling the board.

When USE_WATCHDOG is active when kill() is called Marlin locks up on the same while (1) {} but now the watchdog will kick in due to wdt_enable(WDTO_4S); in watchdog.cpp.

When WDTO_4S kicks in the ISR(WDT_vect) traps the interrupt and will once again call kill() and try to lock up Marlin (again) in a while(1) loop. So far so good.

The issue is when the user manually soft-resets the board. The Arduino bootloader is expected to disable the watchdog which it is not.. which means that after 4S of booting Marlin will go again into ISR(WDT_vect). The only way to "fix" this is to hard-reset the board which will reset the watchdog.

This bugfix will still lockup Marlin on the while (1) {} loop but before locking up the uC it will disable the watchdog, then if the user soft-reset the board it will not get trapped again inside ISR(WDT_vect).

thinkyhead · 2016-03-07T00:20:13Z

The Arduino bootloader is expected to disable the watchdog which it is not

This cannot be counted upon, even according to Arduino's own documentation.

The only way to "fix" this is to hard-reset the board which will reset the watchdog.

As I described above, during program initialization (in setup()) the watchdog must always be explicitly turned off, because a watchdog can stay alive through various kinds of reset. The note above suggests MCUSR = 0; wdt_disable( ); and that's easy enough to test.

This bugfix will still lockup Marlin on the while (1) {} loop but before locking up the uC it will disable the watchdog

I'm not seeing how this PR "disables" the watchdog. It just signals watchdog to stay alive with watchdog_reset() so the board won't self-reset.

Add Watchdog timer reset in kill()

thinkyhead · 2016-03-07T00:24:17Z

Merged this as basically an obvious bug-fix. Expected behavior for kill() is to stay in a loop until manually reset. Keep it simple.

Blue-Marlin · 2016-03-07T00:25:38Z

@jbrazio
Thanks for you detailed explanation.

I guess language is the key for our initial misunderstanding.
"Watchdog reset" can mean a reset caused by the watchdog.
Calling "watchdog_reset()" means the opposite - reset of the watchdog timer to prevent a reset by the watchdog.

I don't think it needs an additional configuration option. The behaviour of kill() with activated watchdog is, with this change, the same as if the watchdog would not be activated. The advantage of the watchdog, to catch unexpected endless loops, is untouched.
WATCHDOG_RESET_MANUAL is evil. Not because its calling kill, but because of redefining the interrupt vector. It makes a completely hardware handled reset to a software function. That alone would not be so bad, but if our endless loop is in an interrupt all further software interrupts are disabled and therefore the watchdog interrupt routine never executed. So with WATCHDOG_RESET_MANUAL enabled we lose the ability to protect from hangs in interrupts. For the same reason we do not get a endless chain of kill() calls. If we are once in an interrupt, the interrupt_disable_flag is set until, we reset it ourselves, or leave the interrupt routine. If that would not be the case we'd have a stack overflow very soon. Ergo the watchdog reset with WATCHDOG_RESET_MANUAL did and does the same as kil(), when fired at all.

Have an other look at the kill() function. We explicitly disable software interrupts but the pure hardware watchdog timer interrupt (without WATCHDOG_RESET_MANUAL) still works!

jbrazio · 2016-03-07T00:26:28Z

I'm not seeing how this PR disables the watchdog. It just signals watchdog to stay alive with watchdog_reset() so the board won't self-reset.

You're correct, watchdog_reset() is just a wrapper for wdt_reset()thus only "pings" the watchdog.

Add Watchdog reset in kill()

b4e5e0f

Add watchdog reset in the kill loop to simplify recovering.

thinkyhead added Needs: Testing Testing is needed for this change Needs: More Data We need more data in order to proceed C: Boards/Pins labels Mar 6, 2016

jbrazio mentioned this pull request Mar 6, 2016

Heating failed #3085

Closed

Blue-Marlin changed the title ~~Add Watchdog reset in kill()~~ Add Watchdog timwer reset in kill() Mar 6, 2016

Blue-Marlin changed the title ~~Add Watchdog timwer reset in kill()~~ Add Watchdog timer reset in kill() Mar 6, 2016

thinkyhead added a commit that referenced this pull request Mar 7, 2016

Merge pull request #3077 from Blue-Marlin/watchdogreset-in-kill

501f638

Add Watchdog timer reset in kill()

thinkyhead merged commit 501f638 into MarlinFirmware:RCBugFix Mar 7, 2016

Blue-Marlin deleted the watchdogreset-in-kill branch March 9, 2016 13:07

This was referenced Mar 19, 2016

Marlin RC-4 (RCBugFix) almost stable! Going final soon! #3154

Closed

Release 1.1.0-RC4 #3104

Closed

jbrazio mentioned this pull request Mar 25, 2016

Change Kill Button Behavior? #3045

Closed

jbrazio modified the milestone: 1.1.0 Jul 18, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Add Watchdog timer reset in kill() #3077

Add Watchdog timer reset in kill() #3077

Uh oh!

Blue-Marlin commented Mar 3, 2016

Uh oh!

thinkyhead commented Mar 6, 2016

Uh oh!

Blue-Marlin commented Mar 6, 2016

Uh oh!

thinkyhead commented Mar 6, 2016

Uh oh!

jbrazio commented Mar 6, 2016

Uh oh!

thinkyhead commented Mar 6, 2016

Uh oh!

thinkyhead commented Mar 6, 2016

Uh oh!

jbrazio commented Mar 6, 2016

Uh oh!

thinkyhead commented Mar 7, 2016

Uh oh!

thinkyhead commented Mar 7, 2016

Uh oh!

Blue-Marlin commented Mar 7, 2016

Uh oh!

jbrazio commented Mar 7, 2016

Uh oh!

Uh oh!

Uh oh!

Add Watchdog timer reset in kill() #3077

Add Watchdog timer reset in kill() #3077

Uh oh!

Conversation

Blue-Marlin commented Mar 3, 2016

Uh oh!

thinkyhead commented Mar 6, 2016

Uh oh!

Blue-Marlin commented Mar 6, 2016

Uh oh!

thinkyhead commented Mar 6, 2016

Uh oh!

jbrazio commented Mar 6, 2016

Uh oh!

thinkyhead commented Mar 6, 2016

Uh oh!

thinkyhead commented Mar 6, 2016

Uh oh!

jbrazio commented Mar 6, 2016

Uh oh!

thinkyhead commented Mar 7, 2016

Uh oh!

thinkyhead commented Mar 7, 2016

Uh oh!

Blue-Marlin commented Mar 7, 2016

Uh oh!

jbrazio commented Mar 7, 2016

Uh oh!

Uh oh!