(Translated by https://www.hiragana.jp/)
[BUG] "Unhandled exception running state.sls" and "salt.exceptions.SaltClientError: Attempt to authenticate " for salt schedule job in multimaster minion when first master is not available, refusing to contact the second master for sls file retrieval · Issue #66721 · saltstack/salt · GitHub
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] "Unhandled exception running state.sls" and "salt.exceptions.SaltClientError: Attempt to authenticate " for salt schedule job in multimaster minion when first master is not available, refusing to contact the second master for sls file retrieval #66721

Open
amalaguti opened this issue Jul 16, 2024 · 1 comment
Labels
Bug broken, incorrect, or confusing behavior needs-triage

Comments

@amalaguti
Copy link

Description
Minion throws "Unhandled exception running state.sls" exception and fails to execute salt scheduled job under the given conditions:
The minion is in multimaster configuration.
The scheduled job fun is state.sls so at time of execution the minion needs to contact the master to get the sls file required by the job.
At moment of execution, the first server in the master list is not available (down for maintenance), the minion seems to stick to the first master to get the files, throwing a time out exception, never contacting the second master server to get the sls file.
The second master server receives a failed return event

# Job setup
----------
          ID: schedule_new_task
    Function: schedule.present
      Result: True
     Comment: Modifying job schedule_new_task in schedule
     Started: 19:06:42.446128
    Duration: 62.488 ms
     Changes:
              ----------
              schedule_new_task:
                  ----------
                  new:
                      ----------
                      function:
                          state.sls
                      return_job:
                          True
                      args:
                          - some.example.statefile
                      maxrunning:
                          1
                      name:
                          schedule_new_task
                      enabled:
                          True
                      jid_include:
                          True
                      splay:
                          10
                      when:
                          - 2024-07-16 19:11:42
# State execution to set the salt schedule job applied
2024-07-16 19:06:42,446 [salt.state       :2294][INFO    ][3028] Running state [schedule_new_task] at time 19:06:42.446128
2024-07-16 19:06:42,446 [salt.state       :2325][INFO    ][3028] Executing state schedule.present for [schedule_new_task]
2024-07-16 19:06:42,508 [salt.state       :319 ][INFO    ][3028] Made the following changes:
<REDACTED>
2024-07-16 19:06:42,508 [salt.state       :2492][INFO    ][3028] Completed state [schedule_new_task] at time 19:06:42.508616 (duration_in_ms=62.488)
2024-07-16 19:06:42,524 [salt.loaded.int.module.cmdmod:438 ][INFO    ][3028] Executing command attrib in directory 'C:\Users\adrian'
# Exception thrown after timing out trying to retrieve the state file only from the first master which is not available
2024-07-16 19:11:46,274 [salt.utils.schedule:1792][INFO    ][6524] Running scheduled job: schedule_new_task (runtime missed by 0.274241 seconds) with jid 20240716191146274241
2024-07-16 19:13:32,966 [salt.utils.schedule:913 ][ERROR   ][5840] Unhandled exception running state.sls
Traceback (most recent call last):
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\crypt.py", line 768, in sign_in
    payload = yield channel.send(sign_in_payload, tries=tries, timeout=timeout)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1056, in run
    value = future.result()
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\channel\client.py", line 337, in send
    ret = yield self._uncrypted_transfer(load, timeout=timeout)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1056, in run
    value = future.result()
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\channel\client.py", line 308, in _uncrypted_transfer
    ret = yield self.transport.send(
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1056, in run
    value = future.result()
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\transport\zeromq.py", line 916, in send
    ret = yield self.message_client.send(load, timeout=timeout)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1056, in run
    value = future.result()
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\transport\zeromq.py", line 595, in send
    recv = yield future
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1056, in run
    value = future.result()
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
salt.exceptions.SaltReqTimeoutError: Message timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\utils\schedule.py", line 880, in handle_func
    ret["return"] = self.functions[func](*args, **kwargs)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\loader\lazy.py", line 159, in __call__
    ret = self.loader.run(run_func, *args, **kwargs)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\loader\lazy.py", line 1245, in run
    return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\loader\lazy.py", line 1260, in _run_as
    return _func_or_method(*args, **kwargs)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\modules\state.py", line 1425, in sls
    st_ = salt.state.HighState(
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\state.py", line 4954, in __init__
    BaseHighState.__init__(self, opts)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\state.py", line 3754, in __init__
    self.opts = self.__gen_opts(opts)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\state.py", line 3777, in __gen_opts
    mopts = self.client.master_opts()
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\fileclient.py", line 1481, in master_opts
    return self._channel_send(
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\fileclient.py", line 1151, in _channel_send
    return self.channel.send(
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\utils\asynchronous.py", line 124, in wrap
    raise exc_info[1].with_traceback(exc_info[2])
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\utils\asynchronous.py", line 130, in _target
    result = io_loop.run_sync(lambda: getattr(self.obj, key)(*args, **kwargs))
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\ioloop.py", line 459, in run_sync
    return future_cell[0].result()
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\channel\client.py", line 340, in send
    ret = yield self._crypted_transfer(load, timeout=timeout, raw=raw)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1056, in run
    value = future.result()
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\channel\client.py", line 290, in _crypted_transfer
    yield self.auth.authenticate()
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1056, in run
    value = future.result()
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\crypt.py", line 657, in _authenticate
    creds = yield self.sign_in(channel=channel)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1056, in run
    value = future.result()
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\concurrent.py", line 249, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\ext\tornado\gen.py", line 1064, in run
    yielded = self.gen.throw(*exc_info)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\crypt.py", line 776, in sign_in
    raise SaltClientError(
salt.exceptions.SaltClientError: Attempt to authenticate with the salt master failed with timeout error
2024-07-16 19:13:32,981 [salt.minion      :2244][INFO    ][6524] Returning information for job: 20240716191146274241
2024-07-16 19:13:32,997 [salt.minion      :2244][INFO    ][6524] Returning information for job: 20240716191146274241
2024-07-16 19:13:48,017 [salt.minion      :2760][ERROR   ][6524] Timeout encountered while sending {'cmd': '_return', 'id': 'minion-win-1', 'fun': 'state.sls', 'fun_args': ['some.example.statefile', <REDACTED>, 'schedule': 'schedule_new_task', 'jid': '20240716191146274241', 'pid': 5840, 'return': 'Unhandled exception running state.sls', 'success': False, 'retcode': 254, '_stamp': '2024-07-16T19:13:32.997426', 'out': 'highstate', 'nonce': '16a331fbcbbe4d5591caa7892ee408a0'} request
# schedule job return event on the second master event bus showing the Unhandled exception running stae.sls message
# This is the only event shown on the second master, no attempt from the minion to retrieve the sls file is seen
salt/job/20240716191146274241/ret/minion-win-1	{
    "_stamp": "2024-07-16T19:13:32.925634",
    "cmd": "_return",
    "fun": "state.sls",
    "fun_args": [
        "some.example.statefile",
    ],
    "id": "minion-win-1",
    "jid": "20240716191146274241",
    "out": "highstate",
    "pid": 5840,
    "retcode": 254,
    "return": "Unhandled exception running state.sls",
    "schedule": "schedule_new_task",
    "success": false
}

Setup
Steps to Reproduce the behavior
3006.x minion in multimaster config (active active), minion is able to talk to both masters
Schedule job to run state.sls some.state.file
Prior to schedule job trigger, shutdown/disable first master
Wait for schedule job execution time and some extra time for timing out, an exception is triggered on the minion log.

Expected behavior
THe minion should attempt to connect to the second master to retrieve the files required by state.sls and complete the scheduled job

@amalaguti amalaguti added Bug broken, incorrect, or confusing behavior needs-triage labels Jul 16, 2024
@amalaguti
Copy link
Author

In addition,

Forcing the execution of the job via salt-call schedule.run_job, after establishing that the first master is unreachable and changing master to the next one in the list, the command throws Job schedule_new_task does not exist.
But the job is listed by schedule.list

Somehow the job is tied to the first master ?

PS C:\Users\adrian> salt-call schedule.run_job schedule_new_task -l info
[INFO    ] Master saltmaster-pip could not be reached, trying next master (if any)
[WARNING ] Master ip address changed from 172.21.0.10 to 172.21.0.11
local:
    ----------
    comment:
        Job schedule_new_task does not exist.
    result:
        False


PS C:\Users\adrian> salt-call schedule.list
[INFO    ] Master saltmaster-pip could not be reached, trying next master (if any)
[WARNING ] Master ip address changed from 172.21.0.10 to 172.21.0.11
local:
    schedule:
      schedule_new_task:
        args:
        - utils.reboot_system_module
        enabled: true
        function: state.sls
        jid_include: true
        maxrunning: 1
        name: some.example.statefile
        return_job: true
        returner: rawfile_json
        saved: true
        splay: 10
        when:
        - '2024-07-16 19:56:33'


PS C:\Users\adrian> salt-call schedule.run_job schedule_new_task -l info
[INFO    ] Master saltmaster-pip could not be reached, trying next master (if any)
[WARNING ] Master ip address changed from 172.21.0.10 to 172.21.0.11
local:
    ----------
    comment:
        Job schedule_new_task does not exist.
    result:
        False


# using --local triggers the job, but does not complete succesfully given the first master is unreachable
PS C:\Users\adrian> salt-call --local schedule.run_job schedule_new_task -l info
local:
    ----------
    comment:
        Scheduling Job schedule_new_task on minion.
    result:
        True

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior needs-triage
Projects
None yet
Development

No branches or pull requests

1 participant