If the MC 'start_task' request to a spawner fails, MC believes the session started and blocks the user from starting train sessions

    • Type: Bug
    • Resolution: Fixed
    • Priority: High
    • Edge AI Studio
    • EDGEST-1332
    • MC_1.3.1
    • MC_1.4.0
    • None

      I believe this is quite rare, but if it happens, the workaround is to restart the spawner.

      The admin API /api/admin/get_spawner_info showed an active train session, but 'docker ps' confirmed that no session was running.

      After MC was restarted, the state was correct again.

      The dinfra logs show that, for some unknown reason, the HTTP request from MC to the spawner failed with:

      2025-03-18T19:04:34.873Z INFO DAEMON dev.ti.com/cluster1/dev-mcw2-1 default/modelcomposer 812337108 [

          "[permId: 249772, projectId: fc7c5080, taskType: detection] /api/start_train: {\"errno\":-111,\"code\":\"ECONNREFUSED\",\"syscall\":\"connect\",\"address\":\"10.123.41.74\",\"port\":41087}"

      ]

      The error was shown to the user, but the MC state was left broken.

      I don't know why the connection was refused for this API call, but in cases like this MC should keep its state correct and not record a train session that never started.
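      One way to keep MC's state correct when the spawner call fails is to roll back whatever MC recorded before making the call. The sketch below is a hypothetical Python model, not MC's actual code: `SessionRegistry`, `start_train_session`, and the `send_start_task` callback are illustrative names, the registry is assumed to be keyed by spawner id, and the ECONNREFUSED failure is simulated rather than produced by a real spawner.

```python
from typing import Callable, Dict


class SessionRegistry:
    """Hypothetical stand-in for MC's record of train sessions per spawner."""

    def __init__(self) -> None:
        self._state: Dict[str, str] = {}  # spawner_id -> "starting" | "running"

    def has_train_session(self, spawner_id: str) -> bool:
        return spawner_id in self._state

    def start_train_session(
        self, spawner_id: str, send_start_task: Callable[[str], None]
    ) -> None:
        if self.has_train_session(spawner_id):
            raise RuntimeError(f"spawner {spawner_id} already has a train session")
        # Reserve the slot first so a second concurrent request is rejected.
        self._state[spawner_id] = "starting"
        try:
            send_start_task(spawner_id)  # the (hypothetical) HTTP call to the spawner
        except Exception:
            # The spawner never accepted the task: undo the reservation so the
            # user is not blocked, then let the caller report the error.
            del self._state[spawner_id]
            raise
        self._state[spawner_id] = "running"


def refused(spawner_id: str) -> None:
    # Simulates the ECONNREFUSED failure seen in the dinfra log.
    raise ConnectionRefusedError(111, "Connection refused")


registry = SessionRegistry()
try:
    registry.start_train_session("spawner-1", refused)
except ConnectionRefusedError:
    pass  # the error is still surfaced to the user
assert not registry.has_train_session("spawner-1")
```

      The reservation plus rollback keeps concurrent start requests from racing, while the `except` branch guarantees that a failed request leaves no stale session behind.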

      An easy way to reproduce this, and to verify the fix, is to change the MC code to send the request to a random port. The error will be different, but the state will be left broken in the same way.
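      The random-port reproduction can be turned into a regression check along these lines. This is a hedged Python sketch, not MC's test suite: it points a hypothetical `start_train` handler at a localhost port that the OS has just released, so the connection is normally refused, and then asserts that no session was recorded. Only the `/api/start_train` path comes from the log above; the handler, the helper `unused_local_port`, and the in-memory set of active sessions are assumptions for illustration.

```python
import socket
import urllib.error
import urllib.request


def unused_local_port() -> int:
    """Ask the OS for a free port, then release it so nothing listens there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def start_train(active: set, user: str, port: int) -> None:
    """Hypothetical MC handler: record the session only once the spawner accepts it."""
    url = f"http://127.0.0.1:{port}/api/start_train"
    request = urllib.request.Request(url, data=b"{}", method="POST")
    with urllib.request.urlopen(request, timeout=2):
        pass
    active.add(user)  # reached only if the request succeeded


active_sessions: set = set()
try:
    start_train(active_sessions, "user-1", unused_local_port())
except urllib.error.URLError as err:
    print("start_train failed:", err.reason)
# MC must not believe a session started.
assert "user-1" not in active_sessions
```

      Here the handler records the session only after the spawner responds, which avoids the need for a rollback. In rare cases another process can grab the released port before the connect, so a real test may want to retry with a fresh port.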

            Assignee:
            TI User
            Reporter:
            TI User
            Votes:
            0
            Watchers:
            2

              Created:
              Updated:
              Resolved:

                Connection: Intermediate to External PROD System
                EXTSYNC-5205 - If MC 'start_task' request to a spa...
                SYNCHRONIZED
                • Last Sync Date: