[EXT_EP-12253] If MC 'start_task' request to a spawner fails, MC believes the session started and blocks the user from starting train sessions Created: 21/Mar/25  Updated: 06/Aug/25  Resolved: 06/Aug/25

Status: Fixed
Project: Embedded Software & Tools
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: High
Reporter: TI User Assignee: TI User
Resolution: Fixed Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Product: Edge AI Studio
Internal ID: EDGEST-1332
Found In Release: MC_1.3.1
Fix In Release: MC_1.4.0
Affected Platform/Device: None

 Description   

I believe this is quite rare, but when it happens, the way to fix it is to restart the spawner.

The admin API /api/admin/get_spawner_info shows the user as having a train session, but 'docker ps' confirmed that no session is running.

After restarting MC the state was fixed. 

Looking at the dinfra logs, it appears that for some reason the HTTP request from MC to the spawner failed with:

2025-03-18T19:04:34.873Z INFO DAEMON dev.ti.com/cluster1/dev-mcw2-1 default/modelcomposer 812337108 [

    "[permId: 249772, projectId: fc7c5080, taskType: detection] /api/start_train: {\"errno\":-111,\"code\":\"ECONNREFUSED\",\"syscall\":\"connect\",\"address\":\"10.123.41.74\",\"port\":41087}"

]

The error was shown to the user, but the MC state was left broken.

I don't know why the connection was refused for the API call, but in cases like this the MC state should remain correct.
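
To illustrate the expected behavior, here is a minimal sketch of a session registry that releases the user's slot when the request to the spawner fails. It is written in Python for brevity and is not MC's actual code: `SessionRegistry`, `SpawnerRequestError`, `start_train`, and `send_start_task` are hypothetical names, and the real MC data model is not described in this ticket.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


class SpawnerRequestError(Exception):
    """Raised when the request to the spawner fails (hypothetical name)."""


@dataclass
class SessionRegistry:
    # Maps a user id to the id of that user's active train session.
    active: Dict[str, str] = field(default_factory=dict)

    def start_train(self, user_id: str, session_id: str,
                    send_start_task: Callable[[str], None]) -> None:
        if user_id in self.active:
            raise RuntimeError(f"user {user_id} already has a train session")
        # Reserve the slot first so a second request from the same user is rejected.
        self.active[user_id] = session_id
        try:
            # Assumed to raise on any transport error such as ECONNREFUSED.
            send_start_task(session_id)
        except Exception as exc:
            # The spawner never accepted the task, so release the slot
            # before reporting the error to the caller.
            self.active.pop(user_id, None)
            raise SpawnerRequestError(str(exc)) from exc
```

With this shape, a failed request still surfaces an error to the user, but the next attempt is not blocked by a session that never started.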

An easy way to reproduce this and verify the fix is to change the MC code to use a random port; the error will be different, but the state will be broken.
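
For a regression test, a failing connection can also be provoked without touching a real spawner. The sketch below is an assumption about how such a test helper might look, not part of MC: it picks a local port with nothing listening on it and attempts a TCP connection, which is normally refused. The helper names `unused_local_port` and `try_connect` are hypothetical.

```python
import errno
import socket


def unused_local_port() -> int:
    """Ask the OS for a free port, then release it so nothing listens there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def try_connect(host: str, port: int, timeout: float = 2.0):
    """Return None on success, or the OS error number if the connection fails."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:
        return exc.errno


port = unused_local_port()
result = try_connect("127.0.0.1", port)
if result == errno.ECONNREFUSED:
    print(f"connection to port {port} refused, as expected")
```

On Linux, `errno.ECONNREFUSED` is 111, which corresponds to the `"errno":-111` in the log above. Pointing the start request at such a port should exercise the same failure path, after which the admin API can be checked to confirm that no train session is recorded.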

