[EXT_EP-12253] If MC 'start_task' request to a spawner fails, MC believes the session started and blocks the user from starting train sessions |
| Created: 21/Mar/25 | Updated: 06/Aug/25 | Resolved: 06/Aug/25 |
|
| Status: | Fixed |
| Project: | Embedded Software & Tools |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | High |
| Reporter: | TI User | Assignee: | TI User |
| Resolution: | Fixed | Votes: | 0 |
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Product: | Edge AI Studio |
| Internal ID: | EDGEST-1332 |
| Found In Release: | MC_1.3.1 |
| Fix In Release: | MC_1.4.0 |
| Affected Platform/Device: | None |
| Description |
|
I believe this is quite rare, but when it happens the way to fix it is to restart the spawner. The admin API /api/admin/get_spawner_info reports an active train session, but 'docker ps' confirmed that there is no session. After restarting MC the state was fixed. Looking at the dinfra logs, it appears that for some reason the HTTP request from MC to the spawner failed with: 2025-03-18T19:04:34.873Z INFO DAEMON dev.ti.com/cluster1/dev-mcw2-1 default/modelcomposer 812337108 [ "[permId: 249772, projectId: fc7c5080, taskType: detection] /api/start_train: {\"errno\":-111,\"code\":\"ECONNREFUSED\",\"syscall\":\"connect\",\"address\":\"10.123.41.74\",\"port\":41087}" ] The error was shown to the user, but the MC state was left broken. I don't know why the connection was refused for the API call, but in cases like this the MC state should remain correct. An easy way to reproduce this, and to verify that it is fixed, is to change the MC code to use a random port: the error will be different, but the state will still end up broken. |
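
The expected behavior described above can be illustrated with a minimal sketch. It is not MC's actual code: the `SessionRegistry` class, the `start_train` method, and the two stand-in spawner functions are hypothetical names introduced only for illustration, and Python is used purely for brevity. The idea is that a session is recorded as active only while the 'start_task' request is in flight or after it has succeeded, and that the record is rolled back when the request fails (for example with a refused connection).

```python
class SessionRegistry:
    """Hypothetical in-memory record of which users hold a train session."""

    def __init__(self):
        self._active = {}  # user_id -> task_id (None while the request is in flight)

    def has_session(self, user_id):
        return user_id in self._active

    def start_train(self, user_id, send_start_task):
        """Start a session; send_start_task stands in for the request to the spawner."""
        if user_id in self._active:
            raise RuntimeError("user already has a train session")
        # Reserve the slot first so a second request from the same user is rejected.
        self._active[user_id] = None
        try:
            task_id = send_start_task(user_id)
        except Exception:
            # The spawner never started the task: release the slot, then
            # re-raise so the error can still be reported to the user.
            del self._active[user_id]
            raise
        self._active[user_id] = task_id
        return task_id


def refusing_spawner(user_id):
    # Stand-in for a spawner that cannot be reached (assumed failure mode).
    raise ConnectionRefusedError(111, "Connection refused")


def healthy_spawner(user_id):
    # Stand-in for a spawner that accepts the task.
    return f"task-{user_id}"


registry = SessionRegistry()
try:
    registry.start_train("user-1", refusing_spawner)
except ConnectionRefusedError:
    pass  # the user sees the error, but must not stay blocked

still_blocked = registry.has_session("user-1")
task_id = registry.start_train("user-1", healthy_spawner)
```

In this sketch, the failed request leaves no session behind, so the retry against a reachable spawner succeeds. Pointing the request at a random port, as suggested for reproduction, would exercise the same rollback path.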