Description:
Abstract

Quality of organ‐at‐risk (OAR) autosegmentation is often judged by concordance metrics against the human‐generated gold standard. However, the ultimate goal is the ability to use unedited autosegmented OARs in treatment planning while maintaining plan quality. We tested this approach with head and neck (HN) OARs generated by a prototype deep‐learning (DL) model on patients previously treated for oropharyngeal and laryngeal cancer. Forty patients were selected, with all structures delineated by an experienced physician. For each patient, a set of 13 OARs was generated by the DL model. Each patient was re‐planned based on the original targets and unedited DL‐produced OARs. The new dose distributions were then applied back to the manually delineated structures. Target coverage was evaluated with the inhomogeneity index (II) and the relative volume of regret. For the OARs, the Dice similarity coefficient (DSC) of areas under the DVH curves, individual DVH objectives, and a composite continuous plan quality metric (PQM) were compared. Nearly identical primary target coverage was achieved for the original and re‐generated plans, with the same II and relative volume of regret values. The average DSC of the areas under the corresponding pairs of DVH curves was 0.97 ± 0.06. Of 896 critical DVH points, only 5 (0.6%) met the clinical objectives with the dose optimized on autosegmented structures but failed when evaluated on the manual ones. The average OAR PQM score with the re‐planned dose distributions was essentially the same whether evaluated on the autosegmented or the manual OARs. Thus, rigorous HN treatment planning is possible with OARs segmented by a prototype DL algorithm with minimal, if any, manual editing.
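The DSC of areas under paired DVH curves can be computed by treating the area under the pointwise minimum of the two curves as the "intersection" and the individual areas as the set sizes. The sketch below is a minimal illustration of this idea, not the authors' implementation; the function name, the trapezoidal integration, and the example curves are all assumptions for demonstration.

```python
import numpy as np

def dvh_area_dsc(dose_grid, dvh_a, dvh_b):
    """Dice-style similarity of two DVH curves via areas under them.

    dose_grid : 1-D array of dose values (e.g., Gy), ascending.
    dvh_a, dvh_b : fractional volumes (0..1) at each dose point,
        i.e., cumulative DVH curves for the same structure computed
        on two different contours (e.g., manual vs. autosegmented).

    Returns 2*overlap / (area_a + area_b), where the overlap is the
    area under the pointwise minimum of the two curves.
    """
    area_a = np.trapz(dvh_a, dose_grid)
    area_b = np.trapz(dvh_b, dose_grid)
    overlap = np.trapz(np.minimum(dvh_a, dvh_b), dose_grid)
    return 2.0 * overlap / (area_a + area_b)

# Hypothetical example: two slightly different falloff curves.
dose = np.linspace(0.0, 70.0, 141)          # 0-70 Gy grid
dvh_manual = np.exp(-dose / 30.0)           # toy cumulative DVH
dvh_auto = np.exp(-dose / 28.0)             # slightly steeper falloff
print(round(dvh_area_dsc(dose, dvh_manual, dvh_auto), 3))
```

Identical curves yield a DSC of exactly 1.0; values near 1 indicate that the dose distribution affects the two contour sets almost equally, which is the sense in which the abstract reports 0.97 ± 0.06.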