Description:
Abstract: For semantic segmentation tasks, it is expensive to obtain pixel-level annotations for real images. Domain adaptation avoids this cost by transferring networks trained on synthetic images to real-world images. Self-training is one of the mainstream approaches to domain adaptation, but most self-training based methods focus on selecting high-confidence pseudo-labels, i.e., they acquire domain-invariant knowledge only indirectly, and they lack a more direct means of explicitly aligning source- and target-domain data both globally and locally. Moreover, the target-domain features learned by traditional self-training are relatively scattered and are not aggregated into a compact space. In this paper, we propose an approach that builds on self-training and uses data augmentation and contrastive learning to transfer knowledge more effectively. Specifically, style-transfer and image-mixing modules are first introduced for data augmentation to cope with the large domain gap between the source and target domains. To ensure that features of the same class are aggregated and features of different classes remain discriminable during training, we propose a multi-scale pixel-level contrastive learning module. In addition, a cross-scale contrastive learning module is proposed so that each level of the model can capture more information beyond its original task. Experiments show that the final trained model effectively segments images from the target domain.
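As an illustration of the kind of pixel-level contrastive objective described above, the sketch below shows a supervised InfoNCE-style loss over pixel embeddings grouped by (pseudo-)labels. The function name `pixel_contrastive_loss`, the temperature, and the random pixel subsampling are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a pixel-level contrastive loss (not the paper's code).
# Pixels sharing a (pseudo-)label are pulled together, other pixels pushed apart.
import torch
import torch.nn.functional as F


def pixel_contrastive_loss(features, labels, temperature=0.1, max_pixels=1024):
    """features: (N, D) pixel embeddings sampled from a feature map.
    labels:   (N,) class indices (source ground truth or target pseudo-labels).
    Returns a supervised InfoNCE-style loss averaged over anchor pixels.
    """
    if features.size(0) > max_pixels:                       # subsample to bound memory
        idx = torch.randperm(features.size(0))[:max_pixels]
        features, labels = features[idx], labels[idx]

    features = F.normalize(features, dim=1)                 # cosine-similarity space
    sim = features @ features.t() / temperature             # (N, N) similarity logits

    same_class = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    pos_mask = (same_class & ~eye).float()                  # positives: same class, not self

    logits = sim.masked_fill(eye, -1e9)                     # exclude self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)

    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0                                   # anchors with at least one positive
    loss = -(log_prob * pos_mask).sum(dim=1)[valid] / pos_counts[valid]
    return loss.mean() if valid.any() else features.new_zeros(())
```

In a multi-scale setting such as the one sketched in the abstract, this loss would typically be applied to pixel features drawn from decoder feature maps at several resolutions, with labels downsampled to match each scale.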