On the Arbitrary-Oriented Object Detection: Classification based Approaches Revisited
Abstract

Arbitrary-oriented object detection has been a building block for rotation-sensitive tasks. We first show that the problem of discontinuous boundaries suffered by existing dominant regression-based rotation detectors is caused by angular periodicity or corner ordering, depending on the parameterization protocol. We also show that the root cause is that the ideal predictions can fall outside the defined range. Accordingly, we transform the angular prediction task from a regression problem into a classification one. For the resulting circularly distributed angle classification problem, we first devise a Circular Smooth Label (CSL) technique to handle the periodicity of the angle and to increase the error tolerance to adjacent angles. To reduce the excessive model parameters introduced by CSL, we further design a Gray Coded Label (GCL), which greatly reduces the length of the encoding. Finally, we develop an object heading detection module, which is useful when the exact heading orientation is needed, e.g., for ship and plane heading detection. We release our OHD-SJTU dataset and OHDet detector for heading detection. Results on three large-scale public aerial image datasets, i.e., DOTA, HRSC2016, and OHD-SJTU, as well as the scene text datasets ICDAR2015 and MLT, show the effectiveness of our approach.
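The two label encodings described above can be sketched as follows. This is a minimal Python illustration, not the released OHDet code. The following choices are all assumptions made for the example:
- a 180-class, 1-degree angle discretization;
- a Gaussian window of radius 6 whose width equals the radius;
- the helper names `angle_to_bin`, `circular_smooth_label`, `gray_encode`, and `gray_decode`.

```python
import math

NUM_BINS = 180  # assumption: 180 angle classes, i.e. 1-degree resolution over a 180-degree period


def angle_to_bin(angle_deg, num_bins=NUM_BINS):
    """Map an angle in degrees to a class index; angles 180 degrees apart share a bin."""
    resolution = 180.0 / num_bins
    return math.floor(angle_deg / resolution) % num_bins


def circular_smooth_label(bin_idx, num_bins=NUM_BINS, radius=6):
    """CSL-style target: a Gaussian window centred on bin_idx that wraps around the ends."""
    sigma = radius  # assumption: Gaussian width tied to the window radius
    label = [0.0] * num_bins
    for offset in range(-radius, radius + 1):
        idx = (bin_idx + offset) % num_bins  # circular: bin 0 and bin 179 are neighbours
        label[idx] = math.exp(-(offset ** 2) / (2 * sigma ** 2))
    return label


def gray_encode(bin_idx, num_bins=NUM_BINS):
    """Binary-reflected Gray code of the class index, as a list of bits (MSB first)."""
    num_bits = max(1, math.ceil(math.log2(num_bins)))
    code = bin_idx ^ (bin_idx >> 1)
    return [(code >> (num_bits - 1 - i)) & 1 for i in range(num_bits)]


def gray_decode(bits):
    """Recover the class index from (thresholded) Gray-code bits."""
    code = 0
    for bit in bits:
        code = (code << 1) | bit
    index, shift = code, code >> 1
    while shift:  # index = code ^ (code >> 1) ^ (code >> 2) ^ ...
        index ^= shift
        shift >>= 1
    return index
```

For example, `angle_to_bin(-1.0)` returns 179. Its CSL target therefore peaks at bin 179 and also puts weight on bin 0, so a prediction near the range boundary is not penalized as a large error. With 180 classes, the CSL target is a 180-dimensional vector. The Gray-code target has only ceil(log2 180) = 8 bits, and consecutive class indices differ in exactly one bit. This illustrates why a Gray code is a natural way to shorten the encoding.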

Authors
  • Xue Yang, Shanghai Jiao Tong University, China
  • Junchi Yan (corresponding author), Shanghai Jiao Tong University, China
  • Tao He, COWAROBOT Co., Ltd, China
Approach
OHDet can be applied to both rotation detection and object heading detection. Its design combines several of my previous works, including R3Det, IoU-Smooth L1 Loss, and CSL. The figure below shows the architecture of the proposed detector, with RetinaNet as the embodiment.



Performance

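Performance on the OBB task of the DOTA dataset (class abbreviations: PL plane, BD baseball diamond, BR bridge, GTF ground track field, SV small vehicle, LV large vehicle, SH ship, TC tennis court, BC basketball court, ST storage tank, SBF soccer-ball field, RA roundabout, HA harbor, SP swimming pool, HC helicopter):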

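Method | Backbone | PL | BD | BR | GTF | SV | LV | SH | TC | BC | ST | SBF | RA | HA | SP | HC | mAP
FR-O | ResNet101 | 79.09 | 69.12 | 17.17 | 63.49 | 34.20 | 37.16 | 36.20 | 89.19 | 69.60 | 58.96 | 49.4 | 52.52 | 46.69 | 44.80 | 46.30 | 52.93
IENet | ResNet101 | 80.20 | 64.54 | 39.82 | 32.07 | 49.71 | 65.01 | 52.58 | 81.45 | 44.66 | 78.51 | 46.54 | 56.73 | 64.40 | 64.24 | 36.75 | 57.14
R-DFPN | ResNet101 | 80.92 | 65.82 | 33.77 | 58.94 | 55.77 | 50.94 | 54.78 | 90.33 | 66.34 | 68.66 | 48.73 | 51.76 | 55.10 | 51.32 | 35.88 | 57.94
TOSO | ResNet101 | 80.17 | 65.59 | 39.82 | 39.95 | 49.71 | 65.01 | 53.58 | 81.45 | 44.66 | 78.51 | 48.85 | 56.73 | 64.40 | 64.24 | 36.75 | 57.92
PIoU | DLA-34 | 80.9 | 69.7 | 24.1 | 60.2 | 38.3 | 64.4 | 64.8 | 90.9 | 77.2 | 70.4 | 46.5 | 37.1 | 57.1 | 61.9 | 64.0 | 60.5
R2CNN | ResNet101 | 80.94 | 65.67 | 35.34 | 67.44 | 59.92 | 50.91 | 55.81 | 90.67 | 66.92 | 72.39 | 55.06 | 52.23 | 55.14 | 53.35 | 48.22 | 60.67
RRPN | ResNet101 | 88.52 | 71.20 | 31.66 | 59.30 | 51.85 | 56.19 | 57.25 | 90.81 | 72.84 | 67.38 | 56.69 | 52.84 | 53.08 | 51.94 | 53.58 | 61.01
Axis Learning | ResNet101 | 79.53 | 77.15 | 38.59 | 61.15 | 67.53 | 70.49 | 76.30 | 89.66 | 79.07 | 83.53 | 47.27 | 61.01 | 56.28 | 66.06 | 36.05 | 65.98
ICN | ResNet101 | 81.40 | 74.30 | 47.70 | 70.30 | 64.90 | 67.80 | 70.00 | 90.80 | 79.10 | 78.20 | 53.60 | 62.90 | 67.00 | 64.20 | 50.20 | 68.20
RADet | ResNeXt101 | 79.45 | 76.99 | 48.05 | 65.83 | 65.46 | 74.40 | 68.86 | 89.70 | 78.14 | 74.97 | 49.92 | 64.63 | 66.14 | 71.58 | 62.16 | 69.09
RoI-Transformer | ResNet101 | 88.64 | 78.52 | 43.44 | 75.92 | 68.81 | 73.68 | 83.59 | 90.74 | 77.27 | 81.46 | 58.39 | 53.54 | 62.83 | 58.93 | 47.67 | 69.56
P-RSDet | ResNet101 | 89.02 | 73.65 | 47.33 | 72.03 | 70.58 | 73.71 | 72.76 | 90.82 | 80.12 | 81.32 | 59.45 | 57.87 | 60.79 | 65.21 | 52.59 | 69.82
CAD-Net | ResNet101 | 87.8 | 82.4 | 49.4 | 73.5 | 71.1 | 63.5 | 76.7 | 90.9 | 79.2 | 73.3 | 48.4 | 60.9 | 62.0 | 67.0 | 62.2 | 69.9
O2-DNet | Hourglass104 | 89.31 | 82.14 | 47.33 | 61.21 | 71.32 | 74.03 | 78.62 | 90.76 | 82.23 | 81.36 | 60.93 | 60.17 | 58.21 | 66.98 | 61.03 | 71.04
AOOD | ResNet101 | 89.99 | 81.25 | 44.50 | 73.20 | 68.90 | 60.33 | 66.86 | 90.89 | 80.99 | 86.23 | 64.98 | 63.88 | 65.24 | 68.36 | 62.13 | 71.18
Cascade-FF | ResNet152 | 89.9 | 80.4 | 51.7 | 77.4 | 68.2 | 75.2 | 75.6 | 90.8 | 78.8 | 84.4 | 62.3 | 64.6 | 57.7 | 69.4 | 50.1 | 71.8
BBAVectors | ResNet101 | 88.35 | 79.96 | 50.69 | 62.18 | 78.43 | 78.98 | 87.94 | 90.85 | 83.58 | 84.35 | 54.13 | 60.24 | 65.22 | 64.28 | 55.70 | 72.32
SCRDet | ResNet101 | 89.98 | 80.65 | 52.09 | 68.36 | 68.36 | 60.32 | 72.41 | 90.85 | 87.94 | 86.86 | 65.02 | 66.68 | 66.25 | 68.24 | 65.21 | 72.61
SARD | ResNet101 | 89.93 | 84.11 | 54.19 | 72.04 | 68.41 | 61.18 | 66.00 | 90.82 | 87.79 | 86.59 | 65.65 | 64.04 | 66.68 | 68.84 | 68.03 | 72.95
GLS-Net | ResNet101 | 88.65 | 77.40 | 51.20 | 71.03 | 73.30 | 72.16 | 84.68 | 90.87 | 80.43 | 85.38 | 58.33 | 62.27 | 67.58 | 70.69 | 60.42 | 72.96
DRN | Hourglass104 | 89.71 | 82.34 | 47.22 | 64.10 | 76.22 | 74.43 | 85.84 | 90.57 | 86.18 | 84.89 | 57.65 | 61.93 | 69.30 | 69.63 | 58.48 | 73.23
FADet | ResNet101 | 90.21 | 79.58 | 45.49 | 76.41 | 73.18 | 68.27 | 79.56 | 90.83 | 83.40 | 84.68 | 53.40 | 65.42 | 74.17 | 69.69 | 64.86 | 73.28
MFIAR-Net | ResNet152 | 89.62 | 84.03 | 52.41 | 70.30 | 70.13 | 67.64 | 77.81 | 90.85 | 85.40 | 86.22 | 63.21 | 64.14 | 68.31 | 70.21 | 62.11 | 73.49
R3Det | ResNet152 | 89.24 | 80.81 | 51.11 | 65.62 | 70.67 | 76.03 | 78.32 | 90.83 | 84.89 | 84.42 | 65.10 | 57.18 | 68.10 | 68.98 | 60.88 | 72.81
RSDet | ResNet152 | 90.1 | 82.0 | 53.8 | 68.5 | 70.2 | 78.7 | 73.6 | 91.2 | 87.1 | 84.7 | 64.3 | 68.2 | 66.1 | 69.3 | 63.7 | 74.1
Gliding Vertex | ResNet101 | 89.64 | 85.00 | 52.26 | 77.34 | 73.01 | 73.14 | 86.82 | 90.74 | 79.02 | 86.81 | 59.55 | 70.91 | 72.94 | 70.86 | 57.32 | 75.02
Mask OBB | ResNeXt101 | 89.56 | 85.95 | 54.21 | 72.90 | 76.52 | 74.16 | 85.63 | 89.85 | 83.81 | 86.48 | 54.89 | 69.64 | 73.94 | 69.06 | 63.32 | 75.33
FFA | ResNet101 | 90.1 | 82.7 | 54.2 | 75.2 | 71.0 | 79.9 | 83.5 | 90.7 | 83.9 | 84.6 | 61.2 | 68.0 | 70.7 | 76.0 | 63.7 | 75.7
APE | ResNeXt101 | 89.96 | 83.62 | 53.42 | 76.03 | 74.01 | 77.16 | 79.45 | 90.83 | 87.15 | 84.51 | 67.72 | 60.33 | 74.61 | 71.84 | 65.55 | 75.75
CenterMap OBB | ResNet101 | 89.83 | 84.41 | 54.60 | 70.25 | 77.66 | 78.32 | 87.19 | 90.66 | 84.89 | 85.27 | 56.46 | 69.23 | 74.13 | 71.56 | 66.06 | 76.03
CSL | ResNet152 | 90.25 | 85.53 | 54.64 | 75.31 | 70.44 | 73.51 | 77.62 | 90.84 | 86.15 | 86.69 | 69.60 | 68.04 | 73.83 | 71.10 | 68.93 | 76.17
GCL | ResNet152 | 89.70 | 83.34 | 55.44 | 67.31 | 78.98 | 74.78 | 85.86 | 90.82 | 85.56 | 85.33 | 65.56 | 61.52 | 72.30 | 78.11 | 68.91 | 76.23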


We divide the training and validation images into 600x600 subimages with an overlap of 150 pixels and scale them to 800x800. When cropping the images with the sliding window, we keep only the objects whose center point lies inside the subimage. All experiments use the same setting, with ResNet101 as the backbone. Apart from data augmentation (random horizontal and vertical flipping, random graying, and random rotation), which is used on OHD-SJTU-S, no other tricks are used.
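The cropping procedure above can be sketched as below. This is a hypothetical Python illustration, not the released preprocessing script. The following details are all assumptions:
- the `(cx, cy, w, h, theta)` box format;
- the helper names;
- shifting the last window so it ends flush with the image border;
- padding images smaller than a tile up to tile size.

Because the rescaling from 600 to 800 is uniform, box centers and sizes scale by 4/3 while the angle is unchanged.

```python
def tile_origins(length, tile=600, overlap=150):
    """Offsets of sliding windows along one axis; stride = tile - overlap = 450."""
    if length <= tile:
        return [0]  # assumption: smaller images are padded up to one tile
    stride = tile - overlap
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)  # assumption: last window ends flush with the border
    return origins


def crop_objects(boxes, x0, y0, tile=600):
    """Keep objects whose center lies inside the tile, in tile-local coordinates."""
    kept = []
    for cx, cy, w, h, theta in boxes:
        if x0 <= cx < x0 + tile and y0 <= cy < y0 + tile:
            kept.append((cx - x0, cy - y0, w, h, theta))
    return kept


def scale_objects(boxes, src=600, dst=800):
    """Uniform rescale of the tile: positions and sizes scale, angles do not."""
    s = dst / src
    return [(cx * s, cy * s, w * s, h * s, theta) for cx, cy, w, h, theta in boxes]


def make_tiles(img_w, img_h, boxes, tile=600, overlap=150, out_size=800):
    """Return a list of ((x0, y0), objects) pairs, one per sliding-window tile."""
    tiles = []
    for y0 in tile_origins(img_h, tile, overlap):
        for x0 in tile_origins(img_w, tile, overlap):
            objs = crop_objects(boxes, x0, y0, tile)
            tiles.append(((x0, y0), scale_objects(objs, tile, out_size)))
    return tiles
```

For a 1500-pixel-wide image, `tile_origins(1500)` gives windows starting at 0, 450, and 900. An object whose center falls in the 150-pixel overlap is kept in both neighbouring tiles.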

Performance on the OBB task of OHD-SJTU-L:

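Method | PL | SH | SV | LV | HA | HC | AP50 | AP75 | AP50:95
R2CNN | 89.99 | 71.93 | 54.00 | 65.46 | 66.36 | 55.94 | 67.28 | 32.69 | 34.78
RRPN | 89.66 | 75.35 | 50.25 | 72.22 | 62.99 | 45.26 | 65.96 | 21.24 | 30.13
RetinaNet-H | 90.20 | 66.99 | 53.58 | 63.38 | 63.75 | 53.82 | 65.29 | 34.59 | 35.39
RetinaNet-R | 89.99 | 77.65 | 51.77 | 81.22 | 62.85 | 52.25 | 69.29 | 39.07 | 38.90
R3Det | 89.89 | 78.36 | 55.23 | 78.35 | 57.06 | 53.50 | 68.73 | 35.36 | 37.10
OHDet | 89.72 | 77.40 | 52.89 | 78.72 | 63.76 | 54.62 | 69.52 | 41.89 | 39.51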


Performance on the OBB task of OHD-SJTU-S:

Method | PL | SH | AP50 | AP75 | AP50:95
R2CNN | 90.91 | 77.66 | 84.28 | 55.00 | 52.80
RRPN | 90.14 | 76.13 | 83.13 | 27.87 | 40.74
RetinaNet-H | 90.86 | 66.32 | 78.59 | 58.45 | 53.07
RetinaNet-R | 90.82 | 88.14 | 89.48 | 74.62 | 61.86
R3Det | 90.82 | 85.59 | 88.21 | 67.13 | 56.19
OHDet | 90.74 | 87.59 | 89.06 | 78.55 | 63.94


The performance of object heading detection on OHD-SJTU-L:

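Task | PL | SH | SV | LV | HA | HC | IoU50 | IoU75 | IoU50:95
OBB mAP | 89.63 | 75.88 | 46.21 | 75.88 | 61.43 | 33.87 | 63.88 | 37.45 | 36.42
OHD mAP | 59.88 | 41.90 | 26.21 | 35.34 | 41.24 | 17.53 | 37.02 | 24.10 | 22.46
Head Accuracy | 74.49 | 69.71 | 62.21 | 57.95 | 76.66 | 49.06 | 65.01 | 65.77 | 64.60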


The performance of object heading detection on OHD-SJTU-S:

Task | PL | SH | IoU50 | IoU75 | IoU50:95
OBB mAP | 90.73 | 88.59 | 89.66 | 75.62 | 61.49
OHD mAP | 76.89 | 86.40 | 81.65 | 65.51 | 55.09
Head Accuracy | 90.91 | 94.87 | 92.89 | 93.81 | 94.25

Visualization
Detection examples of our proposed method on large scenes from the OHD-SJTU dataset. Our method effectively handles both densely packed and rotated objects. The blue edge of each bounding box marks the predicted head of the object.