Magari yanayojiendesha yenyewe hayawezi kumudu makosa. Kukosa taa ya trafiki au mtembea kwa miguu kunaweza kusababisha maafa. Lakini kugundua kitu katika mazingira ya mijini yenye nguvu? Hiyo ni ngumu.
Nilifanya kazi katika kuboresha utambuzi wa kitu kwa magari yanayojiendesha kwa kutumia Atrous Spatial Pyramid Pooling (ASPP) na Transfer Learning. Matokeo? Muundo unaotambua vitu katika mizani nyingi, hata katika mwanga mbaya, na huendeshwa kwa ufanisi katika muda halisi.
Hivi ndivyo nilivyofanya.
Magari yanayojiendesha hutegemea Convolutional Neural Networks (CNNs) kugundua vitu, lakini hali halisi ya ulimwengu huleta changamoto:
CNN za kitamaduni hupambana na utambuzi wa vitu kwa viwango vingi na mafunzo kutoka mwanzo huchukua milele. Hapo ndipo ASPP na Transfer Learning huja.
CNN hufanya kazi vizuri kwa vitu vya ukubwa usiobadilika lakini vitu vya ulimwengu halisi hutofautiana kwa ukubwa na umbali. Ukusanyaji wa Pyramid Spatial Atrous (ASPP) hutatua hili kwa kutumia mipasuko iliyopanuliwa ili kunasa vipengele katika mizani nyingi.
ASPP hutumia vichujio vingi vya ubadilishaji vilivyo na viwango tofauti vya upanuzi ili kutoa vipengele katika viwango tofauti, vitu vidogo, vitu vikubwa na kila kitu kilicho katikati.
Hivi ndivyo nilivyotekeleza ASPP katika PyTorch nikijumuisha urekebishaji wa kikundi na umakini kwa utendaji thabiti katika mazingira changamano:
import torch import torch.nn as nn import torch.nn.functional as F class ASPP(nn.Module): """ A more advanced ASPP with optional attention and group normalization. """ def __init__(self, in_channels, out_channels, dilation_rates=(6,12,18), groups=8): super(ASPP, self).__init__() self.aspp_branches = nn.ModuleList() #1x1 Conv branch self.aspp_branches.append( nn.Sequential( nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0, bias=False), nn.GroupNorm(groups, out_channels), nn.ReLU(inplace=True) ) ) for rate in dilation_rates: self.aspp_branches.append( nn.Sequential( nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=rate, dilation=rate, bias=False), nn.GroupNorm(groups, out_channels), nn.ReLU(inplace=True) ) ) #Global average pooling branch self.global_pool = nn.AdaptiveAvgPool2d((1, 1)) self.global_conv = nn.Sequential( nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, bias=False), nn.GroupNorm(groups, out_channels), nn.ReLU(inplace=True) ) #Attention mechanism to refine the concatenated features self.attention = nn.Sequential( nn.Conv2d(out_channels*(len(dilation_rates)+2), out_channels, kernel_size =1, bias=False), nn.Sigmoid() ) self.project = nn.Sequential( nn.Conv2d(out_channels*(len(dilation_rates)+2), out_channels, kernel_size=1, bias=False), nn.GroupNorm(groups, out_channels), nn.ReLU(inplace=True) ) def forward(self, x): cat_feats = [] for branch in self.aspp_branches: cat_feats.append(branch(x)) g_feat = self.global_pool(x) g_feat = self.global_conv(g_feat) g_feat = F.interpolate(g_feat, size=x.shape[2:], mode='bilinear', align_corners=False) cat_feats.append(g_feat) #Concatenate along channels x_cat = torch.cat(cat_feats, dim=1) #channel-wise attention att_map = self.attention(x_cat) x_cat = x_cat * att_map out = self.project(x_cat) return out
Kufunza muundo wa utambuzi wa kitu kutoka mwanzo hakutoi manufaa mengi wakati kuna miundo iliyofunzwa mapema. Mafunzo ya kuhamisha huturuhusu kurekebisha muundo ambao tayari unaelewa vitu.
Nilitumia DETR (Detection Transformer), modeli ya kugundua kitu chenye msingi wa kibadilishaji kutoka kwa Facebook AI. Hujifunza muktadha—kwa hivyo haipati tu alama ya kusimama, inaelewa kuwa ni sehemu ya eneo la barabarani.
Hivi ndivyo nilivyorekebisha DETR kwenye hifadhidata za kujiendesha:
import torch import torch.nn as nn from transformers import DetrConfig, DetrForObjectDetection class CustomBackbone(nn.Module): def __init__(self, in_channels=3, hidden_dim=256): super(CustomBackbone, self).__init__() # Example: basic conv layers + ASPP self.initial_conv = nn.Sequential( nn.Conv2d(in_channels, 64, kernel_size=7, stride=2, padding=3, bias=False), nn.BatchNorm2d(64), nn.ReLU(inplace=True), nn.MaxPool2d(kernel_size=3, stride=2, padding=1) ) self.aspp = ASPP(in_channels=64, out_channels=hidden_dim) def forward(self, x): x = self.initial_conv(x) x = self.aspp(x) return x class DETRWithASPP(nn.Module): def __init__(self, num_classes=91): super(DETRWithASPP, self).__init__() self.backbone = CustomBackbone() config = DetrConfig.from_pretrained("facebook/detr-resnet-50") config.num_labels = num_classes self.detr = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50", config=config) self.detr.model.backbone.body = nn.Identity() def forward(self, images, pixel_masks=None): features = self.backbone(images) feature_dict = { "0": features } outputs = self.detr.model(inputs_embeds=None, pixel_values=None, pixel_mask=pixel_masks, features=feature_dict, output_attentions=False) return outputs model = DETRWithASPP(num_classes=10) images = torch.randn(2, 3, 512, 512) outputs = model(images)
Magari yanayojiendesha yanahitaji hifadhidata kubwa, lakini data iliyo na lebo ya ulimwengu halisi ni chache. kurekebisha? Tengeneza data sanisi kwa kutumia GANs (Generative Adversarial Networks).
Nilitumia GAN kuunda alama za njia bandia lakini za kweli na matukio ya trafiki ili kupanua hifadhidata .
Hapa kuna GAN rahisi kwa kizazi cha kuashiria njia:
import torch import torch.nn as nn import torch.nn.functional as F class LaneMarkingGenerator(nn.Module): """ A DCGAN-style generator designed for producing synthetic lane or road-like images. Input is a latent vector (noise), and the output is a (1 x 64 x 64) grayscale image. You can adjust channels, resolution, and layers to match your target data. """ def __init__(self, z_dim=100, feature_maps=64): super(LaneMarkingGenerator, self).__init__() self.net = nn.Sequential( #Z latent vector of shape (z_dim, 1, 1) nn.utils.spectral_norm(nn.ConvTranspose2d(z_dim, feature_maps * 8, 4, 1, 0, bias=False)), nn.BatchNorm2d(feature_maps * 8), nn.ReLU(True), #(feature_maps * 8) x 4 x 4 nn.utils.spectral_norm(nn.ConvTranspose2d(feature_maps * 8, feature_maps * 4, 4, 2, 1, bias=False)), nn.BatchNorm2d(feature_maps * 4), nn.ReLU(True), #(feature_maps * 4) x 8 x 8 nn.utils.spectral_norm(nn.ConvTranspose2d(feature_maps * 4, feature_maps * 2, 4, 2, 1, bias=False)), nn.BatchNorm2d(feature_maps * 2), nn.ReLU(True), #(feature_maps * 2) x 16 x 16 nn.utils.spectral_norm(nn.ConvTranspose2d(feature_maps * 2, feature_maps, 4, 2, 1, bias=False)), nn.BatchNorm2d(feature_maps), nn.ReLU(True), #(feature_maps) x 32 x 32 nn.utils.spectral_norm(nn.ConvTranspose2d(feature_maps, 1, 4, 2, 1, bias=False)), nn.Tanh() ) def forward(self, z): return self.net(z) class LaneMarkingDiscriminator(nn.Module): """ A DCGAN-style discriminator. It takes a (1 x 64 x 64) image and attempts to classify whether it's real or generated (fake). """ def __init__(self, feature_maps=64): super(LaneMarkingDiscriminator, self).__init__() self.net = nn.Sequential( #1x 64 x 64 nn.utils.spectral_norm(nn.Conv2d(1, feature_maps, 4, 2, 1, bias=False)), nn.LeakyReLU(0.2, inplace=True), #(feature_maps) x 32 x 32 nn.utils.spectral_norm(nn.Conv2d(feature_maps, feature_maps * 2, 4, 2, 1, bias=False)), nn.BatchNorm2d(feature_maps * 2), nn.LeakyReLU(0.2, inplace=True), #(feature_maps * 2) x 16 x 16 nn.utils.spectral_norm(nn.Conv2d(feature_maps * 2, feature_maps * 4, 4, 2, 1, bias=False)), nn.BatchNorm2d(feature_maps * 4), nn.LeakyReLU(0.2, inplace=True), #(feature_maps * 4) x 8 x 8 nn.utils.spectral_norm(nn.Conv2d(feature_maps * 4, feature_maps * 8, 4, 2, 1, bias=False)), nn.BatchNorm2d(feature_maps * 8), nn.LeakyReLU(0.2, inplace=True), #(feature_maps * 8) x 4 x 4 nn.utils.spectral_norm(nn.Conv2d(feature_maps * 8, 1, 4, 1, 0, bias=False)), ) def forward(self, x): return self.net(x).view(-1)
Kwa kuchanganya ASPP, Transfer Learning, na Data Synthetic , nilitengeneza mfumo sahihi zaidi wa kugundua vitu kwa magari yanayojiendesha. Baadhi ya matokeo muhimu ni:
Tuliunganisha ASPP, Transfoma, na Data Sinifu kuwa tishio mara tatu kwa utambuzi wa kitu kinachojitegemea—kugeuza miundo iliyokuwa ya uvivu, isiyo na macho kuwa mifumo ya ufahamu ambayo hutambua mwanga wa trafiki kutoka umbali fulani. Kwa kukumbatia mazungumzo yaliyopanuliwa kwa maelezo ya viwango vingi, uhamishaji wa mafunzo kwa urekebishaji wa haraka, na data inayozalishwa na GAN ili kujaza kila pengo, tunapunguza muda wa makisio karibu nusu na kuokoa saa za mafunzo. Ni hatua kubwa kuelekea magari ambayo yanaona ulimwengu zaidi kama tunavyofanya kwa kasi zaidi, kwa usahihi zaidi, na kuelekea kwenye mitaa yetu yenye machafuko mengi kwa kujiamini.