Change Data Capture (CDC) is a technique for tracking row-level changes made by database operations (inserts, updates, deletes) and notifying other systems of them in the order the events occurred. In disaster recovery scenarios, CDC is mainly used to synchronize data between a primary database and a backup database, enabling real-time data syncing from the primary to the secondary.
source ----------> CDC ----------> sink
SeaTunnel CDC provides two types of data synchronization: snapshot read, which reads a table's historical data, and incremental tracking, which reads a table's changes from the database log.
The lock-free snapshot synchronization phase is emphasized because many existing CDC platforms, such as Debezium, may lock tables during historical data synchronization. Snapshot read is the process of synchronizing a database's historical data. The basic flow of this process is as follows:
    storage -------------> splitEnumerator ---------- split ----------> reader
                                  ^                                        |
                                  |                                        |
                                  \----------------- report --------------/
Split Partitioning
The splitEnumerator (split distributor) partitions the table data into multiple splits based on specified fields (such as a table ID or unique keys) and a defined step size.
Parallel Processing
Each split is assigned to a different reader so that splits are read in parallel. A single reader occupies one connection.
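The partitioning and parallel read described above can be sketched roughly as follows. This is a minimal illustration, not SeaTunnel's implementation: it assumes an integer split key, a toy in-memory table in place of a database, and the hypothetical names `SnapshotSplit`, `make_splits` and `read_split`.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotSplit:
    split_id: str
    table_id: str
    split_start: int  # inclusive start of the key range
    split_end: int    # exclusive end of the key range


def make_splits(table_id, min_key, max_key, step):
    """Cut the key range [min_key, max_key] into chunks of at most `step` keys."""
    if step <= 0:
        raise ValueError("step must be positive")
    splits = []
    start = min_key
    while start <= max_key:
        end = min(start + step, max_key + 1)
        splits.append(SnapshotSplit(f"{table_id}:{len(splits)}", table_id, start, end))
        start = end
    return splits


def read_split(table, split):
    """Stand-in for the SQL a reader would issue (WHERE id >= start AND id < end)."""
    return [row for key, row in sorted(table.items())
            if split.split_start <= key < split.split_end]


# Toy table with primary keys 1..10 (assumption: the real source is a database).
table = {i: f"row-{i}" for i in range(1, 11)}
splits = make_splits("db.orders", 1, 10, step=4)

# One worker per split, mirroring "one reader, one connection".
with ThreadPoolExecutor(max_workers=len(splits)) as pool:
    chunks = list(pool.map(lambda s: read_split(table, s), splits))
```

With a step of 4 the ten keys fall into three splits, and concatenating the chunks in split order gives back every row exactly once.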
Event Feedback
After finishing the read of a split, each reader reports its progress back to the splitEnumerator. The metadata for a split is as follows:
    String splitId                   # Routing ID
    TableId tableId                  # Table ID
    SeatunnelRowType splitKeyType    # The type of field used for partitioning
    Object splitStart                # Start point of the partition
    Object splitEnd                  # End point of the partition
Once a reader receives the split information, it generates the appropriate SQL statements. Before it starts reading, it records the position in the database log that corresponds to the current split. After completing the current split, the reader reports its progress to the splitEnumerator with the following data:
    String splitId          # Split ID
    Offset highWatermark    # Log position corresponding to the split, for future validation
The incremental synchronization phase begins after the snapshot read phase. In this phase, any change that occurs in the source database is captured and synchronized to the backup database in real time. This phase listens to the database log (for example, the MySQL binlog). Incremental tracking is usually single-threaded, to avoid pulling the binlog more than once and to reduce load on the database. Therefore only one reader is used, occupying a single connection.
    data log -------------> splitEnumerator ---------- split ----------> reader
                                  ^                                        |
                                  |                                        |
                                  \----------------- report --------------/
In the incremental synchronization phase, all splits and tables from the snapshot phase are combined into a single split. The split metadata during this phase is as follows:
    String splitId
    Offset startingOffset                    # The lowest log start position among all splits
    Offset endingOffset                      # Log end position, or "continuous" if ongoing, e.g., in the incremental phase
    List<TableId> tableIds
    Map<TableId, Offset> tableWatermarks     # Watermark for all splits
    List<CompletedSnapshotSplitInfo> completedSnapshotSplitInfos    # Snapshot phase split details
The fields of CompletedSnapshotSplitInfo are as follows:
    String splitId
    TableId tableId
    SeatunnelRowType splitKeyType
    Object splitStart
    Object splitEnd
    Offset watermark    # Corresponds to the highWatermark in the report
The split in the incremental phase carries the watermarks of all splits from the snapshot phase. The minimum watermark is selected as the starting point for incremental synchronization.
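As a rough sketch of how the single incremental split could be assembled from the snapshot reports, the snippet below mirrors the field names listed above. It rests on several assumptions that are not taken from SeaTunnel: offsets are plain integers, `None` stands in for a "continuous" end offset, each table's watermark is taken to be the highest watermark among its splits, and `build_incremental_split` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CompletedSnapshotSplitInfo:
    split_id: str
    table_id: str
    split_start: int
    split_end: int
    watermark: int  # high watermark reported by the reader


@dataclass
class IncrementalSplit:
    split_id: str
    starting_offset: int
    ending_offset: Optional[int]  # None stands in for "continuous"
    table_ids: List[str]
    table_watermarks: Dict[str, int]
    completed_snapshot_split_infos: List[CompletedSnapshotSplitInfo]


def build_incremental_split(infos):
    """Merge all completed snapshot splits into one incremental split."""
    if not infos:
        raise ValueError("at least one completed snapshot split is required")
    table_watermarks = {}
    for info in infos:
        # Assumption: a table's watermark is the highest watermark of its splits.
        previous = table_watermarks.get(info.table_id, info.watermark)
        table_watermarks[info.table_id] = max(previous, info.watermark)
    return IncrementalSplit(
        split_id="incremental-0",
        # The minimum watermark is where log reading starts.
        starting_offset=min(info.watermark for info in infos),
        ending_offset=None,
        table_ids=sorted(table_watermarks),
        table_watermarks=table_watermarks,
        completed_snapshot_split_infos=list(infos),
    )


infos = [
    CompletedSnapshotSplitInfo("orders:0", "db.orders", 1, 100, 120),
    CompletedSnapshotSplitInfo("orders:1", "db.orders", 100, 200, 105),
    CompletedSnapshotSplitInfo("users:0", "db.users", 1, 50, 130),
]
incremental = build_incremental_split(infos)
```

Starting from the smallest watermark means no change recorded after any split's read can be skipped, at the cost of re-reading some log entries that a later filter has to discard.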
Whether in the snapshot read phase or the incremental read phase, the database may keep changing while it is being synchronized. How do we guarantee exactly-once delivery?
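In the snapshot read phase, for example, a split is being synchronized while changes are occurring, such as inserting row k3, updating k2, and deleting k1. If nothing ties the read task to a position in the log during the read, these updates can be lost. SeaTunnel handles this by recording the log position before reading the split (the low watermark), reading the data in the range split{start, end}, and recording the log position again after the read (the high watermark).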
If high = low, the split's data did not change during the read. If (high - low) > 0, changes occurred while the split was being processed. In that case, SeaTunnel keeps the split's data in an in-memory table and applies the changes from low watermark to high watermark in order, using primary keys to replay the operations on the in-memory table.
                 insert k3      update k2      delete k1
                     |              |              |
                     v              v              v
    bin log --|---------------------------------------------------|-- log offset
        low watermark                                      high watermark

    CDC reads:  k1 k3 k4
                    |  Replays
                    v
    Real data:  k2 k3' k4
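The replay between the two watermarks can be illustrated with a small sketch. It is a simplified model, not SeaTunnel's code: offsets are integers, the log is a Python list, and `LogEntry` and `replay` are hypothetical names. The example data is chosen for illustration and is not meant to reproduce the diagram above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    offset: int
    op: str  # "insert", "update" or "delete"
    key: str
    value: Optional[str] = None


def replay(snapshot_rows, log, low, high):
    """Apply log entries with low < offset <= high to an in-memory copy of the split."""
    table = dict(snapshot_rows)  # in-memory table keyed by primary key
    if high == low:
        return table  # nothing changed while the split was read
    for entry in sorted(log, key=lambda e: e.offset):
        if not (low < entry.offset <= high):
            continue
        if entry.op == "delete":
            table.pop(entry.key, None)       # tolerate a row the snapshot never saw
        elif entry.op in ("insert", "update"):
            table[entry.key] = entry.value   # upsert by primary key
        else:
            raise ValueError(f"unknown operation: {entry.op}")
    return table


snapshot_rows = {"k1": "a", "k2": "b", "k4": "d"}
log = [
    LogEntry(101, "insert", "k3", "c"),
    LogEntry(102, "update", "k2", "b'"),
    LogEntry(103, "delete", "k1"),
    LogEntry(104, "update", "k4", "d'"),  # after the high watermark, left for later
]
result = replay(snapshot_rows, log, low=100, high=103)
unchanged = replay(snapshot_rows, log, low=100, high=100)
```

Treating inserts and updates as upserts and ignoring deletes of missing keys keeps the replay idempotent, which matters because the snapshot query may already reflect some of the changes made between the two watermarks.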
Before starting the incremental phase, SeaTunnel first validates all splits from the previous phase. Data can be updated between splits; for example, if new records are inserted between split1 and split2, they can be missed during the snapshot phase. To recover this data between splits, SeaTunnel starts reading the log from the smallest watermark and, for each log entry it reads, checks completedSnapshotSplitInfos to see whether the data has already been processed in any split. If it has not, the entry is considered data between splits and must be applied.
               |------------filter split2-----------------|
               |----filter split1------|
    data log -|-----------------------|------------------|----------------------------------|- log offset
         min watermark         split1 watermark   split2 watermark                    max watermark
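One way to picture the per-entry check is the sketch below. The rule it encodes is an assumption made for illustration: a log entry counts as already processed when its key falls inside a completed split's range and its offset is at or below that split's watermark. Integer keys and offsets, and the helper name `needs_apply`, are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompletedSnapshotSplitInfo:
    split_id: str
    table_id: str
    split_start: int  # inclusive
    split_end: int    # exclusive
    watermark: int


def needs_apply(table_id, key, offset, infos):
    """Return True if the log entry was not covered by any snapshot split."""
    for info in infos:
        in_range = info.table_id == table_id and info.split_start <= key < info.split_end
        if in_range:
            # Covered by this split only up to the split's own watermark.
            return offset > info.watermark
    return True  # key belongs to no completed split: data between splits


infos = [
    CompletedSnapshotSplitInfo("split1", "db.orders", 1, 100, 120),
    CompletedSnapshotSplitInfo("split2", "db.orders", 100, 200, 150),
]

decisions = [
    needs_apply("db.orders", 50, 110, infos),   # inside split1, before its watermark
    needs_apply("db.orders", 50, 130, infos),   # inside split1, after its watermark
    needs_apply("db.orders", 150, 140, infos),  # inside split2, before its watermark
    needs_apply("db.orders", 250, 110, infos),  # outside every split
]
```

Entries the snapshot already covers are filtered out, while later changes and keys that no split ever read are passed on to the sink.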
What about pausing and resuming CDC? SeaTunnel uses a distributed snapshot algorithm (Chandy-Lamport):
Assume the system has two processes, p1 and p2, where p1 has three variables X1 Y1 Z1 and p2 has three variables X2 Y2 Z2. The initial state is as follows:
    p1        p2
    X1:0      X2:4
    Y1:0      Y2:2
    Z1:0      Z2:3
At this point, p1 initiates a global snapshot. p1 first records its own state, then sends a marker to p2.
Before the marker reaches p2, p2 sends a message M to p1.
    p1                                p2
    X1:0    -------marker------->     X2:4
    Y1:0    <---------M----------     Y2:2
    Z1:0                              Z2:3
On receiving the marker, p2 records its state, and p1 receives the message M. Since p1 has already taken its local snapshot, it only needs to log the message M. The final snapshot looks like this:
    p1        M        p2
    X1:0               X2:4
    Y1:0               Y2:2
    Z1:0               Z2:3
In SeaTunnel CDC, markers are sent to all readers, split enumerators, writers, and other nodes, and each of them saves its own in-memory state.
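The two-process walkthrough above can be reproduced with a small deterministic simulation. It is a teaching sketch of the Chandy-Lamport marker rule only, not SeaTunnel's checkpoint code; the `Process` and `Network` classes and the FIFO channels built on `deque` are assumptions made for illustration.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, name, state, peers):
        self.name = name
        self.state = dict(state)
        self.peers = list(peers)
        self.local_snapshot = None  # recorded local state
        self.in_flight = {}         # sender -> messages recorded on that channel
        self._open = set()          # incoming channels still being recorded

    def _record(self, send):
        """Record local state, start recording every incoming channel, send markers."""
        self.local_snapshot = dict(self.state)
        self._open = set(self.peers)
        self.in_flight = {peer: [] for peer in self.peers}
        for peer in self.peers:
            send(self.name, peer, MARKER)

    def start_snapshot(self, send):
        self._record(send)

    def receive(self, sender, message, send):
        if message == MARKER:
            if self.local_snapshot is None:
                self._record(send)
            self._open.discard(sender)  # stop recording the channel the marker came on
        elif sender in self._open:
            self.in_flight[sender].append(message)  # message was in flight at snapshot time


class Network:
    def __init__(self):
        self.processes = {}
        self.channels = {}  # (source, destination) -> FIFO queue

    def add(self, process):
        self.processes[process.name] = process

    def send(self, source, destination, message):
        self.channels.setdefault((source, destination), deque()).append(message)

    def deliver(self, source, destination):
        message = self.channels[(source, destination)].popleft()
        self.processes[destination].receive(source, message, self.send)


net = Network()
p1 = Process("p1", {"X1": 0, "Y1": 0, "Z1": 0}, peers=["p2"])
p2 = Process("p2", {"X2": 4, "Y2": 2, "Z2": 3}, peers=["p1"])
net.add(p1)
net.add(p2)

p1.start_snapshot(net.send)  # p1 records its state and sends the marker
net.send("p2", "p1", "M")    # p2 sends M before the marker arrives
net.deliver("p1", "p2")      # p2 gets the marker, records state, replies with a marker
net.deliver("p2", "p1")      # p1 gets M and logs it as in-flight
net.deliver("p2", "p1")      # p1 gets p2's marker and stops recording
```

After the run, the global snapshot consists of both recorded local states plus the message M logged on the channel from p2 to p1, which matches the final picture in the walkthrough.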