At my previous company, I developed a batch job that tracked metrics across social media, such as Twitter, LinkedIn, Mastodon, Bluesky, Reddit, etc. Then I realized I could replicate it for my own "persona". The problem is that some media don't provide an HTTP API for the metrics I want. Here are the metrics I want on LinkedIn:
I searched for a long time but found no API access to the above metrics. For a long while, I scraped the metrics by hand every morning, and I finally decided to automate this tedious task. Here's what I learned.
The job is written in Python, so I want to stay in the same tech stack. After some quick research, I found Playwright, a browser automation tool with several language APIs, including Python. Playwright's primary use case is end-to-end testing, but it can also drive the browser outside a testing context.
I'm using Poetry to manage dependencies. Installing Playwright is as easy as:
poetry add playwright
At this point, Playwright is ready to use. It offers two distinct APIs, one synchronous and one asynchronous. For my use case, the first flavor is more than enough.
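For comparison, the asynchronous flavor looks roughly like the sketch below. It is not what the job uses: the fetch_title helper and the URL in the usage comment are made up for illustration, and Playwright is imported inside the function only so that the module still loads where Playwright is not installed.

```python
import asyncio


async def fetch_title(url: str) -> str:
    """Open `url` in headless Chromium and return the page title."""
    # Imported lazily: an illustration choice, not something Playwright requires.
    from playwright.async_api import async_playwright

    async with async_playwright() as pw:
        browser = await pw.chromium.launch()
        page = await browser.new_page()
        await page.goto(url)
        title = await page.title()
        await browser.close()
        return title


# Usage (needs Playwright and a Chromium install):
#     print(asyncio.run(fetch_title('https://example.com')))
```

Every call needs an await and the whole thing has to run in an event loop, which only pays off when several pages are driven concurrently.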
I like to approach development incrementally.
Here's an excerpt of the API:
It translates into the following code:
from locale import atoi  # assumption: the source does not show where atoi comes from
from os import getenv
from typing import List

from playwright.sync_api import Browser, Locator, Page, sync_playwright

with sync_playwright() as pw:                                                        #1
    browser: Browser = pw.chromium.launch()                                          #2
    page: Page = browser.new_page()                                                  #3
    page.goto('https://www.linkedin.com/login')                                      #4
    page.locator('#username').press_sequentially(getenv('LINKEDIN_USERNAME'))        #5
    page.locator('#password').press_sequentially(getenv('LINKEDIN_PASSWORD'))        #5
    page.locator('button[type=submit]').press('Enter')                               #6
    page.goto('https://www.linkedin.com/dashboard/')                                 #4
    metrics_container: Locator = page.locator('.pcd-analytic-view-items-container')
    metrics: List[Locator] = metrics_container.locator('p.text-body-large-bold').all()  #7
    impressions = atoi(metrics[0].inner_text())                                      #8
    # Get other metrics
    browser.close()                                                                  #9
1. Get a playwright object.
2. Launch a browser instance. Multiple browser types are available; I chose Chromium on a whim. Note that you must have installed the specific browser beforehand, e.g., playwright install --with-deps chromium. By default, the browser opens headless; it doesn't show up. I'd advise running it visibly at the beginning to make debugging easier: headless = False.
3. Open a new browser window.
4. Navigate to a new location.
5. Locate the specified input fields and fill them in with my credentials.
6. Locate the specified button and press it.
7. Locate all the specified elements.
8. Get the inner text of the first element.
9. Close the browser to clean up.
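Step 8 turns the displayed text into a number. If atoi here is locale.atoi, it only strips the thousands separator that the active locale defines, so a value displayed as 1,234 can fail to parse under the default C locale. A minimal sketch of a locale-independent parser follows; parse_metric is a hypothetical helper, and the accepted formats are an assumption about what the dashboard displays.

```python
import re


def parse_metric(text: str) -> int:
    """Convert a displayed metric such as '1,234' to an int.

    Assumption: the page shows plain integers, optionally grouped with
    commas, periods, or (non-breaking) spaces, and never abbreviations
    such as '1.2K'.
    """
    # \s also matches non-breaking spaces in Python 3 str patterns.
    digits = re.sub(r"[,.\s]", "", text)
    if not digits.isdecimal():
        raise ValueError(f"Unexpected metric format: {text!r}")
    return int(digits)
```

With this in place, the line for step 8 would read impressions = parse_metric(metrics[0].inner_text()). Failing loudly on an unexpected format seems safer than silently recording a wrong number.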
The above works as expected. The only issue is that I receive an email from LinkedIn every time I run the script:
Hi Nicolas,
You've successfully activated Remember me on a new device HeadlessChrome, <OS> in <city>, <region>, <country>. Learn more about how Remember me works on a device.
I also met Fabien Vauchelles at the JavaCro conference. He specializes in web scraping and told me that most people in this field leverage browser profiles. Indeed, once you log in to LinkedIn, you get an authentication token stored as cookies, and you don't need to authenticate again until it expires. Fortunately, Playwright offers such a feature via its launch_persistent_context method.
We can replace the launch above with the following:
from logging import getLogger
from os import getenv
from pathlib import Path

from playwright.sync_api import BrowserContext, Locator, Page, sync_playwright

logger = getLogger(__name__)  # assumption: the source does not show the logger setup

with sync_playwright() as pw:
    playwright_profile_dir = f'{Path.home()}/.social-metrics/playwright-profile'
    context: BrowserContext = pw.chromium.launch_persistent_context(playwright_profile_dir)  #1
    try:                                                                             #2
        page: Page = context.new_page()                                              #3
        page.goto('https://www.linkedin.com/dashboard/')                             #4
        if 'session_redirect' in page.url:                                           #4
            page.locator('#username').press_sequentially(getenv('LINKEDIN_USERNAME'))
            page.locator('#password').press_sequentially(getenv('LINKEDIN_PASSWORD'))
            page.locator('button[type=submit]').press('Enter')
            page.goto('https://www.linkedin.com/dashboard/')
        metrics_container: Locator = page.locator('.pcd-analytic-view-items-container')
        # Same as in the previous snippet
    except Exception as e:                                                           #2
        logger.error(f'Could not fetch metrics: {e}')
    finally:                                                                         #5
        context.close()
1. Playwright stores the profile in the specified folder and reuses it across runs.
2. Improve exception handling.
3. A BrowserContext can open pages as well.
4. We try to navigate to the dashboard. LinkedIn redirects us to the login page if we are not authenticated; we can then authenticate.
5. Close the context whatever the outcome.
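The profile folder and the headless setting can be computed in one place and passed to launch_persistent_context as keyword arguments. The sketch below is one way to do it, not the job's actual code: persistent_context_options and the SOCIAL_METRICS_DEBUG environment variable are hypothetical names, and the folder layout simply mirrors the path used above.

```python
import os
from pathlib import Path
from typing import Optional


def persistent_context_options(home: Optional[Path] = None) -> dict:
    """Build keyword arguments for launch_persistent_context.

    SOCIAL_METRICS_DEBUG is a hypothetical switch: set it to "1" to watch
    the browser window while debugging.
    """
    base = home if home is not None else Path.home()
    profile_dir = base / ".social-metrics" / "playwright-profile"
    # Creating the folder up front is harmless even if the browser would create it.
    profile_dir.mkdir(parents=True, exist_ok=True)
    debug = os.getenv("SOCIAL_METRICS_DEBUG") == "1"
    return {"user_data_dir": str(profile_dir), "headless": not debug}


# Usage inside the `with sync_playwright() as pw:` block:
#     context = pw.chromium.launch_persistent_context(**persistent_context_options())
```

Building the path with pathlib instead of an f-string keeps it portable across operating systems.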
At this point, we need to authenticate with both credentials only on the first run. On subsequent runs, it depends.
I was surprised to see that the above code didn't work reliably. It worked on the first run and only sometimes on subsequent ones. Because I'm storing the browser profile across runs, when I do need to authenticate again, LinkedIn only asks for the password, not the login! Because the code tries to fill in the login, it fails in this case. The fix is pretty straightforward:
username_field = page.locator('#username')
if username_field.is_visible():
    username_field.press_sequentially(getenv('LINKEDIN_USERNAME'))
page.locator('#password').press_sequentially(getenv('LINKEDIN_PASSWORD'))
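The login steps can be gathered into one function, which also makes them testable without a browser. This is a sketch under assumptions: login and require_env are hypothetical helpers, and page only needs to behave like a Playwright sync Page. Since getenv returns None when a variable is unset, the sketch fails early with an explicit message instead.

```python
import os


def require_env(name: str) -> str:
    """Return the environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def login(page) -> None:
    """Fill in whichever login fields LinkedIn shows, then submit.

    `page` is expected to behave like a Playwright sync Page.
    """
    username_field = page.locator("#username")
    # With a stored profile, LinkedIn may ask for the password only.
    if username_field.is_visible():
        username_field.press_sequentially(require_env("LINKEDIN_USERNAME"))
    page.locator("#password").press_sequentially(require_env("LINKEDIN_PASSWORD"))
    page.locator("button[type=submit]").press("Enter")
```

In the snippet above, the body of the session_redirect branch would then shrink to login(page) followed by the second goto.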
Though I'm no expert in Python, I managed to achieve what I wanted with Playwright. I preferred the sync API because it makes the code slightly easier to reason about, and I don't have any performance requirements. I only used the basic features Playwright offers. Playwright also allows recording videos in the context of tests, which is very useful when a test fails during the execution of a CI pipeline.
To go further:
Originally published at A Java Geek on January 19th, 2024