The memory issue arises because each WebElement holds a reference to the browser session and doesn't get garbage collected easily. Some fixes:
from selenium.webdriver.common.by import By

def extract_listing(rows):
    listing = []
    for row in rows:
        # extract all text data first
        cols = row.find_elements(By.TAG_NAME, "td")
        link = cols[0].find_element(By.TAG_NAME, "a")
        # get all values as strings immediately
        name = link.text.strip()
        href = link.get_attribute("href")
        full_text = cols[0].text.strip()
        price_text = cols[2].text
        # now process without holding WebElement refs
        listing_id = full_text.replace(name, "").strip()
        listing.append({
            "listing_id": listing_id,
            "listing_name": name,
            "listing_link": href,
            "last_price": float(price_text.replace(",", ""))
        })
    return listing
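To make the string handling in that loop concrete, here is a minimal sketch that works on plain strings only, with no Selenium involved. The helper name `parse_row_text` and the sample values are hypothetical; it assumes the first cell's text is the link text followed by the ID, which is what the `replace()` call above implies.

```python
def parse_row_text(name, full_text, price_text):
    """Derive the ID and numeric price from strings already read off the page."""
    name = name.strip()
    # The first cell holds the link text plus the ID; removing the name leaves the ID.
    listing_id = full_text.strip().replace(name, "").strip()
    # Prices such as "1,234.50" need the thousands separator removed before float().
    last_price = float(price_text.replace(",", "").strip())
    return {
        "listing_id": listing_id,
        "listing_name": name,
        "last_price": last_price,
    }


# Hypothetical sample values, not taken from the real table.
row = parse_row_text("Acme Corp", "Acme Corp ACM123", "1,234.50")
print(row)
```

Because the helper only sees strings, nothing it returns can keep a WebElement alive.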
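import gc

def extract_listing_batched(rows, batch_size=100):
    listing = []
    for i in range(0, len(rows), batch_size):
        batch = rows[i:i+batch_size]
        for row in batch:
            # extract_info is the per-row helper from the question
            listing.append(extract_info(row))
        # force garbage collection after each batch; note that `rows` still
        # references every WebElement, so only temporaries are reclaimed here
        gc.collect()
    return listing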
data = driver.execute_script("""
    return Array.from(document.querySelectorAll("tr")).map(row => {
        const cols = row.querySelectorAll("td");
        const link = cols[0]?.querySelector("a");
        return {
            name: link?.textContent?.trim(),
            href: link?.href,
            price: cols[2]?.textContent
        };
    });
""")

This is the fastest option and uses the least memory, since no WebElement objects are created. The JS approach is recommended for large tables.
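The script hands back plain dicts with `name`, `href`, and `price` keys, and rows without `td` cells (such as header rows) come back with empty values. A minimal sketch of turning that result into cleaned records might look like the following; `normalize_rows` and the sample data are hypothetical, and the listing ID is left out because the script above does not return it.

```python
def normalize_rows(raw_rows):
    """Convert the dicts returned by execute_script into cleaned listing records."""
    listing = []
    for raw in raw_rows:
        name = raw.get("name")
        price_text = raw.get("price")
        # Header rows have no <td> cells, so their values are None or missing.
        if not name or not price_text:
            continue
        try:
            last_price = float(price_text.replace(",", "").strip())
        except ValueError:
            # Skip rows whose price cell is not numeric (for example "N/A").
            continue
        listing.append({
            "listing_name": name,
            "listing_link": raw.get("href"),
            "last_price": last_price,
        })
    return listing


# Hypothetical sample shaped like the script's return value.
sample = [
    {"name": None, "href": None, "price": None},
    {"name": "Acme Corp", "href": "https://example.com/acme", "price": " 1,234.50 "},
    {"name": "Beta Ltd", "href": "https://example.com/beta", "price": "N/A"},
]
print(normalize_rows(sample))
```

Skipping unparseable rows is one possible choice; raising or logging instead may suit cases where a bad price should not pass silently.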
I'm running into memory issues with this code: the VM's RAM gets completely exhausted during the table row processing loop.
The Problem:
In `extract_listing`, when I loop through `rows`, memory usage spikes until the VM runs out. Each iteration processes a row using `extract_info`.
Code snippet:
The memory leak seems to happen during the WebElement iteration. Each `row` WebElement might be holding onto more memory than expected, or there's some accumulation I'm not aware of.
Any known fixes or alternative approaches I should try?
Thanks for any tips!