Learn the best practices for using Web Scraper IDE, including optimizing performance, handling errors, managing retries, and writing efficient scraper code.
dead_page condition

After a navigate command, a dead_page condition should be added to check whether the page was not found. This prevents automatic retries. The system handles this automatically when the response status is 404, but in some cases the website may respond with other status codes:

Here are examples of good and bad practices (you can navigate between them by clicking the "Bad" and "Good" tabs).
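A minimal sketch of the pattern, assuming a site that answers HTTP 200 even for missing pages and shows a "not found" banner instead. navigate() and dead_page() are the IDE's built-in commands; they are stubbed here so the control flow is self-contained:

```javascript
// Hypothetical stand-ins for the IDE's built-in commands, for illustration only.
function navigate(url) { /* would load the page in the scraping browser */ }
function dead_page(message) { throw new Error('dead_page: ' + message); }

function collectProduct(input, pageHasNotFoundBanner) {
  navigate(input.url);
  // The site answers 200 even for missing products, so check the page itself
  // (pageHasNotFoundBanner stands in for something like $('.not-found').length > 0):
  if (pageHasNotFoundBanner) {
    dead_page('Product page not found'); // marks the input dead; no automatic retries
  }
  return 'parsed';
}
```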
rerun_stage()

rerun_stage() should be called for each page from the root page, rather than from each subsequent page. This allows the system to parallelize the requests and makes the scraper faster.
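The difference can be sketched as follows. rerun_stage() is stubbed here as a queue recorder, and the page counts are hypothetical:

```javascript
// Hypothetical stub for the IDE's rerun_stage(): records the inputs that
// would be queued as additional stage runs.
const queued = [];
function rerun_stage(input) { queued.push(input); }

// Good: from the root page, queue every listing page at once so the
// system can fetch them in parallel.
function collectFromRoot(totalPages) {
  for (let page = 2; page <= totalPages; page++) {
    rerun_stage({ page }); // all pages queued up front
  }
}

// Bad (for contrast): each page only queues the next one, forcing the
// pages to be fetched one after another.
function collectChained(currentPage, totalPages) {
  if (currentPage < totalPages) rerun_stage({ page: currentPage + 1 });
}
```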
close_popup()

Use close_popup('popup_selector', 'close_button_selector') to close popups. A popup can appear at any time, and in most cases adding a check before each interaction command is not desirable.
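A sketch of the idea, with close_popup() stubbed as a watcher registry; the selectors are placeholders for the target site's actual markup:

```javascript
// Hypothetical stub: close_popup() registers a watcher that dismisses a
// popup whenever it appears, so individual interaction commands need no
// popup checks of their own.
const watchers = [];
function close_popup(popupSelector, closeButtonSelector) {
  watchers.push({ popupSelector, closeButtonSelector });
}

// Register once, near the top of the interaction code:
close_popup('.newsletter-modal', '.newsletter-modal .close');

// Later interactions can then proceed without per-step popup handling.
function clickAddToCart() { return 'clicked'; }
```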
wait_for_parser_value() with tag_response()

When interaction code needs to wait until a tagged network response has been parsed before continuing, wait_for_parser_value() should be used together with tag_response():
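A self-contained sketch of the pattern, with both commands stubbed; the tag name, matcher, and parsed value are hypothetical:

```javascript
// Hypothetical stubs modelling the pattern: tag_response() names a network
// request, and wait_for_parser_value() blocks until the parser has
// extracted a value from that tagged response.
const parsedResponses = {};
function tag_response(name, matcher) {
  // Pretend a matching request completed and was parsed:
  parsedResponses[name] = { price: 9.99 };
}
function wait_for_parser_value(name) {
  if (!(name in parsedResponses)) throw new Error('timeout waiting for ' + name);
  return parsedResponses[name];
}

tag_response('product_api', (req) => req.url.includes('/api/product'));
// Interaction code can now safely wait for the parsed value before moving on:
const data = wait_for_parser_value('product_api');
```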
$(selector).text_sane()

When extracting text, prefer $(selector).text_sane(), which removes all unnecessary whitespace characters and replaces them with a single space.
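The normalization text_sane() performs can be approximated outside the IDE like this (whether it also trims leading/trailing whitespace is an assumption here):

```javascript
// Approximation of text_sane(): collapse every run of whitespace
// (spaces, tabs, newlines) into a single space, then trim the ends.
function textSane(text) {
  return text.replace(/\s+/g, ' ').trim();
}
```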