Explore essential coding commands and best practices for using the Web Scraper IDE. Learn how to navigate, parse data, interact with elements, and optimize your scraping tasks efficiently.
input
- Global object available to the interaction code. Provided by the trigger input or by next_stage() calls
navigate
- Navigate the browser session to a URL
url
: A URL to navigate to
navigate options
parse
- Parse the page data
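As an illustration of how navigate and parse might be combined in one stage, here is a minimal sketch. It is not official usage: the input.url field and the title field returned by the parser are hypothetical, and the stand-in definitions at the top exist only so the snippet can run outside the IDE, where input, navigate, and parse are already provided.

```javascript
// Stand-ins so the sketch runs in plain Node.js. Inside the IDE these
// globals already exist, so this part would be omitted.
const visited = [];
const input = { url: 'https://example.com/product/1' };  // hypothetical input
const navigate = (url) => { visited.push(url); };         // stand-in, records the URL
const parse = () => ({ title: 'Example product' });       // stand-in parser result

// Interaction code: open the page named in the input, then run the parser.
navigate(input.url);
const data = parse();
console.log(data.title);
```

The sketch assumes that parse returns whatever object the parser code produces, so the interaction code can read fields such as data.title from it.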
collect
- Adds a line of data to the dataset created by the crawler
data_line
: An object with the fields you want to collect
validate_fn
: Optional function to validate that the line data is valid
next_stage
- Run the next stage of the crawler with the specified input
input
: Input object to pass to the next browser session
rerun_stage
- Run this stage of the crawler again with new input
input
: Input object to pass to the next browser session
run_stage
- Run a specific stage of the crawler with a new browser session
input
: Input object to pass to the next browser session
stage
: Which stage to run (1 is the first stage)
country
- Configure your crawl to run from a specific country
code
: 2-character ISO country code
wait
- Wait for an element to appear on the page
selector
: Element selector
opt
: Wait options (see examples)
wait_for_text
- Wait for an element on the page to include some text
selector
: Element selector
text
: The text to wait for
click
- Click on an element (will wait for the element to appear before clicking on it)
selector
: Element selector
type
- Enter text into an input (will wait for the input to appear before typing)
selector
: Element selector
text
: The text to enter
select
- Pick a value from a select element
selector
: Element selector
URL
- URL class from the Node.js standard “url” module
url
: URL string
location
- Object with info about current location. Available fields: href
url
: URL string
tag_response
- Save the response data from a browser request
name
: The name of the tagged field
pattern
: The URL pattern to match
response_header
- Returns the response headers of the last page load
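The URL class and the location object listed above can be used together, for example to derive the address of a following page from the current one. The sketch below relies only on the standard Node.js URL class. The search URL and its page query parameter are hypothetical, and a literal string stands in for location.href so the snippet runs outside the IDE.

```javascript
// `URL` is the standard Node.js/WHATWG class; no IDE globals are needed here.
const href = 'https://example.com/search?q=shoes&page=2';  // stand-in for location.href
const current = new URL(href);

// Build the URL of the following results page (the `page` parameter is hypothetical).
const next = new URL(current);
const page = Number(current.searchParams.get('page'));
next.searchParams.set('page', String(page + 1));

console.log(next.href);  // prints "https://example.com/search?q=shoes&page=3"
```

Inside the IDE, a value like next.href could plausibly be placed in the input object handed to next_stage, though the field name used for it would be the author's own choice.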
console
- Log messages from the interaction code
load_more
- Scroll to the bottom of a list to trigger loading more items. Useful for lazy-loaded infinite-scroll sites
selector
: Element selector
scroll_to
- Scroll the page so that an element is visible
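To show how the interaction helpers described above might be chained, here is a rough sketch of a search flow. All selectors and the search term are hypothetical. The recording stand-ins are there only so the snippet runs in plain Node.js; inside the IDE, type, click, wait, wait_for_text, and load_more are provided, and their real behavior (waiting, scrolling) is not reproduced here.

```javascript
// Recording stand-ins so the sketch runs in plain Node.js. Inside the IDE,
// these functions are provided globals and this part would be omitted.
const calls = [];
const record = (name) => (...args) => { calls.push([name, ...args]); };
const type = record('type');
const click = record('click');
const wait = record('wait');
const wait_for_text = record('wait_for_text');
const load_more = record('load_more');

// Hypothetical selectors and search term for an imaginary results page.
type('#search-box', 'running shoes');        // enter the query
click('#search-button');                     // submit it
wait('.results-list');                       // wait for the list to appear
wait_for_text('.results-count', 'results');  // wait for the counter text
load_more('.results-list');                  // trigger lazy loading

console.log(calls.map(([name]) => name).join(' -> '));
```

No explicit wait precedes type or click here, since both are described as waiting for their target element before acting.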
$
- Helper for jQuery-like expressions
selector
: Element selector
input
- Global variable available to the parser code
$
- An instance of cheerio
location
- A global variable available to the parser code. Object with info about current location
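The parser-side globals above might be used along these lines. This is a sketch under stated assumptions: parse_page is a hypothetical wrapper introduced only so the logic can run outside the IDE, the h1 selector and the returned field names are illustrative, and fake$ imitates just the one cheerio call used here, $(selector).text().

```javascript
// Hypothetical wrapper so the parser logic can be exercised outside the IDE.
// In the IDE, `$` and `location` are globals available to the parser code.
function parse_page($, location) {
  return {
    url: location.href,
    title: $('h1').text().trim(),
  };
}

// Minimal stand-in that mimics only the `$(selector).text()` call used above.
const fake$ = (selector) => ({
  text: () => (selector === 'h1' ? '  Example product \n' : ''),
});

const result = parse_page(fake$, { href: 'https://example.com/product/1' });
console.log(result);
```

Trimming the extracted text is a design choice in this sketch: heading text taken from HTML often carries surrounding whitespace that is rarely wanted in the collected data.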