Anysite Scraper Frequently Asked Questions
Welcome to the Frequently Asked Questions page for Anysite Scraper. Here you'll find detailed answers to common questions about using the software, configuring projects, handling different website structures, and troubleshooting common issues.
XPath (XML Path Language) is a query language for locating elements on a web page with path expressions. An XPath expression identifies an element by its position in the page's HTML DOM structure.
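As a rough illustration of what a path expression looks like, the sketch below uses Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath. The sample page, class names, and values are made up for the example; this is not how Anysite Scraper itself evaluates XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (ElementTree needs valid XML).
PAGE = """
<html>
  <body>
    <div class="listing">
      <h2 class="name">Acme Plumbing</h2>
      <span class="phone">555-0100</span>
    </div>
    <div class="listing">
      <h2 class="name">Bright Dental</h2>
      <span class="phone">555-0199</span>
    </div>
  </body>
</html>
""".strip()

root = ET.fromstring(PAGE)

# Attribute-based path: every <h2 class="name"> inside a <div class="listing">.
names = [h2.text for h2 in root.findall(".//div[@class='listing']/h2[@class='name']")]

# Position-based path: the <span> inside the first <div> under <body>.
first_phone = root.find("./body/div[1]/span").text

print(names)
print(first_phone)
```

The first expression selects by attribute value, the second by position in the tree; both describe where an element sits in the DOM rather than what its text says.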
For more details visit:
When you need to extract specific leads from a website, the software lets you define each field (such as business name, address, person name, phone number, email address, or website URL) by right-clicking on it in the web page. Once you have defined all the fields you want to extract from the site's page(s), you save the configuration in a file called a project.
Some websites load data dynamically as you scroll, so you have to apply auto-scroll on page load before extraction. Set the scroll step size and the delay for each step high enough that the page has time to load new content. Your internet connection also needs to be fast enough to load the content within that delay.
Example: With a medium-speed connection, you might set the scroll step to 300 (that is, 300 pixels) with a delay of 500 milliseconds.
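To get a feel for how these two settings interact, here is a small arithmetic sketch. It assumes a fixed page height of 3000 pixels, a made-up figure; real dynamically loaded pages grow as you scroll, so the actual time will be longer. The function name scroll_plan is hypothetical and is not part of the software.

```python
import math

def scroll_plan(page_height_px, step_px=300, delay_ms=500):
    """Return the scroll positions and the total waiting time in milliseconds."""
    steps = math.ceil(page_height_px / step_px)
    # Each step moves down by step_px, capped at the bottom of the page.
    positions = [min(i * step_px, page_height_px) for i in range(1, steps + 1)]
    return positions, steps * delay_ms

positions, total_ms = scroll_plan(3000)
print(len(positions), "steps,", total_ms, "ms of delay in total")
```

With these values, a smaller step or a longer delay gives the page more time to load per pixel scrolled, at the cost of a slower extraction.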
Some websites display the business name, address, and contact information directly on the search page, and that may be all the information you need, without visiting the detail page for the complete profile. This is called a short profile information page, or multi-record-per-page information. Short profile information pages are normally search pages that contain multiple records on each page. Extraction is faster if the short profile fulfills your needs.
Example screenshot showing a search page with 3 records:
Most websites display a short profile on the search page and the complete profile on a separate page that opens when you click the short profile link. These are called detail profile information pages. In such cases, the software takes the profile links from the search page and opens each detail profile in a separate window to extract the data.
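The first half of that process, collecting profile links from a search page, can be sketched with Python's standard library. The markup, the directory.example.com address, and the LinkCollector class are all assumptions made for illustration; the software's own link handling may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the search page address.
            self.links.append(urljoin(self.base_url, href))

# Hypothetical search page with two short profiles.
SEARCH_PAGE = """
<div class="result"><a href="/profile/acme-plumbing">Acme Plumbing</a></div>
<div class="result"><a href="/profile/bright-dental">Bright Dental</a></div>
"""

collector = LinkCollector("https://directory.example.com/search?q=plumber")
collector.feed(SEARCH_PAGE)
detail_urls = collector.links
print(detail_urls)
```

Each collected URL would then be opened in turn to read the complete profile.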
First, use the mouse to select the area that contains your required data, as shown in the image below, where the selected area contains all the required fields, such as business name, address, telephone, rating, and reviews.
In the browser, HTML nodes form a tree structure, and nodes are related to each other as parents, children, siblings, and ancestors, much like human family relationships. A parent node is a node that contains some fields as child nodes in its own subtree.
You can select any node that contains child nodes as a parent node. First, use the mouse to select the area that contains your required data.
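The parent and child relationship can be illustrated with a short sketch. The snippet, class names, and values below are invented for the example; the code only shows the idea of treating each record container as a parent and reading its fields as children.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: each "record" div is a parent node,
# and the spans inside it are sibling child nodes (the fields).
SNIPPET = """<div class="results">
  <div class="record">
    <span class="name">Acme Plumbing</span>
    <span class="address">12 Main St</span>
  </div>
  <div class="record">
    <span class="name">Bright Dental</span>
    <span class="address">98 Oak Ave</span>
  </div>
</div>"""

root = ET.fromstring(SNIPPET)

records = []
for parent in root.findall("./div[@class='record']"):
    # Iterating over an element yields its direct children.
    record = {child.get("class"): child.text for child in parent}
    records.append(record)

print(records)
```

Because every field is looked up relative to its own parent, the name and address of one record are never mixed with those of another.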
Sometimes a field's actual data is not visible until you click on it. For example, you first click on "Telephone" and the page then shows the telephone numbers. When you add a click item to the configuration during project creation, the software automatically clicks the field to reveal the number and then extracts it.
Steps to configure click action:
To collect these profile URL links, right-click on any field that has a URL link; a popup window will open.
Important settings:
Some websites have no direct "next page" link and show pagination as numbered pages (1, 2, 3, 4, etc.) instead. In that case, you can select the next page by right-clicking on any page number (such as 2, 3, or 4) that is not currently selected.
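The logic of moving through numbered pages can be sketched as follows. This is only an illustration of the idea; the function name next_page is hypothetical, and the software's own pagination handling is not documented here.

```python
def next_page(page_numbers, current):
    """Return the smallest page number after the current one, or None on the last page."""
    candidates = [page for page in page_numbers if page > current]
    return min(candidates) if candidates else None

visible_pages = [1, 2, 3, 4]
print(next_page(visible_pages, 1))
print(next_page(visible_pages, 4))
```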
Configuration:
Sometimes the data you need is the value of a property (an HTML attribute) rather than the visible text. For example, a telephone number might be stored as the value of a "data-visible-number" property.
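A minimal sketch of reading such a value, using Python's standard html.parser module. The sample markup and the AttributeCollector class are assumptions for illustration and do not reflect the software's internals.

```python
from html.parser import HTMLParser

class AttributeCollector(HTMLParser):
    """Collect the values of one named attribute from any tag."""

    def __init__(self, attribute_name):
        super().__init__()
        self.attribute_name = attribute_name
        self.values = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == self.attribute_name and value is not None:
                self.values.append(value)

# Hypothetical field: the visible text is "Telephone",
# while the number itself is stored in an attribute.
html = '<a class="phone" href="#" data-visible-number="555-0100">Telephone</a>'

collector = AttributeCollector("data-visible-number")
collector.feed(html)
numbers = collector.values
print(numbers)
```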
To extract property data:
To extract an image's URL from any site, right-click on the image; a new popup window will appear.
Note: Select the field type "Extract the image source address" when you are selecting an image link from a site.
To select an email address that appears on a web page, right-click on that field. A popup will appear.
When the selected field type is "Extract email address", the software picks out email-formatted data from the selected HTML field.
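Picking email-formatted data out of a field's text can be approximated with a regular expression. The pattern below is a deliberately simple assumption that covers common addresses; it is not the pattern the software uses, and it does not implement the full email address grammar.

```python
import re

# Simplified pattern: local part, "@", domain, and a top-level domain of 2+ letters.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(field_text):
    """Return every email-like substring found in the field's text."""
    return EMAIL_PATTERN.findall(field_text)

# Hypothetical field content mixing names, a phone number, and two addresses.
text = "Contact: Jane Doe, sales@example.com or call 555-0100. Support: help@example.org"
emails = extract_emails(text)
print(emails)
```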