get_html_content
The get_html_content function scrapes the HTML content of a given URL using Selenium WebDriver. It can either wait for a specific element to be present on the page or wait for a specified amount of time before retrieving the page source.
Parameters
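url (str): The URL of the web page to scrape.
element_name (str, optional): The locator value of the element to wait for before retrieving the page source. It is interpreted according to by; with the default By.CLASS_NAME, it is a class name. Defaults to None.
by (By, optional): The locator strategy used to find the element. Defaults to By.CLASS_NAME.
time_wait (int, optional): The wait time in seconds. If element_name is given, this is the maximum time to wait for the element to appear; otherwise, the function waits this many seconds unconditionally before retrieving the page source. Defaults to 10.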
Functionality
Set Up Chrome Options:
Configures the Chrome driver to run in headless mode (no GUI).
Initialize Chrome Driver:
Sets up the Chrome WebDriver with the specified options.
Open the URL:
Navigates to the provided URL.
Wait for Element or Time:
If element_name is provided, it waits up to time_wait seconds until the element is located using the specified method (by). If no element is specified, it waits for time_wait seconds.
Retrieve Page Source:
Fetches the HTML content of the page.
Close the Driver:
Closes the WebDriver to clean up resources.