get_html_content
The get_html_content function retrieves the HTML content of a given URL using Selenium WebDriver. Before reading the page source, it either waits for a specific element to be present on the page or waits for a specified amount of time.
Parameters
url (str): The URL of the web page you want to scrape.
element_name (str, optional): The locator value of the element to wait for before retrieving the page source (for example a class name or an ID, depending on by). Defaults to None.
by (By, optional): The method used to locate the element. Defaults to By.CLASS_NAME.
time_wait (int, optional): The maximum amount of time (in seconds) to wait for the element, or the time to wait for the page to load when no element is given. Defaults to 10.
Functionality
Set Up Chrome Options:
Configures the Chrome driver to run in headless mode (no GUI).
Initialize Chrome Driver:
Sets up the Chrome WebDriver with the specified options.
Open the URL:
Navigates to the provided URL.
Wait for Element or Time:
If element_name is provided, it waits until the element is located using the specified method (by). If no element is specified, it waits for the specified amount of time.
Retrieve Page Source:
Fetches the HTML content of the page.
Close the Driver:
Closes the WebDriver to clean up resources.
Example Usage
from selenium.webdriver.common.by import By

html_content = get_html_content(
    url="https://example.com",
    element_name="main-content",
    by=By.ID,
    time_wait=15,
)