mirror of https://github.com/FlareSolverr/FlareSolverr.git
synced 2025-12-05 17:18:19 +01:00

Compare commits

16 Commits

| SHA1 |
|---|
| 6dc279a9d3 |
| 96fcd21174 |
| 3a6e8e0f92 |
| 2d97f88276 |
| ac5c64319e |
| c93834e2f0 |
| e3b4200d94 |
| 0941861f80 |
| 8a10eb27a6 |
| e9c08c84ef |
| 2aa1744476 |
| a89679a52d |
| 410ee7981f |
| e163019f28 |
| 7d84f1b663 |
| 4807e9dbe2 |

.github/ISSUE_TEMPLATE/bug_report.yml (vendored): 3 lines changed
@@ -32,7 +32,8 @@ body:
- Operating system:
- Are you using Docker: [yes/no]
- FlareSolverr User-Agent (see log traces or / endpoint):
- Are you using a proxy or VPN: [yes/no]
- Are you using a VPN: [yes/no]
- Are you using a Proxy: [yes/no]
- Are you using Captcha Solver: [yes/no]
- If using captcha solver, which one:
- URL to test this issue:

.github/workflows/autotag.yml (vendored): 2 lines changed
@@ -11,7 +11,7 @@ jobs:
steps:
-
name: Checkout
uses: actions/checkout@v2
uses: actions/checkout@v3
-
name: Auto Tag
uses: Klemensas/action-autotag@stable

.github/workflows/release-docker.yml (vendored): 14 lines changed
@@ -11,39 +11,39 @@ jobs:
steps:
-
name: Checkout
uses: actions/checkout@v2
uses: actions/checkout@v3
-
name: Downcase repo
run: echo REPOSITORY=$(echo ${{ github.repository }} | tr '[:upper:]' '[:lower:]') >> $GITHUB_ENV
-
name: Docker meta
id: docker_meta
uses: crazy-max/ghaction-docker-meta@v1
uses: crazy-max/ghaction-docker-meta@v3
with:
images: ${{ env.REPOSITORY }},ghcr.io/${{ env.REPOSITORY }}
tag-sha: false
-
name: Set up QEMU
uses: docker/setup-qemu-action@v1.0.1
uses: docker/setup-qemu-action@v2
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
uses: docker/setup-buildx-action@v2
-
name: Login to DockerHub
uses: docker/login-action@v1
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
-
name: Login to GitHub Container Registry
uses: docker/login-action@v1
uses: docker/login-action@v2
with:
registry: ghcr.io
username: ${{ github.repository_owner }}
password: ${{ secrets.GH_PAT }}
-
name: Build and push
uses: docker/build-push-action@v2
uses: docker/build-push-action@v3
with:
context: .
file: ./Dockerfile

.github/workflows/release.yml (vendored): 4 lines changed
@@ -11,12 +11,12 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v2
uses: actions/checkout@v3
with:
fetch-depth: 0 # get all commits, branches and tags (required for the changelog)

- name: Setup Node
uses: actions/setup-node@v2
uses: actions/setup-node@v3
with:
node-version: '16'

CHANGELOG.md: 13 lines changed
@@ -1,5 +1,18 @@
# Changelog

## v3.0.2 (2023/01/08)

* Detect Cloudflare blocked access
* Check Chrome / Chromium web browser is installed correctly

## v3.0.1 (2023/01/06)

* Kill Chromium processes properly to avoid defunct/zombie processes
* Update undetected-chromedriver
* Disable Zygote sandbox in Chromium browser
* Add more selectors to detect blocked access
* Include procps (ps), curl and vim packages in the Docker image

## v3.0.0 (2023/01/04)

* This is the first release of FlareSolverr v3. There are some breaking changes

@@ -29,7 +29,8 @@ RUN dpkg -i /libgl1-mesa-dri.deb \
&& dpkg -i /adwaita-icon-theme.deb \
# Install dependencies
&& apt-get update \
&& apt-get install -y --no-install-recommends chromium chromium-common chromium-driver xvfb \
&& apt-get install -y --no-install-recommends chromium chromium-common chromium-driver xvfb dumb-init \
procps curl vim \
# Remove temporary files and hardware decoding libraries
&& rm -rf /var/lib/apt/lists/* \
&& rm -f /usr/lib/x86_64-linux-gnu/libmfxhw* \
@@ -52,6 +53,9 @@ COPY package.json ../

EXPOSE 8191

# dumb-init avoids zombie chromium processes
ENTRYPOINT ["/usr/bin/dumb-init", "--"]

CMD ["/usr/local/bin/python", "-u", "/app/flaresolverr.py"]

# Local build

@@ -64,14 +64,11 @@ Remember to restart the Docker daemon and the container after the update.

### Precompiled binaries

This is the recommended way for Windows users.
* Download the [FlareSolverr zip](https://github.com/FlareSolverr/FlareSolverr/releases) from the release's assets. It is available for Windows and Linux.
* Extract the zip file. FlareSolverr executable and firefox folder must be in the same directory.
* Execute FlareSolverr binary. In the environment variables section you can find how to change the configuration.
Precompiled binaries are not currently available for v3. Please see https://github.com/FlareSolverr/FlareSolverr/issues/660 for updates,
or below for instructions of how to build FlareSolverr from source code.

### From source code

This is the recommended way for macOS users and for developers.
* Install [Python 3.10](https://www.python.org/downloads/).
* Install [Chrome](https://www.google.com/intl/en_us/chrome/) or [Chromium](https://www.chromium.org/getting-involved/download-chromium/) web browser.
* (Only in Linux / macOS) Install [Xvfb](https://en.wikipedia.org/wiki/Xvfb) package.

@@ -1,6 +1,6 @@
{
"name": "flaresolverr",
"version": "3.0.0",
"version": "3.0.3",
"description": "Proxy server to bypass Cloudflare protection",
"author": "Diego Heras (ngosang / ngosang@hotmail.es)",
"license": "MIT"

@@ -1,9 +1,9 @@
bottle==0.12.23
waitress==2.1.2
selenium==4.4.3
selenium==4.7.2
func-timeout==4.3.5
# required by undetected_chromedriver
requests==2.28.1
websockets==10.3
websockets==10.4
# only required for linux
xvfbwrapper==0.2.9

@@ -1,4 +1,5 @@
import logging
import sys
import time
from urllib.parse import unquote
@@ -13,11 +14,19 @@ from dtos import V1RequestBase, V1ResponseBase, ChallengeResolutionT, ChallengeR
HealthResponse, STATUS_OK, STATUS_ERROR
import utils

ACCESS_DENIED_TITLES = [
# Cloudflare
'Access denied',
# Cloudflare http://bitturk.net/ Firefox
'Attention Required! | Cloudflare'
]
ACCESS_DENIED_SELECTORS = [
# Cloudflare
'div.cf-error-title span.cf-code-label span'
'div.cf-error-title span.cf-code-label span',
# Cloudflare http://bitturk.net/ Firefox
'#cf-error-details div.cf-error-overview h1'
]
CHALLENGE_TITLE = [
CHALLENGE_TITLES = [
# Cloudflare
'Just a moment...',
# DDoS-GUARD
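The comma added after the first entry of `ACCESS_DENIED_SELECTORS` is required now that the list has a second selector. Python joins adjacent string literals at compile time, so without the comma the two selectors would silently merge into one. The check below uses the two selector strings from this hunk:

```python
# Two list literals built from the selectors in this hunk.
# Without the comma, Python concatenates the adjacent string literals,
# which leaves a single (and meaningless) CSS selector.
without_comma = [
    'div.cf-error-title span.cf-code-label span'
    '#cf-error-details div.cf-error-overview h1'
]
with_comma = [
    'div.cf-error-title span.cf-code-label span',
    '#cf-error-details div.cf-error-overview h1'
]

print(len(without_comma))  # 1
print(len(with_comma))     # 2
```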
@@ -34,6 +43,21 @@ SHORT_TIMEOUT = 10

def test_browser_installation():
logging.info("Testing web browser installation...")

chrome_exe_path = utils.get_chrome_exe_path()
if chrome_exe_path is None:
logging.error("Chrome / Chromium web browser not installed!")
sys.exit(1)
else:
logging.info("Chrome / Chromium path: " + chrome_exe_path)

chrome_major_version = utils.get_chrome_major_version()
if chrome_major_version == '':
logging.error("Chrome / Chromium version not detected!")
sys.exit(1)
else:
logging.info("Chrome / Chromium major version: " + chrome_major_version)

user_agent = utils.get_user_agent()
logging.info("FlareSolverr User-Agent: " + user_agent)
logging.info("Test successful")
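`test_browser_installation()` exits at startup when no browser is found or its major version cannot be read. The `utils` helpers it calls are not part of this diff. The sketch below is a hypothetical stand-in that makes two assumptions:

- It finds the browser on `PATH` with `shutil.which`, trying a list of assumed executable names.
- It reads the major version from a string such as `Chromium 108.0.5359.124 built on Debian`, an assumed format.

It returns `''` when no version is found, matching the empty-string check above.

```python
import re
import shutil
from typing import Iterable, Optional

# Hypothetical candidate names; the real utils.get_chrome_exe_path() may search elsewhere.
DEFAULT_CANDIDATES = ("chromium", "chromium-browser", "google-chrome", "chrome")


def find_browser(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


def major_version(version_output: str) -> str:
    """Extract the major version from a string like 'Chromium 108.0.5359.124'.

    Returns '' when no version number is present, mirroring the
    empty-string check in test_browser_installation().
    """
    match = re.search(r"(\d+)\.\d+", version_output)
    return match.group(1) if match else ""


print(major_version("Chromium 108.0.5359.124 built on Debian"))  # 108
```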
@@ -172,7 +196,13 @@ def _evil_logic(req: V1RequestBase, driver: WebDriver, method: str) -> Challenge

# wait for the page
html_element = driver.find_element(By.TAG_NAME, "html")
page_title = driver.title

# find access denied titles
for title in ACCESS_DENIED_TITLES:
if title == page_title:
raise Exception('Cloudflare has blocked this request. '
'Probably your IP is banned for this site, check in your web browser.')
# find access denied selectors
for selector in ACCESS_DENIED_SELECTORS:
found_elements = driver.find_elements(By.CSS_SELECTOR, selector)
@@ -182,8 +212,7 @@ def _evil_logic(req: V1RequestBase, driver: WebDriver, method: str) -> Challenge

# find challenge by title
challenge_found = False
page_title = driver.title
for title in CHALLENGE_TITLE:
for title in CHALLENGE_TITLES:
if title == page_title:
challenge_found = True
logging.info("Challenge detected. Title found: " + title)
@@ -200,8 +229,8 @@ def _evil_logic(req: V1RequestBase, driver: WebDriver, method: str) -> Challenge
if challenge_found:
while True:
try:
# wait until the title change
for title in CHALLENGE_TITLE:
# wait until the title changes
for title in CHALLENGE_TITLES:
logging.debug("Waiting for title: " + title)
WebDriverWait(driver, SHORT_TIMEOUT).until_not(title_is(title))
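`_evil_logic()` now compares the page title against `ACCESS_DENIED_TITLES` before looking for a challenge title. Both checks are exact string comparisons. The title part of that decision can be sketched as a pure function:

- The title lists are copied from this diff.
- The DDoS-GUARD entry of `CHALLENGE_TITLES` is cut off in the hunk and is left out.
- The CSS-selector check needs a live driver and is not modelled.
- The function name and its return values are hypothetical.

```python
from typing import Sequence

# Titles copied from the diff; CHALLENGE_TITLES is truncated there,
# so only its Cloudflare entry is reproduced.
ACCESS_DENIED_TITLES = ['Access denied', 'Attention Required! | Cloudflare']
CHALLENGE_TITLES = ['Just a moment...']


def classify_title(page_title: str,
                   denied: Sequence[str] = ACCESS_DENIED_TITLES,
                   challenge: Sequence[str] = CHALLENGE_TITLES) -> str:
    """Return 'blocked', 'challenge' or 'ok' using exact title matches.

    Access-denied titles are checked first, matching the order in _evil_logic().
    """
    if page_title in denied:
        return "blocked"
    if page_title in challenge:
        return "challenge"
    return "ok"


print(classify_title("Just a moment..."))  # challenge
print(classify_title("Access denied"))     # blocked
print(classify_title("Just a moment"))     # ok (not an exact match)
```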

@@ -1,7 +1,4 @@
#!/usr/bin/env python3
from __future__ import annotations

import subprocess

"""
@@ -17,33 +14,38 @@ Y88b. 888 888 888 Y88..88P 888 888 888 Y8b. Y88b 888 888 888 Y
by UltrafunkAmsterdam (https://github.com/ultrafunkamsterdam)

"""
from __future__ import annotations


__version__ = "3.1.5r4"

__version__ = "3.4.6"

import json
import logging
import os
import re
import shutil
import subprocess
import sys
import tempfile
import time
import inspect
import threading
from weakref import finalize

import selenium.webdriver.chrome.service
import selenium.webdriver.chrome.webdriver
from selenium.webdriver.common.by import By
import selenium.webdriver.common.service
import selenium.webdriver.remote.command
import selenium.webdriver.remote.webdriver

from .cdp import CDP
from .dprocess import start_detached
from .options import ChromeOptions
from .patcher import IS_POSIX
from .patcher import Patcher
from .reactor import Reactor
from .dprocess import start_detached
from .webelement import UCWebElement
from .webelement import WebElement


__all__ = (
"Chrome",
@@ -108,6 +110,7 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
port=0,
enable_cdp_events=False,
service_args=None,
service_creationflags=None,
desired_capabilities=None,
advanced_elements=False,
service_log_path=None,
@@ -119,8 +122,9 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
suppress_welcome=True,
use_subprocess=False,
debug=False,
no_sandbox=True,
windows_headless=False,
**kw
**kw,
):
"""
Creates a new instance of the chrome driver.
@@ -147,7 +151,9 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
If not specified, make sure the executable's folder is in $PATH

port: int, optional, default: 0
port you would like the service to run, if left as 0, a free port will be found.
port to be used by the chromedriver executable, this is NOT the debugger port.
leave it at 0 unless you know what you are doing.
the default value of 0 automatically picks an available port.

enable_cdp_events: bool, default: False
:: currently for chrome only
@@ -207,11 +213,12 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
now, in case you are nag-fetishist, or a diagnostics data feeder to google, you can set this to False.
Note: if you don't handle the nag screen in time, the browser loses it's connection and throws an Exception.

use_subprocess: bool, optional , default: False,
use_subprocess: bool, optional , default: True,

False (the default) makes sure Chrome will get it's own process (so no subprocess of chromedriver.exe or python
This fixes a LOT of issues, like multithreaded run, but mst importantly. shutting corectly after
program exits or using .quit()
you should be knowing what you're doing, and know how python works.

unfortunately, there is always an edge case in which one would like to write an single script with the only contents being:
--start script--
@@ -224,19 +231,24 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
in that case you can set this to `True`. The browser will start via subprocess, and will keep running most of times.
! setting it to True comes with NO support when being detected. !

no_sandbox: bool, optional, default=True
uses the --no-sandbox option, and additionally does suppress the "unsecure option" status bar
this option has a default of True since many people seem to run this as root (....) , and chrome does not start
when running as root without using --no-sandbox flag.
"""

finalize(self, self._ensure_close, self)
self.debug = debug
patcher = Patcher(
self.patcher = Patcher(
executable_path=driver_executable_path,
force=patcher_force_close,
version_main=version_main,
)
patcher.auto()
self.patcher = patcher
self.patcher.auto()
# self.patcher = patcher
if not options:
options = ChromeOptions()


try:
if hasattr(options, "_session") and options._session is not None:
# prevent reuse of options,
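The constructor now registers `finalize(self, self._ensure_close, self)` from `weakref`. The standard-library documentation for `weakref.finalize` warns that the callback and its arguments must not own references to the tracked object, otherwise the object is never garbage collected. Passing `self` as an argument creates such a reference. In practice that finalizer therefore runs at interpreter exit, not when the driver is discarded.

A minimal sketch of the documented pattern follows. `Driver` and `Process` are hypothetical classes, and the cleanup function receives only the resource it has to release:

```python
import weakref


class Process:
    """Hypothetical stand-in for a child process handle."""

    def __init__(self) -> None:
        self.killed = False

    def kill(self) -> None:
        self.killed = True


def _kill_process(process: Process) -> None:
    # Receives only the resource to clean up, never the owning object.
    process.kill()


class Driver:
    """Hypothetical owner whose cleanup should run when it is collected."""

    def __init__(self) -> None:
        self.process = Process()
        # Register the cleanup without passing `self`, so the finalizer
        # does not keep the Driver instance alive.
        self._finalizer = weakref.finalize(self, _kill_process, self.process)

    def close(self) -> None:
        # Calling a finalizer runs the cleanup at most once and marks it dead.
        self._finalizer()


driver = Driver()
proc = driver.process
finalizer = driver._finalizer
del driver  # in CPython the last reference is dropped here
print(finalizer.alive, proc.killed)  # False True
```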
@@ -248,11 +260,17 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):

options._session = self

debug_port = selenium.webdriver.common.service.utils.free_port()
debug_host = "127.0.0.1"

if not options.debugger_address:
debug_port = (
port
if port != 0
else selenium.webdriver.common.service.utils.free_port()
)
debug_host = "127.0.0.1"
options.debugger_address = "%s:%d" % (debug_host, debug_port)
else:
debug_host, debug_port = options.debugger_address.split(":")
debug_port = int(debug_port)

if enable_cdp_events:
options.set_capability(
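The new branch uses an explicit `port` when one is given and calls `selenium.webdriver.common.service.utils.free_port()` only when `port` is 0. A caller-supplied `debugger_address` is split back into host and port instead. A standard-library sketch of both steps follows. `pick_free_port()` and `resolve_debugger_address()` are hypothetical helpers; the first asks the operating system for an ephemeral port by binding to port 0:

```python
import socket
from typing import Tuple


def pick_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0 (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def resolve_debugger_address(debugger_address: str, port: int = 0) -> Tuple[str, int]:
    """Return (host, port) following the branch in the constructor.

    An empty debugger_address means "choose one": use `port` if non-zero,
    otherwise a free port. An existing "host:port" string is split instead.
    """
    if not debugger_address:
        debug_port = port if port != 0 else pick_free_port()
        return "127.0.0.1", debug_port
    host, port_str = debugger_address.split(":")
    return host, int(port_str)


print(resolve_debugger_address("", port=9222))     # ('127.0.0.1', 9222)
print(resolve_debugger_address("127.0.0.1:9333"))  # ('127.0.0.1', 9333)
```

A port found this way is only free at the moment of the check. Another process can still take it before Chrome binds to it, a race inherent to any "find a free port first" approach.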
@@ -263,13 +281,17 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
options.add_argument("--remote-debugging-port=%s" % debug_port)

if user_data_dir:
options.add_argument('--user-data-dir=%s' % user_data_dir)
options.add_argument("--user-data-dir=%s" % user_data_dir)

language, keep_user_data_dir = None, bool(user_data_dir)

# see if a custom user profile is specified in options
for arg in options.arguments:

if any([_ in arg for _ in ("--headless", "headless")]):
options.arguments.remove(arg)
options.headless = True

if "lang" in arg:
m = re.search("(?:--)?lang(?:[ =])?(.*)", arg)
try:
@@ -294,7 +316,6 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
)

if not user_data_dir:

# backward compatiblity
# check if an old uc.ChromeOptions is used, and extract the user data dir
@@ -347,8 +368,15 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):

if suppress_welcome:
options.arguments.extend(["--no-default-browser-check", "--no-first-run"])
if no_sandbox:
options.arguments.extend(["--no-sandbox", "--test-type"])

if headless or options.headless:
options.headless = True
if self.patcher.version_main < 108:
options.add_argument("--headless=chrome")
elif self.patcher.version_main >= 108:
options.add_argument("--headless=new")

options.add_argument("--window-size=1920,1080")
options.add_argument("--start-maximized")
options.add_argument("--no-sandbox")
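The headless flag now depends on the Chrome major version reported by the patcher: `--headless=chrome` below 108 and `--headless=new` from 108 on. Written as a small pure function (the function name is hypothetical; the threshold and flag strings come from the diff):

```python
def headless_argument(version_main: int) -> str:
    """Pick the headless flag used in the diff for a given Chrome major version."""
    # The diff's two branches (< 108 and >= 108) cover every integer version,
    # so a plain if/else is equivalent to its if/elif.
    if version_main < 108:
        return "--headless=chrome"
    return "--headless=new"


for version in (107, 108, 109):
    print(version, headless_argument(version))
# 107 --headless=chrome
# 108 --headless=new
# 109 --headless=new
```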
@@ -360,7 +388,7 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
or divmod(logging.getLogger().getEffectiveLevel(), 10)[0]
)

if hasattr(options, 'handle_prefs'):
if hasattr(options, "handle_prefs"):
options.handle_prefs(user_data_dir)

# fix exit_type flag to prevent tab-restore nag
@@ -376,6 +404,7 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
config["profile"]["exit_type"] = None
fs.seek(0, 0)
json.dump(config, fs)
fs.truncate() # the file might be shorter
logger.debug("fixed exit_type flag")
except Exception as e:
logger.debug("did not find a bad exit_type flag ")
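The `exit_type` fix rewrites the profile preferences in place. It seeks back to the start and dumps the modified JSON. It then calls `truncate()`, because the new content can be shorter than the old one and would otherwise leave stray bytes at the end of the file. Below is a self-contained sketch of that read-modify-write on a temporary file. The `config["profile"]["exit_type"]` key comes from the diff; the `"Crashed"` starting value and the `clear_exit_type()` name are illustrative assumptions:

```python
import json
import os
import tempfile


def clear_exit_type(path: str) -> None:
    """Set profile.exit_type to None in a JSON preferences file, in place."""
    with open(path, "r+", encoding="utf-8") as fs:
        config = json.load(fs)
        config["profile"]["exit_type"] = None
        fs.seek(0, 0)
        json.dump(config, fs)
        fs.truncate()  # drop leftover bytes if the new JSON is shorter


# Demo on a temporary file standing in for a browser profile's preferences.
fd, prefs_path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    json.dump({"profile": {"exit_type": "Crashed"}}, f)

clear_exit_type(prefs_path)
with open(prefs_path, encoding="utf-8") as f:
    print(json.load(f))  # {'profile': {'exit_type': None}}
os.remove(prefs_path)
```

Here `null` is five characters shorter than `"Crashed"`. Without the `truncate()` call, `json.load` would fail on the leftover tail of the old content.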
@@ -403,14 +432,26 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
)
self.browser_pid = browser.pid

if service_creationflags:
service = selenium.webdriver.common.service.Service(
self.patcher.executable_path, port, service_args, service_log_path
)
for attr_name in ("creationflags", "creation_flags"):
if hasattr(service, attr_name):
setattr(service, attr_name, service_creationflags)
break
else:
service = None

super(Chrome, self).__init__(
executable_path=patcher.executable_path,
executable_path=self.patcher.executable_path,
port=port,
options=options,
service_args=service_args,
desired_capabilities=desired_capabilities,
service_log_path=service_log_path,
keep_alive=keep_alive,
service=service, # needed or the service will be re-created
)

self.reactor = None
@@ -425,35 +466,14 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
self.reactor = reactor

if advanced_elements:
from .webelement import WebElement

self._web_element_cls = UCWebElement
else:
self._web_element_cls = WebElement

if options.headless:
self._configure_headless()

def __getattribute__(self, item):

if not super().__getattribute__("debug"):
return super().__getattribute__(item)
else:
import inspect

original = super().__getattribute__(item)
if inspect.ismethod(original) and not inspect.isclass(original):

def newfunc(*args, **kwargs):
logger.debug(
"calling %s with args %s and kwargs %s\n"
% (original.__qualname__, args, kwargs)
)
return original(*args, **kwargs)

return newfunc
return original

def _configure_headless(self):

orig_get = self.get
logger.info("setting properties for headless")
@@ -465,18 +485,18 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
{
"source": """

Object.defineProperty(window, 'navigator', {
Object.defineProperty(window, "navigator", {
Object.defineProperty(window, "navigator", {
value: new Proxy(navigator, {
has: (target, key) => (key === 'webdriver' ? false : key in target),
has: (target, key) => (key === "webdriver" ? false : key in target),
get: (target, key) =>
key === 'webdriver' ?
false :
typeof target[key] === 'function' ?
target[key].bind(target) :
target[key]
})
key === "webdriver"
? false
: typeof target[key] === "function"
? target[key].bind(target)
: target[key],
}),
});

"""
},
)
@@ -494,49 +514,139 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
"Page.addScriptToEvaluateOnNewDocument",
{
"source": """
Object.defineProperty(navigator, 'maxTouchPoints', {
get: () => 1
})"""
Object.defineProperty(navigator, 'maxTouchPoints', {get: () => 1});
Object.defineProperty(navigator.connection, 'rtt', {get: () => 100});

// https://github.com/microlinkhq/browserless/blob/master/packages/goto/src/evasions/chrome-runtime.js
window.chrome = {
app: {
isInstalled: false,
InstallState: {
DISABLED: 'disabled',
INSTALLED: 'installed',
NOT_INSTALLED: 'not_installed'
},
RunningState: {
CANNOT_RUN: 'cannot_run',
READY_TO_RUN: 'ready_to_run',
RUNNING: 'running'
}
},
runtime: {
OnInstalledReason: {
CHROME_UPDATE: 'chrome_update',
INSTALL: 'install',
SHARED_MODULE_UPDATE: 'shared_module_update',
UPDATE: 'update'
},
OnRestartRequiredReason: {
APP_UPDATE: 'app_update',
OS_UPDATE: 'os_update',
PERIODIC: 'periodic'
},
PlatformArch: {
ARM: 'arm',
ARM64: 'arm64',
MIPS: 'mips',
MIPS64: 'mips64',
X86_32: 'x86-32',
X86_64: 'x86-64'
},
PlatformNaclArch: {
ARM: 'arm',
MIPS: 'mips',
MIPS64: 'mips64',
X86_32: 'x86-32',
X86_64: 'x86-64'
},
PlatformOs: {
ANDROID: 'android',
CROS: 'cros',
LINUX: 'linux',
MAC: 'mac',
OPENBSD: 'openbsd',
WIN: 'win'
},
RequestUpdateCheckStatus: {
NO_UPDATE: 'no_update',
THROTTLED: 'throttled',
UPDATE_AVAILABLE: 'update_available'
}
}
}

// https://github.com/microlinkhq/browserless/blob/master/packages/goto/src/evasions/navigator-permissions.js
if (!window.Notification) {
window.Notification = {
permission: 'denied'
}
}

const originalQuery = window.navigator.permissions.query
window.navigator.permissions.__proto__.query = parameters =>
parameters.name === 'notifications'
? Promise.resolve({ state: window.Notification.permission })
: originalQuery(parameters)

const oldCall = Function.prototype.call
function call() {
return oldCall.apply(this, arguments)
}
Function.prototype.call = call

const nativeToStringFunctionString = Error.toString().replace(/Error/g, 'toString')
const oldToString = Function.prototype.toString

function functionToString() {
if (this === window.navigator.permissions.query) {
return 'function query() { [native code] }'
}
if (this === functionToString) {
return nativeToStringFunctionString
}
return oldCall.call(oldToString, this)
}
// eslint-disable-next-line
Function.prototype.toString = functionToString
"""
},
)
return orig_get(*args, **kwargs)

self.get = get_wrapped

def __dir__(self):
return object.__dir__(self)

def _get_cdc_props(self):
return self.execute_script(
"""
let objectToInspect = window,
result = [];
while(objectToInspect !== null)
{ result = result.concat(Object.getOwnPropertyNames(objectToInspect));
objectToInspect = Object.getPrototypeOf(objectToInspect); }
return result.filter(i => i.match(/.+_.+_(Array|Promise|Symbol)/ig))
"""
)

def _hook_remove_cdc_props(self):
self.execute_cdp_cmd(
"Page.addScriptToEvaluateOnNewDocument",
{
"source": """
let objectToInspect = window,
result = [];
while(objectToInspect !== null)
{ result = result.concat(Object.getOwnPropertyNames(objectToInspect));
objectToInspect = Object.getPrototypeOf(objectToInspect); }
result.forEach(p => p.match(/.+_.+_(Array|Promise|Symbol)/ig)
&&delete window[p]&&console.log('removed',p))
"""
},
)
# def _get_cdc_props(self):
# return self.execute_script(
# """
# let objectToInspect = window,
# result = [];
# while(objectToInspect !== null)
# { result = result.concat(Object.getOwnPropertyNames(objectToInspect));
# objectToInspect = Object.getPrototypeOf(objectToInspect); }
#
# return result.filter(i => i.match(/^([a-zA-Z]){27}(Array|Promise|Symbol)$/ig))
# """
# )
#
# def _hook_remove_cdc_props(self):
# self.execute_cdp_cmd(
# "Page.addScriptToEvaluateOnNewDocument",
# {
# "source": """
# let objectToInspect = window,
# result = [];
# while(objectToInspect !== null)
# { result = result.concat(Object.getOwnPropertyNames(objectToInspect));
# objectToInspect = Object.getPrototypeOf(objectToInspect); }
# result.forEach(p => p.match(/^([a-zA-Z]){27}(Array|Promise|Symbol)$/ig)
# &&delete window[p]&&console.log('removed',p))
# """
# },
# )

def get(self, url):
if self._get_cdc_props():
self._hook_remove_cdc_props()
# if self._get_cdc_props():
# self._hook_remove_cdc_props()
return super().get(url)

def add_cdp_listener(self, event_name, callback):
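The property-name filter that finds ChromeDriver's injected `cdc_` globals changed in this hunk:

- Old (now commented out): `^([a-zA-Z]){27}(Array|Promise|Symbol)$`, a fixed-length pattern.
- New: `.+_.+_(Array|Promise|Symbol)`, a looser pattern.

Both are valid Python `re` patterns as written, so their behaviour can be compared directly. `re.search` mirrors the unanchored JavaScript `match`, and `re.IGNORECASE` stands in for the `i` flag; the `g` flag does not change whether a match exists. The property names below are illustrative samples, not names verified against a specific ChromeDriver build:

```python
import re

# Patterns copied from the JavaScript in the diff.
OLD_PATTERN = re.compile(r"^([a-zA-Z]){27}(Array|Promise|Symbol)$", re.IGNORECASE)
NEW_PATTERN = re.compile(r".+_.+_(Array|Promise|Symbol)", re.IGNORECASE)

# Illustrative property names (assumed, not taken from a real build).
samples = [
    "cdc_adoQpoasnfa76pfcZLmcfl_Array",    # cdc_<token>_<type> style
    "abcdefghijklmnopqrstuvwxyzAPromise",  # 27 letters + type, no underscores
    "localStorage",                        # ordinary window property
]

for name in samples:
    old = bool(OLD_PATTERN.search(name))
    new = bool(NEW_PATTERN.search(name))
    print(f"{name}: old={old} new={new}")
# cdc_adoQpoasnfa76pfcZLmcfl_Array: old=False new=True
# abcdefghijklmnopqrstuvwxyzAPromise: old=True new=False
# localStorage: old=False new=False
```

The looser pattern can also match any page-defined global that has two underscores before one of those suffixes. The removal hook would then delete such a global as well.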
@@ -553,6 +663,11 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
if self.reactor and isinstance(self.reactor, Reactor):
self.reactor.handlers.clear()

def window_new(self):
self.execute(
selenium.webdriver.remote.command.Command.NEW_WINDOW, {"type": "window"}
)

def tab_new(self, url: str):
"""
this opens a url in a new tab.
@@ -597,24 +712,22 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
# super(Chrome, self).start_session(capabilities, browser_profile)

def quit(self):
logger.debug("closing webdriver")
if hasattr(self, "service") and getattr(self.service, "process", None):
try:
self.service.process.kill()
self.service.process.wait(5)
logger.debug("webdriver process ended")
except (AttributeError, RuntimeError, OSError):
pass
try:
if self.reactor and isinstance(self.reactor, Reactor):
logger.debug("shutting down reactor")
self.reactor.event.set()
except Exception: # noqa
logger.debug("shutting down reactor")
except AttributeError:
pass
try:
logger.debug("killing browser")
os.kill(self.browser_pid, 15)

except TimeoutError as e:
logger.debug("gracefully closed browser")
except Exception as e: # noqa
logger.debug(e, exc_info=True)
except Exception: # noqa
pass

if (
hasattr(self, "keep_user_data_dir")
and hasattr(self, "user_data_dir")
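`quit()` kills the chromedriver service process and calls `wait(5)` on it, then sends signal 15 (`SIGTERM`) to the browser PID. Waiting is what reaps a child process. A child that has exited but was never waited on stays in the process table as a defunct (zombie) entry, which the v3.0.1 changelog entry and the new `dumb-init` entrypoint also address.

A standard-library sketch of a terminate, wait, then kill sequence follows. A throwaway Python child stands in for Chrome, and `stop_process()` is a hypothetical helper:

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Ask a child process to exit, escalate to kill, and always reap it.

    Returns the exit code reported by the operating system.
    """
    proc.terminate()  # SIGTERM on POSIX, TerminateProcess on Windows
    try:
        return proc.wait(timeout=timeout)  # wait() reaps the child
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()


# A stand-in child that would otherwise run for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
code = stop_process(child)
print(child.poll() is not None)  # True: the child has exited and been reaped
```

`Popen.wait(timeout=...)` signals a timeout by raising `subprocess.TimeoutExpired`. That exception is not a subclass of the built-in `TimeoutError`, so it has to be caught by its own name.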
@@ -622,7 +735,6 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
):
for _ in range(5):
try:

shutil.rmtree(self.user_data_dir, ignore_errors=False)
except FileNotFoundError:
pass
@@ -640,13 +752,24 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
# this must come last, otherwise it will throw 'in use' errors
self.patcher = None

def __del__(self):
try:
super().quit()
# self.service.process.kill()
except: # noqa
pass
self.quit()
def __getattribute__(self, item):
if not super().__getattribute__("debug"):
return super().__getattribute__(item)
else:
import inspect

original = super().__getattribute__(item)
if inspect.ismethod(original) and not inspect.isclass(original):

def newfunc(*args, **kwargs):
logger.debug(
"calling %s with args %s and kwargs %s\n"
% (original.__qualname__, args, kwargs)
)
return original(*args, **kwargs)

return newfunc
return original

def __enter__(self):
return self
@@ -660,6 +783,27 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
def __hash__(self):
return hash(self.options.debugger_address)

def __dir__(self):
return object.__dir__(self)

def __del__(self):
try:
self.service.process.kill()
except: # noqa
pass
self.quit()

@classmethod
def _ensure_close(cls, self):
# needs to be a classmethod so finalize can find the reference
logger.info("ensuring close")
if (
hasattr(self, "service")
and hasattr(self.service, "process")
and hasattr(self.service.process, "kill")
):
self.service.process.kill()


def find_chrome_executable():
"""
@@ -691,8 +835,10 @@ def find_chrome_executable():
)
else:
for item in map(
os.environ.get, ("PROGRAMFILES", "PROGRAMFILES(X86)", "LOCALAPPDATA")
os.environ.get,
("PROGRAMFILES", "PROGRAMFILES(X86)", "LOCALAPPDATA", "PROGRAMW6432"),
):
if item is not None:
for subitem in (
"Google/Chrome/Application",
"Google/Chrome Beta/Application",
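On Windows, `find_chrome_executable()` now also consults `PROGRAMW6432`. On 64-bit Windows that variable points to the 64-bit Program Files folder even inside a 32-bit process. The search combines every environment variable that is set with every Chrome install subfolder. A sketch of that candidate generation, with these assumptions:

- The environment is passed in as a mapping, so the sketch runs on any OS.
- The subfolder tuple is cut off in the diff, so only its two visible entries are used.
- The `chrome.exe` filename, the de-duplication step and the `chrome_candidates()` name are additions for illustration.

```python
import ntpath
from typing import List, Mapping

ENV_VARS = ("PROGRAMFILES", "PROGRAMFILES(X86)", "LOCALAPPDATA", "PROGRAMW6432")
# Only the subfolders visible in the diff; the original tuple continues further.
SUBFOLDERS = ("Google/Chrome/Application", "Google/Chrome Beta/Application")


def chrome_candidates(environ: Mapping[str, str]) -> List[str]:
    """Build candidate chrome.exe paths from the environment (hypothetical helper)."""
    candidates = []
    for item in map(environ.get, ENV_VARS):
        if item is not None:  # skip variables that are not set
            for subitem in SUBFOLDERS:
                # normpath turns the forward slashes into Windows separators.
                candidates.append(ntpath.normpath(ntpath.join(item, subitem, "chrome.exe")))
    # PROGRAMFILES and PROGRAMW6432 often point to the same folder: drop repeats, keep order.
    return list(dict.fromkeys(candidates))


fake_env = {"PROGRAMFILES": r"C:\Program Files", "PROGRAMW6432": r"C:\Program Files"}
for path in chrome_candidates(fake_env):
    print(path)
```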
|
||||
|
||||
@@ -1,259 +0,0 @@
#!/usr/bin/env python3
# this module is part of undetected_chromedriver


"""

888 888 d8b
888 888 Y8P
888 888
.d8888b 88888b. 888d888 .d88b. 88888b.d88b. .d88b. .d88888 888d888 888 888 888 .d88b. 888d888
d88P" 888 "88b 888P" d88""88b 888 "888 "88b d8P Y8b d88" 888 888P" 888 888 888 d8P Y8b 888P"
888 888 888 888 888 888 888 888 888 88888888 888 888 888 888 Y88 88P 88888888 888
Y88b. 888 888 888 Y88..88P 888 888 888 Y8b. Y88b 888 888 888 Y8bd8P Y8b. 888
"Y8888P 888 888 888 "Y88P" 888 888 888 "Y8888 "Y88888 888 888 Y88P "Y8888 888 88888888

by UltrafunkAmsterdam (https://github.com/ultrafunkamsterdam)

"""

import io
import logging
import os
import random
import re
import string
import sys
import zipfile
from distutils.version import LooseVersion
from urllib.request import urlopen, urlretrieve

from selenium.webdriver import Chrome as _Chrome, ChromeOptions as _ChromeOptions

TARGET_VERSION = 0
logger = logging.getLogger("uc")


class Chrome:
    def __new__(cls, *args, emulate_touch=False, **kwargs):

        if not ChromeDriverManager.installed:
            ChromeDriverManager(*args, **kwargs).install()
        if not ChromeDriverManager.selenium_patched:
            ChromeDriverManager(*args, **kwargs).patch_selenium_webdriver()
        if not kwargs.get("executable_path"):
            kwargs["executable_path"] = "./{}".format(
                ChromeDriverManager(*args, **kwargs).executable_path
            )
        if not kwargs.get("options"):
            kwargs["options"] = ChromeOptions()
        instance = object.__new__(_Chrome)
        instance.__init__(*args, **kwargs)

        instance._orig_get = instance.get

        def _get_wrapped(*args, **kwargs):
            if instance.execute_script("return navigator.webdriver"):
                instance.execute_cdp_cmd(
                    "Page.addScriptToEvaluateOnNewDocument",
                    {
                        "source": """

            Object.defineProperty(window, 'navigator', {
                value: new Proxy(navigator, {
                has: (target, key) => (key === 'webdriver' ? false : key in target),
                get: (target, key) =>
                    key === 'webdriver'
                    ? undefined
                    : typeof target[key] === 'function'
                    ? target[key].bind(target)
                    : target[key]
                })
            });


            """
                    },
                )
            return instance._orig_get(*args, **kwargs)

        instance.get = _get_wrapped
        instance.get = _get_wrapped
        instance.get = _get_wrapped

        original_user_agent_string = instance.execute_script(
            "return navigator.userAgent"
        )
        instance.execute_cdp_cmd(
            "Network.setUserAgentOverride",
            {
                "userAgent": original_user_agent_string.replace("Headless", ""),
            },
        )
        if emulate_touch:
            instance.execute_cdp_cmd(
                "Page.addScriptToEvaluateOnNewDocument",
                {
                    "source": """
                    Object.defineProperty(navigator, 'maxTouchPoints', {
                        get: () => 1
                    })"""
                },
            )
        logger.info(f"starting undetected_chromedriver.Chrome({args}, {kwargs})")
        return instance


class ChromeOptions:
    def __new__(cls, *args, **kwargs):
        if not ChromeDriverManager.installed:
            ChromeDriverManager(*args, **kwargs).install()
        if not ChromeDriverManager.selenium_patched:
            ChromeDriverManager(*args, **kwargs).patch_selenium_webdriver()

        instance = object.__new__(_ChromeOptions)
        instance.__init__()
        instance.add_argument("start-maximized")
        instance.add_experimental_option("excludeSwitches", ["enable-automation"])
        instance.add_argument("--disable-blink-features=AutomationControlled")
        return instance


class ChromeDriverManager(object):
    installed = False
    selenium_patched = False
    target_version = None

    DL_BASE = "https://chromedriver.storage.googleapis.com/"

    def __init__(self, executable_path=None, target_version=None, *args, **kwargs):

        _platform = sys.platform

        if TARGET_VERSION:
            # use global if set
            self.target_version = TARGET_VERSION

        if target_version:
            # use explicitly passed target
            self.target_version = target_version  # user override

        if not self.target_version:
            # none of the above (default) and just get current version
            self.target_version = self.get_release_version_number().version[
                0
            ]  # only major version int

        self._base = base_ = "chromedriver{}"

        exe_name = self._base
        if _platform in ("win32",):
            exe_name = base_.format(".exe")
        if _platform in ("linux",):
            _platform += "64"
            exe_name = exe_name.format("")
        if _platform in ("darwin",):
            _platform = "mac64"
            exe_name = exe_name.format("")
        self.platform = _platform
        self.executable_path = executable_path or exe_name
        self._exe_name = exe_name

    def patch_selenium_webdriver(self_):
        """
        Patches selenium package Chrome, ChromeOptions classes for current session

        :return:
        """
        import selenium.webdriver.chrome.service
        import selenium.webdriver

        selenium.webdriver.Chrome = Chrome
        selenium.webdriver.ChromeOptions = ChromeOptions
        logger.info("Selenium patched. Safe to import Chrome / ChromeOptions")
        self_.__class__.selenium_patched = True

    def install(self, patch_selenium=True):
        """
        Initialize the patch

        This will:
         download chromedriver if not present
         patch the downloaded chromedriver
         patch selenium package if <patch_selenium> is True (default)

        :param patch_selenium: patch selenium webdriver classes for Chrome and ChromeDriver (for current python session)
        :return:
        """
        if not os.path.exists(self.executable_path):
            self.fetch_chromedriver()
            if not self.__class__.installed:
                if self.patch_binary():
                    self.__class__.installed = True

        if patch_selenium:
            self.patch_selenium_webdriver()

    def get_release_version_number(self):
        """
        Gets the latest major version available, or the latest major version of self.target_version if set explicitly.

        :return: version string
        """
        path = (
            "LATEST_RELEASE"
            if not self.target_version
            else f"LATEST_RELEASE_{self.target_version}"
        )
        return LooseVersion(urlopen(self.__class__.DL_BASE + path).read().decode())

    def fetch_chromedriver(self):
        """
        Downloads ChromeDriver from source and unpacks the executable

        :return: on success, name of the unpacked executable
        """
        base_ = self._base
        zip_name = base_.format(".zip")
        ver = self.get_release_version_number().vstring
        if os.path.exists(self.executable_path):
            return self.executable_path
        urlretrieve(
            f"{self.__class__.DL_BASE}{ver}/{base_.format(f'_{self.platform}')}.zip",
            filename=zip_name,
        )
        with zipfile.ZipFile(zip_name) as zf:
            zf.extract(self._exe_name)
        os.remove(zip_name)
        if sys.platform != "win32":
            os.chmod(self._exe_name, 0o755)
        return self._exe_name

    @staticmethod
    def random_cdc():
        cdc = random.choices(string.ascii_lowercase, k=26)
        cdc[-6:-4] = map(str.upper, cdc[-6:-4])
        cdc[2] = cdc[0]
        cdc[3] = "_"
        return "".join(cdc).encode()

    def patch_binary(self):
        """
        Patches the ChromeDriver binary

        :return: False on failure, binary name on success
        """
        linect = 0
        replacement = self.random_cdc()
        with io.open(self.executable_path, "r+b") as fh:
            for line in iter(lambda: fh.readline(), b""):
                if b"cdc_" in line:
                    fh.seek(-len(line), 1)
                    newline = re.sub(b"cdc_.{22}", replacement, line)
                    fh.write(newline)
                    linect += 1
            return linect


def install(executable_path=None, target_version=None, *args, **kwargs):
    ChromeDriverManager(executable_path, target_version, *args, **kwargs).install()

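The deleted module patched chromedriver by overwriting every `cdc_` identifier (the prefix plus 22 characters) with a random 26-byte token, so the replacement has exactly the same length and no byte offsets in the binary move. The sketch below applies that substitution to an in-memory byte string instead of reading a real executable line by line; `SAMPLE` is made-up data and `patch_cdc` is a hypothetical helper name.

```python
import random
import re
import string


def random_cdc() -> bytes:
    # Same construction as the removed random_cdc(): 26 lowercase letters,
    # two of them upper-cased, the 3rd copied from the 1st, the 4th set to "_".
    cdc = random.choices(string.ascii_lowercase, k=26)
    cdc[-6:-4] = map(str.upper, cdc[-6:-4])
    cdc[2] = cdc[0]
    cdc[3] = "_"
    return "".join(cdc).encode()


def patch_cdc(data: bytes) -> bytes:
    """Hypothetical helper: replace each b'cdc_' + 22 bytes with one random token."""
    replacement = random_cdc()  # letters and "_" only, so safe as a re.sub template
    return re.sub(rb"cdc_.{22}", replacement, data)


SAMPLE = b'var key = "$cdc_asdjflasutopfhvcZLmcfl_"; window.x = 1;'
patched = patch_cdc(SAMPLE)
```

Only the 26 matched bytes change; everything before and after the match is left untouched.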
@@ -3,11 +3,11 @@

import json
import logging
from collections.abc import Mapping, Sequence

import requests
import websockets


log = logging.getLogger(__name__)

@@ -1,17 +1,16 @@
import asyncio
import logging
import time
import traceback
from collections.abc import Mapping
from collections.abc import Sequence
from functools import wraps
import logging
import threading
import time
import traceback
from typing import Any
from typing import Awaitable
from typing import Callable
from typing import List
from typing import Optional
from contextlib import ExitStack
import threading
from functools import wraps, partial


class Structure(dict):

@@ -1,13 +1,13 @@
import atexit
import logging
import multiprocessing
import os
import platform
import sys
import signal
from subprocess import PIPE
from subprocess import Popen
import atexit
import traceback
import logging
import signal
import sys


CREATE_NEW_PROCESS_GROUP = 0x00000200
DETACHED_PROCESS = 0x00000008

@@ -27,12 +27,14 @@ def start_detached(executable, *args):
    reader, writer = multiprocessing.Pipe(False)

    # do not keep reference
    multiprocessing.Process(
    process = multiprocessing.Process(
        target=_start_detached,
        args=(executable, *args),
        kwargs={"writer": writer},
        daemon=True,
    ).start()
    )
    process.start()
    process.join()
    # receive pid from pipe
    pid = reader.recv()
    REGISTERED.append(pid)

@@ -44,7 +46,6 @@ def start_detached(executable, *args):


def _start_detached(executable, *args, writer: multiprocessing.Pipe = None):

    # configure launch
    kwargs = {}
    if platform.system() == "Windows":

@@ -39,10 +39,23 @@ class ChromeOptions(_ChromiumOptions):
            value = ChromeOptions._undot_key(rest, value)
        return {key: value}

    @staticmethod
    def _merge_nested(a, b):
        """
        merges b into a
        leaf values in a are overwritten with values from b
        """
        for key in b:
            if key in a:
                if isinstance(a[key], dict) and isinstance(b[key], dict):
                    ChromeOptions._merge_nested(a[key], b[key])
                    continue
            a[key] = b[key]
        return a

    def handle_prefs(self, user_data_dir):
        prefs = self.experimental_options.get("prefs")
        if prefs:

            user_data_dir = user_data_dir or self._user_data_dir
            default_path = os.path.join(user_data_dir, "Default")
            os.makedirs(default_path, exist_ok=True)

@@ -50,12 +63,14 @@ class ChromeOptions(_ChromiumOptions):
            # undot prefs dict keys
            undot_prefs = {}
            for key, value in prefs.items():
                undot_prefs.update(self._undot_key(key, value))
                undot_prefs = self._merge_nested(
                    undot_prefs, self._undot_key(key, value)
                )

            prefs_file = os.path.join(default_path, "Preferences")
            if os.path.exists(prefs_file):
                with open(prefs_file, encoding="latin1", mode="r") as f:
                    undot_prefs.update(json.load(f))
                    undot_prefs = self._merge_nested(json.load(f), undot_prefs)

            with open(prefs_file, encoding="latin1", mode="w") as f:
                json.dump(undot_prefs, f)

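These two hunks replace `dict.update` with a recursive merge when building the `Preferences` file. With `update`, two dotted prefs that share a prefix (for example `download.default_directory` and `download.prompt_for_download`) each undot to a top-level `download` dict, and the second one replaces the first. The standalone sketch below copies the two helpers out of the class to show the difference; the pref names and paths are only illustrative.

```python
def undot_key(key, value):
    """Turn a dotted key and a value into a nested dict."""
    if "." in key:
        key, rest = key.split(".", 1)
        value = undot_key(rest, value)
    return {key: value}


def merge_nested(a, b):
    """Merge b into a in place; nested dicts are merged, leaf values from b win."""
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge_nested(a[key], b[key])
                continue
        a[key] = b[key]
    return a


prefs = {
    "download.default_directory": "/tmp/downloads",
    "download.prompt_for_download": False,
}

# Old behaviour: update() swaps out the whole "download" sub-dict each time.
updated = {}
for k, v in prefs.items():
    updated.update(undot_key(k, v))

# New behaviour: sibling keys under "download" survive.
merged = {}
for k, v in prefs.items():
    merged = merge_nested(merged, undot_key(k, v))
```

The argument order in the second hunk matters as well: `_merge_nested(json.load(f), undot_prefs)` keeps the settings already stored in the profile's `Preferences` file while letting the user-supplied prefs win on conflicts.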
@@ -1,6 +1,7 @@
#!/usr/bin/env python3
# this module is part of undetected_chromedriver

from distutils.version import LooseVersion
import io
import logging
import os

@@ -9,15 +10,14 @@ import re
import string
import sys
import time
from urllib.request import urlopen
from urllib.request import urlretrieve
import zipfile
from distutils.version import LooseVersion
from urllib.request import urlopen, urlretrieve
import secrets


logger = logging.getLogger(__name__)

IS_POSIX = sys.platform.startswith(("darwin", "cygwin", "linux"))
IS_POSIX = sys.platform.startswith(("darwin", "cygwin", "linux", "linux2"))


class Patcher(object):

@@ -29,7 +29,7 @@ class Patcher(object):
    if platform.endswith("win32"):
        zip_name %= "win32"
        exe_name %= ".exe"
    if platform.endswith("linux"):
    if platform.endswith(("linux", "linux2")):
        zip_name %= "linux64"
        exe_name %= ""
    if platform.endswith("darwin"):

@@ -38,7 +38,9 @@ class Patcher(object):

    if platform.endswith("win32"):
        d = "~/appdata/roaming/undetected_chromedriver"
    elif platform.startswith("linux"):
    elif "LAMBDA_TASK_ROOT" in os.environ:
        d = "/tmp/undetected_chromedriver"
    elif platform.startswith(("linux", "linux2")):
        d = "~/.local/share/undetected_chromedriver"
    elif platform.endswith("darwin"):
        d = "~/Library/Application Support/undetected_chromedriver"

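The class-level block above chooses where `Patcher` stores downloaded drivers, and this change adds an AWS Lambda case: when `LAMBDA_TASK_ROOT` is set, it uses `/tmp`, the only writable location in that environment. Because the Lambda check comes before the Linux check, it takes priority on Lambda's Linux runtime. A sketch of the same selection as a pure function follows; `driver_data_dir` is a hypothetical name, the platform string and environment are passed in for testing, and since the fallback branch is not visible in this hunk the sketch returns `None` rather than guessing it.

```python
import os
from typing import Mapping, Optional


def driver_data_dir(platform: str, env: Mapping[str, str]) -> Optional[str]:
    """Hypothetical helper mirroring the branch order in the hunk."""
    if platform.endswith("win32"):
        d = "~/appdata/roaming/undetected_chromedriver"
    elif "LAMBDA_TASK_ROOT" in env:
        # checked before the Linux branch, so it wins on Lambda
        d = "/tmp/undetected_chromedriver"
    elif platform.startswith(("linux", "linux2")):
        d = "~/.local/share/undetected_chromedriver"
    elif platform.endswith("darwin"):
        d = "~/Library/Application Support/undetected_chromedriver"
    else:
        return None  # the real fallback lies outside this hunk
    return os.path.expanduser(d)
```

For `startswith`, adding `"linux2"` to the tuple changes nothing, since any string starting with `linux2` also starts with `linux`. The `endswith` change in the previous hunk is different: `"linux2"` does not end with `"linux"`.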
@@ -48,7 +50,6 @@ class Patcher(object):

    def __init__(self, executable_path=None, force=False, version_main: int = 0):
        """

        Args:
            executable_path: None = automatic
                             a full file path to the chromedriver executable

@@ -57,10 +58,9 @@ class Patcher(object):
            version_main: 0 = auto
                specify main chrome version (rounded, ex: 82)
        """

        self.force = force
        self.executable_path = None
        prefix = secrets.token_hex(8)
        self._custom_exe_path = False
        prefix = "undetected"

        if not os.path.exists(self.data_path):
            os.makedirs(self.data_path, exist_ok=True)

@@ -82,8 +82,6 @@ class Patcher(object):
                os.path.join(".", self.executable_path)
            )

        self._custom_exe_path = False

        if executable_path:
            self._custom_exe_path = True
            self.executable_path = executable_path

@@ -91,7 +89,6 @@ class Patcher(object):
        self.version_full = None

    def auto(self, executable_path=None, force=False, version_main=None):
        """"""
        if executable_path:
            self.executable_path = executable_path
            self._custom_exe_path = True

@@ -203,43 +200,46 @@ class Patcher(object):

    @staticmethod
    def gen_random_cdc():
        cdc = random.choices(string.ascii_lowercase, k=26)
        cdc[-6:-4] = map(str.upper, cdc[-6:-4])
        cdc[2] = cdc[0]
        cdc[3] = "_"
        cdc = random.choices(string.ascii_letters, k=27)
        return "".join(cdc).encode()

    def is_binary_patched(self, executable_path=None):
        """simple check if executable is patched.

        :return: False if not patched, else True
        """
        executable_path = executable_path or self.executable_path
        try:
            with io.open(executable_path, "rb") as fh:
                for line in iter(lambda: fh.readline(), b""):
                    if b"cdc_" in line:
                return fh.read().find(b"undetected chromedriver") != -1
        except FileNotFoundError:
            return False
                else:
                    return True

    def patch_exe(self):
        """
        Patches the ChromeDriver binary

        :return: False on failure, binary name on success
        """
        start = time.perf_counter()
        logger.info("patching driver executable %s" % self.executable_path)

        linect = 0
        replacement = self.gen_random_cdc()
        with io.open(self.executable_path, "r+b") as fh:
            for line in iter(lambda: fh.readline(), b""):
                if b"cdc_" in line:
                    fh.seek(-len(line), 1)
                    newline = re.sub(b"cdc_.{22}", replacement, line)
                    fh.write(newline)
                    linect += 1
            return linect
            content = fh.read()
            # match_injected_codeblock = re.search(rb"{window.*;}", content)
            match_injected_codeblock = re.search(rb"\{window\.cdc.*?;\}", content)
            if match_injected_codeblock:
                target_bytes = match_injected_codeblock[0]
                new_target_bytes = (
                    b'{console.log("undetected chromedriver 1337!")}'.ljust(
                        len(target_bytes), b" "
                    )
                )
                new_content = content.replace(target_bytes, new_target_bytes)
                if new_content == content:
                    logger.warning(
                        "something went wrong patching the driver binary. could not find injection code block"
                    )
                else:
                    logger.debug(
                        "found block:\n%s\nreplacing with:\n%s"
                        % (target_bytes, new_target_bytes)
                    )
                fh.seek(0)
                fh.write(new_content)
        logger.debug(
            "patching took us {:.2f} seconds".format(time.perf_counter() - start)
        )

    def __repr__(self):
        return "{0:s}({1:s})".format(

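The rewritten `patch_exe` no longer renames `cdc_` variables line by line. It reads the whole binary, searches for the injected JavaScript block matching `\{window\.cdc.*?;\}`, and overwrites it with a harmless `console.log` statement padded with spaces to the same length; `is_binary_patched` then just looks for the marker text. The sketch below applies both steps to an in-memory byte string. `SAMPLE` is made up, the function names are hypothetical, and the length guard is an addition of this sketch, not something in the diff (`bytes.ljust` never truncates, so a block shorter than the marker would otherwise change the file length).

```python
import re

MARKER = b'{console.log("undetected chromedriver 1337!")}'


def patch_bytes(content: bytes) -> bytes:
    """Neutralise the {window.cdc...;} block; return content unchanged if absent."""
    match = re.search(rb"\{window\.cdc.*?;\}", content)
    if not match:
        return content
    target = match[0]
    if len(target) < len(MARKER):
        # guard added for this sketch: padding cannot shrink the marker
        return content
    return content.replace(target, MARKER.ljust(len(target), b" "))


def is_patched(content: bytes) -> bool:
    # same test as the new is_binary_patched(): look for the marker text
    return content.find(b"undetected chromedriver") != -1


SAMPLE = b"prefix;{window.cdc_adoQpoasnfa76pfcZLmcfl_Array = window.Array;};suffix"
patched = patch_bytes(SAMPLE)
```

Keeping the replacement the same length is what allows the new code to rewrite the file in place with `fh.seek(0)` followed by `fh.write(new_content)`.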
@@ -248,7 +248,6 @@ class Patcher(object):
        )

    def __del__(self):

        if self._custom_exe_path:
            # if the driver binary is specified by user
            # we assume it is important enough to not delete it

@@ -6,6 +6,7 @@ import json
import logging
import threading


logger = logging.getLogger(__name__)

@@ -63,9 +64,7 @@ class Reactor(threading.Thread):
                    break

    async def listen(self):

        while self.running:

            await self._wait_service_started()
            await asyncio.sleep(1)

@@ -74,9 +73,7 @@ class Reactor(threading.Thread):
                    log_entries = self.driver.get_log("performance")

                for entry in log_entries:

                    try:

                        obj_serialized: str = entry.get("message")
                        obj = json.loads(obj_serialized)
                        message = obj.get("message")

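`Reactor.listen` polls `driver.get_log("performance")`. Each entry's `message` field is itself a JSON string, and the inner `message` object carries the DevTools event `method` and `params`. A minimal sketch of that unpacking follows, run against hand-written sample entries instead of a live driver; `iter_devtools_events` is a hypothetical helper, not part of the package.

```python
import json
from typing import Iterable, Iterator, Mapping, Tuple


def iter_devtools_events(entries: Iterable[Mapping]) -> Iterator[Tuple[str, dict]]:
    """Hypothetical helper: yield (method, params) for each decodable entry."""
    for entry in entries:
        try:
            # the outer "message" is a JSON string; its inner "message" is the event
            message = json.loads(entry.get("message"))["message"]
            method = message["method"]
        except (TypeError, ValueError, KeyError):
            continue  # skip entries that are not in the expected shape
        yield method, message.get("params", {})


# Hand-written sample entries shaped like performance-log records.
SAMPLE_ENTRIES = [
    {
        "level": "INFO",
        "timestamp": 1,
        "message": json.dumps(
            {
                "message": {
                    "method": "Network.requestWillBeSent",
                    "params": {"requestId": "1"},
                },
                "webview": "ABC",
            }
        ),
    },
    {"level": "INFO", "timestamp": 2, "message": "not json"},
]

events = list(iter_devtools_events(SAMPLE_ENTRIES))
```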
@@ -1,4 +0,0 @@
# for backward compatibility
import sys

sys.modules[__name__] = sys.modules[__package__]

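The deleted compatibility module made its own import return the package itself by replacing its entry in `sys.modules` while it was being executed. This works because, once a module's code has run, Python's import system returns whatever object is then stored in `sys.modules` under that module's name. The sketch below reproduces the trick with a throwaway package written to a temporary directory; `demo_pkg` and `import_alias_demo` are made-up names.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Contents of the removed shim module, verbatim.
SHIM = (
    "# for backward compatibility\n"
    "import sys\n"
    "\n"
    "sys.modules[__name__] = sys.modules[__package__]\n"
)


def import_alias_demo():
    """Build a throwaway package whose v2 submodule is the shim, then import both."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("VALUE = 42\n")
        (pkg / "v2.py").write_text(SHIM)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()  # make sure the new files are seen
        try:
            package = importlib.import_module("demo_pkg")
            legacy = importlib.import_module("demo_pkg.v2")
        finally:
            sys.path.remove(tmp)
    return package, legacy


package, legacy = import_alias_demo()
```

With the file removed, code that still imports the `v2` submodule would presumably fail with `ModuleNotFoundError` instead of silently receiving the package.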
@@ -1,7 +1,30 @@
from typing import List

from selenium.webdriver.common.by import By
import selenium.webdriver.remote.webelement


class WebElement(selenium.webdriver.remote.webelement.WebElement):
    def click_safe(self):
        super().click()
        self._parent.reconnect(0.1)

    def children(
        self, tag=None, recursive=False
    ) -> List[selenium.webdriver.remote.webelement.WebElement]:
        """
        returns direct child elements of current element
        :param tag: str, if supplied, returns <tag> nodes only
        """
        script = "return [... arguments[0].children]"
        if tag:
            script += ".filter( node => node.tagName === '%s')" % tag.upper()
        if recursive:
            return list(_recursive_children(self, tag))
        return list(self._parent.execute_script(script, self))


class UCWebElement(WebElement):
    """
    Custom WebElement class which makes it easier to view elements when
    working in an interactive environment.

@@ -14,9 +37,13 @@ class WebElement(selenium.webdriver.remote.webelement.WebElement):

    """

    def __init__(self, parent, id_):
        super().__init__(parent, id_)
        self._attrs = None

    @property
    def attrs(self):
        if not hasattr(self, "_attrs"):
        if not self._attrs:
            self._attrs = self._parent.execute_script(
                """
            var items = {};

@@ -35,3 +62,25 @@ class WebElement(selenium.webdriver.remote.webelement.WebElement):
        if strattrs:
            strattrs = " " + strattrs
        return f"{self.__class__.__name__} <{self.tag_name}{strattrs}>"


def _recursive_children(element, tag: str = None, _results=None):
    """
    returns all children of <element> recursively

    :param element: `WebElement` object.
            find children below this <element>

    :param tag: str = None.
            if provided, return only <tag> elements. example: 'a', or 'img'
    :param _results: do not use!
    """
    results = _results or set()
    for element in element.children():
        if tag:
            if element.tag_name == tag:
                results.add(element)
        else:
            results.add(element)
        results |= _recursive_children(element, tag, results)
    return results

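`_recursive_children` walks the DOM depth-first, calling `children()` on every element and collecting matches in a set. The sketch below performs the same traversal over a plain in-memory tree so it runs without a browser. `Node` is a hypothetical stand-in for `WebElement` that provides only `tag_name` and `children()`. Unlike the code in the diff, the sketch uses a separate loop variable and one shared result set instead of shadowing `element` and re-merging the set into itself; the result is the same.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can live in a set
class Node:
    tag_name: str
    kids: List["Node"] = field(default_factory=list)

    def children(self) -> List["Node"]:
        return list(self.kids)


def recursive_children(element: Node, tag: Optional[str] = None) -> Set[Node]:
    """Collect all descendants of element, optionally only those with tag_name == tag."""
    results: Set[Node] = set()

    def walk(node: Node) -> None:
        for child in node.children():
            if not tag or child.tag_name == tag:
                results.add(child)
            walk(child)  # descend whether or not the child matched

    walk(element)
    return results
```

Note that the comparison is against `tag_name` as reported by the element, whereas the non-recursive `children(tag=...)` path upper-cases the tag to compare against the DOM's `tagName`.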
@@ -44,6 +44,8 @@ def get_webdriver() -> WebDriver:
    # todo: this param shows a warning in chrome head-full
    options.add_argument('--disable-setuid-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    # this option removes the zygote sandbox (it seems that the resolution is a bit faster)
    options.add_argument('--no-zygote')

    # note: headless mode is detected (options.headless = True)
    # we launch the browser in head-full mode with the window hidden

@@ -86,6 +88,10 @@ def get_webdriver() -> WebDriver:
    return driver


def get_chrome_exe_path() -> str:
    return uc.find_chrome_executable()


def get_chrome_major_version() -> str:
    global CHROME_MAJOR_VERSION
    if CHROME_MAJOR_VERSION is not None:

@@ -110,7 +116,6 @@ def get_chrome_major_version() -> str:
        process.close()

    CHROME_MAJOR_VERSION = complete_version.split('.')[0].split(' ')[-1]
    logging.info(f"Chrome major version: {CHROME_MAJOR_VERSION}")
    return CHROME_MAJOR_VERSION
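`get_chrome_major_version` keeps only the major number of the browser version with `complete_version.split('.')[0].split(' ')[-1]`: the first split drops everything from the first dot onward, and the second takes the last space-separated word of what remains. A standalone sketch of that expression follows; `parse_major_version` is a hypothetical name, and the input strings in the checks are illustrative, since the exact text Chrome prints varies by platform and channel.

```python
def parse_major_version(complete_version: str) -> str:
    # "Google Chrome 108.0.5359.94" -> "Google Chrome 108" -> "108"
    return complete_version.split('.')[0].split(' ')[-1]
```

Anything after the first dot, including a trailing newline or build suffix, is discarded by the first split. A product name that itself contained a dot before the version number would break the parse, however.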