Compare commits

...

16 Commits

Author SHA1 Message Date
ilike2burnthing
6dc279a9d3 Bump version 3.0.3 2023-03-06 13:59:20 +00:00
Artemiy Ryabinkov
96fcd21174 Update undetected_chromedriver version to 3.4.6 (#715)
Co-authored-by: ilike2burnthing <59480337+ilike2burnthing@users.noreply.github.com>
2023-03-06 13:57:38 +00:00
ngosang
3a6e8e0f92 Update GitHub bug report template 2023-01-28 18:00:57 +01:00
ilike2burnthing
2d97f88276 Update README.md 2023-01-09 21:51:49 +00:00
ngosang
ac5c64319e Bump version 3.0.2 2023-01-08 20:48:20 +01:00
ngosang
c93834e2f0 Check Chrome / Chromium web browser is installed correctly 2023-01-08 20:46:11 +01:00
ngosang
e3b4200d94 Detect Cloudflare blocked access 2023-01-08 20:40:10 +01:00
ngosang
0941861f80 Update changelog 2023-01-06 18:49:05 +01:00
ngosang
8a10eb27a6 Bump version 3.0.1 2023-01-06 18:33:02 +01:00
ngosang
e9c08c84ef Update GitHub actions 2023-01-06 18:32:34 +01:00
ngosang
2aa1744476 Add more selectors to detect blocked access 2023-01-06 18:12:26 +01:00
ngosang
a89679a52d Disable Zygote sandbox in Chromium browser 2023-01-06 18:05:23 +01:00
ngosang
410ee7981f Apply undetected-chromedriver patches
* Hide Chrome window in Windows/NT
* Not use subprocess by default (independent process)
* Kill Chromium processes properly to avoid defunct/zombie processes
2023-01-06 17:50:52 +01:00
ngosang
e163019f28 Update undetected-chromedriver 2023-01-06 17:19:11 +01:00
ngosang
7d84f1b663 Kill Chromium processes properly to avoid defunct/zombie processes 2023-01-06 13:58:24 +01:00
ngosang
4807e9dbe2 Include procps (ps), curl and vim packages in the Docker image 2023-01-05 13:25:45 +01:00
21 changed files with 1850 additions and 1858 deletions

View File

@@ -32,7 +32,8 @@ body:
- Operating system:
- Are you using Docker: [yes/no]
- FlareSolverr User-Agent (see log traces or / endpoint):
- Are you using a proxy or VPN: [yes/no]
- Are you using a VPN: [yes/no]
- Are you using a Proxy: [yes/no]
- Are you using Captcha Solver: [yes/no]
- If using captcha solver, which one:
- URL to test this issue:

View File

@@ -11,7 +11,7 @@ jobs:
steps:
-
name: Checkout
uses: actions/checkout@v2
uses: actions/checkout@v3
-
name: Auto Tag
uses: Klemensas/action-autotag@stable

View File

@@ -11,39 +11,39 @@ jobs:
steps:
-
name: Checkout
uses: actions/checkout@v2
uses: actions/checkout@v3
-
name: Downcase repo
run: echo REPOSITORY=$(echo ${{ github.repository }} | tr '[:upper:]' '[:lower:]') >> $GITHUB_ENV
-
name: Docker meta
id: docker_meta
uses: crazy-max/ghaction-docker-meta@v1
uses: crazy-max/ghaction-docker-meta@v3
with:
images: ${{ env.REPOSITORY }},ghcr.io/${{ env.REPOSITORY }}
tag-sha: false
-
name: Set up QEMU
uses: docker/setup-qemu-action@v1.0.1
uses: docker/setup-qemu-action@v2
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
uses: docker/setup-buildx-action@v2
-
name: Login to DockerHub
uses: docker/login-action@v1
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
-
name: Login to GitHub Container Registry
uses: docker/login-action@v1
uses: docker/login-action@v2
with:
registry: ghcr.io
username: ${{ github.repository_owner }}
password: ${{ secrets.GH_PAT }}
-
name: Build and push
uses: docker/build-push-action@v2
uses: docker/build-push-action@v3
with:
context: .
file: ./Dockerfile

View File

@@ -11,12 +11,12 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v2
uses: actions/checkout@v3
with:
fetch-depth: 0 # get all commits, branches and tags (required for the changelog)
- name: Setup Node
uses: actions/setup-node@v2
uses: actions/setup-node@v3
with:
node-version: '16'

View File

@@ -1,5 +1,18 @@
# Changelog
## v3.0.2 (2023/01/08)
* Detect Cloudflare blocked access
* Check Chrome / Chromium web browser is installed correctly
## v3.0.1 (2023/01/06)
* Kill Chromium processes properly to avoid defunct/zombie processes
* Update undetected-chromedriver
* Disable Zygote sandbox in Chromium browser
* Add more selectors to detect blocked access
* Include procps (ps), curl and vim packages in the Docker image
## v3.0.0 (2023/01/04)
* This is the first release of FlareSolverr v3. There are some breaking changes

View File

@@ -29,7 +29,8 @@ RUN dpkg -i /libgl1-mesa-dri.deb \
&& dpkg -i /adwaita-icon-theme.deb \
# Install dependencies
&& apt-get update \
&& apt-get install -y --no-install-recommends chromium chromium-common chromium-driver xvfb \
&& apt-get install -y --no-install-recommends chromium chromium-common chromium-driver xvfb dumb-init \
procps curl vim \
# Remove temporary files and hardware decoding libraries
&& rm -rf /var/lib/apt/lists/* \
&& rm -f /usr/lib/x86_64-linux-gnu/libmfxhw* \
@@ -52,6 +53,9 @@ COPY package.json ../
EXPOSE 8191
# dumb-init avoids zombie chromium processes
ENTRYPOINT ["/usr/bin/dumb-init", "--"]
CMD ["/usr/local/bin/python", "-u", "/app/flaresolverr.py"]
# Local build

View File

@@ -64,14 +64,11 @@ Remember to restart the Docker daemon and the container after the update.
### Precompiled binaries
This is the recommended way for Windows users.
* Download the [FlareSolverr zip](https://github.com/FlareSolverr/FlareSolverr/releases) from the release's assets. It is available for Windows and Linux.
* Extract the zip file. FlareSolverr executable and firefox folder must be in the same directory.
* Execute FlareSolverr binary. In the environment variables section you can find how to change the configuration.
Precompiled binaries are not currently available for v3. Please see https://github.com/FlareSolverr/FlareSolverr/issues/660 for updates,
or below for instructions of how to build FlareSolverr from source code.
### From source code
This is the recommended way for macOS users and for developers.
* Install [Python 3.10](https://www.python.org/downloads/).
* Install [Chrome](https://www.google.com/intl/en_us/chrome/) or [Chromium](https://www.chromium.org/getting-involved/download-chromium/) web browser.
* (Only in Linux / macOS) Install [Xvfb](https://en.wikipedia.org/wiki/Xvfb) package.

View File

@@ -1,6 +1,6 @@
{
"name": "flaresolverr",
"version": "3.0.0",
"version": "3.0.3",
"description": "Proxy server to bypass Cloudflare protection",
"author": "Diego Heras (ngosang / ngosang@hotmail.es)",
"license": "MIT"

View File

@@ -1,9 +1,9 @@
bottle==0.12.23
waitress==2.1.2
selenium==4.4.3
selenium==4.7.2
func-timeout==4.3.5
# required by undetected_chromedriver
requests==2.28.1
websockets==10.3
websockets==10.4
# only required for linux
xvfbwrapper==0.2.9

View File

@@ -1,4 +1,5 @@
import logging
import sys
import time
from urllib.parse import unquote
@@ -13,11 +14,19 @@ from dtos import V1RequestBase, V1ResponseBase, ChallengeResolutionT, ChallengeR
HealthResponse, STATUS_OK, STATUS_ERROR
import utils
ACCESS_DENIED_TITLES = [
# Cloudflare
'Access denied',
# Cloudflare http://bitturk.net/ Firefox
'Attention Required! | Cloudflare'
]
ACCESS_DENIED_SELECTORS = [
# Cloudflare
'div.cf-error-title span.cf-code-label span'
'div.cf-error-title span.cf-code-label span',
# Cloudflare http://bitturk.net/ Firefox
'#cf-error-details div.cf-error-overview h1'
]
CHALLENGE_TITLE = [
CHALLENGE_TITLES = [
# Cloudflare
'Just a moment...',
# DDoS-GUARD
@@ -34,6 +43,21 @@ SHORT_TIMEOUT = 10
def test_browser_installation():
logging.info("Testing web browser installation...")
chrome_exe_path = utils.get_chrome_exe_path()
if chrome_exe_path is None:
logging.error("Chrome / Chromium web browser not installed!")
sys.exit(1)
else:
logging.info("Chrome / Chromium path: " + chrome_exe_path)
chrome_major_version = utils.get_chrome_major_version()
if chrome_major_version == '':
logging.error("Chrome / Chromium version not detected!")
sys.exit(1)
else:
logging.info("Chrome / Chromium major version: " + chrome_major_version)
user_agent = utils.get_user_agent()
logging.info("FlareSolverr User-Agent: " + user_agent)
logging.info("Test successful")
@@ -172,7 +196,13 @@ def _evil_logic(req: V1RequestBase, driver: WebDriver, method: str) -> Challenge
# wait for the page
html_element = driver.find_element(By.TAG_NAME, "html")
page_title = driver.title
# find access denied titles
for title in ACCESS_DENIED_TITLES:
if title == page_title:
raise Exception('Cloudflare has blocked this request. '
'Probably your IP is banned for this site, check in your web browser.')
# find access denied selectors
for selector in ACCESS_DENIED_SELECTORS:
found_elements = driver.find_elements(By.CSS_SELECTOR, selector)
@@ -182,8 +212,7 @@ def _evil_logic(req: V1RequestBase, driver: WebDriver, method: str) -> Challenge
# find challenge by title
challenge_found = False
page_title = driver.title
for title in CHALLENGE_TITLE:
for title in CHALLENGE_TITLES:
if title == page_title:
challenge_found = True
logging.info("Challenge detected. Title found: " + title)
@@ -200,8 +229,8 @@ def _evil_logic(req: V1RequestBase, driver: WebDriver, method: str) -> Challenge
if challenge_found:
while True:
try:
# wait until the title change
for title in CHALLENGE_TITLE:
# wait until the title changes
for title in CHALLENGE_TITLES:
logging.debug("Waiting for title: " + title)
WebDriverWait(driver, SHORT_TIMEOUT).until_not(title_is(title))

View File

@@ -1,7 +1,4 @@
#!/usr/bin/env python3
from __future__ import annotations
import subprocess
"""
@@ -17,33 +14,38 @@ Y88b. 888 888 888 Y88..88P 888 888 888 Y8b. Y88b 888 888 888 Y
by UltrafunkAmsterdam (https://github.com/ultrafunkamsterdam)
"""
from __future__ import annotations
__version__ = "3.1.5r4"
__version__ = "3.4.6"
import json
import logging
import os
import re
import shutil
import subprocess
import sys
import tempfile
import time
import inspect
import threading
from weakref import finalize
import selenium.webdriver.chrome.service
import selenium.webdriver.chrome.webdriver
from selenium.webdriver.common.by import By
import selenium.webdriver.common.service
import selenium.webdriver.remote.command
import selenium.webdriver.remote.webdriver
from .cdp import CDP
from .dprocess import start_detached
from .options import ChromeOptions
from .patcher import IS_POSIX
from .patcher import Patcher
from .reactor import Reactor
from .dprocess import start_detached
from .webelement import UCWebElement
from .webelement import WebElement
__all__ = (
"Chrome",
@@ -108,6 +110,7 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
port=0,
enable_cdp_events=False,
service_args=None,
service_creationflags=None,
desired_capabilities=None,
advanced_elements=False,
service_log_path=None,
@@ -119,8 +122,9 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
suppress_welcome=True,
use_subprocess=False,
debug=False,
no_sandbox=True,
windows_headless=False,
**kw
**kw,
):
"""
Creates a new instance of the chrome driver.
@@ -147,7 +151,9 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
If not specified, make sure the executable's folder is in $PATH
port: int, optional, default: 0
port you would like the service to run, if left as 0, a free port will be found.
port to be used by the chromedriver executable, this is NOT the debugger port.
leave it at 0 unless you know what you are doing.
the default value of 0 automatically picks an available port.
enable_cdp_events: bool, default: False
:: currently for chrome only
@@ -207,11 +213,12 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
now, in case you are nag-fetishist, or a diagnostics data feeder to google, you can set this to False.
Note: if you don't handle the nag screen in time, the browser loses it's connection and throws an Exception.
use_subprocess: bool, optional , default: False,
use_subprocess: bool, optional , default: True,
False (the default) makes sure Chrome will get it's own process (so no subprocess of chromedriver.exe or python
This fixes a LOT of issues, like multithreaded run, but mst importantly. shutting corectly after
program exits or using .quit()
you should be knowing what you're doing, and know how python works.
unfortunately, there is always an edge case in which one would like to write an single script with the only contents being:
--start script--
@@ -224,19 +231,24 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
in that case you can set this to `True`. The browser will start via subprocess, and will keep running most of times.
! setting it to True comes with NO support when being detected. !
no_sandbox: bool, optional, default=True
uses the --no-sandbox option, and additionally does suppress the "unsecure option" status bar
this option has a default of True since many people seem to run this as root (....) , and chrome does not start
when running as root without using --no-sandbox flag.
"""
finalize(self, self._ensure_close, self)
self.debug = debug
patcher = Patcher(
self.patcher = Patcher(
executable_path=driver_executable_path,
force=patcher_force_close,
version_main=version_main,
)
patcher.auto()
self.patcher = patcher
self.patcher.auto()
# self.patcher = patcher
if not options:
options = ChromeOptions()
try:
if hasattr(options, "_session") and options._session is not None:
# prevent reuse of options,
@@ -248,11 +260,17 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
options._session = self
debug_port = selenium.webdriver.common.service.utils.free_port()
debug_host = "127.0.0.1"
if not options.debugger_address:
debug_port = (
port
if port != 0
else selenium.webdriver.common.service.utils.free_port()
)
debug_host = "127.0.0.1"
options.debugger_address = "%s:%d" % (debug_host, debug_port)
else:
debug_host, debug_port = options.debugger_address.split(":")
debug_port = int(debug_port)
if enable_cdp_events:
options.set_capability(
@@ -263,13 +281,17 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
options.add_argument("--remote-debugging-port=%s" % debug_port)
if user_data_dir:
options.add_argument('--user-data-dir=%s' % user_data_dir)
options.add_argument("--user-data-dir=%s" % user_data_dir)
language, keep_user_data_dir = None, bool(user_data_dir)
# see if a custom user profile is specified in options
for arg in options.arguments:
if any([_ in arg for _ in ("--headless", "headless")]):
options.arguments.remove(arg)
options.headless = True
if "lang" in arg:
m = re.search("(?:--)?lang(?:[ =])?(.*)", arg)
try:
@@ -294,7 +316,6 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
)
if not user_data_dir:
# backward compatiblity
# check if an old uc.ChromeOptions is used, and extract the user data dir
@@ -347,8 +368,15 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
if suppress_welcome:
options.arguments.extend(["--no-default-browser-check", "--no-first-run"])
if no_sandbox:
options.arguments.extend(["--no-sandbox", "--test-type"])
if headless or options.headless:
options.headless = True
if self.patcher.version_main < 108:
options.add_argument("--headless=chrome")
elif self.patcher.version_main >= 108:
options.add_argument("--headless=new")
options.add_argument("--window-size=1920,1080")
options.add_argument("--start-maximized")
options.add_argument("--no-sandbox")
@@ -360,7 +388,7 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
or divmod(logging.getLogger().getEffectiveLevel(), 10)[0]
)
if hasattr(options, 'handle_prefs'):
if hasattr(options, "handle_prefs"):
options.handle_prefs(user_data_dir)
# fix exit_type flag to prevent tab-restore nag
@@ -376,6 +404,7 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
config["profile"]["exit_type"] = None
fs.seek(0, 0)
json.dump(config, fs)
fs.truncate() # the file might be shorter
logger.debug("fixed exit_type flag")
except Exception as e:
logger.debug("did not find a bad exit_type flag ")
@@ -403,14 +432,26 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
)
self.browser_pid = browser.pid
if service_creationflags:
service = selenium.webdriver.common.service.Service(
self.patcher.executable_path, port, service_args, service_log_path
)
for attr_name in ("creationflags", "creation_flags"):
if hasattr(service, attr_name):
setattr(service, attr_name, service_creationflags)
break
else:
service = None
super(Chrome, self).__init__(
executable_path=patcher.executable_path,
executable_path=self.patcher.executable_path,
port=port,
options=options,
service_args=service_args,
desired_capabilities=desired_capabilities,
service_log_path=service_log_path,
keep_alive=keep_alive,
service=service, # needed or the service will be re-created
)
self.reactor = None
@@ -425,35 +466,14 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
self.reactor = reactor
if advanced_elements:
from .webelement import WebElement
self._web_element_cls = UCWebElement
else:
self._web_element_cls = WebElement
if options.headless:
self._configure_headless()
def __getattribute__(self, item):
if not super().__getattribute__("debug"):
return super().__getattribute__(item)
else:
import inspect
original = super().__getattribute__(item)
if inspect.ismethod(original) and not inspect.isclass(original):
def newfunc(*args, **kwargs):
logger.debug(
"calling %s with args %s and kwargs %s\n"
% (original.__qualname__, args, kwargs)
)
return original(*args, **kwargs)
return newfunc
return original
def _configure_headless(self):
orig_get = self.get
logger.info("setting properties for headless")
@@ -465,18 +485,18 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
{
"source": """
Object.defineProperty(window, 'navigator', {
Object.defineProperty(window, "navigator", {
Object.defineProperty(window, "navigator", {
value: new Proxy(navigator, {
has: (target, key) => (key === 'webdriver' ? false : key in target),
has: (target, key) => (key === "webdriver" ? false : key in target),
get: (target, key) =>
key === 'webdriver' ?
false :
typeof target[key] === 'function' ?
target[key].bind(target) :
target[key]
})
key === "webdriver"
? false
: typeof target[key] === "function"
? target[key].bind(target)
: target[key],
}),
});
"""
},
)
@@ -494,49 +514,139 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
"Page.addScriptToEvaluateOnNewDocument",
{
"source": """
Object.defineProperty(navigator, 'maxTouchPoints', {
get: () => 1
})"""
Object.defineProperty(navigator, 'maxTouchPoints', {get: () => 1});
Object.defineProperty(navigator.connection, 'rtt', {get: () => 100});
// https://github.com/microlinkhq/browserless/blob/master/packages/goto/src/evasions/chrome-runtime.js
window.chrome = {
app: {
isInstalled: false,
InstallState: {
DISABLED: 'disabled',
INSTALLED: 'installed',
NOT_INSTALLED: 'not_installed'
},
RunningState: {
CANNOT_RUN: 'cannot_run',
READY_TO_RUN: 'ready_to_run',
RUNNING: 'running'
}
},
runtime: {
OnInstalledReason: {
CHROME_UPDATE: 'chrome_update',
INSTALL: 'install',
SHARED_MODULE_UPDATE: 'shared_module_update',
UPDATE: 'update'
},
OnRestartRequiredReason: {
APP_UPDATE: 'app_update',
OS_UPDATE: 'os_update',
PERIODIC: 'periodic'
},
PlatformArch: {
ARM: 'arm',
ARM64: 'arm64',
MIPS: 'mips',
MIPS64: 'mips64',
X86_32: 'x86-32',
X86_64: 'x86-64'
},
PlatformNaclArch: {
ARM: 'arm',
MIPS: 'mips',
MIPS64: 'mips64',
X86_32: 'x86-32',
X86_64: 'x86-64'
},
PlatformOs: {
ANDROID: 'android',
CROS: 'cros',
LINUX: 'linux',
MAC: 'mac',
OPENBSD: 'openbsd',
WIN: 'win'
},
RequestUpdateCheckStatus: {
NO_UPDATE: 'no_update',
THROTTLED: 'throttled',
UPDATE_AVAILABLE: 'update_available'
}
}
}
// https://github.com/microlinkhq/browserless/blob/master/packages/goto/src/evasions/navigator-permissions.js
if (!window.Notification) {
window.Notification = {
permission: 'denied'
}
}
const originalQuery = window.navigator.permissions.query
window.navigator.permissions.__proto__.query = parameters =>
parameters.name === 'notifications'
? Promise.resolve({ state: window.Notification.permission })
: originalQuery(parameters)
const oldCall = Function.prototype.call
function call() {
return oldCall.apply(this, arguments)
}
Function.prototype.call = call
const nativeToStringFunctionString = Error.toString().replace(/Error/g, 'toString')
const oldToString = Function.prototype.toString
function functionToString() {
if (this === window.navigator.permissions.query) {
return 'function query() { [native code] }'
}
if (this === functionToString) {
return nativeToStringFunctionString
}
return oldCall.call(oldToString, this)
}
// eslint-disable-next-line
Function.prototype.toString = functionToString
"""
},
)
return orig_get(*args, **kwargs)
self.get = get_wrapped
def __dir__(self):
return object.__dir__(self)
def _get_cdc_props(self):
return self.execute_script(
"""
let objectToInspect = window,
result = [];
while(objectToInspect !== null)
{ result = result.concat(Object.getOwnPropertyNames(objectToInspect));
objectToInspect = Object.getPrototypeOf(objectToInspect); }
return result.filter(i => i.match(/.+_.+_(Array|Promise|Symbol)/ig))
"""
)
def _hook_remove_cdc_props(self):
self.execute_cdp_cmd(
"Page.addScriptToEvaluateOnNewDocument",
{
"source": """
let objectToInspect = window,
result = [];
while(objectToInspect !== null)
{ result = result.concat(Object.getOwnPropertyNames(objectToInspect));
objectToInspect = Object.getPrototypeOf(objectToInspect); }
result.forEach(p => p.match(/.+_.+_(Array|Promise|Symbol)/ig)
&&delete window[p]&&console.log('removed',p))
"""
},
)
# def _get_cdc_props(self):
# return self.execute_script(
# """
# let objectToInspect = window,
# result = [];
# while(objectToInspect !== null)
# { result = result.concat(Object.getOwnPropertyNames(objectToInspect));
# objectToInspect = Object.getPrototypeOf(objectToInspect); }
#
# return result.filter(i => i.match(/^([a-zA-Z]){27}(Array|Promise|Symbol)$/ig))
# """
# )
#
# def _hook_remove_cdc_props(self):
# self.execute_cdp_cmd(
# "Page.addScriptToEvaluateOnNewDocument",
# {
# "source": """
# let objectToInspect = window,
# result = [];
# while(objectToInspect !== null)
# { result = result.concat(Object.getOwnPropertyNames(objectToInspect));
# objectToInspect = Object.getPrototypeOf(objectToInspect); }
# result.forEach(p => p.match(/^([a-zA-Z]){27}(Array|Promise|Symbol)$/ig)
# &&delete window[p]&&console.log('removed',p))
# """
# },
# )
def get(self, url):
if self._get_cdc_props():
self._hook_remove_cdc_props()
# if self._get_cdc_props():
# self._hook_remove_cdc_props()
return super().get(url)
def add_cdp_listener(self, event_name, callback):
@@ -553,6 +663,11 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
if self.reactor and isinstance(self.reactor, Reactor):
self.reactor.handlers.clear()
def window_new(self):
self.execute(
selenium.webdriver.remote.command.Command.NEW_WINDOW, {"type": "window"}
)
def tab_new(self, url: str):
"""
this opens a url in a new tab.
@@ -597,24 +712,22 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
# super(Chrome, self).start_session(capabilities, browser_profile)
def quit(self):
logger.debug("closing webdriver")
if hasattr(self, "service") and getattr(self.service, "process", None):
try:
self.service.process.kill()
self.service.process.wait(5)
logger.debug("webdriver process ended")
except (AttributeError, RuntimeError, OSError):
pass
try:
if self.reactor and isinstance(self.reactor, Reactor):
logger.debug("shutting down reactor")
self.reactor.event.set()
except Exception: # noqa
logger.debug("shutting down reactor")
except AttributeError:
pass
try:
logger.debug("killing browser")
os.kill(self.browser_pid, 15)
except TimeoutError as e:
logger.debug("gracefully closed browser")
except Exception as e: # noqa
logger.debug(e, exc_info=True)
except Exception: # noqa
pass
if (
hasattr(self, "keep_user_data_dir")
and hasattr(self, "user_data_dir")
@@ -622,7 +735,6 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
):
for _ in range(5):
try:
shutil.rmtree(self.user_data_dir, ignore_errors=False)
except FileNotFoundError:
pass
@@ -640,13 +752,24 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
# this must come last, otherwise it will throw 'in use' errors
self.patcher = None
def __del__(self):
try:
super().quit()
# self.service.process.kill()
except: # noqa
pass
self.quit()
def __getattribute__(self, item):
if not super().__getattribute__("debug"):
return super().__getattribute__(item)
else:
import inspect
original = super().__getattribute__(item)
if inspect.ismethod(original) and not inspect.isclass(original):
def newfunc(*args, **kwargs):
logger.debug(
"calling %s with args %s and kwargs %s\n"
% (original.__qualname__, args, kwargs)
)
return original(*args, **kwargs)
return newfunc
return original
def __enter__(self):
return self
@@ -660,6 +783,27 @@ class Chrome(selenium.webdriver.chrome.webdriver.WebDriver):
def __hash__(self):
return hash(self.options.debugger_address)
def __dir__(self):
return object.__dir__(self)
def __del__(self):
try:
self.service.process.kill()
except: # noqa
pass
self.quit()
@classmethod
def _ensure_close(cls, self):
# needs to be a classmethod so finalize can find the reference
logger.info("ensuring close")
if (
hasattr(self, "service")
and hasattr(self.service, "process")
and hasattr(self.service.process, "kill")
):
self.service.process.kill()
def find_chrome_executable():
"""
@@ -691,8 +835,10 @@ def find_chrome_executable():
)
else:
for item in map(
os.environ.get, ("PROGRAMFILES", "PROGRAMFILES(X86)", "LOCALAPPDATA")
os.environ.get,
("PROGRAMFILES", "PROGRAMFILES(X86)", "LOCALAPPDATA", "PROGRAMW6432"),
):
if item is not None:
for subitem in (
"Google/Chrome/Application",
"Google/Chrome Beta/Application",

View File

@@ -1,259 +0,0 @@
#!/usr/bin/env python3
# this module is part of undetected_chromedriver
"""
888 888 d8b
888 888 Y8P
888 888
.d8888b 88888b. 888d888 .d88b. 88888b.d88b. .d88b. .d88888 888d888 888 888 888 .d88b. 888d888
d88P" 888 "88b 888P" d88""88b 888 "888 "88b d8P Y8b d88" 888 888P" 888 888 888 d8P Y8b 888P"
888 888 888 888 888 888 888 888 888 88888888 888 888 888 888 Y88 88P 88888888 888
Y88b. 888 888 888 Y88..88P 888 888 888 Y8b. Y88b 888 888 888 Y8bd8P Y8b. 888
"Y8888P 888 888 888 "Y88P" 888 888 888 "Y8888 "Y88888 888 888 Y88P "Y8888 888 88888888
by UltrafunkAmsterdam (https://github.com/ultrafunkamsterdam)
"""
import io
import logging
import os
import random
import re
import string
import sys
import zipfile
from distutils.version import LooseVersion
from urllib.request import urlopen, urlretrieve
from selenium.webdriver import Chrome as _Chrome, ChromeOptions as _ChromeOptions
TARGET_VERSION = 0
logger = logging.getLogger("uc")
class Chrome:
def __new__(cls, *args, emulate_touch=False, **kwargs):
if not ChromeDriverManager.installed:
ChromeDriverManager(*args, **kwargs).install()
if not ChromeDriverManager.selenium_patched:
ChromeDriverManager(*args, **kwargs).patch_selenium_webdriver()
if not kwargs.get("executable_path"):
kwargs["executable_path"] = "./{}".format(
ChromeDriverManager(*args, **kwargs).executable_path
)
if not kwargs.get("options"):
kwargs["options"] = ChromeOptions()
instance = object.__new__(_Chrome)
instance.__init__(*args, **kwargs)
instance._orig_get = instance.get
def _get_wrapped(*args, **kwargs):
if instance.execute_script("return navigator.webdriver"):
instance.execute_cdp_cmd(
"Page.addScriptToEvaluateOnNewDocument",
{
"source": """
Object.defineProperty(window, 'navigator', {
value: new Proxy(navigator, {
has: (target, key) => (key === 'webdriver' ? false : key in target),
get: (target, key) =>
key === 'webdriver'
? undefined
: typeof target[key] === 'function'
? target[key].bind(target)
: target[key]
})
});
"""
},
)
return instance._orig_get(*args, **kwargs)
instance.get = _get_wrapped
instance.get = _get_wrapped
instance.get = _get_wrapped
original_user_agent_string = instance.execute_script(
"return navigator.userAgent"
)
instance.execute_cdp_cmd(
"Network.setUserAgentOverride",
{
"userAgent": original_user_agent_string.replace("Headless", ""),
},
)
if emulate_touch:
instance.execute_cdp_cmd(
"Page.addScriptToEvaluateOnNewDocument",
{
"source": """
Object.defineProperty(navigator, 'maxTouchPoints', {
get: () => 1
})"""
},
)
logger.info(f"starting undetected_chromedriver.Chrome({args}, {kwargs})")
return instance
class ChromeOptions:
def __new__(cls, *args, **kwargs):
if not ChromeDriverManager.installed:
ChromeDriverManager(*args, **kwargs).install()
if not ChromeDriverManager.selenium_patched:
ChromeDriverManager(*args, **kwargs).patch_selenium_webdriver()
instance = object.__new__(_ChromeOptions)
instance.__init__()
instance.add_argument("start-maximized")
instance.add_experimental_option("excludeSwitches", ["enable-automation"])
instance.add_argument("--disable-blink-features=AutomationControlled")
return instance
class ChromeDriverManager(object):
installed = False
selenium_patched = False
target_version = None
DL_BASE = "https://chromedriver.storage.googleapis.com/"
def __init__(self, executable_path=None, target_version=None, *args, **kwargs):
_platform = sys.platform
if TARGET_VERSION:
# use global if set
self.target_version = TARGET_VERSION
if target_version:
# use explicitly passed target
self.target_version = target_version # user override
if not self.target_version:
# none of the above (default) and just get current version
self.target_version = self.get_release_version_number().version[
0
] # only major version int
self._base = base_ = "chromedriver{}"
exe_name = self._base
if _platform in ("win32",):
exe_name = base_.format(".exe")
if _platform in ("linux",):
_platform += "64"
exe_name = exe_name.format("")
if _platform in ("darwin",):
_platform = "mac64"
exe_name = exe_name.format("")
self.platform = _platform
self.executable_path = executable_path or exe_name
self._exe_name = exe_name
def patch_selenium_webdriver(self_):
"""
Patches selenium package Chrome, ChromeOptions classes for current session
:return:
"""
import selenium.webdriver.chrome.service
import selenium.webdriver
selenium.webdriver.Chrome = Chrome
selenium.webdriver.ChromeOptions = ChromeOptions
logger.info("Selenium patched. Safe to import Chrome / ChromeOptions")
self_.__class__.selenium_patched = True
def install(self, patch_selenium=True):
"""
Initialize the patch
This will:
download chromedriver if not present
patch the downloaded chromedriver
patch selenium package if <patch_selenium> is True (default)
:param patch_selenium: patch selenium webdriver classes for Chrome and ChromeDriver (for current python session)
:return:
"""
if not os.path.exists(self.executable_path):
self.fetch_chromedriver()
if not self.__class__.installed:
if self.patch_binary():
self.__class__.installed = True
if patch_selenium:
self.patch_selenium_webdriver()
def get_release_version_number(self):
"""
Gets the latest major version available, or the latest major version of self.target_version if set explicitly.
:return: version string
"""
path = (
"LATEST_RELEASE"
if not self.target_version
else f"LATEST_RELEASE_{self.target_version}"
)
return LooseVersion(urlopen(self.__class__.DL_BASE + path).read().decode())
def fetch_chromedriver(self):
"""
Downloads ChromeDriver from source and unpacks the executable
:return: on success, name of the unpacked executable
"""
base_ = self._base
zip_name = base_.format(".zip")
ver = self.get_release_version_number().vstring
if os.path.exists(self.executable_path):
return self.executable_path
urlretrieve(
f"{self.__class__.DL_BASE}{ver}/{base_.format(f'_{self.platform}')}.zip",
filename=zip_name,
)
with zipfile.ZipFile(zip_name) as zf:
zf.extract(self._exe_name)
os.remove(zip_name)
if sys.platform != "win32":
os.chmod(self._exe_name, 0o755)
return self._exe_name
@staticmethod
def random_cdc():
cdc = random.choices(string.ascii_lowercase, k=26)
cdc[-6:-4] = map(str.upper, cdc[-6:-4])
cdc[2] = cdc[0]
cdc[3] = "_"
return "".join(cdc).encode()
def patch_binary(self):
"""
Patches the ChromeDriver binary
:return: False on failure, binary name on success
"""
linect = 0
replacement = self.random_cdc()
with io.open(self.executable_path, "r+b") as fh:
for line in iter(lambda: fh.readline(), b""):
if b"cdc_" in line:
fh.seek(-len(line), 1)
newline = re.sub(b"cdc_.{22}", replacement, line)
fh.write(newline)
linect += 1
return linect
def install(executable_path=None, target_version=None, *args, **kwargs):
ChromeDriverManager(executable_path, target_version, *args, **kwargs).install()

View File

@@ -3,11 +3,11 @@
import json
import logging
from collections.abc import Mapping, Sequence
import requests
import websockets
log = logging.getLogger(__name__)

View File

@@ -1,17 +1,16 @@
import asyncio
import logging
import time
import traceback
from collections.abc import Mapping
from collections.abc import Sequence
from functools import wraps
import logging
import threading
import time
import traceback
from typing import Any
from typing import Awaitable
from typing import Callable
from typing import List
from typing import Optional
from contextlib import ExitStack
import threading
from functools import wraps, partial
class Structure(dict):

View File

@@ -1,13 +1,13 @@
import atexit
import logging
import multiprocessing
import os
import platform
import sys
import signal
from subprocess import PIPE
from subprocess import Popen
import atexit
import traceback
import logging
import signal
import sys
CREATE_NEW_PROCESS_GROUP = 0x00000200
DETACHED_PROCESS = 0x00000008
@@ -27,12 +27,14 @@ def start_detached(executable, *args):
reader, writer = multiprocessing.Pipe(False)
# do not keep reference
multiprocessing.Process(
process = multiprocessing.Process(
target=_start_detached,
args=(executable, *args),
kwargs={"writer": writer},
daemon=True,
).start()
)
process.start()
process.join()
# receive pid from pipe
pid = reader.recv()
REGISTERED.append(pid)
@@ -44,7 +46,6 @@ def start_detached(executable, *args):
def _start_detached(executable, *args, writer: multiprocessing.Pipe = None):
# configure launch
kwargs = {}
if platform.system() == "Windows":

View File

@@ -39,10 +39,23 @@ class ChromeOptions(_ChromiumOptions):
value = ChromeOptions._undot_key(rest, value)
return {key: value}
@staticmethod
def _merge_nested(a, b):
"""
merges b into a
leaf values in a are overwritten with values from b
"""
for key in b:
if key in a:
if isinstance(a[key], dict) and isinstance(b[key], dict):
ChromeOptions._merge_nested(a[key], b[key])
continue
a[key] = b[key]
return a
def handle_prefs(self, user_data_dir):
prefs = self.experimental_options.get("prefs")
if prefs:
user_data_dir = user_data_dir or self._user_data_dir
default_path = os.path.join(user_data_dir, "Default")
os.makedirs(default_path, exist_ok=True)
@@ -50,12 +63,14 @@ class ChromeOptions(_ChromiumOptions):
# undot prefs dict keys
undot_prefs = {}
for key, value in prefs.items():
undot_prefs.update(self._undot_key(key, value))
undot_prefs = self._merge_nested(
undot_prefs, self._undot_key(key, value)
)
prefs_file = os.path.join(default_path, "Preferences")
if os.path.exists(prefs_file):
with open(prefs_file, encoding="latin1", mode="r") as f:
undot_prefs.update(json.load(f))
undot_prefs = self._merge_nested(json.load(f), undot_prefs)
with open(prefs_file, encoding="latin1", mode="w") as f:
json.dump(undot_prefs, f)

View File

@@ -1,6 +1,7 @@
#!/usr/bin/env python3
# this module is part of undetected_chromedriver
from distutils.version import LooseVersion
import io
import logging
import os
@@ -9,15 +10,14 @@ import re
import string
import sys
import time
from urllib.request import urlopen
from urllib.request import urlretrieve
import zipfile
from distutils.version import LooseVersion
from urllib.request import urlopen, urlretrieve
import secrets
logger = logging.getLogger(__name__)
IS_POSIX = sys.platform.startswith(("darwin", "cygwin", "linux"))
IS_POSIX = sys.platform.startswith(("darwin", "cygwin", "linux", "linux2"))
class Patcher(object):
@@ -29,7 +29,7 @@ class Patcher(object):
if platform.endswith("win32"):
zip_name %= "win32"
exe_name %= ".exe"
if platform.endswith("linux"):
if platform.endswith(("linux", "linux2")):
zip_name %= "linux64"
exe_name %= ""
if platform.endswith("darwin"):
@@ -38,7 +38,9 @@ class Patcher(object):
if platform.endswith("win32"):
d = "~/appdata/roaming/undetected_chromedriver"
elif platform.startswith("linux"):
elif "LAMBDA_TASK_ROOT" in os.environ:
d = "/tmp/undetected_chromedriver"
elif platform.startswith(("linux", "linux2")):
d = "~/.local/share/undetected_chromedriver"
elif platform.endswith("darwin"):
d = "~/Library/Application Support/undetected_chromedriver"
@@ -48,7 +50,6 @@ class Patcher(object):
def __init__(self, executable_path=None, force=False, version_main: int = 0):
"""
Args:
executable_path: None = automatic
a full file path to the chromedriver executable
@@ -57,10 +58,9 @@ class Patcher(object):
version_main: 0 = auto
specify main chrome version (rounded, ex: 82)
"""
self.force = force
self.executable_path = None
prefix = secrets.token_hex(8)
self._custom_exe_path = False
prefix = "undetected"
if not os.path.exists(self.data_path):
os.makedirs(self.data_path, exist_ok=True)
@@ -82,8 +82,6 @@ class Patcher(object):
os.path.join(".", self.executable_path)
)
self._custom_exe_path = False
if executable_path:
self._custom_exe_path = True
self.executable_path = executable_path
@@ -91,7 +89,6 @@ class Patcher(object):
self.version_full = None
def auto(self, executable_path=None, force=False, version_main=None):
""""""
if executable_path:
self.executable_path = executable_path
self._custom_exe_path = True
@@ -203,43 +200,46 @@ class Patcher(object):
@staticmethod
def gen_random_cdc():
cdc = random.choices(string.ascii_lowercase, k=26)
cdc[-6:-4] = map(str.upper, cdc[-6:-4])
cdc[2] = cdc[0]
cdc[3] = "_"
cdc = random.choices(string.ascii_letters, k=27)
return "".join(cdc).encode()
def is_binary_patched(self, executable_path=None):
"""simple check if executable is patched.
:return: False if not patched, else True
"""
executable_path = executable_path or self.executable_path
try:
with io.open(executable_path, "rb") as fh:
for line in iter(lambda: fh.readline(), b""):
if b"cdc_" in line:
return fh.read().find(b"undetected chromedriver") != -1
except FileNotFoundError:
return False
else:
return True
def patch_exe(self):
"""
Patches the ChromeDriver binary
:return: False on failure, binary name on success
"""
start = time.perf_counter()
logger.info("patching driver executable %s" % self.executable_path)
linect = 0
replacement = self.gen_random_cdc()
with io.open(self.executable_path, "r+b") as fh:
for line in iter(lambda: fh.readline(), b""):
if b"cdc_" in line:
fh.seek(-len(line), 1)
newline = re.sub(b"cdc_.{22}", replacement, line)
fh.write(newline)
linect += 1
return linect
content = fh.read()
# match_injected_codeblock = re.search(rb"{window.*;}", content)
match_injected_codeblock = re.search(rb"\{window\.cdc.*?;\}", content)
if match_injected_codeblock:
target_bytes = match_injected_codeblock[0]
new_target_bytes = (
b'{console.log("undetected chromedriver 1337!")}'.ljust(
len(target_bytes), b" "
)
)
new_content = content.replace(target_bytes, new_target_bytes)
if new_content == content:
logger.warning(
"something went wrong patching the driver binary. could not find injection code block"
)
else:
logger.debug(
"found block:\n%s\nreplacing with:\n%s"
% (target_bytes, new_target_bytes)
)
fh.seek(0)
fh.write(new_content)
logger.debug(
"patching took us {:.2f} seconds".format(time.perf_counter() - start)
)
def __repr__(self):
return "{0:s}({1:s})".format(
@@ -248,7 +248,6 @@ class Patcher(object):
)
def __del__(self):
if self._custom_exe_path:
# if the driver binary is specified by user
# we assume it is important enough to not delete it

View File

@@ -6,6 +6,7 @@ import json
import logging
import threading
logger = logging.getLogger(__name__)
@@ -63,9 +64,7 @@ class Reactor(threading.Thread):
break
async def listen(self):
while self.running:
await self._wait_service_started()
await asyncio.sleep(1)
@@ -74,9 +73,7 @@ class Reactor(threading.Thread):
log_entries = self.driver.get_log("performance")
for entry in log_entries:
try:
obj_serialized: str = entry.get("message")
obj = json.loads(obj_serialized)
message = obj.get("message")

View File

@@ -1,4 +0,0 @@
# for backward compatibility
import sys
sys.modules[__name__] = sys.modules[__package__]

View File

@@ -1,7 +1,30 @@
from typing import List
from selenium.webdriver.common.by import By
import selenium.webdriver.remote.webelement
class WebElement(selenium.webdriver.remote.webelement.WebElement):
def click_safe(self):
super().click()
self._parent.reconnect(0.1)
def children(
self, tag=None, recursive=False
) -> List[selenium.webdriver.remote.webelement.WebElement]:
"""
returns direct child elements of current element
:param tag: str, if supplied, returns <tag> nodes only
"""
script = "return [... arguments[0].children]"
if tag:
script += ".filter( node => node.tagName === '%s')" % tag.upper()
if recursive:
return list(_recursive_children(self, tag))
return list(self._parent.execute_script(script, self))
class UCWebElement(WebElement):
"""
Custom WebElement class which makes it easier to view elements when
working in an interactive environment.
@@ -14,9 +37,13 @@ class WebElement(selenium.webdriver.remote.webelement.WebElement):
"""
def __init__(self, parent, id_):
super().__init__(parent, id_)
self._attrs = None
@property
def attrs(self):
if not hasattr(self, "_attrs"):
if not self._attrs:
self._attrs = self._parent.execute_script(
"""
var items = {};
@@ -35,3 +62,25 @@ class WebElement(selenium.webdriver.remote.webelement.WebElement):
if strattrs:
strattrs = " " + strattrs
return f"{self.__class__.__name__} <{self.tag_name}{strattrs}>"
def _recursive_children(element, tag: str = None, _results=None):
"""
returns all children of <element> recursively
:param element: `WebElement` object.
find children below this <element>
:param tag: str = None.
if provided, return only <tag> elements. example: 'a', or 'img'
:param _results: do not use!
"""
results = _results or set()
for element in element.children():
if tag:
if element.tag_name == tag:
results.add(element)
else:
results.add(element)
results |= _recursive_children(element, tag, results)
return results

View File

@@ -44,6 +44,8 @@ def get_webdriver() -> WebDriver:
# todo: this param shows a warning in chrome head-full
options.add_argument('--disable-setuid-sandbox')
options.add_argument('--disable-dev-shm-usage')
# this option removes the zygote sandbox (it seems that the resolution is a bit faster)
options.add_argument('--no-zygote')
# note: headless mode is detected (options.headless = True)
# we launch the browser in head-full mode with the window hidden
@@ -86,6 +88,10 @@ def get_webdriver() -> WebDriver:
return driver
def get_chrome_exe_path() -> str:
return uc.find_chrome_executable()
def get_chrome_major_version() -> str:
global CHROME_MAJOR_VERSION
if CHROME_MAJOR_VERSION is not None:
@@ -110,7 +116,6 @@ def get_chrome_major_version() -> str:
process.close()
CHROME_MAJOR_VERSION = complete_version.split('.')[0].split(' ')[-1]
logging.info(f"Chrome major version: {CHROME_MAJOR_VERSION}")
return CHROME_MAJOR_VERSION