[🐛 Bug]: Fail to download PDF or zip file from remote to client on Remote webdriver #13956

Open
15975518086 opened this issue May 17, 2024 · 3 comments

Comments

15975518086 commented May 17, 2024

What happened?

error:

D:\Python\Python311\python.exe D:/OfflineaCare/ndb/program/test/test_oooooooo.py
Traceback (most recent call last):
  File "D:\OfflineaCare\ndb\program\test\test_oooooooo.py", line 51, in <module>
    driver.download_file(downloadable_file, target_directory)
  File "D:\Python\Python311\Lib\site-packages\selenium\webdriver\remote\webdriver.py", line 1155, in download_file
    zip_ref.extractall(target_directory)
  File "D:\Python\Python311\Lib\zipfile.py", line 1679, in extractall
    self._extract_member(zipinfo, path, pwd)
  File "D:\Python\Python311\Lib\zipfile.py", line 1734, in _extract_member
    shutil.copyfileobj(source, target)
  File "D:\Python\Python311\Lib\shutil.py", line 197, in copyfileobj
    buf = fsrc_read(length)
          ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python311\Lib\zipfile.py", line 953, in read
    data = self._read1(n)
           ^^^^^^^^^^^^^^
  File "D:\Python\Python311\Lib\zipfile.py", line 1021, in _read1
    data += self._read2(n - len(data))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Python311\Lib\zipfile.py", line 1056, in _read2
    raise EOFError
EOFError

Process finished with exit code 1

How can we reproduce the issue?

The code below clicks the button, which then downloads a .docx file (or a zip or PDF file).
Code:
from selenium import webdriver
from selenium.webdriver.common.by import By
import time

options = webdriver.ChromeOptions()
options.enable_downloads = True
driver = webdriver.Remote(command_executor='http://192.168.3.35:4444/wd/hub', options=options)
driver.maximize_window()
driver.implicitly_wait(5)
driver.get("http://127.0.0.1:8000/login_page")
driver.find_element(By.XPATH, "//button[text()='导出']").click()  # '导出' means "Export"
time.sleep(5)
file_names = driver.get_downloadable_files()
downloadable_file = file_names[0]
target_directory = r'D:\dtmp'
driver.download_file(downloadable_file, target_directory)
time.sleep(10)


node setting:
java -jar selenium-server-4.20.0.jar node --hub http://192.168.3.35:4444   --host 192.168.3.35 --port 5557  --enable-managed-downloads true



I found that the download_file method in webdriver.py (source code shown below) has an issue:
if I set the name to a zip name, e.g. file_name = 'package.zip', it runs successfully, but without this it fails.


        contents = self.execute(Command.DOWNLOAD_FILE, {"name": file_name})["value"]["contents"]
        # file_name = 'package.zip'
        target_file = os.path.join(target_directory, file_name)
        with open(target_file, "wb") as file:
            file.write(base64.b64decode(contents))

        with zipfile.ZipFile(target_file, "r") as zip_ref:
            zip_ref.extractall(target_directory)

Relevant log output

D:\Python\Python311\python.exe D:/OfflineaCare/ndb/program/test/test_oooooooo.py
Traceback (most recent call last):
  File "D:\OfflineaCare\ndb\program\test\test_oooooooo.py", line 51, in <module>
    driver.download_file(downloadable_file, target_directory)
  File "D:\Python\Python311\Lib\site-packages\selenium\webdriver\remote\webdriver.py", line 1155, in download_file
    zip_ref.extractall(target_directory)
  File "D:\Python\Python311\Lib\zipfile.py", line 1679, in extractall
    self._extract_member(zipinfo, path, pwd)
  File "D:\Python\Python311\Lib\zipfile.py", line 1734, in _extract_member
    shutil.copyfileobj(source, target)
  File "D:\Python\Python311\Lib\shutil.py", line 197, in copyfileobj
    buf = fsrc_read(length)
          ^^^^^^^^^^^^^^^^^
  File "D:\Python\Python311\Lib\zipfile.py", line 953, in read
    data = self._read1(n)
           ^^^^^^^^^^^^^^
  File "D:\Python\Python311\Lib\zipfile.py", line 1021, in _read1
    data += self._read2(n - len(data))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Python\Python311\Lib\zipfile.py", line 1056, in _read2
    raise EOFError
EOFError

Process finished with exit code 1

Operating System

Windows 10

Selenium version

selenium 4.20.0 python 3.11.3

What are the browser(s) and version(s) where you see this issue?

Chrome 124

What are the browser driver(s) and version(s) where you see this issue?

124.0.6367.61

Are you using Selenium Grid?

selenium-server-4.20.0.jar

@15975518086, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

15975518086 changed the title from "[🐛 Bug]: Remote webdriver to download PDF or zip file from remote to client" to "[🐛 Bug]: Fail to download PDF or zip file from remote to client on Remote webdriver" May 17, 2024
Contributor

M1troll commented May 27, 2024

Hi!

I encountered the same problem when trying to download a zip file.

While debugging I also caught another error message (maybe it helps):
[image: screenshot of the error message]

Operating System: Manjaro Linux
Selenium version: 4.21
Python version: 3.12
Browsers: Chrome, Firefox, Edge (latest versions of selenium/standalone)

Traceback:

tests/modules/test_internal_export.py:104: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.venv/lib/python3.12/site-packages/selenium/webdriver/remote/webdriver.py:1155: in download_file
   zip_ref.extractall(target_directory)
../../../.pyenv/versions/3.12.0/lib/python3.12/zipfile/__init__.py:1720: in extractall
   self._extract_member(zipinfo, path, pwd)
../../../.pyenv/versions/3.12.0/lib/python3.12/zipfile/__init__.py:1778: in _extract_member
   shutil.copyfileobj(source, target)
../../../.pyenv/versions/3.12.0/lib/python3.12/shutil.py:203: in copyfileobj
   while buf := fsrc_read(length):
../../../.pyenv/versions/3.12.0/lib/python3.12/zipfile/__init__.py:978: in read
   data = self._read1(n)
../../../.pyenv/versions/3.12.0/lib/python3.12/zipfile/__init__.py:1046: in _read1
   data += self._read2(n - len(data))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <zipfile.ZipExtFile [closed]>, n = 3094

   def _read2(self, n):
       if self._compress_left <= 0:
           return b''
   
       n = max(n, self.MIN_READ_SIZE)
       n = min(n, self._compress_left)
   
       data = self._fileobj.read(n)
       self._compress_left -= len(data)
       if not data:
>           raise EOFError
E           EOFError

../../../.pyenv/versions/3.12.0/lib/python3.12/zipfile/__init__.py:1081: EOFError

Docker-compose file

version: '3'

services:
 chrome:
   image: selenium/standalone-chrome
   shm_size: 2gb
   ports:
     - 4444:4444  # Selenium service
     - 5900:5900  # VNC server
     - 7900:7900  # VNC browser client
   environment:
     - SE_OPTS=--enable-managed-downloads true


mormamn commented Jun 2, 2024

We are also experiencing the same issue.
The root issue is that the zip archive is written to disk under the same name as the requested file. When extraction starts, the archive itself gets overwritten and becomes empty, which results in the EOFError.
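
The overwrite described above can be reproduced with the standard library alone. This is a minimal sketch, not Selenium's code: the helper names `build_zip_bytes` and `extract_in_place` and the file name `report.docx` are made up for illustration. The payload is deliberately large (1 MiB of random bytes), on the assumption that a very small archive may sit entirely in Python's read buffer and extract without an error.

```python
import io
import os
import tempfile
import zipfile


def build_zip_bytes(member_name, payload):
    """Return the bytes of a zip archive holding a single member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(member_name, payload)
    return buf.getvalue()


def extract_in_place(archive_name, member_name, payload):
    """Save the archive as archive_name and extract it into the same directory.

    Returns None on success, otherwise a short description of the failure.
    """
    with tempfile.TemporaryDirectory() as target_directory:
        archive_path = os.path.join(target_directory, archive_name)
        with open(archive_path, "wb") as fh:
            fh.write(build_zip_bytes(member_name, payload))
        try:
            with zipfile.ZipFile(archive_path, "r") as zip_ref:
                # If archive_name == member_name, extraction opens the archive
                # path for writing and truncates the file it is reading from.
                zip_ref.extractall(target_directory)
        except (EOFError, zipfile.BadZipFile) as exc:
            return type(exc).__name__
        with open(os.path.join(target_directory, member_name), "rb") as fh:
            if fh.read() != payload:
                return "content mismatch"
        return None


if __name__ == "__main__":
    data = os.urandom(1 << 20)  # 1 MiB, larger than the default read buffer
    print("same name:    ", extract_in_place("report.docx", "report.docx", data))
    print("distinct name:", extract_in_place("report.docx.zip", "report.docx", data))
```

Only the name collision differs between the two calls, which is consistent with the observation earlier in this thread that forcing the name to `package.zip` makes the download succeed.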

At the moment we work around it by calling self.execute directly, with a solution similar to what millin did in his PR:

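    # Requires: import base64, os, zipfile
    #           from selenium.webdriver.remote.command import Command
    def __download_file(self, file_name: str, target_directory: str) -> None:
        # Create the target directory if it does not exist yet.
        os.makedirs(target_directory, exist_ok=True)

        # The response carries the file as a base64-encoded zip archive.
        contents = self.execute(Command.DOWNLOAD_FILE, {"name": file_name})["value"]["contents"]

        # Save the archive under a ".zip" name so that extraction cannot overwrite it.
        zip_target_file = os.path.join(target_directory, f"{file_name}.zip")
        with open(zip_target_file, "wb") as file:
            file.write(base64.b64decode(contents))

        with zipfile.ZipFile(zip_target_file, "r") as zip_ref:
            zip_ref.extractall(target_directory)
        # Remove the temporary archive once its contents are extracted.
        os.remove(zip_target_file)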
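
Another way to avoid the name collision, offered here as an assumption and not as the fix from the PR, is to never write the archive to disk and extract it straight from memory. The helper `extract_base64_zip` below is hypothetical. Its `contents` argument stands for the base64 string that `self.execute(Command.DOWNLOAD_FILE, ...)` returns in the snippet above.

```python
import base64
import io
import os
import zipfile


def extract_base64_zip(contents: str, target_directory: str) -> list[str]:
    """Decode a base64-encoded zip archive and extract it from memory.

    Returns the names of the extracted members.
    """
    os.makedirs(target_directory, exist_ok=True)
    # The archive lives only in this buffer, so no file on disk can collide
    # with the members being extracted.
    archive = io.BytesIO(base64.b64decode(contents))
    with zipfile.ZipFile(archive, "r") as zip_ref:
        zip_ref.extractall(target_directory)
        return zip_ref.namelist()


# Hypothetical use inside the workaround above:
#   contents = self.execute(Command.DOWNLOAD_FILE, {"name": file_name})["value"]["contents"]
#   extract_base64_zip(contents, target_directory)
```

The trade-off is that the decoded archive is held in memory in full, which the original code already does when it calls `base64.b64decode(contents)`.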
