[Bug]: Program crashes when there are too many FTP files #1020

Closed
Bear-big-code opened this issue Mar 12, 2024 · 3 comments
Labels
bug Something isn't working

Comments


Bear-big-code commented Mar 12, 2024

What happened?

Configuration:

{
  "job": {
    "setting": {
      "speed": {
        "byte": -1,
        "channel": 8
      },
      "errorLimit": {
        "record": 0,
        "percentage": 0.02
      }
    },
    "content": [
      {
        "reader": {
          "name": "ftpreader",
          "parameter": {
            "column": ["*"],
            "protocol": "ftp",
            "host": "127.0.0.1",
            "port": "3021",
            "username": "admin",
            "password": "123456",
            "compress": "stream",
            "skipDelimiter": true,
            "path": "/2024-01-06"
          }
        },
        "writer": {
          "name": "ftpwriter",
          "parameter": {
            "column": ["*"],
            "protocol": "ftp",
            "host": "127.0.0.1",
            "port": "3021",
            "username": "admin",
            "password": "123456",
            "path": "/data-xmh",
            "fileName": "101-测试",
            "writeMode": "truncate",
            "compress": "stream",
            "skipDelimiter": true
          }
        }
      }
    ]
  }
}

Version

4.1.3 (Default)

OS Type

No response

Java JDK Version

Oracle JDK 1.8.0

Relevant log output

23:56:20.467 [job-0] INFO com.wgzhao.addax.storage.writer.StorageWriterUtil -- split write file name:[101-测试__20240312_235620_467_8arwds66]
23:56:20.467 [job-0] INFO com.wgzhao.addax.storage.writer.StorageWriterUtil -- split write file name:[101-测试__20240312_235620_467_46r477u6]
23:56:20.467 [job-0] INFO com.wgzhao.addax.storage.writer.StorageWriterUtil -- split write file name:[101-测试__20240312_235620_467_zfca1d5d]
23:56:20.467 [job-0] INFO com.wgzhao.addax.storage.writer.StorageWriterUtil -- split write file name:[101-测试__20240312_235620_467_72y56sya]
23:56:20.467 [job-0] INFO com.wgzhao.addax.storage.writer.StorageWriterUtil -- split write file name:[101-测试__20240312_235620_467_t3qmgt8q]
23:56:20.467 [job-0] INFO com.wgzhao.addax.storage.writer.StorageWriterUtil -- split write file name:[101-测试__20240312_235620_467_2v9g1mq6]
23:56:20.467 [job-0] INFO com.wgzhao.addax.storage.writer.StorageWriterUtil -- split write file name:[101-测试__20240312_235620_467_u04ha4x0]
23:56:20.467 [job-0] INFO com.wgzhao.addax.storage.writer.StorageWriterUtil -- split write file name:[101-测试__20240312_235620_467_y22h13w2]
23:56:20.467 [job-0] INFO com.wgzhao.addax.storage.writer.StorageWriterUtil -- Finished split.
23:56:20.467 [job-0] INFO com.wgzhao.addax.core.job.JobContainer -- The Writer.Job [ftpwriter] is divided into [158632] task(s).
23:56:20.467 [job-0] DEBUG com.wgzhao.addax.core.job.JobContainer -- The transformer configuration:[] 
23:56:36.145 [job-0] DEBUG com.wgzhao.addax.core.job.JobContainer -- 
	 [total cpu info] => 
		averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
		-1.00%                         | -1.00%                         | -1.00%
                        

	 [total gc info] => 
		 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
		 G1 Young Generation  | 18                 | 18                 | 18                 | 0.625s             | 0.625s             | 0.625s             
		 G1 Old Generation    | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             

Disconnected from the target VM, address: '127.0.0.1:50283', transport: 'socket'
Bear-big-code added the bug (Something isn't working) label on Mar 12, 2024
wgzhao (Owner) commented Mar 13, 2024

These are the results of my local test, reading 311 files under the specified directory. To keep this short, similar log lines have been truncated.

2024-03-13 13:32:58.801 [        main] INFO  VMInfo               - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2024-03-13 13:32:58.810 [        main] INFO  Engine               -
{
	"setting":{
		"speed":{
			"byte":-1,
			"channel":8
		},
		"errorLimit":{
			"record":0,
			"percentage":0.02
		}
	},
	"content":{
		"reader":{
			"name":"ftpreader",
			"parameter":{
				"column":[
					"*"
				],
				"protocol":"ftp",
				"host":"127.0.0.1",
				"port":"21",
				"username":"wgzhao",
				"password":"*****",
				"skipDelimiter":true,
				"path":"/home/wgzhao/ftptest/ftpreader"
			}
		},
		"writer":{
			"name":"ftpwriter",
			"parameter":{
				"column":[
					"*"
				],
				"protocol":"ftp",
				"host":"127.0.0.1",
				"port":"21",
				"username":"wgzhao",
				"password":"*****",
				"path":"/home/wgzhao/ftptest/ftpwriter",
				"fileName":"101-测试",
				"writeMode":"truncate",
				"compress":"gz",
				"skipDelimiter":true
			}
		}
	}
}

2024-03-13 13:32:58.823 [        main] INFO  JobContainer         - The jobContainer begins to process the job.
2024-03-13 13:32:58.856 [       job-0] WARN  StorageWriterUtil    - The item encoding is empty, uses [UTF-8] as default.
2024-03-13 13:32:58.856 [       job-0] WARN  StorageWriterUtil    - The item delimiter is empty, uses [,] as default.
2024-03-13 13:32:58.874 [       job-0] INFO  JobContainer         - The Reader.Job [ftpreader] perform prepare work .
2024-03-13 13:32:58.916 [       job-0] INFO  FtpReader$Job        - 您即将读取的文件数为: [311]
2024-03-13 13:32:58.916 [       job-0] INFO  JobContainer         - The Writer.Job [ftpwriter] perform prepare work .
2024-03-13 13:32:58.917 [       job-0] INFO  StandardFtpHelperImpl - current working directory:/home/wgzhao/ftptest/ftpwriter
2024-03-13 13:32:58.938 [       job-0] INFO  FtpWriter$Job        - The current writeMode is truncate, begin to cleanup all files with prefix [101-测试] under [/home/wgzhao/ftptest/ftpwriter].
2024-03-13 13:32:58.938 [       job-0] INFO  FtpWriter$Job        - The following file(s) will be deleted: [/home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_793_tdwnz8ut.txt, /home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_783_4rmxunsq.txt, /home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_817_44veuzg4.txt, /home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_821_byy3pc11.txt, /home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_821_gcfcw8vm.txt, /home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_790_3xvm4f94.txt, /home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_821_5t4agz8d.txt, /home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133157_817_gu145tv1.txt].
2024-03-13 13:32:58.938 [       job-0] INFO  StandardFtpHelperImpl - current working directory:/home/wgzhao/ftptest/ftpwriter
2024-03-13 13:32:58.940 [       job-0] INFO  JobContainer         - Job set Channel-Number to 8 channel(s).
2024-03-13 13:32:58.956 [       job-0] INFO  JobContainer         - The Reader.Job [ftpreader] is divided into [311] task(s).
2024-03-13 13:32:58.956 [       job-0] INFO  StorageWriterUtil    - Begin to split...
2024-03-13 13:32:58.963 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133258_962_x27b8yt1]
2024-03-13 13:32:58.964 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133258_963_2n52sntt]
2024-03-13 13:32:58.964 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133258_964_e623a1s8]
2024-03-13 13:32:58.964 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133258_964_hgva8zyv]
2024-03-13 13:32:58.964 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133258_964_68uqewte]
2024-03-13 13:32:58.964 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133258_964_xwreawz9]
2024-03-13 13:32:58.965 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133258_964_6f3q3fyb]
2024-03-13 13:32:58.965 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133258_965_g3xqcuz4]
2024-03-13 13:32:58.965 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133258_965_u4ux9ywb]
2024-03-13 13:32:58.965 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133258_965_cdqacs9e]
2024-03-13 13:32:58.965 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133258_965_hb98t8ag]
2024-03-13 13:32:58.965 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133258_965_nshwrf8v]
2024-03-13 13:32:58.966 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133258_966_ct0fr1vx]
......
2024-03-13 13:32:59.000 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133259_000_ayxeax1h]
2024-03-13 13:32:59.001 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133259_001_upb2bgwv]
2024-03-13 13:32:59.001 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133259_001_9v2acmmw]
2024-03-13 13:32:59.001 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133259_001_bfmfes78]
2024-03-13 13:32:59.001 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133259_001_efg8bsz3]
2024-03-13 13:32:59.001 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133259_001_w8yu94xs]
2024-03-13 13:32:59.001 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133259_001_wtv1v4qp]
2024-03-13 13:32:59.001 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133259_001_utvqbwwa]
2024-03-13 13:32:59.001 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133259_001_uen8emwr]
2024-03-13 13:32:59.002 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133259_001_fexsrc16]
2024-03-13 13:32:59.002 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133259_002_5rb40g5w]
2024-03-13 13:32:59.002 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133259_002_9bsxftnm]
2024-03-13 13:32:59.002 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133259_002_1pqh18dv]
2024-03-13 13:32:59.002 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133259_002_fpppg6ep]
2024-03-13 13:32:59.002 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133259_002_n4vx5rq6]
2024-03-13 13:32:59.002 [       job-0] INFO  StorageWriterUtil    - split write file name:[101-测试__20240313_133259_002_huh199rz]
2024-03-13 13:32:59.010 [       job-0] INFO  StorageWriterUtil    - Finished split.
2024-03-13 13:32:59.010 [       job-0] INFO  JobContainer         - The Writer.Job [ftpwriter] is divided into [311] task(s).
2024-03-13 13:32:59.079 [       job-0] INFO  JobContainer         - The Scheduler launches [1] taskGroup(s).
2024-03-13 13:32:59.092 [ taskGroup-0] INFO  TaskGroupContainer   - The taskGroupId=[0] started [8] channels for [311] tasks.
2024-03-13 13:32:59.095 [ taskGroup-0] INFO  Channel              - The Channel set byte_speed_limit to -1, No bps activated.
2024-03-13 13:32:59.095 [ taskGroup-0] INFO  Channel              - The Channel set record_speed_limit to -1, No tps activated.
2024-03-13 13:32:59.116 [writer-0-284] INFO  FtpWriter$Task       - begin do write...
2024-03-13 13:32:59.116 [writer-0-284] INFO  FtpWriter$Task       - write to file : [/home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133259_007_g1c7ysae.txt]
2024-03-13 13:32:59.116 [reader-0-284] INFO  FtpReader$Task       - reading file : [/home/wgzhao/ftptest/ftpreader/000851.SZ-300660.SZ.csv]
2024-03-13 13:32:59.116 [writer-0-284] INFO  StandardFtpHelperImpl - current working directory:/home/wgzhao
2024-03-13 13:32:59.117 [writer-0-284] INFO  StandardFtpHelperImpl - current working directory:/home/wgzhao/ftptest/ftpwriter
2024-03-13 13:32:59.118 [writer-0-279] INFO  FtpWriter$Task       - begin do write...
2024-03-13 13:32:59.118 [writer-0-182] INFO  FtpWriter$Task       - begin do write...
2024-03-13 13:32:59.119 [writer-0-182] INFO  FtpWriter$Task       - write to file : [/home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133258_995_uqn2751q.txt]
2024-03-13 13:32:59.119 [writer-0-279] INFO  FtpWriter$Task       - write to file : [/home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133259_007_25dn2gmv.txt]
2024-03-13 13:32:59.120 [writer-0-182] INFO  StandardFtpHelperImpl - current working directory:/home/wgzhao
.....
2024-03-13 13:33:03.463 [writer-0-205] INFO  FtpWriter$Task       - begin do write...
2024-03-13 13:33:03.463 [reader-0-173] WARN  StorageReaderUtil    - Uses [,] as delimiter by default
2024-03-13 13:33:03.463 [writer-0-205] INFO  FtpWriter$Task       - write to file : [/home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133258_998_f4c0mn89.txt]
2024-03-13 13:33:03.463 [writer-0-205] INFO  StandardFtpHelperImpl - current working directory:/home/wgzhao
2024-03-13 13:33:03.463 [writer-0-205] INFO  StandardFtpHelperImpl - current working directory:/home/wgzhao/ftptest/ftpwriter
2024-03-13 13:33:03.463 [writer-0-171] INFO  FtpWriter$Task       - end do write
2024-03-13 13:33:03.464 [writer-0-173] INFO  FtpWriter$Task       - begin do write...
2024-03-13 13:33:03.464 [writer-0-173] INFO  FtpWriter$Task       - write to file : [/home/wgzhao/ftptest/ftpwriter/101-测试__20240313_133258_994_s8z8tsdr.txt]
2024-03-13 13:33:03.464 [writer-0-173] INFO  StandardFtpHelperImpl - current working directory:/home/wgzhao
2024-03-13 13:33:03.464 [writer-0-173] INFO  StandardFtpHelperImpl - current working directory:/home/wgzhao/ftptest/ftpwriter
2024-03-13 13:33:03.465 [writer-0-173] INFO  FtpWriter$Task       - end do write
2024-03-13 13:33:03.465 [reader-0-205] INFO  FtpReader$Task       - reading file : [/home/wgzhao/ftptest/ftpreader/600575.SH-000677.SZ.csv]
2024-03-13 13:33:03.466 [reader-0-205] WARN  StorageReaderUtil    - Uses [,] as delimiter by default
2024-03-13 13:33:03.466 [writer-0-205] INFO  FtpWriter$Task       - end do write
2024-03-13 13:33:03.467 [ reader-0-42] INFO  FtpReader$Task       - reading file : [/home/wgzhao/ftptest/ftpreader/688018.SH-000632.SZ.csv]
2024-03-13 13:33:03.467 [ reader-0-42] WARN  StorageReaderUtil    - Uses [,] as delimiter by default
2024-03-13 13:33:03.468 [ writer-0-42] INFO  FtpWriter$Task       - end do write
2024-03-13 13:33:03.499 [writer-0-158] INFO  FtpWriter$Task       - end do write
2024-03-13 13:33:05.096 [       job-0] INFO  AbstractScheduler    - The scheduler has completed all tasks.
2024-03-13 13:33:05.096 [       job-0] INFO  JobContainer         - The Writer.Job [ftpwriter] perform post work.
2024-03-13 13:33:05.096 [       job-0] INFO  JobContainer         - The Reader.Job [ftpreader] perform post work.
2024-03-13 13:33:05.103 [       job-0] INFO  StandAloneJobContainerCommunicator - Total 225262 records, 23043447 bytes | Speed 3.66MB/s, 37543 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.974s |  All Task WaitReaderTime 0.156s | Percentage 100.00%
2024-03-13 13:33:05.103 [       job-0] INFO  JobContainer         -
Job start  at             : 2024-03-13 13:32:58
Job end    at             : 2024-03-13 13:33:05
Job took secs             :                  6s
Average   bps             :            3.66MB/s
Average   rps             :          37543rec/s
Number of rec             :              225262
Failed record             :                   0

Bear-big-code (Author) commented Mar 16, 2024

With roughly 1,000–2,000 files in a folder there is basically no problem, but in my case 80,000–200,000 new files arrive every day. The files themselves are small, only a few hundred bytes each, and they are grouped into one folder per day.
Does this tool support syncing in batches, for example splitting 200,000 files into 200 runs of 1,000 files each?

wgzhao (Owner) commented Mar 18, 2024

In the current model each file corresponds to one task, i.e. one thread. Reading tens of thousands or even hundreds of thousands of files in one run means opening that many threads at once; when I simulated reading 150,000 files locally, the process exited immediately. If your file names follow a pattern, you can use a wildcard in the path setting to select one batch of files per run, which should work around your problem for now, for example:

{
  "parameter": {
    "path": "/home/wgzhao/ftptest/ftpreader/100*.csv"
  }
}
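
For the reporter's scenario (80,000–200,000 small files per day), this wildcard workaround can be scripted so that each Addax run only sees one batch of files. The sketch below is illustrative only and rests on assumptions not confirmed in this thread: that the launcher is invoked as addax.sh <job.json> from an install under /opt/addax, that a copy of the job config above is saved as job-template.json, and that the source file names start with two decimal digits so the globs 00* through 99* partition the daily directory. Adjust the patterns to the real naming scheme.

#!/usr/bin/env python3
# Minimal batching sketch (assumptions noted above): rewrite the reader
# path once per wildcard pattern and launch Addax once per batch, so each
# run only creates as many tasks/threads as one batch contains.
import json
import subprocess
from pathlib import Path

ADDAX_LAUNCHER = "/opt/addax/bin/addax.sh"   # assumed launcher location
TEMPLATE_FILE = Path("job-template.json")    # assumed copy of the job config shown above
SOURCE_DIR = "/2024-01-06"                   # daily directory from the original report

def run_in_batches() -> None:
    template = json.loads(TEMPLATE_FILE.read_text(encoding="utf-8"))
    reader_param = template["job"]["content"][0]["reader"]["parameter"]
    for prefix in range(100):                # assumes file names start with two digits: 00*..99*
        reader_param["path"] = f"{SOURCE_DIR}/{prefix:02d}*"
        batch_job = Path(f"job-batch-{prefix:02d}.json")
        batch_job.write_text(
            json.dumps(template, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        # One Addax invocation per pattern keeps the task count of each run bounded.
        subprocess.run([ADDAX_LAUNCHER, str(batch_job)], check=True)

if __name__ == "__main__":
    run_in_batches()

Each run then splits into at most one task per matched file, keeping the per-run thread count far below the 158,632 tasks seen in the original log.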

wgzhao closed this as completed on May 27, 2024