
Regarding the environment step #15

Open
teslacool opened this issue Jun 12, 2021 · 1 comment
@teslacool

Hi, @MishaLaskin. Thanks for sharing your code.

After checking your code and running two examples on Cheetah Run, I am confused about the definition of "step" used in your code. In each "step" the agent interacts with the environment once, so a "step" should be a policy step. However, in your README you say that "S" is the total number of environment steps. After running your code myself, I think the scores reported in the RAD paper are consistent with the "S" in the log, which is not consistent with the 100K/500K environment-step definition.

So, can you tell me where my reasoning above goes wrong?
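
To make the counting I mean concrete, here is a minimal, self-contained sketch; DummyRepeatEnv is a stand-in I wrote for this issue, not a class from this repo. If the DMC wrapper applies action repeat inside env.step(), then each loop iteration is ONE policy step but action_repeat ENVIRONMENT steps:

# Illustrative sketch of the step counting in question.
class DummyRepeatEnv:
    """Stands in for a wrapper that repeats each action internally."""

    def __init__(self, action_repeat):
        self.action_repeat = action_repeat
        self.env_steps = 0  # true low-level environment steps

    def step(self, action):
        # One policy-level call advances the underlying env action_repeat times.
        self.env_steps += self.action_repeat
        return None, 0.0, False, {}

env = DummyRepeatEnv(action_repeat=4)  # 4 is an assumed value for cheetah-run
for step in range(1, 101):             # "step" as counted in the training loop / log
    env.step(action=None)

print(step, env.env_steps)  # -> 100 400: 100 policy steps = 400 environment steps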

The attached log is from one of the eval.log files.

{"episode": 0.0, "episode_reward": 0.4356266596720902, "eval_time": 36.183101415634155, "mean_episode_reward": 0.4356266596720902, "best_episode_reward": 0.5738957398540733, "step": 0}
{"episode": 40.0, "episode_reward": 0.3922430059714884, "eval_time": 33.28531265258789, "mean_episode_reward": 0.3922430059714884, "best_episode_reward": 0.508110810837342, "step": 10000}
{"episode": 80.0, "episode_reward": 220.56826508015843, "eval_time": 35.10482168197632, "mean_episode_reward": 220.56826508015843, "best_episode_reward": 307.75619211624013, "step": 20000}
{"episode": 120.0, "episode_reward": 398.2377631669576, "eval_time": 33.28292393684387, "mean_episode_reward": 398.2377631669575, "best_episode_reward": 444.95074732252925, "step": 30000}
{"episode": 160.0, "episode_reward": 250.83722006270668, "eval_time": 33.52418875694275, "mean_episode_reward": 250.83722006270668, "best_episode_reward": 480.00584557605333, "step": 40000}
{"episode": 200.0, "episode_reward": 460.00020802529644, "eval_time": 32.313072204589844, "mean_episode_reward": 460.00020802529644, "best_episode_reward": 522.8574259829004, "step": 50000}
{"episode": 240.0, "episode_reward": 479.65149522971086, "eval_time": 34.7544846534729, "mean_episode_reward": 479.65149522971086, "best_episode_reward": 501.6123850779641, "step": 60000}
{"episode": 280.0, "episode_reward": 476.09042867994657, "eval_time": 35.83760476112366, "mean_episode_reward": 476.09042867994657, "best_episode_reward": 529.4322697389259, "step": 70000}
{"episode": 320.0, "episode_reward": 494.73414263618014, "eval_time": 36.749940156936646, "mean_episode_reward": 494.73414263618025, "best_episode_reward": 532.1412143794047, "step": 80000}
{"episode": 360.0, "episode_reward": 503.4997884511032, "eval_time": 35.34500050544739, "mean_episode_reward": 503.49978845110326, "best_episode_reward": 555.0621061796511, "step": 90000}
{"episode": 400.0, "episode_reward": 521.1039573445707, "eval_time": 33.78158783912659, "mean_episode_reward": 521.1039573445709, "best_episode_reward": 591.2907124375452, "step": 100000}
{"episode": 440.0, "episode_reward": 535.4868600367246, "eval_time": 38.792107820510864, "mean_episode_reward": 535.4868600367246, "best_episode_reward": 555.9852780069997, "step": 110000}
{"episode": 480.0, "episode_reward": 570.3194295888213, "eval_time": 36.30574131011963, "mean_episode_reward": 570.3194295888212, "best_episode_reward": 600.9994753574631, "step": 120000}
{"episode": 520.0, "episode_reward": 468.78191709698757, "eval_time": 34.748950719833374, "mean_episode_reward": 468.7819170969875, "best_episode_reward": 621.3911181333744, "step": 130000}
{"episode": 560.0, "episode_reward": 570.5285485782709, "eval_time": 32.7656946182251, "mean_episode_reward": 570.528548578271, "best_episode_reward": 634.9763550491491, "step": 140000}
{"episode": 600.0, "episode_reward": 589.9510409716655, "eval_time": 37.38888740539551, "mean_episode_reward": 589.9510409716655, "best_episode_reward": 613.628584188713, "step": 150000}
{"episode": 640.0, "episode_reward": 608.6108533553306, "eval_time": 34.01275634765625, "mean_episode_reward": 608.6108533553305, "best_episode_reward": 630.7835900332116, "step": 160000}
{"episode": 680.0, "episode_reward": 621.349796039415, "eval_time": 34.877559185028076, "mean_episode_reward": 621.349796039415, "best_episode_reward": 652.1881834920341, "step": 170000}
{"episode": 720.0, "episode_reward": 639.9826285214389, "eval_time": 35.06516098976135, "mean_episode_reward": 639.9826285214391, "best_episode_reward": 692.7047252421308, "step": 180000}
{"episode": 760.0, "episode_reward": 657.0809243258393, "eval_time": 34.67800784111023, "mean_episode_reward": 657.0809243258393, "best_episode_reward": 701.9084114329122, "step": 190000}
{"episode": 800.0, "episode_reward": 660.2335865549261, "eval_time": 35.4080011844635, "mean_episode_reward": 660.2335865549261, "best_episode_reward": 693.3057471600971, "step": 200000}
{"episode": 840.0, "episode_reward": 697.402447344539, "eval_time": 34.40719747543335, "mean_episode_reward": 697.402447344539, "best_episode_reward": 715.7758698341383, "step": 210000}
{"episode": 880.0, "episode_reward": 592.9811909358905, "eval_time": 35.32376289367676, "mean_episode_reward": 592.9811909358905, "best_episode_reward": 693.0957377543289, "step": 220000}
{"episode": 920.0, "episode_reward": 624.3706808478648, "eval_time": 34.04270315170288, "mean_episode_reward": 624.370680847865, "best_episode_reward": 735.8346539087496, "step": 230000}
{"episode": 960.0, "episode_reward": 694.2616569645118, "eval_time": 35.66026258468628, "mean_episode_reward": 694.2616569645118, "best_episode_reward": 720.5584114947538, "step": 240000}
{"episode": 1000.0, "episode_reward": 718.3903847034084, "eval_time": 35.21766233444214, "mean_episode_reward": 718.3903847034085, "best_episode_reward": 753.6853742455136, "step": 250000}
{"episode": 1040.0, "episode_reward": 702.1211235500631, "eval_time": 33.36829137802124, "mean_episode_reward": 702.121123550063, "best_episode_reward": 717.4916829146014, "step": 260000}
{"episode": 1080.0, "episode_reward": 609.803098127629, "eval_time": 33.54708957672119, "mean_episode_reward": 609.803098127629, "best_episode_reward": 766.2049338352589, "step": 270000}
{"episode": 1120.0, "episode_reward": 707.1653135867184, "eval_time": 34.566941022872925, "mean_episode_reward": 707.1653135867184, "best_episode_reward": 739.0773991353415, "step": 280000}
{"episode": 1160.0, "episode_reward": 688.0882931992711, "eval_time": 35.06712627410889, "mean_episode_reward": 688.0882931992711, "best_episode_reward": 761.1328032472578, "step": 290000}
{"episode": 1200.0, "episode_reward": 703.0424713866912, "eval_time": 34.390944719314575, "mean_episode_reward": 703.0424713866912, "best_episode_reward": 724.5926418921965, "step": 300000}
{"episode": 1240.0, "episode_reward": 711.8778025060885, "eval_time": 33.79719519615173, "mean_episode_reward": 711.8778025060883, "best_episode_reward": 731.9311859193582, "step": 310000}
{"episode": 1280.0, "episode_reward": 733.082565145378, "eval_time": 38.129096031188965, "mean_episode_reward": 733.0825651453781, "best_episode_reward": 764.223996955497, "step": 320000}
{"episode": 1320.0, "episode_reward": 739.5864927476559, "eval_time": 33.41835594177246, "mean_episode_reward": 739.5864927476559, "best_episode_reward": 774.7373150210634, "step": 330000}
{"episode": 1360.0, "episode_reward": 755.9029077573136, "eval_time": 35.0951726436615, "mean_episode_reward": 755.9029077573135, "best_episode_reward": 793.3344257663872, "step": 340000}
{"episode": 1400.0, "episode_reward": 743.4416125196194, "eval_time": 34.633925914764404, "mean_episode_reward": 743.4416125196193, "best_episode_reward": 770.059243984834, "step": 350000}
{"episode": 1440.0, "episode_reward": 741.9526813307564, "eval_time": 33.76918888092041, "mean_episode_reward": 741.9526813307564, "best_episode_reward": 774.7818297751057, "step": 360000}
{"episode": 1480.0, "episode_reward": 764.8214025481939, "eval_time": 33.95463514328003, "mean_episode_reward": 764.821402548194, "best_episode_reward": 791.9539171365969, "step": 370000}
{"episode": 1520.0, "episode_reward": 790.6449580962574, "eval_time": 34.05108332633972, "mean_episode_reward": 790.6449580962574, "best_episode_reward": 826.7780160596269, "step": 380000}
{"episode": 1560.0, "episode_reward": 773.8353241728249, "eval_time": 34.79716157913208, "mean_episode_reward": 773.8353241728248, "best_episode_reward": 785.9659530128242, "step": 390000}
{"episode": 1600.0, "episode_reward": 802.0251560266711, "eval_time": 34.08993315696716, "mean_episode_reward": 802.0251560266711, "best_episode_reward": 842.8616140833693, "step": 400000}
{"episode": 1640.0, "episode_reward": 792.1746969223113, "eval_time": 34.50390124320984, "mean_episode_reward": 792.1746969223113, "best_episode_reward": 809.7959418657359, "step": 410000}
{"episode": 1680.0, "episode_reward": 794.8947277337959, "eval_time": 33.073450326919556, "mean_episode_reward": 794.8947277337959, "best_episode_reward": 828.8074622256813, "step": 420000}
{"episode": 1720.0, "episode_reward": 811.7859554210896, "eval_time": 33.70336389541626, "mean_episode_reward": 811.7859554210897, "best_episode_reward": 832.9644620530241, "step": 430000}
{"episode": 1760.0, "episode_reward": 805.4085435876792, "eval_time": 32.99878263473511, "mean_episode_reward": 805.408543587679, "best_episode_reward": 853.4787709685797, "step": 440000}
{"episode": 1800.0, "episode_reward": 778.5883336320175, "eval_time": 34.7266731262207, "mean_episode_reward": 778.5883336320175, "best_episode_reward": 823.3223646673323, "step": 450000}
{"episode": 1840.0, "episode_reward": 794.9255518584372, "eval_time": 34.07718515396118, "mean_episode_reward": 794.9255518584372, "best_episode_reward": 827.8027143150523, "step": 460000}
{"episode": 1880.0, "episode_reward": 751.8466694444098, "eval_time": 34.40741038322449, "mean_episode_reward": 751.8466694444098, "best_episode_reward": 777.1449433874147, "step": 470000}
{"episode": 1920.0, "episode_reward": 780.8817593299839, "eval_time": 33.594924449920654, "mean_episode_reward": 780.8817593299839, "best_episode_reward": 816.295442656762, "step": 480000}
{"episode": 1960.0, "episode_reward": 792.405602392688, "eval_time": 34.01686120033264, "mean_episode_reward": 792.405602392688, "best_episode_reward": 844.5211862361997, "step": 490000}
{"episode": 2000.0, "episode_reward": 762.5517965343424, "eval_time": 33.41407632827759, "mean_episode_reward": 762.5517965343424, "best_episode_reward": 803.3526927641574, "step": 500000}
{"episode": 2040.0, "episode_reward": 793.6626955732088, "eval_time": 33.39684510231018, "mean_episode_reward": 793.6626955732088, "best_episode_reward": 828.2489271676886, "step": 510000}
{"episode": 2080.0, "episode_reward": 799.2933831819107, "eval_time": 33.60454487800598, "mean_episode_reward": 799.2933831819107, "best_episode_reward": 843.1845330559579, "step": 520000}
{"episode": 2120.0, "episode_reward": 836.847778730096, "eval_time": 37.38401985168457, "mean_episode_reward": 836.847778730096, "best_episode_reward": 883.1506616575153, "step": 530000}
{"episode": 2160.0, "episode_reward": 806.6992951067512, "eval_time": 33.07817530632019, "mean_episode_reward": 806.6992951067509, "best_episode_reward": 852.9010909366383, "step": 540000}
{"episode": 2200.0, "episode_reward": 788.178802548356, "eval_time": 34.04860234260559, "mean_episode_reward": 788.1788025483562, "best_episode_reward": 854.0410925817004, "step": 550000}
{"episode": 2240.0, "episode_reward": 827.3052089627469, "eval_time": 34.09475040435791, "mean_episode_reward": 827.305208962747, "best_episode_reward": 852.7861605672563, "step": 560000}
{"episode": 2280.0, "episode_reward": 822.6227486401601, "eval_time": 34.39635348320007, "mean_episode_reward": 822.6227486401601, "best_episode_reward": 862.5865496074978, "step": 570000}
{"episode": 2320.0, "episode_reward": 813.5607311026849, "eval_time": 32.63203525543213, "mean_episode_reward": 813.5607311026849, "best_episode_reward": 849.9138325938014, "step": 580000}
{"episode": 2360.0, "episode_reward": 829.4307070315997, "eval_time": 33.414421796798706, "mean_episode_reward": 829.4307070315997, "best_episode_reward": 850.2699967071469, "step": 590000}
{"episode": 0.0, "episode_reward": 0.4356266596720902, "eval_time": 36.183101415634155, "mean_episode_reward": 0.4356266596720902, "best_episode_reward": 0.5738957398540733, "step": 0}
{"episode": 40.0, "episode_reward": 0.3922430059714884, "eval_time": 33.28531265258789, "mean_episode_reward": 0.3922430059714884, "best_episode_reward": 0.508110810837342, "step": 10000}
{"episode": 80.0, "episode_reward": 220.56826508015843, "eval_time": 35.10482168197632, "mean_episode_reward": 220.56826508015843, "best_episode_reward": 307.75619211624013, "step": 20000}
{"episode": 120.0, "episode_reward": 398.2377631669576, "eval_time": 33.28292393684387, "mean_episode_reward": 398.2377631669575, "best_episode_reward": 444.95074732252925, "step": 30000}
{"episode": 160.0, "episode_reward": 250.83722006270668, "eval_time": 33.52418875694275, "mean_episode_reward": 250.83722006270668, "best_episode_reward": 480.00584557605333, "step": 40000}
{"episode": 200.0, "episode_reward": 460.00020802529644, "eval_time": 32.313072204589844, "mean_episode_reward": 460.00020802529644, "best_episode_reward": 522.8574259829004, "step": 50000}
{"episode": 240.0, "episode_reward": 479.65149522971086, "eval_time": 34.7544846534729, "mean_episode_reward": 479.65149522971086, "best_episode_reward": 501.6123850779641, "step": 60000}
{"episode": 280.0, "episode_reward": 476.09042867994657, "eval_time": 35.83760476112366, "mean_episode_reward": 476.09042867994657, "best_episode_reward": 529.4322697389259, "step": 70000}
{"episode": 320.0, "episode_reward": 494.73414263618014, "eval_time": 36.749940156936646, "mean_episode_reward": 494.73414263618025, "best_episode_reward": 532.1412143794047, "step": 80000}
{"episode": 360.0, "episode_reward": 503.4997884511032, "eval_time": 35.34500050544739, "mean_episode_reward": 503.49978845110326, "best_episode_reward": 555.0621061796511, "step": 90000}
{"episode": 400.0, "episode_reward": 521.1039573445707, "eval_time": 33.78158783912659, "mean_episode_reward": 521.1039573445709, "best_episode_reward": 591.2907124375452, "step": 100000}
{"episode": 440.0, "episode_reward": 535.4868600367246, "eval_time": 38.792107820510864, "mean_episode_reward": 535.4868600367246, "best_episode_reward": 555.9852780069997, "step": 110000}
{"episode": 480.0, "episode_reward": 570.3194295888213, "eval_time": 36.30574131011963, "mean_episode_reward": 570.3194295888212, "best_episode_reward": 600.9994753574631, "step": 120000}
{"episode": 520.0, "episode_reward": 468.78191709698757, "eval_time": 34.748950719833374, "mean_episode_reward": 468.7819170969875, "best_episode_reward": 621.3911181333744, "step": 130000}
{"episode": 560.0, "episode_reward": 570.5285485782709, "eval_time": 32.7656946182251, "mean_episode_reward": 570.528548578271, "best_episode_reward": 634.9763550491491, "step": 140000}
{"episode": 600.0, "episode_reward": 589.9510409716655, "eval_time": 37.38888740539551, "mean_episode_reward": 589.9510409716655, "best_episode_reward": 613.628584188713, "step": 150000}
{"episode": 640.0, "episode_reward": 608.6108533553306, "eval_time": 34.01275634765625, "mean_episode_reward": 608.6108533553305, "best_episode_reward": 630.7835900332116, "step": 160000}
{"episode": 680.0, "episode_reward": 621.349796039415, "eval_time": 34.877559185028076, "mean_episode_reward": 621.349796039415, "best_episode_reward": 652.1881834920341, "step": 170000}
{"episode": 720.0, "episode_reward": 639.9826285214389, "eval_time": 35.06516098976135, "mean_episode_reward": 639.9826285214391, "best_episode_reward": 692.7047252421308, "step": 180000}
{"episode": 760.0, "episode_reward": 657.0809243258393, "eval_time": 34.67800784111023, "mean_episode_reward": 657.0809243258393, "best_episode_reward": 701.9084114329122, "step": 190000}
{"episode": 800.0, "episode_reward": 660.2335865549261, "eval_time": 35.4080011844635, "mean_episode_reward": 660.2335865549261, "best_episode_reward": 693.3057471600971, "step": 200000}
{"episode": 840.0, "episode_reward": 697.402447344539, "eval_time": 34.40719747543335, "mean_episode_reward": 697.402447344539, "best_episode_reward": 715.7758698341383, "step": 210000}
{"episode": 880.0, "episode_reward": 592.9811909358905, "eval_time": 35.32376289367676, "mean_episode_reward": 592.9811909358905, "best_episode_reward": 693.0957377543289, "step": 220000}
{"episode": 920.0, "episode_reward": 624.3706808478648, "eval_time": 34.04270315170288, "mean_episode_reward": 624.370680847865, "best_episode_reward": 735.8346539087496, "step": 230000}
{"episode": 960.0, "episode_reward": 694.2616569645118, "eval_time": 35.66026258468628, "mean_episode_reward": 694.2616569645118, "best_episode_reward": 720.5584114947538, "step": 240000}
{"episode": 1000.0, "episode_reward": 718.3903847034084, "eval_time": 35.21766233444214, "mean_episode_reward": 718.3903847034085, "best_episode_reward": 753.6853742455136, "step": 250000}
{"episode": 1040.0, "episode_reward": 702.1211235500631, "eval_time": 33.36829137802124, "mean_episode_reward": 702.121123550063, "best_episode_reward": 717.4916829146014, "step": 260000}
{"episode": 1080.0, "episode_reward": 609.803098127629, "eval_time": 33.54708957672119, "mean_episode_reward": 609.803098127629, "best_episode_reward": 766.2049338352589, "step": 270000}
{"episode": 1120.0, "episode_reward": 707.1653135867184, "eval_time": 34.566941022872925, "mean_episode_reward": 707.1653135867184, "best_episode_reward": 739.0773991353415, "step": 280000}
{"episode": 1160.0, "episode_reward": 688.0882931992711, "eval_time": 35.06712627410889, "mean_episode_reward": 688.0882931992711, "best_episode_reward": 761.1328032472578, "step": 290000}
{"episode": 1200.0, "episode_reward": 703.0424713866912, "eval_time": 34.390944719314575, "mean_episode_reward": 703.0424713866912, "best_episode_reward": 724.5926418921965, "step": 300000}
{"episode": 1240.0, "episode_reward": 711.8778025060885, "eval_time": 33.79719519615173, "mean_episode_reward": 711.8778025060883, "best_episode_reward": 731.9311859193582, "step": 310000}
{"episode": 1280.0, "episode_reward": 733.082565145378, "eval_time": 38.129096031188965, "mean_episode_reward": 733.0825651453781, "best_episode_reward": 764.223996955497, "step": 320000}
{"episode": 1320.0, "episode_reward": 739.5864927476559, "eval_time": 33.41835594177246, "mean_episode_reward": 739.5864927476559, "best_episode_reward": 774.7373150210634, "step": 330000}
{"episode": 1360.0, "episode_reward": 755.9029077573136, "eval_time": 35.0951726436615, "mean_episode_reward": 755.9029077573135, "best_episode_reward": 793.3344257663872, "step": 340000}
{"episode": 1400.0, "episode_reward": 743.4416125196194, "eval_time": 34.633925914764404, "mean_episode_reward": 743.4416125196193, "best_episode_reward": 770.059243984834, "step": 350000}
{"episode": 1440.0, "episode_reward": 741.9526813307564, "eval_time": 33.76918888092041, "mean_episode_reward": 741.9526813307564, "best_episode_reward": 774.7818297751057, "step": 360000}
{"episode": 1480.0, "episode_reward": 764.8214025481939, "eval_time": 33.95463514328003, "mean_episode_reward": 764.821402548194, "best_episode_reward": 791.9539171365969, "step": 370000}
{"episode": 1520.0, "episode_reward": 790.6449580962574, "eval_time": 34.05108332633972, "mean_episode_reward": 790.6449580962574, "best_episode_reward": 826.7780160596269, "step": 380000}
{"episode": 1560.0, "episode_reward": 773.8353241728249, "eval_time": 34.79716157913208, "mean_episode_reward": 773.8353241728248, "best_episode_reward": 785.9659530128242, "step": 390000}
{"episode": 1600.0, "episode_reward": 802.0251560266711, "eval_time": 34.08993315696716, "mean_episode_reward": 802.0251560266711, "best_episode_reward": 842.8616140833693, "step": 400000}
{"episode": 1640.0, "episode_reward": 792.1746969223113, "eval_time": 34.50390124320984, "mean_episode_reward": 792.1746969223113, "best_episode_reward": 809.7959418657359, "step": 410000}
{"episode": 1680.0, "episode_reward": 794.8947277337959, "eval_time": 33.073450326919556, "mean_episode_reward": 794.8947277337959, "best_episode_reward": 828.8074622256813, "step": 420000}
{"episode": 1720.0, "episode_reward": 811.7859554210896, "eval_time": 33.70336389541626, "mean_episode_reward": 811.7859554210897, "best_episode_reward": 832.9644620530241, "step": 430000}
{"episode": 1760.0, "episode_reward": 805.4085435876792, "eval_time": 32.99878263473511, "mean_episode_reward": 805.408543587679, "best_episode_reward": 853.4787709685797, "step": 440000}
{"episode": 1800.0, "episode_reward": 778.5883336320175, "eval_time": 34.7266731262207, "mean_episode_reward": 778.5883336320175, "best_episode_reward": 823.3223646673323, "step": 450000}
{"episode": 1840.0, "episode_reward": 794.9255518584372, "eval_time": 34.07718515396118, "mean_episode_reward": 794.9255518584372, "best_episode_reward": 827.8027143150523, "step": 460000}
{"episode": 1880.0, "episode_reward": 751.8466694444098, "eval_time": 34.40741038322449, "mean_episode_reward": 751.8466694444098, "best_episode_reward": 777.1449433874147, "step": 470000}
{"episode": 1920.0, "episode_reward": 780.8817593299839, "eval_time": 33.594924449920654, "mean_episode_reward": 780.8817593299839, "best_episode_reward": 816.295442656762, "step": 480000}
{"episode": 1960.0, "episode_reward": 792.405602392688, "eval_time": 34.01686120033264, "mean_episode_reward": 792.405602392688, "best_episode_reward": 844.5211862361997, "step": 490000}
{"episode": 2000.0, "episode_reward": 762.5517965343424, "eval_time": 33.41407632827759, "mean_episode_reward": 762.5517965343424, "best_episode_reward": 803.3526927641574, "step": 500000}
{"episode": 2040.0, "episode_reward": 793.6626955732088, "eval_time": 33.39684510231018, "mean_episode_reward": 793.6626955732088, "best_episode_reward": 828.2489271676886, "step": 510000}
{"episode": 2080.0, "episode_reward": 799.2933831819107, "eval_time": 33.60454487800598, "mean_episode_reward": 799.2933831819107, "best_episode_reward": 843.1845330559579, "step": 520000}
{"episode": 2120.0, "episode_reward": 836.847778730096, "eval_time": 37.38401985168457, "mean_episode_reward": 836.847778730096, "best_episode_reward": 883.1506616575153, "step": 530000}
{"episode": 2160.0, "episode_reward": 806.6992951067512, "eval_time": 33.07817530632019, "mean_episode_reward": 806.6992951067509, "best_episode_reward": 852.9010909366383, "step": 540000}
{"episode": 2200.0, "episode_reward": 788.178802548356, "eval_time": 34.04860234260559, "mean_episode_reward": 788.1788025483562, "best_episode_reward": 854.0410925817004, "step": 550000}
{"episode": 2240.0, "episode_reward": 827.3052089627469, "eval_time": 34.09475040435791, "mean_episode_reward": 827.305208962747, "best_episode_reward": 852.7861605672563, "step": 560000}
{"episode": 2280.0, "episode_reward": 822.6227486401601, "eval_time": 34.39635348320007, "mean_episode_reward": 822.6227486401601, "best_episode_reward": 862.5865496074978, "step": 570000}
{"episode": 2320.0, "episode_reward": 813.5607311026849, "eval_time": 32.63203525543213, "mean_episode_reward": 813.5607311026849, "best_episode_reward": 849.9138325938014, "step": 580000}
{"episode": 2360.0, "episode_reward": 829.4307070315997, "eval_time": 33.414421796798706, "mean_episode_reward": 829.4307070315997, "best_episode_reward": 850.2699967071469, "step": 590000}

@TaoHuang13

Hi, I ran into the same question. You can check the output file named in the format 'xxx_eval_scores.npy': the 'steps' there are environment steps, and they should equal the 'S' in the log multiplied by the action repeat.
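
For example, a minimal sketch of that conversion applied to the eval.log lines above. The action_repeat value of 4 is an assumption (a common DMC setting for cheetah-run), so confirm it against your own config; 'xxx' in the .npy filename is a placeholder for the run name, and the array layout depends on how the repo saves it:

import json

import numpy as np

# ASSUMPTION: action_repeat = 4 for cheetah-run; check your training config.
ACTION_REPEAT = 4

with open("eval.log") as f:  # path is illustrative
    for line in f:
        record = json.loads(line)
        policy_steps = int(record["step"])
        env_steps = policy_steps * ACTION_REPEAT
        print(f"logged step {policy_steps:>7} -> env steps {env_steps:>8}, "
              f"episode reward {record['episode_reward']:.1f}")

# Cross-check against the saved eval scores; inspect the shape first, since
# the exact array layout depends on how the repo writes this file.
scores = np.load("xxx_eval_scores.npy")  # 'xxx' = your run name
print(scores.shape)

Under this reading, the row with "step": 100000 in the log above corresponds to 400K environment steps, and the 100K environment-step benchmark lines up with logged step 25000 (again assuming action repeat 4).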
