Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce flash size #243

Open
dbuezas opened this issue Jan 26, 2023 · 15 comments
Open

Reduce flash size #243

dbuezas opened this issue Jan 26, 2023 · 15 comments

Comments

@dbuezas
Copy link
Owner

dbuezas commented Jan 26, 2023

I did the experiment @LaZsolt and @nerdralph propose

Just replacing Timer3 initialisation frees up 16 bytes

before

sbi(TCCR3B, CS31);		// set timer 3 prescale factor to 64
sbi(TCCR3B, CS30);
sbi(TCCR3A, WGM30);		// put timer 3 in 8-bit phase correct pwm mode

after

TCCR3B = _BV(CS31) | _BV(CS30);		// set timer 3 prescale factor to 64
TCCR3A = _BV(WGM30);		// put timer 3 in 8-bit phase correct pwm mode

Modifying the other timers and the ADC prescaler code frees up a total of 78 bytes.

I assume we could also port the newer Wire & co libraries from the latest uno core, they seem to have become more efficient. I think @jg1uaa was trying that.

And of course making the EEPROM space configurable via menu would add another 2k

Originally posted by @dbuezas in #33 (comment)

Will take PRs if anybody wants to give it a shot :)

@LaZsolt
Copy link
Collaborator

LaZsolt commented Feb 28, 2023

1

I found that the main.cpp source code is a bit untidy because some CLOCK_SOURCE dependent code snippets can be found in several places. In the function void lgt8fx8x_init() and void lgt8fx8x_clk_src()
It would be clearer after the reorganization into one function.

2

On the other hand, four bytes could be saved in all such cases

// switch to internal crystal
GPIOR0 = PMCR & 0x9f;
PMCR = 0x80;
PMCR = GPIOR0;

if code replaced to this:

uint8_t xyz_temp = PMCR & 0x9f;
PMCR = 0x80;
PMCR = xyz_temp;

3

And one more. When main clock switched to external I think no need to wait after core switched.

// enable external crystal
PMCR = 0x80;
PMCR = 0x97;
// waiting for crystal stable
for(GPIOR0 = 0xff; GPIOR0 > 0; --GPIOR0);
for(GPIOR0 = 0xff; GPIOR0 > 0; --GPIOR0);
// switch to external crystal
PMCR = 0x80;
PMCR = 0xb7;
// waiting for crystal stable
for(GPIOR0 = 0xff; GPIOR0 > 0; --GPIOR0);
for(GPIOR0 = 0xff; GPIOR0 > 0; --GPIOR0);

There is also an opportunity to save here some bytes:

  uint8_t c = 1;
  do { for(GPIOR0 = 0xff; GPIOR0 > 0; --GPIOR0); } while (c--);

@dbuezas
Copy link
Owner Author

dbuezas commented Feb 28, 2023

Why does not using GPIOR0 save space?

Being a IO register, I think guarantees the compiler doesn't remove the code (b/c it is volatile).
It may be a way to have a very predictable busy wait in the loops.

@LaZsolt
Copy link
Collaborator

LaZsolt commented Feb 28, 2023

I compiled an original and a recommended code. After it done, I turned the code to assembler source.
C:\Program Files (x86)\Arduino\hardware\tools\avr\bin\avr-objdump.exe" -S test.elf > x.txt

Initial code snippet:

} else if(mode == INT_OSC) {
    // prescaler settings     - CLKPR addr 0x61
    CLKPR = 0x80;             // 80 e8         ldi r24, 0x80
                              // 80 93 61 00   sts 0x0061, r24
    CLKPR = 0x01;             // 91 e0         ldi r25, 0x01
                              // 90 93 61 00   sts 0x0061, r25

    // switch to int. crystal - PMCR addr 0xF2
    GPIOR0 = PMCR & 0x9f;     // 90 91 f2 00   lds r25, 0x00F2
                              // 9f 79         andi r25, 0x9F
                              // 9e bb         out 0x1e, r25
    PMCR = 0x80;              // 80 93 f2 00   sts 0x00F2, r24 - 0x80 is already in r24
    PMCR = GPIOR0;            // 9e b3         in r25, 0x1e
                              // 90 93 f2 00   sts 0x00F2, r25
                              - GPIOR0 addr 0x1e
    // disable ext. crystal
    GPIOR0 = PMCR & 0xfb;     // 90 91 f2 00   lds r25, 0x00F2
                              // 9b 7f         andi r25, 0xFB
                              // 9e bb         out 0x1e, r25
    PMCR = 0x80;              // 80 93 f2 00   sts 0x00F2, r24
    PMCR = GPIOR0;            // 9e b3         in r25, 0x1e
                              // 90 93 f2 00   sts 0x00F2, r25
}

Recommended code snippet:

} else if(mode == INT_OSC) {
    // prescaler settings     - CLKPR addr 0x61
    CLKPR = 0x80;             // 80 e8         ldi r24, 0x80
                              // 80 93 61 00   sts 0x0061, r24
    CLKPR = 0x01;             // 91 e0         ldi r25, 0x01
                              // 90 93 61 00   sts 0x0061, r25

    // switch to int. crystal - PMCR addr 0xF2
unint8_t temp_ = PMCR & 0x9f; // 90 91 f2 00   lds r25, 0x00F2
                              // 9f 79         andi r25, 0x9F
    PMCR = 0x80;              // 80 93 f2 00   sts 0x00F2, r24 - 0x80 is already in r24
    PMCR = temp_;             // 90 93 f2 00   sts 0x00F2, r25

    // disable ext. crystal
    temp_ = PMCR & 0xfb;      // 90 91 f2 00   lds r25, 0x00F2
                              // 9b 7f         andi r25, 0xFB
    PMCR = 0x80;              // 80 93 f2 00   sts 0x00F2, r24
    PMCR = temp_;             // 90 93 f2 00   sts 0x00F2, r25
}

@dbuezas
Copy link
Owner Author

dbuezas commented Feb 28, 2023

Oh, that makes sense now. If it's put in a variable, the compiler just keeps that at hand in a CPU register. GPIO0R is not a cpu register so extra operations are needed to move the value in and out.

Regarding the waits, I think I see what you are up to: using __asm__ __volatile__ ("nop\n\t"); inside a loop with a normal variable would also save space (maybe even two nop's or a uint16_t counter to avoid the having two loops, not sure what would take instructions)

@LaZsolt
Copy link
Collaborator

LaZsolt commented Mar 1, 2023

(maybe even two nop's or a uint16_t counter to avoid the having two loops, not sure what would take instructions)

Yes! This is the solution.

This code takes about 3566 cycles. (16 instructions, 32 bytes)

for(GPIOR0 = 0xff; GPIOR0 > 0; --GPIOR0);
for(GPIOR0 = 0xff; GPIOR0 > 0; --GPIOR0);

Tis replacement code takes the same amount of cycles. (4 instructions, 8 bytes)

_lgt8fx_delay_cycles(3566);

_lgt8fx_delay_cycles() is part of the delayMicroseconds() and is based on __builtin_avr_delay_cycles()

@dbuezas
Copy link
Owner Author

dbuezas commented Mar 1, 2023

Nice! another 24 bytes down!
BTW, did you give the timer setup code a look? (the one in the first post in the issue)

@LaZsolt
Copy link
Collaborator

LaZsolt commented Mar 1, 2023

Replacing sbi with pure bit manipulation save a bunch of space too.

Yes, save a bunch of space. That was a good suggestion.
It will be a lot of work to prepare an addition menu for the EEPROM's size selection.

@dbuezas
Copy link
Owner Author

dbuezas commented Mar 1, 2023

It will be a lot of work to prepare an addition menu for the EEPROM's size selection.

The menu side is not that bad, I made that for the clock originally (that was originally the main reason for this repo to exist).

@dbuezas
Copy link
Owner Author

dbuezas commented Mar 1, 2023

If you give me a constant name and values for each EEPROM size I can add the menu side.
Code wise it is just a bunch of if macros

@dbuezas
Copy link
Owner Author

dbuezas commented Mar 1, 2023

Ok, I'm assuming that ot os enough to add variants to the boards.txt that change:

  • upload.maximum_data_size=xxx (currently only 2048)
  • upload.maximum_size=xxx (currently only 29696)
    Plus something inside main.cpp so the EEPROM library k ow how much space it had to set/clear without touching the program space.

Is there anything else that needs to be done?

Repository owner deleted a comment from LaZsolt Mar 1, 2023
@LaZsolt
Copy link
Collaborator

LaZsolt commented Mar 1, 2023

Few more byte savings when initializing Timer1.

before

#if defined(TCCR1B) && defined(CS11) && defined(CS10)
	TCCR1B = 0;

	// set timer 1 prescale factor to 64
	sbi(TCCR1B, CS11);
#if F_CPU >= 8000000L
	sbi(TCCR1B, CS10);
#endif
#elif defined(TCCR1) && defined(CS11) && defined(CS10)
	sbi(TCCR1, CS11);
#if F_CPU >= 8000000L
	sbi(TCCR1, CS10);
#endif
#endif

after

#if defined(TCCR1B) && defined(CS11) && defined(CS10)
    #if F_CPU >= 8000000L
        TCCR1B = _BV(CS11) | _BV(CS10);
    #else
        TCCR1B = _BV(CS11);
    #endif
#elif defined(TCCR1) && defined(CS11) && defined(CS10)
    #if F_CPU >= 8000000L
        TCCR1 = _BV(CS11) | _BV(CS10);
    #else
        TCCR1 = _BV(CS11);
    #endif
#endif

@dbuezas
Copy link
Owner Author

dbuezas commented Mar 1, 2023

Great!
Should we get rid of all those #if defined ? we're only supporting the 328p variant, right?

@LaZsolt
Copy link
Collaborator

LaZsolt commented Mar 1, 2023

Should we get rid of all those #if defined ? we're only supporting the 328p variant, right?

We get rid of almost all of them, but we supporting 328d too, isn't it?

@LaZsolt
Copy link
Collaborator

LaZsolt commented Mar 1, 2023

I'll be away from the computer for a few days, so I'm not sure I'll be able to answer during that time.

@LaZsolt
Copy link
Collaborator

LaZsolt commented Apr 11, 2023

Add menu options to disable PWM functions to save FLASH space.

In lot of cases PWM functions not used. If someone doesn't want to use PWM capability with this menu will frees up additional FLASH space by not initializing the timers in wiring.c init() function.

In the same way, perhaps the ADC initialization could also be omitted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants