Why use -mfloat-abi=softfp on MODM7AE70?

KE5FX
Posts: 22
Joined: Tue Dec 10, 2019 11:17 pm

Why use -mfloat-abi=softfp on MODM7AE70?

Post by KE5FX »

My application needs as much double-precision performance as it can get. On a Teensy 4 built with -mfpu=fpv5-d16 -mfloat-abi=hard, one chunk of my DSP code runs at 1 MS/s, but on the MODM7AE70 the same code manages only 67 kS/s.

Admittedly the iMXRT1062 on the Teensy 4 is running at 600 MHz instead of 300 MHz, but that's still a difference of roughly 7.5x after normalizing for clock speed. I tried setting -mfloat-abi=hard in every makefile I could find and running make clean/make, but the performance is exactly the same.
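For reference, the per-clock comparison can be worked out as below. This is only a sketch of the arithmetic using the figures quoted in this post; the function name is made up for illustration.

```c
#include <assert.h>

/* Ratio of two throughputs after dividing each by its clock rate,
   i.e. how much more work per clock the faster system gets done.
   The name and signature are hypothetical. */
double per_clock_ratio(double fast_sps, double fast_mhz,
                       double slow_sps, double slow_mhz)
{
    assert(fast_mhz > 0.0 && slow_mhz > 0.0 && slow_sps > 0.0);
    return (fast_sps / fast_mhz) / (slow_sps / slow_mhz);
}
```

Calling per_clock_ratio(1.0e6, 600.0, 67.0e3, 300.0) gives a little under 7.5: the raw gap is about 15x, and half of that is explained by the clock.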

In general, are there any downsides to compiling with -mfloat-abi=hard, given that no precompiled third-party libraries are being used? My understanding is that softfp versus hard should make almost no difference, but something is really hosing double-precision FP performance on the SAME70.
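When a changed -mfloat-abi flag makes no measurable difference, one thing worth ruling out is that the flag never reached the compiler. A minimal sketch of a check, assuming a GCC-style ARM toolchain that defines the ACLE macros __ARM_PCS_VFP and __ARM_FP (and GCC's __SOFTFP__); the function names are hypothetical:

```c
#include <assert.h>

/* Names the float ABI this translation unit was compiled for. */
const char *float_abi_name(void)
{
#if defined(__ARM_PCS_VFP)
    return "hard";     /* FP arguments are passed in FPU registers */
#elif defined(__SOFTFP__)
    return "soft";     /* no FPU instructions are generated */
#elif defined(__arm__)
    return "softfp";   /* FPU instructions, arguments in core registers */
#else
    return "not-arm";  /* e.g. compiled on a desktop host */
#endif
}

/* Returns 1 when the compiler targets an FPU with hardware double
   precision (bit 3 of __ARM_FP), otherwise 0. */
int has_hw_double(void)
{
#if defined(__ARM_FP) && (__ARM_FP & 0x08)
    return 1;
#else
    return 0;
#endif
}

/* Optional startup check for target builds: stops early if the build
   does not have hardware double support. */
void check_fp_build(void)
{
    assert(has_hw_double());
}
```

If has_hw_double() returns 0 on the target, doubles are being computed in software (or single-precision-only hardware is selected), which would dwarf any softfp-versus-hard calling-convention difference.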
dciliske
Posts: 623
Joined: Mon Feb 06, 2012 9:37 am
Location: San Diego, CA

Re: Why use -mfloat-abi=softfp on MODM7AE70?

Post by dciliske »

Where are your stack and samples located?

If you've simply declared an ordinary variable, it will be allocated in SDRAM, which will end up being the bottleneck. Almost all hard problems end up being memory-bandwidth bound before they are CPU bound. You will want to use FAST_USER_VAR for the data, and if you've allocated a stack for running this in a new task, you'll want to use FAST_USER_STK for the stack.

This tells the linker to allocate the variables in the on-die SRAM, which has much higher bandwidth. (If you are using 'OSSimpleTaskCreatewNameSRAM' for task creation, it already places the task stack in SRAM.)

Code:

#include <constants.h>
...
sample_type_t samples[SAMPLE_BUF_SIZ] FAST_USER_VAR;
uint32_t DSP_Task_Stk[DSP_TASK_STK_SIZ] FAST_USER_STK;
...
Dan Ciliske
Project Engineer
Netburner, Inc
KE5FX
Posts: 22
Joined: Tue Dec 10, 2019 11:17 pm

Re: Why use -mfloat-abi=softfp on MODM7AE70?

Post by KE5FX »

Good to know, Dan -- thanks again for your help. FAST_USER_VAR bought me a 7x speedup.
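For anyone wanting to measure the effect on their own code, here is a rough sketch of a before/after harness. It is not my actual DSP code: mac_pass, measure_sps and the buffer size are made-up stand-ins, the standard clock() call is assumed to be usable (on the target you would likely swap in a hardware timer), and the empty FAST_USER_VAR fallback exists only so the sketch compiles off-target.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* On NetBurner builds FAST_USER_VAR comes from <constants.h>. The empty
   fallback below is an assumption so the sketch builds elsewhere. */
#ifndef FAST_USER_VAR
#define FAST_USER_VAR
#endif

#define SAMPLE_BUF_SIZ 1024   /* hypothetical buffer size */

/* Remove FAST_USER_VAR here to time the default (SDRAM) placement. */
double samples[SAMPLE_BUF_SIZ] FAST_USER_VAR;

/* Hypothetical stand-in for the DSP kernel: one double-precision
   multiply-accumulate pass over the buffer. */
double mac_pass(const double *buf, size_t n, double gain)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += buf[i] * gain;
    return acc;
}

/* Runs `passes` passes over the buffer and returns samples per second.
   The accumulated result is written to *sink so the compiler cannot
   discard the work. Returns 0.0 if the run was too short to time. */
double measure_sps(unsigned passes, double *sink)
{
    assert(passes > 0 && sink != NULL);
    clock_t start = clock();
    double acc = 0.0;
    for (unsigned p = 0; p < passes; p++)
        acc += mac_pass(samples, SAMPLE_BUF_SIZ, 1.0 + p);
    clock_t ticks = clock() - start;
    *sink = acc;
    if (ticks <= 0)
        return 0.0;
    return (double)passes * SAMPLE_BUF_SIZ * CLOCKS_PER_SEC / (double)ticks;
}
```

Build it twice, once with and once without FAST_USER_VAR on the buffer, and compare the two measure_sps results with a pass count large enough to run for a second or so.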

You guys might consider documenting this stuff. Now that I know what to Google for, I see a helpful blog entry from a couple of years ago, and there's a 2008-era NNDK Programming Guide .PDF floating around that talks about FAST_USER_VAR with respect to the ColdFire platforms. But there's no entry for it in the current help files, at least that I could find. A 7x-8x speedup is worth mentioning someplace where people will see it.
dciliske
Posts: 623
Joined: Mon Feb 06, 2012 9:37 am
Location: San Diego, CA

Re: Why use -mfloat-abi=softfp on MODM7AE70?

Post by dciliske »

The problem I always run into with documenting these things is that I know too much. I'm the person who designs those linker files these days after all!

For things that are not 'driver' or 'software library' related, the question is always how to collect the information in a way that is accessible and findable for those who need to know. Do you have any thoughts on where you would search for a feature that 'moves where variables are stored' or allows for 'performance optimizations'?

If only I could get ARM and other vendors to write decent docs as well... (I still haven't actually figured out how to get SWO instruction tracing to work, even though I've written an ARM debugger.)
Dan Ciliske
Project Engineer
Netburner, Inc
KE5FX
Posts: 22
Joined: Tue Dec 10, 2019 11:17 pm

Re: Why use -mfloat-abi=softfp on MODM7AE70?

Post by KE5FX »

dciliske wrote: Wed Mar 25, 2020 10:45 pm The problem I always run into with documenting these things is that I know too much. I'm the person who designs those linker files these days after all!

For things that are not 'driver' or 'software library' related, the question is always how to collect the information in a way that is accessible and findable for those who need to know. Do you have any thoughts on where you would search for a feature that 'moves where variables are stored' or allows for 'performance optimizations'?

If only I could get ARM and other vendors to write decent docs as well... (I still haven't actually figured out how to get SWO instruction tracing to work, even though I've written an ARM debugger.)
Yeah, the ARM and chip vendor manuals are often the last resort for me, just because they leave out so much despite being 3000+ pages long. Either the information is in another castle^W manual, probably one that's only available under NDA, or the information I need is there someplace but inaccessible unless I already know what to look for. As in this case, if I had already known that, I probably would have just looked in the OS source or examples.

I will say that it's nice to have a single flat .PDF that collects all of the information available about the platform, like your older manuals. The hypertext help system is valuable overall, but specific feature and function descriptions are often too terse to be useful [1], and if I'm browsing in the wrong section entirely I may not realize it. It doesn't look like you maintain a separate .PDF version any longer, but that's one thing I'd vote for.

In this specific case I'd look for a section called "Performance Tuning and Optimization" or something similar. In a .PDF that's just a ToC entry. But in a hypertext system, if I don't have the sidebar open to the right section and/or don't know what to search for, I probably will never see it.

[1] As a recent example, I ended up doing this:

Code:

#define OSSemHz IntervalOSSem
... because I can never remember if an "interval" is defined as a frequency in Hz, a period in seconds, a number of ticks, or whatever. The documentation for IntervalOSSem simply says "Posts to a semaphore at the requested interval." Uh, OK, thanks. :? Anyway, that's a digression from the topic, but I'm pretty opinionated when it comes to docs...