<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>VFP &#8211; richliu&#039;s blog</title>
	<atom:link href="https://blog.richliu.com/tag/vfp/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.richliu.com</link>
	<description>Linux, work, life, family</description>
	<lastBuildDate>Mon, 30 Apr 2012 15:18:13 +0000</lastBuildDate>
	<language>zh-TW</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.6.2</generator>
	<item>
		<title>How to check whether uclibc is built with VFP optimization</title>
		<link>https://blog.richliu.com/2010/09/01/985/%e5%a6%82%e4%bd%95%e7%a2%ba%e8%aa%8d-uclibc-%e6%98%af%e6%9c%89-vfp-%e6%9c%80%e4%bd%b3%e5%8c%96%e7%9a%84/</link>
					<comments>https://blog.richliu.com/2010/09/01/985/%e5%a6%82%e4%bd%95%e7%a2%ba%e8%aa%8d-uclibc-%e6%98%af%e6%9c%89-vfp-%e6%9c%80%e4%bd%b3%e5%8c%96%e7%9a%84/#respond</comments>
		
		<dc:creator><![CDATA[richliu]]></dc:creator>
		<pubDate>Wed, 01 Sep 2010 10:59:00 +0000</pubDate>
				<category><![CDATA[ARM]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[arm]]></category>
		<category><![CDATA[eabi]]></category>
		<category><![CDATA[GCC]]></category>
		<category><![CDATA[uclibc]]></category>
		<category><![CDATA[VFP]]></category>
		<guid isPermaLink="false">http://blog.richliu.com/?p=985</guid>

					<description><![CDATA[<p>The title was picked casually; this is just a note. On ARM platforms people rarely pay attention to whether VFP is supported, [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://blog.richliu.com/2010/09/01/985/%e5%a6%82%e4%bd%95%e7%a2%ba%e8%aa%8d-uclibc-%e6%98%af%e6%9c%89-vfp-%e6%9c%80%e4%bd%b3%e5%8c%96%e7%9a%84/">How to check whether uclibc is built with VFP optimization</a> appeared first on <a rel="nofollow" href="https://blog.richliu.com">richliu&#039;s blog</a>.</p>
]]></description>
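										<content:encoded><![CDATA[<p>The title was picked casually; this is just a note.<br />
<span id="more-985"></span></p>
<p>On ARM platforms people rarely pay attention to whether VFP is supported, because few embedded Linux systems make heavy use of math functions. (That changed, of course, once Android arrived.)</p>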
<p>Start with a simple program:<br />
[C]<br />
int main(void){<br />
double a=2.2,b=1.1,c;</p>
<p>c=a*b;<br />
return 0;<br />
}<br />
[/C]<br />
Compiling this program with arm-linux-gcc uses software floating point by default.<br />
The disassembly shows the program calling the function __aeabi_dmul to perform the double-precision multiply.</p>
<p>[BASH]<br />
# arm-linux-gcc -c -g -Wa,-a,-ad 1.c ; arm-linux-objdump -dS 1.o  | less<br />
[/BASH]</p>
<p>[TEXT]<br />
c=a*b;<br />
30:   e24b002c        sub     r0, fp, #44     ; 0x2c<br />
34:   e8900003        ldm     r0, {r0, r1}<br />
38:   e24b2024        sub     r2, fp, #36     ; 0x24<br />
3c:   e892000c        ldm     r2, {r2, r3}<br />
40:   ebfffffe        bl      0 &lt;__aeabi_dmul&gt;<br />
44:   e1a03000        mov     r3, r0<br />
48:   e1a04001        mov     r4, r1<br />
4c:   e50b301c        str     r3, [fp, #-28]<br />
50:   e50b4018        str     r4, [fp, #-24]<br />
[/TEXT]</p>
<p>What about with VFP? A few instructions do the work that previously required a call to __aeabi_dmul.<br />
[BASH]<br />
# arm-linux-gcc -mfloat-abi=softfp -c -g -Wa,-a,-ad 1.c ; arm-linux-objdump -dS 1.o  | less<br />
[/BASH]<br />
This gives:<br />
[TEXT]<br />
c=a*b;<br />
30:   ed1b6b0b        vldr    d6, [fp, #-44]<br />
34:   ed1b7b09        vldr    d7, [fp, #-36]<br />
38:   ee267b07        fmuld   d7, d6, d7<br />
3c:   ed0b7b07        vstr    d7, [fp, #-28]<br />
[/TEXT]</p>
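<p>Besides reading the disassembly, the compiler's predefined macros give a quick hint about which mode a set of flags selects. The sketch below is not from the original note: the function name float_codegen is made up, and it assumes a GCC-style ARM compiler that defines __SOFTFP__ for -mfloat-abi=soft and __ARM_PCS_VFP for -mfloat-abi=hard.</p>

```c
/* Describe the floating-point code generation the compiler selected.
 * Assumption: GCC-style predefined macros. __SOFTFP__ is set for
 * -mfloat-abi=soft, __ARM_PCS_VFP for -mfloat-abi=hard. */
const char *float_codegen(void)
{
#if !defined(__arm__)
    return "not an ARM target";
#elif defined(__SOFTFP__)
    return "soft: FP math goes through library calls such as __aeabi_dmul";
#elif defined(__ARM_PCS_VFP)
    return "hard: VFP instructions, FP arguments passed in VFP registers";
#else
    return "softfp: VFP instructions, FP arguments passed in core registers";
#endif
}
```

<p>Calling puts(float_codegen()) from main and building once with and once without -mfloat-abi=softfp should show the difference. The same macros can be listed directly with arm-linux-gcc -dM -E - &lt; /dev/null.</p>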
<p>Where does __aeabi_dmul live? The answer is inside gcc.<br />
In the gcc source code, gcc/config/arm/arm.c contains this line:<br />
set_optab_libfunc (smul_optab, DFmode, "__aeabi_dmul");<br />
and gcc/config/arm/ieee754-df.S holds the actual software implementation of __aeabi_dmul.<br />
It is quite long, so no wonder the performance is so much worse :p</p>
<p>So how do we check whether a uclibc binary supports VFP?<br />
For now I dump the contents of libm.so; if instructions such as vldr or fmuld show up, the library really does use VFP instructions.<br />
[BASH]<br />
# arm-linux-objdump -dS libm-0.9.31.so<br />
[/BASH]<br />
*Note: uclibc appears to have its own math routines (not verified).</p>
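<p>The check described above (look for VFP mnemonics in the objdump output) can be sketched as a small classifier. This is only a heuristic and not part of the original note: the function name looks_like_vfp is made up, a leading 'v' also matches NEON instructions, and the list of older mnemonics only covers the ones that appear in these posts.</p>

```c
#include <string.h>

/* Heuristic: does this objdump mnemonic look like a VFP instruction?
 * Assumption: anything starting with 'v' (UAL syntax, which also matches
 * NEON), plus the pre-UAL names that appear in these posts. */
int looks_like_vfp(const char *mnemonic)
{
    static const char *const old_names[] = {
        "fmuld", "fsubd", "fadds", "fldmias", "fstmias", "fmxr", "fmrx"
    };
    size_t i;

    if (mnemonic[0] == 'v')
        return 1;
    for (i = 0; i < sizeof old_names / sizeof old_names[0]; i++) {
        /* prefix match so conditional forms such as "fsubdgt" count too */
        if (strncmp(mnemonic, old_names[i], strlen(old_names[i])) == 0)
            return 1;
    }
    return 0;
}
```

<p>Feeding it the mnemonic column of each disassembly line and counting the hits gives a rough answer: zero hits across libm.so suggests a soft-float build.</p>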
<p>To make uclibc support VFP when building with buildroot, add this line to the uclibc config file:<br />
[TEXT]<br />
UCLIBC_EXTRA_CFLAGS="-mfloat-abi=softfp"<br />
[/TEXT]</p>
<p>After building uclibc, we can look at the assembly code of libm.so with objdump:<br />
[BASH]<br />
# arm-linux-objdump -D libm-0.9.31.so | less<br />
[/BASH]</p>
<p>Many of the instructions that start with v are VFP instructions:<br />
[TEXT]<br />
00007dd0 &lt;sin&gt;:<br />
    7dd0:       e1a03001        mov     r3, r1<br />
    7dd4:       e3c32102        bic     r2, r3, #-2147483648    ; 0x80000000<br />
    7dd8:       e59f30a4        ldr     r3, [pc, #164]  ; 7e84 &lt;sin+0xb4&gt;<br />
    7ddc:       e52de004        push    {lr}            ; (str lr, [sp, #-4]!)<br />
    7de0:       e1520003        cmp     r2, r3<br />
    7de4:       e24dd01c        sub     sp, sp, #28     ; 0x1c<br />
    7de8:       ec410b17        vmov    d7, r0, r1<br />
    7dec:       d3a02000        movle   r2, #0  ; 0x0<br />
    7df0:       d3a03000        movle   r3, #0  ; 0x0<br />
    7df4:       d3a0c000        movle   ip, #0  ; 0x0<br />
    7df8:       da000010        ble     7e40 &lt;sin+0x70&gt;<br />
    7dfc:       e59f3084        ldr     r3, [pc, #132]  ; 7e88 &lt;sin+0xb8&gt;<br />
    7e00:       e1520003        cmp     r2, r3<br />
    7e04:       ce376b47        fsubdgt d6, d7, d7<br />
    7e08:       cc510b16        vmovgt  r0, r1, d6<br />
[/TEXT]</p>
<p>Finally, test the speed of the library. This is the test code.<br />
# cat 1.c<br />
[C]<br />
#include &lt;stdio.h&gt;<br />
#include &lt;math.h&gt;</p>
<p>int main(void){<br />
        double result,result2;<br />
        int i,j,count=0;</p>
<p>        /* 10000 passes over sin(i) and sin(-i) for i = 0..179 */<br />
        for(j=0;j&lt;10000;j++){<br />
        for(i=0;i&lt;180;i++){<br />
        result = sin(i);<br />
        result2 = sin(0-i);<br />
        /* sin is odd, so sin(i) == -sin(-i) should always hold */<br />
        if(result == -result2){<br />
         count++;<br />
        }<br />
        }<br />
        }</p>
<p>        printf(" count:%d\n", count);<br />
        return 0;<br />
}<br />
[/C]<br />
Results without VFP support in the library:<br />
[TEXT]<br />
count:1800000<br />
real    0m 15.64s<br />
user    0m 15.64s<br />
sys     0m 0.00s<br />
[/TEXT]<br />
Results with VFP support in the library:<br />
[TEXT]<br />
count:1800000<br />
real    0m 2.42s<br />
user    0m 2.42s<br />
sys     0m 0.00s<br />
[/TEXT]<br />
That is a difference of about 6.46 times.</p>
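<p>The 6.46 figure is simply the ratio of the two wall-clock times, 15.64 s / 2.42 s. As a tiny sketch (the helper name speedup is made up for illustration):</p>

```c
/* Ratio of two wall-clock times: how many times faster the
 * second run is compared with the first. */
double speedup(double slow_seconds, double fast_seconds)
{
    return slow_seconds / fast_seconds;
}
```

<p>speedup(15.64, 2.42) comes out slightly above 6.46, which matches the number quoted above.</p>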
]]></content:encoded>
					
					<wfw:commentRss>https://blog.richliu.com/2010/09/01/985/%e5%a6%82%e4%bd%95%e7%a2%ba%e8%aa%8d-uclibc-%e6%98%af%e6%9c%89-vfp-%e6%9c%80%e4%bd%b3%e5%8c%96%e7%9a%84/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>ARM11 VFP</title>
		<link>https://blog.richliu.com/2010/03/22/890/arm11-vfp/</link>
					<comments>https://blog.richliu.com/2010/03/22/890/arm11-vfp/#respond</comments>
		
		<dc:creator><![CDATA[richliu]]></dc:creator>
		<pubDate>Mon, 22 Mar 2010 05:01:33 +0000</pubDate>
				<category><![CDATA[Quick notes]]></category>
		<category><![CDATA[ARM11]]></category>
		<category><![CDATA[GCC]]></category>
		<category><![CDATA[VFP]]></category>
		<guid isPermaLink="false">http://blog.richliu.com/?p=890</guid>

					<description><![CDATA[<p>To use the VFP unit on ARM11, compile with -mfpu=vfp -mfloat-a [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://blog.richliu.com/2010/03/22/890/arm11-vfp/">ARM11 VFP</a> appeared first on <a rel="nofollow" href="https://blog.richliu.com">richliu&#039;s blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>To use the VFP unit on ARM11, add -mfpu=vfp -mfloat-abi=softfp when compiling.</p>
<p>I am using Debian for ARM with gcc 4.4.2.<span id="more-890"></span><br />
# cat f.c<br />
[C]<br />
int main()<br />
{</p>
<p>float f1=1.2, f2=1.3;<br />
f1 = f2*f1;<br />
}<br />
[/C]<br />
# gcc -mfpu=vfp -mfloat-abi=softfp -c f.c<br />
# objdump -d f.o</p>
<p>f.o:     file format elf32-littlearm</p>
<p>Disassembly of section .text:</p>
<p>00000000 &lt;main&gt;:<br />
0:   e52db004        push    {fp}            ; (str fp, [sp, #-4]!)<br />
4:   e28db000        add     fp, sp, #0<br />
8:   e24dd00c        sub     sp, sp, #12<br />
c:   eddf7a09        vldr    s15, [pc, #36]  ; 0x24<br />
10:   ed4b7a03        vstr    s15, [fp, #-12]<br />
14:   eddf7a08        vldr    s15, [pc, #32]<br />
18:   ed4b7a02        vstr    s15, [fp, #-8]<br />
1c:   ed1b7a03        vldr    s14, [fp, #-12]<br />
20:   ed5b7a02        vldr    s15, [fp, #-8]<br />
24:   ee677a27        vmul.f32        s15, s14, s15<br />
28:   ed4b7a03        vstr    s15, [fp, #-12]<br />
2c:   e28bd000        add     sp, fp, #0<br />
30:   e8bd0800        pop     {fp}<br />
34:   e12fff1e        bx      lr<br />
38:   3f99999a        .word   0x3f99999a<br />
3c:   3fa66666        .word   0x3fa66666<br />
The output contains VFP instructions such as vldr, vstr, and vmul.f32.</p>
<blockquote><p># cat test2.c<br />
[C]<br />
#include &lt;unistd.h&gt;<br />
#include &lt;stdio.h&gt;<br />
/* Load all 32 single-precision VFP registers from arrays[] */<br />
void vfp_regs_load(float arrays[32])<br />
{<br />
asm volatile("fldmias %0, {s0-s31}\n"<br />
:<br />
:"r"(arrays)<br />
:"memory");<br />
}<br />
/* Store all 32 single-precision VFP registers into arrays[] */<br />
void vfp_regs_save(float arrays[32])<br />
{<br />
asm volatile ("fstmias %0, {s0-s31}"<br />
:<br />
:"r"(arrays)<br />
:"memory");<br />
}<br />
void print_array(float array[32])<br />
{<br />
int i;<br />
for(i=0; i&lt;32; i++)<br />
{<br />
if(i%8==0)<br />
printf("\n");<br />
printf("%f ", array[i]);<br />
}<br />
printf("\n");<br />
}<br />
int main()<br />
{<br />
unsigned int fpscr;<br />
float f1=1.0, f2=1.0;<br />
float farrays[32], farrays2[32];<br />
int i;<br />
/* Set the vector LEN/STRIDE bits in FPSCR, then read the register back */<br />
fpscr = 0x130000;<br />
asm volatile ("fmxr fpscr, %0\n"<br />
:<br />
:"r"(fpscr));<br />
asm volatile ("fmrx %0, fpscr\n"<br />
:"=r"(fpscr));<br />
vfp_regs_save(farrays2);<br />
/* farrays[i] = i + 1, so s0..s31 hold 1.0 .. 32.0 after each load */<br />
for(i=0; i&lt;32; i++)<br />
farrays[i] = f1+f2*(float) i;<br />
vfp_regs_load(farrays);<br />
vfp_regs_save(farrays2);<br />
printf("\n1:ScalarA op ScalarB-&gt;ScalarD");<br />
vfp_regs_load(farrays);<br />
asm volatile("fadds s0, s1, s2");<br />
vfp_regs_save(farrays2);<br />
print_array(farrays2);<br />
printf("\n2:VectorA[?] op ScalarB-&gt;VectorD[?]");<br />
vfp_regs_load(farrays);<br />
asm volatile("fadds s8,  s24, s0");<br />
vfp_regs_save(farrays2);<br />
print_array(farrays2);<br />
printf("\n3:VectorA[?] op VectorB[?]-&gt;VectorD[?]");<br />
vfp_regs_load(farrays);<br />
asm volatile("fadds s8,  s16, s24");<br />
vfp_regs_save(farrays2);<br />
print_array(farrays2);<br />
return 0;<br />
}<br />
[/C]</p></blockquote>
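<p>The constant 0x130000 written to FPSCR in test2.c is what switches the VFP into short-vector mode. A minimal sketch of how to read that value follows. The field positions are my assumption from ARM's FPSCR description (LEN in bits [18:16], holding the vector length minus one, and STRIDE in bits [21:20]) and are worth checking against the TRM listed under Ref. The helper names are made up.</p>

```c
/* Decode the short-vector fields of an FPSCR value.
 * Assumed layout: LEN in bits [18:16] (stores vector length - 1),
 * STRIDE in bits [21:20] (returned here as the raw field value). */
unsigned fpscr_vector_len(unsigned fpscr)
{
    return ((fpscr >> 16) & 0x7u) + 1u;
}

unsigned fpscr_stride_field(unsigned fpscr)
{
    return (fpscr >> 20) & 0x3u;
}
```

<p>For 0x130000 this gives a vector length of 4 and a raw stride field of 1. The program output in this post shows four results written to every second register, so on this hardware the setting behaves like a stride of 2.</p>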
<p>Vector instruction example:</p>
<p># ./a.out</p>
<p>1:ScalarA op ScalarB-&gt;ScalarD<br />
5.000000 2.000000 3.000000 4.000000 5.000000 6.000000 7.000000 8.000000<br />
9.000000 10.000000 11.000000 12.000000 13.000000 14.000000 15.000000 16.000000<br />
17.000000 18.000000 19.000000 20.000000 21.000000 22.000000 23.000000 24.000000<br />
25.000000 26.000000 27.000000 28.000000 29.000000 30.000000 31.000000 32.000000</p>
<p>2:VectorA[?] op ScalarB-&gt;VectorD[?]<br />
1.000000 2.000000 3.000000 4.000000 5.000000 6.000000 7.000000 8.000000<br />
26.000000 10.000000 28.000000 12.000000 30.000000 14.000000 32.000000 16.000000<br />
17.000000 18.000000 19.000000 20.000000 21.000000 22.000000 23.000000 24.000000<br />
25.000000 26.000000 27.000000 28.000000 29.000000 30.000000 31.000000 32.000000</p>
<p>3:VectorA[?] op VectorB[?]-&gt;VectorD[?]<br />
1.000000 2.000000 3.000000 4.000000 5.000000 6.000000 7.000000 8.000000<br />
42.000000 10.000000 46.000000 12.000000 50.000000 14.000000 54.000000 16.000000<br />
17.000000 18.000000 19.000000 20.000000 21.000000 22.000000 23.000000 24.000000<br />
25.000000 26.000000 27.000000 28.000000 29.000000 30.000000 31.000000 32.000000</p>
<p>1:ScalarA op ScalarB-&gt;ScalarD<br />
A plain operation on two scalar floating-point values.<br />
2:VectorA[?] op ScalarB-&gt;VectorD[?]<br />
A vector-with-scalar operation.<br />
3:VectorA[?] op VectorB[?]-&gt;VectorD[?]<br />
A vector-with-vector operation.</p>
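<p>The three cases can be reproduced without ARM11 hardware using a small software model. This is only a sketch under stated assumptions, not the hardware's exact behaviour: registers are grouped in banks of 8, a destination in bank 0 (s0-s7) makes the operation scalar, a second operand in bank 0 stays scalar, and indices wrap inside their bank. The names sim_fadds, bank_of and next_in_bank are made up.</p>

```c
#include <assert.h>

/* Each bank holds 8 single-precision registers: s0-s7 is the scalar bank,
 * s8-s15, s16-s23 and s24-s31 are the vector banks. */
static int bank_of(int reg) { return reg / 8; }

/* Step to the next vector element, wrapping around inside the bank. */
static int next_in_bank(int reg, int stride)
{
    return bank_of(reg) * 8 + (reg % 8 + stride) % 8;
}

/* Model of "fadds sd, sn, sm" for a given vector length and stride.
 * Simplification: overlap between sources and destination is ignored. */
void sim_fadds(float s[32], int sd, int sn, int sm, int len, int stride)
{
    int i;

    assert(len >= 1 && len <= 8);
    if (bank_of(sd) == 0) {           /* destination in bank 0: scalar op */
        s[sd] = s[sn] + s[sm];
        return;
    }
    for (i = 0; i < len; i++) {
        s[sd] = s[sn] + s[sm];
        sd = next_in_bank(sd, stride);
        sn = next_in_bank(sn, stride);
        if (bank_of(sm) != 0)         /* sm in bank 0 stays a scalar */
            sm = next_in_bank(sm, stride);
    }
}
```

<p>With s[i] = i + 1, a length of 4 and a stride of 2, sim_fadds(s, 8, 24, 0, 4, 2) writes 26, 28, 30, 32 into s8, s10, s12, s14, and sim_fadds(s, 8, 16, 24, 4, 2) writes 42, 46, 50, 54, matching the output shown above.</p>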
<p>Ref.</p>
<p><a href="http://linux.chinaunix.net/bbs/viewthread.php?tid=1125926" target="_blank" rel="noopener">ARM VFP的一点体会</a>: well written, with good examples; just work through it once and you are set.<br />
<a href="http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0274h/index.html" target="_blank" rel="noopener">VFP11&#8482; Vector Floating-point Coprocessor Technical Reference Manual for ARM1136JF-S processor r1p5</a></p>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.richliu.com/2010/03/22/890/arm11-vfp/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
