<!--startcut ==============================================-->
<!-- *** BEGIN HTML header *** -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML><HEAD>
<title>Introduction to UNIX Assembly Programming LG #53</title>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#0000AF"
ALINK="#FF0000">
<!-- *** END HTML header *** -->
<IMG ALT="LINUX GAZETTE" SRC="../gx/lglogo.jpg"
WIDTH="600" HEIGHT="124" border="0"><BR CLEAR="all">
<!-- *** BEGIN navbar *** -->
<A HREF="baptista.html"><IMG ALT="[ Prev ]" SRC="../gx/navbar/prev.jpg" WIDTH="16" HEIGHT="45" BORDER="0" ALIGN="bottom"></A>
<IMG ALT=""
SRC="../gx/navbar/left.jpg" WIDTH="14" HEIGHT="45" BORDER="0" ALIGN="bottom" >
<A HREF="index.html"><IMG ALT="[ Table of Contents ]"
SRC="../gx/navbar/toc.jpg" WIDTH="220" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A>
<A HREF="../index.html"><IMG ALT="[ Front Page ]"
SRC="../gx/navbar/frontpage.jpg" WIDTH="137" HEIGHT="45" BORDER="0" ALIGN="bottom"></A>
<A HREF="../faq/index.html"><IMG ALT="[ FAQ ]"
SRC="./../gx/navbar/faq.jpg"WIDTH="62" HEIGHT="45" BORDER="0" ALIGN="bottom"></A>
<A HREF="http://www.linuxgazette.com/cgi-bin/talkback/all.py?site=LG&article=http://www.linuxgazette.com/issue53/boldyshev.html"><IMG ALT="[ Talkback ]" SRC="../gx/navbar/talkback.jpg" WIDTH="121" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A>
<IMG ALT=""
SRC="../gx/navbar/right.jpg" WIDTH="15" HEIGHT="45" ALIGN="bottom" >
<A HREF="collinge.html"><IMG ALT="[ Next ]" SRC="../gx/navbar/next.jpg" WIDTH="15" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A>
<!-- *** END navbar *** -->
<P>
<!-- A HREF="http://www.linuxgazette.com/cgi-bin/talkback/all.py?site=LG&article=http://www.linuxgazette.com/issue53/boldyshev.html">
<FONT SIZE="+2"><EM>Talkback:</EM> Discuss this article with peers</FONT></A -->
<!--endcut ============================================================-->
<H4>
"Linux Gazette...<I>making Linux just a little more fun!</I>"
</H4>
<P> <HR> <P>
<!--===================================================================-->
<center>
<H1><font color="maroon">Introduction to UNIX Assembly Programming</font></H1>
<H4>By <a href="mailto:konst@linuxassembly.org">Konstantin Boldyshev</a></H4>
</center>
<P> <HR> <P>
<!-- END header -->
<P>
<EM>This document is intended to be a tutorial showing how to write
a simple assembly program for
several UNIX operating systems on the IA32 (i386) platform.
The material may or may not be applicable
to other hardware and/or software platforms.
The document explains program layout, the system call convention,
and the build process.
It accompanies the Linux Assembly HOWTO, which may be of interest to you as well,
though it is more Linux-specific.</EM>
<P>
v0.3, April 09, 2000
<HR>
<H2><A NAME="s1">1. Introduction</A></H2>
<H2><A NAME="ss1.1">1.1 Legal blurb</A>
</H2>
<P>Copyright © 1999-2000 Konstantin Boldyshev.
Permission is granted to copy, distribute and/or modify
this document under the terms of the GNU
<A HREF="http://www.gnu.org/copyleft/fdl.html">Free Documentation License</A>,
Version 1.1 or any later version published by the Free Software Foundation.
<P>
<H2><A NAME="ss1.2">1.2 Obtaining this document</A>
</H2>
<P>The latest version of this document is available from
<A HREF="http://linuxassembly.org/intro.html">http://linuxassembly.org/intro.html</A>.
If your copy is more than a few months old,
please check the URL above for a newer version.
<P>
<H2><A NAME="ss1.3">1.3 Tools you need</A>
</H2>
<P>You will need several tools to play with the programs included in this tutorial.
<P>First of all, you need an assembler.
As a rule, a modern UNIX distribution includes <CODE>gas</CODE> (the GNU Assembler),
but all examples here use another assembler, <CODE>nasm</CODE> (the Netwide Assembler).
You can download it from the
<A HREF="http://www.cryogen.com/Nasm/">nasm page</A>;
it comes with full source code.
Compile it, or try to find a precompiled binary for your OS;
note that several distributions (at least Linux ones)
already include <CODE>nasm</CODE>, so check first.
<P>Second, you need a linker, <CODE>ld</CODE>, since <CODE>nasm</CODE> produces only object code.
Any distribution should include <CODE>ld</CODE>.
<P>If you are going to dig in, you should also install the include files for your OS
and, if possible, the kernel source.
<P>Now you should be ready to start. Welcome!
<P>
<HR>
<H2><A NAME="s2">2. Hello, world!</A></H2>
<P>
<P>Now we will write our program, the classic "Hello, world" (hello.asm).
You can download its sources and binaries
<A HREF="http://linuxassembly.org/intro/hello.tgz">here</A>.
But first, let me explain several basics.
<P>
<H2><A NAME="ss2.1">2.1 System call</A>
</H2>
<P>Unless a program just implements some math algorithm in assembly,
it will deal with such things as getting input, producing output,
and exiting. For these it needs to call an OS service.
In fact, programming in assembly language is much the same on different OSes
until OS services are touched.
<P>There are two common ways of performing a system call on a UNIX OS:
through the C library (libc) wrapper, or directly.
<P>Whether or not to use libc in assembly programming is more a question
of taste/belief than of practicality.
Libc wrappers exist to protect a program from possible changes in the system call convention,
and to provide a POSIX-compatible interface if the kernel lacks one for some call.
However, a UNIX kernel is usually more or less POSIX compliant,
which means that the syntax of most libc "system calls" exactly
matches the syntax of the real kernel system calls (and vice versa).
The main drawback of throwing libc away is that you lose several functions
that are not just syscall wrappers, such as printf(), malloc() and similar.
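<P>For comparison, here is a minimal sketch of the libc route, which the rest of
this tutorial does not use. It assumes a 32-bit ELF system where C symbols carry
no leading underscore and libc follows the C calling convention (arguments pushed
right to left, caller cleans the stack). The file name hello_libc.asm is hypothetical.
Because the C startup code is involved, the entry point is <CODE>main</CODE>
and the object file has to be linked through the C compiler driver
(for example <CODE>gcc -o hello_libc hello_libc.o</CODE>) rather than with plain <CODE>ld</CODE>.
<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
section .text
    extern  write                   ;libc wrapper around the write system call
    global  main                    ;entry point expected by the C startup code

main:
    push    dword len               ;third argument: message length
    push    dword msg               ;second argument: message to write
    push    dword 1                 ;first argument: file descriptor (stdout)
    call    write                   ;libc performs the actual system call
    add     esp,12                  ;caller removes the 3 arguments (3 * 4)
    xor     eax,eax                 ;return 0 from main
    ret

section .data
msg     db      'Hello, libc!',0xa  ;our string
len     equ     $ - msg             ;length of our string
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<P>Nothing in this source names a system call number or an interrupt,
which is exactly the insulation the wrapper is meant to provide.
The price is the dependency on libc and its startup code.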
<P>This tutorial will show how to use <B>direct</B> kernel calls,
since this is the fastest way to call a kernel service;
our code is not linked to any library,
it communicates with the kernel directly.
<P>What differs between UNIX kernels
is the set of system calls and the system call convention
(however, as they strive for POSIX compliance, they have a lot in common).
<P><EM>Note for (former) DOS programmers: so, what is a system call?
Perhaps the best way to explain it is this:
if you ever wrote a DOS assembly program (and most IA32 assembly programmers did),
you remember DOS services such as <CODE>int 0x21, int 0x25, int 0x26</CODE> etc.
These are what can be called system calls.
However, the actual implementation is completely different,
and system calls are not necessarily done via an interrupt.
Also, DOS programmers quite often mix up OS services with BIOS services
like <CODE>int 0x10</CODE> or <CODE>int 0x16</CODE>, and are very surprised when they fail
to perform them in UNIX, since these are not OS services.</EM>
<P>
<H2><A NAME="ss2.2">2.2 Program layout</A>
</H2>
<P>As a rule, modern IA32 UNIXes are 32-bit (*grin*), run in protected mode,
have a flat memory model, and use the ELF format for binaries.
<P>A program can be divided into sections (or segments):
<CODE>.text</CODE> for your code (read-only),
<CODE>.data</CODE> for your data (read-write),
<CODE>.bss</CODE> for uninitialized data (read-write);
there can be a few other sections, including user-defined ones,
but they are rarely needed and are outside our scope here.
A program must have at least a <CODE>.text</CODE> section.
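<P>As a rough illustration of the three sections, here is a small sketch in
<CODE>nasm</CODE> syntax. The names <CODE>greeting</CODE> and <CODE>buffer</CODE>
are made up for this example, and the exit sequence at the end assumes Linux
(the convention is explained in section 2.3); on another OS only those last
lines would change.
<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
section .data                       ;initialized data, read-write
greeting db     'Hi',0xa            ;these 3 bytes are stored in the binary

section .bss                        ;uninitialized data, read-write
buffer  resb    64                  ;64 bytes reserved, no space taken in the binary

section .text                       ;code, read-only
    global  _start                  ;must be declared for linker (ld)
_start:
    mov     al,[greeting]           ;read one byte from .data
    mov     [buffer],al             ;store it in .bss
    xor     ebx,ebx                 ;exit code 0
    mov     eax,1                   ;sys_exit (assumption: Linux)
    int     0x80                    ;call kernel
</PRE>
<HR>
</CODE></BLOCKQUOTE>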
<P>Ok, now we'll dive into OS specific details.
<P>
<H2><A NAME="ss2.3">2.3 Linux</A>
</H2>
<P>System calls in Linux are done through <CODE>int 0x80</CODE>
(actually, there is a kernel patch allowing system calls to be done
via the <EM>syscall (sysenter)</EM> instruction on newer CPUs, but it
is still experimental).
<P>Linux differs from the usual UNIX calling convention
and features a "fastcall" convention
for system calls (it resembles DOS).
The system function number is passed in <CODE>eax</CODE>,
and arguments are passed in registers, not on the stack.
There can be up to five arguments, in <CODE>ebx, ecx, edx, esi, edi</CODE> respectively.
If there are more than five arguments, they are placed in a
structure whose address is passed as the first argument.
The result is returned in <CODE>eax</CODE>; the stack is not touched at all.
<P>System call function numbers are available through sys/syscall.h,
but are actually defined in asm/unistd.h.
Some documentation is in section 2 of the manual
(e.g. to find information on the <CODE>write</CODE> system call, issue <CODE>man 2 write</CODE>).
<P>There are several attempts to write up-to-date documentation of Linux system calls;
see the URLs in the
<A HREF="boldyshev.html#references">references</A>.
<P>So, our Linux program will look like this:
<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
section .text
    global  _start                  ;must be declared for linker (ld)

msg     db      'Hello, world!',0xa ;our dear string
len     equ     $ - msg             ;length of our dear string

_start:                             ;we tell linker where is entry point
    mov     edx,len                 ;message length
    mov     ecx,msg                 ;message to write
    mov     ebx,1                   ;file descriptor (stdout)
    mov     eax,4                   ;system call number (sys_write)
    int     0x80                    ;call kernel
    xor     ebx,ebx                 ;exit code 0 (ebx still held 1 from sys_write)
    mov     eax,1                   ;system call number (sys_exit)
    int     0x80                    ;call kernel
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<P>As you will see further on, the Linux syscall convention is the most compact one.
<P>Kernel source references:
<UL>
<LI>arch/i386/kernel/entry.S</LI>
<LI>include/asm-i386/unistd.h</LI>
<LI>include/linux/sys.h</LI>
</UL>
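<P>Since the result of a Linux system call comes back in <CODE>eax</CODE>,
a program can inspect it before going on. The following is a minimal sketch
that turns the result of <CODE>sys_write</CODE> into the exit code.
It relies on the fact that the Linux kernel reports failure as a negative value
in <CODE>eax</CODE> and success as the number of bytes written;
treating a short write as a failure is a choice made for this example only.
<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
section .text
    global  _start                  ;must be declared for linker (ld)

msg     db      'Hello, world!',0xa ;our string
len     equ     $ - msg             ;length of our string

_start:
    mov     edx,len                 ;message length
    mov     ecx,msg                 ;message to write
    mov     ebx,1                   ;file descriptor (stdout)
    mov     eax,4                   ;system call number (sys_write)
    int     0x80                    ;call kernel, result comes back in eax
    xor     ebx,ebx                 ;assume success: exit code 0
    cmp     eax,len                 ;were all bytes written?
    je      .done                   ;yes, keep exit code 0
    mov     ebx,1                   ;error or short write: exit code 1
.done:
    mov     eax,1                   ;system call number (sys_exit)
    int     0x80                    ;call kernel
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<P>After running it, the shell's <CODE>echo $?</CODE> shows the exit code.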
<P>
<P>
<H2><A NAME="ss2.4">2.4 FreeBSD</A>
</H2>
<P>FreeBSD has the "usual" calling convention,
where the syscall number is in <CODE>eax</CODE> and the parameters are on the stack
(the first argument is pushed last).
A system call is to be performed through a <B>function call</B> to a
function containing <CODE>int 0x80</CODE> and <CODE>ret</CODE>, not just <CODE>int 0x80</CODE> itself
(the return address MUST be on the stack before <CODE>int 0x80</CODE> is issued!).
The caller must clean up the stack after the call.
The result is returned, as usual, in <CODE>eax</CODE>.
<P>There is also an alternate way: using the <CODE>call 7:0</CODE> gate instead of <CODE>int 0x80</CODE>.
The end result is the same, apart from an increase in program size,
since you also need to <CODE>push eax</CODE> beforehand,
and these two instructions occupy more bytes.
<P>System call function numbers are in sys/syscall.h;
documentation is in section 2 of the manual.
<P>Ok, I think the source will explain this better:
<P><EM>Note: the included code may run on other *BSD systems as well, I think.</EM>
<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
section .text
    global  _start                  ;must be declared for linker (ld)

msg     db      "Hello, world!",0xa ;our dear string
len     equ     $ - msg             ;length of our dear string

_syscall:
    int     0x80                    ;system call
    ret

_start:                             ;tell linker entry point
    push    dword len               ;message length
    push    dword msg               ;message to write
    push    dword 1                 ;file descriptor (stdout)
    mov     eax,0x4                 ;system call number (sys_write)
    call    _syscall                ;call kernel
                                    ;actually there's an alternate
                                    ;way to call kernel:
                                    ;push eax
                                    ;call 7:0
    add     esp,12                  ;clean stack (3 arguments * 4)
    push    dword 0                 ;exit code
    mov     eax,0x1                 ;system call number (sys_exit)
    call    _syscall                ;call kernel
                                    ;we do not return from sys_exit,
                                    ;there's no need to clean stack
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<P>Kernel source references:
<UL>
<LI>i386/i386/exception.s</LI>
<LI>i386/i386/trap.c</LI>
<LI>sys/syscall.h</LI>
</UL>
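<P>The push, call and clean-up steps of this convention repeat for every system call,
so they can be folded into a <CODE>nasm</CODE> macro. The sketch below is only
one way to do it: the macro name <CODE>sys_call</CODE> is hypothetical, and it
assumes the same <CODE>_syscall</CODE> helper as the program above.
It takes the system call number followed by the arguments in their natural order
and pushes them last to first.
<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
%macro  sys_call 1-*                ;hypothetical helper: number, then arguments
    %rep %0 - 1                     ;once per argument
        %rotate -1                  ;bring the last remaining argument to %1
        push    dword %1            ;so the first argument is pushed last
    %endrep
    %rotate -1                      ;bring the syscall number back to %1
    mov     eax,%1                  ;system call number
    call    _syscall                ;call kernel
    add     esp,(%0 - 1) * 4        ;caller cleans up the arguments
%endmacro

section .text
    global  _start                  ;must be declared for linker (ld)

msg     db      "Hello, world!",0xa ;our string
len     equ     $ - msg             ;length of our string

_syscall:
    int     0x80                    ;system call
    ret

_start:
    sys_call 4,1,msg,len            ;sys_write(1, msg, len)
    sys_call 1,0                    ;sys_exit(0), does not return
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<P>The first invocation expands to the same three pushes, <CODE>mov eax,4</CODE>,
<CODE>call _syscall</CODE> and <CODE>add esp,12</CODE> as the hand-written version.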
<P>
<H2><A NAME="ss2.5">2.5 BeOS</A>
</H2>
<P>The BeOS kernel uses the "usual" UNIX calling convention too.
The difference from the FreeBSD example is that you call <CODE>int 0x25</CODE>.
<P>For information on where to find system call function numbers and other
interesting details, examine
<A HREF="boldyshev.html#references">asmutils</A>,
especially the os_beos.inc file.
<P><EM>Note: to make <CODE>nasm</CODE> compile correctly on BeOS you need
to insert <CODE>#include "nasm.h"</CODE> into <CODE>float.h</CODE>,
and <CODE>#include &lt;stdio.h&gt;</CODE> into <CODE>nasm.h</CODE>.</EM>
<P>
<BLOCKQUOTE><CODE>
<HR>
<PRE>
section .text
    global  _start                  ;must be declared for linker (ld)

msg     db      "Hello, world!",0xa ;our dear string
len     equ     $ - msg             ;length of our dear string

_syscall:                           ;system call
    int     0x25
    ret

_start:                             ;tell linker entry point
    push    dword len               ;message length
    push    dword msg               ;message to write
    push    dword 1                 ;file descriptor (stdout)
    mov     eax,0x3                 ;system call number (sys_write)
    call    _syscall                ;call kernel
    add     esp,12                  ;clean stack (3 * 4)
    push    dword 0                 ;exit code
    mov     eax,0x3f                ;system call number (sys_exit)
    call    _syscall                ;call kernel
                                    ;no need to clean stack
</PRE>
<HR>
</CODE></BLOCKQUOTE>
<P>
<H2><A NAME="ss2.6">2.6 Building binary</A>
</H2>
<P>
<P>Building the binary is the usual two-step process of compiling and linking.
To make a binary from our hello.asm we must do the following:
<P>
<HR>
<PRE>
$ nasm -f elf hello.asm # this will produce hello.o object file
$ ld -s -o hello hello.o # this will produce hello executable
</PRE>
<HR>
<P>That's it. Simple.
Now you can launch the hello program by entering <CODE>./hello</CODE>; it should work.
Look at the binary size. Surprised?
<P>
<HR>
<H2><A NAME="references"></A> <A NAME="s3">3. References</A></H2>
<P>I hope you enjoyed the journey. If you have become interested in assembly
programming for UNIX, I strongly encourage you to visit
<A HREF="http://linuxassembly.org">Linux Assembly</A>
for more information, and to download the
<A HREF="http://linuxassembly.org/asmutils.html">asmutils</A> package,
which contains a lot of sample code.
For a comprehensive overview of Linux/UNIX assembly programming, refer to the
<A HREF="http://linuxassembly.org/howto.html">Linux Assembly HOWTO</A>.
<P>Thank you for your interest!
<!-- *** BEGIN copyright *** -->
<P> <hr> <!-- P -->
<H5 ALIGN=center>
Copyright © 2000, Konstantin Boldyshev<BR>
Published in Issue 53 of <i>Linux Gazette</i>, May 2000</H5>
<!-- *** END copyright *** -->
<!--startcut ==========================================================-->
<!-- P --> <HR> <!-- P -->
</BODY></HTML>
<!--endcut ============================================================-->