Can I do `ret` instruction from code at _start in MacOS? Linux?
MacOS Dynamic Executables
When you are using MacOS and link with:
ld foo.o -lc -macosx_version_min 10.12.0 -e _start -o foo
you are getting a dynamically loaded version of your code. _start isn't the true entry point; the dynamic loader is. As one of its last steps, the dynamic loader does C/C++/Objective-C runtime initialization and then calls the entry point you specified with the -e option. The Apple documentation about Forking and Executing the Process has these paragraphs:
A Mach-O executable file contains a header consisting of a set of load commands. For programs that use shared libraries or frameworks, one of these commands specifies the location of the linker to be used to load the program. If you use Xcode, this is always /usr/lib/dyld, the standard OS X dynamic linker.
When you call the execve routine, the kernel first loads the specified program file and examines the mach_header structure at the start of the file. The kernel verifies that the file appear to be a valid Mach-O file and interprets the load commands stored in the header. The kernel then loads the dynamic linker specified by the load commands into memory and executes the dynamic linker on the program file.
The dynamic linker loads all the shared libraries that the main program links against (the dependent libraries) and binds enough of the symbols to start the program. It then calls the entry point function. At build time, the static linker adds the standard entry point function to the main executable file from the object file /usr/lib/crt1.o. This function sets up the runtime environment state for the kernel and calls static initializers for C++ objects, initializes the Objective-C runtime, and then calls the program’s main function
In your case that is _start. In this environment, where you are creating a dynamically linked executable, you can do a ret and have it return to the code that called _start, which does an exit system call for you. This is why it doesn't crash. If you review the generated executable with gobjdump -Dx foo you should get:
start address 0x0000000000000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000001 0000000000001fff 0000000000001fff 00000fff 2**0
CONTENTS, ALLOC, LOAD, CODE
SYMBOL TABLE:
0000000000001000 g 03 ABS 01 0010 __mh_execute_header
0000000000001fff g 0f SECT 01 0000 [.text] _start
0000000000000000 g 01 UND 00 0100 dyld_stub_binder
Disassembly of section .text:
0000000000001fff <_start>:
1fff: c3 retq
Notice that start address is 0, and the code at 0 is dyld_stub_binder. This is the dynamic loader stub that eventually sets up a C runtime environment and then calls your entry point _start. If you don't override the entry point, it defaults to main.
MacOS Static Executables
If, however, you build a static executable, no code is executed before your entry point, and ret should crash since there is no valid return address on the stack. In the documentation quoted above is this:
For programs that use shared libraries or frameworks, one of these commands specifies the location of the linker to be used to load the program.
A statically built executable doesn't use the dynamic loader dyld, which has crt1.o embedded in it. (CRT = C runtime library, which on MacOS covers C++/Objective-C as well.) Dynamic loading is not performed, C/C++/Objective-C initialization code is not executed, and control is transferred directly to your entry point.
To build statically, drop -lc (or -lSystem) from the linker command and add the -static option:
ld foo.o -macosx_version_min 10.12.0 -e _start -o foo -static
If you run this version, it should produce a segmentation fault. gobjdump -Dx foo produces:
start address 0x0000000000001fff
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000001 0000000000001fff 0000000000001fff 00000fff 2**0
CONTENTS, ALLOC, LOAD, CODE
1 LC_THREAD.x86_THREAD_STATE64.0 000000a8 0000000000000000 0000000000000000 00000198 2**0
CONTENTS
SYMBOL TABLE:
0000000000001000 g 03 ABS 01 0010 __mh_execute_header
0000000000001fff g 0f SECT 01 0000 [.text] _start
Disassembly of section .text:
0000000000001fff <_start>:
1fff: c3 retq
Notice that the start address is now 0x1fff, which is the entry point you specified (_start). There is no dynamic loader stub as an intermediary.
Linux
Under Linux, when you specify your own entry point, a ret from it will cause a segmentation fault whether you are building a static or a dynamically linked executable. There is good information on how ELF executables are run on Linux in this article and in the dynamic linker documentation. The key point is that, unlike the MacOS dynamic linker documentation, the Linux one makes no mention of doing C/C++/Objective-C runtime initialization.
The key difference between the Linux dynamic loader (ld.so) and the MacOS one (dyld) is that the MacOS dynamic loader performs C/C++/Objective-C startup initialization by including the entry point from crt1.o. The code in crt1.o then transfers control to the entry point you specified with -e (default is main). On Linux, the dynamic loader makes no assumption about the type of code that will be run. After the shared objects are processed and initialized, control is transferred directly to the entry point.
Stack Layout at Process Creation
FreeBSD (on which MacOS is based) and Linux share one thing in common: when loading 64-bit executables, the layout of the user stack at process creation is the same. The stack for 32-bit processes is similar, but pointers and data are 4 bytes wide, not 8.
Although there isn't a return address on the stack, there is other data representing the number of arguments, the arguments, environment variables, and other information. This layout is not the same as what the main function in C/C++ expects. It is part of the C startup code to convert the stack at process creation to something compatible with the C calling convention and the expectations of the function main (argc, argv, envp).
I wrote more information on this subject in this Stack Overflow answer, which shows how a statically linked MacOS executable can traverse the program arguments passed by the kernel at process creation.
Return values in main vs _start
TL;DR: function return values and system-call arguments use separate registers because they're completely unrelated.
When you compile with gcc, it links CRT startup code that defines a _start. That _start (indirectly) calls main, and passes main's return value (which main leaves in EAX) to the exit() library function, which eventually makes an exit system call after doing any necessary libc cleanup like flushing stdio buffers.
See also Return vs Exit from main function in C; this is exactly analogous to what you're doing, except you're using _exit(), which bypasses libc cleanup, instead of exit(). Also related: Syscall implementation of exit().
An int $0x80 system call takes its argument in EBX, as per the 32-bit system-call ABI (which you shouldn't be using in 64-bit code). It's not a return value from a function; it's the process exit status. See Hello, world in assembly language with Linux system calls? for more about system calls.
Note that _start is not a function; it can't return in that sense because there's no return address on the stack. You're taking a casual description like "return to the OS" and conflating that with a function's "return value". You can call exit from main if you want, but you can't ret from _start.
EAX is the return-value register for int-sized values in the function-calling convention. (The high 32 bits of RAX are ignored because main returns int. But also, the $? exit status can only get the low 8 bits of the value passed to exit().)
Related:
- Why am I allowed to exit main using ret?
- What happens with the return value of main()?
- where goes the ret instruction of the main
- What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? explains why you should use syscall, and shows some of what happens inside the kernel after a system call.
Why am I allowed to exit main using ret?
C main is called (indirectly) from CRT startup code, not directly from the kernel.
After main returns, that code calls atexit functions to do stuff like flushing stdio buffers, then passes main's return value to a raw _exit system call (or exit_group, which exits all threads).
You make several wrong assumptions, all I think based on a misunderstanding of how kernels work.
- The kernel runs at a different privilege level from user-space (ring 0 vs. ring 3 on x86). Even if user-space knew the right address to jump to, it can't jump into kernel code. (And even if it could, it wouldn't be running with kernel privilege level.)
  - ret isn't magic: it's basically just pop %rip, and it doesn't let you jump anywhere you couldn't jump to with other instructions. It also doesn't change privilege level (see footnote 1).
  - Kernel addresses aren't mapped / accessible when user-space code is running; those page-table entries are marked as supervisor-only. (Or they're not mapped at all in kernels that mitigate the Meltdown vulnerability, so entering the kernel goes through a "wrapper" block of code that changes CR3.)
- Virtual memory is how the kernel protects itself from user-space. User-space can't modify page tables directly, only by asking the kernel to do it via mmap and mprotect system calls. (And user-space can't execute privileged instructions like mov cr3, rax to install new page tables. That's the purpose of having ring 0 (kernel mode) vs. ring 3 (user mode).)
- The kernel stack is separate from the user-space stack for a process. (In the kernel, there's also a small kernel stack for each task (aka thread) that's used during system calls / interrupts while that user-space thread is running. At least that's how Linux does it; IDK about others.)
- The kernel doesn't literally call user-space code; the user-space stack doesn't hold any return address back into the kernel. A kernel->user transition involves swapping stack pointers as well as changing privilege levels, e.g. with an instruction like iret (interrupt-return).
  - Plus, leaving a kernel code address anywhere user-space can see it would defeat kernel ASLR.
Footnote 1: (The compiler-generated ret will always be a normal near ret, not a retf that could return through a call gate or something to a privileged cs value. x86 handles privilege levels via the low 2 bits of CS, but never mind that. MacOS / Linux don't set up call gates that user-space can use to call into the kernel; that's done with syscall or int 0x80 instructions.)
In a fresh process (after an execve system call replaced the previous process with this PID with a new one), execution begins at the process entry point (usually labeled _start), not at the C main function directly.
C implementations come with CRT (C RunTime) startup code that has (among other things) a hand-written asm implementation of _start, which (indirectly) calls main, passing args to main according to the calling convention.
_start itself is not a function. On process entry, RSP points at argc, and above that on the user-space stack is argv[0], argv[1], etc. (i.e. the char *argv[] array is right there by value, and above that the envp array). _start loads argc into a register and puts pointers to argv and envp into registers. (The x86-64 System V ABI that MacOS and Linux both use documents all this, including the process-startup environment and the calling convention.)
If you try to ret from _start, you're just going to pop argc into RIP, and then code-fetch from absolute address 1 or 2 (or another small number) will segfault. For example, Nasm segmentation fault on RET in _start shows an attempt to ret from the process entry point (linked without CRT startup code). It has a hand-written _start that just falls through into main.
When you run gcc main.c, the gcc front-end runs multiple other programs (use gcc -v to show details). This is how the CRT startup code gets linked into your process:
- gcc preprocesses (CPP) and compiles+assembles main.c to main.o (or a temporary file). On MacOS, the gcc command is actually clang, which has a built-in assembler, but real gcc really does compile to asm and then run as on that. (The C preprocessor is built into the compiler, though.)
- gcc runs something like
ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 -pie /usr/lib/Scrt1.o /usr/lib/gcc/x86_64-pc-linux-gnu/9.1.0/crtbeginS.o main.o -lc -lgcc /usr/lib/gcc/x86_64-pc-linux-gnu/9.1.0/crtendS.o
That's actually simplified a lot, with some of the CRT files left out, and paths canonicalized to remove ../../lib parts. Also, it doesn't run ld directly; it runs collect2, which is a wrapper for ld. But anyway, that statically links in those .o CRT files that contain _start and some other stuff, dynamically links libc (-lc), and links libgcc (-lgcc) for GCC helper functions like implementing __int128 multiply and divide with 64-bit registers, in case your program uses those.
.intel_syntax
.text:
.global _rbp
_rbp:
mov rax, rbp
ret;
This is not allowed, ...
The only reason that doesn't assemble is that you tried to declare .text: as a label instead of using the .text directive. If you remove the trailing :, it does assemble with clang (which treats .intel_syntax the same as .intel_syntax noprefix).
For GCC / GAS to assemble it, you'd also need noprefix to tell it that register names aren't prefixed by %. (Yes, you can have Intel op dst, src order but still with %rsp register names. No, you shouldn't do this!) And of course GNU/Linux doesn't use leading underscores.
Not that it would always do what you want if you called it, though! If you compiled main without optimization (so -fno-omit-frame-pointer was in effect), then yes, you'd get a pointer to the stack slot below the return address.
And you definitely use the value incorrectly. (*p)-4; loads the saved RBP value (*p) and then offsets by four 8-byte void-pointers. (Because that's how C pointer math works; *p has type void* because p has type void **.)
I think you're trying to get your own return address and re-run the call instruction (in main's caller) that reached main, eventually leading to a stack overflow from pushing more return addresses. In GNU C, use void * __builtin_return_address (0) to get your own return address.
x86 call rel32 instructions are 5 bytes, but the call that called main was probably an indirect call, using a pointer in a register. So it might be a 2-byte call *%rax or a 3-byte call *%r12; you don't know unless you disassemble your caller. (I'd suggest single-stepping by instructions (GDB / LLDB stepi) off the end of main using a debugger in disassembly mode.) If it has any symbol info for main's caller, you'll be able to scroll backward and see what the previous instruction was.
If not, you might have to try and see what looks sane; x86 machine code can't be unambiguously decoded backwards because it's variable-length. You can't tell the difference between a byte within an instruction (like an immediate or ModRM) vs. the start of an instruction. It all depends on where you start disassembling from. If you try a few byte offsets, usually only one will produce anything that looks sane.
asm("movq %rax, 0"); //Exit code is 11, so now it should be 0
This is a store of RAX to absolute address 0, in AT&T syntax. This of course segfaults. Exit code 11 is from SIGSEGV, which is signal 11. (Use kill -l to see signal numbers.)
Perhaps you wanted mov $0, %eax. Although that's still pointless here: you're about to call through your function pointer, and in a debug build the compiler might load it into RAX and step on your value.
Also, writing a register in an asm statement is never safe when you don't tell the compiler which registers you're modifying (using constraints).
printf("Main: %p\n", main);
printf("&Main: %p\n", &main); //WTF
main and &main are the same thing because main is a function. That's just how C syntax works for function names: main isn't an object that can have its address taken. See also & operator optional in function pointer assignment.
It's similar for arrays: the bare name of an array can be assigned to a pointer or passed to functions as a pointer arg. But &array is also the same address as &array[0] (with a different type: pointer to the whole array rather than pointer to an element). This is true only for arrays like int array[10], not for pointers like int *ptr; in the latter case the pointer object itself has storage space and can have its own address taken.
Nasm segmentation fault on RET in _start
Because ret is NOT the proper way to exit a program in Linux, Windows, or Mac!
_start is not a function; there is no return address on the stack because there is no user-space caller to return to. Execution in user-space started here (in a static executable), at the process entry point. (Or with dynamic linking, it jumped here after the dynamic linker finished, but same result.)
On Linux / OS X, the stack pointer is pointing at argc on entry to _start (see the i386 or x86-64 System V ABI doc for more details on the process startup environment); the kernel puts command line args into user-space stack memory before starting user-space. (So if you do try to ret, EIP/RIP = argc = a small integer, not a valid address. If your debugger shows a fault at address 0x00000001 or something, that's why.)
For Windows it is ExitProcess. For Linux it is the exit system call: int 80h with sys_exit for 32-bit x86, or syscall with call number 60 for 64-bit. Or you can call exit from the C library if you are linking against it.
32-bit Linux (i386)
%define SYS_exit 1 ; call number __NR_exit from <asm/unistd_32.h>
mov eax, SYS_exit ; use the NASM macro we defined earlier
xor ebx, ebx ; ebx = 0 exit status
int 80H ; _exit(0)
64-bit Linux (amd64)
mov eax, 60 ; SYS_exit aka __NR_exit from asm/unistd_64.h
xor edi, edi ; edi = 0 (zero-extends into RDI): first arg to 64-bit system calls
syscall ; _exit(0)
(In GAS you can actually #include <sys/syscall.h> or <asm/unistd.h> to get the right numbers for the mode you're assembling a .S for, but NASM can't easily use the C preprocessor. See Polygot include file for nasm/yasm and C for hints.)
32-bit Windows (x86)
push 0
call ExitProcess
Or Windows/Linux linking against the C Library
; pass an int exit_status as appropriate for the calling convention
; push 0 / xor edi,edi / xor ecx,ecx
call exit
(Or for 32-bit x86 Windows, call _exit, because C names get prepended with an underscore, unlike in x86-64 Windows. The POSIX _exit function would be call __exit, if Windows had one.)
Windows x64's calling convention includes shadow space, which the caller has to reserve, but exit isn't going to return, so it's OK to let it step on that space above its return address. Also, the calling convention requires 16-byte stack alignment before call exit (except on 32-bit Windows), but misalignment often won't actually crash a simple function like exit().
call exit (unlike a raw exit system call or libc _exit) will flush stdio buffers first. If you used printf from _start, use exit to make sure all output is printed before you exit, even if stdout is redirected to a file (making stdout full-buffered, not line-buffered).
It's generally recommended that if you use libc functions, you write a main function and link with gcc so it's called by the normal CRT start functions, which you can ret to.
See also
- Syscall implementation of exit()
- How come _exit(0) (exiting by syscall) prevents me from receiving any stdout content?
Defining main as something that _start falls through into doesn't make it special; it's just confusing to use a main label if it's not like a C main function called by a _start that's prepared to exit after main returns.
Mac assembly: segfault with libc exit
@fuz is almost certainly correct: you crash because you didn't initialize libc. There's probably a NULL pointer somewhere in the data structures that exit(3) checks before actually exiting, e.g. it flushes stdout if needed, and it runs any functions registered with atexit(3).
If you don't want it to do all that work, then either make the sys_exit system call directly with a syscall instruction, or call the thin _exit(2) libc wrapper function for it. (The basics of the situation will be the same as on Linux, because exit(3) vs. _exit(2) are standardized by POSIX: see Syscall implementation of exit().)
I think the tutorial you're following mostly looks good, but perhaps some older version of OS X allowed libc functions (including printf?!?) to be used without calling any libc init functions. Or else they didn't test their code after an edit to the build commands. (Assuming they tested at all, maybe it was with dynamic linking, which would work.)
OS X prefixes symbol names in assembly with an _, so use call __exit (two underscores) to call _exit(). (e.g. call _printf calls the C printf function.)
_exit(2) probably won't crash if you call it without initializing libc, but it's still a bad idea to call any libc functions without having called libc init functions first. Better to make the system call directly (see later in the tutorial), or even better, build it with gcc hello_asm.S -o hello_asm to make sure libc is initialized. Then you can follow the rest of the tutorial, including the printf.
Don't call your Mach-O entry point _main or main in a static executable. CRT startup code hasn't run yet. The usual convention is to call the process entry point _start.
(Note that OS X puts the CRT start code in the dynamic linker, so the "entry point" in a dynamically-linked executable is the C main function, unlike in Linux where dynamic executables can avoid the CRT startup code. libc would be initialized for you if you linked with gcc exit2.o -o exit instead of ld, which you're using to do the equivalent of gcc -static -nostartfiles.)
How can I modify the stack with nasm, x86_64, linux functions (using `ret` keyword)?
tl;dr
Remember that call is effectively a push of the return address (the address of the next instruction, i.e. push rip) followed by a jump, and ret is effectively pop rip, so you pretty much messed up your stack in your example because you inadvertently pop it in the wrong spot.
More of an answer
Although you should probably properly learn how calling conventions work, I'm going to attempt an answer to briefly "soften" the idea, and for the fun of learning.
Abstractly speaking, in order to have functions, you must have something called stack frames, or else you'd have a pretty hard time managing local variables and getting ret to work. On x86_64, a stack frame is pretty much composed of a few things, in order:
- The function arguments, if there are any (footnote 0).
  - If some arguments were passed in registers, this may be omitted.
- The return address.
  - The call instruction will push this onto the stack.
  - It's on you to make sure the ret instruction will pop this off the stack.
- Optionally, a frame pointer.
  - If your stack grows by a dynamic amount, this can keep track of the start of the frame.
  - Otherwise, if you know the stack size ahead of time, it's optional.
- And then your local state on the stack.
As long as execution stays within your little assembly space, you are technically free to pass arguments however you want (footnote 1), as long as you are aware of how instructions like call and ret manipulate the stack. The simplest way, in my opinion, is to make it sort of stack-based, so that your compiler would not need to worry about register allocation as much (footnote 2).
To keep things simple, I'd suggest using something like the 32-bit x86 (cdecl) convention but applied to x86_64, as you seem to be using 64-bit code. That is to say, the caller function would push all of its arguments onto the stack (usually in reverse order), and then call the callee function. For example, for a 3-argument function, your stack would end up looking something like this (beware that the top of the stack is actually at the bottom of the diagram):
+----------------+
| argument 2 |
+----------------+
| argument 1 |
+----------------+
| argument 0 |
+----------------+
| return address |
+----------------+
| local state |
| ... |
+----------------+
Also, I noticed that you never really made use of the rsp register. Depending on the design of your compiler, you technically could get away with this. Stack machines like the JVM rely solely on pushes and pops anyway, I believe. As long as your pushes and pops match (especially call and ret, which act as a special push and pop), you should be fine.
0 Windows actually allocates at least an extra 32 bytes here for argument spilling, but you can probably ignore that in this case.
1 There are specific calling conventions that dictate how parameters are passed from caller to callee and back. Beyond your programming exercise, I highly recommend reading about how they work, so that your compiler can output code that can easily be called by and easily call functions that weren't emitted by your compiler, or go the Forth way as Nate mentioned.
2 goto 1
How do I return to mainline code from a signal handler in assembler?
A simple ret will return so as to reattempt the faulting instruction. When using sigaction to register the signal handler with the flag SA_SIGINFO, the third argument is a pointer to a ucontext_t that contains the saved state, which may be altered.
What is the first variables of my stack program?
You're targeting modern MacOS, hence ld will emit the dyld-assisted LC_MAIN load command for entry point handling.
The value at [rsp] is the return address into the epilogue of libdyld's _start function:
mov edi, eax ; pass your process return code as 1st argument under System V 64bit ABI
call exit ;from libSystem
hlt
This means you don't need to exit your process through a system call like you do in:
; return (0)
mov rax, 0x2000001
mov rdi, 0x0
syscall
Instead:
xor eax,eax
ret
is enough (and that's what compilers will emit btw).
Your buffer will also get flushed in the ret / libdyld approach. That's irrelevant for the write system call you are doing, but could matter for a printf, for instance.
Here's a great article that describes lots of details.
Capture input in assembly arm 64 bit mac os
First you need to move msg to a writeable segment:
.data
msg: .ds 4 //memory buffer for keyboard input
.text // keep everything else in __TEXT