
Commit 29ce25c

Start writing formal specification for APE
1 parent 7996bf6 commit 29ce25c


2 files changed: +272 -1 lines changed


ape/specification.md

Lines changed: 271 additions & 0 deletions
@@ -0,0 +1,271 @@
# Actually Portable Executable Specification v0.1

Actually Portable Executable (APE) is an executable file format that
polyglots the Windows Portable Executable (PE) format with a UNIX Sixth
Edition style shell script that doesn't have a shebang. This makes it
possible to produce a single-file binary that executes on the stock
installations of many OSes and architectures.

## Supported OSes and Architectures

- AMD64
  - Linux
  - MacOS
  - Windows
  - FreeBSD
  - OpenBSD
  - NetBSD
  - BIOS

- ARM64
  - Linux
  - MacOS
  - FreeBSD
  - Windows (non-native)

## File Header

APE defines three separate file magics, all of which are 8 characters
long. Any file that starts with one of these magic values can be
considered an APE program.
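
As an illustration, a tool could classify a file by comparing its first 8 bytes against the three magics listed below. This is a minimal sketch; the `ape_magic` enum and `ape_classify()` function are hypothetical names, not part of Cosmopolitan Libc.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical enum naming the three magics defined by this spec. */
enum ape_magic {
  APE_MAGIC_NONE,  /* not an APE program */
  APE_MAGIC_MZ,    /* MZqFpD=' : canonical, Windows + UNIX */
  APE_MAGIC_UNIX,  /* jartsr=' : UNIX-only */
  APE_MAGIC_DEBUG, /* APEDBG=' : forces execution through /bin/sh */
};

/* Classifies a file by its first 8 bytes. `len` is how many bytes of
   the file are available in `buf`. */
enum ape_magic ape_classify(const unsigned char *buf, size_t len) {
  if (len < 8) return APE_MAGIC_NONE;
  if (!memcmp(buf, "MZqFpD='", 8)) return APE_MAGIC_MZ;
  if (!memcmp(buf, "jartsr='", 8)) return APE_MAGIC_UNIX;
  if (!memcmp(buf, "APEDBG='", 8)) return APE_MAGIC_DEBUG;
  return APE_MAGIC_NONE;
}
```

Because all three magics are exactly 8 bytes long, a single fixed-length `memcmp()` per magic is enough; no parsing is needed at this stage.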

### (1) APE MZ Magic

- ASCII: `MZqFpD='`
- Hex: 4d 5a 71 46 70 44 3d 27

This is the canonical magic used by almost all APE programs. It enables
maximum portability between OSes. When interpreted as a shell script, it
assigns a single-quoted string to an unused variable. The shell will
then ignore subsequent binary content that's placed inside the string.

It is strongly recommended that this magic value be immediately followed
by a newline (`\n` or hex 0a) character. Some shells, e.g. FreeBSD sh and
Zsh, impose a binary safety check before handing off files that don't
have a shebang to `/bin/sh`. That check applies to the first line, which
can't contain NUL characters.
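
To see why the trailing newline matters, the rule as stated here can be modeled as "the bytes before the first `\n` contain no NUL". The sketch below is only an approximation under that assumption; real shells differ in how much of the file they inspect, and `ape_first_line_is_safe()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Approximates the check described above: returns true if the first
   line of `buf` (everything before the first '\n', or the whole buffer
   if there is no newline) contains no NUL bytes. */
bool ape_first_line_is_safe(const unsigned char *buf, size_t len) {
  for (size_t i = 0; i < len; i++) {
    if (buf[i] == '\n') return true;  /* reached end of first line */
    if (buf[i] == '\0') return false; /* NUL on the first line */
  }
  return true;
}
```

With `MZqFpD='` followed by `\n`, the first line is just the 8 printable magic bytes, so any NUL bytes in the binary content that follows never reach the check.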

The letters were carefully chosen so as to be valid x86 instructions in
all operating modes. This makes it possible to store a BIOS bootloader
disk image inside an APE binary. For example, simple CLI programs built
with Cosmopolitan Libc will boot from BIOS into long mode if they're
treated as a floppy disk image.

The letters also allow for the possibility of being treated on x86-64 as
a flat executable, where the PE / ELF / Mach-O executable structures are
ignored, and execution simply begins at the beginning of the file,
similar to how MS-DOS .COM binaries work.

The magic's conditional jumps both target offset 0x4a, relative to the
start of the file, which causes execution to jump into the MS-DOS stub
defined by Portable Executable. APE binaries built by Cosmo Libc use
tricks in the MS-DOS stub to check the operating mode and then jump to
the appropriate entrypoint, e.g. `_start()`.
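
The target can be recomputed from the magic bytes themselves. A short conditional jump is a one-byte opcode followed by a signed 8-bit displacement that is added to the address of the next instruction. Each magic pairs two jumps with complementary conditions (`jno`/`jo`, `jb`/`jae`), so exactly one is taken and both must share a target. The sketch below assumes the file is loaded at address 0; `short_jump_target()` is a hypothetical helper.

```c
/* Returns the target of the two-byte short jump (opcode, rel8) that
   starts at `offset` in `magic`. The displacement is a signed byte
   added to the address of the next instruction, i.e. offset + 2. */
static int short_jump_target(const unsigned char *magic, int offset) {
  signed char rel8 = (signed char)magic[offset + 1];
  return offset + 2 + rel8;
}
```

For `MZqFpD='`, the `jno` at offset 2 has displacement `F` (0x46), giving 4 + 0x46 = 0x4a, and the `jo` at offset 4 has displacement `D` (0x44), giving 6 + 0x44 = 0x4a. The same arithmetic yields 0x78 for the UNIX-only `jartsr='` magic.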

#### Decoded as i8086

```asm
dec %bp
pop %dx
jno 0x4a
jo 0x4a
```

#### Decoded as i386

```asm
dec %ebp
pop %edx
jno 0x4a
jo 0x4a
```

#### Decoded as x86-64

```asm
rex.WRB
pop %r10
jno 0x4a
jo 0x4a
```

### (2) APE UNIX-Only Magic

- ASCII: `jartsr='`
- Hex: 6a 61 72 74 73 72 3d 27

Because APE is a novel executable file format, first published in 2020,
it is less well understood by industry tools than the PE, ELF, and
Mach-O executable file formats, which have been around for decades. For
this reason, APE programs that use the MZ magic above can attract
attention from Windows AV software, which may be unwanted by developers
who aren't interested in targeting the Windows platform. Therefore the
`jartsr='` magic is defined, which enables the creation of APE binaries
that can safely target all non-Windows platforms. Even though this magic
is less common, APE interpreters and binfmt-misc installations MUST
support it.

It is strongly recommended that this magic value be immediately followed
by a newline (`\n` or hex 0a) character. Some shells, e.g. FreeBSD sh and
Zsh, impose a binary safety check before handing off files that don't
have a shebang to `/bin/sh`. That check applies to the first line, which
can't contain NUL characters.

The letters were carefully chosen so as to be valid x86 instructions in
all operating modes. This makes it possible to store a BIOS bootloader
disk image inside an APE binary. For example, simple CLI programs built
with Cosmopolitan Libc will boot from BIOS into long mode if they're
treated as a floppy disk image.

The letters also allow for the possibility of being treated on x86-64 as
a flat executable, where the PE / ELF / Mach-O executable structures are
ignored, and execution simply begins at the beginning of the file,
similar to how MS-DOS .COM binaries work.

The magic's conditional jumps both target offset 0x78, relative to the
start of the file, which causes execution to jump into the MS-DOS stub
defined by Portable Executable. APE binaries built by Cosmo Libc use
tricks in the MS-DOS stub to check the operating mode and then jump to
the appropriate entrypoint, e.g. `_start()`.

#### Decoded as i8086 / i386 / x86-64

```asm
push $0x61
jb 0x78
jae 0x78
```

### (3) APE Debug Magic

- ASCII: `APEDBG='`
- Hex: 41 50 45 44 42 47 3d 27

While APE files must be valid shell scripts, in practice, UNIX systems
will often be configured to provide a faster, safer alternative to
loading an APE binary through `/bin/sh`. The Linux kernel can be patched
to have execve() recognize the APE format and directly load its embedded
ELF header. Linux systems can also use binfmt-misc to recognize APE's MZ
and jartsr magic, and pass them to a userspace program named `ape` that
acts as an interpreter. In such environments, the need sometimes arises
to test that the `/bin/sh` code path is working correctly, in which
case the `APEDBG='` magic is RECOMMENDED.

APE interpreters, execve() implementations, and binfmt-misc installs
MUST ignore this magic. If necessary, steps can be taken to ensure that
files with this magic are passed to `/bin/sh` for execution, like a
normal shebang-less shell script.
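
In a direct loader, this rule reduces to an allow-list of two magics. The following is a minimal sketch; `ape_loader_should_claim()` is a hypothetical name, and how an unclaimed file actually reaches `/bin/sh` depends on the kernel or binfmt-misc setup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if a direct loader (patched execve(), an `ape`
   interpreter, or a binfmt-misc handler) should claim this file.
   Only the MZ and jartsr magics qualify; APEDBG=' is deliberately
   not claimed so the file can fall through to /bin/sh. */
bool ape_loader_should_claim(const unsigned char *buf, size_t len) {
  if (len < 8) return false;
  return !memcmp(buf, "MZqFpD='", 8) || !memcmp(buf, "jartsr='", 8);
}
```

Using an allow-list rather than rejecting `APEDBG='` specifically means any future or unknown magic is also left to the shell by default.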

## Embedded ELF Header

APE binaries MAY embed an ELF header inside them. Unlike conventional
executable file formats, this header is not stored at a fixed offset.
It's instead encoded as octal escape codes in a shell script `printf`
statement. For example:

```sh
printf '\177ELF\2\1\1\011\0\0\0\0\0\0\0\0\2\0\076\0\1\0\0\0\166\105\100\000\000\000\000\000\060\013\000\000\000\000\000\000\000\000\000\000\000\000\000\000\165\312\1\1\100\0\070\0\005\000\0\0\000\000\000\000'
```

This `printf` statement MUST appear in the first 8192 bytes of the APE
executable, so as to limit how much of the initial portion of a file an
interpreter must load.
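
A loader therefore only needs to read one 8192-byte page and search it. The sketch below locates the start of a `printf '` argument within that window; `ape_find_printf()` is a hypothetical helper, and it only finds where a statement begins, so a caller must still confirm the closing quote also lies inside the window.

```c
#include <stddef.h>
#include <string.h>

#define APE_SCAN_LIMIT 8192 /* statements must lie in the first 8192 bytes */

/* Searches buf[from..] (clamped to the first 8192 bytes of the file)
   for the text "printf '" and returns the index of the first byte
   inside the single quotes, or -1 if no such text is found. */
long ape_find_printf(const unsigned char *buf, size_t len, size_t from) {
  static const char needle[] = "printf '";
  size_t n = sizeof(needle) - 1;
  if (len > APE_SCAN_LIMIT) len = APE_SCAN_LIMIT;
  for (size_t i = from; i + n <= len; i++) {
    if (!memcmp(buf + i, needle, n)) return (long)(i + n);
  }
  return -1;
}
```

Passing the returned index back in as `from` lets the caller continue searching for further statements.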

Multiple such `printf` statements MAY appear in the first 8192 bytes, in
order to specify multiple architectures. For example, fat binaries built
by the `apelink` program (provided by Cosmo Libc) will have two encoded
ELF headers, for amd64 and arm64, each of which points to the proper
file offsets for its respective native code. Therefore, kernels and
interpreters which load the APE format directly MUST check the
`e_machine` field of the `Elf64_Ehdr` that's decoded from the octal
codes, before accepting a `printf` shell statement as valid.
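
`e_machine` is the 16-bit field at byte offset 18 of `Elf64_Ehdr` (after the 16-byte `e_ident` and the 2-byte `e_type`). The sketch below picks the decoded header that matches the host architecture. It assumes little-endian headers, which is what the example `printf` encodes (`\076\0`, i.e. 62 = `EM_X86_64`). `ape_pick_header()` and the `APE_EM_*` macros are hypothetical names, with values taken from the ELF specification.

```c
#include <stddef.h>
#include <string.h>

/* e_machine values from the ELF specification. */
#define APE_EM_X86_64 62   /* EM_X86_64 */
#define APE_EM_AARCH64 183 /* EM_AARCH64 */

/* Reads e_machine, the 16-bit field at byte offset 18 of Elf64_Ehdr.
   Assumes a little-endian header (e_ident[EI_DATA] == ELFDATA2LSB). */
static unsigned ape_ehdr_machine(const unsigned char *ehdr) {
  return (unsigned)ehdr[18] | (unsigned)ehdr[19] << 8;
}

/* `hdrs` holds `count` decoded 64-byte headers back to back. Returns
   the index of the first one that has the ELF magic and whose
   e_machine equals `want`, or -1 if none matches. */
static int ape_pick_header(const unsigned char *hdrs, int count,
                           unsigned want) {
  for (int i = 0; i < count; i++) {
    const unsigned char *h = hdrs + 64 * (size_t)i;
    if (!memcmp(h, "\177ELF", 4) && ape_ehdr_machine(h) == want)
      return i;
  }
  return -1;
}
```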

These `printf` statements MUST always use only unescaped ASCII characters
or octal escape codes. These `printf` statements MUST NOT use space-saving
escape codes such as `\n`. For example, rather than saying `\n`, it would
be valid to say `\012` instead. It's also valid to say `\12`, but only if
the encoded character that follows isn't an octal digit.
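
One conservative way to satisfy these rules when generating a header is to always emit full three-digit escapes, since a following digit then can't be mistaken for part of the escape. The sketch below does that, leaving only ASCII letters unescaped, which also guarantees that `'`, `\`, and `%` never appear raw in the single-quoted format string. `ape_encode_printf()` is a hypothetical helper, and its output is longer than the mixed `\2`/`\011` style in the example above.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes `n` bytes as the body of a printf '...' argument into `out`,
   which must hold at least 4*n + 1 bytes. ASCII letters are copied
   as-is; every other byte becomes a three-digit octal escape.
   Returns the length of the string written. */
size_t ape_encode_printf(const unsigned char *in, size_t n, char *out) {
  size_t k = 0;
  for (size_t i = 0; i < n; i++) {
    unsigned char c = in[i];
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
      out[k++] = (char)c;
    } else {
      k += (size_t)sprintf(out + k, "\\%03o", (unsigned)c);
    }
  }
  out[k] = '\0';
  return k;
}
```

For example, the bytes `7f 45 4c 46 02 00` encode to `\177ELF\002\000`.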
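
For example, the following algorithm may be used for parsing octal:

```c
/* Parses a one- to three-digit octal escape (the digits after the
   backslash) starting at page[i]. On success, stores the decoded value
   in *pc and returns the index just past the digits. If page[i] is not
   an octal digit, *pc is left untouched and i is returned unchanged.
   The i < 8192 checks keep the scan inside the page buffer. */
static int ape_parse_octal(const unsigned char page[8192], int i, int *pc)
{
  int c;
  if (i < 8192 && '0' <= page[i] && page[i] <= '7') {
    c = page[i++] - '0';
    if (i < 8192 && '0' <= page[i] && page[i] <= '7') {
      c *= 8;
      c += page[i++] - '0';
      if (i < 8192 && '0' <= page[i] && page[i] <= '7') {
        c *= 8;
        c += page[i++] - '0';
      }
    }
    *pc = c;
  }
  return i;
}
```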
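
Building on the same digit-parsing logic, a loader could decode an entire `printf` argument into bytes, rejecting anything other than plain characters and octal escapes. This is a self-contained sketch; `ape_decode_printf()` is a hypothetical helper, and it treats escapes above `\377` as errors.

```c
#include <stddef.h>

/* Decodes the text between the quotes of an APE printf statement.
   `s` points just past the opening quote and `end` bounds the scan
   (e.g. the end of the 8192-byte window). Writes at most `cap` bytes
   to `out`. Returns the number of bytes decoded once the closing quote
   is reached, or -1 if the quote is missing, `cap` is exceeded, or an
   escape is not octal (such as \n) or exceeds one byte. */
long ape_decode_printf(const unsigned char *s, const unsigned char *end,
                       unsigned char *out, size_t cap) {
  size_t n = 0;
  while (s < end && *s != '\'') {
    int c;
    if (*s == '\\') {
      s++;
      if (s == end || *s < '0' || *s > '7') return -1; /* not octal */
      c = 0;
      for (int d = 0; d < 3 && s < end && *s >= '0' && *s <= '7'; d++)
        c = c * 8 + (*s++ - '0');
      if (c > 255) return -1; /* does not fit in a byte */
    } else {
      c = *s++; /* plain ASCII character */
    }
    if (n == cap) return -1;
    out[n++] = (unsigned char)c;
  }
  return s < end ? (long)n : -1; /* -1 if the closing quote is missing */
}
```

Applied to the first part of the example statement above, this yields `e_ident[7] == 9` and `e_machine == 62`.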

APE-aware interpreters SHOULD only take `e_machine` into consideration.
It is the responsibility of the `_start()` function to detect the OS.
Therefore, multiple `printf` statements are only embedded in the shell
script for different CPU architectures.

The OS ABI field of an APE embedded `Elf64_Ehdr` SHOULD be set to
`ELFOSABI_FREEBSD`, since it's the only UNIX OS APE supports that
actually checks the field. However, different values MAY be chosen for
binaries that don't intend to have FreeBSD in their support vector.
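
The OS ABI field is `e_ident[EI_OSABI]`, the eighth byte of the header (index 7), and `ELFOSABI_FREEBSD` is 9. The example statement above already follows this recommendation: its eighth escape is `\011`, which is 9. A minimal sketch of setting it, with constants copied from the ELF specification under hypothetical `APE_` names:

```c
/* Index and value from the ELF specification. */
#define APE_EI_OSABI 7         /* e_ident[EI_OSABI] */
#define APE_ELFOSABI_FREEBSD 9 /* ELFOSABI_FREEBSD */

/* Hypothetical helper: marks a 64-byte Elf64_Ehdr image as using the
   FreeBSD OS ABI. Returns the previous value of the field. */
static unsigned char ape_set_osabi_freebsd(unsigned char ehdr[64]) {
  unsigned char old = ehdr[APE_EI_OSABI];
  ehdr[APE_EI_OSABI] = APE_ELFOSABI_FREEBSD;
  return old;
}
```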

Counter-intuitively, the ARM64 ELF header is used on the MacOS ARM64
platform when loading from fat binaries.

## Embedded Mach-O Header (x86-64 only)

APE shell scripts that support MacOS on AMD64 must use the `dd` command
in a very specific way to specify how the embedded binary Mach-O header
is copied backward to the start of the file. For example:

```sh
dd if="$o" of="$o" bs=8 skip=433 count=66 conv=notrunc
```
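
In the example, the header starts `bs * skip` = 8 * 433 = 3464 bytes into the file and is `bs * count` = 8 * 66 = 528 bytes long; `conv=notrunc` keeps the rest of the file intact. The in-memory model below reproduces that effect for illustration only. `ape_dd_in_place()` is a hypothetical helper; the real copy is performed by `dd` in the shell script.

```c
#include <stddef.h>
#include <string.h>

/* Models `dd if=F of=F bs=BS skip=SKIP count=COUNT conv=notrunc` on an
   in-memory file image: copies COUNT blocks of BS bytes, starting SKIP
   blocks into the file, to the start of the same file without
   truncating it. Returns 0 on success, or -1 if the source range runs
   past the end of the buffer. */
int ape_dd_in_place(unsigned char *file, size_t size, size_t bs,
                    size_t skip, size_t count) {
  size_t src = bs * skip, n = bs * count;
  if (src > size || n > size - src) return -1;
  memmove(file, file + src, n); /* memmove tolerates overlapping ranges */
  return 0;
}
```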

These `dd` statements have traditionally been generated by the GNU `as`
and `ld.bfd` programs by encoding ASCII into 64-bit linker relocations,
which necessitated a fixed width for integer values. It took several
iterations over APE's history before we eventually got it right:

- `arg=" 9293"` is how we originally had ape do it
- `arg=$(( 9293))` because BusyBox sh disliked the quoted space
- `arg=9293 ` is generated by the modern `apelink` program

Software that parses the APE file format and needs to extract the
x86-64 Mach-O header SHOULD support older binaries that use the previous
encodings. To make backwards compatibility simple, the following regular
expression, which generalizes to all defined formats, may be used:

```c
#include <regex.h>

static regex_t rx;

/* Illustrative wrapper: compiles the pattern, returning 0 on success.
   Capture groups #3, #8 and #13 hold the bs, skip and count digits. */
static int compile_dd_regex(void)
{
  return regcomp(&rx,
                 "bs="               // dd block size arg
                 "(['\"] *)?"        // #1 optional quote w/ space
                 "(\\$\\(\\( *)?"    // #2 optional math w/ space
                 "([[:digit:]]+)"    // #3
                 "( *\\)\\))?"       // #4 optional math w/ space
                 "( *['\"])?"        // #5 optional quote w/ space
                 " +"                // one or more spaces
                 "skip="             // dd skip arg
                 "(['\"] *)?"        // #6 optional quote w/ space
                 "(\\$\\(\\( *)?"    // #7 optional math w/ space
                 "([[:digit:]]+)"    // #8
                 "( *\\)\\))?"       // #9 optional math w/ space
                 "( *['\"])?"        // #10 optional quote w/ space
                 " +"                // one or more spaces
                 "count="            // dd count arg
                 "(['\"] *)?"        // #11 optional quote w/ space
                 "(\\$\\(\\( *)?"    // #12 optional math w/ space
                 "([[:digit:]]+)",   // #13
                 REG_EXTENDED);
}
```
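
Once compiled, the pattern's groups #3, #8, and #13 give the three numbers directly. The sketch below applies the same pattern, condensed onto three string literals, with POSIX `regexec()`. `ape_parse_dd()` is a hypothetical helper; it compiles the pattern on every call to stay self-contained, whereas a real parser would compile it once.

```c
#include <regex.h>
#include <stdlib.h>

/* Finds the Mach-O dd statement in `line` and stores its bs, skip and
   count values. Returns 0 on success, -1 if nothing matches. */
int ape_parse_dd(const char *line, long *bs, long *skip, long *count) {
  regex_t rx;
  regmatch_t m[14]; /* whole match + 13 groups */
  int rc;
  if (regcomp(&rx,
              "bs=(['\"] *)?(\\$\\(\\( *)?([[:digit:]]+)( *\\)\\))?( *['\"])? +"
              "skip=(['\"] *)?(\\$\\(\\( *)?([[:digit:]]+)( *\\)\\))?( *['\"])? +"
              "count=(['\"] *)?(\\$\\(\\( *)?([[:digit:]]+)",
              REG_EXTENDED))
    return -1;
  rc = regexec(&rx, line, 14, m, 0);
  regfree(&rx);
  if (rc) return -1;
  /* Groups 3, 8 and 13 contain only digits, so strtol() stops at the
     first character after each number. */
  *bs = strtol(line + m[3].rm_so, NULL, 10);
  *skip = strtol(line + m[8].rm_so, NULL, 10);
  *count = strtol(line + m[13].rm_so, NULL, 10);
  return 0;
}
```

All three historical encodings (`8`, `" 8"`, and `$(( 8))`) produce the same values.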

For further details, see the canonical implementation in
`cosmopolitan/tool/build/assimilate.c`.

libc/thread/pthread_cond_signal.c

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@
 * pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
 * // ...
 * pthread_mutex_lock(&lock);
-* pthread_cond_signal(&cond, &lock);
+* pthread_cond_signal(&cond);
 * pthread_mutex_unlock(&lock);
 *
 * This function has no effect if there aren't any threads currently
