WerWolv
https://werwolv.net/
Sat, 31 May 2025 03:08:47 +0000
<style>
.circle-mask {
clip-path: circle(69.5% at center);
}
</style>
<h2>About me</h2>
<div style="float:right; padding-left:20px;">
<img width="200px" src="content/assets/_about/me.jpg" class="circle-mask">
</div>
<p>Hi, my name is Nik aka WerWolv. I'm a <span id="age"></span>-year-old embedded systems electronics engineer from Switzerland. I'm fascinated by embedded systems, low-level coding, ARM microcontroller development, operating systems, as well as console homebrew and custom firmware. Most of the things I develop are open source and available for free for everyone to use on my <a href="https://github.com/WerWolv">GitHub</a> page.</p>
<p>Besides programming, I love mountain climbing, playing video games and listening to metal music (as you might have assumed already).</p>
<script>
var age = document.getElementById("age");
var date = new Date(Date.now() - new Date("1998-11-04").getTime());
age.innerHTML = date.getUTCFullYear() - 1970;
</script>
<p><br/></p>
<hr />
<h2>Current activities</h2>
<ul>
<li>I'm working on an open-source, cross-platform hex editor which includes a fully custom programming language for highlighting and decoding data formats.</li>
<li>I've been part of the Nintendo Switch homebrew community since the very beginning in early 2018 and have worked on numerous different Homebrew projects in the scene.</li>
<li>I've spent a lot of time reverse engineering Windows applications for writing mods, patches, cheats and utilities for many different games and tools.</li>
</ul>
<h2>Development Skills</h2>
<h3>Programming languages</h3>
<p>I have experience writing code in the following programming languages (from most to least experienced):</p>
<ul>
<li><strong>C++</strong></li>
<li>C</li>
<li>ARM Assembly </li>
<li>Java</li>
<li>C#</li>
<li>VHDL</li>
<li>Python</li>
<li>Rust</li>
<li>Matlab</li>
<li>JavaScript</li>
<li>x86 Assembly</li>
<li>Lua</li>
<li>PHP</li>
<li>Golang</li>
</ul>
<h3>Experience</h3>
<ul>
<li>PCB design using Altium Designer</li>
<li>Web design (both front- and backend) in HTML5, CSS3, JavaScript and PHP</li>
<li>API design</li>
<li>Microcontroller development on mainly ARM (STM32) and 8051 (SiLabs C8051) as well as the Arduino framework</li>
<li>FPGA design and development on Altera Cyclone chips using Intel Quartus, Sigasi and Modelsim</li>
<li>Linux for embedded systems </li>
<li>Reverse engineering applications, libraries and file formats using Ghidra and x64dbg</li>
<li>Building and using 3D Printers as well as designing models in Blender and OpenSCAD</li>
<li>Version Control with GitHub and co.</li>
<li>3D application development using raw OpenGL, ImGui, GLFW, SDL or Unity</li>
</ul>
<h2>Higher education</h2>
<ul>
<li>Apprenticeship as Electronics Technician (Elektroniker EFZ)</li>
<li>Vocational School (Matura)</li>
<li>Bachelor of Science in Electrical Engineering and Information Technology
<ul>
<li>Specialized in Embedded Systems</li>
</ul></li>
</ul>
<h2>Notable projects</h2>
<ul>
<li><a href="https://github.com/WerWolv/ImHex">ImHex</a>, A hex editor for reverse engineers, programmers and malware researchers</li>
<li><a href="https://edizon.werwolv.net">EdiZon</a>, A save file manager, script based save file editor and cheating framework for the Nintendo Switch</li>
<li><a href="https://tesla.werwolv.net">Tesla</a>, An overlay ecosystem allowing developers to write custom homebrew overlays on the Nintendo Switch</li>
<li><a href="https://github.com/WerWolv/ARMv8Emulator">Archway</a>, A work in progress ARMv8 byte code emulator</li>
<li><a href="https://github.com/WerWolv/ILInterpreter">ILInterpreter</a>, A work in progress Microsoft Common Intermediate Language interpreter</li>
</ul>
<h2>Contact</h2>
<p>I'm available for any kind of programming / hacking talks, questions about my tools and other stuff.
The easiest way to reach me is through one of the following:</p>
<ul>
<li>Through Discord <a href="https://discord.gg/vAM4mAEb2q">@WerWolv#1337</a></li>
<li>On Twitter <a href="https://twitter.com/WerWolv">@WerWolv</a></li>
<li>Via E-Mail at <a href="mailto://[email protected]">[email protected]</a></li>
</ul>
Sat, 31 May 2025 03:08:47 +0000
https://werwolv.net/about
Thermal Printer BLE Protocol reverse engineering
<style>
.column {
float: left;
width: 50%;
padding: 5px;
}
.row::after {
content: "";
clear: both;
display: table;
}
</style>
<p><img src="/content/assets/cat_printer/printer.jpg" alt="Printer" /></p>
<h2>Overview</h2>
<p>Thermal printers are amazing for quickly printing out notes and to-do lists since all they need is paper and some power. Printing is done by heating up the thermal paper to color it black, without needing any sort of ink that can run out. Unfortunately, the model I got here only supports Bluetooth and the only official way to talk to it is through the <del>horrible</del> <a href="https://apkplz.net/app/com.frogtosea.smartprint">iPrint</a> app. </p>
<h2>Hardware</h2>
<p>Two weeks ago I ordered this thermal printer from <a href="https://www.aliexpress.com/item/1005001666191411.html">AliExpress</a> hoping it would be the same as the one a friend got a while ago. Their printer has an STM32 MCU on it as well as a super nicely labeled UART header. Unfortunately for me, mine has some weird ass controller with no information online whatsoever and no UART or USB support. Great. </p>
<div class="row">
<div class="column">
<img src="/content/assets/cat_printer/pcb_front.jpg" alt="PCB Front" style="width:100%">
</div>
<div class="column">
<img src="/content/assets/cat_printer/pcb_back.jpg" alt="PCB Back" style="width:100%">
</div>
</div>
<table>
<thead>
<tr>
<th>#</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>A</td>
<td>LDO regulator that generates the lower voltages required by, for example, the MCU</td>
</tr>
<tr>
<td>B</td>
<td>H-Bridge which controls the stepper motor of the printer</td>
</tr>
<tr>
<td>C</td>
<td>Step-up 2-cell Lithium Battery charger IC</td>
</tr>
<tr>
<td>D</td>
<td>Weird ass MCU apparently nobody has ever heard of. It has an integrated Bluetooth PHY</td>
</tr>
</tbody>
</table>
<h2>BLE</h2>
<p>The only thing it does have is Bluetooth, or rather BLE. And the only way to talk to it is through a proprietary, shady, Chinese app you have to download from some APK mirroring site because it's not even on the Play Store. It's called <a href="https://apkplz.net/app/com.frogtosea.smartprint">iPrint</a> and has some basic functions that let you print out photos, text and some weird built-in frames and images.</p>
<p>But it works pretty well :)</p>
<p><img src="/content/assets/cat_printer/test_print.jpg" alt="Test Print" /></p>
<p>The main problem, though, is that having to use my phone to print those notes isn't all that great. Being able to quickly generate notes on the computer and print them out would be so much more useful! Unfortunately, nobody had done anything like this for that printer yet, so I guess I have to do it.</p>
<h2>Reverse Engineering the app</h2>
<p>Since the app is the only thing that really has the ability to talk to the printer, let's start there.
There are probably better ways to do this, but I simply googled for <code>apk decompiler</code>, clicked on the first link and used that. It's a site called <a href="http://www.javadecompilers.com/apk">javadecompilers.com</a> where I uploaded the iPrint APK I had downloaded earlier, and after a few minutes the fully decompiled project was ready to be downloaded.</p>
<p>At first I was shocked: the decompiled code was almost 200MB with 14k files, but looking into the project a bit I noticed that most of them are just libraries they bundled into the app. Looking around a bit more, I found a promising-looking file called <code>com.blueUtils.PrintDataUtils.java</code>. It contains multiple functions that format the provided input data into a byte array of "cmds" that end up being sent to the printer over BLE. </p>
<p>Let's first take a look at a function named <code>public byte[] eachLinePixToCmdB(byte[] bArr, int i, int i2)</code>. The decompilation isn't that great but there are multiple similar looking sections in there like this one:</p>
<pre><code class="language-java">LogUtils.m1960e(Integer.valueOf(getEneragy()));
bArr2 = new byte[((i7 * length) + BluetoothOrder.print_text.length + 9 + 10)];
byte[] bArr5 = new byte[10];
bArr5[0] = 81;
bArr5[1] = 120;
bArr5[2] = -81;
bArr5[3] = 0;
bArr5[4] = 2;
bArr5[5] = 0;
bArr5[6] = ConvertUtils.hexString2Bytes(Integer.toHexString(getEneragy()))[1];
bArr5[7] = ConvertUtils.hexString2Bytes(Integer.toHexString(getEneragy()))[0];
bArr5[8] = BluetoothOrder.calcCrc8(bArr5, 6, 2);
bArr5[9] = -1;
System.arraycopy(bArr5, 0, bArr2, 0, bArr5.length);
this.packageLength += 10;</code></pre>
<p>(There are grammatical and logical errors in the naming of functions everywhere. The developers clearly weren't native English speakers.)</p>
<p>Since Java has no unsigned byte type, there are some negative values in there. However, they can simply be converted to their unsigned representation using two's complement: for example, <code>-81</code> becomes <code>256 - 81 = 175 = 0xAF</code>.
Comparing all the sections that look like this, I concluded that the command protocol must look something like this:</p>
<pre><code class="language-rust">Magic0: 0x51
Magic1: 0x78
CommandID: 0x00 - 0xFF
AlwaysZero0: 0x00
Data Size: 0x00 - 0xFF
AlwaysZero1: 0x00
Data: [ Array of bytes with the length provided before, multi-byte values are Little Endian ]
DataCRC8: 0x00 - 0xFF
Magic2: 0xFF</code></pre>
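<p>To make the layout above concrete, here is a small Python sketch that takes the signed values as they appear in the decompiled Java code, converts them to unsigned bytes using two's complement (<code>b &amp; 0xFF</code>) and splits them into the fields listed above. It assumes the length byte at offset 4 covers only the data section and that multi-byte values are little-endian, as the <code>[1]</code>/<code>[0]</code> ordering of the energy bytes in the Java snippet suggests. The CRC byte is returned but not checked, and all function names are made up for this example.</p>

<pre><code class="language-python">def java_bytes_to_unsigned(java_bytes):
    """Convert Java's signed bytes (-128..127) to unsigned values (0..255)."""
    # Two's complement: a negative byte b is stored as 256 + b
    return [b & 0xFF for b in java_bytes]


def parse_packet(packet):
    """Split a packet into (command, data, crc) following the layout above."""
    if packet[0] != 0x51 or packet[1] != 0x78 or packet[-1] != 0xFF:
        raise ValueError("missing magic bytes")
    command = packet[2]
    length = packet[4]
    data = packet[6:6 + length]
    crc = packet[6 + length]
    return command, data, crc


def data_as_le16(data):
    """Interpret a 2-byte data field as a little-endian integer."""
    return data[0] | (data[1] << 8)


# A FeedPaper packet with data [0x70, 0x00], written the way Java would print it
# (CRC computed with the standard CRC-8 table, polynomial 0x07)
feed = java_bytes_to_unsigned([81, 120, -95, 0, 2, 0, 112, 0, -94, -1])
command, data, crc = parse_packet(feed)
print(hex(command), data, hex(crc), data_as_le16(data))  # prints: 0xa1 [112, 0] 0xa2 112</code></pre>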
<p>Of course, after I had reverse engineered all this manually, I randomly stumbled upon <code>BluetoothOrder.java</code>.
<img src="https://i.imgur.com/n6iqMt6.png" alt="BluetoothOrder.java" />
It contains a list of hardcoded commands used by the app as well as the lookup table used for the CRC8 calculation. The table is the standard CRC-8 table for the polynomial 0x07 (it starts with 0x00, 0x07, 0x0E, 0x09).
With this file I could put together the following list of command IDs:</p>
<pre><code class="language-rust">RetractPaper: 0xA0
FeedPaper: 0xA1
DrawBitmap: 0xA2
SetDrawingMode: 0xBE
SetEnergyLevel: 0xAF
SetQuality: 0xA4</code></pre>
<p>Perfect, everything that's needed to start talking to the printer!</p>
<h2>Talking to the printer</h2>
<p>It took me a while to find a library (and language) that supported talking to BLE on Windows but ultimately I ended up using Python with the <a href="https://github.com/hbldh/bleak">Bleak library</a>. Looking at their example suggested I need to provide the device's bluetooth mac address as well as some UUID. I ended up reading through some of the BLE specs and found out that this UUID was an identifier for a so called Characteristic the printer provides. It's basically like specifying what port to send the data to for the printer to properly receive it.
Finding the device ID was pretty simple, I downloaded some bluetooth analysis app from the Play Store and the mac address was displayed there right away. But how on earth do I find the characteristic UUID? My first thought was to look at the app again since it needs to be hardcoded there somewhere. Searching for UUIDs in general yielded a lot of results so I quickly stopped there.
Instead I downloaded a free app from the Microsoft Store (lol) called <a href="https://www.microsoft.com/en-us/p/bluetooth-le-lab/9n6jd37gwzc8?activetab=pivot:overviewtab">Bluetooth LE Lab</a> which was amazingly helpful for this. Selecting the device brings you to this screen which displays all Services and Characteristics of the device.
<a href="https://i.imgur.com/FkNot6R.png"></a></p>
<p>There are two <code>WriteWithoutResponse</code>, two <code>Notify</code>, one <code>Indicate</code> and one <code>Read, Write</code> characteristic. Since their app also reads data from the printer, I went for the <code>Read, Write</code> one, which turned out to be the only one that really worked: <code>0000AE10-0000-1000-8000-00805F9B34FB</code>. The app even allowed me to send data to the device directly.
I converted the <code>paper</code> command array found in the app to hexadecimal and 🎉, paper was ejected from the printer!</p>
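<p>The same list of characteristics can also be read out from Python instead of a separate app. This is a minimal sketch, assuming a recent version of Bleak where a connected <code>BleakClient</code> exposes its GATT table through <code>client.services</code>, and where each characteristic has a <code>uuid</code> and a list of lowercase <code>properties</code> such as <code>"write-without-response"</code>. The helper names are made up for this example; Bleak is imported inside the coroutine so the filtering helper can be used on its own.</p>

<pre><code class="language-python">def writable_characteristics(characteristics):
    """Return the UUIDs whose properties allow writing.

    `characteristics` is an iterable of (uuid, properties) pairs, where
    properties are Bleak-style strings such as "read" or "write".
    """
    return [
        uuid
        for uuid, properties in characteristics
        if "write" in properties or "write-without-response" in properties
    ]


async def list_characteristics(address):
    """Connect to a BLE device and collect (uuid, properties) for every characteristic."""
    from bleak import BleakClient  # only needed when talking to real hardware

    async with BleakClient(address) as client:
        found = []
        for service in client.services:
            for char in service.characteristics:
                found.append((char.uuid, list(char.properties)))
        return found</code></pre>

<p>For example, <code>asyncio.run(list_characteristics("93:2A:BB:C4:95:8D"))</code> returns the characteristics in a form similar to what Bluetooth LE Lab shows, and <code>writable_characteristics</code> narrows them down to the ones worth trying.</p>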
<h3>Python reimplementation</h3>
<p>Now that I knew that the commands worked, I started to reimplement it in Python. It took a while to get the whole thing installed on Windows and to get it to talk to the computer's Bluetooth module but in the end I got the same command working from Python.</p>
<h4>Message creation</h4>
<pre><code class="language-python">crc8_table = [
    0x00, 0x07, 0x0e, 0x09, ... # Rest of array
]

def crc8(data):
    crc = 0
    for byte in data:
        crc = crc8_table[(crc ^ byte) & 0xFF]
    return crc & 0xFF

def formatMessage(command, data):
    # Magic, command ID, 0x00, data length, 0x00, data, CRC8 over the data, 0xFF
    return [0x51, 0x78, command, 0x00, len(data), 0x00] + data + [crc8(data), 0xFF]</code></pre>
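<p>Instead of copying the whole lookup table out of the app, it can also be generated. The sketch below assumes the table is the common MSB-first CRC-8 with polynomial 0x07 (initial value 0, no final XOR), which is what its first entries <code>0x00, 0x07, 0x0e, 0x09</code> correspond to. <code>make_crc8_table</code>, <code>CRC8_TABLE</code> and <code>format_message</code> are names made up for this example; <code>format_message</code> mirrors <code>formatMessage</code> above but returns <code>bytes</code>.</p>

<pre><code class="language-python">def make_crc8_table(poly=0x07):
    """Build the 256-entry lookup table for an MSB-first CRC-8."""
    table = []
    for value in range(256):
        crc = value
        for _ in range(8):
            # Shift left; if a 1 falls out of the top bit, XOR in the polynomial
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
        table.append(crc)
    return table


CRC8_TABLE = make_crc8_table()


def crc8(data):
    crc = 0
    for byte in data:
        crc = CRC8_TABLE[(crc ^ byte) & 0xFF]
    return crc


def format_message(command, data):
    # Magic, command ID, 0x00, data length, 0x00, data, CRC8 over the data, 0xFF
    return bytes([0x51, 0x78, command, 0x00, len(data), 0x00] + data + [crc8(data), 0xFF])


print(CRC8_TABLE[:4])                            # [0, 7, 14, 9]
print(format_message(0xA1, [0x70, 0x00]).hex())  # 5178a10002007000a2ff</code></pre>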
<h4>Feeding paper</h4>
<pre><code class="language-python">import asyncio
from bleak import BleakClient, BleakScanner

PrinterAddress = "93:2A:BB:C4:95:8D"
PrinterCharacteristic = "0000AE01-0000-1000-8000-00805F9B34FB"
FeedPaper = 0xA1

async def feedPaper():
    device = await BleakScanner.find_device_by_address(PrinterAddress, timeout=20.0)
    if device is None:
        raise RuntimeError("Printer not found")
    async with BleakClient(device) as client:
        # formatMessage() from the snippet above returns a list, Bleak expects bytes
        await client.write_gatt_char(PrinterCharacteristic, bytes(formatMessage(FeedPaper, [0x70, 0x00])))

asyncio.run(feedPaper())</code></pre>
<h4>Drawing things</h4>
<p>Looking through the app once again, I noticed this printer doesn't really support printing text directly. Instead, the app renders whatever the user enters using HTML, turns that into a bitmap and then sends that bitmap to the printer. The <code>DrawBitmap</code> command <code>0xA2</code> takes an array of bytes where each bit represents one pixel in the image. If the bit is a 1, the printer will burn the paper at that position; if it's a 0, it won't. I found this out by simply sending some patterns to the printer: <code>0xFF</code> led to an opaque line, <code>0xAA</code> led to every second pixel being drawn. Knowing that, I wrote a quick function that loads an image using the PIL library and turns it into a byte array line by line. To print a full image, though, we need to print multiple lines. This is done by drawing a single line, then advancing one step using the <code>FeedPaper</code> command <code>0xA1</code>, and so on. After many attempts I finally managed to get it right.</p>
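<p>The line-by-line bit packing described above can be sketched as follows. The bit order inside a byte is an assumption: <code>0xFF</code> and <code>0xAA</code> don't reveal whether the first pixel of a line is the least or the most significant bit, so the sketch makes it a parameter (defaulting to least significant bit first). <code>pack_row</code> and <code>print_jobs</code> are hypothetical names, and the one-step feed payload <code>[0x01, 0x00]</code> assumes the little-endian step count seen in the <code>FeedPaper</code> example.</p>

<pre><code class="language-python">DRAW_BITMAP = 0xA2
FEED_PAPER = 0xA1


def pack_row(pixels, msb_first=False):
    """Pack one line of pixels (truthy = burn) into bytes, 8 pixels per byte.

    The line is padded with white pixels to a multiple of 8.
    """
    out = bytearray((len(pixels) + 7) // 8)
    for x, pixel in enumerate(pixels):
        if pixel:
            bit = 7 - (x % 8) if msb_first else x % 8
            out[x // 8] |= 1 << bit
    return bytes(out)


def print_jobs(rows, msb_first=False):
    """Yield (command, data) pairs: draw one line, then feed the paper one step."""
    for row in rows:
        yield DRAW_BITMAP, list(pack_row(row, msb_first))
        yield FEED_PAPER, [0x01, 0x00]</code></pre>

<p>Each pair would then go through <code>formatMessage(command, data)</code> and <code>write_gatt_char</code>. With PIL, an image converted via <code>Image.open(path).convert("1")</code> has pixel values of 0 (black) or 255 (white), so a row can be built as <code>[img.getpixel((x, y)) == 0 for x in range(img.width)]</code>.</p>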
<h2>Result</h2>
<p>The whole project took me about 6 hours spread out over the course of two days. When I first got the printer and took it apart I was really disappointed not to find any UART or USB interface, but now I'm really happy I had to use BLE for it. It made the whole thing really fun, and the result is easy to use since it doesn't need any extra hardware (besides a computer with Bluetooth support). Finally though, the result. Fucking worth it.</p>
<video controls width="80%">
<source src="/content/assets/cat_printer/printing.mp4" type="video/mp4">
</source></video>
<p><br/>
I published all my example code here on my GitHub: <a href="https://github.com/WerWolv/PythonCatPrinter">https://github.com/WerWolv/PythonCatPrinter</a>
<img src="https://opengraph.githubassets.com/24e9a9abd03209b1a961763ca27e24d036d52dfc78827d19f975abf90ecc7b44/WerWolv/PythonCatPrinter" alt="" /></p>
<p><br/>
<br/>
There's still a ton to do though. Printing right now is really slow because sending data too fast causes the printer to jam up and refuse to do anything until it's restarted. The app does have some sort of compression for the data, but I haven't managed to figure out how it works yet.
The next step will probably be to get text rendering and printing to work. I'll update this blog once I know more.</p>
Fri, 30 Apr 2021 00:00:00 +0000
https://werwolv.net/blog/cat_printer
ImGui Game Overlays using DLL injection
<h2>Introduction</h2>
<p>I started this project mainly because I wanted a proper XP tracker I didn't have to pay extra for since the price felt really unjustifiable to me.</p>
<p>This page explains how I used Visual C++ to write a DLL injector tool for <a href="http://runescape.com/">RuneScape's</a>
NXT client together with a DLL that hooks into RuneScape's OpenGL draw calls to render a <a href="https://github.com/ocornut/imgui">Dear ImGui</a> overlay on top of it.</p>
<p><img src="https://werwolv.net/content/assets/dll_injection/rs_with_imgui.jpg" alt="RuneScape with imGui Overlay" /></p>
<h2>DLL Injecting</h2>
<p>For the DLL Injector, I used a basic C++ <code>Console App</code> from Visual Studio. We don't need anything fancy for this PoC.</p>
<p>The principle of DLL injection is the following:</p>
<pre><code>1. Find the PID of the process the DLL should be injected into.
2. Use the Windows API to get a Handle for that Process.
3. Allocate some memory in the target process and copy the DLL's path into it.
4. Start a new Thread in the target process with `LoadLibraryA` as the start routine.</code></pre>
<p>In code, this looks as follows:</p>
<h3>Finding the PID</h3>
<pre><code class="language-cpp">std::uint32_t pid = 0;

// Loop infinitely till RuneScape is launched
while (pid == 0) {
    pid = getPID(L"rs2client.exe");
    Sleep(1000);
}</code></pre>
<pre><code class="language-cpp">#include <Windows.h>
#include <TlHelp32.h>
#include <cstdint>
#include <cwchar>
#include <string>

std::uint32_t getPID(const std::wstring& processName) {
    std::uint32_t pid = 0;

    // Create a snapshot of all running processes
    HANDLE hSnap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);

    // Check if the snapshot is valid, otherwise bail out
    if (hSnap == INVALID_HANDLE_VALUE)
        return 0;

    // Use the wide-character variants explicitly so comparing against a
    // std::wstring works regardless of the project's character set setting
    PROCESSENTRY32W procEntry{};
    procEntry.dwSize = sizeof(PROCESSENTRY32W);

    // Iterate over all processes in the snapshot
    if (Process32FirstW(hSnap, &procEntry)) {
        do {
            // Check if the current process name matches the passed in name (case-insensitive)
            if (_wcsicmp(procEntry.szExeFile, processName.c_str()) == 0) {
                pid = procEntry.th32ProcessID;
                break;
            }
        } while (Process32NextW(hSnap, &procEntry));
    }

    // Cleanup
    CloseHandle(hSnap);

    return pid;
}</code></pre>
<p>This code uses the Windows API found in the <code>Windows.h</code> header to first create a snapshot of all running processes
using <code>CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0)</code> and then iterates over it, repeatedly calling <code>Process32Next(hSnap, &procEntry)</code>
to get the next entry in the list. This is done until the name of the current process matches the passed-in name,
in our case <code>rs2client.exe</code>.</p>
<h3>Getting a Process Handle</h3>
<p>This is super simple and straight-forward. We can just use the Windows API again.</p>
<pre><code class="language-cpp">HANDLE hProc = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);</code></pre>
<p>This will give us a handle based on the PID with full access to that process. It's how we interact with the RuneScape client.</p>
<h3>Allocate memory in the target process</h3>
<p>This part is done in preparation for the next step. It allocates memory in the remote process and writes the DLL path into it.
This is needed since we need to call the <code>LoadLibraryA</code> function there, which takes the path to the DLL to load as its argument.</p>
<pre><code class="language-cpp">// Allocate memory in the remote process
void *injectDllPathRemote = VirtualAllocEx(hProc, 0x00,
MAX_PATH, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
// If allocation failed, bail out
if (injectDllPathRemote == nullptr)
return 1;
// Write DLL path to the memory we just allocated
constexpr const char *dllPath = "C:\\path\\to\\inject.dll";
WriteProcessMemory(hProc, injectDllPathRemote, dllPath, strlen(dllPath) + 1, 0);</code></pre>
<h3>Starting the thread</h3>
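<p>Now we're putting everything together: we start a thread in the RuneScape process using <code>CreateRemoteThread(...)</code> with <code>LoadLibraryA</code> as the
thread routine. This actually only works because of the nice coincidence that the signature of a <code>LPTHREAD_START_ROUTINE</code> is very similar to
the one of <code>LoadLibraryA</code>: both use the <code>WINAPI</code> calling convention, take a single pointer as argument and return a single value. It also relies on <code>kernel32.dll</code> being mapped at the same address in every process (system DLLs are only randomized once per boot), so the address of <code>LoadLibraryA</code> in our injector is valid inside RuneScape as well. If the function took more parameters, DLL injection would get a lot more difficult. </p>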
<pre><code class="language-cpp">// Create a thread in the RuneScape process which
// runs LoadLibraryA("C:\\path\\to\\inject.dll")
HANDLE hRemoteThread = CreateRemoteThread(hProc, nullptr, 0,
    (LPTHREAD_START_ROUTINE)LoadLibraryA, injectDllPathRemote, 0, nullptr);

// Check if we succeeded
if (hRemoteThread != nullptr && hRemoteThread != INVALID_HANDLE_VALUE)
    CloseHandle(hRemoteThread);
else
    printf("[*] Error starting thread! Error Code: %x\n", GetLastError());</code></pre>
<p>If <code>CreateRemoteThread</code> succeeds, the target process loads our DLL and we can execute our code in its context, allowing us to read its memory directly, patch code and insert hooks. Now we have to make a DLL that handles the hooking.</p>
<h2>Building the DLL</h2>
<p>A DLL can be made by creating a <code>Dynamic-Link Library (DLL)</code> project in Visual Studio which sets up all the build configuration correctly and
includes a basic template containing the DLL's <code>DllMain(...)</code> entry-point function.</p>
<h3>Running code</h3>
<p>To actually run our code in the game, we need to start yet another thread. Doing this from <code>DllMain</code> is generally a bad idea
since Windows' loader lock is held while <code>DllMain</code> runs. This means many Windows API calls that use the loader will cause the application to deadlock. We don't use any of these functions here though, so we're safe for the most part. In our case, a thread is definitely needed since <code>DllMain</code> blocks both our injector and API calls elsewhere in the target application. Therefore it has to run and finish quickly without blocking us.</p>
<p>Our DLL simply creates a new thread that runs our code when being loaded.</p>
<pre><code class="language-cpp">BOOL APIENTRY DllMain(HMODULE hModule, DWORD ul_reason_for_call, LPVOID lpReserved) {
    switch (ul_reason_for_call) {
        case DLL_PROCESS_ATTACH:
        {
            HANDLE hThread = CreateThread(nullptr, 0,
                (LPTHREAD_START_ROUTINE)patcherThread, hModule, 0, 0);
            if (hThread != nullptr)
                CloseHandle(hThread);
            break;
        }
        case DLL_THREAD_ATTACH:
        case DLL_THREAD_DETACH:
        case DLL_PROCESS_DETACH:
            break;
    }

    return TRUE;
}</code></pre>
<h3>Getting console output from the DLL</h3>
<p>For debug purposes it's usually useful to have some sort of logging in our application. Luckily, the Windows API once again has us covered: the <code>AllocConsole</code> function creates a new console window for the current program. Make sure not to close this window though, as closing it sends a <code>CTRL_CLOSE_EVENT</code> to the process, and the default handler for it terminates the whole game.</p>
<pre><code class="language-cpp">AllocConsole();                           // Open a new console window

FILE *f = nullptr;                        // freopen_s fills in the stream pointer
freopen_s(&f, "CONOUT$", "w", stdout);    // Redirect stdout to CONOUT$, the
                                          // current console window.

printf("[*] Running under RuneScape!\n"); // Console works!</code></pre>
<h3>Hooking OpenGL</h3>
<p>The secret to drawing an overlay in any process is hooking the graphics library's "frame end" function.
In the case of OpenGL this function is called <code>wglSwapBuffers</code>; for Direct3D 9 it's <code>IDirect3DDevice9::EndScene</code>.
We simply let the game draw all its content, and when it calls the function to end the current frame, we draw our overlay on top before calling the actual end-frame function.</p>
<p>Note: An easy way to find out what the game you want to hook uses is to load the game executable into Ghidra and check its imports in the Symbol Tree.</p>
<p><img src="https://werwolv.net/content/assets/dll_injection/ghidra_imports.png" alt="Imports" /></p>
<p>(RuneScape imports both opengl32.dll AND d3d9.dll here but according to the wiki, it only uses Direct3D if OpenGL is not working)</p>
<p>But how does hooking even work?</p>
<p>A hook works by overwriting some instruction(s) in a function's code with a <code>jmp</code> instruction.</p>
<h5>Before patching</h5>
<p><img src="https://werwolv.net/content/assets/dll_injection/swapBuffers_pre.png" alt="Pre Patching" /></p>
<h5>After patching</h5>
<p><img src="https://werwolv.net/content/assets/dll_injection/swapBuffers_post.png" alt="Post Patching" /></p>
<p>This instruction redirects execution flow to a trampoline routine which first executes the instruction(s) we overwrote with the jump. Then we have to save the current context. We don't know how our code modifies the registers, but we do know that once our code has run and execution returns to the hooked function, they need to be in the same state as before our hook ran. Otherwise the original function might end up doing unpredictable things or just crash outright. This is usually done by pushing all registers onto the stack, executing the hook, popping all registers back into the right registers and then jumping back to right after the injected <code>jmp</code> instruction in the original function. On x64 Windows, the stack also has to be 16-byte aligned with 32 bytes of shadow space reserved before calling into C++ code, since the compiler-generated hook is allowed to use that space.</p>
<pre><code class="language-asm">; Defined in the C++ part of the DLL (extern "C", so the names are not mangled)
EXTERN wglSwapBuffers_hook : PROC
EXTERN wglSwapBuffers_return : QWORD

.code

PUBLIC wglSwapBuffersTrampoline
wglSwapBuffersTrampoline PROC
    mov [rsp + 20h], rsi        ; Execute the instruction that
                                ; was overwritten by our hook patch.
                                ; (20h: MASM reads a plain 20 as decimal)
    push rax                    ; Save the current context.
    push rbx                    ; Pushing all the registers is probably
    push rcx                    ; overkill but better safe than sorry.
    push rdx
    push rsi
    push rdi
    push rbp
    push rsp
    push r8
    push r9
    push r10
    push r11
    push r12
    push r13
    push r14
    push r15
    sub rsp, 28h                ; 32 bytes of shadow space for the hook
                                ; + 8 bytes to get rsp 16-byte aligned
    call wglSwapBuffers_hook    ; Call our hook (rcx still holds the HDC)
    add rsp, 28h                ; Release the shadow space again
    pop r15                     ; Restore the context in reverse order
    pop r14                     ; as a stack is a FILO buffer (first in, last out)
    pop r13
    pop r12
    pop r11
    pop r10
    pop r9
    pop r8
    pop rsp
    pop rbp
    pop rdi
    pop rsi
    pop rdx
    pop rcx
    pop rbx
    pop rax
    jmp wglSwapBuffers_return   ; Jump back to the original function.
                                ; This is the address of where our jmp
                                ; instruction was inserted + 5, so immediately
                                ; after it.
wglSwapBuffersTrampoline ENDP

END</code></pre>
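<p>How much a trampoline has to adjust the stack before calling into C++ can be worked out with a bit of arithmetic. Under the Windows x64 calling convention <code>rsp</code> is 16-byte aligned right before a <code>call</code>, so at the first instruction of a function (which is where our <code>jmp</code> enters the trampoline) <code>rsp % 16 == 8</code>, and the caller of our hook also has to reserve 32 bytes of shadow space. The helper below is a hypothetical sketch of that calculation, not part of the project.</p>

<pre><code class="language-python">SHADOW_SPACE = 32  # bytes a caller must reserve for the callee (Windows x64)


def stack_adjustment(pushes, entry_misalignment=8):
    """Smallest `sub rsp, N` that leaves room for the shadow space and
    makes rsp 16-byte aligned right before the `call`.

    `pushes` is the number of 8-byte pushes done since entry, and
    `entry_misalignment` is rsp % 16 at entry (8 at a function's first
    instruction, because the caller's `call` pushed the return address).
    """
    rsp_mod = (entry_misalignment - 8 * pushes) % 16
    adjustment = SHADOW_SPACE
    while (rsp_mod - adjustment) % 16 != 0:
        adjustment += 8
    return adjustment


# The trampoline above pushes 16 registers (rax ... r15, including rsp)
print(hex(stack_adjustment(16)))  # 0x28</code></pre>

<p>With an odd number of pushes the extra 8 bytes are not needed, which is why the result alternates between <code>0x28</code> and <code>0x20</code>.</p>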
<p>Now that we have a place for our hook to jump to, we need to insert it into the function we want to hook.
The following functions take care of removing code page write restrictions, inserting the hook and restoring the original restrictions again. </p>
<pre><code class="language-cpp">namespace mem {
    template<typename T>
    T read(DWORD64 addr) {
        return *((T *)addr);
    }

    template<typename T>
    void write(DWORD64 addr, T value) {
        *((T *)addr) = value;
    }

    // Changes the protection of the pages covering [addr, addr + size)
    // and returns the previous protection
    DWORD protect(DWORD64 addr, SIZE_T size, DWORD protection) {
        DWORD oldProtection;
        VirtualProtect((LPVOID)addr, size, protection, &oldProtection);
        return oldProtection;
    }

    DWORD64 hookFunction(DWORD64 hookAt, DWORD64 newFunc, unsigned int size) {
        // -5 since the jump is relative to the next instruction. A rel32 jump
        // only reaches +/-2 GB, so newFunc has to be within that range of hookAt.
        DWORD64 newOffset = newFunc - hookAt - 5;

        // Make every byte we are about to overwrite writable
        DWORD oldProtection = mem::protect(hookAt, size, PAGE_EXECUTE_READWRITE);

        mem::write<BYTE>(hookAt, 0xE9);                  // Opcode of the jmp rel32 instruction
        mem::write<DWORD>(hookAt + 1, (DWORD)newOffset);

        // nop extra bytes to avoid leaving a partially overwritten opcode behind
        for (unsigned int i = 5; i < size; i++)
            mem::write<BYTE>(hookAt + i, 0x90);

        // Restore the original protection and make sure the CPU sees the new code
        mem::protect(hookAt, size, oldProtection);
        FlushInstructionCache(GetCurrentProcess(), (LPCVOID)hookAt, size);

        return hookAt + 5;
    }
}</code></pre>
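<p>The bytes <code>hookFunction</code> writes can be reproduced offline, which is handy for comparing against what a debugger such as x64dbg shows after patching. This is a hypothetical Python helper, not part of the project: it encodes <code>0xE9</code> followed by the signed 32-bit little-endian displacement relative to the end of the 5-byte instruction, pads the rest with <code>0x90</code> NOPs and refuses targets a rel32 jump can't reach.</p>

<pre><code class="language-python">def make_jmp_patch(hook_at, target, size=5):
    """Return the bytes of a `jmp rel32` from hook_at to target, NOP-padded to size."""
    if size < 5:
        raise ValueError("a jmp rel32 needs at least 5 bytes")
    displacement = target - (hook_at + 5)  # relative to the next instruction
    if not -2**31 <= displacement < 2**31:
        raise ValueError("target is out of range for a rel32 jump")
    return b"\xE9" + displacement.to_bytes(4, "little", signed=True) + b"\x90" * (size - 5)


patch = make_jmp_patch(0x7FF800001000, 0x7FF800002000, size=6)
print(patch.hex(" "))  # e9 fb 0f 00 00 90</code></pre>

<p>The range check matters on x64: if the trampoline ends up more than 2 GB away from <code>opengl32.dll</code>, a 5-byte jump can't be used and a longer absolute jump would be needed instead.</p>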
<p>Using this, a hook can be inserted as follows:</p>
<pre><code class="language-cpp">using wglSwapBuffers_t = void(*)(_In_ HDC hDc);
extern "C" wglSwapBuffers_t wglSwapBuffers_return = nullptr;
extern "C" void wglSwapBuffersTrampoline();

// ...

// Get a handle to opengl32.dll, which the game has already loaded
HMODULE hOpengl32 = GetModuleHandle(L"opengl32.dll");
if (hOpengl32 != nullptr) {
    // Get the address of wglSwapBuffers
    DWORD64 wglSwapBuffersHookAddr = (DWORD64)GetProcAddress(hOpengl32,
        "wglSwapBuffers");

    // Insert a hook to our trampoline at the start of wglSwapBuffers,
    // returns the address the trampoline has to jump back to
    wglSwapBuffers_return = (wglSwapBuffers_t)mem::hookFunction(
        wglSwapBuffersHookAddr, (DWORD64)wglSwapBuffersTrampoline, 5);
}</code></pre>
<h3>Drawing the overlay</h3>
<p>Finally, after all this work, we can start drawing the ImGui overlay.
For this, we just download the ImGui source code and compile it together with the rest of the DLL
code. ImGui's OpenGL backend also needs an OpenGL loader to compile; I used GLEW for this. For it to link, both opengl32.lib and glew32<strong>s</strong>.lib have to be linked into the DLL. opengl32.lib gets dynamically linked as it's already been loaded by RuneScape, but GLEW <strong>HAS</strong> to be linked statically since we can't load another DLL within the injected DLL without risking a deadlock.</p>
<p>Using the ImGui backend files for Win32 and OpenGL 3 found in the <code>examples</code> folder of its repo, a simple overlay can be created. I used <code>imgui_impl_win32.h</code> and <code>imgui_impl_opengl3.h</code> for RuneScape, but this depends heavily on what your game uses.</p>
<p>During initialization of our graphics code, we also hook the game's <code>WndProc</code> callback. We use it to capture keyboard and mouse events and direct them to ImGui. It also lets us toggle the overlay and take focus away from the game while the overlay is shown. </p>
<pre><code class="language-cpp">// imgui_impl_win32.h only declares this handler inside an #if 0 block,
// so it has to be forward declared manually
extern IMGUI_IMPL_API LRESULT ImGui_ImplWin32_WndProcHandler(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam);

HWND hGameWindow;
WNDPROC hGameWindowProc;
bool menuShown = true;

LRESULT CALLBACK windowProc_hook(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
    // Toggle the overlay using the delete key
    if (uMsg == WM_KEYDOWN && wParam == VK_DELETE) {
        menuShown = !menuShown;
        return false;
    }

    // If the overlay is shown, direct input to the overlay only
    if (menuShown) {
        ImGui_ImplWin32_WndProcHandler(hWnd, uMsg, wParam, lParam);
        return true;
    }

    // Otherwise call the game's wndProc function
    return CallWindowProc(hGameWindowProc, hWnd, uMsg, wParam, lParam);
}

// extern "C" so the assembly trampoline can call it by its unmangled name
extern "C" void wglSwapBuffers_hook(HDC hDc)
{
    // Initialize glew and imgui but only once
    static bool imGuiInitialized = false;
    if (!imGuiInitialized) {
        imGuiInitialized = true;

        // Get the game's window from its device context
        hGameWindow = WindowFromDC(hDc);

        // Overwrite the game's wndProc function
        hGameWindowProc = (WNDPROC)SetWindowLongPtr(hGameWindow,
            GWLP_WNDPROC, (LONG_PTR)windowProc_hook);

        // Init glew, create imgui context, init imgui
        glewInit();
        ImGui::CreateContext();
        ImGui_ImplWin32_Init(hGameWindow);
        ImGui_ImplOpenGL3_Init();
        ImGui::StyleColorsDark();
        ImGui::GetStyle().AntiAliasedFill = false;
        ImGui::GetStyle().AntiAliasedLines = false;
        ImGui::CaptureMouseFromApp();
        ImGui::GetStyle().WindowTitleAlign = ImVec2(0.5f, 0.5f);
    }

    // If the menu is shown, start a new frame and draw the demo window
    if (menuShown) {
        ImGui_ImplOpenGL3_NewFrame();
        ImGui_ImplWin32_NewFrame();
        ImGui::NewFrame();
        ImGui::ShowDemoWindow();
        ImGui::Render();

        // Draw the overlay
        ImGui_ImplOpenGL3_RenderDrawData(ImGui::GetDrawData());
    }
}</code></pre>
<h2>Conclusion</h2>
<p>While it's pretty simple to inject a DLL, there are a lot of things that can go wrong. Here are some of the issues I faced when writing this and how I got around them:</p>
<h3>LoadLibraryA fails with Access Denied</h3>
<p>This happened after adding GLEW to the DLL. I tried to dynamically link to glew.dll by loading it in my own DLL. This did not work because my DLL then depended on GLEW already being loaded. I fixed it by simply linking GLEW statically.</p>
<h3>Calling any function in the hook causes a segfault</h3>
<p>This was because I forgot to push/pop one register in the trampoline, causing the context to be tainted
when returning to the original function. I also originally didn't replicate the instruction I overwrote with the jump, which corrupted the stack when the hooked function tried to return.</p>
<h3>Imgui doesn't receive any mouse input</h3>
<p>StackOverflow suggested using <code>SetWindowLong</code> to overwrite the wndProc function. This did not work and my hook was never called. Switching to <code>SetWindowLongPtr</code> instead fixed the issue: in a 64-bit process, <code>SetWindowLong</code> only stores a 32-bit value, which can't hold a 64-bit function pointer.</p>
Sun, 19 Apr 2020 00:00:00 +0000
https://werwolv.net/blog/dll_injection
Getting a home-made OS running on a STM32MP1 based development board
<p><img src="/content/assets/mp1os/board.jpg" alt="" /></p>
<h1>Overview</h1>
<p>The STM32MP157C-DK2 is one of the latest development boards from STMicroelectronics. It features an STM32MP157C SoC with two Arm Cortex-A7 cores and one Cortex-M4 co-processor core. According to ST, the intended way of using this SoC is to run their custom, pre-built Linux distribution, toolchain, SDK and proprietary flash tools. Their <a href="https://www.st.com/content/st_com/en/wiki/wiki-portal.html">wiki</a> does a great job at not telling you a lot of important details because STM's Linux distro "takes care of that".
This post however deals with all the dirty details about how the entire boot process works and how to bring the DK2 board into a state where a custom kernel can be loaded.</p>
<h1>Power on</h1>
<p>When the board gets plugged in, the first thing that happens is the STPMIC1 power management IC initializing its output voltages to their default settings. According to its <a href="https://www.st.com/resource/en/datasheet/stpmic1.pdf">datasheet</a> and the <a href="https://www.st.com/content/ccc/resource/technical/layouts_and_diagrams/schematic_pack/group0/36/8e/ea/7a/ca/ca/4b/e4/mb1272-dk2-c01_schematic/files/MB1272-DK2-C01_Schematic.pdf/jcr:content/translations/en.MB1272-DK2-C01_Schematic.pdf">board schematics</a>, this means the core gets powered with 1.2V, the DDR RAM with 1.1V and the rest of the SoC peripherals with 3.3V.</p>
<p>Once the voltages have stabilized, the bootROM starts running and determines where to boot from by reading either the BOOT0 and BOOT1 pins or a value burnt into the OTP eFuses. Possible boot sources are NAND and NOR flash, eMMC, serial and SD cards. The DK2 boots from the SD card by default, as it has none of the other boot media on board.</p>
<h1>SD card content</h1>
<h2>GPT</h2>
<p>The bootROM now tries to boot from the SD card. The SD card must contain a GPT with two partitions named <code>fsbl1</code> and <code>fsbl2</code>. These are two, hopefully identical, copies of the First Stage Bootloader (FSBL) that the bootROM will later execute.</p>
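<p>To illustrate what finding <code>fsbl1</code> in the GPT involves, here is a minimal host-side Python sketch that looks up a partition by name in a raw SD card image. It follows the UEFI GPT layout (primary header in LBA 1, entry array location and entry size read from that header, partition names stored as UTF-16LE) and assumes 512-byte sectors. The function name <code>find_gpt_partition</code> is made up for this example; it is not how the bootROM itself is implemented.</p>
<pre><code class="language-python">import struct

SECTOR = 512  # assumed logical block size of the SD card

def find_gpt_partition(image: bytes, name: str):
    """Return (first_lba, last_lba) of the GPT partition called `name`, or None."""
    hdr = image[SECTOR:2 * SECTOR]              # primary GPT header lives in LBA 1
    if hdr[:8] != b"EFI PART":
        raise ValueError("no GPT header at LBA 1")
    entries_lba, = struct.unpack_from("<Q", hdr, 72)                # start of the entry array
    num_entries, entry_size = struct.unpack_from("<II", hdr, 80)    # entry count and size
    base = entries_lba * SECTOR
    for i in range(num_entries):
        entry = image[base + i * entry_size:base + (i + 1) * entry_size]
        if entry[:16] == bytes(16):             # all-zero type GUID marks an unused slot
            continue
        first_lba, last_lba = struct.unpack_from("<QQ", entry, 32)
        part_name = entry[56:128].decode("utf-16-le").rstrip("\x00")
        if part_name == name:
            return first_lba, last_lba
    return None</code></pre>
<p>For example, calling <code>find_gpt_partition</code> on the bytes of a dumped card image (any file name will do) with <code>"fsbl1"</code> and <code>"fsbl2"</code> should return the LBA ranges of the two FSBL copies.</p>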
<h2>FSBL</h2>
<p>For the bootROM to recognize and load the FSBL (or in fact any binary), it has to be wrapped in a special format. It is described as the <a href="https://wiki.st.com/stm32mpu/wiki/STM32_header_for_binary_files">STM32 header for binary files</a> and consists of a 256-byte header followed by the binary data to load.</p>
<p><img src="https://wiki.st.com/stm32mpu/nsfr_img_auth.php/4/4c/STM32_header.png" alt="STM32 header" /></p>
<p>The following Python script generates a valid header for a given FSBL binary:</p>
<pre><code class="language-python">import struct

# Raw FSBL binary to wrap (the file names used here are placeholders)
with open("fsbl.bin", "rb") as f:
    payload = f.read()

header = struct.pack("<4sQQQQQQQQIIIIIIIIIIQQQQQQQQ83xb",
    b"STM\x32",                                      # Header magic
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  # ECDSA signature, unsigned here
    sum(payload) & 0xFFFFFFFF,                       # Checksum: sum of all payload bytes, truncated to 32 bits
    0x00010000,                                      # Header version 1.0
    len(payload),                                    # Length of payload
    0x2FFC0000 + 0x2400 + 0x100,                     # Entrypoint address. SYSRAM + 0x2400 (BROM data) + 0x100 (header)
    0x00,                                            # Reserved
    0x2FFC0000 + 0x2400 + 0x100,                     # Load address of image, unused
    0x00,                                            # Reserved
    0x00,                                            # Image version
    0x01,                                            # Option flags, disable signature verification
    0x01,                                            # ECDSA algorithm set to P-256 NIST, unused
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  # ECDSA public key, unused here
    0x00                                             # Binary type: U-Boot
)
assert len(header) == 0x100  # the header must be exactly 256 bytes

with open("fsbl.stm32", "wb") as f:
    f.write(header + payload)</code></pre>
<p>Some information about the header:</p>
<ul>
<li>Since the SoC is not in production mode yet, the ECDSA signature is optional and not used here (bit 0 of the option flags is set)</li>
<li>The checksum is calculated by summing up all bytes of the payload modulo 2<sup>32</sup>, i.e. keeping only the lower 32 bits of the sum</li>
<li>The binary always gets loaded to address <code>0x2FFC'2400</code>, no matter what load address was specified in the header.
<ul>
<li>The STM32MP1's so called <code>SYSRAM</code> starts at address <code>0x2FFC'0000</code></li>
<li>The bootROM will use the first <code>0x2400</code> bytes of the <code>SYSRAM</code> for its own data segment. Besides the bootROM's execution data, it also stores multiple structs in there containing various boot information such as boot device, retries, etc.</li>
</ul></li>
<li>The entry point value is the 32 bit address the bootROM will jump to if validation succeeded.
<ul>
<li>For this binary, execution should start at the beginning of the <code>.text</code> segment which is located directly after the header. Therefore the entry point is at <code>0x2FFC0000 + 0x2400 + 0x100</code> (<code>addressof(SYSRAM) + sizeof(.data_bootROM) + sizeof(STM32Header)</code>)</li>
</ul></li>
</ul>
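<p>The header rules above can also be checked on the host before flashing. The following Python sketch reads back the fields written by the header script above and verifies the magic, the payload length and the 32-bit byte-sum checksum. It is only an assumed approximation of what the bootROM checks (it ignores signatures and option flags), and <code>check_stm32_image</code> is a hypothetical helper name.</p>
<pre><code class="language-python">import struct

HEADER_SIZE = 0x100
CHECKSUM_OFFSET = 4 + 64  # after the 4-byte magic and the 64-byte ECDSA signature

def check_stm32_image(image: bytes) -> dict:
    """Sanity-check an STM32 header (v1 layout) followed by its payload."""
    if len(image) < HEADER_SIZE or image[:4] != b"STM\x32":
        raise ValueError("missing STM32 header magic")
    # checksum, header version, payload length, entry point, reserved, load address
    checksum, version, length, entry, _reserved, load = struct.unpack_from(
        "<6I", image, CHECKSUM_OFFSET)
    payload = image[HEADER_SIZE:HEADER_SIZE + length]
    if len(payload) != length:
        raise ValueError("image is shorter than the header's length field")
    if (sum(payload) & 0xFFFFFFFF) != checksum:
        raise ValueError("payload checksum mismatch")
    return {"version": version, "length": length, "entry": entry, "load": load}</code></pre>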
<p>Once the image has been verified and copied and the bootROM has jumped to the start of the FSBL's <code>.text</code> section, the real fun starts.
It is important to note that the bootROM does not have a proper ELF loader or anything like it. The binary is simply <code>memcpy</code>'d into <code>SYSRAM</code>. Therefore sections like <code>.bss</code> are not zero-initialized automatically. They need to be expanded statically in the binary beforehand, or cleared by the startup code, to work properly.</p>
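<p>One way to expand <code>.bss</code> statically is to append zeros to the raw binary until it reaches the end of <code>.bss</code>, checking along the way that the image still leaves room for a stack at the top of <code>SYSRAM</code>. The sketch below assumes the payload starts at <code>0x2FFC'2500</code> as described above, that the end address of <code>.bss</code> is taken from the linker output (for example a symbol like <code>__bss_end__</code>, whose name depends on your linker script) and a hypothetical 4 KiB stack reservation.</p>
<pre><code class="language-python">SYSRAM_BASE = 0x2FFC0000
SYSRAM_END = 0x30000000                        # end of the 256 KiB SYSRAM, used as initial SP
PAYLOAD_START = SYSRAM_BASE + 0x2400 + 0x100   # after the bootROM data and the STM32 header
STACK_RESERVE = 0x1000                         # hypothetical stack budget

def pad_bss(image: bytes, bss_end: int) -> bytes:
    """Append zero bytes so the raw image also covers .bss up to bss_end."""
    if bss_end > SYSRAM_END - STACK_RESERVE:
        raise ValueError(".bss would run into the reserved stack area")
    missing = bss_end - (PAYLOAD_START + len(image))
    return image + bytes(max(missing, 0))</code></pre>
<p>The padded image is what should then be wrapped with the STM32 header, so that the length field (and therefore the copy done by the bootROM) covers the zeroed <code>.bss</code> as well.</p>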
<p>The only remaining setup is to point the <code>SP</code> at <code>0x3000'0000</code>, the end of <code>SYSRAM</code> (the stack grows downwards from there).</p>
<h2>Update 9. August 2020</h2>
<h3>Debugging with OpenOCD</h3>
<p>For most of this project, my testing cycle looked as follows:</p>
<pre><code> - Writing code
- Compiling source code with make
- Generating an MBR SD card image with genimage
- Downloading that image from my build server
- Flashing it to an SD card using balenaEtcher
- Inserting the SD card into the board
- Resetting the board
- Hoping that one of the two LEDs used for debugging would light up</code></pre>
<p>This process usually took between 2 and 3 minutes, and the only way to debug things was to extract two bits of information at a time using the two on-board LEDs. Luckily, those days are over thanks to the amazing <a href="https://github.com/ntfreak/openocd">OpenOCD</a> project. OpenOCD is open-source on-chip debugging software that can talk to many different debug interfaces such as JTAG and ST-Link. Additionally, it integrates a GDB server which can be used to control, program and debug the connected SoC through GDB. This ultimately allows for one-click building, flashing and debugging right within VSCode:</p>
<p><img src="/content/assets/mp1os/vscode_debugging.png" alt="VSCode GDB debugging" /></p>
<p>Getting OpenOCD to run properly is really easy thanks to a bunch of pre-made scripts. </p>
<pre><code class="language-sh"> $ openocd -f $OPENOCD_SCRIPTS/board/stm32mp15x_dk2.cfg -c "gdb_flash_program enable"</code></pre>
<p>The <code>-f</code> argument loads the debug config for the DK2 board. When using a custom board with an MP1 on it, <code>-f $OPENOCD_SCRIPTS/target/stm32mp15x.cfg -f $OPENOCD_SCRIPTS/interface/stlink-dap.cfg</code> can be used instead. At the time of writing, these scripts are not in the latest release and require OpenOCD to be built from source (or installed from the AUR package <code>openocd-git</code>). <code>-c "gdb_flash_program enable"</code> is necessary for GDB to be able to flash the executable to the board.</p>
<p>Using the <code>Native Debug</code> extension for VSCode, the following task and launch configs can be used to load the FSBL ELF executable directly into SYSRAM, execute it and start debugging:</p>
<pre><code class="language-json">// launch.json
{
"version": "0.2.0",
"configurations": [
{
"name": "Debug",
"type": "gdb",
"request": "launch",
"cwd": "${workspaceRoot}",
"target": "${workspaceRoot}/fsbl/build/fsbl.elf",
"gdbpath" : "/bin/arm-none-eabi-gdb",
"preLaunchTask": "build", // Before debugging, build changes
"autorun": [
"target remote tcp:localhost:3333", // Connect to OpenOCD's gdb-server
"load ./fsbl/build/fsbl.elf", // Write fsbl.elf's sections into the target's memory (SYSRAM)
"file ./fsbl/build/fsbl.elf", // Load symbols from fsbl.elf
"b main", // Set a breakpoint at the start of main()
"j _start" // Jump to _start, the beginning of the crt0
]
}
]
}</code></pre>
<pre><code class="language-json">// tasks.json
{
"version": "2.0.0",
"tasks" : [
{
"label": "build",
"type": "shell", // Run a command
"command": "make", // Run make
"problemMatcher": []
}
]
}</code></pre>
<p>All of this allows for instant loading of changes without having to go through the SD card at all. Additionally, variables can be inspected, breakpoints set, code stepped, the stack trace inspected and more. It cuts the dev cycle from 2 to 3 minutes down to 20 seconds, with tons of cool extra features on top.</p>
<h3>LEDs - Initializing GPIOs</h3>
<p>The easiest way to show a sign of life is always to light up an LED. The STM32MP157C-DK2 conveniently has two user LEDs on board, a blue one and an orange one. Checking the board schematics shows that they are connected to GPIO pins PD11 and PH7 respectively.</p>
<p><img src="/content/assets/mp1os/schematic_leds.png" alt="LEDs" />
<img src="/content/assets/mp1os/schematic_led_y.png" alt="Orange LED" />
<img src="/content/assets/mp1os/schematic_led_b.png" alt="Blue LED" /></p>
<p>Setting up the GPIO pins requires the following steps.</p>
<p>First, the GPIO clock needs to be enabled. This is done by setting the respective bit in the right <code>RCC_MP_AXXXENSETR</code> register. The reference manual shows that <code>RCC_MP_AHB4ENSETR</code> contains the enable bits for the GPIOA to GPIOK clocks.</p>
<p>Next, the GPIO pins need to be configured. The relevant registers for this are GPIOX_MODER, GPIOX_OTYPER, GPIOX_OSPEEDR and GPIOX_PUPDR. </p>
<p><strong>GPIOX_MODER</strong> contains the mode of every pin in the port. Since the pin needs to be able to drive an LED, its mode should be set to Output.</p>
<p><strong>GPIOX_OTYPER</strong> contains the output type of each pin, which may be either Push-Pull or Open-Drain. Looking at the schematics once again, the LED's anode is connected to the GPIO pin and the cathode directly to ground.
This means that to light the LED up, current needs to flow out of the GPIO pin, through the LED into ground. This can only be achieved using the Push-Pull configuration, since an Open-Drain output can only pull the pin low.</p>
<p><strong>GPIOX_OSPEEDR</strong> sets the output speed (slew rate) of the pin. Higher settings produce faster edges, which cause higher current spikes and possible reflections if the line is not properly matched. For an LED, none of this really matters, so it can safely be set to Low or Medium speed.</p>
<p>Finally, <strong>GPIOX_PUPDR</strong> defines whether a pull-up, a pull-down or no resistor at all should be used. In Push-Pull mode this is generally unwanted, so it should be set to no pull-up/pull-down.</p>
<p>This finishes the configuration of the GPIO pin. To make the LED light up, the bit corresponding to that pin in the <strong>GPIOX_ODR</strong> register needs to be set, which drives the pin to VDD. And that's it!</p>
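<p>The register values described above boil down to a bit of field arithmetic. The Python sketch below models a GPIO port as a plain dictionary instead of touching real hardware, so it only demonstrates the read-modify-write logic; on the board, the same operations would be performed on the memory-mapped registers (e.g. through <code>volatile</code> pointers in C). It assumes the usual STM32 encodings (<code>MODER</code> <code>0b01</code> = output, <code>OTYPER</code> <code>0</code> = Push-Pull, <code>OSPEEDR</code> <code>0b00</code> = Low, <code>PUPDR</code> <code>0b00</code> = no pull) and that GPIOA to GPIOK map to bits 0 to 10 of <code>RCC_MP_AHB4ENSETR</code>; double-check both against the reference manual.</p>
<pre><code class="language-python">LED_PINS = {"D": 11, "H": 7}  # blue LED on PD11, orange LED on PH7

def set_field(reg: int, pin: int, width: int, value: int) -> int:
    """Replace the `width`-bit field belonging to `pin` in a 32-bit register value."""
    mask = (1 << width) - 1
    shift = pin * width
    return (reg & ~(mask << shift)) | ((value & mask) << shift)

def rcc_gpio_enable_mask(ports) -> int:
    """Bits to write to RCC_MP_AHB4ENSETR (assumed: GPIOA = bit 0 ... GPIOK = bit 10)."""
    return sum(1 << (ord(p) - ord("A")) for p in set(ports))

def configure_led_pin(port: dict, pin: int) -> None:
    port["MODER"] = set_field(port["MODER"], pin, 2, 0b01)      # general purpose output
    port["OTYPER"] = set_field(port["OTYPER"], pin, 1, 0)       # Push-Pull
    port["OSPEEDR"] = set_field(port["OSPEEDR"], pin, 2, 0b00)  # Low speed
    port["PUPDR"] = set_field(port["PUPDR"], pin, 2, 0b00)      # no pull-up / pull-down
    port["ODR"] |= 1 << pin                                     # drive the pin high, LED on</code></pre>
<p>Because <code>RCC_MP_AHB4ENSETR</code> is a set register (writing 0 to a bit has no effect), the clock enable mask can be written to it directly, without a read-modify-write.</p>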
<h1>What now?</h1>
<p>Code is now being executed, but in a very limited environment.
To get a kernel up and running, the following things are still necessary:</p>
<ul>
<li>Write an I2C driver to communicate with the STPMIC1 power management IC and increase the voltage delivered to the onboard DDR3 RAM.</li>
<li>Write a RAM interface driver to initialize and map the DDR3 RAM into memory.</li>
<li>Write an SDMMC interface driver to access the SD card again.</li>
<li>Use, for example, FatFs to load an SSBL (Second Stage Bootloader) from a FAT32 partition on the SD card.</li>
<li>Write a proper ELF loader to load and map the SSBL into the DDR RAM.</li>
<li>Write an SSBL that loads a kernel image from the SD card.</li>
<li>Write a kernel.</li>
</ul>
<p>This blog post and list will be updated as I go along and finish more of the boot chain.
The current progress can be found here: <a href="https://github.com/WerWolv/STM32MP1OS">https://github.com/WerWolv/STM32MP1OS</a></p>
<p>Update 20.08.2020: One of the professors at my university emailed me and a friend about next year's projects and asked whether we wanted to work on a project involving the STM32MP1, possibly to use it in the future to teach new students about low-level C, Embedded Linux and asymmetric multiprocessing. This not only means we don't have to worry about getting a good project next year, it also means I can continue working on the same board during my Bachelor thesis, which is absolutely amazing :D</p>
<p>Update 22.01.2021: The project we ended up doing with this board was a demo application showcasing a hardware-accelerated GUI running on the A7 cores under Linux, controlling a real-time, bare-metal firmware running on the M4 coprocessor. The coprocessor runs an RTOS which uses a PID controller to regulate the height of a ball inside a tube, using a ToF sensor to measure the ball's height and a fan to blow the ball up.
Our professor just told us that we're getting an A for the work we've done using the STM32MP157C-DK2 and has high hopes that we'll do as well during the bachelor thesis. During the bachelor thesis we'll be developing a modular development board for students to replace the current (old and incredibly large and heavy) development boards for the Embedded Systems, Linux and Android classes as well as the FPGA/SoC design in VHDL class. This is going to be amazing :)</p>
<p><br>
<br></p>
<hr>
<p><br>
<br></p>
<h1>Reverse Engineering Notes</h1>
<p>These are the notes I took while reading the wiki and reverse engineering U-Boot, ST's drivers and the DK2's schematic. Everything is here in all its unfinished and messy glory. There is some more information here about U-Boot and how it finds and loads the Linux kernel; however, this has little to do with the bare-metal OS mentioned above. It is left here in the hope that it helps people understand the thought processes I went through when looking at the official SD card image, U-Boot, the reference manual and the schematics.</p>
<h2>Boot process</h2>
<ul>
<li>
<p>MBR</p>
<ul>
<li>First (actually fourth) partition marked as active / bootable, this is the rootfs</li>
<li>CHS address : <code>0x001E'0D00</code></li>
<li>H Head: 30</li>
<li>S Sector: 13</li>
<li>C Cylinder: 0</li>
<li>LBA = (C * HPC + H) * SPT + (S - 1) = (0 * 256 + 30) * 63 + (13 - 1) = 30 * 63 + 12 = 1902
<ul>
<li>HPC: Heads per Cylinder = 256</li>
<li>SPT: Sectors per Track = 63</li>
<li>Block / Sector Size = 512</li>
</ul></li>
<li>Address of first partition: 1902 * 512 = <code>0x000E'DC00</code></li>
</ul>
</li>
<li>
<p>GPT</p>
<ul>
<li>First EFI Partition Entry called <code>fsbl1</code> = <code>"First Stage Bootloader Copy 1"</code></li>
<li>FirstLBA: <code>0x22</code> -> 0x22 * 512 = 0x4400 -> Start of first stage bootloader with custom STM32 header</li>
<li>Second EFI Partition Entry called <code>fsbl2</code> = <code>"First Stage Bootloader Copy 2"</code></li>
<li>FirstLBA: <code>0xB7</code> -> 0xB7 * 512 = 0x16E00 -> Start of safety copy of the first stage bootloader</li>
</ul>
</li>
<li>
<p>STM32 FSBL (SPL)</p>
<ul>
<li>Loaded into SYSRAM at address <code>0x2FFC'2400</code></li>
<li>Header specifies entry point at <code>0x2FFC'2500</code> -> load address + <code>0x100</code></li>
<li>STM32 header is exactly <code>0x100</code> bytes long, so execution starts off at the beginning of the binary right after the header</li>
<li>Header specifies U-boot FSBL</li>
<li>STM32MP157C-DK2 <a href="https://www.st.com/content/ccc/resource/technical/layouts_and_diagrams/schematic_pack/group0/36/8e/ea/7a/ca/ca/4b/e4/mb1272-dk2-c01_schematic/files/MB1272-DK2-C01_Schematic.pdf/jcr:content/translations/en.MB1272-DK2-C01_Schematic.pdf">Schematics</a></li>
<li>Enables BUCK3 of the STPMIC1 Power Management IC via an <a href="https://github.com/u-boot/u-boot/blob/9a8942b53d57149754e0dfc975e0d92d1afd4087/drivers/power/pmic/stpmic1.c#L118">I2C U-Boot driver</a></li>
<li>This steps down the 5V from the USB-C socket and applies it to the VDD net
<ul>
<li>Applies VDD to PDR_ON and PDR_ON_CORE (Power On Reset Enable)</li>
<li>Powers the XTAL oscillator</li>
<li>Powers up the STM32MP1's peripheral interfaces</li>
<li>Brings up the DDR RAM</li>
</ul></li>
<li>Chooses U-boot's MMC loader to load the next stage of U-Boot into the DDR RAM at <code>0xC020'0000</code> <a href="https://github.com/u-boot/u-boot/blob/5f09f9af3cc335fe6a74c031cfa0b1d8bdf4b9db/include/configs/stm32mp1.h#L52-L57">ref</a></li>
<li><a href="https://github.com/u-boot/u-boot/blob/5f09f9af3cc335fe6a74c031cfa0b1d8bdf4b9db/include/configs/stm32mp1.h#L115">Boot target device</a></li>
</ul>
</li>
<li>
<p>SSBL (U-Boot)</p>
<ul>
<li><a href="https://github.com/u-boot/u-boot/blob/c2279d784e35fa25ee3a9fa28a74a1ba545f8c1e/board/st/stm32mp1/board.c#L43">Voltage Regulators</a></li>
<li>Sets the VDD_DDR voltage to 1.35V for DDR3 RAM, 1.25V for LPDDR RAM in 32 bit mode or 1.2V for LPDDR in 16 bit mode (BUCK2)</li>
<li>Sets VTT_DDR voltage to 1.8V (LDO3)</li>
<li><a href="https://github.com/u-boot/u-boot/blob/c2279d784e35fa25ee3a9fa28a74a1ba545f8c1e/board/st/stm32mp1/board.c#L16">Enable UART</a></li>
<li>Only for debug mode</li>
<li>Enables GPIOG clocks (UART4_TX on PG11)</li>
<li>Checks the device specified <a href="https://github.com/u-boot/u-boot/blob/master/configs/stm32mp15_basic_defconfig#L60">here</a> with <a href="https://github.com/u-boot/u-boot/blob/master/configs/stm32mp15_basic_defconfig#L61">id</a> for a suitable partition</li>
<li>They specify <code>auto</code> as the partition, so it looks for the first bootable partition and, if there is none, falls back to the first valid partition. (<a href="https://github.com/u-boot/u-boot/blob/c2279d784e35fa25ee3a9fa28a74a1ba545f8c1e/disk/part.c#L583-L593">reference</a>)</li>
<li>Loads configuration from <code>rootfs:/boot/extlinux/extlinux.conf</code> found <a href="https://github.com/buildroot/buildroot/blob/master/board/stmicroelectronics/stm32mp157c-dk2/overlay/boot/extlinux/extlinux.conf">here</a>
<ul>
<li>Added <a href="https://github.com/u-boot/u-boot/blob/5f09f9af3cc335fe6a74c031cfa0b1d8bdf4b9db/include/configs/stm32mp1.h#L162">here</a></li>
<li>Specified <a href="https://github.com/u-boot/u-boot/blob/82679624f9aa6d1be733c46f3555d5166b6f5b72/include/config_distro_bootcmd.h#L425">here</a></li>
<li>It parses this file to figure out where the zImage and DTB files are located</li>
</ul></li>
<li>Loads the Kernel from <code>rootfs:/boot/zImage</code>
<ul>
<li>Loaded to <code>0xC200'0000</code> (Into DDR RAM going from <code>0xC000'0000</code> to <code>0xDFFF'FFFF</code>)</li>
<li>Configuration found <a href="https://github.com/u-boot/u-boot/blob/5f09f9af3cc335fe6a74c031cfa0b1d8bdf4b9db/include/configs/stm32mp1.h#L149">here</a></li>
</ul></li>
<li>Loads the Device Tree Blob from <code>rootfs:/boot/stm32mp157c-dk2.dtb</code></li>
<li>Jumps to Kernel</li>
<li>Cleanup</li>
<li>Switch to EL2</li>
<li><a href="https://github.com/u-boot/u-boot/blob/c2279d784e35fa25ee3a9fa28a74a1ba545f8c1e/arch/arm/lib/spl.c#L54">reference</a></li>
</ul>
</li>
<li>
<p><a href="https://github.com/STMicroelectronics/arm-trusted-firmware/tree/v2.2-stm32mp/drivers/st">Drivers</a></p>
</li>
</ul>
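<p>The CHS-to-LBA arithmetic from the MBR notes, and the GPT offsets after it, can be double-checked with a few lines of Python. The geometry constants are the ones used in the notes above (256 heads per cylinder, 63 sectors per track, 512-byte sectors); the 3-byte CHS field is decoded with the standard MBR bit packing, where the upper two bits of the sector byte hold cylinder bits 8 and 9.</p>
<pre><code class="language-python">HPC = 256          # heads per cylinder, as in the notes
SPT = 63           # sectors per track
SECTOR_SIZE = 512  # bytes per sector

def decode_chs(raw: bytes):
    """Unpack a 3-byte MBR CHS field into (cylinder, head, sector)."""
    head = raw[0]
    sector = raw[1] & 0x3F                      # bits 0-5
    cylinder = ((raw[1] & 0xC0) << 2) | raw[2]  # bits 6-7 are cylinder bits 8-9
    return cylinder, head, sector

def chs_to_lba(c: int, h: int, s: int) -> int:
    # CHS sectors are 1-based, hence the (s - 1)
    return (c * HPC + h) * SPT + (s - 1)

c, h, s = decode_chs(bytes([0x1E, 0x0D, 0x00]))
lba = chs_to_lba(c, h, s)
print(lba, hex(lba * SECTOR_SIZE))                       # 1902 0xedc00
print(hex(0x22 * SECTOR_SIZE), hex(0xB7 * SECTOR_SIZE))  # fsbl1 / fsbl2 offsets: 0x4400 0x16e00</code></pre>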
<h2>Serial Console boot Dump</h2>
<pre><code>U-Boot 2020.04 (Jul 03 2020 - 20:11:15 +0200)
CPU: STM32MP157CAC Rev.B
Model: STMicroelectronics STM32MP157C-DK2 Discovery Board
Board: stm32mp1 in basic mode (st,stm32mp157c-dk2)
Board: MB1272 Var2 Rev.C-01
DRAM: 512 MiB
Clocks:
- MPU : 650 MHz
- MCU : 208.878 MHz
- AXI : 266.500 MHz
- PER : 24 MHz
- DDR : 533 MHz
NAND: 0 MiB
MMC: STM32 SDMMC2: 0
Loading Environment from EXT4... OK
In: serial
Out: serial
Err: serial
****************************************************
* WARNING 1.5mA power supply detected *
* Current too low, use a 3A power supply! *
****************************************************
Net: eth0: ethernet@5800a000
Hit any key to stop autoboot: 0
Boot over mmc0!
switch to partitions #0, OK
mmc0 is current device
Scanning mmc 0:4...
Found /boot/extlinux/extlinux.conf
Retrieving file: /boot/extlinux/extlinux.conf
131 bytes read in 22 ms (4.9 KiB/s)
1: stm32mp157c-dk2-buildroot
Retrieving file: /boot/zImage
4171640 bytes read in 202 ms (19.7 MiB/s)
append: root=/dev/mmcblk0p4 rootwait
Retrieving file: /boot/stm32mp157c-dk2.dtb
49532 bytes read in 24 ms (2 MiB/s)
## Flattened Device Tree blob at c4000000
Booting using the fdt blob at 0xc4000000
Loading Device Tree to cfff0000, end cffff17b ... OK
Starting kernel ...
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 5.7.1 (werwolv@werwolv-vmwarevirtualplatform) (gcc version 10.1.0 (Buildroot 2020.08-git-00490-gf50086e59f), GNU ld (GNU Binutils) 2.33.1) #1 SMP PREEMPT Fri Jul 3 18:44:10 CEST 2020
[ 0.000000] CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=10c5387d
[ 0.000000] CPU: div instructions available: patching division code
[ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[ 0.000000] OF: fdt: Machine model: STMicroelectronics STM32MP157C-DK2 Discovery Board
[ 0.000000] Memory policy: Data cache writealloc
[ 0.000000] Reserved memory: created DMA memory pool at 0x10000000, size 0 MiB
[ 0.000000] OF: reserved mem: initialized node mcuram2@10000000, compatible id shared-dma-pool
[ 0.000000] Reserved memory: created DMA memory pool at 0x10040000, size 0 MiB
[ 0.000000] OF: reserved mem: initialized node vdev0vring0@10040000, compatible id shared-dma-pool
[ 0.000000] Reserved memory: created DMA memory pool at 0x10041000, size 0 MiB
[ 0.000000] OF: reserved mem: initialized node vdev0vring1@10041000, compatible id shared-dma-pool
[ 0.000000] Reserved memory: created DMA memory pool at 0x10042000, size 0 MiB
[ 0.000000] OF: reserved mem: initialized node vdev0buffer@10042000, compatible id shared-dma-pool
[ 0.000000] Reserved memory: created DMA memory pool at 0x30000000, size 0 MiB
[ 0.000000] OF: reserved mem: initialized node mcuram@30000000, compatible id shared-dma-pool
[ 0.000000] Reserved memory: created DMA memory pool at 0x38000000, size 0 MiB
[ 0.000000] OF: reserved mem: initialized node retram@38000000, compatible id shared-dma-pool
[ 0.000000] cma: Reserved 128 MiB at 0xd8000000
[ 0.000000] psci: probing for conduit method from DT.
[ 0.000000] psci: PSCIv1.0 detected in firmware.
[ 0.000000] psci: Using standard PSCI v0.2 function IDs
[ 0.000000] psci: Trusted OS migration not required
[ 0.000000] psci: SMC Calling Convention v1.0
[ 0.000000] percpu: Embedded 15 pages/cpu s30028 r8192 d23220 u61440
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 113664
[ 0.000000] Kernel command line: root=/dev/mmcblk0p4 rootwait
[ 0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.000000] Memory: 313508K/458752K available (6144K kernel code, 188K rwdata, 1540K rodata, 1024K init, 171K bss, 14172K reserved, 131072K cma-reserved, 0K highmem)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[ 0.000000] rcu: Preemptible hierarchical RCU implementation.
[ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
[ 0.000000] Tasks RCU enabled.
[ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[ 0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
[ 0.000000] random: get_random_bytes called from start_kernel+0x320/0x4b0 with crng_init=0
[ 0.000000] arch_timer: cp15 timer(s) running at 24.00MHz (virt).
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x588fe9dc0, max_idle_ns: 440795202592 ns
[ 0.000008] sched_clock: 56 bits at 24MHz, resolution 41ns, wraps every 4398046511097ns
[ 0.000024] Switching to timer-based delay loop, resolution 41ns
[ 0.000806] Console: colour dummy device 80x30
[ 0.001850] printk: console [tty0] enabled
[ 0.001905] Calibrating delay loop (skipped), value calculated using timer frequency.. 48.00 BogoMIPS (lpj=240000)
[ 0.001954] pid_max: default: 32768 minimum: 301
[ 0.002157] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.002202] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.003023] CPU: Testing write buffer coherency: ok
[ 0.003373] CPU0: update cpu_capacity 1024
[ 0.003412] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
[ 0.004120] Setting up static identity map for 0xc0100000 - 0xc0100060
[ 0.004303] rcu: Hierarchical SRCU implementation.
[ 0.004743] smp: Bringing up secondary CPUs ...
[ 0.005409] CPU1: update cpu_capacity 1024
[ 0.005420] CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
[ 0.005587] smp: Brought up 1 node, 2 CPUs
[ 0.005663] SMP: Total of 2 processors activated (96.00 BogoMIPS).
[ 0.005689] CPU: All CPU(s) started in SVC mode.
[ 0.006297] devtmpfs: initialized
[ 0.022241] VFP support v0.3: implementor 41 architecture 2 part 30 variant 7 rev 5
[ 0.022755] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[ 0.022825] futex hash table entries: 512 (order: 3, 32768 bytes, linear)
[ 0.028600] pinctrl core: initialized pinctrl subsystem
[ 0.029641] NET: Registered protocol family 16
[ 0.032494] DMA: preallocated 256 KiB pool for atomic coherent allocations
[ 0.039443] /soc/interrupt-controller@5000d000: bank0
[ 0.039494] /soc/interrupt-controller@5000d000: bank1
[ 0.039524] /soc/interrupt-controller@5000d000: bank2
[ 0.043618] stm32mp157-pinctrl soc:pin-controller@50002000: GPIOA bank added
[ 0.044033] stm32mp157-pinctrl soc:pin-controller@50002000: GPIOB bank added
[ 0.044387] stm32mp157-pinctrl soc:pin-controller@50002000: GPIOC bank added
[ 0.044727] stm32mp157-pinctrl soc:pin-controller@50002000: GPIOD bank added
[ 0.045067] stm32mp157-pinctrl soc:pin-controller@50002000: GPIOE bank added
[ 0.045391] stm32mp157-pinctrl soc:pin-controller@50002000: GPIOF bank added
[ 0.045727] stm32mp157-pinctrl soc:pin-controller@50002000: GPIOG bank added
[ 0.046052] stm32mp157-pinctrl soc:pin-controller@50002000: GPIOH bank added
[ 0.046403] stm32mp157-pinctrl soc:pin-controller@50002000: GPIOI bank added
[ 0.046513] stm32mp157-pinctrl soc:pin-controller@50002000: Pinctrl STM32 initialized
[ 0.047280] stm32mp157-pinctrl soc:pin-controller-z@54004000: GPIOZ bank added
[ 0.047335] stm32mp157-pinctrl soc:pin-controller-z@54004000: Pinctrl STM32 initialized
[ 0.058799] usbcore: registered new interface driver usbfs
[ 0.058906] usbcore: registered new interface driver hub
[ 0.059026] usbcore: registered new device driver usb
[ 0.059281] pps_core: LinuxPPS API ver. 1 registered
[ 0.059310] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <[email protected]>
[ 0.059368] PTP clock support registered
[ 0.059683] Advanced Linux Sound Architecture Driver Initialized.
[ 0.060887] clocksource: Switched to clocksource arch_sys_counter
[ 0.071787] NET: Registered protocol family 2
[ 0.072549] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 6144 bytes, linear)
[ 0.072626] TCP established hash table entries: 4096 (order: 2, 16384 bytes, linear)
[ 0.072718] TCP bind hash table entries: 4096 (order: 3, 32768 bytes, linear)
[ 0.072833] TCP: Hash tables configured (established 4096 bind 4096)
[ 0.072991] UDP hash table entries: 256 (order: 1, 8192 bytes, linear)
[ 0.073057] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear)
[ 0.073306] NET: Registered protocol family 1
[ 0.074665] workingset: timestamp_bits=30 max_order=17 bucket_order=0
[ 0.083817] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 248)
[ 0.083875] io scheduler mq-deadline registered
[ 0.083900] io scheduler kyber registered
[ 0.153081] STM32 USART driver initialized
[ 0.153532] stm32-usart 40010000.serial: IRQ index 1 not found
[ 0.153638] 40010000.serial: ttySTM0 at MMIO 0x40010000 (irq = 21, base_baud = 4000000) is a stm32-usart
[ 0.853240] printk: console [ttySTM0] enabled
[ 0.858169] stm32-usart 40010000.serial: rx dma alloc failed
[ 0.863329] stm32-usart 40010000.serial: interrupt mode used for rx (no dma)
[ 0.870383] stm32-usart 40010000.serial: tx dma alloc failed
[ 0.876091] stm32-usart 40010000.serial: interrupt mode used for tx (no dma)
[ 0.906704] random: fast init done
[ 0.908119] brd: module loaded
[ 0.912034] random: crng init done
[ 0.922703] loop: module loaded
[ 0.925951] libphy: Fixed MDIO Bus: probed
[ 0.928672] CAN device driver interface
[ 0.933588] stm32-dwmac 5800a000.ethernet: IRQ eth_wake_irq not found
[ 0.938970] stm32-dwmac 5800a000.ethernet: IRQ eth_lpi not found
[ 0.945195] stm32-dwmac 5800a000.ethernet: PTP uses main clock
[ 0.950903] stm32-dwmac 5800a000.ethernet: no reset control found
[ 0.960987] stm32-dwmac 5800a000.ethernet: User ID: 0x40, Synopsys ID: 0x42
[ 0.966577] stm32-dwmac 5800a000.ethernet: DWMAC4/5
[ 0.971617] stm32-dwmac 5800a000.ethernet: DMA HW capability register supported
[ 0.978910] stm32-dwmac 5800a000.ethernet: RX Checksum Offload Engine supported
[ 0.986294] stm32-dwmac 5800a000.ethernet: TX Checksum insertion supported
[ 0.993201] stm32-dwmac 5800a000.ethernet: Wake-Up On Lan supported
[ 0.999541] stm32-dwmac 5800a000.ethernet: TSO supported
[ 1.004858] stm32-dwmac 5800a000.ethernet: Enable RX Mitigation via HW Watchdog Timer
[ 1.012753] stm32-dwmac 5800a000.ethernet: Enabled Flow TC (entries=2)
[ 1.019296] stm32-dwmac 5800a000.ethernet: TSO feature enabled
[ 1.025188] stm32-dwmac 5800a000.ethernet: Using 32 bits DMA width
[ 1.032121] libphy: stmmac: probed
[ 1.037644] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[ 1.042821] ehci-platform: EHCI generic platform driver
[ 1.048394] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[ 1.054322] ohci-platform: OHCI generic platform driver
[ 1.061849] stm32_rtc 5c004000.rtc: IRQ index 1 not found
[ 1.065839] stm32_rtc 5c004000.rtc: alarm can't wake up the system: -6
[ 1.073068] stm32_rtc 5c004000.rtc: registered as rtc0
[ 1.077625] stm32_rtc 5c004000.rtc: setting system clock to 2000-01-01T00:00:22 UTC (946684822)
[ 1.086648] stm32_rtc 5c004000.rtc: Date/Time must be initialized
[ 1.092543] stm32_rtc 5c004000.rtc: registered rev:1.2
[ 1.097792] i2c /dev entries driver
[ 1.121614] stm32f7-i2c 40012000.i2c: can't use DMA
[ 1.129085] i2c i2c-0: Added multiplexed i2c bus 1
[ 1.133191] edt_ft5x06 0-0038: supply vcc not found, using dummy regulator
[ 1.145478] input: generic ft5x06 (11) as /devices/platform/soc/40012000.i2c/i2c-0/0-0038/input/input0
[ 1.153880] stm32f7-i2c 40012000.i2c: STM32F7 I2C-0 bus adapter
[ 1.182290] stm32f7-i2c 5c002000.i2c: can't use DMA
[ 1.186728] stpmic1 2-0033: PMIC Chip Version: 0x10
[ 1.192618] BUCK1: supplied by regulator-dummy
[ 1.198691] BUCK2: supplied by regulator-dummy
[ 1.204581] BUCK3: supplied by regulator-dummy
[ 1.210553] BUCK4: supplied by regulator-dummy
[ 1.216397] LDO1: supplied by v3v3
[ 1.221977] LDO2: supplied by regulator-dummy
[ 1.227973] LDO3: supplied by vdd_ddr
[ 1.233149] LDO4: supplied by regulator-dummy
[ 1.236660] LDO5: supplied by regulator-dummy
[ 1.243498] LDO6: supplied by v3v3
[ 1.248521] VREF_DDR: supplied by regulator-dummy
[ 1.254543] BOOST: supplied by regulator-dummy
[ 1.258117] VBUS_OTG: supplied by bst_out
[ 1.262218] SW_OUT: supplied by bst_out
[ 1.267892] input: pmic_onkey as /devices/platform/soc/5c002000.i2c/i2c-2/2-0033/5c002000.i2c:stpmic@33:onkey/input/input1
[ 1.278222] stm32f7-i2c 5c002000.i2c: STM32F7 I2C-2 bus adapter
[ 1.286594] mmci-pl18x 58005000.sdmmc: Got CD GPIO
[ 1.290542] mmci-pl18x 58005000.sdmmc: mmc0: PL180 manf 53 rev1 at 0x58005000 irq 46,0 (pio)
[ 1.325558] sdhci: Secure Digital Host Controller Interface driver
[ 1.330375] sdhci: Copyright(c) Pierre Ossman
[ 1.335820] Synopsys Designware Multimedia Card Interface Driver
[ 1.341056] sdhci-pltfm: SDHCI platform and OF driver helper
[ 1.348863] usbcore: registered new interface driver usbhid
[ 1.353093] usbhid: USB HID core driver
[ 1.357656] stm32-ipcc 4c001000.mailbox: ipcc rev:1.0 enabled, 6 chans, proc 0
[ 1.364638] OF: Can't handle multiple dma-ranges with different offsets on node(/ahb)
[ 1.373179] OF: Can't handle multiple dma-ranges with different offsets on node(/ahb)
[ 1.380360] stm32-rproc 10000000.m4: wdg irq registered
[ 1.385339] stm32-rproc 10000000.m4: failed to get pdds
[ 1.390643] remoteproc remoteproc0: m4 is available
[ 1.398481] NET: Registered protocol family 10
[ 1.402931] Segment Routing with IPv6
[ 1.405275] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[ 1.412137] NET: Registered protocol family 17
[ 1.415651] can: controller area network core (rev 20170425 abi 9)
[ 1.422004] NET: Registered protocol family 29
[ 1.426323] can: raw protocol (rev 20170425)
[ 1.430606] can: broadcast manager protocol (rev 20170425 t)
[ 1.436392] can: netlink gateway (rev 20190810) max_hops=1
[ 1.442153] ThumbEE CPU extension supported.
[ 1.446136] Registering SWP/SWPB emulation handler
[ 1.452705] stm32-dma 48000000.dma-controller: STM32 DMA driver registered
[ 1.459654] stm32-dma 48001000.dma-controller: STM32 DMA driver registered
[ 1.466288] mmc0: new high speed SDHC card at address 0007
[ 1.468382] stm32-mdma 58000000.dma-controller: STM32 MDMA driver registered
[ 1.478263] mmcblk0: mmc0:0007 SD8GB 7.42 GiB
[ 1.478435] reg11: supplied by vdd
[ 1.485844] reg18: supplied by vdd
[ 1.489331] stm32-usbphyc 5a006000.usbphyc: registered rev:1.0
[ 1.500002] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 1.506266] [drm] Initialized stm 1.0.0 20170330 for 5a001000.display-controller on minor 0
[ 1.514711] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 1.521174] GPT:247694 != 15564799
[ 1.524492] GPT:Alternate GPT header not at the end of the disk.
[ 1.530542] GPT:247694 != 15564799
[ 1.533981] GPT: Use GNU Parted to correct GPT errors.
[ 1.539199] mmcblk0: p1 p2 p3 p4
[ 1.993335] Console: switching to colour frame buffer device 60x50
[ 2.018194] stm32-display 5a001000.display-controller: fb0: stmdrmfb frame buffer device
[ 2.026848] dwc2 49000000.usb-otg: supply vusb_d not found, using dummy regulator
[ 2.034054] dwc2 49000000.usb-otg: supply vusb_a not found, using dummy regulator
[ 2.042060] dwc2 49000000.usb-otg: Configuration mismatch. dr_mode forced to host
[ 2.055120] usb33: supplied by vdd_usb
[ 2.058079] dwc2 49000000.usb-otg: DWC OTG Controller
[ 2.062719] dwc2 49000000.usb-otg: new USB bus registered, assigned bus number 1
[ 2.070208] dwc2 49000000.usb-otg: irq 42, io mem 0x49000000
[ 2.076920] hub 1-0:1.0: USB hub found
[ 2.079557] hub 1-0:1.0: 1 port detected
[ 2.084452] ehci-platform 5800d000.usbh-ehci: EHCI Host Controller
[ 2.089839] ehci-platform 5800d000.usbh-ehci: new USB bus registered, assigned bus number 2
[ 2.105114] ehci-platform 5800d000.usbh-ehci: irq 48, io mem 0x5800d000
[ 2.140919] ehci-platform 5800d000.usbh-ehci: USB 2.0 started, EHCI 1.00
[ 2.154036] hub 2-0:1.0: USB hub found
[ 2.159806] hub 2-0:1.0: 2 ports detected
[ 2.167463] ALSA device list:
[ 2.172483] No soundcards found.
[ 2.182411] EXT4-fs (mmcblk0p4): INFO: recovery required on readonly filesystem
[ 2.195322] EXT4-fs (mmcblk0p4): write access will be enabled during recovery
[ 2.496866] EXT4-fs (mmcblk0p4): recovery complete
[ 2.508272] EXT4-fs (mmcblk0p4): mounted filesystem with ordered data mode. Opts: (null)
[ 2.522013] VFS: Mounted root (ext4 filesystem) readonly on device 179:4.
[ 2.534466] usb 2-1: new high-speed USB device number 2 using ehci-platform
[ 2.548092] devtmpfs: mounted
[ 2.555314] Freeing unused kernel memory: 1024K
[ 2.562151] Run /sbin/init as init process
[ 2.691199] EXT4-fs (mmcblk0p4): re-mounted. Opts: (null)
[ 2.742529] hub 2-1:1.0: USB hub found
[ 2.748854] hub 2-1:1.0: 4 ports detected</code></pre>Fri, 24 Jul 2020 00:00:00 +0000
https://werwolv.net/blog/mp1os
https://werwolv.net/blog/mp1osBoot Process of the GCW0, RG350 and similar devices<p><img src="/content/assets/rg350_boot_process/rg350m_booting.jpg" alt="" /></p>
<h2>Introduction</h2>
<p>The GCW0 as well as the RG350 are small handheld retro emulation and homebrew devices running the OpenDingux operating system. Although the RG350 was released 6 years after the GCW0, they both use the exact same Ingenic JZ4770 SoC.
This post focuses on how the RG350's system image is structured, how the JZ4770 loads data from it and how it ultimately jumps to the OpenDingux Linux kernel.</p>
<h2>System Image Layout</h2>
<p>The layout of the system image looks as follows:</p>
<table>
<thead>
<tr>
<th>Section</th>
</tr>
</thead>
<tbody>
<tr>
<td>MBR</td>
</tr>
<tr>
<td>First Stage Bootloader</td>
</tr>
<tr>
<td>System image (containing the Kernel)</td>
</tr>
<tr>
<td>Data image (containing the rootfs)</td>
</tr>
</tbody>
</table>
<h2>Initial loading</h2>
<p>When the JZ4770 is powered up, it first starts executing its Boot ROM. The Boot ROM reads the boot select pins, which on the RG350 tell it to boot from an MMC device, in this case an SD card.</p>
<p>The Boot ROM then copies the first 0x2000 bytes of the SD card into the SoC's dcache and deinitializes the MMC interface again. To validate the data it read, the Boot ROM skips past the MBR, right to the first stage bootloader, and checks for the magic value <code>MSPL</code>. If the magic is present, the 0x2000 bytes previously loaded into dcache are copied into icache and the Boot ROM jumps there. This works because the JZ4770's cache is mapped into the address space at address <code>0x8000'0000</code>.</p>
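<p>The Boot ROM's validity check can be modelled in a few lines. This is a minimal sketch, not the Boot ROM's actual code. It assumes that the magic lies directly after the 0x200-byte MBR, as the branch described below implies, and that it appears on the card as the ASCII bytes <code>MSPL</code>. Because the magic is stored in little endian, the real on-disk byte order may differ. The names <code>boot_block_is_valid</code>, <code>kBootBlockSize</code>, <code>kMbrSize</code> and <code>kMagic</code> are hypothetical.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical constants based on the description above.
constexpr std::size_t kBootBlockSize = 0x2000;    // bytes copied into dcache
constexpr std::size_t kMbrSize = 0x200;           // the magic follows the MBR
constexpr char kMagic[4] = {'M', 'S', 'P', 'L'};  // assumed on-disk byte order

static_assert(kMbrSize + sizeof(kMagic) <= kBootBlockSize,
              "the magic must lie inside the block the Boot ROM reads");

// Returns true if the block read from the SD card carries the MSPL magic
// right after the MBR, i.e. the Boot ROM would copy it to icache and jump.
bool boot_block_is_valid(const std::vector<std::uint8_t>& block) {
    // Precondition: the Boot ROM always reads the full 0x2000 bytes.
    assert(block.size() >= kBootBlockSize);
    return std::memcmp(block.data() + kMbrSize, kMagic, sizeof(kMagic)) == 0;
}
```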
<h2>MBR and bootstrapper</h2>
<p>The icache now holds the MBR immediately followed by the first stage bootloader, and the program counter points to address <code>0x8000'0000</code>, the start of the MBR. In memory, it looks as follows:</p>
<p><img src="/content/assets/rg350_boot_process/mbr_layout.png" alt="MBR and FSBL" /></p>
<p>The MBR starts with the so-called bootstrap code, a 440-byte section containing code that sets up the environment and jumps into the actual first stage bootloader. On the GCW0 and the RG350, this is a single instruction: <code>80 00 00 10</code>, the MIPS instruction <code>B PC+#0x204</code>, which jumps past the MBR (0x200 bytes) and the <code>MSPL</code> magic (4 bytes). Congratulations, we're now in our bootloader!</p>
<p><img src="/content/assets/rg350_boot_process/MBR_and_FSBL.png" alt="MBR and FSBL" />
The branch instruction is highlighted in red and the <code>MSPL</code> magic (in little endian) in blue.</p>
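<p>You can check the arithmetic behind <code>B PC+#0x204</code> by decoding the word by hand. Read as a little-endian 32-bit word, <code>80 00 00 10</code> is <code>0x1000'0080</code>. This is opcode 4 (<code>BEQ</code>) comparing <code>$zero</code> with itself, the standard encoding of an unconditional branch, with an immediate of 0x80 instruction words. MIPS branch targets are relative to the instruction following the branch, so the target is 4 + 0x80 × 4 = 0x204 bytes past the branch. The sketch below performs that decoding. The helper names are illustrative.</p>

```cpp
#include <cassert>
#include <cstdint>

// Assemble a 32-bit word from four bytes stored in little-endian order.
std::uint32_t load_le32(const std::uint8_t b[4]) {
    return static_cast<std::uint32_t>(b[0]) |
           static_cast<std::uint32_t>(b[1]) << 8 |
           static_cast<std::uint32_t>(b[2]) << 16 |
           static_cast<std::uint32_t>(b[3]) << 24;
}

// Fields of a MIPS I-type instruction.
struct MipsIType {
    std::uint32_t opcode, rs, rt;
    std::int32_t imm;  // sign-extended 16-bit immediate
};

MipsIType decode_itype(std::uint32_t word) {
    return {word >> 26, (word >> 21) & 0x1F, (word >> 16) & 0x1F,
            static_cast<std::int16_t>(word & 0xFFFF)};
}

// Byte offset of a BEQ's target relative to the branch itself: targets are
// relative to the following instruction (PC + 4) and the immediate counts
// 4-byte instruction words.
std::int32_t beq_target_offset(std::uint32_t word) {
    const MipsIType i = decode_itype(word);
    assert(i.opcode == 0x04);  // BEQ
    return 4 + i.imm * 4;
}
```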
<h2>Bootloader startup code</h2>
<p>The first thing the bootloader does is disable all interrupts and clear their flags. This is important because no interrupt handlers are in place at this point, so any interrupt would make the processor jump to an invalid address. The Linux kernel re-enables interrupts later on. Next, the bootloader sets up the stack by pointing the stack pointer at a free area of memory. C function calls and local variables need a stack, and so does the next step: the jump to <code>main()</code>.</p>
<h2>Bootloader</h2>
<p>Execution has now reached the actual loader code of the bootloader. First, the bootloader parses the MBR, which can still be found at address <code>0x8000'0000</code> in the memory-mapped dcache, to find the offsets and sizes of the Linux kernel and rootfs partitions. It then initializes the MMC interface once again and reads the Linux kernel from the SD card into <code>KSEG1</code>, a 512 MB region of the address space that is not cached. This is very important because the code currently executing still lives inside the dcache and icache. Any read or write to a cacheable region would go through the cache and evict its contents, overwriting the bootloader with garbage data and corrupting the boot environment.</p>
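<p>The MBR parsing and the <code>KSEG1</code> addressing described above can be sketched as follows. The sketch assumes a classic MBR partition table, with four 16-byte entries at offset <code>0x1BE</code> and the signature <code>0x55 0xAA</code> at <code>0x1FE</code>. That is the standard layout, but this post does not confirm it for this bootloader. The post also does not say which entry holds the kernel and which holds the rootfs, so all four are returned. The second helper shows the MIPS32 <code>KSEG1</code> mapping: a fixed, uncached window at <code>0xA000'0000</code> onto the first 512 MB of physical memory. All function names are hypothetical.</p>

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// One primary partition entry of a classic MBR.
struct Partition {
    std::uint8_t type;       // partition type byte
    std::uint32_t firstLba;  // first sector (512-byte sectors)
    std::uint32_t sectors;   // length in sectors
};

static std::uint32_t le32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) | static_cast<std::uint32_t>(p[1]) << 8 |
           static_cast<std::uint32_t>(p[2]) << 16 | static_cast<std::uint32_t>(p[3]) << 24;
}

// Parses the four primary entries of a 512-byte MBR; returns nothing if the
// 0x55 0xAA boot signature is missing.
std::optional<std::array<Partition, 4>> parse_mbr(const std::uint8_t* mbr) {
    if (mbr[0x1FE] != 0x55 || mbr[0x1FF] != 0xAA) return std::nullopt;
    std::array<Partition, 4> parts{};
    for (std::size_t i = 0; i < parts.size(); ++i) {
        const std::uint8_t* entry = mbr + 0x1BE + 16 * i;
        parts[i] = {entry[4], le32(entry + 8), le32(entry + 12)};
    }
    return parts;
}

// KSEG1 maps physical addresses 0x0000'0000-0x1FFF'FFFF uncached at
// 0xA000'0000, so loads into this window bypass the caches.
std::uint32_t kseg1_address(std::uint32_t phys) {
    assert(phys < 0x20000000u);  // KSEG1 only covers the first 512 MB
    return phys | 0xA0000000u;
}
```

A partition's byte offset on the card is <code>firstLba * 512</code>.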
<p>Once the Linux kernel is loaded into memory, the bootloader sets up the Linux configuration struct so that Linux later knows, for example, where to load the rootfs from.
After this, the bootloader is done and jumps to the address the kernel was loaded at, using a function call that passes in the kernel parameter struct.</p>
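<p>Jumping to the kernel through a function call comes down to casting the load address to a function pointer and calling it. Below is a minimal sketch. The post does not describe the layout of the kernel parameter struct, so <code>KernelParams</code> and its fields are hypothetical placeholders, as is <code>jump_to_kernel</code>.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the kernel parameter struct; its real layout
// is not described here.
struct KernelParams {
    const char* cmdline;      // hypothetical: kernel command line
    std::uint32_t rootfsLba;  // hypothetical: where the rootfs starts
};

using KernelEntry = void (*)(const KernelParams*);

// Treats the address the kernel was loaded to as a function and calls it,
// handing over the parameter struct. On real hardware this never returns.
void jump_to_kernel(std::uintptr_t loadAddress, const KernelParams* params) {
    assert(loadAddress != 0);
    const auto entry = reinterpret_cast<KernelEntry>(loadAddress);
    entry(params);
}
```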
<p>From here on, the boot process is the same as on any other Linux system and is described in detail <a href="https://www.embeddedrelated.com/showarticle/59.php">here</a>.</p>
<h2>Thanks to</h2>
<ul>
<li><strong>pcercuei</strong> for answering many of my questions about the system</li>
<li><strong>circuits</strong> for answering many more of my questions and providing a pretty title image for this post</li>
<li><strong>All the nice people on the Retro Gaming Handhelds Discord</strong> for being awesome to talk to</li>
</ul>Fri, 05 Jun 2020 00:00:00 +0000
https://werwolv.net/blog/rg350_boot_process
https://werwolv.net/blog/rg350_boot_processReverse Engineering the Surface Book 2's proprietary IOCTL commands<p><img src="/content/assets/surface_ioctl/surface.jpg" alt="Surface Book 2" /></p>
<h2>Overview</h2>
<p>The Surface Book 2 is one of Microsoft's in-house notebooks. What sets it apart from other laptops is its deep integration of the drawing pen into Windows and the ability to detach the entire screen from its base, either by pressing a button on the keyboard or by using the pre-installed <code>SurfaceDTX</code> tool.</p>
<h2>Communication with hardware devices</h2>
<p>Windows and many other operating systems run on a huge variety of hardware, so it's impossible to bundle support for every device directly into the kernel. Userspace programs, however, may still want to communicate with hardware installed in the computer. Instead of adding custom system calls for every device ever built, most OSes support loading kernel extensions at runtime (kernel modules on Linux, device drivers on Windows). They also provide a unified way to communicate with these extensions, called <code>ioctl</code>.</p>
<p>ioctl and device drivers are necessary in the first place for security reasons. On startup, all hardware devices on or connected to the computer's mainboard are mapped into the kernel's address space. They have to be controlled from there, using extensions that also live in the kernel's address space. Userspace applications cannot access the kernel's address space directly, so the kernel can allow access to certain devices through syscalls while denying access to others.</p>
<p>The strength of <code>ioctl</code> is its simplicity. On Windows, a single syscall is used: <code>NtDeviceIoControlFile</code>, along with its wrapper function <code>DeviceIoControl</code>. It takes the following arguments:</p>
<ul>
<li>A <code>HANDLE</code> to the device, usually obtained by using the <code>NtCreateFile</code> syscall</li>
<li>A control code describing the operation the device driver should execute. These codes consist of several fields:
<ul>
<li>Device type (CD-ROM, mouse, network, printer, etc.). Values of <code>0x8000</code> or greater are vendor-specific devices (for example the latch mechanism on the Surface Book 2)</li>
<li>Function code describing the command the device driver should execute. Each driver defines its own values; vendor-defined function codes start at <code>0x800</code>.</li>
<li>Transfer type specifying how data is passed between the caller and the driver (buffered, direct I/O, or neither)</li>
<li>Required access for operations, either read-only, write-only or read-write</li>
</ul></li>
<li>A pointer to an in-buffer sent to the device driver</li>
<li>The size of the in-buffer</li>
<li>A pointer to an out-buffer which will be filled with data returned from the driver</li>
<li>The size of the out-buffer</li>
<li>A pointer to a <code>uint32_t</code> that receives the number of bytes returned</li>
<li>An optional pointer to an <code>OVERLAPPED</code> struct for asynchronous operations</li>
</ul>
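<p>The four control-code fields are packed into a single 32-bit value. The helper below reproduces the bit layout of the Windows <code>CTL_CODE</code> macro so you can check the encoding without the Windows headers. The layout is: device type in bits 16 to 31, required access in bits 14 and 15, function code in bits 2 to 13, and transfer type in bits 0 and 1. <code>ctl_code</code> and <code>split_ctl_code</code> are illustrative names. The constants mirror <code>METHOD_BUFFERED</code>, <code>FILE_READ_ACCESS</code> and <code>FILE_WRITE_ACCESS</code>.</p>

```cpp
#include <cassert>
#include <cstdint>

// Values of METHOD_BUFFERED, FILE_READ_ACCESS and FILE_WRITE_ACCESS.
constexpr std::uint32_t kMethodBuffered = 0;
constexpr std::uint32_t kFileReadAccess = 1;
constexpr std::uint32_t kFileWriteAccess = 2;

// Same layout as CTL_CODE(DeviceType, Function, Method, Access).
constexpr std::uint32_t ctl_code(std::uint32_t deviceType, std::uint32_t function,
                                 std::uint32_t method, std::uint32_t access) {
    return (deviceType << 16) | (access << 14) | (function << 2) | method;
}

struct ControlCodeFields {
    std::uint32_t deviceType, function, method, access;
};

// Inverse of ctl_code: extracts the individual fields again.
constexpr ControlCodeFields split_ctl_code(std::uint32_t code) {
    return {code >> 16, (code >> 2) & 0xFFF, code & 0x3, (code >> 14) & 0x3};
}

// The latch command code that appears in the latch tool's source.
static_assert(ctl_code(0x8000, 2065, kMethodBuffered, kFileWriteAccess) == 0x8000A044,
              "vendor device type 0x8000, function 0x811, write access");
```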
<p>When <code>DeviceIoControl</code> is called, a syscall handler in the kernel uses the passed device handle to find the device driver responsible for it. For buffered transfers, the in-buffer is then copied from user space into kernel space. The kernel then calls the driver's <code>DEVICE_CONTROL</code> callback with the control code and pointers to the in- and out-buffers. The control code tells the driver which operation to execute, the in-buffer carries the parameters and the out-buffer receives any returned values.</p>
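<p>The driver side of this can be illustrated with a user-mode simulation. The sketch below is not real Windows driver code, since a real driver receives an IRP and relies on the kernel's buffer handling. <code>HypotheticalDevice</code>, <code>device_control</code> and both control codes are made up for illustration. The sketch only shows the pattern described above: a switch over the control code, parameters read from the in-buffer, and results written to the out-buffer together with the number of bytes returned.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Made-up control codes, encoded like CTL_CODE(0x8000, 0x800 / 0x801,
// METHOD_BUFFERED, FILE_READ_ACCESS / FILE_WRITE_ACCESS).
constexpr std::uint32_t kIoctlGetVersion = 0x80006000;
constexpr std::uint32_t kIoctlSetTimeout = 0x8000A004;

struct HypotheticalDevice {
    std::uint32_t timeoutMs = 0;
};

// Simulated DEVICE_CONTROL callback. Returns false for unknown control
// codes or buffers that are too small.
bool device_control(HypotheticalDevice& dev, std::uint32_t code,
                    const void* in, std::uint32_t inSize,
                    void* out, std::uint32_t outSize,
                    std::uint32_t* bytesReturned) {
    *bytesReturned = 0;
    switch (code) {
    case kIoctlGetVersion: {  // no input, 4 bytes of output
        const std::uint32_t version = 1;
        if (out == nullptr || outSize < sizeof(version)) return false;
        std::memcpy(out, &version, sizeof(version));
        *bytesReturned = sizeof(version);
        return true;
    }
    case kIoctlSetTimeout:  // 4 bytes of input, no output
        if (in == nullptr || inSize < sizeof(dev.timeoutMs)) return false;
        std::memcpy(&dev.timeoutMs, in, sizeof(dev.timeoutMs));
        return true;
    default:
        return false;
    }
}
```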
<h2>REing the Latch driver</h2>
<p>There are two possible approaches to finding out how the latch driver works. Either we reverse engineer the device driver directly and analyze its <code>DEVICE_CONTROL</code> callback, or we analyze the latch control tool Microsoft already ships to find the correct driver and control codes.</p>
<p>I decided to go for the latter, since the tool was trivial to find (simply by looking at the Task Manager). Even better, it was written in C# with full symbol information. To analyze the .NET application, I used <code>JetBrains dotPeek</code>. Simply browsing through the different namespaces in dotPeek quickly led me to a promising class, <code>DriverLatch</code>, in <code>DriverLatch.cs</code>.</p>
<p><img src="/content/assets/surface_ioctl/dotPeek_namespaces.png" alt="Namespaces" /></p>
<p>Conveniently, at the very start of the file, the latch interface GUID and all the different control codes were specified.</p>
<pre><code class="language-csharp">private static readonly Guid g_latchInterfaceId = new Guid("f49e75f6-f869-4346-9eb8-ded248275916");
private static readonly IOControlCode g_latchCommandIoctl = new IOControlCode((ushort) 32768, (ushort) 2065, (IOControlAccessMode) 2, (IOControlBufferingMethod) 0);
private static readonly IOControlCode g_latchChangedIoctl = new IOControlCode((ushort) 32768, (ushort) 2066, (IOControlAccessMode) 1, (IOControlBufferingMethod) 0);
private static readonly IOControlCode g_latchStatusIoctl = new IOControlCode((ushort) 32768, (ushort) 2064, (IOControlAccessMode) 1, (IOControlBufferingMethod) 0);
private static readonly IOControlCode g_detachChangedIoctl = new IOControlCode((ushort) 32768, (ushort) 2067, (IOControlAccessMode) 1, (IOControlBufferingMethod) 0);
private static readonly IOControlCode g_detachStateIoctl = new IOControlCode((ushort) 32768, (ushort) 2068, (IOControlAccessMode) 1, (IOControlBufferingMethod) 0);</code></pre>
<p>The only interesting one here, though, is <code>g_latchCommandIoctl</code>. The <code>*ChangedIoctl</code> control codes are callbacks, and the <code>*StatusIoctl</code> and <code>*StateIoctl</code> control codes query information about the current latch state.</p>
<p>Looking further through the class led to a method conveniently named <code>void OpenLatch(uint cancelAfterMs)</code>. It does exactly what the name implies: it sends an ioctl command through the .NET Windows API that opens the latch. It does not return any values, but it takes a struct of data as its input buffer:</p>
<pre><code class="language-csharp">private enum LatchCommandType
{
Invalid,
Open,
Close_DEPRECATED,
ButtonPress,
Cancel,
MaximumValue,
}
[StructLayout(LayoutKind.Sequential, Pack = 1)]
private struct LatchCommandInArgs
{
public DriverLatch.LatchCommandType LatchCommand;
public uint TimeoutMs;
}</code></pre>
<p>Again, very conveniently labeled :)</p>
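<p>C# enums use <code>int</code> as their underlying type by default. Combined with <code>Pack = 1</code>, this makes the in-buffer simply two little-endian 32-bit values back to back: 8 bytes with no padding. Here is a minimal sketch that builds this byte image by hand. The function name is illustrative.</p>

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Mirrors the C# enum; its default underlying type is int (4 bytes).
enum class LatchCommandType : std::uint32_t {
    Invalid, Open, Close_DEPRECATED, ButtonPress, Cancel, MaximumValue
};

// Byte image of LatchCommandInArgs with Pack = 1: LatchCommand followed by
// TimeoutMs, each as a little-endian 32-bit value.
std::array<std::uint8_t, 8> serialize_latch_args(LatchCommandType cmd, std::uint32_t timeoutMs) {
    std::array<std::uint8_t, 8> buf{};
    const std::uint32_t fields[2] = {static_cast<std::uint32_t>(cmd), timeoutMs};
    for (int f = 0; f < 2; ++f)
        for (int b = 0; b < 4; ++b)
            buf[f * 4 + b] = static_cast<std::uint8_t>(fields[f] >> (8 * b));
    return buf;
}
```

For example, a button press with a 5000 ms timeout becomes <code>03 00 00 00 88 13 00 00</code>.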
<h2>Device Interface File</h2>
<p>To send data to the driver, we need a handle, which the <code>NtCreateFile</code> syscall returns. The question is how to find the device's path. I couldn't figure this out at first, and Microsoft's documentation didn't help much either. The approach I came up with is sadly not very elegant, but it did the trick. The path always contains the GUID found earlier in the source code. For the .NET tool to communicate with the driver, it must also hold the full path somewhere in memory. So why not use Cheat Engine's string search to look for the GUID string in memory, then look around a bit to find the rest of the path? One thing to keep in mind: since this is a .NET application, all strings are stored in UTF-16. After some fiddling around, this is what turned up:</p>
<p><img src="/content/assets/surface_ioctl/cheatEngine_devicePath.png" alt="Cheat Engine" /></p>
<p>Or in plain text: <code>\\?\ACPI#MSHW0133#2&daba3ff&1#{f49e75f6-f869-4346-9eb8-ded248275916}</code></p>
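<p>The Cheat Engine search comes down to looking for the UTF-16LE encoding of the GUID, then widening the match to the surrounding NUL-terminated string. The sketch below does the same on a plain byte buffer standing in for the process memory. It assumes 2-byte-aligned strings that contain only ASCII characters. The function names are hypothetical.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Encode an ASCII string as UTF-16LE, the in-memory format of .NET strings.
std::vector<std::uint8_t> to_utf16le(const std::string& ascii) {
    std::vector<std::uint8_t> out;
    for (unsigned char c : ascii) {
        out.push_back(c);
        out.push_back(0);
    }
    return out;
}

// Searches a memory dump for `needle` encoded as UTF-16LE and returns the
// whole NUL-delimited UTF-16 string around the match, converted to ASCII.
std::optional<std::string> find_enclosing_string(const std::vector<std::uint8_t>& dump,
                                                 const std::string& needle) {
    const std::vector<std::uint8_t> pattern = to_utf16le(needle);
    const auto hit = std::search(dump.begin(), dump.end(), pattern.begin(), pattern.end());
    if (hit == dump.end()) return std::nullopt;
    const std::size_t pos = static_cast<std::size_t>(hit - dump.begin());

    // Walk backwards one UTF-16 unit at a time until a 0x0000 unit or the start.
    std::size_t start = pos;
    while (start >= 2 && !(dump[start - 2] == 0 && dump[start - 1] == 0)) start -= 2;

    // Walk forwards until a 0x0000 unit or the end of the dump.
    std::size_t end = pos;
    while (end + 1 < dump.size() && !(dump[end] == 0 && dump[end + 1] == 0)) end += 2;

    std::string result;
    for (std::size_t i = start; i < end; i += 2) result.push_back(static_cast<char>(dump[i]));
    return result;
}
```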
<h2>Putting it all together</h2>
<p>To finish off, I wanted to write a program in C/C++ that simply unlocks the latch when executed. With all the required information extracted from the binary, this was rather trivial:</p>
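<pre><code class="language-cpp">#include <windows.h>
#include <winioctl.h>  // CTL_CODE, METHOD_BUFFERED, FILE_WRITE_ACCESS
#include <cstdint>
#include <cstdio>

enum class LatchCommandType : std::uint32_t {
    Invalid,
    Open,
    Close_DEPRECATED,
    ButtonPress,
    Cancel,
    MaximumValue
};

// Matches the C# struct: sequential layout with Pack = 1 (8 bytes, no padding).
// #pragma pack works on MSVC, GCC and Clang, unlike __attribute__((packed)).
#pragma pack(push, 1)
struct LatchCommandInArgs {
    LatchCommandType LatchCommand;
    std::uint32_t TimeoutMs;
};
#pragma pack(pop)
static_assert(sizeof(LatchCommandInArgs) == 8, "in-buffer must be 8 bytes");

// Latch command ioctl control code
const DWORD latchCommandIoctl = CTL_CODE(0x8000, 2065, METHOD_BUFFERED, FILE_WRITE_ACCESS);

int main() {
    // Open a handle to the latch device driver
    // (the instance part of the path, "2&daba3ff&1", may differ between machines)
    HANDLE ioctlLatchFile = CreateFileW(
        L"\\\\?\\ACPI#MSHW0133#2&daba3ff&1#{f49e75f6-f869-4346-9eb8-ded248275916}",
        GENERIC_READ | GENERIC_WRITE,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        nullptr,
        OPEN_EXISTING,
        FILE_ATTRIBUTE_NORMAL,
        nullptr);
    if (ioctlLatchFile == INVALID_HANDLE_VALUE) {
        std::fprintf(stderr, "CreateFileW failed: %lu\n", GetLastError());
        return 1;
    }

    // Specify the device driver arguments sent through the in-buffer
    LatchCommandInArgs args = { LatchCommandType::ButtonPress, 5000 };
    DWORD bytesReturned = 0;

    // Make the ioctl call, opening the latch
    BOOL ok = DeviceIoControl(ioctlLatchFile, latchCommandIoctl, &args, sizeof(args),
                              nullptr, 0, &bytesReturned, nullptr);
    if (!ok)
        std::fprintf(stderr, "DeviceIoControl failed: %lu\n", GetLastError());

    CloseHandle(ioctlLatchFile);
    return ok ? 0 : 1;
}</code></pre>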
<p>An open source implementation of SurfaceDTX written in C# can be found in my GitHub repository:</p>
<p><a href="https://github.com/WerWolv/SurfaceAlwaysDTX"><img src="https://github-link-card.s3.ap-northeast-1.amazonaws.com/WerWolv/SurfaceAlwaysDTX.png" class="center" width="460px"></a></p>Thu, 30 Jul 2020 00:00:00 +0000
https://werwolv.net/blog/surface_ioctl
https://werwolv.net/blog/surface_ioctlSat, 31 May 2025 03:08:47 +0000
https://werwolv.net/projects
https://werwolv.net/projectsArchwayMon, 01 Jun 2020 00:00:00 +0000
https://werwolv.net/projects/archway
https://werwolv.net/projects/archwayEdiZonSun, 01 Jul 2018 00:00:00 +0000
https://werwolv.net/projects/edizon
https://werwolv.net/projects/edizonImHexSun, 11 Oct 2020 00:00:00 +0000
https://werwolv.net/projects/imhex
https://werwolv.net/projects/imhex