[{"content":"The Const Keyword\nFor the longest time, I used the const keyword simply because it was a blanket rule. All the tutorials and blogs advised it, and they all gave their explanations, but I did not understand most of them. I just knew to use const when passing variables of certain types into a function: a const reference, to be precise.\nA few days ago, while writing C++ for a little SQLite clone, it clicked! It just made sense.\nI am assuming some familiarity with pointers and memory addresses, and that you are curious about how const relates to them.\nIn C++, if you pass a struct or an object into a function, you are passing a complete copy of it onto the function\u0026rsquo;s stack. You are passing by value, and the function now owns that copy. If the type is large, every call pays for the copy, and a big enough copy can even exhaust the stack, which is much smaller than the heap.\nstruct GameState {\n    float matrix[64]; // 256 bytes\n    char name[128];   // 128 bytes\n    int score[100];   // 400 bytes\n}; // 784 bytes total\n\nvoid process(GameState s) {} // all 784 bytes are copied onto the stack on every call!\nImagine this in a hot loop, or copied over and over across various function calls.\nOkay, good. The next question is: if I do not pass the entire value onto the function\u0026rsquo;s stack, then how do I pass things? In C++, the general advice is to pass by reference, not by value. A reference behaves like a \u0026ldquo;pointer\u0026rdquo; to a memory address: instead of passing the entire value, we pass just the address of that variable in memory. That is far cheaper: typically 8 bytes on a 64-bit system.\nvoid process(GameState\u0026amp; s) {}\n// Now you are passing just a reference to the GameState in memory (typically 8 bytes),\n// not the full 784-byte structure.\n// But like all advice, there are cases where you abandon it:\n// int, float and bool are cheap, so you generally do not need to pass them by reference.\n
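To make the cost visible, here is a small sketch (an illustration, not code from the SQLite clone) that counts copies. Tracked, copyCount, byValue and byReference are hypothetical names; the struct uses std::array so the copy constructor can copy the whole payload in one step, and it assumes a 4-byte int so the payload matches the 784 bytes of GameState.

```cpp
#include <array>
#include <cassert>

// Hypothetical counter for this demo: incremented every time a Tracked is copied.
int copyCount = 0;

// 196 ints = 784 bytes (assuming a 4-byte int), the same size as GameState.
struct Tracked {
    std::array<int, 196> payload{};

    Tracked() = default;

    // Copy constructor: copies the payload and records that a copy happened.
    Tracked(const Tracked& other) : payload(other.payload) { ++copyCount; }
};

// Pass by value: the function receives its own 784-byte copy.
int byValue(Tracked t) { return t.payload[0]; }

// Pass by reference: only an address is passed, and no copy is made.
int byReference(Tracked& t) { return t.payload[0]; }
```

Starting with copyCount set to 0, one call to byValue(state) raises it to 1, while a call to byReference(state) leaves it at 1: the reference version never copies the 784 bytes. Note that byReference could also modify state, since nothing in its signature stops it.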
The next problem is this: if I pass a reference, which is just a memory address, to the function, does that not mean the value can be mutated or modified by accident? Is it not dangerous? Is it not the very address holding the variable?\nWell, that is where the beauty of const comes in. Const simply says: you can see the value at this memory address, but you cannot modify it. You can only read it; you cannot write to it. The famously strict C++ compiler enforces this. Const is a promise that the value held at the address that was passed (\u0026amp;) will not be mutated at all. That solves the problem.\nIf you want to modify the value, simply pass a plain (non-const) reference. If you do not want to modify it and just want to read it or hand it to other functions, then use const.\nvoid process(const GameState\u0026amp; s) {\n    // you get direct access to the memory address\n    // not a single copy is made\n    // but the compiler will reject any mutation of s\n    s.score[0] = 99;           // error: s is const, so this will not compile\n    int newScore = s.score[0]; // allowed: you are just reading it\n}\n// Nothing to clean up. You had a reference to it. You never owned it. You just borrowed it.\nThink that is the end?\nAbsolutely not.\nThe next stage is this: a struct is declared, and a member function is declared inside that struct. This function tries to mutate the struct\u0026rsquo;s state, which we clearly do not want modified. What will happen?\nstruct Server {\n    int port;\n    int connectionCount;\n\n    // we declare a const member function\n    int getPort() const {\n        port = 99;   // error: a const member function promises not to change the object\u0026#39;s state\n        return port; // allowed: reading is fine\n    }\n\n    // we declare a non-const member function\n    void increaseConnectionCount() {\n        connectionCount++; // we can mutate it\n    }\n\n    void run() {\n        port = 8080;\n    }\n};\n\n// This is what it implies:\nvoid startServer(const Server\u0026amp; s) {\n    s.run();     // error: s is const but run() is not const. It breaks the promise.\n    s.getPort(); // allowed: getPort() is const, so it upholds the promise\n}\nWell, the logic holds. Since the struct passed into the function must not be modified (const Server\u0026amp; s), any member function called on it must also promise not to modify it: the function must be const. If you pass a const reference somewhere, every method called through it must also be const. If a variable has been designed not to change, why should a function called through it change it?\nThat, my friends, explains const. It is simple and logical. The same idea exists in Rust, only in reverse: in Rust, immutability is the default and you opt into mutation with mut, while in C++ you opt into immutability with the const keyword.\n","permalink":"https://franzego.github.io/posts/const-in-cpp/","summary":"\u003ch1 id=\"the-const-keyword\"\u003eThe Const Keyword\u003c/h1\u003e\n\u003cp\u003eFor the longest time, I used the const keyword simply because it was a blanket rule. All the tutorials and blogs advised it, and they all gave their explanations, but I did not understand most of them. I just knew to use const when passing variables of certain types into a function: a const reference, to be precise.\u003c/p\u003e\n\u003cp\u003eA few days ago, while writing C++ for a little SQLite clone, it clicked! It just made sense.\u003c/p\u003e","title":"Const In C++"},{"content":"Building an LSM Tree in Go: Understanding the Moving Parts\nLog-Structured Merge (LSM) Trees have become ubiquitous in today\u0026rsquo;s database world. The name has become almost synonymous with modern storage engines. 
LSM trees sit behind databases like Cassandra, CockroachDB, RocksDB, LevelDB, and Pebble.\nOne thing these systems have in common is their use in write-heavy workloads. The goal of an LSM is to ingest writes at a large scale, as quickly as possible. It sacrifices some read speed for this, because nothing is ever free. Even then, some optimizations can make that tradeoff manageable.\nTo truly understand how this works, I decided to build lsmgo, a small LSM Tree prototype written in Go for learning systems programming. The goal wasn\u0026rsquo;t to build the next RocksDB. The goal was to understand the moving parts that make an LSM-style storage engine work.\nIn this post, I want to walk through the foundational building blocks I implemented: batch writes, the WAL (Write Ahead Log), the memtable, SSTable flushes, and the manifest. This is a dive into the internals of the data structures that power this technology.\n1. Batch Writes: The Cost of Fsync\nI consider batch writes to be one of the foundations of the entire system. A learning prototype doesn\u0026rsquo;t strictly need batch writes, but I find them incredibly instructive because they clearly expose the limitation of the \u0026ldquo;one-write\u0026rdquo; approach.\nImagine the naive path for a write: one write = one WAL append + one memtable insert.\nThe problem is that each durable WAL append involves an fsync, and fsync is expensive. It forces the OS to flush the file\u0026rsquo;s buffered writes all the way to the storage device and wait for them to land. On a typical NVMe drive, that might take around 100-200 microseconds per fsync. If you do that per key, your throughput ceiling is set by the number of fsyncs you can perform, regardless of how fast everything else is: at 200 microseconds per fsync, a single writer tops out at about 5,000 writes per second.\nBatching changes the math entirely. You take 10, 100, or 1000 key-value pairs, write them all to the WAL in one sequential append, do one fsync, then apply all of them to the memtable. The fsync cost is now amortized across the entire batch: with 1000 keys per batch, each key pays for only a thousandth of an fsync.\n2. 
The Write Path and the WAL\nThe Write Ahead Log (WAL) is the guard against data loss. It exists to ensure durability.\nWhen a batch comes in, the write goes through this order:\nflowchart LR\n    A[Incoming Batch] --\u003e|Append \u0026 Fsync| B[Write Ahead Log]\n    B --\u003e|Apply| C[Active Memtable]\n    C --\u003e|Threshold Reached?| D[Rotate Memtable]\nBefore data goes into the memtable (the primary write location in memory), it first has to be persisted in the WAL. It may feel counter-intuitive to write to a slow disk before writing to fast DRAM, but this disk write is strictly sequential. The file grows at the end, streaming bytes forward, with no random jumping around. This makes the write pattern much friendlier to the disk.\nThe memtable lives only in memory, so if the server crashes at any point after the WAL fsync, even just before the data reaches the memtable, the state can be recovered by replaying the WAL. It\u0026rsquo;s a safety net that removes the ambiguity of an interrupted write.\n3. The Memtable: Memory vs. Disk Tradeoffs\nThe memtable is what makes writes (and some reads) quick. It\u0026rsquo;s a data structure held in memory that temporarily holds recent writes before they are flushed to SSTables.\nIn lsmgo, I used a skip list as the underlying data structure.\nWhy not a map? Go\u0026rsquo;s built-in maps are fast for insertion and lookup, but they are unordered. When a memtable reaches its threshold and needs to be written to an SSTable on disk, the database needs sorted iteration over all keys. A hash map can\u0026rsquo;t give you that without a full O(n log n) sort at flush time. The skip list maintains sorted order continuously as inserts happen, so flushing is just a linear scan.\nLSMs do not do in-place updates. An update is just a newer write for the same key. A delete is a special tombstone marker, not an immediate removal. 
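A minimal sketch of these memtable rules follows, written in C++ rather than Go and not taken from lsmgo. It swaps in std::map, a balanced search tree, for the skip list that lsmgo uses; both keep keys sorted as inserts arrive, which is the property flushing relies on. The Memtable class and its method names are hypothetical, and an empty std::optional plays the role of the tombstone marker.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical memtable sketch. std::map stands in for a skip list:
// both keep keys in sorted order as inserts happen.
class Memtable {
public:
    // One flushed record: the key plus either a value or a tombstone (empty optional).
    using Entry = std::pair<std::string, std::optional<std::string>>;

    // Insert or update. This sketch keeps only the newest value per key
    // (no sequence numbers); older versions already on disk stay untouched
    // and are shadowed by this one.
    void put(const std::string& key, const std::string& value) {
        entries_[key] = value;
    }

    // Delete: record a tombstone instead of erasing the key, so the deletion
    // itself gets flushed and can shadow older SSTables.
    void remove(const std::string& key) {
        entries_[key] = std::nullopt;
    }

    // Returns true if this memtable has an answer for the key: a live value,
    // or a tombstone (out is then empty). Returns false if the key is unknown
    // here, meaning the read should continue to older memtables and SSTables.
    bool lookup(const std::string& key, std::optional<std::string>& out) const {
        auto it = entries_.find(key);
        if (it == entries_.end()) {
            return false;
        }
        out = it->second;
        return true;
    }

    // Flush order: std::map iterates in key order, so producing sorted records
    // is a single linear pass. Tombstones are included; they must be flushed too.
    std::vector<Entry> flushOrder() const {
        return std::vector<Entry>(entries_.begin(), entries_.end());
    }

private:
    std::map<std::string, std::optional<std::string>> entries_;
};
```

In this sketch a lookup that hits a tombstone still returns true, because finding a deletion is an answer: the read must stop there instead of falling through to older data. A real memtable may keep several versions per key with sequence numbers; this one keeps only the newest.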
Newer writes simply shadow older writes.\nBecause of this shadowing, read operations must respect a strict sequence, starting from the newest data:\nflowchart TD\n    A[Read Request] --\u003e B[Active Memtable]\n    B -- Not Found --\u003e C[Immutable Memtables]\n    C -- Not Found --\u003e D[SSTables Newest to Oldest]\n    D -- Not Found --\u003e E[Key Does Not Exist]\nA tombstone found at any step also ends the search: the key is reported as deleted instead of being looked up in older data.\nThe core LSM tradeoff is this: accept pointer indirection in memory so disk writes can be sequential.\n4. SSTables \u0026amp; The Magic Number\nWhen a memtable reaches its threshold, it\u0026rsquo;s retired into an immutable queue. Eventually, it\u0026rsquo;s flushed to disk as an SSTable (Sorted String Table).\nUnlike the memtable, the SSTable is a durable file. This implementation writes a simple layout: [records][bloom filter][footer].\nOne important lesson here was the use of a magic number in the footer. A database file is just bytes. Without a recognizable structure, we don\u0026rsquo;t know if the file is complete, corrupt, or even the right type of file. The magic number is a fixed signature that identifies the file format, and because the footer is written last, a file that was cut off mid-write will usually be missing it too. If the expected signature is missing, it\u0026rsquo;s safe to treat the file as invalid.\nTo optimize reads, we use a Bloom filter per SSTable. Since reading from disk is expensive, the Bloom filter acts as a fast probabilistic check. It might give false positives, but it never gives false negatives. If the Bloom filter says a key isn\u0026rsquo;t there, we don\u0026rsquo;t have to scan the file. The occasional false positive, which costs one wasted file read, is a small price to pay for skipping most unnecessary disk scans.\n5. The Manifest\nFinally, the manifest is the durable list of SSTables that belong to the DB.\nAt first, it feels like writing an SSTable file should be enough. If 000001.sst exists on disk, why does the DB need anything else?\nThe OS directory can contain many files: some may be old, temporary, corrupt, or unrelated. The DB cannot just trust every file it sees. It needs a durable catalog. 
That catalog is the manifest.\nFor now, I kept the manifest deliberately simple and text-based:\nadd 1 /tmp/db/sst/000001.sst\nnext 2\nOn startup, the DB replays the manifest to rebuild its in-memory SSTable list.\nWhy Compaction Is Not Here Yet\nYou might notice something missing: compaction.\nCompaction needs a durable way to say: \u0026ldquo;Remove old SSTables A \u0026amp; B, and add new compacted SSTable C.\u0026rdquo; That means compaction depends entirely on having a robust manifest. For this milestone, I stopped after building the manifest foundation. A future compaction milestone will add remove and replace records to the manifest.\nFinal Thoughts\nBuilding lsmgo has been an incredible exercise in systems programming. It’s one thing to read about how RocksDB or LevelDB works, but implementing the sequence numbers, tombstones, WAL framing, and skip list rotations really cements the concepts.\nIf you are interested in storage engines, I highly recommend building your own toy version. The tradeoff decisions you make will teach you more than any whitepaper could. The full breakdown of the project is available here: github. Contributions and suggestions are welcome.\nGodspeed, Franz\n","permalink":"https://franzego.github.io/posts/log-structured-merge-trees/","summary":"\u003ch1 id=\"building-an-lsm-tree-in-go-understanding-the-moving-parts\"\u003eBuilding an LSM Tree in Go: Understanding the Moving Parts\u003c/h1\u003e\n\u003cp\u003e\u003cimg alt=\"LSM Tree Simple Illustration\" loading=\"lazy\" src=\"/posts/log-structured-merge-trees/lsm_tree.png\"\u003e\u003c/p\u003e\n\u003cp\u003eLog-Structured Merge (LSM) Trees have become ubiquitous in today\u0026rsquo;s database world. The name has become almost synonymous with modern storage engines. LSM trees sit behind databases like Cassandra, CockroachDB, RocksDB, LevelDB, and Pebble.\u003c/p\u003e\n\u003cp\u003eOne thing these systems have in common is their use in write-heavy workloads. 
The goal of an LSM is to ingest writes at a large scale, as quickly as possible. It sacrifices some read speed for this, because nothing is ever free. Even then, some optimizations can make that tradeoff manageable.\u003c/p\u003e","title":"Log Structured Merge Trees"},{"content":"Trying to set up a personal blog is not as straightforward as it sounds. There was no one-stop guide for setting one up with GitHub Pages. I had to google a lot, make mistakes and grill Gemini for answers. It was not especially technical or difficult, but after solving the issue, I decided to document the steps I followed for my future self and for anyone who ends up in the same position.\nHugo\nWritten in Go, Hugo treats content like source code. It takes Markdown, runs it through Go\u0026rsquo;s html/template engine, and spits out a static site in milliseconds.\nIf it means anything, I did this on Ubuntu (WSL). It is probably similar on macOS. If you are on Windows, please just use WSL; it saves a lot of headaches.\nThe Installation Trap\nIf you are on Ubuntu, your first instinct is likely sudo apt install hugo. It is a trap. The standard repositories often carry outdated binaries, frequently missing the \u0026ldquo;extended\u0026rdquo; version required to compile SCSS for modern themes.\nInstead of relying on the package manager, I pulled the standalone binary directly from the source. It is cleaner and gives you full control over which version you run. Go about it like this:\n# Pull the archive\nwget https://github.com/gohugoio/hugo/releases/download/v0.162.1/hugo_0.162.1_linux-amd64.tar.gz\n# the specific version may have changed by the time you read this.\n# if your theme needs SCSS, download the hugo_extended_ archive for the same version instead.\n\n# Extract\ntar -zxvf hugo_0.162.1_linux-amd64.tar.gz\n\n# Move the binary to your path\nsudo mv hugo /usr/local/bin/\n\n# Clean up the remnants\nrm hugo_0.162.1_linux-amd64.tar.gz README.md LICENSE\n\n# Check version\nhugo version\nHopefully the commands above went through without a hitch. 
On to the next step.\nScaffolding the System\nWith the Hugo binary in my path, initializing the architecture is trivial.\nhugo new site your-blog\ncd your-blog\ngit init\nFor the frontend, I wanted something minimalist: a layout that respects the reader\u0026rsquo;s time and focuses entirely on the typography and the code blocks. I went with PaperMod, adding it as a Git submodule so it can be updated without polluting the repository\u0026rsquo;s commit history.\ngit submodule add https://github.com/adityatelange/hugo-PaperMod.git themes/PaperMod\necho \u0026#34;theme = \u0026#39;PaperMod\u0026#39;\u0026#34; \u0026gt;\u0026gt; hugo.toml\nAfter this, create your first post:\nhugo new content content/posts/hello-world.md\nNext, create a repository on GitHub named username.github.io. Do not include a .gitignore or README.md file. Then run your git remote add origin command, followed by git branch -M main. Don\u0026rsquo;t push yet. Relax.\nAutomating the Deployment (and Fixing the Pipeline)\nThe goal was a pure \u0026ldquo;docs-as-code\u0026rdquo; pipeline: I push a Markdown file to the main branch, and a CI/CD runner handles the build and deployment to GitHub Pages.\nNext, we create a workflow file. The code for it lives in .github/workflows/hugo.yml in this repo. After that, we add, commit and push.\nI provisioned an Ubuntu runner in .github/workflows/hugo.yml. However, the modern CI environment is a moving target. GitHub is currently deprecating Node 20 actions, which caused the pipeline to throw a loud warning. I silenced it by forcing the runner to use Node 24 and bumping the Hugo action to v3.\nA Syntax Lesson in the Front Matter\nThe pipeline ran, and immediately crashed with a fatal exit code 1:\nunmarshal failed: toml: expected character =\nIt is always the smallest details that bring a system down. Hugo relies heavily on \u0026ldquo;front matter\u0026rdquo;: the metadata at the top of every Markdown file. 
I had inadvertently mixed YAML syntax (which uses colons, like draft: false) into a block that Hugo was trying to parse as TOML. Hugo picks the front matter format from its delimiters: +++ means TOML and --- means YAML. With --- delimiters around the YAML lines, the front matter parses correctly:\n---\ntitle: \u0026#34;Hello World\u0026#34;\ndate: 2026-05-28\ndraft: false\n---\nOnce the parser had the correct characters, the pipeline turned green. The site was live.\nSetting this up feels much like writing a good program: strict, explicit, and highly performant. The infrastructure is now invisible, leaving me with nothing to do but write.\nGodspeed, Franz\n","permalink":"https://franzego.github.io/posts/hugo-project-setup/","summary":"\u003cp\u003eTrying to set up a personal blog is not as straightforward as it sounds. There was no one-stop guide for setting one up with GitHub Pages. I had to google a lot, make mistakes and grill Gemini for answers. It was not especially technical or difficult, but after solving the issue, I decided to document the steps I followed for my future self and for anyone who ends up in the same position.\u003c/p\u003e","title":"Hello, World: Building a Go-Powered Blog from Scratch"}]