
Overview

What is a BSA?

The Bethesda archive file is a proprietary format used to store game files for The Elder Scrolls and Fallout series of games, beginning with The Elder Scrolls III. This format is essentially a zip file which stores some extra meta information to be used with Bethesda's internal virtual filesystem. There are many tools that can be used to work with these files, but most are written as user-facing applications, without the intention of being used as a library. bsa intends to provide a low-ish level interface for C++ programmers to work with the format.

Why bsa?

It's written in contemporary C++

Nope, it's not written in C#, Pascal, or Python. It's all native, with contemporary C++ features in mind. bsa provides interfaces that model standard containers, so that programmers intuitively understand how to work with its interface without needing to dive into the documentation.

It's actively tested

The test suite covers a wide range of features, helping to ensure that bsa handles archives accurately and that fixed bugs do not regress.

It's low overhead

bsa primarily stores no-copy views into file data/strings so that objects are cheap to copy and the resulting memory overhead is low. However, bsa can also take ownership of data, as a convenience.

It's low level

bsa provides low-level interfaces into the underlying data, so that programmers who feel they can "do it better" don't feel burdened by arbitrary restrictions. This does not mean that there aren't high-level interfaces; it simply means that bsa will step out of your way when appropriate.

Examples

Reading

#include <bsa/tes4.hpp>
#include <cstdio>
#include <filesystem>

int main()
{
    std::filesystem::path oblivion{ "path/to/oblivion" };
    bsa::tes4::archive archive;
    // read() reports the version of the archive it parsed
    const auto version = archive.read(oblivion / "Data/Oblivion - Voices2.bsa");
    // look up the directory first, then the file inside it
    const auto file = archive["sound/voice/oblivion.esm/imperial/m"]["testtoddquest_testtoddhappy_00027fa2_1.mp3"];
    if (file) {  // the lookup is empty if the file was not found
        file->write(std::filesystem::path{ "happy.mp3" }, version);
    }
}

Writing

#include <bsa/tes4.hpp>
#include <cstddef>
#include <utility>

int main()
{
    const char payload[] = { "Hello world!\n" };
    bsa::tes4::file f;
    // sizeof(payload) - 1 excludes the terminating null character
    f.set_data({ reinterpret_cast<const std::byte*>(payload), sizeof(payload) - 1 });

    // files live inside directories
    bsa::tes4::directory d;
    d.insert("hello.txt", std::move(f));

    // directories live inside the archive
    bsa::tes4::archive archive;
    archive.insert("misc", std::move(d));
    archive.archive_flags(bsa::tes4::archive_flag::file_strings | bsa::tes4::archive_flag::directory_strings);
    archive.archive_types(bsa::tes4::archive_type::misc);

    archive.write("example.bsa", bsa::tes4::version::sse);
}

CMake Options

| Option | Default Value | Description |
| --- | --- | --- |
| BSA_BUILD_DOCS | OFF | Set to ON to build the documentation. |
| BSA_BUILD_EXAMPLES | OFF | Set to ON to build the examples. |
| BSA_BUILD_SRC | ON ✔️ | Set to ON to build the main library. |
| BSA_SUPPORT_XMEM | OFF | Set to ON to build support for the xmem codec proxy. |
| BUILD_TESTING | ON ✔️ | Set to ON to build the tests. See also the CMake documentation for this option. |

Integration

bsa uses CMake as its primary build system. Assuming that bsa and its dependencies have been installed to a place where CMake can find them, using it in your project is as simple as:

find_package(bsa REQUIRED CONFIG)
target_link_libraries(${PROJECT_NAME} PUBLIC bsa::bsa)

XMem Codec

The xmem codec is a compression format available as part of the Xbox development kit (XDK). This compression format is used only in TESV. The archive.exe for TESV:SSE has this compression flag available, but it is unimplemented, and the game will simply use LZ4 instead. Supporting this format is very difficult due to its proprietary nature. However, an implementation of the format exists as part of the XNA framework, which is freely available, albeit as a 32-bit binary. Thus, support for this format is only available on Windows, and users must opt into it via the BSA_SUPPORT_XMEM CMake option. Additionally, users must build the xmem support proxy separately and bundle the resulting binary with their own.

Important Notes

  • If the hash of one file compares equal to the hash of another file, then they are equal. It doesn't matter if they have different file names, or if they store different data blobs. The game engine uniquely identifies files based on their hash alone.
  • Paths containing non-ASCII characters (UTF-8 input, for example) are not handled well. The game engine has a crippling bug where extended ASCII characters can index out-of-bounds, producing unreproducible hashes. It is the user's job to ensure they aren't attempting to store paths which contain such characters. The game engine will accept them, but it will never be able to reproducibly locate them.
  • The game engine normalizes paths to use the \ character instead of the standard /. As such, users should be aware that file paths retrieved from the virtual file system may not constitute valid paths on their native file system.
  • Avoid writing file paths which are close to the limit of MAX_PATH. Bethesda uses fixed buffers everywhere with no input validation, so they will most likely crash the game.
  • Make sure to lexically normalize your paths before you pass them. Bethesda uses really basic path splitting methods, and bsa replicates them.
  • Files cannot be split into more than 4 chunks inside a ba2. Bethesda uses a fixed buffer to store the chunks, and exceeding that limit will likely crash the game.
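
The path-related notes above can be turned into a small pre-flight check that runs before a path is handed to the library. The following is only a sketch using the standard library: `prepare_archive_path`, `to_native_path`, and the 60-character safety margin are hypothetical names and values chosen for illustration, and none of them are part of bsa. The sketch rejects non-ASCII input, lexically normalizes the path, and keeps it well below the Windows MAX_PATH limit of 260 characters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <optional>
#include <string>
#include <string_view>

constexpr std::size_t windows_max_path = 260;  // MAX_PATH on Windows
constexpr std::size_t assumed_margin = 60;     // arbitrary margin (assumption)

// True when every byte is plain 7-bit ASCII.
inline bool is_plain_ascii(std::string_view s)
{
    return std::all_of(s.begin(), s.end(),
        [](unsigned char c) { return c <= 0x7F; });
}

// Hypothetical helper: returns a normalized, '/'-separated relative path,
// or std::nullopt if the input should not be stored in an archive.
inline std::optional<std::string> prepare_archive_path(std::string_view input)
{
    if (input.empty() || !is_plain_ascii(input)) {
        return std::nullopt;  // extended characters hash unreliably
    }

    // Accept either separator on input, then normalize lexically.
    std::string generic{ input };
    std::replace(generic.begin(), generic.end(), '\\', '/');
    const auto normalized = std::filesystem::path{ generic }.lexically_normal();

    std::string result = normalized.generic_string();
    assert(!result.empty());  // normalizing a non-empty path never yields ""

    // Reject rooted paths, paths that normalize to nothing, and paths that
    // escape upwards through a leading "..".
    if (normalized.has_root_path() || result == "." ||
        normalized.begin()->string() == "..") {
        return std::nullopt;
    }

    // Drop trailing separators left over from inputs such as "a/b/".
    while (result.size() > 1 && result.back() == '/') {
        result.pop_back();
    }

    // Stay well clear of MAX_PATH.
    if (result.size() > windows_max_path - assumed_margin) {
        return std::nullopt;
    }
    return result;
}

// Hypothetical helper: converts a '\'-separated path, as reported by the
// virtual file system, into a path usable on the native file system.
inline std::filesystem::path to_native_path(std::string_view archive_path)
{
    std::string generic{ archive_path };
    std::replace(generic.begin(), generic.end(), '\\', '/');
    return std::filesystem::path{ generic };
}
```

For example, `prepare_archive_path("meshes/./armor/../weapons/sword.nif")` yields `meshes/weapons/sword.nif`, while an input containing a UTF-8 encoded "é" or a leading `..` is rejected. The margin is a judgment call, since the full path seen by the game also includes whatever prefix the engine adds.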

Dependencies

Consumption

XMem Codec Support

Development

Alternatives