Marginalia Downloads
Here you will find supplementary downloads related to the development
of the Marginalia Search engine.
Data Exports
See exports/ for various data exported from the search engine's database and operations.
Runtime Dependencies
These files are needed to run and operate the search engine. The Marginalia Search
build and convert scripts download them automatically when needed, so there's normally no need to download them manually.
Crawl Samples
- Small (1000 domains, 2 GB)
- Medium (2000 domains, 6 GB) -- recommended
- Large (5000 domains, 20 GB)
- Humongous (50,000 domains, 180 GB)
Warning: The larger samples are fairly annoying to work with, as they
take several hours to process. Processing is usually a one-off step unless you are
working on code that processes crawl data, and the payoff is that your
local test environment feels more realistic. But it's slow: the Humongous sample takes up to
5 hours to process on a reasonably powerful machine.
Models
These statistical models are used by Marginalia's language processing.
- ngrams.bin (129 MB)
- tfreq-new-algo3.bin (411 MB)
- lid.176.ftz (920 KB; CC BY-SA 3.0, fastText)