Marginalia Downloads

Here you will find supplementary downloads related to the development of the Marginalia search engine.

Data Exports

See exports/ for various data exported from the search engine's database and operations.

Runtime Dependencies

These files are needed to run and operate the search engine. They are downloaded automatically by the Marginalia Search build and convert scripts when appropriate, so there is normally no need to download them manually.

Crawl Samples

  1. Small (1,000 domains, 2 GB)
  2. Medium (2,000 domains, 6 GB) -- recommended
  3. Large (5,000 domains, 20 GB)
  4. Humongous (50,000 domains, 180 GB)

Warning: The larger samples are fairly cumbersome to work with, as they take several hours to process. Processing is usually a one-off cost unless you are working on the steps that process crawl data, and the tradeoff is that your local test environment feels more realistic. But it is slow: the Humongous sample takes up to 5 hours to process on a reasonably powerful machine.

Models

These statistical models are used by Marginalia's language processing.
  1. ngrams.bin (129 MB)
  2. tfreq-new-algo3.bin (411 MB)
  3. lid.176.ftz (920 KB; CC BY-SA 3.0, fastText)
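
The lid.176.ftz file is fastText's compressed language identification model. As a purely illustrative sketch (not Marginalia's own code, which is Java), it can be queried directly with the fasttext Python package:

    # Minimal sketch: querying the lid.176.ftz language identification model
    # with the fasttext Python package (pip install fasttext). Illustrative only;
    # Marginalia loads these models from its own Java code.
    import fasttext

    model = fasttext.load_model("lid.176.ftz")

    # predict() returns labels such as "__label__en" along with probabilities
    labels, probs = model.predict("The quick brown fox jumps over the lazy dog", k=1)
    print(labels[0].removeprefix("__label__"), float(probs[0]))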