Note that modelPath is the only required parameter. For testing, you can supply it through the LLAMA_PATH environment variable.
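As a rough illustration of the note above, the sketch below builds an options object whose modelPath falls back to LLAMA_PATH. The `LlamaInputsSketch` interface and the `buildInputs` helper are hypothetical names used only for this example. This page does not say whether the library reads LLAMA_PATH on its own, so the sketch reads it explicitly from an environment map passed in by the caller.

```typescript
// Hypothetical subset of the options documented on this page.
interface LlamaInputsSketch {
  modelPath: string;
  temperature?: number;
  maxTokens?: number;
}

// Build an options object. An explicit modelPath wins; otherwise fall back
// to the LLAMA_PATH entry of the supplied environment map.
function buildInputs(
  env: Record<string, string | undefined>,
  overrides: Partial<LlamaInputsSketch> = {}
): LlamaInputsSketch {
  const modelPath = overrides.modelPath ?? env["LLAMA_PATH"];
  if (!modelPath) {
    throw new Error("modelPath is required: pass it explicitly or set LLAMA_PATH");
  }
  return { ...overrides, modelPath };
}
```

In a Node.js test you would typically pass `process.env` as the first argument.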

Properties

modelPath: string

Path to the model on the filesystem.

batchSize?: number

Prompt processing batch size.

cache?: boolean | BaseCache<Generation[]>
callbackManager?: CallbackManager

Deprecated

Use callbacks instead.

callbacks?: Callbacks
concurrency?: number

Deprecated

Use maxConcurrency instead.

contextSize?: number

Text context size.

embedding?: boolean

Embedding mode only.

f16Kv?: boolean

Use fp16 for KV cache.

gpuLayers?: number

Number of layers to store in VRAM.

logitsAll?: boolean

The llama_eval() call computes all logits, not just the last one.

maxConcurrency?: number

The maximum number of concurrent calls that can be made. Defaults to Infinity, which means no limit.
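To make the effect of a concurrency cap concrete, here is a minimal sketch of a limiter that keeps at most `maxConcurrency` tasks in flight. It is not the library's implementation; `runLimited` is a hypothetical helper written for this example.

```typescript
// Run async tasks with at most `maxConcurrency` in flight at once.
// Results are returned in the same order as the input tasks.
async function runLimited<T>(
  tasks: Array<() => Promise<T>>,
  maxConcurrency: number = Infinity
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each worker repeatedly claims the next unstarted task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  // With the default of Infinity, every task gets its own worker (no limit).
  const workerCount = Math.min(maxConcurrency, tasks.length);
  const workers: Promise<void>[] = [];
  for (let w = 0; w < workerCount; w++) workers.push(worker());
  await Promise.all(workers);
  return results;
}
```

A low cap is usually the sensible choice for a local model, since parallel calls compete for the same CPU, RAM, and VRAM.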

maxRetries?: number

The maximum number of retries that can be made for a single call, with an exponential backoff between each attempt. Defaults to 6.
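The sketch below shows what an exponential backoff schedule looks like for a given maxRetries. The base delay of 1000 ms and the growth factor of 2 are illustrative assumptions; this page documents only the default of 6 retries, not the actual timing used by the library.

```typescript
// Compute the wait before each retry: baseMs, baseMs*factor, baseMs*factor^2, ...
// baseMs and factor are assumed values for illustration, not documented defaults.
function backoffDelays(
  maxRetries: number = 6,
  baseMs: number = 1000,
  factor: number = 2
): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(baseMs * Math.pow(factor, attempt));
  }
  return delays;
}

// With the assumed defaults: [1000, 2000, 4000, 8000, 16000, 32000]
const schedule = backoffDelays();
```

Under these assumptions, a call that fails every time would wait about 63 seconds in total before giving up, which is worth keeping in mind when choosing maxRetries.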

maxTokens?: number
metadata?: Record<string, unknown>
onFailedAttempt?: FailedAttemptHandler

Custom handler to handle failed attempts. Takes the originally thrown error object as input, and should itself throw an error if the input error is not retryable.
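A handler following this contract might look like the sketch below: it returns normally for errors worth retrying and rethrows the ones that are not. The `ErrorLike` shape and the choice of which errors count as permanent are assumptions made for this example. `ENOENT` and `EACCES` are standard Node.js file-system error codes, picked here because a missing or unreadable model file will not fix itself on retry.

```typescript
// Hypothetical error shape: Node.js system errors carry a string `code`.
interface ErrorLike extends Error {
  code?: string;
}

// Error codes treated as permanent in this sketch (an assumption).
const NON_RETRYABLE_CODES = new Set<string>(["ENOENT", "EACCES"]);

// Returns normally when the error is retryable; rethrows it otherwise,
// which signals that no further attempts should be made.
function failFastOnPermanentErrors(error: ErrorLike): void {
  if (error.name === "AbortError") {
    throw error; // the caller cancelled, so retrying is pointless
  }
  if (error.code !== undefined && NON_RETRYABLE_CODES.has(error.code)) {
    throw error; // e.g. the model file does not exist
  }
}
```

A function of this shape could then be passed as the onFailedAttempt option.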

prependBos?: boolean

Add the beginning-of-sentence token.

seed?: null | number

If null, a random seed will be used.

tags?: string[]
temperature?: number

Controls the randomness of the responses, e.g. 0.1 is close to deterministic, 0.8 is balanced, and 1.5 is creative; 0 disables it.
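For intuition about what temperature does, the sketch below applies it to raw logits before a softmax. This is the textbook formulation, not code taken from the library. Treating temperature 0 as greedy selection is an assumption about what "0 disables" means, and `softmaxWithTemperature` is a hypothetical name.

```typescript
// Turn raw logits into probabilities after dividing them by the temperature.
// Lower temperatures sharpen the distribution; higher ones flatten it.
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  if (temperature <= 0) {
    // Assumed reading of "0 disables": put all mass on the most likely token.
    const best = logits.indexOf(Math.max(...logits));
    return logits.map((_, i) => (i === best ? 1 : 0));
  }
  const scaled = logits.map((l) => l / temperature);
  // Subtract the maximum before exponentiating for numerical stability.
  const max = Math.max(...scaled);
  const exps = scaled.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}
```

With logits `[1, 2, 3]`, the probability of the top token grows as the temperature drops from 1.5 to 0.1, which is why low values behave almost deterministically.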

threads?: number

Number of threads to use to evaluate tokens.

topK?: number

Consider only the n most likely tokens, where n ranges from 1 to the vocabulary size; 0 disables the limit (uses the full vocabulary). Note: only applies when temperature > 0.
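The following sketch shows top-k filtering on a small probability vector: everything outside the k most likely tokens is zeroed and the rest is renormalized. It illustrates the general technique rather than the library's internals, and `topKFilter` is a hypothetical name.

```typescript
// Keep only the k most probable tokens and renormalize them to sum to 1.
// k <= 0 (or k covering the whole vocabulary) leaves the distribution unchanged.
function topKFilter(probs: number[], k: number): number[] {
  if (k <= 0 || k >= probs.length) return probs.slice();
  // Token indices sorted from most to least probable.
  const order = probs.map((_, i) => i).sort((a, b) => probs[b] - probs[a]);
  const keep = new Set<number>(order.slice(0, k));
  const kept = probs.map((p, i) => (keep.has(i) ? p : 0));
  const sum = kept.reduce((a, b) => a + b, 0);
  return kept.map((p) => p / sum);
}
```

For example, filtering `[0.5, 0.3, 0.2]` with k = 2 drops the last token and rescales the first two to roughly 0.625 and 0.375.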

topP?: number

Selects the smallest token set whose cumulative probability exceeds P, where P is between 0 and 1; 1 disables it. Note: only applies when temperature > 0.
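Top-p (nucleus) filtering can be sketched in the same spirit: tokens are added in order of decreasing probability until their cumulative mass exceeds the threshold, and the survivors are renormalized. This is a generic illustration under that reading of the description, with `topPFilter` as a hypothetical name.

```typescript
// Keep the smallest set of most-probable tokens whose cumulative probability
// exceeds `topP`, then renormalize. topP >= 1 leaves the distribution unchanged.
function topPFilter(probs: number[], topP: number): number[] {
  if (topP >= 1) return probs.slice();
  // Token indices sorted from most to least probable.
  const order = probs.map((_, i) => i).sort((a, b) => probs[b] - probs[a]);
  const keep = new Set<number>();
  let cumulative = 0;
  for (const i of order) {
    keep.add(i);
    cumulative += probs[i];
    if (cumulative > topP) break; // threshold exceeded, stop adding tokens
  }
  const kept = probs.map((p, i) => (keep.has(i) ? p : 0));
  const sum = kept.reduce((a, b) => a + b, 0);
  return kept.map((p) => p / sum);
}
```

For example, with probabilities `[0.5, 0.3, 0.15, 0.05]` and a threshold of 0.7, only the first two tokens survive, because their combined mass of 0.8 is the first to exceed 0.7.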

trimWhitespaceSuffix?: boolean

Trim whitespace from the end of the generated text. Disabled by default.

useMlock?: boolean

Force system to keep model in RAM.

useMmap?: boolean

Use mmap if possible.

verbose?: boolean
vocabOnly?: boolean

Only load the vocabulary, no weights.
