File: control

Source: llama.cpp
Section: science
Priority: optional
Maintainer: Debian Deep Learning Team <debian-ai@lists.debian.org>
Uploaders: Christian Kastner <ckk@debian.org>
Standards-Version: 4.7.2
Vcs-Browser: https://salsa.debian.org/deeplearning-team/llama.cpp
Vcs-Git: https://salsa.debian.org/deeplearning-team/llama.cpp.git
Homepage: https://github.com/ggml-org/llama.cpp/
Build-Depends: dh-sequence-bash-completion,
               cmake,
               dh-python,
               debhelper-compat (= 13),
               help2man,
               libcurl4-openssl-dev,
               libggml-dev (>= 0.9.4),
               libggml-dev (<< 0.9.5),
               pkgconf,
Build-Depends-Indep: dh-sequence-python3,
                     python3-all,
                     pybuild-plugin-pyproject,
                     python3-poetry-core,
                     python3-numpy,
                     python3-tqdm,
                     python3-yaml,
                     python3-sentencepiece,
                     python3-pytest,
Rules-Requires-Root: no

Package: llama.cpp
Architecture: all
Depends: llama.cpp-tools,
         ${misc:Depends},
Recommends: llama.cpp-tools-extra,
            python3-gguf,
Suggests: llama.cpp-examples
Description: LLM inference in C/C++ - metapackage
 The main goal of llama.cpp is to enable LLM inference with minimal setup and
 state-of-the-art performance on a wide range of hardware - locally and in the
 cloud.
 .
  * Plain C/C++ implementation without any dependencies
  * Apple silicon is a first-class citizen - optimized via ARM NEON, Accelerate
    and Metal frameworks
  * AVX, AVX2, AVX512 and AMX support for x86 architectures
  * 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization
    for faster inference and reduced memory use
  * Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs
    via HIP and Moore Threads MTT GPUs via MUSA)
  * Vulkan and SYCL backend support
  * CPU+GPU hybrid inference to partially accelerate models larger than the
    total VRAM capacity
 .
 The compute functionality is provided by ggml. By default, ggml's CPU backend
 is installed, but there are many other backends for CPUs and GPUs.
 .
 This is a metapackage that depends on, recommends, or suggests all of the
 relevant binary packages.

Package: libllama0
Section: libs
Architecture: any
Multi-Arch: same
Depends: libggml0-backend-cpu (>= 0.9.4),
         libggml0-backend-cpu (<< 0.9.5),
         ${misc:Depends},
         ${shlibs:Depends},
Breaks: llama.cpp (<< 5882+dfsg-3~exp1)
Replaces: llama.cpp (<< 5882+dfsg-3~exp1)
Description: LLM inference in C/C++ - libraries
 The main goal of llama.cpp is to enable LLM inference with minimal setup and
 state-of-the-art performance on a wide range of hardware - locally and in the
 cloud.
 .
  * Plain C/C++ implementation without any dependencies
  * Apple silicon is a first-class citizen - optimized via ARM NEON, Accelerate
    and Metal frameworks
  * AVX, AVX2, AVX512 and AMX support for x86 architectures
  * 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization
    for faster inference and reduced memory use
  * Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs
    via HIP and Moore Threads MTT GPUs via MUSA)
  * Vulkan and SYCL backend support
  * CPU+GPU hybrid inference to partially accelerate models larger than the
    total VRAM capacity
 .
 The compute functionality is provided by ggml. By default, ggml's CPU backend
 is installed, but there are many other backends for CPUs and GPUs.
 .
 This package contains the libllama and libmtmd libraries. Note that these
 libraries are not yet stable, so they are installed to private directories
 for now.

Package: libllama-dev
Section: libdevel
Architecture: any
Multi-Arch: same
Depends: libllama0 (= ${binary:Version}),
         libggml-dev (>= 0.9.4),
         libggml-dev (<< 0.9.5),
         ${misc:Depends},
Breaks: llama.cpp (<< 5882+dfsg-3~exp1)
Replaces: llama.cpp (<< 5882+dfsg-3~exp1)
Description: LLM inference in C/C++ - headers and development files
 The main goal of llama.cpp is to enable LLM inference with minimal setup and
 state-of-the-art performance on a wide range of hardware - locally and in the
 cloud.
 .
  * Plain C/C++ implementation without any dependencies
  * Apple silicon is a first-class citizen - optimized via ARM NEON, Accelerate
    and Metal frameworks
  * AVX, AVX2, AVX512 and AMX support for x86 architectures
  * 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization
    for faster inference and reduced memory use
  * Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs
    via HIP and Moore Threads MTT GPUs via MUSA)
  * Vulkan and SYCL backend support
  * CPU+GPU hybrid inference to partially accelerate models larger than the
    total VRAM capacity
 .
 The compute functionality is provided by ggml. By default, ggml's CPU backend
 is installed, but there are many other backends for CPUs and GPUs.
 .
 This package provides the llama.cpp library headers and development files.
 Note that these libraries are not yet stable, so they are installed to
 private directories for now.

Package: llama.cpp-tools
Architecture: any
Multi-Arch: foreign
Depends: libllama0 (= ${binary:Version}),
         ${misc:Depends},
         ${shlibs:Depends},
Breaks: llama.cpp (<< 5882+dfsg-3~exp1)
Replaces: llama.cpp (<< 5882+dfsg-3~exp1)
Description: LLM inference in C/C++ - main utilities
 The main goal of llama.cpp is to enable LLM inference with minimal setup and
 state-of-the-art performance on a wide range of hardware - locally and in the
 cloud.
 .
  * Plain C/C++ implementation without any dependencies
  * Apple silicon is a first-class citizen - optimized via ARM NEON, Accelerate
    and Metal frameworks
  * AVX, AVX2, AVX512 and AMX support for x86 architectures
  * 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization
    for faster inference and reduced memory use
  * Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs
    via HIP and Moore Threads MTT GPUs via MUSA)
  * Vulkan and SYCL backend support
  * CPU+GPU hybrid inference to partially accelerate models larger than the
    total VRAM capacity
 .
 The compute functionality is provided by ggml. By default, ggml's CPU backend
 is installed, but there are many other backends for CPUs and GPUs.
 .
 This package contains the subset of the most commonly used utilities:
 llama-cli, llama-server, llama-bench, and llama-quantize.
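 .
 As a usage illustration, the sketch below queries a running llama-server
 instance from Python; the model path, port, /completion endpoint and its
 JSON fields are assumptions taken from upstream llama.cpp's server
 documentation rather than from this package:
 .
   # Assumes the server was started along the lines of:
   #   llama-server -m model.gguf --port 8080
   # and that the native /completion endpoint accepts a JSON body with
   # "prompt" and "n_predict" (verify against the installed version's docs).
   import json
   import urllib.request
   req = urllib.request.Request(
       "http://127.0.0.1:8080/completion",
       data=json.dumps({"prompt": "Hello", "n_predict": 32}).encode("utf-8"),
       headers={"Content-Type": "application/json"},
   )
   with urllib.request.urlopen(req) as resp:
       print(json.load(resp).get("content"))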

Package: llama.cpp-tools-extra
Architecture: any
Multi-Arch: foreign
Depends: llama.cpp-tools (= ${binary:Version}),
         ${misc:Depends},
         ${shlibs:Depends},
Breaks: llama.cpp (<< 5882+dfsg-3~exp1)
Replaces: llama.cpp (<< 5882+dfsg-3~exp1)
Description: LLM inference in C/C++ - extra utilities
 The main goal of llama.cpp is to enable LLM inference with minimal setup and
 state-of-the-art performance on a wide range of hardware - locally and in the
 cloud.
 .
  * Plain C/C++ implementation without any dependencies
  * Apple silicon is a first-class citizen - optimized via ARM NEON, Accelerate
    and Metal frameworks
  * AVX, AVX2, AVX512 and AMX support for x86 architectures
  * 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization
    for faster inference and reduced memory use
  * Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs
    via HIP and Moore Threads MTT GPUs via MUSA)
  * Vulkan and SYCL backend support
  * CPU+GPU hybrid inference to partially accelerate models larger than the
    total VRAM capacity
 .
 The compute functionality is provided by ggml. By default, ggml's CPU backend
 is installed, but there are many other backends for CPUs and GPUs.
 .
 This package contains all tools that are not already shipped in package
 llama.cpp-tools.

Package: llama.cpp-examples
Architecture: any
Multi-Arch: foreign
Depends: llama.cpp-tools (= ${binary:Version}),
         ${misc:Depends},
         ${shlibs:Depends},
Breaks: llama.cpp (<< 5882+dfsg-3~exp1)
Replaces: llama.cpp (<< 5882+dfsg-3~exp1)
Description: LLM inference in C/C++ - example programs
 The main goal of llama.cpp is to enable LLM inference with minimal setup and
 state-of-the-art performance on a wide range of hardware - locally and in the
 cloud.
 .
  * Plain C/C++ implementation without any dependencies
  * Apple silicon is a first-class citizen - optimized via ARM NEON, Accelerate
    and Metal frameworks
  * AVX, AVX2, AVX512 and AMX support for x86 architectures
  * 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization
    for faster inference and reduced memory use
  * Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs
    via HIP and Moore Threads MTT GPUs via MUSA)
  * Vulkan and SYCL backend support
  * CPU+GPU hybrid inference to partially accelerate models larger than the
    total VRAM capacity
 .
 The compute functionality is provided by ggml. By default, ggml's CPU backend
 is installed, but there are many other backends for CPUs and GPUs.
 .
 This package contains utilities that upstream ships as examples.

Package: llama.cpp-tests
Architecture: any
Multi-Arch: foreign
Depends: libllama0 (= ${binary:Version}),
         ${misc:Depends},
         ${shlibs:Depends},
Breaks: llama.cpp (<< 5882+dfsg-3~exp1)
Replaces: llama.cpp (<< 5882+dfsg-3~exp1)
Description: LLM inference in C/C++ - tests
 The main goal of llama.cpp is to enable LLM inference with minimal setup and
 state-of-the-art performance on a wide range of hardware - locally and in the
 cloud.
 .
  * Plain C/C++ implementation without any dependencies
  * Apple silicon is a first-class citizen - optimized via ARM NEON, Accelerate
    and Metal frameworks
  * AVX, AVX2, AVX512 and AMX support for x86 architectures
  * 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization
    for faster inference and reduced memory use
  * Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs
    via HIP and Moore Threads MTT GPUs via MUSA)
  * Vulkan and SYCL backend support
  * CPU+GPU hybrid inference to partially accelerate models larger than the
    total VRAM capacity
 .
 The compute functionality is provided by ggml. By default, ggml's CPU backend
 is installed, but there are many other backends for CPUs and GPUs.
 .
 This package contains all of the test binaries, mainly for autopkgtests.

Package: python3-gguf
Section: python
Architecture: all
Depends: ${python3:Depends},
         ${misc:Depends},
Suggests: python3-pyside6.qtcore,
          python3-pyside6.qtwidgets,
Description: Python library for working with GGUF files
 GGUF is a file format for storing models for inference with GGML and executors
 based on GGML. GGUF is a binary format that is designed for fast loading and
 saving of models, and for ease of reading. Models are traditionally developed
 using PyTorch or another framework, and then converted to GGUF for use in
 GGML.
 .
 This package provides a Python library for reading and writing files in the
 GGUF format, and exposes this functionality on the command line through a
 few utilities.
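 .
 As a usage illustration, the sketch below inspects a GGUF file with this
 library; the GGUFReader class and its fields/tensors attributes follow the
 upstream gguf package and may differ between versions, and "model.gguf" is
 a placeholder path:
 .
   # List the header metadata keys and the tensor inventory of a GGUF file.
   from gguf import GGUFReader
   reader = GGUFReader("model.gguf")
   for name in reader.fields:       # key/value metadata in the file header
       print("field:", name)
   for tensor in reader.tensors:    # tensor name and shape
       print("tensor:", tensor.name, tensor.shape)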