			===================
			= MemTest-86 v2.2 =
			===================

Table of Contents
=================
  1) Introduction
  2) Installation
  3) Feedback
  4) Memory Testing Philosophy
  5) Memtest86 Test Algorithms
  6) Test Details
  7) Execution Time
  8) Online Commands
  9) Display Description
 10) Troubleshooting Memory Errors
 11) Theory of Operation
 12) Change Log
 13) Problems
 14) Acknowledgments


1) Introduction
===============
Memtest86 is a thorough, stand alone memory test for i386 architecture
systems.  BIOS based memory tests are only a quick check and often miss
failures that are detected by Memtest86.

For updates go to the Memtest86 web page:

	http://reality.sgi.com/cbrady_denver/memtest86

To report problems or provide feedback send email to:

	cbrady@sgi.com


2) Installation
===============
Memtest86 is a stand alone program and can be loaded from either a disk
partition or from a floppy disk.

To build Memtest86:
   1) Review the Makefile and adjust options as needed.
   2) Type "make"

This creates a file named "memtest86.bin" which is a bootable image.  This
image file may be copied to a floppy disk or lilo may be used to boot this
image from a hard disk partition.

To create a Memtest86 bootdisk
   1) Insert a blank write enabled floppy disk.
   2) As root, type "make install"

To boot from a disk partition via lilo
   1) Copy the image file to a permanent location (e.g. /memtest86).
   2) Add an entry in the lilo config file (usually /etc/lilo.conf) to boot
      memtest86.  Only the image and label fields need to be specified. 
      The following is a sample lilo entry for booting memtest86:

	image = /memtest86
	label = memt

   3) As root, type "lilo"

      At the lilo prompt enter memt to boot memtest86.

If you encounter build problems a binary image has been included (precomp.bin).
To create a boot-disk with this pre-built image do the following:
   1) Insert a blank write enabled floppy disk.
   2) Type "make install-bin"

If you have problems with memory sizing an option was added in
version 2.1 to get the memory size from the BIOS.  To enable this 
option uncomment the define for BIOS_MEMSZ in init.c and then rebuild
and install the test.


3) Feedback
===========
To make Memtest86 better I need your feedback! The test algorithms
used in Memtest86 and their implementation are based mostly on theory.
I need to know (from real use) what types of algorithms and patterns
actually work best.  Therefore, if you find memory errors with Memtest86
please send an email with the test number that detected the error and
some details about the nature of the error (how often, how many addresses,
etc...).


4) Memory Testing Philosophy
============================
There are many good approaches for testing memory.  However, many tests
simply throw some patterns at memory without much thought or knowledge
of the memory architecture or how errors can best be detected. This
works fine for hard memory failures but does little to find intermittent
errors.  The BIOS based memory tests are useless for finding intermittent
memory errors.

Memory chips consist of a large array of tightly packed memory cells,
one for each bit of data.  The vast majority of the intermittent failures
are a result of interaction between these memory cells.  Often writing a
memory cell can cause one of the adjacent cells to be written with the
same data.  An effective memory test should attempt to test for this
condition.  Therefore, an ideal strategy for testing memory would be
the following:

  1) write a cell with a zero
  2) write all of the adjacent cells with a one, one or more times
  3) check that the first cell still has a zero

It should be obvious that this strategy requires an exact knowledge
of how the memory cells are laid out on the chip.  In addition there is a
never ending number of possible chip layouts for different chip types
and manufacturers making this strategy impractical.  However, there
are testing algorithms that can approximate this ideal strategy. 
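
As a rough illustration only, the three steps of the ideal strategy can be
sketched against a simulated chip whose cell layout is known.  The grid
model, the CellGrid class and the "coupled" fault are assumptions made up
for this sketch; they are not part of Memtest86, and real chips do not
expose their layout this way.

```python
# Simulated memory chip: a 2-D grid of one-bit cells with a known layout.
# Everything here is hypothetical and only illustrates the ideal strategy.

def neighbours(rows, cols, r, c):
    """Return the coordinates of the cells physically adjacent to (r, c)."""
    out = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            out.append((nr, nc))
    return out


class CellGrid:
    """A grid of cells.  `coupled` maps an aggressor cell to a victim cell
    that wrongly receives the same data whenever the aggressor is written."""

    def __init__(self, rows, cols, coupled=None):
        self.rows, self.cols = rows, cols
        self.cells = [[0] * cols for _ in range(rows)]
        self.coupled = coupled or {}

    def write(self, r, c, bit):
        self.cells[r][c] = bit
        victim = self.coupled.get((r, c))
        if victim is not None:
            # Simulated interaction fault between adjacent cells.
            self.cells[victim[0]][victim[1]] = bit

    def read(self, r, c):
        return self.cells[r][c]


def ideal_test(grid, repeats=2):
    """Apply the three-step strategy to every cell; return failing cells."""
    failures = []
    for r in range(grid.rows):
        for c in range(grid.cols):
            grid.write(r, c, 0)                      # 1) write a zero
            for _ in range(repeats):                 # 2) write ones around it
                for nr, nc in neighbours(grid.rows, grid.cols, r, c):
                    grid.write(nr, nc, 1)
            if grid.read(r, c) != 0:                 # 3) still a zero?
                failures.append((r, c))
    return failures
```

For example, ideal_test(CellGrid(3, 3, coupled={(1, 2): (1, 1)})) reports
cell (1, 1), because writing its neighbour (1, 2) overwrites it.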


5) Memtest86 Test Algorithms
============================
Memtest86 uses two algorithms that provide a reasonable approximation
of the ideal test strategy above.  The first of these strategies is called
moving inversions.  The moving inversion test works as follows:

  1) Fill memory with a pattern
  2) Starting at the lowest address
	2a check that the pattern has not changed
	2b write the pattern's complement
	2c increment the address
	repeat 2a - 2c
  3) Starting at the highest address
	3a check that the pattern has not changed
	3b write the pattern's complement
	3c decrement the address
	repeat 3a - 3c
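
The steps above can be sketched over a simulated memory (a Python list of
32 bit words).  This is a minimal illustration, not the Memtest86 source.
The function name, the error tuple layout and the StuckBitMemory fault
model are assumptions.  Step 3 is read here as checking for the complement
left behind by step 2 and writing the original pattern back.

```python
def moving_inversions(mem, pattern, width=32):
    """Run one moving-inversions pass over `mem` (a list of words).
    Returns a list of (address, expected, actual) for each mismatch."""
    mask = (1 << width) - 1
    pattern &= mask
    complement = ~pattern & mask
    errors = []

    # 1) Fill memory with the pattern.
    for addr in range(len(mem)):
        mem[addr] = pattern

    # 2) Lowest address upward: check the pattern, write its complement.
    for addr in range(len(mem)):
        if mem[addr] != pattern:
            errors.append((addr, pattern, mem[addr]))
        mem[addr] = complement

    # 3) Highest address downward: memory now holds the complement, so
    #    check for that and write its complement (the original pattern).
    for addr in range(len(mem) - 1, -1, -1):
        if mem[addr] != complement:
            errors.append((addr, complement, mem[addr]))
        mem[addr] = pattern

    return errors


class StuckBitMemory(list):
    """Hypothetical faulty memory: one bit at one address is stuck at 1."""

    def __init__(self, size, bad_addr, bad_bit):
        super().__init__([0] * size)
        self.bad_addr, self.bad_bit = bad_addr, bad_bit

    def __setitem__(self, addr, value):
        if addr == self.bad_addr:
            value |= 1 << self.bad_bit   # the stuck bit can never be cleared
        super().__setitem__(addr, value)
```

With StuckBitMemory(16, bad_addr=5, bad_bit=3), a pass with pattern 0
reports address 5 during the upward sweep, while a pass with pattern
0xffffffff reports it during the downward sweep.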

This algorithm is a good approximation of an ideal memory test but
there are some limitations.  Most high density chips today store data
4 to 16 bits wide.  With chips that are more than one bit wide it
is impossible to selectively read or write just one bit.  This means
that we cannot guarantee that all adjacent cells have been tested
for interaction.  In this case the best we can do is to use some
patterns to ensure that all adjacent cells have at least been written
with all possible one and zero combinations.

It can also be seen that caching, buffering and out of order execution
will interfere with the moving inversions algorithm and make it less effective.
It is possible to turn off cache but the memory buffering in new high
performance chips can not be disabled.  To address this limitation a new
algorithm I call Modulo-X was created.  This algorithm is not affected by
cache or buffering.  The algorithm works as follows:
  1) For starting offsets of 0 - 19 do
	1a write every 20th location with a pattern
	1b write all other locations with the pattern's complement
	   repeat 1b one or more times
	1c check every 20th location for the pattern

This algorithm accomplishes nearly the same level of adjacency testing
as moving inversions but is not affected by caching or buffering.  Since
separate write passes (1a, 1b) and the read pass (1c) are done for all of
memory we can be assured that all of the buffers and cache have been
flushed between passes.  The selection of 20 as the stride size was somewhat
arbitrary.  Larger strides may be more effective but would take longer to
execute.  The choice of 20 seemed to be a reasonable compromise between
speed and thoroughness.
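
The Modulo-X steps described above can be sketched over a simulated memory
(a Python list of 32 bit words).  This is an illustration under stated
assumptions, not the Memtest86 source: the function name, the default of
two complement passes and the CoupledMemory fault model are all made up
for the example.

```python
def modulo_x(mem, pattern, stride=20, repeats=2, width=32):
    """Run the Modulo-X algorithm over `mem` (a list of words).
    Returns a list of (address, expected, actual) for each mismatch."""
    mask = (1 << width) - 1
    pattern &= mask
    complement = ~pattern & mask
    errors = []

    for offset in range(stride):
        # 1a) Write every stride-th location with the pattern.
        for addr in range(offset, len(mem), stride):
            mem[addr] = pattern

        # 1b) Write all other locations with the complement, one or
        #     more times.
        for _ in range(repeats):
            for addr in range(len(mem)):
                if addr % stride != offset:
                    mem[addr] = complement

        # 1c) Check every stride-th location for the pattern.
        for addr in range(offset, len(mem), stride):
            if mem[addr] != pattern:
                errors.append((addr, pattern, mem[addr]))

    return errors


class CoupledMemory(list):
    """Hypothetical faulty memory: a write to `aggressor` also lands
    in `victim`."""

    def __init__(self, size, aggressor, victim):
        super().__init__([0] * size)
        self.aggressor, self.victim = aggressor, victim

    def __setitem__(self, addr, value):
        super().__setitem__(addr, value)
        if addr == self.aggressor:
            super().__setitem__(self.victim, value)
```

Because each write pass and the read pass cover all of memory, the check
in 1c happens long after the writes in 1a, which is the property the text
relies on to defeat caching and buffering.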


6) Test Details
===============
Memtest86 executes a series of numbered test sections to check for
errors.  These test sections consist of a combination of test
algorithm, data pattern, cache setting and refresh rate. The 
execution order for these tests was arranged so that errors will be
detected as rapidly as possible.  Tests 8, 9 and 10 are very long running
extended tests and are only executed when extended testing is selected.
The extended tests have a low probability of finding errors that were
missed by the default tests.  A description of each of the test sections
follows:

Test 0 [Address test, walking ones, no cache]
  Tests all address bits in all memory banks by using a walking ones
  address pattern.
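
  The README does not spell out the exact procedure, so the following is
  only one plausible way to use a walking ones address pattern.  It runs
  against a simulated memory with an optional stuck address line.  The
  names AliasedMemory and find_bad_address_bits are hypothetical.

```python
class AliasedMemory:
    """Simulated memory.  If `stuck_bit` is given, that address line is
    stuck at 0, so two different addresses select the same location."""

    def __init__(self, stuck_bit=None):
        self.stuck_bit = stuck_bit
        self.words = {}

    def _decode(self, addr):
        if self.stuck_bit is None:
            return addr
        return addr & ~(1 << self.stuck_bit)

    def write(self, addr, value):
        self.words[self._decode(addr)] = value

    def read(self, addr):
        return self.words.get(self._decode(addr), 0)


def find_bad_address_bits(mem, address_bits):
    """Walk a single one bit through the address and return the address
    bits for which the walked address disturbs address 0."""
    bad = []
    for bit in range(address_bits):
        walked = 1 << bit            # walking ones: 1, 2, 4, 8, ...
        mem.write(0, 0x00)
        mem.write(walked, 0xFF)
        if mem.read(0) != 0x00 or mem.read(walked) != 0xFF:
            bad.append(bit)
    return bad
```

  For a healthy memory the result is an empty list; with address line 3
  stuck, find_bad_address_bits(AliasedMemory(stuck_bit=3), 16) returns [3].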

Test 1 [Moving Inv, ones&zeros, cached]
  This test uses the moving inversions algorithm with patterns of only
  ones and zeros.  Cache is enabled even though it interferes to some
  degree with the test algorithm.  With cache enabled this test does not
  take long and should quickly find all "hard" errors and some more
  subtle errors.  This section is only a quick check.

Test 2 [Modulo 20, ones&zeros, cached]
  Using the Modulo-X algorithm should uncover errors that are not
  detected by moving inversions due to cache and buffering interference
  with the algorithm.  As with test one only ones and zeros are
  used for data patterns.

Test 3 [Address test, own address, no cache]
  Each address is written with its own address and then is checked
  for consistency.  In theory previous tests should have caught any
  memory addressing problems.  This test should catch any addressing
  errors that somehow were not previously detected.
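
  A minimal sketch of the own address idea, using word indexes in a
  simulated memory rather than real physical addresses.  The function
  name and the AliasedWords fault model are assumptions for illustration.

```python
def own_address_test(mem):
    """Write each location with its own address, then check them all.
    Returns a list of (address, value_read) for each mismatch."""
    for addr in range(len(mem)):
        mem[addr] = addr
    return [(addr, mem[addr]) for addr in range(len(mem))
            if mem[addr] != addr]


class AliasedWords:
    """Hypothetical faulty memory: address line `stuck_bit` is stuck
    at 0, so pairs of addresses share one storage location."""

    def __init__(self, size, stuck_bit):
        self.words = [0] * size
        self.mask = 1 << stuck_bit

    def __len__(self):
        return len(self.words)

    def __getitem__(self, addr):
        return self.words[addr & ~self.mask]

    def __setitem__(self, addr, value):
        self.words[addr & ~self.mask] = value
```

  With 16 words and address line 2 stuck, the later write to address 4
  overwrites address 0, so address 0 reads back 4 and is reported.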
 
Test 4 [Moving inv, 8 bit pat, cached]
  This is the same as test one but uses an 8 bit wide pattern of
  "walking" ones and zeros.  This test will better detect subtle errors
  in "wide" memory chips.  A total of 20 data patterns are used.
  
Test 5 [Moving inv, 32 bit pat, cached]
  This is a variation of the moving inversions algorithm that
  shifts the data pattern left one bit for each successive address.
  The starting bit position is shifted left for each pass.  To use
  all possible data patterns 32 passes are required.  This test should
  be effective in detecting data sensitive errors in "wide" memory
  chips.
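
  The shifting pattern can be sketched as follows.  Two details are
  assumptions, since the text does not state them: the pattern starts as
  a single one bit, and the shift wraps around (a 32 bit rotate) when
  the bit reaches the top.

```python
def rotl32(value, count):
    """Rotate a 32 bit value left by `count` bits."""
    count %= 32
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF


def shifting_pattern(pass_no, addr_index, seed=0x00000001):
    """Pattern for the word at `addr_index` during pass `pass_no`.
    The starting bit moves left once per pass, and the pattern moves
    left once more for each successive address."""
    return rotl32(seed, pass_no + addr_index)


def starting_patterns(passes=32):
    """The pattern written to the first address in each pass."""
    return [shifting_pattern(p, 0) for p in range(passes)]
```

  Under these assumptions pass 0 writes 1, 2, 4, ... to successive words,
  pass 1 writes 2, 4, 8, ... and after 32 passes every word has seen all
  32 bit positions.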

Test 6 [Moving inv, ones&zeros, no cache]
  This is the same as test one but without cache.  With cache off
  there will be much less interference with the test algorithm.
  However, the execution time is much, much longer.  This test may
  find very subtle errors missed by tests one and two.

Test 7 [Modulo 20+, ones&zeros, cached]
  With this test we look for obscure "fade" or "multiple write" errors.
  This test is nearly identical to test two.  In test two we write
  all of the test locations with a pattern and then write the
  complement to all other locations two times.  In this test we write
  the complement six times.  This means that the time from when the test
  locations are written and then checked is much longer, checking for
  problems with fade.  It also better detects errors that only show up
  when adjacent cells are written multiple times.

Test 8 [Moving inv, 8 bit pat, no cache]
  This is the first extended test.  By using an 8 bit pattern with
  cache off this test should be effective in detecting all types of
  errors.  However, it takes a very long time to execute and there is
  a low probability that it will detect errors not found by the previous
  tests.

Test 9 [Modulo 20, 8 bit, cached]
  This is the first test to use the modulo 20 algorithm with a data
  pattern other than ones and zeros.  This combination of algorithm and
  data pattern should be quite effective.  However, its very long
  execution time relegates it to the extended test section.

Test 10 [Moving inv, 32 bit pat, no cache]
  This test should be the most effective in finding errors that are
  data pattern sensitive.  However, without cache its execution time
  is excessively long.

Memtest86 has the ability to test memory using longer refresh rates.  This
makes it possible to detect marginal errors that otherwise would go
undetected with the normal refresh rate.  Three refresh rates are available,
the normal rate of 15ms, an extended refresh rate of 150ms and an extra long
rate of 500ms.  The default refresh rate is used for test 0 and tests 1 - 5
use an extended rate of 150ms.  The extended tests (8 - 10) use the extra
long refresh rate of 500ms.  The refresh rate may be overridden at any time
via online configuration commands.

Note: the extra long refresh rate is much longer than normal and errors
reported with this refresh rate do not necessarily indicate faulty memory.


7) Execution Time
=================
The time required for a complete pass of Memtest86 will vary greatly
depending on CPU speed, memory speed and memory size.  Here are the
execution times from a Celeron-366 with 128MB of SDRAM:

  Test 0:     0:07
  Test 1:     0:33
  Test 2:     3:29
  Test 3:     1:34
  Test 4:     3:09
  Test 5:    14:56
  Test 6:    17:33
  Test 7:    15:23

  Total Time for Default tests:  56:44

  Test 8:   2:18:58
  Test 9:   1:01:20
  Test 10: 12:59:00

  Total Time for All tests:  17:14:50


8) Online Commands
==================
Memtest86 has a limited number of online commands.  Online commands
provide control over cache and refresh settings, test selection,
address range and error scrolling.  A help bar is displayed at the
bottom of the screen listing the available online commands.

  Command  Description

  ESC   Exits the test and does a warm restart via the BIOS.

  c     Enters test configuration menu
	    Menu options are:
               1) Cache mode
               2) Refresh mode
               3) Test selection
               4) Address range

  SP    Set scroll lock (Stops scrolling of error messages)
	Note: Testing is stalled when the scroll lock is
	set and the scroll region is full.

  CR    Clear scroll lock (Enables error message scrolling)


9) Display Description
======================
The following is a description of each field displayed by Memtest86:

  Test No:  The number and description of the test being executed
  Testing:  The range of memory currently being tested
  Pattern:  The current 32 bit data pattern used for testing
  Cache:    Cache status for both L1 and L2 cache
  Refresh:  The current DRAM refresh rate
  CPU:      The CPU type
  L1 Cache: The size of level 1 cache (Not available for all CPU
	    types)
  L2 Cache: The size of level 2 cache (Not available for all CPU
	    types)
  Memory:   A list of all memory segments found
  Pass:     Number of times that the entire test sequence has been
	    completed
  Errors:   Total errors found

Just to the right of the title a "windmill" and a series of dots
provide visibility of the test progress.  The "windmill" rotates once for
every 8MB of memory tested.  For each test a new dot is added to the
series when a sweep of memory has been completed.  Four to eight sweeps
of memory are done for each pattern depending on the test being executed.

The following information is displayed when a memory error is detected.
An error message is only displayed for errors with a different address or
failing bit pattern.  All displayed values are in hexadecimal.

  Addr:   Failing memory address 
  Good:   Current data pattern 
  Bad:    Failing data pattern 
  Xor:    Exclusive or of good and bad data (this shows the position
	  of the failing bit(s))
  Count:  Number of consecutive errors with the same address and
	  failing bits
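
As an illustration of how the Xor and Count fields relate to the good and
bad data, here is a small sketch.  The function names and the dictionary
layout are assumptions; only the field meanings come from the description
above.

```python
def failing_bits(good, bad, width=32):
    """Return (xor, positions): the exclusive or of the good and bad
    data and the list of bit positions that differ."""
    xor = (good ^ bad) & ((1 << width) - 1)
    return xor, [bit for bit in range(width) if (xor >> bit) & 1]


def summarize(errors):
    """Collapse consecutive errors that share an address and failing
    bits into one row with a count.  `errors` is a list of
    (addr, good, bad) tuples in the order they were detected."""
    rows = []
    for addr, good, bad in errors:
        xor, _ = failing_bits(good, bad)
        if rows and rows[-1]["addr"] == addr and rows[-1]["xor"] == xor:
            rows[-1]["count"] += 1
        else:
            rows.append({"addr": addr, "good": good, "bad": bad,
                         "xor": xor, "count": 1})
    return rows
```

For example, good data of ffffffff and bad data of fffffff7 give an Xor of
00000008, which means that bit 3 is the failing bit.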


10) Troubleshooting Memory Errors
================================
Once a memory error has been detected, determining the failing SIMM/DIMM
module is not a clear cut procedure.  With the large number of motherboard
vendors and possible combinations of SIMM slots it would be difficult if
not impossible to assemble complete information about how a particular
error would map to a failing memory module.  However, there are steps
that may be taken to determine the failing module.  Here are three
techniques that you may wish to use:

1) Removing modules
This is the simplest method for isolating failing modules, but may only be
employed when one or more modules can be removed from the system.  By
selectively removing modules from the system and then running the test
you will be able to find the bad modules.  Be sure to note exactly which
modules are in the system when the test passes and when the test fails.

2) Rotating modules
When none of the modules can be removed then you may wish to rotate modules
to find the failing one.  This technique can only be used if there are
three or more modules in the system.  Change the location of two modules
at a time.  For example put the module from slot 1 into slot 2 and put
the module from slot 2 in slot 1.  Run the test and if either the failing
bit or address changes then you know that the failing module is one of the
ones just moved. By using several combinations of module movement you
should be able to determine which module is failing.

3) Replacing modules
If you are unable to use either of the previous techniques then you are
left to selective replacement of modules to find the failure.  


11) Theory of Operation
=======================
Bootstrap and setup code is used to load Memtest86. This code loads the
test, sets up memory management registers and does miscellaneous setup.
When the load and setup are complete the memory map is as follows:

0x000	|-----------------------------------------------|
	|	Stack (4k)				|
0x1000	|-----------------------------------------------|
	|	Memtest-text (24k) Origin  0x1000	|
0x7000	-------------------------------------------------
	|	Memtest-data (2k)  Origin  0x7000	|
0x7800	-------------------------------------------------
	|	Memtest-text (24k) Origin  0x108800	|
0xd800	-------------------------------------------------
	|	Memtest-data (2k)  Origin  0x10e800	|
0xe000	-------------------------------------------------
	|	Common variables (1k)			|
0xe400	-------------------------------------------------

Relocation of the test is accomplished by using two copies of the test
code that have been built to execute at different addresses (different
origins).  When the test is started, the code with an origin of 0x1000 is
executed.  At the end of the testing phase the memory block from 0x1000
to 0xe400 is copied to 0x101000, the stack is set to 0x101000 and then
we jump to address 0x108800 (the code with an origin of 0x108800).
When the code is relocated only the first 640k of memory is tested.
When this test is complete then the code is moved back to 0x1000, the
stack is set back to 0x1000 and then we jump to 0x1000 (the code with
an origin of 0x1000).
	
When Memtest86 is loaded into memory it first scans memory to find all
segments of available read/write memory (DRAM).  DRAM is identified by
reading a location and then writing its complement.  If at least one bit in
each byte changes then we assume that it is DRAM.  To save time we only do
this check every 1k bytes.  All memory from 0xa0000 to 0xfffff is skipped.
Each segment of memory is displayed on the right side of the screen.  All
segments of memory that are found will be tested regardless of size.  The
memory scan is limited to the maximum memory size supported by the
motherboard.
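
The memory scan described above can be sketched against a simulated bus.
This is an illustration only: the FakeBus class, the function names and
the step of restoring the original value after the probe are assumptions
and are not taken from the Memtest86 source.

```python
STEP = 1024                              # probe every 1k bytes
HOLE_START, HOLE_END = 0xA0000, 0x100000  # 0xa0000 - 0xfffff is skipped


def looks_like_dram(bus, addr):
    """Read a location, write its complement and report whether at
    least one bit in each byte changed."""
    original = bus.read32(addr)
    bus.write32(addr, ~original & 0xFFFFFFFF)
    changed = original ^ bus.read32(addr)
    bus.write32(addr, original)          # restore (an assumption)
    return all((changed >> shift) & 0xFF for shift in (0, 8, 16, 24))


def scan_segments(bus, limit):
    """Return a list of (start, end) segments of read/write memory
    found below `limit`; `end` is exclusive."""
    segments, start = [], None
    for addr in range(0, limit, STEP):
        in_hole = HOLE_START <= addr < HOLE_END
        present = not in_hole and looks_like_dram(bus, addr)
        if present and start is None:
            start = addr
        elif not present and start is not None:
            segments.append((start, addr))
            start = None
    if start is not None:
        segments.append((start, limit))
    return segments


class FakeBus:
    """Simulated bus: addresses inside `ram_ranges` behave like DRAM,
    everything else ignores writes and reads as all ones."""

    def __init__(self, ram_ranges):
        self.ram_ranges = ram_ranges
        self.words = {}

    def _is_ram(self, addr):
        return any(lo <= addr < hi for lo, hi in self.ram_ranges)

    def read32(self, addr):
        return self.words.get(addr, 0) if self._is_ram(addr) else 0xFFFFFFFF

    def write32(self, addr, value):
        if self._is_ram(addr):
            self.words[addr] = value
```

With simulated memory from 0 to 0x200000, a scan up to a limit of 0x400000
finds two segments, one below 0xa0000 and one from 0x100000 to 0x200000.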


12) Change Log
==============
Enhancements in v2.2
   Added two new address tests

   Added an on-line command for setting test address range

   Optimized test code for faster execution (-O3, -funroll-loops and
	-fomit-frame-pointer)

   Added an elapsed time counter.

   Adjusted menu options for better consistency


Enhancements in v2.1
   Fixed a bug in the CPU detection that caused the test to
   hang or crash with some 486 and Cyrix CPUs

   Added CPU detection for Cyrix CPUs

   Extended and improved CPU detection for Intel and AMD CPUs

   Added a compile time option (BIOS_MEMSZ) for obtaining the last
   memory address from the BIOS.  This should fix problems with memory
   sizing on certain motherboards.  This option is not enabled by default.
   It may be enabled by default in a future release.

Enhancements in v2.0
   Added new Modulo-20 test algorithm.

   Added a 32 bit shifting pattern to the moving inversions algorithm.

   Created test sections to specify algorithm, pattern, cache and refresh
   rate.

   Improved test progress indicators.

   Created popup menus for configuration.

   Added menu for test selection.

   Added CPU and cache identification.

   Added a "bail out" feature to quit the current test when it does not
   fit the test selection parameters.

   Re-arranged the screen layout and colors.

   Created local include files for I/O and serial interface definitions
   rather than using the sometimes incompatible system include files. 

   Broke up the "C" source code into four separate source modules.

Enhancements in v1.5
   Some additional changes were made to fix obscure memory sizing
   problems.

   The 4 bit wide data pattern was increased to 8 bits since 8 bit
   wide memory chips are becoming more common.

   A new test algorithm was added to improve detection of data
   pattern sensitive errors. 


Enhancements in v1.4
   Changes to the memory sizing code to avoid problems with some
   motherboards where memtest would find more memory than actually
   exists.

   Added support for a console serial port. (thanks to Doug Sisk)

   Online commands are now available for configuring Memtest86 on
   the fly (see Online Commands).
	

Enhancements in v1.3
   Scrolling of memory errors is now provided.  Previously, only one screen
   of error information was displayed.

   Memtest86 can now be booted from any disk via lilo.

   Testing of up to 4gb of memory has been fixed and is now enabled by default.
   This capability was clearly broken in v1.2a and should work correctly
   now but has not been fully tested (4gb PC's are a bit rare).

   The maximum memory size supported by the motherboard is now being
   calculated correctly.  In previous versions there were cases where not
   all of memory would be tested and the maximum memory size supported
   was incorrect.

   For some types of failures the good and bad values were reported to be
   the same with an Xor value of 0.  This has been fixed by retaining the data
   read from memory and not re-reading the bad data in the error reporting
   routine.

   APM (advanced power management) is now disabled by Memtest86.  This
   keeps the screen from blanking while the test is running.

   Problems with enabling & disabling cache on some motherboards have been
   corrected.

13) Problems
============
Problems have been reported with some Compaq Presario systems with
K6-2 CPUs.  Solid errors are detected in the last 16k - 4meg of memory
but otherwise the computer works fine.  The reason for this is currently
unknown.

Memtest86 has not been designed for or tested with parity memory enabled
or error correcting (ECC) memory.  With ECC memory the test will not be
able to detect single bit errors but should otherwise execute correctly.

Memtest86 has no support for multiple processors.

There have been a number of compatibility problems reported.  Most of
these problems have been identified and corrected, but it is likely that
there are still some incompatibilities.  Please report problems.

Changes in the loader and kernel include files have caused incompatibilities
resulting in build failures.  A binary image (precomp.bin) of the test is
included and may be used if build problems are encountered.


14) Acknowledgments
===================
The initial versions of the source files bootsect.S, setup.S, head.S and
build.c are from the Linux 1.2.1 kernel and have been heavily modified.

Doug Sisk provided code to support a console connected via a serial port.