PERFORMANCE
===========

*** These are some notes/ideas for improving performance in plex86.

If guest SS spans 32bit space, can eliminate switching to r3h SS in
  tcode.  Add guest SS descriptor to meta page info and compare.

The timer handling needs work.  Currently, the guest is likely getting
  hammered with interrupts too frequently.

In fault handler, need to have ability to know if 1st instruction
  is virtualized.  If so, emulate and save round trip.

Finer invalidation granularity for code cache page writes.  Right now,
  we're dumping the entire page even if data access does not
  conflict with translated instructions.
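
  A minimal sketch of one way to track this, assuming a per-page
  bitmap with one bit per 64-byte chunk.  The names page_code_map_t,
  map_mark_code and map_write_conflicts, and the chunk size, are
  hypothetical and not part of the current code.  A write would only
  dump translated code when it overlaps a marked chunk.

```c
#include <stdint.h>

#define PAGE_SIZE   4096u
#define CHUNK_SIZE  64u   /* hypothetical granularity: 64 chunks per page */

/* Hypothetical per-page record: bit i set means chunk i holds
 * bytes of translated guest instructions. */
typedef struct {
    uint64_t code_chunks;
} page_code_map_t;

/* Mask of the chunks touched by [off, off+len) within one page. */
static uint64_t chunk_mask(uint32_t off, uint32_t len)
{
    uint32_t first, last, n;

    if (len == 0 || off >= PAGE_SIZE)
        return 0;
    if (len > PAGE_SIZE - off)
        len = PAGE_SIZE - off;              /* clamp to end of page */
    first = off / CHUNK_SIZE;
    last  = (off + len - 1) / CHUNK_SIZE;
    n     = last - first + 1;
    /* n == 64 implies first == 0; avoid an undefined 64-bit shift. */
    return (n == 64 ? ~(uint64_t)0 : (((uint64_t)1 << n) - 1)) << first;
}

/* Record that guest code at [off, off+len) has been translated. */
static void map_mark_code(page_code_map_t *m, uint32_t off, uint32_t len)
{
    m->code_chunks |= chunk_mask(off, len);
}

/* Nonzero only if the write overlaps a chunk holding translated code. */
static int map_write_conflicts(const page_code_map_t *m,
                               uint32_t off, uint32_t len)
{
    return (m->code_chunks & chunk_mask(off, len)) != 0;
}
```

  A data write for which map_write_conflicts() returns 0 could leave
  the page's translations alone.  Chunk size trades bitmap space
  against false conflicts.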

Optimize 16-bit code complementing so only instructions
  which require it are actually complemented.

Make DT usage bitlists 32-bits instead of 8.

There are several points marked with 'xxx' in the kernel/dt/
  code which talk about potential optimizations.

The PLEX86_PHYMEM_MOD ioctl interface for invalidating translated
  code pages due to user space writes needs to be modified
  to be more efficient.  Currently, we dump all translated code
  for _any_ write.  Ouch!  The floppy/DMA interface from user
  space requires this.  So for now, any floppy access will drag
  down performance.

Optimize functions in util-nexus.c: mon_memzero, mon_memcpy, mon_memset.
  They could be done a lot more efficiently.
  Perhaps make a mon_memzero function specifically for pages.
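
  A rough sketch of what a page-only zeroing routine might look like.
  The name mon_pagezero and the MON_PAGE_SIZE constant are hypothetical,
  and the sketch assumes a 4-byte aligned buffer of exactly 4096 bytes.
  With the size fixed, there is no length argument and no tail handling.

```c
#include <stdint.h>

#define MON_PAGE_SIZE 4096u   /* assumed page size */

/* Hypothetical page-only variant of mon_memzero.
 * Assumes `page` is 4-byte aligned and exactly one page long. */
static void mon_pagezero(void *page)
{
    uint32_t *p   = (uint32_t *)page;
    uint32_t *end = p + MON_PAGE_SIZE / sizeof(uint32_t);

    /* Four 32-bit stores per iteration instead of a byte loop. */
    while (p < end) {
        p[0] = 0;
        p[1] = 0;
        p[2] = 0;
        p[3] = 0;
        p += 4;
    }
}
```

  Whether this beats a "rep stosl" based version would need measuring.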

Some important and/or heavily used components of hardware emulation
  could be optionally moved to monitor space:
    {DMA, disk, floppy, video, timer, ...}

Optionally, let ring3 guest code run without DT (SIV) intervention.
  Need a set of criteria for when this is possible, and should
  be controlled by user conf file setting.

Remove levels of indirection posed by plugin.c.

Use a better messaging system between user space and monitor.
  Possibly queue more than one message at a time.  Could use
  memory mapped page(s).
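
  One possible shape for such a queue is a single-producer,
  single-consumer ring living in the shared page(s).  Everything
  below (mon_msg_t, msg_ring_t, ring_put, ring_get, the slot count)
  is a hypothetical sketch, not the existing message format.  It
  assumes one writer and one reader on a single CPU, so no memory
  barriers are shown.

```c
#include <stdint.h>

#define MSG_SLOTS 16u                 /* hypothetical; must be a power of two */

typedef struct {
    uint32_t type;
    uint32_t arg[3];
} mon_msg_t;                          /* hypothetical message layout */

typedef struct {
    volatile uint32_t head;           /* count of messages ever written */
    volatile uint32_t tail;           /* count of messages ever read    */
    mon_msg_t slot[MSG_SLOTS];
} msg_ring_t;                         /* would live in the mapped page(s) */

/* Producer side: returns 1 if queued, 0 if the ring is full. */
static int ring_put(msg_ring_t *r, const mon_msg_t *m)
{
    uint32_t head = r->head;

    if (head - r->tail == MSG_SLOTS)
        return 0;
    r->slot[head % MSG_SLOTS] = *m;
    r->head = head + 1;               /* publish after the slot is filled */
    return 1;
}

/* Consumer side: returns 1 and fills *out, or 0 if the ring is empty. */
static int ring_get(msg_ring_t *r, mon_msg_t *out)
{
    uint32_t tail = r->tail;

    if (tail == r->head)
        return 0;
    *out = r->slot[tail % MSG_SLOTS];
    r->tail = tail + 1;
    return 1;
}
```

  With 16-byte messages, the ring above takes 264 bytes, so it fits
  in a single page with room for larger payloads.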

Different x86 modes could have their own opcode virtualization map.
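
  As an illustration only, per-mode maps could be a small array of
  256-bit tables indexed by mode.  The mode list and the names
  virt_map, map_set and must_virtualize are hypothetical.  Which
  opcodes belong in which table is exactly the open question and is
  not answered here.

```c
#include <stdint.h>

/* Hypothetical set of modes; the real split may differ. */
enum x86_mode { MODE_REAL, MODE_V86, MODE_PM16, MODE_PM32, MODE_COUNT };

/* One bit per one-byte opcode: 256 bits = 8 x 32-bit words. */
typedef struct {
    uint32_t bits[8];
} opcode_map_t;

static opcode_map_t virt_map[MODE_COUNT];

/* Mark `opcode` as needing virtualization in `mode`. */
static void map_set(enum x86_mode mode, uint8_t opcode)
{
    virt_map[mode].bits[opcode >> 5] |= (uint32_t)1 << (opcode & 31);
}

/* Nonzero if `opcode` must be virtualized when running in `mode`. */
static int must_virtualize(enum x86_mode mode, uint8_t opcode)
{
    return (virt_map[mode].bits[opcode >> 5] >> (opcode & 31)) & 1;
}
```

  Each table costs 32 bytes, so keeping one per mode is cheap.
  Two-byte (0x0F-prefixed) opcodes would need a second set of tables.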

Pseudo devices and special guest-OS specific device drivers for
  disk/network/video/etc and an associated architecture.  This would
  let us pass data more quickly and prevent a lot of emulation overhead.
  The real device emulation could plug into the same architecture as
  the pseudo devices.



============= OLD NOTES ======================================

*** These are notes from the 1st generation of software instruction
*** virtualization (SBE).  Some of them still apply.  I'll sort through
*** them later.

Fix extra CR3 reload in nexus.S

Handle string IO, N operations at a time, rather than
  1 at a time.  This is really bogging down performance.
  Have to consider paging and page boundaries, segment limits,
  timing issues, and debug single stepping etc here.
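
  A sketch of the batching arithmetic only, under simplifying
  assumptions: one memory operand, forward direction (DF=0), and an
  expand-up segment whose limit is the last valid offset.  The helper
  name string_io_batch and its parameters are hypothetical.  Timing
  and single stepping are not modeled.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical helper: how many elements of `size` bytes (1, 2 or 4)
 * can be transferred in one go, starting at linear address `laddr`
 * (segment offset `off`), without crossing a page boundary or the
 * segment limit.  Returns 0 if not even one element fits, in which
 * case the caller would fall back to the one-at-a-time path. */
static uint32_t string_io_batch(uint32_t count, uint32_t size,
                                uint32_t laddr, uint32_t off,
                                uint32_t seg_limit)
{
    uint32_t by_page;
    uint64_t by_limit;
    uint32_t n = count;

    if (size == 0 || off > seg_limit)
        return 0;

    /* Whole elements left before the end of the current page. */
    by_page = (PAGE_SIZE - (laddr & (PAGE_SIZE - 1))) / size;

    /* Whole elements left before the segment limit (inclusive).
     * 64-bit math: limit 0xFFFFFFFF with off 0 is 2^32 bytes. */
    by_limit = ((uint64_t)seg_limit - off + 1) / size;

    if (n > by_page)
        n = by_page;
    if (n > by_limit)
        n = (uint32_t)by_limit;
    return n;
}
```

  For example, a 100-element word transfer starting 6 bytes before
  the end of a page would be handled as a batch of 3, after which
  the next page is looked up and the calculation repeats.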

Could software breakpoints (INT3) invoke the handler directly,
  rather than generating #GP because of lack of permission from
  the guest at ring3?

Use COSIMULATE macro in kernel/ to trim code when it's
  not used.

If we virtualize a near branch instruction only because
  we are at the maximum recursion level, then we could
  unvirtualize it later on another pass, when the current
  level is lower?

Optimize multi-byte reads/writes to VGA

Alignment of routines in mon-fault.c

Keep multiple monitor GDTs customized for certain CPU modes?