|
DSV.py - Cliff Wells, 2002
Import/export DSV (delimiter separated values, a generalization of CSV).
IMPORTANT NOTE:
Version 1.4.1 introduces a small incompatibility with previous versions:
errorHandler now passes an additional argument that user-defined error
handlers must accommodate. The new signature is:
    errorHandler(linenumber, oldrow, newrow, columns, maxColumns)
While this is a simple fix, it will break any error handler that does not
accept the new argument.
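For illustration only, a user-defined handler written against the five-argument signature might look like the sketch below. The name pad_or_truncate_row and its return convention (the returned list is used in place of the malformed row) are assumptions, not documented DSV behavior; compare with padRow, skipRow and useRow in DSV.py before relying on them.

```python
def pad_or_truncate_row(linenumber, oldrow, newrow, columns, maxColumns):
    """Hypothetical error handler using the 1.4.1 five-argument signature."""
    # linenumber, oldrow and maxColumns are accepted only to satisfy the
    # signature; this sketch does not use them.
    row = list(newrow)
    if len(row) < columns:
        # Assumed convention: missing trailing fields become None.
        row.extend([None] * (columns - len(row)))
    # Assumed convention: the returned list replaces the malformed row,
    # so any extra fields beyond `columns` are dropped.
    return row[:columns]
```

A handler like this could then be passed wherever DSV expects an error handler, for example as the optional argument to dlg.ImportData().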
To install:
    python setup.py install
To test:
    python DSV.py
A sample CSV file is included (darkwave.csv).
Basic use:
    from DSV import DSV
    data = open(filename).read()
    qualifier = DSV.guessTextQualifier(data) # optional
    data = DSV.organizeIntoLines(data, textQualifier = qualifier)
    delimiter = DSV.guessDelimiter(data) # optional
    data = DSV.importDSV(data, delimiter = delimiter, textQualifier = qualifier)
    hasHeader = DSV.guessHeaders(data) # optional
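DSV.organizeIntoLines() takes the text qualifier because, in Excel-style files, a qualified field may contain embedded line breaks, so physical lines and logical records differ. The sketch below only illustrates that idea; split_logical_lines is a hypothetical name, and its CR/LF normalization is an assumption rather than a description of what DSV does.

```python
def split_logical_lines(text, qualifier='"'):
    """Hypothetical helper: split raw text into logical records, keeping
    newlines that fall inside qualified (quoted) fields."""
    # Assumption: normalize Windows (\r\n) and old Mac (\r) line endings.
    text = text.replace('\r\n', '\n').replace('\r', '\n')
    lines, current, inside = [], [], False
    for ch in text:
        if qualifier and ch == qualifier:
            # A doubled qualifier ("") toggles twice, leaving the state unchanged.
            inside = not inside
        if ch == '\n' and not inside:
            lines.append(''.join(current))  # end of a logical record
            current = []
        else:
            current.append(ch)  # newlines inside a qualified field are kept
    if current:
        lines.append(''.join(current))
    return lines
```

With qualifier=None the function simply splits on line breaks.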
If you already know the delimiter, text qualifier, etc., you may skip the
optional 'guessing' steps, since they rely on heuristics anyway (although
they seem to work well, there is no guarantee that they are correct). They
are best used to make a good guess about the data structure and then let
the user confirm it.
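To show why such guesses are only heuristics, here is a minimal sketch of one common delimiter-guessing rule: pick the candidate character that occurs the same non-zero number of times on the most lines. guess_delimiter and its candidate set are hypothetical, and this is not necessarily the rule DSV.guessDelimiter() uses. It also ignores text qualifiers, so a delimiter inside a quoted field can mislead it.

```python
from collections import Counter

def guess_delimiter(lines, candidates=',;\t|:'):
    """Hypothetical heuristic: prefer the candidate whose per-line count
    is most consistent across the input lines."""
    best, best_score = None, 0
    for cand in candidates:
        # How many lines contain this candidate exactly n times, for each n.
        counts = Counter(line.count(cand) for line in lines if line)
        counts.pop(0, None)  # lines without the candidate don't count
        if not counts:
            continue
        # most_common(1) gives (per-line count, number of lines with it).
        _, lines_agreeing = counts.most_common(1)[0]
        if lines_agreeing > best_score:  # ties go to the earlier candidate
            best, best_score = cand, lines_agreeing
    return best
```

For example, on ['a;b;c', '1;2;3', '4;5,5;6'] the semicolon appears twice on all three lines, so it beats the comma.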
For this purpose there is a 'wizard' to aid in the process (use it in lieu
of the code above; it requires wxPython):
    from DSV import DSV
    dlg = DSV.ImportWizardDialog(parent, -1, 'DSV Import Wizard', filename)
    dlg.ShowModal()
    result = dlg.ImportData() # may return None
    if result is not None:
        headers, data = result
    dlg.Destroy()
The dlg.ImportData() method also accepts an optional function argument that
specifies how to handle malformed rows. See the example at the bottom of this
file. A few common handlers are provided in this file (padRow, skipRow,
useRow).
Requires Python 2.0 or later
Wizards tested with wxPython 2.2.5/NT 4.0, 2.3.2/Win2000 and Linux/GTK (RedHat 7.x)
You can test DSV.py by running it at the prompt:
    $ python DSV.py
A couple of sample CSV files are included (darkwave.csv, nastiness.csv).
Bugs/Caveats:
- Although I've tested this code on varied data, I'm sure there are cases
  that I haven't seen that will choke one of these routines (or at least
  make it return invalid data). This is beta code!
- The guessTextQualifier() algorithm is limited to quotes (double or single).
- Surprising feature: hitting <enter> on a wxSpinCtrl causes a segmentation
  fault under Linux/GTK (not Win32). Strangely, pressing <tab> seems OK.
  Therefore, I had to use wxSpinButton. Also, spurious spin events are
  generated for both of these controls (e.g. when calling wxBeginBusyCursor).
- Keyboard navigation still needs to be implemented in the wizards.
- There may be issues with CR/LF translation, although I haven't yet seen any.
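As noted above, guessTextQualifier() only considers double and single quotes. A quote-only heuristic of that kind might look like the following sketch, which counts quotes sitting at a field boundary (start or end of a line, or next to a likely delimiter). The name guess_text_qualifier and the boundary-counting rule are assumptions for illustration, not the module's actual algorithm.

```python
import re

def guess_text_qualifier(text, delimiters=',;\t|'):
    """Hypothetical heuristic: return the quote character ('"' or "'")
    seen most often at field boundaries, or None if neither appears."""
    d = re.escape(delimiters)
    scores = {}
    for q in ('"', "'"):
        qe = re.escape(q)
        # Opening qualifier: at the start of a line or right after a delimiter.
        opening = re.findall(r'(?:^|[%s])%s' % (d, qe), text, re.M)
        # Closing qualifier: right before a delimiter or at the end of a line.
        closing = re.findall(r'%s(?=[%s]|$)' % (qe, d), text, re.M)
        scores[q] = len(opening) + len(closing)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Apostrophes inside words (as in "don't") are not at a field boundary, so they do not count toward the single quote.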
Why another CSV tool?
- Because I needed a more flexible CSV importer: one that could accept
  different delimiters (not just commas or tabs), make an intelligent guess
  about file structure (for user convenience), be compatible with the files
  output by MS Excel, and, finally, be easily integrated with a wizard. All
  of the modules I had seen before this fell short on one count or another.
- It seemed interesting.
To do:
- Better guessTextQualifier() algorithm. In the perfect world I envision, any
  character could serve as a text qualifier, not just quotes.
- Finish the wizards and move them into a separate module.
- Better guessHasHeaders() algorithm, although this is difficult.
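Header detection is hard because a header row can only be recognized by how it differs from the rows below it. One common rule of thumb compares the types of the cells: treat the first row as a header if none of its cells look numeric while at least one column below it is numeric in every row. The sketch below uses the hypothetical name guess_has_header and is not the module's own header-guessing function.

```python
def guess_has_header(rows):
    """Hypothetical heuristic for whether rows[0] is a header row."""
    def is_number(s):
        try:
            float(s)
            return True
        except (TypeError, ValueError):
            return False

    if len(rows) < 2:
        return False  # nothing to compare the first row against
    first, body = rows[0], rows[1:]
    if any(is_number(cell) for cell in first):
        return False  # a numeric cell suggests the first row is data
    for col in range(len(first)):
        # Ragged rows are tolerated: short rows are skipped for this column.
        values = [row[col] for row in body if col < len(row)]
        if values and all(is_number(v) for v in values):
            return True  # text above an all-numeric column looks like a header
    return False
```

The rule fails on all-text files, which is one reason the problem is difficult.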
|