DSV.py - Cliff Wells, 2002
  Import/export DSV (delimiter separated values, a generalization of CSV).

IMPORTANT NOTE:
  1.4.1 introduces a small incompatibility with previous versions:
  the errorHandler callback now receives an additional argument that
  user-defined error handlers must accommodate.  The new function
  signature is errorHandler(linenumber, oldrow, newrow, columns, maxColumns).
  While this is a simple fix, it will break any error handlers that
  don't allow for the new argument.
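
One way to keep an existing handler working is to wrap it. The sketch below
assumes that linenumber is the added argument and that older handlers took
(oldrow, newrow, columns, maxColumns); this README does not say which argument
is new, so check against your installed version. The names
adapt_legacy_handler and legacy_pad are hypothetical and not part of DSV.

```python
# Sketch only: assumes the pre-1.4.1 handler signature was
# (oldrow, newrow, columns, maxColumns) and that linenumber is the new argument.

def adapt_legacy_handler(legacy_handler):
    """Wrap an old four-argument handler so it accepts the 1.4.1 signature."""
    def handler(linenumber, oldrow, newrow, columns, maxColumns):
        # The legacy handler never saw a line number, so it is simply dropped.
        return legacy_handler(oldrow, newrow, columns, maxColumns)
    return handler


def legacy_pad(oldrow, newrow, columns, maxColumns):
    """Hypothetical old-style handler: pad short rows with empty strings."""
    return newrow + [''] * (maxColumns - len(newrow))


handler = adapt_legacy_handler(legacy_pad)
padded = handler(7, 'a,b', ['a', 'b'], 3, 3)
print(padded)
```

The wrapper leaves the old handler untouched, so it can still be used with
earlier releases.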


To install:
python setup.py install

To test:
python DSV.py
Sample CSV files are included (darkwave.csv, nastiness.csv)

Basic use:

   from DSV import DSV

   infile = open(filename)  # filename: path to the file to import
   data = infile.read()
   infile.close()
   qualifier = DSV.guessTextQualifier(data) # optional
   data = DSV.organizeIntoLines(data, textQualifier = qualifier)
   delimiter = DSV.guessDelimiter(data) # optional
   data = DSV.importDSV(data, delimiter = delimiter, textQualifier = qualifier)
   hasHeader = DSV.guessHeaders(data) # optional
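
Once the header guess is available, the header row can be separated from the
records. This is a minimal sketch, assuming the imported data is a list of
rows (each a list of strings) and that the header guess is a true/false
value; split_header is a hypothetical helper, not part of DSV, and the sample
rows are made up for illustration.

```python
def split_header(rows, has_header):
    """Return (headers, records); headers is None when there is no header row."""
    if has_header and rows:
        return rows[0], rows[1:]
    return None, rows


# Made-up sample standing in for the imported data.
rows = [['name', 'qty'], ['apple', '3'], ['pear', '5']]
headers, records = split_header(rows, True)
print(headers)
print(records)
```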

If you already know the delimiter, text qualifier, etc., you may skip the
optional 'guessing' steps. They rely on heuristics: they seem to work well,
but there is no guarantee they are correct. They are best used to make a
good guess about the data structure and then let the user confirm it.
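
To illustrate why such a guess needs confirmation, here is a deliberately
simple delimiter heuristic. It is not the algorithm DSV uses; the function
name, the candidate list and the scoring rule are all assumptions made for
this sketch, and it ignores text qualifiers entirely.

```python
def guess_delimiter_sketch(lines, candidates=(',', '\t', ';', '|')):
    """Return (delimiter, score) for the most consistent candidate, or (None, 0)."""
    best, best_score = None, 0
    for delim in candidates:
        counts = [line.count(delim) for line in lines if line]
        # A delimiter should appear on every non-empty line.
        if not counts or min(counts) == 0:
            continue
        # Score: fraction of lines sharing the most common per-line count.
        most_common = max(set(counts), key=counts.count)
        score = counts.count(most_common) / float(len(counts))
        if score > best_score:
            best, best_score = delim, score
    return best, best_score


sample = ['a;b;c', '1;2;3', '4;5,x;6']
delimiter, score = guess_delimiter_sketch(sample)
print(repr(delimiter), score)
```

A score below 1.0 means some lines disagree, which is a natural point to show
the guess to the user instead of trusting it.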

For that purpose there is a 'wizard' to aid in this process (use it in place
of the above code; requires wxPython):

   from DSV import DSV

   dlg = DSV.ImportWizardDialog(parent, -1, 'DSV Import Wizard', filename)
   dlg.ShowModal()
   headers, data = dlg.ImportData() # may also return None
   dlg.Destroy()

The dlg.ImportData() method may also take a function as an optional argument
specifying what it should do about malformed rows.  See the example at the
bottom of DSV.py. A few common handler functions are provided in DSV.py
(padRow, skipRow, useRow).
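
A custom handler can also keep track of which lines were malformed. The
sketch below is hypothetical: CollectingHandler is not part of DSV, and it
assumes that the value returned by a handler is used as the repaired row and
that maxColumns is the width to pad to.

```python
class CollectingHandler:
    """Hypothetical handler that remembers which lines were malformed."""

    def __init__(self):
        self.bad_lines = []

    def __call__(self, linenumber, oldrow, newrow, columns, maxColumns):
        self.bad_lines.append(linenumber)
        # Assumption: the returned list is used as the repaired row.
        return newrow + [''] * (maxColumns - len(newrow))


handler = CollectingHandler()
row = handler(12, 'x,y', ['x', 'y'], 3, 3)
print(row, handler.bad_lines)
```

If any callable is accepted, an instance could be passed as
dlg.ImportData(handler), and handler.bad_lines inspected afterwards.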

Requires Python 2.0 or later
Wizards tested with wxPython 2.2.5/NT 4.0, 2.3.2/Win2000 and Linux/GTK (RedHat 7.x)


You can test DSV.py by running it at the prompt:

$ python DSV.py

A couple of sample CSV files are included (darkwave.csv, nastiness.csv).


Bugs/Caveats:
   - Although I've tested this stuff on varied data, I'm sure there are cases
     that I haven't seen that will choke any one of these routines (or at least
     return invalid data). This is beta code!
   - guessTextQualifier() algorithm is limited to quotes (double or single).
   - Surprising feature: Hitting <enter> on wxSpinCtrl causes a seg
     fault under Linux/GTK (not Win32). Strangely, pressing <tab> seems ok.
     Therefore, I had to use wxSpinButton.  Also, spurious spin events get
     generated for both of these controls (e.g. when calling wxBeginBusyCursor).
   - Keyboard navigation needs to be implemented in the wizards.
   - There may be issues with cr/lf translation, although I haven't yet seen any.
   
Why another CSV tool?:
   - Because I needed a more flexible CSV importer: one that could accept different
     delimiters (not just commas or tabs), make an intelligent guess regarding
     file structure (for user convenience), handle the files output by MS Excel,
     and finally, be easily integrated with a wizard.  All of the modules I had
     seen prior to this fell short on one count or another.
   - It seemed interesting.
     
To do:
   - Better guessTextQualifier() algorithm. In the perfect world I envision, I can
     use any character as a text qualifier, not just quotes.
   - Finish wizards and move them into separate module.
   - Better guessHeaders() algorithm, although this is difficult.