File: BedBaseVisitor.hpp

package info (click to toggle)
bedops 2.4.41%2Bdfsg-1
  • links: PTS, VCS
  • area: main
  • in suites: bookworm
  • size: 21,148 kB
  • sloc: ansic: 28,562; cpp: 15,359; sh: 2,704; makefile: 2,687; xml: 1,669; python: 1,582; csh: 823; perl: 365; java: 172
file content (229 lines) | stat: -rw-r--r-- 7,684 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
/*
  Author: Shane Neph & Scott Kuehn
  Date:   Dec. 7, 2009
*/
//
//    BEDOPS
//    Copyright (C) 2011-2022 Shane Neph, Scott Kuehn and Alex Reynolds
//
//    This program is free software; you can redistribute it and/or modify
//    it under the terms of the GNU General Public License as published by
//    the Free Software Foundation; either version 2 of the License, or
//    (at your option) any later version.
//
//    This program is distributed in the hope that it will be useful,
//    but WITHOUT ANY WARRANTY; without even the implied warranty of
//    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
//    GNU General Public License for more details.
//
//    You should have received a copy of the GNU General Public License along
//    with this program; if not, write to the Free Software Foundation, Inc.,
//    51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
//

#ifndef _BED_BASE_VISITOR_HPP
#define _BED_BASE_VISITOR_HPP

#include <list>
#include <set>
#include <type_traits>

#include "algorithm/WindowSweep.hpp"
#include "data/bed/BedCompare.hpp"
#include "data/bed/BedDistances.hpp"
#include "utility/Assertion.hpp"
#include "utility/Exception.hpp"

/*
  Bed types are not sequences of simple numbers.  WindowSweep::sweep() can
  be used with both types, but there are inherent difficulties with BED.

  Take, for instance, the properly ordered coordinates:
  Problem 1:
  chr1 0 100
  chr1 23 25
  chr1 23 107
  chr1 30 45
  chr1 31 32
  chr1 99 100
  chr1 101 107

  When the first element is evaluated, everything but the last row is part
  of the current window.  When you change to the second row, chr1 99 100 is
  no longer part of the current window.  Then it is for the 3rd row, then
  not for the 4th.  We do not account for such possibilities within sweep()
  or the main Visitor baseclass via events.

  BedBaseVisitor implements logic to deal with these problems, and issues
  events to subclasses appropriately.  Bed type visitors should inherit
  from BedBaseVisitor and use its public interface.

  The same sort of issue exists on the left end.
  Problem 2:
  chr1 0 100
  chr1 23 25
  chr1 39 100

  When the 3rd row is the current item, row 2 does not make up its window,
  but row 1 does.  Once row 2 is behind and out of scope relative to the
  current item; it never comes back into play.

  I believe memory management can no longer be passed down to derived classes
  due to buffering that goes on within this class.  I haven't thought about
  it much yet as no Visitor has yet needed its own memory management.

  I have tried optimizing things when the overlap method is based upon bp's and
   not percentages and received much better runtimes.  Unfortunately, my assumptions
   were wrong.  The bottom line is that you need to look at every element in the
   current window, both going out of range to the left and coming into range on
   the right in fixWindow().  Here is the case that fails:

  chr1  1   200  a  1
  chr1  10  20   b  3
  chr1  50  150  c  4

  when used with bedmap --bp-ovr 11 --count -

  row 2 has no hits, but row 1 goes with row 3's output.  Cannot assume that once
    row 2 is out of range that everything to the left of it is out of range too,
    even when looking at straight bp's.  The main issue shown here is that the
    --bp-ovr 11 is larger than row 2's range.  It can never qualify but row 1 can.

  All of the problems mentioned can be attributed to fully nested elements.  There
    are scenarios without fully nested elements that are problematic too without this
    BedBaseVisitor.  For example, consider using Visitors.hpp as base class with the
    bedmap application under the following scenario:
      bedmap --fraction-map 0.1 --echo --count with the following file:

        chr1	564622	564633
        chr1	564629	564637
        chr1	564634	564677
        chr1	564673	564681
        chr1	564974	565006
        chr1	564978	565023
        chr1	565007	565040
        chr1	565256	565294

   When at line 2, line 3 does not map b/c 0.1*43nt = 4nt while line 2 and line 3
     overlap by 3 nt.  But, when we go to line 3 as the reference, line 2 is an
     overlapping element that qualifies since its size is 8 nt (0.1*8nt -> any
     overlap satisfies).  Unfortunately, sweep() has already flushed out line 2
     since it has the inherent symmetry assumption:  dist(a,b) == dist(b,a).
   BedBaseVisitor deals with this scenario properly.
*/


namespace Visitors {

  template <typename BedDist, typename Ref, typename Map = Ref>
  struct BedBaseVisitor {
     typedef BedDist DistType;
     typedef Ref RefType;
     typedef Map MapType;

  protected:
     // typedefs
     typedef std::set<MapType*, Bed::CoordRestAddressCompare<MapType>> OrderLesser;
     typedef OrderLesser OrderCache;
     typedef OrderLesser OrderWin;

  public:
     // Interface for sweep()
     inline bool ManagesOwnMemory() const { return(false); }

     inline void OnStart(RefType* t) {
       // Give derived class the new reference
       ref_ = t;
       SetReference(t);
     }

     inline void OnAdd(MapType* u) {
       // Add(u); Do not add until deletions done in fixWindow()
       cache_.insert(u);
     }

     inline void OnDelete(MapType* u) {
       static typename OrderWin::iterator winIter;
       winIter = win_.find(u);
       if ( winIter != win_.end() ) { // update
         Delete(u);
         win_.erase(winIter);
       } else {
         cache_.erase(u);
       }
     }

     void OnDone() {
       fixWindow(); // deletions before insertions
       DoneReference();
     }

     inline void OnEnd() {
       End();
       win_.clear();
       cache_.clear();
     }

     inline void OnPurge() {
       // bed types can violate sweep's OnPurge() checks and they aren't really needed
     }

     explicit BedBaseVisitor(const DistType& d = DistType()) : dist_(d)
       { /* */ }
  
     virtual ~BedBaseVisitor() { /* */ }

     // Derived Class interface : Must be public due to MultiVisitor-type usage
     virtual void Delete(MapType*) = 0;
     virtual void Add(MapType*) = 0;
     virtual void DoneReference() = 0;
     virtual inline void SetReference(RefType*) { /* */ }
     virtual inline void End() { /* */ }
  
   private:
     void fixWindow() {
       // Deletions must come before insertions for consistency with sweep()
       // Realize that, at a minimum, ref_ has changed -> all time-consuming
       //  checks are necessary
  
       // Are any items in the window really out of range of 't'?  See problem 2.
       static std::list<MapType*> lst;
       auto winIter = win_.begin();
       while ( winIter != win_.end() ) {
         if ( dist_.Map2Ref(*winIter, ref_) != 0 ) {
           Delete(*winIter);
           lst.push_back(*winIter);
           win_.erase(winIter++);
         } else {
           ++winIter;
         }
       } // while

       auto cacheIter = cache_.begin();
       while ( cacheIter != cache_.end() ) {
         if ( 0 == dist_.Map2Ref(*cacheIter, ref_) ) {
           Add(*cacheIter);
           win_.insert(*cacheIter);
           cache_.erase(cacheIter++);
         } else {
           ++cacheIter;
         }
       } // while
  
       cache_.insert(lst.begin(), lst.end());
       lst.clear();
     }
  
   private:
     // MUST use a set with value/address compare here instead of a std::multiset
     //  This is for deleting elements and making sure multiple rows with the
     //  same coordinates each receive Delete() calls to derived classes.
     DistType dist_;
     RefType* ref_;
     OrderCache cache_;
     OrderWin win_;
  };

} // namespace Visitors

#endif // _BED_BASE_VISITOR_HPP