\name{eiInit}
\alias{eiInit}
\title{
   Initialize a compound database
}
\description{
   Takes the raw compound database in whatever format the given
   measure supports and creates a "data" directory.
}
\usage{
	eiInit(inputs,dir=".",format="sdf",descriptorType="ap",append=FALSE,
	conn=defaultConn(dir,create=TRUE), updateByName = FALSE, cl = NULL, connSource = NULL,
	priorityFn = forestSizePriorities,skipPriorities=FALSE)
}
%- maybe also 'usage' for other objects documented here.
\arguments{
  \item{inputs}{
	  Either the name of a file in \code{format} format, or an SDFset. This can
	  also be a vector of filenames. If \code{cl} is also specified and your database
	  supports it (SQLite does not), these files will be loaded in parallel on the cluster.
   }
  \item{dir}{
      The directory where the "data" directory lives. Defaults to the
      current directory.
   }
	\item{format}{
		The format of the data in \code{inputs}. Currently only "sdf" and "smiles" are
		supported.
	}
	\item{descriptorType}{
		The format of the descriptor. Currently supported values are "ap" for atom pair, and 
		"fp" for fingerprint.
	}
	\item{append}{
		If true, the given compounds will be added to an existing database
		and the <data-dir>/Main.iddb file will be updated with the new
		compound id numbers. This should not normally be used directly; use
		\code{\link{eiAdd}} instead to add new compounds to a database.
	}
	\item{conn}{
		Database connection to use. If a connection is given, you must ensure that it has been initialized using 
		the \code{\link{initDb}} function from ChemmineR before calling \code{\link{eiInit}}.
	}
	\item{updateByName}{
		If true, we assume that all compounds, both in the existing database and in the
		given dataset, have unique names. This function will then avoid re-adding existing,
		identical compounds, and will update an existing compound with a new definition if a new
		compound definition with an existing name is given.

		If false, we allow duplicate compound names to exist in the database, though not
		duplicate definitions. Identical compounds will not be re-added, but if a new version of
		an existing compound is added, it will not update the existing one. Instead, the modified
		version is added as a completely new compound with a new compound id.
	}
  \item{cl}{
     A SNOW cluster can be given here to run this function in
     parallel.
   }
	\item{connSource}{
		A function returning a new database connection. Note that it is not sufficient to return a
		reference to an existing connection; it must be a distinct, new connection.
		This is needed for cluster operations that make use of the database,
		as each of them will need to create its own connection.
		If not given, certain parts of this function will not be parallelized.

		This function can also be used to set up the environment on the cluster worker nodes. For
		example, you might need to re-load libraries such as RSQLite.
	}
	\item{priorityFn}{
		This option takes a function that takes a list of compound ids and returns
		a data frame with the compound ids as the column 'compound_id', and their priority as
		the column 'priority'. There are two pre-defined functions in ChemmineR:
		'randomPriorities', and 'forestSizePriorities' (default). 

		Several compounds can map to the same descriptor. When a function then needs to go
		from a descriptor back to a compound, it is ambiguous which compound to select. In
		that case, the compound with the highest priority is picked.
	}
	\item{skipPriorities}{
		If this is true, then no priority values will be computed. See option \code{priorityFn} 
		for an explanation of priorities.
	}

}
\details{


   \code{eiInit} can take either an SDFset or a filename. SDF and SMILES are supported
   by default.
   It might complain if your SDF file does not
   follow the SDF specification. If this happens, you can create an
   SDFset with the \code{read.SDFset} command and then use that
   instead of the filename.

   \code{eiInit} will create a folder called
   "data". Commands should always be executed in the folder containing
   this directory (i.e., the parent directory of "data"), or else
   specify the location of that directory with the \code{dir} option.

}
\value{
   A directory called "data" will have been created in the directory given by \code{dir}
   (the current working directory by default).
	The generated compound ids of the given compounds will be returned. These can be used to 
	reference a compound or set of compounds in other functions, such as \code{\link{eiQuery}}.
}
\author{
   Kevin Horan
}


\seealso{
   \code{\link{eiMakeDb}}
   \code{\link{eiPerformanceTest}}
   \code{\link{eiQuery}}
}
\examples{
   data(sdfsample)
   dir=file.path(tempdir(),"init")
   dir.create(dir)
   eiInit(sdfsample,dir=dir,priorityFn=randomPriorities)
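
   # A minimal sketch of a custom priority function, assuming only the
   # contract described under 'priorityFn': it receives compound ids and
   # returns a data frame with the columns 'compound_id' and 'priority'.
   # The name 'idOrderPriorities' and its ranking rule are hypothetical.
   idOrderPriorities <- function(compoundIds) {
      data.frame(compound_id = compoundIds,
                 priority = seq_along(compoundIds))
   }
   idOrderPriorities(c(101, 102, 103))

   \dontrun{
      # Not run: pass the custom function to eiInit in a fresh directory.
      dir2=file.path(tempdir(),"initCustom")
      dir.create(dir2)
      eiInit(sdfsample,dir=dir2,priorityFn=idOrderPriorities)

      # Not run: if eiInit complains about an SDF file, read it into an
      # SDFset first, as suggested in the details section.
      # "compounds.sdf" is a placeholder file name.
      dir3=file.path(tempdir(),"initFromFile")
      dir.create(dir3)
      sdfset=read.SDFset("compounds.sdf")
      eiInit(sdfset,dir=dir3)
   }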
}
% Add one or more standard keywords, see file 'KEYWORDS' in the
% R documentation directory.
%\keyword{ ~kwd2 }% __ONLY ONE__ keyword per line