File: README.md

package info (click to toggle)
rudof 0.1.146%2Bds-1
  • links: PTS, VCS
  • area: main
  • in suites: forky
  • size: 10,048 kB
  • sloc: python: 1,288; makefile: 32; sh: 1
file content (167 lines) | stat: -rw-r--r-- 4,061 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
# Rudof Python bindings

The Python bindings for [rudof](https://rudof-project.github.io/) are called `pyrudof`. They are available at [pypi](https://pypi.org/project/pyrudof/).

For more information, you can access the [readthedocs documentation](https://pyrudof.readthedocs.io/en/latest/). We keep several tutorials about rudof as Jupyter notebooks in: [https://rudof-project.github.io/tutorials].

After compiling and installing this module, a Python library  called `pyrudof` should be available.  

## Build the development version

This module is based on [pyo3](https://pyo3.rs/) and [maturin](https://www.maturin.rs/).

To build and install the development version of `pyrudof` you need to clone this git repository, go to the `python` directory (the one this README is in) and run:

```
pip install maturin
```

followed by:

```sh
pip install .
```

If you are using `.env`, you can do the following:

```sh
python3 -m venv .venv
```

followed by: 

```sh
source .venv/bin/activate
```

or

```sh
source .venv/bin/activate.fish
```

and once you do that, you can locally install que package as:

```sh
pip install -e .
```

## Running the tests

Go to the tests folder: 

```sh
cd tests
```

and run: 

```sh
python3 -m unittest discover -vvv
```

## Using rudof_generate

The `pyrudof` package includes bindings for `rudof_generate`, which allows you to generate synthetic RDF data from ShEx or SHACL schemas.

### Basic Example

```python
import pyrudof

# Create configuration
config = pyrudof.GeneratorConfig()
config.set_entity_count(100)
config.set_output_path("output.ttl")
config.set_output_format(pyrudof.OutputFormat.Turtle)

# Create generator
generator = pyrudof.DataGenerator(config)

# Load schema and generate data
generator.run("schema.shex")
```

### Configuration Options

The `GeneratorConfig` class provides many configuration options:

```python
config = pyrudof.GeneratorConfig()

# Generation parameters
config.set_entity_count(1000)           # Number of entities to generate
config.set_seed(42)                     # Random seed for reproducibility

# Schema format
config.set_schema_format(pyrudof.SchemaFormat.ShEx)  # or SchemaFormat.SHACL

# Output configuration
config.set_output_path("data.ttl")
config.set_output_format(pyrudof.OutputFormat.Turtle)  # or OutputFormat.NTriples
config.set_compress(False)              # Whether to compress output
config.set_write_stats(True)            # Write generation statistics

# Cardinality strategy
config.set_cardinality_strategy(pyrudof.CardinalityStrategy.Balanced)
# Options: Minimum, Maximum, Random, Balanced

# Parallel processing
config.set_worker_threads(4)            # Number of worker threads
config.set_batch_size(100)              # Batch size for processing
config.set_parallel_writing(True)       # Enable parallel file writing
config.set_parallel_file_count(4)       # Number of output files (when parallel)
```

### Loading Schemas

You can load schemas in different ways:

```python
# Load ShEx schema
generator.load_shex_schema("schema.shex")

# Load SHACL schema
generator.load_shacl_schema("shapes.ttl")

# Auto-detect schema format
generator.load_schema_auto("schema_file")

# Then generate data
generator.generate()
```

### Complete Workflow

The `run()` method provides a convenient way to load a schema and generate data in one step:

```python
# Auto-detect format
generator.run("schema.shex")

# Specify format explicitly
generator.run_with_format("shapes.ttl", pyrudof.SchemaFormat.SHACL)
```

### Configuration Files

You can also load configuration from TOML or JSON files:

```python
# Load from TOML
config = pyrudof.GeneratorConfig.from_toml_file("config.toml")

# Load from JSON
config = pyrudof.GeneratorConfig.from_json_file("config.json")

# Save configuration
config.to_toml_file("saved_config.toml")
```

### Available Enums

- **SchemaFormat**: `ShEx`, `SHACL`
- **OutputFormat**: `Turtle`, `NTriples`
- **CardinalityStrategy**: `Minimum`, `Maximum`, `Random`, `Balanced`

For more examples, see the `examples/generate_example.py` file.