# Rudof Python bindings
The Python bindings for [rudof](https://rudof-project.github.io/) are called `pyrudof`. They are available on [PyPI](https://pypi.org/project/pyrudof/).
For more information, see the [Read the Docs documentation](https://pyrudof.readthedocs.io/en/latest/). Several tutorials about rudof are maintained as Jupyter notebooks at [https://rudof-project.github.io/tutorials](https://rudof-project.github.io/tutorials).
After compiling and installing this module, a Python library called `pyrudof` should be available.
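One way to confirm that the module is visible to the interpreter you are using is a small check like the one below. It is only a sketch built on the standard library: `describe_install` is a hypothetical helper name, and it assumes the installed distribution is named `pyrudof`, the same as the module.

```python
import importlib.util
from importlib import metadata


def describe_install(package: str = "pyrudof") -> str:
    """Return a one-line summary of whether `package` can be imported."""
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(package) is None:
        return f"{package} is not installed in this environment"
    try:
        # Assumption: the distribution name matches the module name.
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = "unknown version"
    return f"{package} is installed ({version})"


if __name__ == "__main__":
    print(describe_install())
```

Running it inside and outside a virtual environment is a quick way to see which interpreter actually has the package.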
## Build the development version
This module is based on [pyo3](https://pyo3.rs/) and [maturin](https://www.maturin.rs/).
To build and install the development version of `pyrudof` you need to clone this git repository, go to the `python` directory (the one this README is in) and run:
```sh
pip install maturin
```
followed by:
```sh
pip install .
```
If you prefer to work in a virtual environment (`.venv`), create it first:
```sh
python3 -m venv .venv
```
followed by:
```sh
source .venv/bin/activate
```
or, if you use the fish shell:
```sh
source .venv/bin/activate.fish
```
Once the environment is active, you can install the package locally in editable mode:
```sh
pip install -e .
```
## Running the tests
Go to the tests folder:
```sh
cd tests
```
and run:
```sh
python3 -m unittest discover -vvv
```
## Using rudof_generate
The `pyrudof` package includes bindings for `rudof_generate`, which allows you to generate synthetic RDF data from ShEx or SHACL schemas.
### Basic Example
```python
import pyrudof
# Create configuration
config = pyrudof.GeneratorConfig()
config.set_entity_count(100)
config.set_output_path("output.ttl")
config.set_output_format(pyrudof.OutputFormat.Turtle)
# Create generator
generator = pyrudof.DataGenerator(config)
# Load schema and generate data
generator.run("schema.shex")
```
### Configuration Options
The `GeneratorConfig` class provides many configuration options:
```python
config = pyrudof.GeneratorConfig()
# Generation parameters
config.set_entity_count(1000) # Number of entities to generate
config.set_seed(42) # Random seed for reproducibility
# Schema format
config.set_schema_format(pyrudof.SchemaFormat.ShEx) # or SchemaFormat.SHACL
# Output configuration
config.set_output_path("data.ttl")
config.set_output_format(pyrudof.OutputFormat.Turtle) # or OutputFormat.NTriples
config.set_compress(False) # Whether to compress output
config.set_write_stats(True) # Write generation statistics
# Cardinality strategy
config.set_cardinality_strategy(pyrudof.CardinalityStrategy.Balanced)
# Options: Minimum, Maximum, Random, Balanced
# Parallel processing
config.set_worker_threads(4) # Number of worker threads
config.set_batch_size(100) # Batch size for processing
config.set_parallel_writing(True) # Enable parallel file writing
config.set_parallel_file_count(4) # Number of output files (when parallel)
```
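Because every option shown above is set through a `set_<name>` method, a group of settings can be applied from a plain dictionary. The helper below is a sketch under that assumption: `apply_settings` is a hypothetical name, it is not part of `pyrudof`, and it only relies on the setter naming pattern visible in the listing above.

```python
from typing import Any, Mapping


def apply_settings(config: Any, settings: Mapping[str, Any]) -> Any:
    """Call config.set_<name>(value) for every entry in settings."""
    for name, value in settings.items():
        setter = getattr(config, f"set_{name}", None)
        if not callable(setter):
            # Fail loudly on typos instead of silently ignoring the option.
            raise AttributeError(f"config has no setter named set_{name}")
        setter(value)
    return config


# Example usage, assuming pyrudof is installed:
# import pyrudof
# config = apply_settings(
#     pyrudof.GeneratorConfig(),
#     {"entity_count": 1000, "seed": 42, "worker_threads": 4},
# )
```

Raising on an unknown key is a deliberate choice here, so that a misspelled option name is caught before generation starts.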
### Loading Schemas
You can load schemas in different ways:
```python
# Load ShEx schema
generator.load_shex_schema("schema.shex")
# Load SHACL schema
generator.load_shacl_schema("shapes.ttl")
# Auto-detect schema format
generator.load_schema_auto("schema_file")
# Then generate data
generator.generate()
```
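When schema paths come from user input, it may be convenient to choose the loader from the file extension. The sketch below uses only the three loader methods shown above; the helper name `load_schema_by_extension` and the extension mapping (`.shex` for ShEx, `.ttl` for SHACL) are assumptions made for illustration, not behaviour defined by `pyrudof`.

```python
from pathlib import Path
from typing import Any


def load_schema_by_extension(generator: Any, schema_path: str) -> str:
    """Load schema_path with a loader chosen from its extension.

    Returns the name of the generator method that was called.
    """
    suffix = Path(schema_path).suffix.lower()
    if suffix == ".shex":
        method = "load_shex_schema"
    elif suffix == ".ttl":
        # Assumption: Turtle files passed here contain SHACL shapes.
        method = "load_shacl_schema"
    else:
        # Fall back to the generator's own format detection.
        method = "load_schema_auto"
    getattr(generator, method)(schema_path)
    return method
```

After the helper returns, `generator.generate()` can be called as in the example above.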
### Complete Workflow
The `run()` method provides a convenient way to load a schema and generate data in one step:
```python
# Auto-detect format
generator.run("schema.shex")
# Specify format explicitly
generator.run_with_format("shapes.ttl", pyrudof.SchemaFormat.SHACL)
```
### Configuration Files
You can also load configuration from TOML or JSON files:
```python
# Load from TOML
config = pyrudof.GeneratorConfig.from_toml_file("config.toml")
# Load from JSON
config = pyrudof.GeneratorConfig.from_json_file("config.json")
# Save configuration
config.to_toml_file("saved_config.toml")
```
### Available Enums
- **SchemaFormat**: `ShEx`, `SHACL`
- **OutputFormat**: `Turtle`, `NTriples`
- **CardinalityStrategy**: `Minimum`, `Maximum`, `Random`, `Balanced`
For more examples, see the `examples/generate_example.py` file.