Organization of scripts and data
This section suggests a general organization of datasets and scripts. There is no obligation to adhere to exactly this workflow, and experts might want to adapt it to their needs. However, this setup works well for many projects, even large-scale ones, so we recommend following a similar structure!
This section contains only a brief, abstract description. In the next section(s), we will give a short, concise example of how to apply this structure to an actual use case with LIS Pro 3D!
Recommended Directory Structure
We generally use the following folder structure for a project:
project # top-level folder
├── data
├── scripts
└── processing
    ├── grids
    │   └── ...
    ├── shapes
    │   └── ...
    └── point_clouds
        └── ...
You don’t need to create these directories manually yet; we will do so in the next section!
project folder
This is the top-level folder where all our scripts and data go. It provides a separate space on your computer that is not cluttered with other files!
data folder
This folder contains your input data, for example LAS files. We store this separately from the datasets that we create during processing.
Of course, the data folder does not have to be located within the project folder. Large input datasets are often stored in a special location; you only need read access to this location.
scripts folder
This folder contains all scripts (configuration, data preparation and processing).
processing folder
This is the folder where we typically write intermediate or final output data. These may be grids, vector datasets or point clouds, but the folder could also be structured by processing step (prepared_pointcloud, classified_pointcloud, final_pointcloud, etc.).
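As a rough illustration of the layout described above, the folders could be created with Python's standard pathlib module. This is only a sketch: the function name create_project_layout is hypothetical, the folder names are taken from the tree above, and the next section may set things up differently.

```python
from pathlib import Path

# Folder names taken from the directory tree above.
SUBFOLDERS = [
    "data",
    "scripts",
    "processing/grids",
    "processing/shapes",
    "processing/point_clouds",
]


def create_project_layout(project_root):
    """Create the recommended folders below project_root and return their paths."""
    root = Path(project_root)
    created = []
    for name in SUBFOLDERS:
        folder = root / name
        # parents=True creates "processing" on the way; exist_ok=True makes reruns safe.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Because of exist_ok=True, calling the function a second time on the same project folder leaves existing data untouched.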
Organization of Processing Scripts
For automated point cloud processing with LIS Pro 3D, we recommend organizing your Python scripts as shown below. For long workflows, consider splitting the actual data processing into smaller, numbered scripts (01_data_classification.py, 02_tree_detection.py, 03_vectorization.py, etc.).
scripts
├── project_config.py
├── data_preparation.py
├── data_processing.py
├── data_delivery.py
└── run_script.py
project_config.py
This script contains the basic project configuration, such as:
- (most importantly): sys.path.insert(0, os.environ['SAGA_PATH']) makes PySAGA accessible from your environment. Without this line, you won’t be able to access LIS Pro 3D’s tools!
- Paths to datasets (e.g., the path to the input data folder)
- Tool parameters (e.g., overlap of processing units, output resolution, …)
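A configuration file covering the three items above might look roughly like the following sketch. Only the SAGA_PATH environment variable and the sys.path.insert call come from this tutorial; all variable names, folder locations and parameter values are placeholders chosen for illustration.

```python
# project_config.py (illustrative sketch; names and values are placeholders)
import os
import sys
from pathlib import Path

# Make PySAGA importable. The tutorial accesses os.environ['SAGA_PATH'] directly;
# .get() is used here only so the sketch also runs where the variable is unset.
SAGA_PATH = os.environ.get("SAGA_PATH")
if SAGA_PATH:
    sys.path.insert(0, SAGA_PATH)

# Paths to datasets (placeholder locations following the folder structure above).
PROJECT_DIR = Path("project")
DATA_DIR = PROJECT_DIR / "data"
PROCESSING_DIR = PROJECT_DIR / "processing"
POINT_CLOUD_DIR = PROCESSING_DIR / "point_clouds"

# Tool parameters (placeholder values, in map units).
TILE_SIZE = 500.0
TILE_OVERLAP = 25.0
OUTPUT_RESOLUTION = 0.5
```

Other scripts can then pull in what they need with a plain import, for example `from project_config import DATA_DIR, TILE_SIZE`.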
data_preparation.py
Most datasets need to be prepared and organized before processing. This is especially important for projects with many individual scan positions or overlapping flight strips that have not yet been connected.
We recommend performing these preparatory steps in a separate script. This script should usually handle the following tasks:
- Create a LAS/LAZ index for all LAS/LAZ files
- Create a point cloud catalog (allowing seamless access)
- Create a vector layer with the bounding boxes of the point cloud catalog
- Create a tiling scheme for the processing units
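To make the idea of a tiling scheme with overlapping processing units more concrete, here is a small pure-Python sketch. It is not the LIS Pro 3D tool that performs this step; the function make_tiles and its parameters are hypothetical and only show the underlying arithmetic.

```python
import math


def make_tiles(xmin, ymin, xmax, ymax, tile_size, overlap=0.0):
    """Return a list of (xmin, ymin, xmax, ymax) tiles covering the given extent.

    Each tile has a core of tile_size x tile_size; the overlap is added on
    every side, so neighbouring tiles share a strip of 2 * overlap.
    """
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    n_cols = max(1, math.ceil((xmax - xmin) / tile_size))
    n_rows = max(1, math.ceil((ymax - ymin) / tile_size))
    tiles = []
    for row in range(n_rows):
        for col in range(n_cols):
            x0 = xmin + col * tile_size  # lower-left corner of the tile core
            y0 = ymin + row * tile_size
            tiles.append((x0 - overlap, y0 - overlap,
                          x0 + tile_size + overlap, y0 + tile_size + overlap))
    return tiles
```

The overlap lets each unit see some data beyond its core area, which helps avoid edge effects at tile borders during processing.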
data_processing.py
This script defines the actual processing steps:
- It imports paths to datasets for both input and output files from the project_config.py file
- It imports tool parameters from the project_config.py file
- It defines a central processing function (including all steps of the workflow), which operates on individual units (i.e., subsets) of your point cloud data
- It takes care of parallelization
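The pattern of one central function applied to many units in parallel can be sketched with Python's standard concurrent.futures module. This is an assumption for illustration, not necessarily how the tutorial parallelizes: process_tile is a stand-in that only computes the tile area, where a real workflow would run the tool calls for one processing unit.

```python
from concurrent.futures import ProcessPoolExecutor


def process_tile(tile):
    """Placeholder for the central processing function.

    A real implementation would run every workflow step for one unit;
    this one only returns the tile area so the sketch stays runnable.
    """
    xmin, ymin, xmax, ymax = tile
    return (xmax - xmin) * (ymax - ymin)


def run_parallel(tiles, max_workers=2, executor_cls=ProcessPoolExecutor):
    """Apply process_tile to every tile and return the results in input order."""
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(process_tile, tiles))
```

When the default ProcessPoolExecutor is used, call run_parallel from inside an `if __name__ == "__main__":` block, since worker processes may re-import the main module on some platforms. Passing a different executor class, such as ThreadPoolExecutor, can be convenient for debugging.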
data_delivery.py
This script will finalize your generated data products: It divides the final dataset into tiles with a specified size for delivery to the customer.
run_script.py
This script is a thin wrapper around our preparation, processing and delivery scripts that mainly redirects logs (both successful tool executions and errors) to text files. We will take care of it in the last section, which covers logging with LIS Pro 3D. Especially in parallel processing, special functions are needed to handle outputs from different processes. The run script writes separate log files for standard output (normal process execution) and standard error (errors). It can also be used to execute several scripts one after the other in a single pass. For now, it is sufficient to note that for parallelized processing pipelines we will not use regular print(...) statements but PySAGA’s own logging methods.
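The general idea of a wrapper that runs several scripts in sequence and writes standard output and standard error to separate files can be sketched with the standard library alone. This does not use PySAGA's logging methods, which are covered in the last section; the function run_scripts and the log file names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def run_scripts(scripts, log_dir):
    """Run scripts one after the other, appending stdout and stderr to separate logs.

    Stops at the first script that exits with a non-zero return code and
    returns the return codes collected up to that point.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    return_codes = []
    with open(log_dir / "stdout.log", "a") as out, \
            open(log_dir / "stderr.log", "a") as err:
        for script in scripts:
            # Each script runs in its own interpreter; its output streams
            # are redirected to the two log files.
            result = subprocess.run([sys.executable, str(script)],
                                    stdout=out, stderr=err)
            return_codes.append(result.returncode)
            if result.returncode != 0:
                break
    return return_codes
```

A call such as `run_scripts(["data_preparation.py", "data_processing.py", "data_delivery.py"], "logs")` would then execute the three scripts in one pass and stop early if one of them fails.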