This is a Python helper script for DinverExt in Geopsy 3.3.0, which has problems with parallel computation.
Why Geopsy 3.3.0
I started with Geopsy 3.4.2, but its version check required Qt >= 5.14, so I switched to Geopsy 3.3.0, which requires Qt 5.12. Later, however, I learned from Marc that "the version check can be skipped and up to Qt 5.11 the compilation may be successful".
Problems with DinverExt in Geopsy 3.3.0
This is described in detail in the question (viewtopic.php?t=396).
Geopsy 3.3.0 uses several threads for parallel computation. Each thread creates a sub-directory and then writes a 'parameters' file into it. Each thread should enter its sub-directory before calling the forward calculation code, but this step is skipped. As a result, the working directory remains the starting directory, and the forward calculation code cannot find the 'parameters' file in the current path.
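To make the failure concrete, here is a minimal sketch (not Geopsy code) of a forward program that, like the one dinver calls, reads 'parameters' from the current working directory. The names `forward` and `demo` and the sub-directory name `thread_0` are hypothetical; the sketch only shows that the same call fails from the starting directory and succeeds after changing into the sub-directory.

```python
import os
import tempfile


def forward():
    """Hypothetical forward code: read ./parameters, write ./misfit."""
    with open('parameters') as f:
        values = [float(line) for line in f if line.strip()]
    with open('misfit', 'w') as f:
        f.write('%.2f\n' % sum(values))
    return sum(values)


def demo():
    start = os.getcwd()
    with tempfile.TemporaryDirectory() as root:
        sub = os.path.join(root, 'thread_0')  # hypothetical sub-directory
        os.mkdir(sub)
        with open(os.path.join(sub, 'parameters'), 'w') as f:
            f.write('1.0\n2.0\n')
        try:
            os.chdir(root)  # working directory stays at the starting directory
            try:
                forward()
                found_from_root = True
            except FileNotFoundError:
                found_from_root = False
            os.chdir(sub)  # the step that Geopsy 3.3.0 skips
            result = forward()
        finally:
            os.chdir(start)  # always restore the original working directory
    return found_from_root, result


print(demo())  # (False, 3.0)
```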
What needs to be done
a. Find an updated 'parameters' file, i.e. one that has not been evaluated yet;
b. Avoid I/O conflicts during parallel computation.
One solution
Step 1: recording files
- use one file (e.g., filenames.dat) to store the names of all files in the starting directory before the inversion starts;
- use one directory (e.g., running) to record the sub-directories in which threads are currently running;
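Step 1 only has to be done once, before dinver is launched. The sketch below assumes the inversion is started in `path`; the function name `prepare` is hypothetical. As an assumption not stated above, it also writes 'filenames.dat', 'running' and 'order.log' into 'filenames.dat', so that these helper files are never mistaken for new sub-directories in Step 2.

```python
import os
import tempfile

# assumed helper names; they must never be treated as new sub-directories
HELPER_NAMES = ['filenames.dat', 'running', 'order.log']


def prepare(path):
    """Create 'running' and record all existing names in 'filenames.dat'."""
    os.makedirs(os.path.join(path, 'running'), exist_ok=True)
    names = set(os.listdir(path)) | set(HELPER_NAMES)
    with open(os.path.join(path, 'filenames.dat'), 'w') as f:
        for name in sorted(names):
            f.write(name + '\n')
    return sorted(names)


# usage on a throw-away directory that already holds one input file
with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, 'target.txt'), 'w').close()  # hypothetical input file
    recorded = prepare(root)
    print(recorded)  # ['filenames.dat', 'order.log', 'running', 'target.txt']
```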
Step 2: search for an updated 'parameters' file
After the inversion starts, the threads create sub-directories and one 'parameters' file in each of them. The threads then call the forward code while the working directory is still the starting directory.
- The forward code uses the Python 'fcntl' module (file locking on Unix/Linux) to lock 'filenames.dat' while reading it, which ensures that only one thread at a time searches for an updated 'parameters' file;
- the sub-directory names are obtained as the difference between the file names in the current path and those in 'filenames.dat';
- go through all sub-directory names and select one whose 'parameters' file is updated, i.e. that satisfies
a. it is not listed in the 'running' directory;
b. 'misfit' does not exist in the sub-directory, or 'misfit' was last modified earlier than 'parameters'.
- when an updated 'parameters' file is found, create a file named after the sub-directory in the 'running' directory, then release the lock on 'filenames.dat' so that the next thread can start Step 2.
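The locking and selection rules of Step 2 can be sketched as three small functions. The names `locked`, `needs_run` and `claim_next` are hypothetical, and, as an addition to the two criteria above, entries without a 'parameters' file (plain files, or sub-directories not yet written) are skipped. Note that `fcntl.flock` is only available on Unix-like systems.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def locked(lock_file):
    """Hold an exclusive flock on lock_file; blocks until it is free."""
    with open(lock_file, 'r') as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            yield f
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def needs_run(path, name):
    """True if sub-directory `name` holds a 'parameters' file not yet evaluated."""
    sub = os.path.join(path, name)
    params = os.path.join(sub, 'parameters')
    misfit = os.path.join(sub, 'misfit')
    if not os.path.isfile(params):
        return False  # not a sub-directory, or 'parameters' not written yet
    if os.path.exists(os.path.join(path, 'running', name)):
        return False  # criterion a: already running in another process
    # criterion b: no 'misfit', or 'misfit' older than 'parameters'
    return not os.path.exists(misfit) or os.path.getmtime(misfit) < os.path.getmtime(params)


def claim_next(path):
    """Under the lock, pick one pending sub-directory and mark it as running."""
    with locked(os.path.join(path, 'filenames.dat')) as f:
        known = {line.strip() for line in f}
        new = [n for n in sorted(os.listdir(path)) if n not in known]
        for name in new:
            if needs_run(path, name):
                open(os.path.join(path, 'running', name), 'w').close()
                return name  # the lock is released when the with-block exits
    return None


# usage: two hypothetical sub-directories 'a' and 'b', three calls
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, 'running'))
    with open(os.path.join(root, 'filenames.dat'), 'w') as f:
        f.write('filenames.dat\nrunning\n')
    for name in ('a', 'b'):
        os.mkdir(os.path.join(root, name))
        with open(os.path.join(root, name, 'parameters'), 'w') as f:
            f.write('1.0\n')
    first, second, third = claim_next(root), claim_next(root), claim_next(root)
    print(first, second, third)  # a b None
```

Returning from inside the `with locked(...)` block is safe here because the `finally` clause in `locked` always releases the lock.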
Step 3: running forward calculation
- run the forward calculation;
- when finished, write the 'misfit' file in the sub-directory and remove the file named after this sub-directory from the 'running' directory.
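Step 3 can be sketched as follows, assuming 'parameters' holds one value per line. The function `run_claimed` and the `forward` callable are hypothetical stand-ins for the real forward code. As a design choice not made in the original script, the marker in 'running' is removed in a `finally` clause, so a failed evaluation becomes eligible for Step 2 again instead of staying marked as running forever.

```python
import os
import tempfile


def run_claimed(path, name, forward):
    """Evaluate sub-directory `name`, write 'misfit', then clear its marker.

    `forward` is a hypothetical callable mapping a list of floats to a misfit.
    """
    sub = os.path.join(path, name)
    try:
        with open(os.path.join(sub, 'parameters')) as f:
            values = [float(line) for line in f if line.strip()]
        misfit = forward(values)
        with open(os.path.join(sub, 'misfit'), 'w') as f:
            f.write('%.2f\n' % misfit)
        return misfit
    finally:
        # clear the marker even if the forward calculation failed
        marker = os.path.join(path, 'running', name)
        if os.path.exists(marker):
            os.remove(marker)


# usage with sum() as a toy forward calculation
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'running'))
    os.makedirs(os.path.join(root, 'a'))
    with open(os.path.join(root, 'a', 'parameters'), 'w') as f:
        f.write('1.5\n2.5\n')
    open(os.path.join(root, 'running', 'a'), 'w').close()  # marked in Step 2
    value = run_claimed(root, 'a', sum)
    with open(os.path.join(root, 'a', 'misfit')) as f:
        written = f.read()
    still_marked = os.path.exists(os.path.join(root, 'running', 'a'))
    print(value, written.strip(), still_marked)  # 4.0 4.00 False
```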
Python code
```python
### Creator: Wanbo Xiao (wbxiao@pku.edu.cn)
### July 11, 2023
import os
import time
import fcntl

# change to your PATH below (the starting directory of the inversion)
PATH = '/home/'
RUNNING = os.path.join(PATH, 'running')

### search for the updated 'parameters' file
fns_now = os.listdir(PATH)
fn_to_run = None  # sub-directory with the updated 'parameters' file
with open(os.path.join(PATH, 'filenames.dat'), 'r') as f:
    # exclusive lock: only one process searches at a time
    fcntl.flock(f, fcntl.LOCK_EX)
    fns_pre = [item.strip() for item in f.readlines()]
    fns_add = [item for item in fns_now if item not in fns_pre]
    time.sleep(0.1)  # adding this seems to improve the I/O conflict for me, you can remove it after testing
    for fn in fns_add:
        temp_path = os.path.join(PATH, fn)
        params = os.path.join(temp_path, 'parameters')
        misfit = os.path.join(temp_path, 'misfit')
        # skip plain files and sub-directories whose 'parameters' is not written yet
        if not os.path.isfile(params):
            continue
        # not running
        if os.path.exists(os.path.join(RUNNING, fn)):
            continue
        # 'misfit' does not exist, or was modified earlier than 'parameters'
        if not os.path.exists(misfit) or os.path.getmtime(misfit) < os.path.getmtime(params):
            fn_to_run = fn
            break
    # if found, record it in the 'running' directory (same as 'touch')
    if fn_to_run is not None:
        open(os.path.join(RUNNING, fn_to_run), 'w').close()
    fcntl.flock(f, fcntl.LOCK_UN)

# this is a log file to record the results of the above searching, remove it if you want
with open(os.path.join(PATH, 'order.log'), 'a') as log:
    log.write('%s %d\n' % (fn_to_run, len(fns_add)))

### start the forward calculation
if fn_to_run is not None:
    new_path = os.path.join(PATH, fn_to_run)
    # as an example, the forward calculation here sums the values in 'parameters'
    # (one value per line); change it to your forward calculation below
    with open(os.path.join(new_path, 'parameters'), 'r') as f:
        num = sum(float(line) for line in f if line.strip())
    # generate the 'misfit' file in the sub-directory
    with open(os.path.join(new_path, 'misfit'), 'w') as f:
        f.write('%.2f\n' % num)
    # remove the file named after the sub-directory from 'running'
    os.remove(os.path.join(RUNNING, fn_to_run))
```
The order.log looks as follows.
It starts with many lines of 'None 0' (their number roughly equals the number of cores on my machine).
'None' means that no updated 'parameters' file was found, and '0' means that no new sub-directory was found.
```
None 0
None 0
None 0
None 0
None 0
......
```
'7f1d*' are the names of the sub-directories. '56' is the number of new sub-directories found, which equals the 56 cores on my machine (one sub-directory per thread).
```
7f1d28064a50 56
7f1d28038a50 56
7f1d280991c0 56
7f1d280c4b00 56
7f1d28099d00 56
7f1d280b6e00 56
......
```
```
None 56
None 56
None 56
None 56
None 56
......
```
Occasionally a 'None 112' line appears in the middle, but only rarely.
```
7f1d28089290 112
7f1d280d7450 112
7f1d28032870 112
7f1d280a0ae0 112
7f1d280537a0 112
......
7f1d28102b30 112
7f1d28028e40 112
7f1d2808e2d0 112
None 112
7f1d280716a0 112
7f1d28023f80 112
7f1d280679a0 112
7f1d28084b90 112
......
```
```
None 112
None 112
None 112
None 112
None 112
......
```
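To check a long run without scrolling through order.log, a small helper like the following could count how many calls found a sub-directory and how many found nothing, grouped by the number of new sub-directories seen. The function `summarize` is hypothetical and not part of the original script; it assumes every line has the 'name count' format written by the script above, and it also accepts an open file, e.g. `summarize(open('order.log'))`.

```python
from collections import Counter


def summarize(lines):
    """Count (found?, number of new sub-directories) pairs in order.log lines."""
    stats = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, count = parts
        stats[(name != 'None', int(count))] += 1
    return stats


sample = ['None 0', 'None 0', '7f1d28064a50 56', 'None 56', '7f1d28089290 112']
print(sorted(summarize(sample).items()))
# [((False, 0), 2), ((False, 56), 1), ((True, 56), 1), ((True, 112), 1)]
```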
Final words
Feel free to use and improve this script, but the best solution is probably to upgrade to the latest version.
Wanbo