A Python assistant code for DinverExt Geopsy 3.3.0

WanboXiao
Posts: 5
Joined: Tue Jun 14, 2022 6:03 pm

A Python assistant code for DinverExt Geopsy 3.3.0

Post by WanboXiao »

Summary
This is a Python helper script for DinverExt in Geopsy 3.3.0, which has a problem with parallel computation.

Why Geopsy 3.3.0
I started with Geopsy 3.4.2, but its version check asked for Qt >= 5.14, so I turned to Geopsy 3.3.0, which requires Qt 5.12. I realized only later that "the version check can be skipped and up to Qt 5.11 the compilation may be successful" (from Marc).

Problems of DinverExt Geopsy 3.3.0
This is described in detail in the question (viewtopic.php?t=396).
Geopsy 3.3.0 uses multiple threads for parallel computation. Each thread creates a sub-directory and then a 'parameters' file in that sub-directory. Each thread should enter its sub-directory before calling the forward calculation code, but this step is skipped. Thus, the current path remains the starting directory, and the forward calculation code cannot find the 'parameters' file in the current path.

What needs to be done
a. Find the most recently updated 'parameters' file;
b. Avoid I/O conflicts in parallel computation.

One solution
Step 1: recording files
- use one file (e.g., filenames.dat) to store all local filenames before starting the inversion;
- use one directory (e.g., running) to record the sub-directories in which threads are currently running;
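
Step 1 can be sketched as a small setup script that is run once before starting the inversion. This is only a sketch: the function name `prepare_bookkeeping` is hypothetical, and it assumes that 'filenames.dat', 'running' and 'order.log' should themselves be listed so that they are not mistaken later for new thread sub-directories.

```python
import os

def prepare_bookkeeping(path):
    """Create 'running/' and write the current directory listing to 'filenames.dat'."""
    os.makedirs(os.path.join(path, 'running'), exist_ok=True)
    # Assumption: the bookkeeping entries are listed too, so that Step 2
    # does not treat them as sub-directories created by the threads.
    names = set(os.listdir(path)) | {'filenames.dat', 'running', 'order.log'}
    with open(os.path.join(path, 'filenames.dat'), 'w') as f:
        for name in sorted(names):
            f.write(name + '\n')
    return sorted(names)
```

It would be called once with the starting directory of the inversion, e.g. `prepare_bookkeeping(PATH)`.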

Step 2: search for an updated 'parameters' file
After the inversion starts, each thread creates a sub-directory and then a 'parameters' file in it. The threads then call the forward code while the current path is still the starting path.
- The forward code uses the Python 'fcntl' module (file locking on Linux) to lock 'filenames.dat' while reading it, which ensures that only one process at a time is searching for an updated 'parameters' file;
- the sub-directory names are obtained as the difference between the filenames in 'filenames.dat' and the filenames in the current path;
- go through all sub-directory names and pick the first one whose 'parameters' file is updated, i.e., one that satisfies
a. it has no entry in the 'running' directory;
b. 'misfit' does not exist in the sub-directory, or the modification time of 'misfit' is earlier than that of 'parameters'.
- when an updated 'parameters' file is found, create a file named after the sub-directory in the 'running' directory, and release the file lock on 'filenames.dat' to allow the next process to start Step 2.
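
The selection test in conditions a and b can be isolated in a small function, which makes it easier to check on its own. This is a sketch only; the name `needs_forward_run` is hypothetical, and the check that 'parameters' exists is an extra guard that the script below does not have.

```python
import os

def needs_forward_run(subdir, running_dir):
    """Return True if 'parameters' in subdir still needs a forward calculation."""
    name = os.path.basename(os.path.normpath(subdir))
    params = os.path.join(subdir, 'parameters')
    misfit = os.path.join(subdir, 'misfit')
    # Condition a: another process has already claimed this sub-directory.
    if os.path.exists(os.path.join(running_dir, name)):
        return False
    # Extra guard (not in the original script): nothing to do without 'parameters'.
    if not os.path.isfile(params):
        return False
    # Condition b: no 'misfit' yet, or 'misfit' is older than 'parameters'.
    if not os.path.exists(misfit):
        return True
    return os.path.getmtime(misfit) < os.path.getmtime(params)
```

The function only reads the file system, so it still has to be called while the lock on 'filenames.dat' is held.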

Step 3: running forward calculation
- run the forward calculation;
- when finished, generate the 'misfit' in the sub-directory, and remove the file named by this sub-directory from the 'running' directory.
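
If the forward calculation raises an error before the marker file is removed, the marker would stay in 'running' and that sub-directory would be skipped by condition a from then on. One possible hardening, sketched here with a hypothetical helper `running_marker`, is a context manager that always removes the marker on exit.

```python
import os
from contextlib import contextmanager

@contextmanager
def running_marker(running_dir, name):
    """Create running_dir/name on entry and always remove it on exit."""
    marker = os.path.join(running_dir, name)
    open(marker, 'a').close()  # same effect as 'touch'; harmless if it exists
    try:
        yield marker
    finally:
        # Remove the marker even if the forward calculation failed.
        if os.path.exists(marker):
            os.remove(marker)
```

The forward calculation would then be wrapped as `with running_marker(PATH + '/running', fn_to_run): ...`.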

Python code

Code: Select all

### Creator: Wanbo Xiao (wbxiao@pku.edu.cn)
### July 11, 2023
import os
import time
import fcntl

### search for the updated 'parameters' file
if True:
    # change to your PATH below
    PATH = '/home/'
    fns_now = os.listdir(PATH)

    fn_to_run = None  # variable for the sub-directory with  the updated 'parameters' file
    with open(PATH + '/filenames.dat', 'r') as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        fns_pre = [item.strip() for item in f.readlines()]
        fns_add = [item for item in fns_now if item not in fns_pre]
        time.sleep(0.1)  # adding this seems to improve the I/O conflict for me, you can remove it after testing
        
        if len(fns_add) > 0:
            for fn in fns_add:
                temp_path = PATH + '/' + fn
                # not running
                if os.path.exists(PATH + '/running/' + fn):
                    continue
                # 'misfit' not exists, or modified earlier than 'parameters'
                if not os.path.exists(temp_path + '/misfit'):
                    fn_to_run = fn
                    break
                elif os.path.getmtime(temp_path+'/misfit') < os.path.getmtime(temp_path+'/parameters'):
                    fn_to_run = fn
                    break
        
        # if found, record in 'running' directory
        if fn_to_run is not None:
            os.system('touch %s/running/%s'%(PATH, fn_to_run))
        fcntl.flock(f, fcntl.LOCK_UN)
    
    # this is a log file to record the results of the above searching, remove it if you want
    with open(PATH + '/order.log','a') as f:
        f.write('%s %d\n'%(fn_to_run, len(fns_add)))


### start the forward calculation
if fn_to_run is not None:
    new_path = PATH + '/' + fn_to_run
    # the forward calculation here is to get the average value of 'parameters', as an example
    # change it to your forward calculation below
    with open(new_path + '/parameters', 'r') as f:
        lines = f.readlines()
    num = 0
    for line in lines:
        num += float(line.strip())
    # generate the 'misfit' file in the sub-directory
    with open(new_path + '/misfit', 'w') as f:
        f.write('%.2f\n'%(num))
    # remove the file named after the sub-directory from 'running'
    os.system('rm %s/running/%s'%(PATH, fn_to_run))
Results
The contents of order.log are shown below.

It starts with many lines of 'None 0' (the number of lines is roughly the number of cores on my machine).
'None' means that no updated 'parameters' file was found; '0' means that no new sub-directory was found.

Code: Select all

None 0
None 0
None 0
None 0
None 0
......
Then come many lines where the forward calculation is run.
'7f1d*' are the names of the sub-directories. '56' is the number of new sub-directories, which matches the 56 cores of my machine.

Code: Select all

7f1d28064a50 56
7f1d28038a50 56
7f1d280991c0 56
7f1d280c4b00 56
7f1d28099d00 56
7f1d280b6e00 56
......
These are followed by many lines of 'None 56' (again roughly as many lines as there are cores on my machine).

Code: Select all

None 56
None 56
None 56
None 56
None 56
......
Then come many lines where the forward calculation runs again.
Occasionally a 'None 112' line occurs in the middle, but very rarely.

Code: Select all

7f1d28089290 112
7f1d280d7450 112
7f1d28032870 112
7f1d280a0ae0 112
7f1d280537a0 112
......
7f1d28102b30 112
7f1d28028e40 112
7f1d2808e2d0 112
None 112
7f1d280716a0 112
7f1d28023f80 112
7f1d280679a0 112
7f1d28084b90 112
......
Finally, it ends with many lines of 'None 112' (again roughly as many lines as there are cores on my machine).

Code: Select all

None 112
None 112
None 112
None 112
None 112
......
In sum, the number of 'None' lines is roughly four times the number of cores on my machine.
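
To check such counts without reading the log by eye, the two columns of order.log can be tallied. This is a sketch that assumes the '<name> <count>' line format written by the script above; the function name `summarize_order_log` is hypothetical.

```python
from collections import Counter

def summarize_order_log(lines):
    """Count idle calls ('None') and forward runs, keyed by the second column."""
    idle, runs = Counter(), Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, n_new = parts[0], int(parts[1])
        (idle if name == 'None' else runs)[n_new] += 1
    return idle, runs
```

For example, `with open(PATH + '/order.log') as f: idle, runs = summarize_order_log(f)`, after which `sum(idle.values())` is the total number of 'None' lines.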

Final words
Feel free to use and improve this script, although the best solution may be to upgrade to the latest version.

Wanbo
admin
Site Admin
Posts: 822
Joined: Mon Aug 13, 2007 11:48 am
Location: ISTerre
Contact:

Re: A Python assistant code for DinverExt Geopsy 3.3.0

Post by admin »

Thanks for sharing this detailed report and code.
Did you check that the issue is completely fixed with 3.4.2?
WanboXiao
Posts: 5
Joined: Tue Jun 14, 2022 6:03 pm

Re: A Python assistant code for DinverExt Geopsy 3.3.0

Post by WanboXiao »

Hi Marc,

If you mean the issue of not entering the sub-directories, I still have this problem with Geopsy 3.4.2 on Linux (installed with Qt 5.12.12). This time it did not work even with my Python code. The inversion always terminated with 'Cannot find 'misfit' in sub-directories'.

If I use a forward code that records its current path as below

Code: Select all

import os
import fcntl

### write out current path
PATH = '/home/ybwang_pkuhpc/ybwang_cls/lustre2/Project/DFA_Inv'
with open(PATH + '/temp_path.log', 'a') as f:
    fcntl.flock(f, fcntl.LOCK_EX)
    f.write('%s\n'%(os.getcwd()))
    fcntl.flock(f, fcntl.LOCK_UN)
    
### write out misfit
misfit_ave = 1.0
with open('misfit', 'w') as f:
    f.write('%.6f\n'%(misfit_ave))
The output file 'temp_path.log' contains multiple lines of '/home/ybwang_pkuhpc/ybwang_cls/lustre2/Project/DFA_Inv' (my starting path), meaning that the current path remains the starting path instead of changing to the sub-directories.

With my Python code, the output file 'order.log' contains multiple lines of 'None 0', which means no new sub-directories are found. Would it be possible that the threads call the forward code before creating the sub-directories?

One thing to note is that I had an error when installing the 3.4.2 source package at the 'make' step.

Code: Select all

/home/ybwang_pkuhpc/ybwang_cls/lustre2/utilities/Geopsy/geopsypack-src-3.4.0/QGpCoreTools/src/Global.cpp: In Function ‘void QGpCoreTools::skipOpenBlasMultiThreading()’:
/home/ybwang_pkuhpc/ybwang_cls/lustre2/utilities/Geopsy/geopsypack-src-3.4.0/QGpCoreTools/src/Global.cpp:76:31: Error:‘openblas_set_num_threads’ was not declared in this scope
     openblas_set_num_threads(1);
I solved this by adding

Code: Select all

extern "C" void openblas_set_num_threads(int num_threads);
at the beginning of QGpCoreTools/src/Global.cpp (following this solution https://github.com/Microsoft/CNTK/pull/2365). But I am not sure if this causes the above issue.
Last edited by WanboXiao on Wed Jul 12, 2023 11:45 am, edited 2 times in total.
admin
Site Admin
Posts: 822
Joined: Mon Aug 13, 2007 11:48 am
Location: ISTerre
Contact:

Re: A Python assistant code for DinverExt Geopsy 3.3.0

Post by admin »

I checked dinverext module with your Python code and rms5 provided as an example.
If the command is provided as a path relative to the working directory, it does not work (3.4.2). With an absolute path, it works fine for me. Is it the same for you? Did you try with a relative or with an absolute path? In any case, from the next release on, paths relative to the working directory will also be accepted.

I also added a proper clean up of the thread directories at the end of the process.

The issue you get is probably linked to a bad detection of the OpenBLAS include path. I remember fixing it several weeks ago, so it is probably fixed in 3.5.0-preview, which is currently available via the git repository.

Relative paths are now accepted in 3.5.0-preview. You can access it right now.
Best regards,
Marc
WanboXiao
Posts: 5
Joined: Tue Jun 14, 2022 6:03 pm

Re: A Python assistant code for DinverExt Geopsy 3.3.0

Post by WanboXiao »

Geopsy 3.4.2
Thank you, Marc. It works with an absolute path in 3.4.2, with no need for the Python assistant code!

At first, I focused only on setting the working directory as an absolute path; then I realized that I also have to use an absolute path in the command (e.g., 'python ${absolut_path}/test.py').

P.S.
I re-installed the OpenBLAS package and got no OpenBLAS error during compilation this time. Then I re-installed Geopsy 3.4.2, using absolute paths at every step. I am not sure whether these steps are necessary.


Geopsy 3.5.0
Since I am using Qt 5.12.12, I encountered many errors caused by Qt version differences (e.g., the function to get the mouse pointer position). I changed the code wherever needed to fit my Qt 5.12.12.

I also got errors like "not found -lhdf5_serial". I changed every 'hdf5_serial' to 'hdf5' in the 'configure-3.5' files of all directories that showed this error.

With these changes, Geopsy 3.5.0 can be installed on my Linux platform. I found that the 'dinverext' version is 1.0.3, lower than the 1.0.5 of Geopsy 3.4.2.

When I ran dinverext with the same command as I used for Geopsy 3.4.2, it quickly stopped with the following error. It seems that the absolute path is added automatically, but the generated command is not a runnable Linux command. Alternatively, it may be related to the Qt version difference or to the changes that I made.

Code: Select all

/lustre2/ybwang_pkuhpc/ybwang_cls/Project/DFA_Inv/python3.8 /lustre2/ybwang_pkuhpc/ybwang_cls/Project/DFA_Inv/test.py failed to start. 
Either /lustre2/ybwang_pkuhpc/ybwang_cls/Project/DFA_Inv/python3.8 /lustre2/ybwang_pkuhpc/ybwang_cls/Project/DFA_Inv/test.py is missing, or you may have insufficient permissions to invoke it.
Error setting target, see above messages
Error creating forward object


Best,
Wanbo
Last edited by WanboXiao on Sun Jul 16, 2023 11:20 am, edited 2 times in total.
admin
Site Admin
Posts: 822
Joined: Mon Aug 13, 2007 11:48 am
Location: ISTerre
Contact:

Re: A Python assistant code for DinverExt Geopsy 3.3.0

Post by admin »

Hi Wanbo,

Thanks for pointing out the version number issue. At the time branch 3.5 was created, the version of dinverext was 1.0.3, but branch 3.4 continued to evolve a bit, up to 1.0.5. The merge process between branches was not taking this update into account. This is now fixed; many other packages were affected.

HDF5 linkage and the variety of HDF5 libraries across Linux distributions were not handled at all. Last week, I wrote a custom configure script for the GeopsyCore library to handle the presence or absence of HDF5 in its various forms (probably still not universal). I forgot to push these changes to the repository. If you have a bit of time, I would be curious to check it in your environment.

For the modifications to support Qt 5.12, I would be interested in a "diff -Naur" of the original tree and your modified tree. I can wait a bit before updating the repository; meanwhile, you can download a fresh copy of the original tree to make the diff, in case you did not keep the original tree. Once this is done, I will update the repository.

Marc
admin
Site Admin
Posts: 822
Joined: Mon Aug 13, 2007 11:48 am
Location: ISTerre
Contact:

Re: A Python assistant code for DinverExt Geopsy 3.3.0

Post by admin »

I checked the compatibility with Qt-5.11 (the system version of my Debian 10) with a few new modifications. They are now included and available in 3.5.0-preview.
WanboXiao
Posts: 5
Joined: Tue Jun 14, 2022 6:03 pm

Re: A Python assistant code for DinverExt Geopsy 3.3.0

Post by WanboXiao »

Hi Marc,

Sorry for the late reply. I have been helping a colleague install other software these past few days. I will post the output of 'diff -Naur' tomorrow if it is still needed.

Best,
Wanbo
admin
Site Admin
Posts: 822
Joined: Mon Aug 13, 2007 11:48 am
Location: ISTerre
Contact:

Re: A Python assistant code for DinverExt Geopsy 3.3.0

Post by admin »

If you start a fresh compilation from the current 3.5.0-preview, it should support Qt-5.12 without modification (better if you can confirm). There is no need to run a diff -Naur, unless you encounter other issues.
WanboXiao
Posts: 5
Joined: Tue Jun 14, 2022 6:03 pm

Re: A Python assistant code for DinverExt Geopsy 3.3.0

Post by WanboXiao »

The fresh 3.5.0-preview now works with Qt 5.12.12, but the HDF5 library still gives errors. I can complete the installation after changing "-lhdf5_serial" to "-lhdf5" in the "configure-3.5" files of the directories geopsy-fk, geopsy-spac, geopsy-hv, geopsy, vslarray, and waran.
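
For anyone who has to repeat this substitution by hand, it can be scripted. This is a rough sketch, assuming the files are plain text and are all named 'configure-3.5'; the function name `patch_hdf5_link` is hypothetical, and it is safer to keep a backup of the source tree before running it.

```python
import os

def patch_hdf5_link(tree, old='-lhdf5_serial', new='-lhdf5', filename='configure-3.5'):
    """Replace `old` with `new` in every `filename` under `tree`; return patched paths."""
    patched = []
    for dirpath, _dirs, files in os.walk(tree):
        if filename not in files:
            continue
        path = os.path.join(dirpath, filename)
        with open(path, 'r') as f:
            text = f.read()
        if old in text:
            # Rewrite the file only when the old flag is actually present.
            with open(path, 'w') as f:
                f.write(text.replace(old, new))
            patched.append(path)
    return sorted(patched)
```

The returned list shows which files were touched, so the change can be reviewed before running configure and make again.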

My HDF5 environment (v1.8.23) is imported using the code below.

Code: Select all

#!/bin/bash
H5DIR=/appsnew/usr/HDF5/1.8.23
export HDF5_ROOT=${H5DIR}

export CFLAGS=-I${H5DIR}/include
export CXXFLAGS=-I${H5DIR}/include
export CPPFLAGS=-I${H5DIR}/include

export LDFLAGS=-L${H5DIR}/lib
export LD_LIBRARY_PATH=${H5DIR}/lib:$LD_LIBRARY_PATH

export PATH=${H5DIR}/bin:$PATH

My installation may not be perfect. When I add a new run in dinverext, I get an error similar to the previous one.

Code: Select all

Error creating a new inversion thread.
/lustre2/ybwang_pkuhpc/ybwang_cls/Project/DFA_Inv/sbatch_jobs/python3.8 test.py failed to start. 
Either /lustre2/ybwang_pkuhpc/ybwang_cls/Project/DFA_Inv/sbatch_jobs/python3.8 test.py is missing, or you may have insufficient permissions to invoke it.
Error setting target, see above messages
admin
Site Admin
Posts: 822
Joined: Mon Aug 13, 2007 11:48 am
Location: ISTerre
Contact:

Re: A Python assistant code for DinverExt Geopsy 3.3.0

Post by admin »

WanboXiao wrote: "The fresh 3.5.0-preview now works with Qt 5.12.12, but the HDF5 library still gives errors. I can complete the installation after changing "-lhdf5_serial" to "-lhdf5" in the "configure-3.5" files of the directories geopsy-fk, geopsy-spac, geopsy-hv, geopsy, vslarray, and waran."
The HDF5 library is explicitly referenced in the GeopsyCore library. After checking, the explicit link for these apps is unnecessary, so I cleaned up the dependencies. If you still have a bit of time, you can update your tree, then run make clean, configure and make again. I hope that it now runs without error.
WanboXiao wrote: "My installation may not be perfect. When I add a new run in dinverext, I get an error similar to the previous one."
Because you probably have several versions installed, did you run the following command?

Code: Select all

/path/to/the/version/you/want/to/run/dinver -clear-plugins
The command exits immediately. The next time you run dinver, the plugin path is reset to the default installed one, and you can be sure that the plugins belong to the same version as the core application. If the installed versions differ only by minor modifications, plugins and apps are all binary compatible, but you may miss the expected modifications.

Did you try to set an absolute path to test.py?
I checked with the rms5 example provided with dinverext (in the share folder). A relative path is converted to an absolute path for the command, but not for its arguments.

Code: Select all

Working directory: /home/wathelem/sandbox/run1

Code: Select all

$ cd /home/wathelem/sandbox
$ ls
rms5.bash run1

Code: Select all

External command: ../rms.bash
External command: /bin/bash ../../rms.bash
both work fine, but the following do not

Code: Select all

External command: bash ../../rms.bash
External command: /bin/bash ../rms.bash
WanboXiao
Posts: 5
Joined: Tue Jun 14, 2022 6:03 pm

Re: A Python assistant code for DinverExt Geopsy 3.3.0

Post by WanboXiao »

Thanks, Marc! All your advice works well.

I pulled the latest 3.5.0-preview from git and installed it without any errors. The -clear-plugins command also works. Now I can run dinverext with 3.5.0-preview. I just realized that my previous external command was not correct.

Code: Select all

Previous external command: python3.8 /my/absolute/path/test.py
An absolute path is prepended to "python3.8", which causes the error.
When I put the python3.8 path in the first line (the shebang line) of my Python script and make the script executable, the external command below works.

Code: Select all

External command: /my/absolute/path/test.py
I also tried the command you suggested, and it works.

Code: Select all

External command: /bin/bash /my/absolute/path/my.sh
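
The behaviour reported in this thread can be summarized with a toy model. This is not dinverext's actual code; it only assumes, from the error messages and examples above, that the first token of the external command is made absolute against the working directory when it is relative, while the arguments are left untouched. The function name `resolve_external_command` is hypothetical.

```python
import os
import shlex

def resolve_external_command(working_dir, command):
    """Toy model: make only the first token absolute, leave arguments untouched."""
    tokens = shlex.split(command)
    if not os.path.isabs(tokens[0]):
        # Assumption: a relative command is resolved against the working directory.
        tokens[0] = os.path.normpath(os.path.join(working_dir, tokens[0]))
    return ' '.join(tokens)
```

Under this model, 'python3.8 /my/absolute/path/test.py' turns into a path to a non-existent 'python3.8' inside the working directory, whereas a command starting with '/bin/bash' or with the absolute path of an executable script is left as it is.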