
After running for a long time, an IOC using StreamDevice and ASYN exhibits significant deviations between the message transmission intervals and the set scan period. #102

Open
1458861693 opened this issue Jan 20, 2025 · 28 comments

Comments

@1458861693

I have developed an EPICS IOC to control an X-ray source. The IOC needs to continuously send messages to the X-ray source controller at a 100ms interval to maintain the connection with the X-ray source. These messages serve as heartbeat packets. If the time interval between two consecutive heartbeat packets exceeds 3 seconds, the high voltage of the X-ray source will be automatically set to 0kV.

Additionally, upon power-up, the X-ray source sends unsolicited operational status data to the IOC at a 100ms interval. This unsolicited data provides continuous updates on the working status of the X-ray source without requiring any explicit requests from the IOC.

I have developed an IOC using StreamDevice and ASYN, which successfully receives and parses the status data from the X-ray source. Now, I can monitor the status of the X-ray source in real-time. However, there are some issues with sending heartbeat packets to maintain the connection with the X-ray source.

In the db file, I set the SCAN field to ".1 second", which processes the record every 100 ms, as follows:

record(ao, "$(P)source:sendMessages") {
    field(DESC, "send control messages")
    field(DTYP, "stream")
    field(SCAN, ".1 second")
    field(OUT, "@XraySource.proto SendMsg($(P)kV.VAL,$(P)mA.VAL,$(P)source:ctrlData.VAL,$(P)exposure:time:amp.VAL) $(PORT)")
    field(FLNK, "$(P)source:cleanMessages")     # clean data after send messages
  }
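The actual protocol file is in the attached zip. For readers without the attachment, here is a purely hypothetical sketch of what a StreamDevice protocol with such a SendMsg entry could look like; the frame layout, the formats, and the use of the 0x5A start byte (mentioned later in this thread as the IOC's frame prefix) are assumptions, not the real file:

```
# XraySource.proto -- hypothetical sketch only; the real attached file may differ.
# ReplyTimeout in ms; the terminator and field formats are assumptions.
ReplyTimeout = 100;

SendMsg {
    # \$1..\$4 are the link arguments (kV, mA, ctrlData, exposure amp records);
    # %(\$n) redirects each format converter to read that record's value.
    out 0x5A "%(\$1)d" "%(\$2)d" "%(\$3)d" "%(\$4)d";
}
```
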

Initially the IOC runs well, and the message transmission intervals stay around 100ms. As shown in the figure below, data starting with "5A" is sent by the IOC and data starting with "A5" by the X-ray source.

Image

However, after running for a while, approximately 2 to 3 hours, I noticed that the message transmission intervals become unstable, sometimes being 100ms, sometimes 400ms, and sometimes even 2000ms. The following figure shows the IOC running after 5 hours.

Image

I am not sure if the message transmission intervals will continue to increase as the IOC keeps running, but I cannot allow this interval to exceed 3 seconds because the X-ray source controller will automatically shut down the high voltage.

I am not sure what is causing this issue. Currently, I suspect that there might be a problem with the ASYN configuration, but after reviewing the documentation, I still haven't found the root cause. I need help to resolve this issue.

The versions of ASYN and StreamDevice I am using are asyn-R4-44 and StreamDevice-master, and the EPICS version is base-7.0.8.
The db, protocol files and st.cmd file are attached.
Thanks very much.

@dirk-zimoch
Member

Do you get any error messages from your IOC about scan processing overruns?

@dirk-zimoch
Copy link
Member

dirk-zimoch commented Jan 20, 2025

The db, protocol files and st.cmd file are attached.

I cannot find them.

@1458861693
Author

1458861693 commented Jan 20, 2025

Sorry, I forgot to upload the files. They are now attached.

db and proto and cmd ZipFiles.zip

The IOC looks normal and no errors are reported, as shown below.

#!../../bin/linux-x86_64/XraySource
< envPaths
epicsEnvSet("IOC","iocXraySource")
epicsEnvSet("TOP","/home/ls/epics/XraySource")
epicsEnvSet("SUPPORT","/home/ls/epics/softIoc/synApps_6_1/support")
epicsEnvSet("ASYN","/home/ls/epics/softIoc/synApps_6_1/support/asyn-R4-44/")
epicsEnvSet("AUTOSAVE","/home/ls/epics/softIoc/synApps_6_1/support/autosave-R5-10/")
epicsEnvSet("CALC","/home/ls/epics/softIoc/synApps_6_1/support/calc-R3-7-3/")
epicsEnvSet("STREAM","/home/ls/epics/softIoc/synApps_6_1/support/StreamDevice-master/")
epicsEnvSet("SNCSEQ","/home/ls/epics/softIoc/synApps_6_1/support/seq-2-2-9/")
epicsEnvSet("EPICS_BASE","/home/ls/epics/softIoc/base-7.0.8")
cd "/home/ls/epics/XraySource"
## Register all support components
dbLoadDatabase "dbd/XraySource.dbd"
XraySource_registerRecordDeviceDriver pdbbase
# create communicate port and set the communicate parameters of X ray Source
### test ip and port ###
drvAsynIPPortConfigure ("L0", "192.168.1.7:5000",0,0,0)
##################
# drvAsynIPPortConfigure ("L0", "192.168.1.199:5000",0,0,0)
## Load record instances
dbLoadRecords("/home/ls/epics/XraySource/db/XraySource.db", "P=xanes2:, PORT=L0")
# IMAX, OMAX setting
# dbLoadRecords("db/asynRecord.db","P=norum:,R=asyn,PORT=L0,ADDR=24,IMAX=100,OMAX=100")
### Asyn
dbLoadRecords("/home/ls/epics/softIoc/synApps_6_1/support/asyn-R4-44//db/asynRecord.db","P=xanes2:,R=asyn,PORT=L0,ADDR=0,OMAX=2000,IMAX=2000")
epicsEnvSet ("EPICS_CAS_SERVER_PORT", "6020")
epicsEnvSet ("STREAM_PROTOCOL_PATH", "/home/ls/epics/XraySource/db")
asynSetTraceMask("L0", -1, 0x9)
asynSetTraceIOMask("L0", -1, 0x4)
asynSetTraceInfoMask("L0",-1, 0x1)
################# AUTOSAVE ##################
# cd ${AUTOSAVE}
# Optional database for autosave sta
# dbLoadRecords "db/save_restoreStatus.db", "P=$(IOCNAME):"
#save_restoreSet_status_prefix("$(IOCNAME):")
# set_requestfile_path("$(SAVE_DIR)")
# set_savefile_path("$(SAVE_DIR)")
# save_restoreSet_NumSeqFiles(3)
# save_restoreSet_SeqPeriodInSeconds(30)
# Arrange for restoring saved values into records
# set_pass1_restoreFile("$(IOCNAME).sav")
################# AUTOSAVE ##################
cd "/home/ls/epics/XraySource/iocBoot/iocXraySource"
iocInit
Starting iocInit
############################################################################
## EPICS R7.0.8
## Rev. 2024-01-30T23:45+0800
## Rev. Date build date/time: 
############################################################################
iocRun: All initialization complete
## Start any sequence programs
#seq sncxxx,"user=ls"
# Set trace file
dbpf("xanes2:asyn.TFIL", "/home/ls/Desktop/XRaySourceLog.txt")
DBF_STRING:         "/home/ls/Desktop/XRaySourceLog.txt"    
epics> 

@dirk-zimoch
Member

I notice all your inputs use "I/O Intr". Does this device send its data automatically, independently of your SendMsg?
If not (if they are replies to the message sent in SendMsg), try to handle them in the SendMsg protocol (using redirections) instead of using "I/O Intr".

@1458861693
Author

1458861693 commented Jan 20, 2025

Yes, the device automatically sends status messages to the IOC at a 100ms interval after it starts. No request needs to be sent.

@1458861693
Author

You can download these files and test them on your computer. Looking forward to your reply.

@MarkRivers
Contributor

How did you generate the communications trace? I see you set the asynTrace settings, but that does not look like the output of asynTrace, which should have "write" and "read" messages.

@1458861693
Author

I used a network debugging assistant to simulate the communication between the X-ray source and the IOC, so the messages look unchanged. Yes, I have set up asynTrace; its output is saved to a txt file, which I will upload (the file is somewhat large).
After the IOC has been running for an extended period, the intervals between the messages it sends are not as stable as they were initially.

XRaySourceLog.txt
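Large trace files like this are hard to inspect by eye. A small script can pull the timestamps out of the log and report every interval above a threshold. This is a generic sketch, not something from this thread: the regex assumes a `YYYY/MM/DD HH:MM:SS.fff` timestamp prefix on each line, so it would need adjusting to the actual trace format.

```python
import re
from datetime import datetime

# Assumed timestamp prefix, e.g. "2025/01/22 20:01:23.456 ..." -- adjust to the real log.
TS_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")

def long_gaps(lines, threshold_s=0.4):
    """Return (previous_ts, current_ts, gap_seconds) for every gap above threshold_s."""
    gaps, prev = [], None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue  # skip lines without a timestamp (hex dumps, etc.)
        ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S.%f")
        if prev is not None:
            dt = (ts - prev).total_seconds()
            if dt > threshold_s:
                gaps.append((prev, ts, dt))
        prev = ts
    return gaps
```

Running `long_gaps(open("XRaySourceLog.txt"))` would then list exactly when the 100 ms cadence starts to drift, instead of scrolling through hours of trace output.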

@dirk-zimoch
Member

Some buffer accumulating input or output data over time would explain the degradation, if more and more old data is searched (maybe for terminators or 0 bytes) or copied (when the buffer grows).
But I do not see anything like that.
I need to add some debugging to read the buffer sizes, set up a test system, and let it run for a while...

@MarkRivers
Contributor

Can you run "top" to look at the resources used by the IOC process and see whether CPU or memory usage is growing when the time between writes gets long?

@1458861693
Author

1458861693 commented Jan 22, 2025

I tested this IOC on a physical machine today. The IOC runs on a CentOS 8 system, while the network debugging assistant runs on a Windows 11 system. However, because the Windows system went into sleep mode, the test lasted less than 2 hours.

Now I will run this IOC on a Raspberry Pi 4, communicating with the network debugging assistant on a Windows 10 system.
I will post the output of the top command here every so often. Please help me resolve this issue; it is very important to me. Thank you.

Output of the top command at 2025-1-22 20:01:
This is the resource usage of the IOC right after it has started.
Image

@1458861693
Author

1458861693 commented Jan 22, 2025

Output of the top command at 2025-1-22 21:53:
The IOC has been running for 2 hours, and the resource usage appears normal.
The send and receive rates of the IOC remain consistent with the configured settings.

Image

@dirk-zimoch
Member

Keep observing your IOC process. You may also look at /proc/[PID of your IOC]/statm regularly (every hour or so) and see if any of the numbers keep growing.
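The suggested hourly check can be automated. Below is a generic sketch (not from this thread) that samples `/proc/<pid>/statm` periodically and prints a line only when any field has changed; the field order (size, resident, shared, text, lib, data, dt) follows proc(5).

```python
import time

def parse_statm(text):
    """Parse a /proc/<pid>/statm line into a dict of page counts (order per proc(5))."""
    fields = ("size", "resident", "shared", "text", "lib", "data", "dt")
    return dict(zip(fields, map(int, text.split())))

def watch(pid, interval_s=3600, samples=24):
    """Sample statm every interval_s seconds; print only when a number changes."""
    last = None
    for _ in range(samples):
        with open(f"/proc/{pid}/statm") as f:
            cur = parse_statm(f.read())
        if cur != last:
            print(time.strftime("%F %T"), cur)
            last = cur
        time.sleep(interval_s)
```

For the IOC in this thread, `watch(2072)` would print once at startup and then only if a memory number grows, which is exactly the signal being looked for here.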

@1458861693
Author

Keep observing your IOC process. You may also look at /proc/[PID of your IOC]/statm regularly (every hour or so) and see if any of the numbers keep growing.

OK. Is the PID that of my IOC's st.cmd process?

@dirk-zimoch
Member

dirk-zimoch commented Jan 22, 2025

If you have started the IOC as 'st.cmd', then it is that one (PID 2072).
If you have started the IOC as '[programname] st.cmd', then look for the programname. In your case probably "XraySource", according to what I see in st.cmd.

@1458861693
Author

If you have started the IOC as 'st.cmd', then it is that one (PID 2072). If you have started the IOC as '[programname] st.cmd', then look for the programname. In your case probably "XraySource", according to what I see in st.cmd.

I started this IOC by typing './st.cmd' in a terminal, and I don't find a process named 'XraySource' or 'Xray...', so the PID of my IOC should be 2072.

This is the output when I type "ps aux | grep 'Xray*'":

pi@raspberrypi:~ $ ps aux | grep 'Xray*'
pi 2072 2.0 0.0 40512 5376 pts/4 Sl+ 19:52 4:22 ../../bin/linux-arm/XraySource ./st.cmd
pi 6934 0.0 0.0 7348 524 pts/1 S+ 23:21 0:00 grep --color=auto Xray*

@1458861693
Author

This is the output of "cat /proc/2072/statm".
Could it be that some setting in asyn, such as a buffer size, needs to be increased? I didn't find such a setting. Please help me.
Thanks!

pi@raspberrypi:/opt/egcs $ date
Wed 22 Jan 2025 10:13:22 PM CST
pi@raspberrypi:/opt/egcs $ cat /proc/2072/statm
10128 1344 1066 7 0 6905 0
pi@raspberrypi:/opt/egcs $ cat /proc/2072/statm
10128 1344 1066 7 0 6905 0
pi@raspberrypi:/opt/egcs $ date
Wed 22 Jan 2025 10:21:14 PM CST
pi@raspberrypi:/opt/egcs $ cat /proc/2072/statm
10128 1344 1066 7 0 6905 0
pi@raspberrypi:/opt/egcs $ date
Wed 22 Jan 2025 10:31:38 PM CST
pi@raspberrypi:/opt/egcs $ cat /proc/2072/statm
10128 1344 1066 7 0 6905 0
pi@raspberrypi:/opt/egcs $ date
Wed 22 Jan 2025 11:20:57 PM CST
pi@raspberrypi:/opt/egcs $ cat /proc/2072/statm
10128 1344 1066 7 0 6905 0
pi@raspberrypi:/opt/egcs $ date
Thu 23 Jan 2025 12:02:11 AM CST
pi@raspberrypi:/opt/egcs $ cat /proc/2072/statm
10128 1344 1066 7 0 6905 0
pi@raspberrypi:/opt/egcs $

@MarkRivers
Contributor

@1458861693 I don't think it is a problem with asyn because others would have reported this problem. The thing that you are doing that is less typical is having I/O Intr scanned input records and periodic output records. I suspect that could be causing an issue in StreamDevice.

I don't think you really need to run the network monitoring. The asyn trace output should be sufficient. You can send that output to a file.

@dirk-zimoch
Member

None of the memory numbers is going up. Thus my hypothesis about a buffer accumulating more and more data to process seems to be wrong.

@1458861693
Author

@MarkRivers , I will stop running the network monitoring, use only asyn trace, and send the output to a file. Please wait for my feedback.

@1458861693
Author

@dirk-zimoch @MarkRivers Input records scanned with "I/O Intr" should be normal according to the StreamDevice documentation. But regarding the output record: if I need to send a message to a port periodically, and the interval must be very short, how should I handle this?

@MarkRivers
Contributor

What you are doing should work fine. It seems like there may be a problem we were not aware of.

@1458861693
Author

@MarkRivers @dirk-zimoch
There is a new discovery. Another IOC is running on a CentOS 8 system, using the same db and proto files as in the previous attachment. After running for 20 hours it suddenly stopped sending data, and the system became very sluggish; I could barely operate the CS-Studio software. At that point the output of the top command was as shown in the picture below.

Image

And the IOC shell also displayed the following warning messages:

epics>
dbScan warning from '.1 second' scan thread:
Scan processing averages 1.550 seconds (0.505 .. 2.324).
Over-runs have now happened 10 times in a row.
To fix this, move some records to a slower scan rate.

dbScan warning from '.5 second' scan thread:
Scan processing averages 1.533 seconds (0.797 .. 2.432).
Over-runs have now happened 10 times in a row.
To fix this, move some records to a slower scan rate.

dbScan warning from '.1 second' scan thread:
Scan processing averages 1.789 seconds (0.505 .. 6.287).
Over-runs have now happened 20 times in a row.
To fix this, move some records to a slower scan rate.

dbScan warning from '.5 second' scan thread:
Scan processing averages 2.157 seconds (0.797 .. 8.662).
Over-runs have now happened 16 times in a row.
To fix this, move some records to a slower scan rate.

dbScan warning from '.5 second' scan thread:
Scan processing averages 1.828 seconds (1.073 .. 2.528).
Over-runs have now happened 10 times in a row.
To fix this, move some records to a slower scan rate.

dbScan warning from '1 second' scan thread:
Scan processing averages 1.624 seconds (1.026 .. 2.127).
Over-runs have now happened 10 times in a row.
To fix this, move some records to a slower scan rate.

dbScan warning from '.1 second' scan thread:
Scan processing averages 1.900 seconds (0.505 .. 7.760).
Over-runs have now happened 40 times in a row.
To fix this, move some records to a slower scan rate.

dbScan warning from '.5 second' scan thread:
Scan processing averages 2.245 seconds (1.073 .. 9.658).
Over-runs have now happened 17 times in a row.
To fix this, move some records to a slower scan rate.

dbScan warning from '1 second' scan thread:
Scan processing averages 2.771 seconds (1.026 .. 17.366).
Over-runs have now happened 13 times in a row.
To fix this, move some records to a slower scan rate.

dbScan warning from '1 second' scan thread:
Scan processing averages 8.178 seconds (1.026 .. 78.466).
Over-runs have now happened 14 times in a row.
To fix this, move some records to a slower scan rate.
errlog: 3 messages were discarded

dbScan warning from '2 second' scan thread:
Scan processing averages 7.558 seconds (2.111 .. 25.687).
Over-runs have now happened 12 times in a row.
To fix this, move some records to a slower scan rate.
epics>

Reviewing the trace files from these 20 hours, the data sent by the IOC was mostly normal; although there were slight fluctuations in the transmission intervals, there were basically no several-second gaps.

@1458861693
Author

@MarkRivers , I will stop running the network monitoring, use only asyn trace, and send the output to a file. Please wait for my feedback.

Meanwhile, I am also running another IOC as mentioned above, without using the network monitor, only utilizing the IOC to send data with the scan field set to 0.1 seconds. After 5 hours, based on the timestamps of the data sent from the IOC, the intervals have remained largely consistent with the scan period. I have also uploaded the asyn trace file to the attachment. I will continue to run this IOC indefinitely and will provide feedback here if there are any issues.

XRaySourceLog.txt

@dirk-zimoch
Member

I find it a bit strange that the problems start so suddenly and then affect all the scan threads, even up to "2 second". It is as if the whole IOC is suddenly frozen for at least 24 seconds (12 overruns in a row of the 2 second scan). The sending record is scanned with ".1 second" and the actual I/O is done in the asyn thread "L0", so I would have expected that only those threads would be affected. Problems with CSS on the same host point to a global problem.

Also, I noticed that you have 518% CPU load (5 of 8 cores?) for the "node" (probably Node.js) program, plus some percentage for python3 and java. Can it be that whatever your JS program is doing kills the host performance?

Is your CentOS 8 system a real machine or virtual? If it is virtual, maybe you need to tune your VM settings.

Is your machine accessing NFS mounts? If something is wrong with the NFS server, several processes may be affected, in particular if the machine is virtual with an NFS root file system.

If the IOC running on the Raspberry Pi is more stable than the one on your CentOS 8 machine, then the problem is most probably specific to the CentOS 8 machine and not related to the IOC.

@MarkRivers
Contributor

This command will show the CPU time of each thread; it can be useful for finer-grained information on what is using the CPU:

top -H

@1458861693
Author

Sorry for the late reply; it was the Chinese Spring Festival recently.

My CentOS 8 runs on a real machine without NFS mounts. My Control System Studio makes extensive use of Jython and Python scripts. Could this affect the operation of the IOC?

The IOC running on the Raspberry Pi might be more stable than the one on CentOS 8 because only one IOC runs on the Raspberry Pi, whereas multiple IOCs and Control System Studio are running on CentOS 8.

Is it recommended to use another Linux distribution, such as Rocky Linux 8, instead of CentOS 8? This is currently my plan.

@MarkRivers
Contributor

Is it recommended to use another Linux distribution, such as Rocky Linux 8, instead of CentOS 8? This is currently my plan.

No, CentOS 8 should work fine, even with several IOCs.

You should run "top" to see the load on the system. How much free CPU and memory capacity is there?

Mark
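For a quick numeric check without watching top interactively, the same figures can be read from /proc. This is a generic Linux-only sketch (not from this thread); the field layout follows proc(5):

```python
def read_loadavg(text):
    """Parse /proc/loadavg contents: 1-, 5-, and 15-minute load averages."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)

def read_mem_available_kb(meminfo_text):
    """Return MemAvailable in kB from /proc/meminfo contents, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    return None
```

Logging `read_loadavg(open("/proc/loadavg").read())` alongside the IOC's trace timestamps would show whether the long send intervals coincide with spikes in overall system load.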
