Background
Monitoring data within the group falls into two main categories, system log data and business tracking data, each hosted on its own platform. After a release or during business peak periods, developers need to watch online data continuously to anticipate risks, and switching back and forth between the two platforms makes for a poor user experience.
Considerations
- Switching between the two platforms is troublesome, so it is worth considering integrating the data sources of different platforms for processing.
- Turn active checking into passive awareness, so that users are notified instead of having to look, to further improve the user experience.
Problem Breakdown and Solutions
- Regarding the first point, different platforms provide corresponding metric query APIs. This part mainly considers the definition, configuration, and integration of data sources.
- Based on the metric query API, a business reconnaissance configuration module has been added to the self-developed monitoring platform site. Users can configure monitoring metrics for different platforms based on project dimensions.
- Regarding the second point, after configuring and integrating the data sources, data push needs to be implemented. The internal IM tool also provides an open platform API for integration and invocation.
In addition to regular data push, it is also necessary to support user-defined subscription push cycles. node-cron is used to generate corresponding scheduled tasks. One subscription configuration corresponds to one task. When a user updates the subscription configuration, the corresponding task also needs to be updated.
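The one-task-per-subscription rule can be sketched as a small registry. This is a minimal sketch, not the platform's actual code: the `Subscription` shape, the `SubscriptionTasks` class and the `push` callback are hypothetical names. The scheduler is injected so the block stays self-contained; node-cron's `cron.schedule(expression, callback)` returns a task object with `stop()`, so it should fit the `Schedule` type used here.

```typescript
// Minimal shape of what this sketch needs from a scheduler.
interface ScheduledTask {
  stop(): void;
}
type Schedule = (expression: string, run: () => void) => ScheduledTask;

// Hypothetical subscription record; field names are illustrative.
interface Subscription {
  id: string;
  cron: string; // user-defined push cycle, e.g. '0 9 * * *'
}

class SubscriptionTasks {
  private tasks = new Map<string, ScheduledTask>();
  private schedule: Schedule;
  private push: (sub: Subscription) => void;

  constructor(schedule: Schedule, push: (sub: Subscription) => void) {
    this.schedule = schedule;
    this.push = push;
  }

  // Register the task for a subscription, replacing any previous one
  // so that an updated configuration never leaves a stale task running.
  upsert(sub: Subscription): void {
    this.tasks.get(sub.id)?.stop();
    this.tasks.set(sub.id, this.schedule(sub.cron, () => this.push(sub)));
  }

  // Stop and forget the task when a subscription is deleted.
  remove(id: string): void {
    this.tasks.get(id)?.stop();
    this.tasks.delete(id);
  }

  get size(): number {
    return this.tasks.size;
  }
}
```

With node-cron installed, the registry would presumably be created as `new SubscriptionTasks(cron.schedule, sendToIM)`, where `sendToIM` stands for whatever function calls the IM open platform API.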
The breakdown so far is fairly straightforward. The complication is that our self-developed monitoring platform service runs under PM2 with 4 processes per machine, and the online cluster has at least 3 machines, so every node-cron task is registered in at least 12 processes (4 × 3). This raises the problem of managing periodic tasks across multiple processes.
For overall load balancing, every process is allowed to register every task, but each scheduled run must be executed by only one process. A request to modify a configuration may land on any one of the processes, so the remaining processes must also pick up the change and re-register the latest task.
Multi-process Management Solution
From the backend perspective, this calls for a distributed lock. The most common approach is the Redis setnx command. However, to keep new dependencies to a minimum, we found that the atomic operations provided by MongoDB can achieve the same result. The detailed process is as follows:
The core logic involves designing two flags:
- The valid flag solves the problem of synchronizing multiple processes.
- The running flag solves the problem of concurrent execution by multiple processes.
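One possible reading of the two flags is sketched below, under stated assumptions. An in-memory map stands in for the MongoDB collection, and the `TaskDoc` shape, the `TaskFlags` class and all method names are hypothetical. Each method models a single-document update, which MongoDB applies atomically; the comments name the driver calls (`updateOne` with a filter, `$set`, `$addToSet`) that would presumably play the same role.

```typescript
// Hypothetical task document; field names and shapes are assumptions.
interface TaskDoc {
  id: string;
  valid: string[]; // flags of processes that have registered the latest config
  running: string; // flag of the process executing the current run, '' when idle
}

// In-memory stand-in for the MongoDB collection.
class TaskFlags {
  private docs = new Map<string, TaskDoc>();

  add(id: string): void {
    this.docs.set(id, { id, valid: [], running: '' });
  }

  // Roughly: updateOne({ _id: id, running: '' }, { $set: { running: flag } })
  // followed by a check that modifiedCount is 1. Only one caller can win.
  claimRun(id: string, flag: string): boolean {
    const doc = this.docs.get(id);
    if (!doc || doc.running !== '') return false;
    doc.running = flag;
    return true;
  }

  // Only the current holder may clear the running flag.
  releaseRun(id: string, flag: string): void {
    const doc = this.docs.get(id);
    if (doc && doc.running === flag) doc.running = '';
  }

  // The process that handled a config change becomes the only valid one.
  markUpdated(id: string, flag: string): void {
    const doc = this.docs.get(id);
    if (doc) doc.valid = [flag];
  }

  // A process missing from `valid` still holds a stale task.
  needsResync(id: string, flag: string): boolean {
    const doc = this.docs.get(id);
    return !!doc && !doc.valid.includes(flag);
  }

  // Roughly: updateOne({ _id: id }, { $addToSet: { valid: flag } })
  confirmSync(id: string, flag: string): void {
    const doc = this.docs.get(id);
    if (doc && !doc.valid.includes(flag)) doc.valid.push(flag);
  }
}
```

In this sketch a process calls `claimRun` when its cron tick fires and skips the run if the claim fails. When and how the running flag is released, and how clock differences between machines are tolerated, are not stated in the text, so `releaseRun` is only one option.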
Each flag value is composed of the IPv4 address plus the Node process number. The process number, which is the instance index that PM2 assigns to each process, can be read directly from process.env.NODE_APP_INSTANCE, while the IP address has to be obtained through the built-in node:os module.
    import { networkInterfaces } from 'os';

    /**
     * Returns the machine's non-internal IPv4 address, or '' if none is found.
     * If several interfaces qualify, the last one enumerated wins.
     */
    export const getIPAddress = () => {
      const interfaces = networkInterfaces();
      let address = '';
      for (const devName in interfaces) {
        if (Object.prototype.hasOwnProperty.call(interfaces, devName)) {
          const iface = interfaces[devName] || [];
          iface.forEach((alias) => {
            // Skip loopback and other internal interfaces.
            if (
              alias.family === 'IPv4' &&
              alias.address !== '127.0.0.1' &&
              !alias.internal
            ) {
              address = alias.address;
            }
          });
        }
      }
      return address;
    };
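Combining the two parts into a flag value might look like the sketch below. The helper name `buildProcessFlag`, the `-` separator and the fallback to `'0'` when the variable is absent are assumptions made for illustration; the text only says the flag is the IPv4 address plus the process number.

```typescript
// Hypothetical helper: combine the IPv4 address with the PM2 instance index.
// The environment is passed in so the function is easy to test; in the
// service it would presumably be called with getIPAddress() and process.env.
const buildProcessFlag = (
  ip: string,
  env: Record<string, string | undefined>,
): string => {
  // PM2 sets NODE_APP_INSTANCE for each instance it starts.
  // The '0' fallback (e.g. when running outside PM2) is an assumption.
  const instance = env.NODE_APP_INSTANCE ?? '0';
  return `${ip}-${instance}`;
};
```

Because the IP address distinguishes machines and the instance index distinguishes processes on one machine, the combined value should be unique across the cluster as long as each machine reports a distinct address.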
With these two flags in place, the problem of managing periodic tasks across multiple processes is effectively solved.