How to deploy the eG VM Agent via Microsoft Endpoint Manager’s Intune to monitor your Windows 365 Cloud PC desktops

Today I will cover how to deploy the eG VM Agent via Microsoft Endpoint Manager’s Intune so that you can monitor your Windows 365 Cloud PC desktops.

What is Microsoft Endpoint Manager (MEM)?

MEM is an integrated suite of tools for managing devices, applications, and security across an organization. It serves as an umbrella brand that includes multiple management solutions.

Components: It combines several management solutions under one platform:

  • Microsoft Intune: A cloud-based service focused on Mobile Device Management (MDM) and Mobile Application Management (MAM).
  • Configuration Manager (SCCM): An on-premises management solution for managing desktops, servers, and devices.
  • Co-management: A hybrid approach allowing organizations to manage devices with both Configuration Manager and Intune simultaneously.
  • Other Tools: Includes Endpoint Analytics, Autopilot, and Desktop Analytics for monitoring, deploying, and managing devices.

MEM aims to unify both cloud and on-premises management tools under one console, offering flexibility and efficiency in managing the entire lifecycle of devices across an organization.

More details are covered in: What is Microsoft Endpoint Manager (MEM)? (techtarget.com)

If you use the Microsoft Intune admin center to manage your Windows 365 Cloud PCs, you can get some basic insights such as:

  • see how your Windows 365 Cloud PCs are doing
  • see the provisioning status of Cloud PCs
  • get a summary of the Azure network connection health in your organization
  • track license usage of Windows 365 Cloud PCs

Many of our customers opt for enhanced enterprise-grade monitoring and diagnostics. To get maximum insights into your Cloud PC environments we recommend using the eG VM Agent for Windows 365 Cloud PCs. The eG VM Agent for Windows 365 Cloud PCs:

  • is a light-weight agent to monitor PC performance and user experience.
  • should be run on all Cloud PCs to get a holistic picture of the cloud workspace.
  • can be deployed manually, but this is time consuming and could lead to errors.
  • can be deployed automatically by integrating eG VM agent deployment with Microsoft Intune – this saves time, enhances efficiency, and ensures compliance with monitoring needs.

Now I will walk you through setting up eG Enterprise to monitor Windows 365 Cloud PCs. First, navigate to the “Admin” tab in the eG Enterprise console.

Step 1: Add a new component to monitor your Windows 365 Cloud PCs within the eG Manager

Make sure you are on the Discover/Monitor tab (the top icon of a magnifying glass on a square in the left-hand vertical tab menu).

Select “Cloud Desktops”.

Screenshot from eG Enterprise of the Monitor tab being used to select Windows 365 Cloud PCs for monitoring.

Now select, “Windows 365 Cloud PCs” from the VDI/DaaS options available.

Screenshot from eG Enterprise of the Monitor tab being used to select Windows 365 Cloud PCs. Other types of DaaS/VDI desktops can be monitored too, including Omnissa Horizon, Frame, Citrix, and Amazon WorkSpaces / AppStream 2.0.

You will be taken to a screen where you will enter a “Nick Name” for the Cloud PC group. You will also be asked to select a “Monitoring approach”. A remote agent is necessary for monitoring Windows 365 Cloud PCs. The remote agent must listen on a port for VM agents to communicate with it. This TCP port is configurable.

Screenshot showing how to choose the remote eG VM agent approach to monitor Windows 365 Cloud PCs using eG Enterprise

Click the “Update” button.

Step 2: Download the eG VM Agent command line installer

You will now be able to download the eG VM Agent. Note that the VM agent you will download this way is specific to the Windows 365 Cloud PC group that you just created.

To download the agent, use the download icon on the right-hand side of the screen associated with the “Nick Name” that you chose.

How to download the eG VM agent for eG Enterprise that will be used to monitor Windows 365 Cloud PCs. Screenshot.

You will now be presented with the screen shown below.

There are three fields you need to set:

  • “VM Agent Communication Target”: The VM agent can bootstrap from a remote agent or the eG manager. If the VM agent cannot reach the eG manager, you must choose the Remote Agent here.
  • “Installation Method”: Choose “Command Line (One-liner)”, which will give you a one-line command in the pale blue box that is compatible with Microsoft Intune.
  • “Environment”: Choose the OS of the Windows 365 PCs you intend to monitor.

Having set these three fields, the pale blue box will be populated with the one-line command you will need to supply to Microsoft Intune. Use the “Copy” button and paste this into Notepad or a similar editor for later use.

Step 3: Create a command line PowerShell installation script (.ps1 format)

To convert the one-line installer to a PowerShell script, use the Windows PowerShell ISE application.

How to convert a command line install command into a PowerShell script using the Windows PowerShell ISE.
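For reference, the resulting script can be as simple as the one-liner wrapped in some basic error handling. Below is a minimal sketch – the install command itself is a placeholder; use the exact one-liner you copied from the eG console in Step 2:

```powershell
# eGVMagentInstaller.ps1 - wraps the eG VM Agent one-line installer for Intune deployment.
# NOTE: The command below is a placeholder. Paste the exact one-liner copied from
# the eG Enterprise console (Step 2) in its place.
try {
    Invoke-Expression -Command '<paste the eG VM Agent one-liner here>'
    Write-Output "eG VM Agent installation command executed."
    exit 0
}
catch {
    Write-Error "eG VM Agent installation failed: $_"
    exit 1
}
```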

Step 4: Create a device group in the Entra ID (was Azure Active Directory) tenant as a security group type and add your Cloud PCs

This will allow you to manage and push the eG VM agent out to all the Windows 365 Cloud PCs that you choose to add to that group.

You will need to utilize Microsoft Entra ID for device group creation. Log in to your administrator account for Groups in the Microsoft Entra admin center via: https://entra.microsoft.com/#view/Microsoft_AAD_IAM/GroupsManagementMenuBlade/~/AllGroups/menuId/AllGroups.

Now create a “New group” by selecting the button shown below.

How to create a group in the groups blade of the Microsoft Entra admin center. This group will be used to deploy the eG VM Agent to your group of Windows 365 Cloud PCs.

Create a security group, with an appropriate group name, here we used “W365_devicegroup”:

Screenshot of how to use a security group to manage eG VM agent deployment to a group of Windows 365 Cloud PCs.

Now use the “Add members” blade to add the subset of Cloud PCs you wish to monitor as a group.

Here we added a single desktop, but you might want to take note of how many devices you have selected for when you verify deployment later.

Adding Windows 365 Cloud PCs to a group so that we can collectively manage them and deploy the eG VM agent for monitoring to large numbers of cloud PCs as a bulk action

Step 5: Assign the eG VM Agent PowerShell Script to the Device Group

This involves deploying the command line PowerShell installation script to the created device group using the Microsoft Endpoint Manager (MEM) console via the Microsoft Intune Admin Center.

Log in to the Intune Admin Center and navigate to “Home” -> “Devices” -> “Scripts and remediations”; alternatively, you can navigate directly via: https://intune.microsoft.com/#view/Microsoft_Intune_DeviceSettings/DevicesMenu/~/scripts.

Select the “Platform scripts” tab as shown below. Then choose “Windows 10 and later” from the dropdown on the “Add” button as shown.

Then select the .ps1 file you created in Step 3 (above). In this example, we named it “eGVMagentInstaller”.

Once you have uploaded the script, you will need to set some controls on how the script is run. Configure the settings as follows:

  • Select “No” to the question “Run this script using the logged on credentials” – this will ensure that the script is run/executed using the privileged access of the SYSTEM account.
  • Choose “No” to “Enforce script signature check”, unless you have taken additional steps to sign the script and create a certificate.
  • Ensure “Yes” is selected to “Run script in 64 bit PowerShell Host”.

Verify your choices and move on to the “Assignments” step. Choose the target Windows 365 Cloud PC security group (created above) for script assignment.

Associating a group of Windows 365 Cloud PCs to the eG VM agent PowerShell installation script

Now click “Next” where you can review the group. You will probably want to check that the group has been assigned and that it contains the correct number of devices.

Verifying the Windows 365 Cloud PC group. Microsoft Intune console screenshot.

After this, you can wait for the configured sync interval to occur and the eG VM Agent will be rolled out. If, however, you want to deploy the agent immediately, you can manually trigger the process via a “Sync” action, as detailed in the next step.

Step 6: Initiate installation of the eG VM Agent using a push notification (SYNC action) from the MEM Intune console

The Sync device action forces the selected device to immediately check in with Intune. When a device checks in, it immediately receives any pending actions or policies assigned to it. This feature can help you immediately validate and troubleshoot policies you have assigned, without waiting for the next scheduled check-in.

Navigate to the Windows devices “Bulk action” blade; a direct link is: https://intune.microsoft.com/#view/Microsoft_Intune_Devices/BulkActionWizardBlade.

Adding a PowerShell script to the Windows 365 Cloud PC desktop. This will run on the desktop and cause the eG VM agent to be installed.

Now choose the “Device action” type as “Sync”. Then click “Next”.

Now set:

  • The “OS” field to “Windows”
  • The “Device type” field to “Cloud PCs”
  • The “Device action” to “Sync”

As shown below:

A Microsoft Intune Sync action can be used to force the immediate deployment en-masse (bulk action) of the eG VM Agent for monitoring to start immediately.

Click “Next”. Now verify your choices, check that the correct devices have been selected, and click “Create”.

How to set up a bulk action to apply a Sync event to the Windows 365 Cloud PC group.
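As an aside, if you prefer scripting to the portal, a sync can also be triggered per device through Microsoft Graph. Below is a hedged sketch using the Microsoft Graph PowerShell SDK – it assumes the Microsoft.Graph module is installed and that your account has the DeviceManagementManagedDevices.PrivilegedOperations.All permission; the device ID is a placeholder:

```powershell
# Trigger an Intune check-in (Sync) for a single managed device via Microsoft Graph.
Connect-MgGraph -Scopes "DeviceManagementManagedDevices.PrivilegedOperations.All"

# Placeholder - look up the real managed device ID first,
# e.g., via Get-MgDeviceManagementManagedDevice.
$deviceId = "<managed-device-id>"

Invoke-MgGraphRequest -Method POST `
    -Uri "https://graph.microsoft.com/v1.0/deviceManagement/managedDevices/$deviceId/syncDevice"
```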

Step 7: Verify the eG VM Agent has been installed

When you log in to a Windows 365 Cloud PC, you can verify that the eG VM Agent has been installed via the Control Panel.

Screenshot showing the eG VM agent installed using the Control Panel of the Windows 365 Cloud PC being accessed.

You can also verify that the process associated with the eG VM agent service is running.

Step 8: Verify that the eG VM Agent is communicating with the eG Enterprise console

Now when you visit the “Monitor” tab of the eG Enterprise console and examine your Cloud PCs you will have access to real-time metrics.

eG Enterprise console is shown. The application tab is selected. Real-time metrics from the applications running on the Windows 365 Cloud PCs that Intune was used to deploy the eG VM agent on are shown and populated.

Benefits of using the eG VM Agent/Microsoft Intune integration

We think this is a great way to deploy the eG VM Agent to Windows 365 Cloud PCs, offering benefits such as:

  • Efficiency: Deploying the eG VM agent automatically through Microsoft Intune saves time and resources.
  • Cloud Integration: Deploy the VM Agent to Windows 365 Cloud PCs for streamlined management.
  • Execution Tracking: Verify VM agent deployment status through Intune Admin Center for insights.
  • Compliance Assurance: Ensure all Windows 365 Cloud PCs run the monitoring agent.

To learn more about how you can now use eG Enterprise to monitor your Cloud PCs, see: Monitoring Windows 365 Cloud | eG Innovations.

7 Myths of AVD Monitoring

Azure Virtual Desktop (AVD) is a powerful and increasingly popular solution that allows businesses to provide secure, scalable, and cloud-based desktop virtualization, usually without the overhead of on-prem infrastructure. However, many organizations underestimate the importance of monitoring, leading to performance, compliance, and cost issues. Today, I will debunk several common myths surrounding AVD monitoring and explain why a proactive approach can save you time, stress and billing costs.

What is Azure Virtual Desktop?

Azure Virtual Desktop (AVD) is Microsoft’s cloud-based desktop and application virtualization service. It offers flexibility, scalability, and secure access to users from virtually anywhere. While Microsoft manages the underlying infrastructure, businesses are responsible for managing user experience, application performance, and cost control. Monitoring plays a vital role in ensuring that AVD deployments operate smoothly and efficiently, meeting user expectations and business requirements. Here are seven common myths about AVD monitoring and the realities behind them.

Myth #1. “AVD is a cloud service. Microsoft manages it. I don’t need to worry about monitoring and analytics.”

A widespread misconception is that since AVD is a cloud service, Microsoft takes care of everything, including monitoring. While Microsoft maintains the infrastructure, it’s important to note that ensuring user experience is still your (the customer’s) responsibility. After all, what applications to deploy, what type of profiles to use, and what integrations to support are all determined by you. And the decision to use AVD (rather than other digital workspace technologies like Citrix, Omnissa Horizon, etc.) is also yours. Users don’t care what VDI technology is used as long as it is available and performing well.

When users encounter issues such as login slowness, application failures, or sudden disconnections, they will still turn to you for resolution. It will be your responsibility to determine what caused the issue: a network connectivity issue to Azure, a profile management issue, an Azure storage or network failure, a problem with application connectivity (the application being accessed could be remote, e.g., Salesforce), or a problem in the application code. Issues at the user end – e.g., poor Wi-Fi connectivity or a bad ISP connection – can also lead to complaints about slow access. The cloud service provider, Microsoft, is not responsible for troubleshooting any of these issues.

Continuous monitoring helps detect problems proactively and identify where the cause of these problems lies. Without monitoring, pinpointing and resolving user complaints will be much harder and take more time.

Additionally, AVD is a usage-based service, meaning that you pay based on consumption. Your interest should be in controlling the service cost and for this, you need visibility into usage patterns and trends. You will need to understand who is using the service, what resources are being consumed, and how to control costs effectively. Good end-to-end observability and monitoring are essential – and these are not areas that Microsoft handles.

eG Enterprise report on AVD connection failures

Figure 1: A Connection Failure report in eG Enterprise allows the administrator to quickly identify the most problematic areas of their AVD deployment to target effort where most effective. Instant visibility on whether certain Host Pools, Session Hosts, Users or Session Desktops experience connection problems.

Myth #2. “Azure Service Health allows customers to track the status of Azure services. That’s enough.”

Microsoft does provide overall service health updates on the Azure status portal. The Azure Service Health dashboard provides an overview of health updates and open service advisories.

Screenshot of the Microsoft Azure Service Health Screen

Figure 2: The Azure Service Health dashboard: While this is useful to track general service availability, it is not designed to offer real-time, detailed insights into specific AVD deployments.

The information presented in the Azure Service Health updates is often very generic, and status updates usually lag significantly behind real-time outages or incidents. We have some useful information on how to track Azure outages available, see: How to Protect your IT Ops from Cloud Outages (eginnovations.com).

The Azure Service Health updates reflect the actual status after a problem has become severe enough to impact a large number of customers. Further, the health indicators are not specific to your subscription and your systems. Even if the Azure status portal indicates that AVD is operational, specific issues within your subscription — such as configuration errors, Entra ID (formerly Azure AD) authentication problems, FSLogix or storage issues, or a runaway application on a session host — are not reflected.

Therefore, you cannot rely solely on the Azure status portal and the Azure Service Health dashboard for operating the AVD service efficiently.

Synthetic monitoring, which simulates user activity 24×7, is often necessary to identify performance issues that the Azure portal might miss. With this approach, software robots simulate user logons to AVD and measure logon availability and logon times. A more sophisticated approach is full session simulation, where an entire workflow – a user logging in to their session, logging into an application, doing work in the application and then signing out of the app and VDI – is simulated and the success of each step of the workflow and its response time tracked. This type of monitoring provides a proactive way to track key performance metrics like login times and application responsiveness, allowing you to address issues before they impact users, rather than relying on delayed status updates.

Full session synthetic monitoring dashboard in eG Enterprise is available for AVD

Figure 3: Full session simulation showing the performance of each step of the simulated workflow

External synthetic monitoring for AVD can simulate user access from specific geographic locations, mimicking home, remote workers, or branch offices. This ensures performance consistency and identifies location-specific issues before they impact users. Synthetic monitoring can also be used to simulate users using multiple applications in a realistic way even when no real users are accessing your AVD deployment. Learn more: Synthetic Monitoring of Microsoft Azure DaaS | eG Innovations.

Myth #3. “I have configured auto-scaling. This is sufficient for my AVD service to operate well.”

While auto-scaling is a valuable feature of AVD, it is not a substitute for comprehensive monitoring. There is a misconception that simply configuring auto-scaling will solve performance problems – this is akin to throwing more computing resources at a problem whenever there is an issue.

Auto-scaling responds to increased resource usage and demand by adding more session hosts, but it doesn’t analyze the underlying causes of the usage/demand spike. For instance, a rise in CPU or memory usage could be due to an application misconfiguration or a malfunctioning application. Auto-scaling may increase capacity, raising costs without resolving the root cause. By monitoring your AVD landscape, you can identify if the resource demand is legitimate or if there are application inefficiencies or misconfigurations that need addressing to optimize costs effectively.

Hence, auto-scaling is useful when configured correctly, but it is not a replacement for 24×7 monitoring.

Root-cause Diagnostics for AVD – An Example

The CPU time used by user sessions (%) indicates the percentage of time, across all processors, that a user used the CPU. In contrast, the CPU usage for a user’s processes measure indicates the percentage of overall CPU time that a user is using. For example, if a user is taking up one of the CPUs for 100% of the time and there are 8 CPUs on the AVD, CPU usage for user’s processes will be 12.5% (100/800). While 12.5% may seem to be a low number, the fact that the user is taking up one of the CPUs of the AVD is significant.

Hence, the CPU time used by user’s session measure is a better indicator of CPU usage by users. In the above example, since the user is consuming 100% of one processor, CPU time used by user’s session will be 100%. A high value of this measure, or a consistent increase in its value, demands attention. Use the detailed diagnosis to see what CPU-intensive activities are being performed by the user.

eG Enterprise screenshot showing the monitoring of RemoteFX metrics and some other metrics useful for troubleshooting user experience problems.

Figure 4: Note the detailed diagnostics icon (the magnifying glass) is available for CPU time used by user sessions (%). Clicking on this will give you instant access to detailed information on the individual applications and processes that are using the CPU and affecting the metrics value.

Figure 5: The root-cause diagnostics showing what is using the CPU and affecting the value of the CPU time used by user sessions (%) measure.

Myth #4. “Azure Monitor gives me all I need for monitoring the AVD service.”

Azure Monitor is the built-in monitoring tool in the Azure tools stack. It offers visibility and alerting on different aspects of Azure and AVD performance. One of its major drawbacks is that it requires significant manual setup to configure log analytics and build custom dashboards. Setting up metric thresholds and alerting is manual and time-consuming. We have a detailed guide on automating metric thresholding and alerting available, see: White Paper | Make IT Service Monitoring Simple & Proactive with AIOps Powered Intelligent Thresholding & Alerting (eginnovations.com).

Also, Azure Monitor operates on a pay-per-metric / alert model, meaning costs can quickly increase if you monitor a large number of metrics or run a complex environment. It is also incredibly hard to estimate costs and budget for Azure Monitor usage. See also: How to Reduce Azure Log Analytics Costs | eG Innovations. Since Azure Monitor cost is included in Azure costs, there is a misconception that Azure Monitor is free to use, which is not the case.

A joint survey on AVD uptake from AVD Techfest and eG Innovations found that cost concerns around Azure Monitor – both the expense (30%) and the uncertainty (30%) as to what those costs will be – are the top obstacles to using Azure Monitor with AVD. Manual configuration and the lack of out-of-the-box features (26%) were the next most significant issues reported.

Source: Azure Virtual Desktop (AVD) Adoption Trends (eginnovations.com)

Often, customers are using multiple types of digital workspaces – e.g., Citrix on-prem for legacy applications and for use cases that need higher security, and AVD for newer use cases or to support off-shore workers. In such cases, Azure Monitor does not provide the cross-platform visibility needed. You will need to use one tool to monitor AVD and another for the other digital workspaces. Alternatively, you may need to integrate third-party solutions or additional monitoring tools to cover gaps in Azure Monitor and ensure a comprehensive view of your entire environment. Having a unified monitoring interface and consistent dashboards and reports minimizes the learning curve for your operations and helpdesk teams.

For information on what eG Enterprise offers for AVD monitoring beyond native Azure Monitor functionality, please see: Top Azure Monitor Alternatives: eG Innovations.

Myth #5. “Monitoring is needed only to troubleshoot when AVD problems occur.”

There are many who believe that monitoring is only valuable when issues arise. While it is true that having comprehensive monitoring does simplify and accelerate troubleshooting, monitoring has several other uses, especially for a digital workspace in the cloud.

  • Compliance: IT organizations are subject to stringent controls these days, and it is important to track user activities including who logged in, at what time, for how long, what applications they accessed, and what resources they utilized. Monitoring of AVD provides reports with all of these usage insights, which are important for compliance.
  • Security: Monitoring access attempts to your AVD service can highlight break-in attempts on your session hosts. Entra ID (formerly Azure AD) is key to security in Azure, and continual real-time monitoring of Entra ID sign-in logs can identify malicious attacks on your AVD deployments such as brute force and password spraying attacks.

    Malicious password attacks can be monitored for AVD deployments and Entra ID by eG Enterprise. A screenshot showing this.

    Figure 6: Brute force and password spraying attacks can be easily identified and the details examined via built-in reports.

    Learn more about monitoring Entra ID sign-in logs: Entra ID Monitoring – Sign In Logs & Attack Detection (eginnovations.com).

  • Right-sizing and cost-control: Deployment of resources in the cloud costs money. If your session hosts have excess resources (CPU, memory, etc.), this will result in unnecessary cost. At the same time, malfunctioning applications (or misconfigured applications – e.g., antivirus scans or backups running during peak hours) or inappropriate user activity, if left unchecked, can waste resources and increase costs. Monitoring tools provide reports highlighting which session hosts are under-sized and which ones are over-sized. Real-time monitoring also highlights conditions where a user or an application is using a high or unexpected share of resources. IT operations teams can respond to alerts regarding such conditions and ensure that user experience does not suffer, and unnecessary auto-scaling does not happen.
  • Improved capacity planning: With empirical insights from monitoring tools, IT operations teams can plan better for growth. They can estimate how many more users the current infrastructure can accommodate and can plan for the number and sizing of the session hosts needed to handle additional user growth.
  • Management reporting: Performance and usage reports can be used to highlight to management how the digital workspace service is working and how widely it is being used. These insights can also be used to justify additional expenses on the AVD service.

In summary, don’t look at monitoring tools as being useful for troubleshooting alone.

Myth #6. “Monitoring of AVD is about monitoring your session hosts.”

A common myth is that monitoring session hosts is sufficient to manage AVD performance. Obviously, the session hosts are important because they are the ones that handle user sessions and host applications accessed by users. But monitoring the session hosts alone provides only a part of the picture.

The performance of the AVD service also depends on components such as Entra ID (formerly Azure AD), networking layers, the Azure subscription in which the session hosts reside, the connection brokering layer of Azure, etc., all of which impact user access and experience.

For example, problems in Entra ID can prevent users from accessing the environment, and these issues will not be visible if you monitor session hosts alone. Additionally, the logon process and user authentication are managed largely at the connection broker layer (learn more about the AVD broker in Monitor and investigate AVD Broker issues | eG Innovations), meaning that monitoring only session hosts will miss critical information about logon delays or failures. Therefore, to fully understand user behavior and experience, it is essential to monitor all layers of the AVD environment, including Entra ID and broker services, for comprehensive visibility.

Myth #7. “Monitoring can be added later after the AVD service is operational.”

Often, the focus prior to and during deployment of AVD is on the applications to be delivered, the desktop configurations to be supported, sizing of the session hosts, the technology to be used for user profiles (e.g., local profiles, FSLogix, etc.), how auto-deployment will be done (e.g., scripts or a tool like Nerdio), and so on. While the focus on these aspects of provisioning the service is important, monitoring tends to be an afterthought, often considered only after costs have shot up or users are complaining about issues.

Having a proactive monitoring strategy upfront during AVD deployment is key to a successful deployment. If you are migrating applications and desktops to AVD, use monitoring to benchmark performance before and after the move to AVD. This way, all the stakeholders can be on the same page, and you can determine whether the migration improved user experience.

Having monitoring in place from day one ensures that you have visibility into performance, usage, cost, and user experience, and when an issue is detected, you don’t have to wonder what changed. You will have the data to easily determine what caused a change in cost or usage. Not considering monitoring during the planning of your AVD deployment leads to cost overruns, slow performance complaints and finger-pointing between management and operations teams.

Conclusions

Supporting a virtual desktop service in the cloud is not as simple as it sounds. With Azure Virtual Desktop technology, while Microsoft manages the cloud infrastructure, you are responsible for managing user experience, application performance, and resource optimization. Having a robust and proactive end-to-end monitoring strategy in place is key to the success of any AVD initiative.

Learn more about AIOps-powered AVD monitoring using eG Enterprise: Azure Virtual Desktop Monitoring | eG Innovations.

Jetty vs Netty: A Comprehensive Comparison for Site Reliability Engineers

Jetty vs Netty – A Short Overview

Here’s a quick TL;DR version in terms of the differences:

Jetty = Web servers, servlet containers, and handling HTTP requests. Think: web apps, APIs, microservices.
Netty = Network apps, asynchronous I/O, and handling tons of connections. Think: chat servers, real-time data, multiplayer games.

Which suits your app needs?

Jetty Introduction

Jetty is a lightweight, highly scalable web server and servlet container often used in applications where a full Jakarta EE (Java EE) stack is not required. It is known for its ease of use, flexibility, and support for various protocols including HTTP/2 and WebSocket. The lightweight servlet container is easy to embed within a Java application.

The Jetty web server (HTTP) is similar to the likes of Tomcat (see: Apache Tomcat vs Eclipse Jetty: Top Differences – GeeksforGeeks if you are looking for information on Jetty vs Tomcat).

Tomcat is great when you’re running a traditional web server, but it can feel a little heavy sometimes. Jetty, on the other hand, is like Tomcat’s agile sibling—lightweight, embeddable, and ready to be tucked into your apps as if it’s not even there.
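To illustrate just how embeddable it is, here is a minimal sketch of running Jetty inside a plain Java application. This follows the Jetty 11 embedded API (jakarta.servlet namespace); the Handler classes were reworked in Jetty 12, so class names will differ there:

```java
import java.io.IOException;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import org.eclipse.jetty.server.Request;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.handler.AbstractHandler;

public class EmbeddedJetty {
    public static void main(String[] args) throws Exception {
        Server server = new Server(8080);           // HTTP server listening on port 8080
        server.setHandler(new AbstractHandler() {   // trivial handler answering every request
            @Override
            public void handle(String target, Request baseRequest,
                               HttpServletRequest request, HttpServletResponse response)
                    throws IOException, ServletException {
                response.setContentType("text/plain;charset=utf-8");
                response.getWriter().println("Hello from embedded Jetty");
                baseRequest.setHandled(true);       // mark the request as handled
            }
        });
        server.start();  // Jetty runs in-process, alongside the rest of your application
        server.join();
    }
}
```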

Netty Introduction

Netty is a non-blocking I/O (NIO) framework that enables the development of scalable network applications. It is designed for building high-performance protocol servers and clients, and it excels in scenarios requiring high throughput and low latency.

What is NIO?

NIO, or Non-blocking Input/Output, lets Java handle multiple connections using a single thread, instead of needing one thread per connection like traditional I/O.

This means the thread doesn’t have to sit idle waiting for data—it can keep doing other tasks. It’s like how Node.js uses an event loop, where the system can manage many tasks at once without getting stuck waiting.

NIO is great for making applications faster and more efficient, especially when they need to handle lots of connections at the same time.
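To make that concrete, here is a small, self-contained sketch of the idea using plain java.nio: a single thread services every connection via a Selector (a toy echo server, for illustration only – real code would also handle partial writes and errors):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

public class NioEchoServer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false);                  // non-blocking mode is required for selectors
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select();                            // blocks until at least one channel is ready
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {                 // a new client connection is ready
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {            // an existing connection has data
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    int read = client.read(buf);
                    if (read == -1) { client.close(); continue; }
                    buf.flip();
                    client.write(buf);                    // echo the bytes straight back
                }
            }
        }
    }
}
```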

Key Differences between Netty and Jetty

Let’s now jump into a comparison where we’ll break down the differences between Jetty and Netty by each feature or capability. We’ll also explore their respective use cases, so you get a clear picture of which technology fits your specific server-side needs.

1. Architecture and Design

Jetty:

  • Primarily a web server and servlet container.
  • Suitable for serving web applications and handling HTTP/2 traffic.
  • Offers support for traditional web application frameworks and is easy to integrate with Java EE technologies.

Netty:

  • A network application framework rather than a web server – Netty is a framework for writing TCP and UDP applications (see the sketch after this list).
  • Focuses on low-level network programming and is protocol-agnostic.
  • Provides fine-grained control over networking aspects, making it ideal for custom protocols and high-performance requirements.
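Here is a minimal sketch of that model in practice: a Netty 4.x TCP echo server, where the ChannelInitializer and pipeline are the points of fine-grained control mentioned above:

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.*;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public class NettyEchoServer {
    public static void main(String[] args) throws InterruptedException {
        EventLoopGroup boss = new NioEventLoopGroup(1);   // accepts incoming connections
        EventLoopGroup workers = new NioEventLoopGroup(); // handles I/O for accepted connections
        try {
            ServerBootstrap b = new ServerBootstrap();
            b.group(boss, workers)
             .channel(NioServerSocketChannel.class)
             .childHandler(new ChannelInitializer<SocketChannel>() {
                 @Override
                 protected void initChannel(SocketChannel ch) {
                     // the pipeline is where protocol-specific handlers are composed
                     ch.pipeline().addLast(new ChannelInboundHandlerAdapter() {
                         @Override
                         public void channelRead(ChannelHandlerContext ctx, Object msg) {
                             ctx.writeAndFlush(msg); // echo the bytes straight back
                         }
                     });
                 }
             });
            ChannelFuture f = b.bind(8080).sync();
            f.channel().closeFuture().sync();
        } finally {
            boss.shutdownGracefully();
            workers.shutdownGracefully();
        }
    }
}
```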

2. Performance and Scalability

Jetty:

  • Efficient for handling a large number of concurrent HTTP connections.
  • Optimized for web applications with its asynchronous processing capabilities.
  • Suitable for typical web server use cases but might face limitations in extremely high-throughput scenarios.
  • High-volume throughput can cause significant log bloat.

Netty:

  • Designed for high performance and scalability from the ground up.
  • Utilizes a non-blocking I/O model which can handle millions of connections.
  • Ideal for applications requiring low latency and high throughput, such as financial trading systems or real-time communication platforms.
  • Netty’s lightweight logging, designed for high-volume scenarios, is widely considered better suited to such workloads than Jetty’s logging.

3. Ease of Use

Jetty:

  • Straightforward to set up and use, especially for web application development.
  • Extensive documentation and community support make it accessible for developers with varying levels of expertise.

Netty:

  • Requires more effort to set up and has a much steeper learning curve due to its low-level nature. It is far harder (and usually more expensive) to recruit staff with experience of Netty than Jetty.
  • Provides greater flexibility and control, which can be both an advantage and a challenge depending on the use case.

4. Use Cases

Jetty:

  • Web servers and servlet containers.
  • Microservices and RESTful APIs.
  • Applications requiring HTTP/2 and WebSocket support.

Netty:

  • Custom protocol servers and clients.
  • High-performance networking applications.
  • Systems requiring high throughput and low latency, such as game servers and messaging platforms.

Considerations for SREs when Choosing Jetty vs Netty

Reliability and Stability

Both Jetty and Netty are mature projects with active communities and regular updates. Choose based on your specific application requirements and the expertise of your team.

You can explore the Jetty project here: The Eclipse Jetty Project :: Eclipse Jetty. Information on the Netty project is also available, please see: Netty: Get Involved.

Observability and Monitoring

Jetty integrates well with standard monitoring tools and provides built-in metrics for tracking performance and health.

Netty doesn’t offer built-in support for JMX (Java Management Extensions), although there have been proposals for its integration. To monitor Netty, users often need to implement custom metrics using tools like ChannelHandlers. Alternatively, monitoring solutions such as eG Enterprise can provide insights into Netty applications by tracking JVM performance and custom metrics. Transaction tracing can also be very helpful, see: Java Transaction Monitoring | eG Innovations. These tools help bridge the observability gap, allowing teams to monitor application health and performance even without native JMX support in Netty. eG Enterprise is able to monitor technologies including Jetty within the context of converged application and infrastructure monitoring to offer end-to-end observability.
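To illustrate the ChannelHandler approach mentioned above, here is a hedged sketch of a custom handler that counts active connections and bytes read. The class and counter names are our own invention; the counters would be polled and exported to whatever monitoring backend you use:

```java
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelDuplexHandler;
import io.netty.channel.ChannelHandler.Sharable;
import io.netty.channel.ChannelHandlerContext;
import java.util.concurrent.atomic.AtomicLong;

@Sharable  // one instance can be safely added to many channel pipelines
public class MetricsHandler extends ChannelDuplexHandler {
    private final AtomicLong activeConnections = new AtomicLong();
    private final AtomicLong bytesRead = new AtomicLong();

    @Override
    public void channelActive(ChannelHandlerContext ctx) throws Exception {
        activeConnections.incrementAndGet();   // a new connection was established
        super.channelActive(ctx);
    }

    @Override
    public void channelInactive(ChannelHandlerContext ctx) throws Exception {
        activeConnections.decrementAndGet();   // a connection closed
        super.channelInactive(ctx);
    }

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
        if (msg instanceof ByteBuf) {
            bytesRead.addAndGet(((ByteBuf) msg).readableBytes());
        }
        super.channelRead(ctx, msg);           // pass the message down the pipeline
    }

    // These getters would be polled by your metrics exporter of choice.
    public long activeConnections() { return activeConnections.get(); }
    public long bytesRead() { return bytesRead.get(); }
}
```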

eG Enterprise console shown monitoring technologies including Jetty within the context of converged application and infrastructure monitoring

Figure 1: eG Enterprise topology map showing Jetty dependencies.

eG Enterprise also offers out-of-the-box dashboards providing proactive real-time monitoring of Netty channels, connections, throughput, and key network metrics.

Dashboard in eG Enterprise showing Netty health metrics

Figure 2: eG Enterprise dashboards give real-time overviews for Netty

eG Enterprise monitors Jetty deployments within the same single intuitive console as Netty and other technologies. Built to be domain-aware, the eG Enterprise platform includes metrics specific to each technology as well as those in common.

Dashboard showing Jetty health metrics in the eG Enterprise console

Figure 3: Out-of-the-box eG Enterprise dashboards give overviews of Jetty metrics including thread usage

Scalability Needs

If your application is required to handle a vast number of connections with minimal latency, Netty’s non-blocking I/O model is advantageous. For typical web applications, Jetty’s scalability features should suffice. Some community performance benchmarking data is available, for an example, see: Netty vs Jetty: Hello world performance | by Mayank C | Tech Tonic | Medium.

One available benchmark reported that Netty was 2.5 times faster than Jetty when tested with simple HTTP requests. However, since the Jetty result was itself very fast – 5 ms – the performance benefits are not a game-changer in many scenarios; see: https://sourcebae.com/blog/what-is-the-difference-between-jetty-and-netty/ for details of the test.

Development Complexity (and Maintenance)

Jetty is simpler for teams used to web development, while Netty requires more advanced knowledge of network programming. Because Netty solutions are often custom, it’s important to plan for ongoing maintenance and ensure clear architecture and documentation so the product remains manageable, even if key team members leave.

Conclusion on the Choice of Jetty vs Netty

For SREs, choosing between Jetty and Netty depends on your application’s specific needs and your team’s skills.

Jetty is an excellent option for typical web server use, especially for embedded scenarios, offering simplicity and solid performance for web applications when you don’t require a full Tomcat setup.

Netty is perfect for high-performance, low-latency applications, especially those requiring real-time data streaming, like chat apps or gaming servers. Its non-blocking I/O model efficiently handles thousands of connections with minimal latency, ideal for financial trading or live streaming.

As an SRE, you’re not just keeping the lights on—you’re guiding teams toward smarter decisions.

When Jetty and Netty come up in conversation, it’s your understanding of their differences and technology strengths that ensure the infrastructure is both reliable and built for performance.

Monitoring and Troubleshooting Nerdio

Today I’ll give a brief overview on monitoring and troubleshooting Nerdio, a .NET application, popular with MSPs (Managed Service Providers) and enterprises using Microsoft Azure Virtual Desktop (AVD). Nerdio is a cloud-based application used by administrators to automate the deployment and management of virtual desktops in Azure. Nerdio Manager is the application an EUC (End User Computing) administrator uses to simplify tasks such as deployment, scaling, and management of virtual desktops within the Microsoft Azure ecosystem.

Why Should I Monitor Nerdio Manager?

Nerdio Manager is a key tool for the operation of Azure Virtual Desktop technology. If Nerdio Manager is down, or is slow to respond, AVD operations will be affected. You may face:

  • Provisioning Issues: You won’t be able to create, update, or delete VMs, host pools, or user assignments, and any scheduled tasks related to scaling or maintenance will not be executed.
  • Scaling Issues: Automated scaling operations, e.g., powering VMs on or off based on usage, may not function, leading to higher costs or service disruptions.
  • Session Management Problems: You may not be able to manage user sessions, log off inactive users, or shadow sessions for support purposes.
  • Performance Issues: If scaling operations fail, users might experience degraded performance due to insufficient resources.
  • Manual Interventions: Admins might need to intervene manually to manage resources, perform scaling operations, or troubleshoot issues, which increases the risk of errors and operational overhead.

Nerdio Architecture

Nerdio is a typical cloud-hosted multi-tier application. As with any cloud-based app that you rely on for key workflows in your business, there are a few key things you’ll want to monitor proactively. You’ll probably want to:

  • Monitor Azure Resources used by the Nerdio Manager like you would any other web application deployed on Azure (App, Key Vault, Database, etc.)
  • Track and alert on Nerdio Manager Health status
  • Use transaction tracing to identify any issues with the application. Nerdio Manager is a standard .NET application, so adopting APM (Application Performance Monitoring) tools with these capabilities is becoming increasingly common as folks rely more and more on cloud-based tools.
  • Monitor the Nerdio Manager’s database backend to detect any need for any proactive maintenance.

Although I’ll use Nerdio as an example, this is really an article about how you can monitor all your critical cloud-based apps whether they be .NET, Java, PHP or Node.js.

Monitoring Nerdio

eG Enterprise is unusual amongst tools used by EUC and digital workspace administrators (for monitoring AVD / Citrix / Omnissa) in that it offers converged infrastructure and application monitoring, including APM capabilities. Other APM tools such as Dynatrace, AppDynamics and Datadog tend to lack EUC domain-aware features, so they aren’t very widely used by AVD / VDI administrators.

Monitoring the Nerdio Manager App

You can get some basic health information about the Nerdio app (and all your other apps) by monitoring the Azure Subscription and the “Azure App Services”.

Monitoring of Azure App Services is shown - specifically the Nerdio service

Figure 1: Basic Nerdio health information is available via Azure App Services

In Figure 1, you can see all the metrics and events that eG Enterprise has configured automatic monitoring and alerting for. Note that there is enough information to:

  • Check basic availability and status
  • Track if the Nerdio Manager app is using too much CPU /memory
  • Monitor requests to the app and response time
  • Track if there are many HTTP errors during access to the application
  • Determine if the Nerdio Manager app is oversized/undersized for your workload

Monitoring Nerdio Manager Health Status

Within eG Enterprise the Nerdio Manager component model tracks the health of the Nerdio Manager.

eG Enterprise shown used to track Nerdio Manager health

The SSL certificate check tracks expiry days for the Nerdio Manager’s certificates.

The HTTP test checks the availability and responsiveness of the Nerdio Manager.

eG Enterprise continuously monitors the responsiveness of the Nerdio homepage. A useful overview of behavior when troubleshooting Nerdio

You can integrate with the Nerdio Manager health check API to monitor the functioning of the application.

Screenshot of the Nerdio Manager's health monitored via Nerdio APIs

This ensures that Nerdio Manager’s app service has connectivity to the SQL DB, Azure, and the AVD services. If there are any blocks, service unavailability, or networking issues, the health check indicates this. You can check the Nerdio Manager logs for further troubleshooting.
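As an illustration only – the endpoint path and key parameter below are hypothetical, so check the Nerdio Manager documentation for the actual health check URL for your instance – a simple external probe could look like this:

```powershell
# Hypothetical example: poll a Nerdio Manager health check endpoint and alert on failure.
# Replace the URL (and any auth/key parameter) with the values for your Nerdio Manager instance.
$healthUrl = "https://<your-nerdio-manager-host>/api/healthcheck?key=<your-key>"  # hypothetical path

try {
    $response = Invoke-RestMethod -Uri $healthUrl -Method Get -TimeoutSec 30
    Write-Output "Nerdio Manager health check response: $($response | ConvertTo-Json -Depth 3)"
}
catch {
    # A non-2xx response or a timeout lands here - raise an alert via your tooling of choice.
    Write-Error "Nerdio Manager health check failed: $_"
}
```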

Monitoring Nerdio Related Azure Resources

When the Nerdio Manager application is installed, it creates certain resources in Azure, including: SQL Server and SQL Database (S1), App Service Plan and App Service (B3), and Key Vault. Some information on the defaults can be found here: Nerdio Manager Default Deployment Resources and Costs – Nerdio Manager for Enterprise (getnerdio.com).

Monitoring Azure Key Vault

From the Azure subscription you can also track the availability, responsiveness and operations on the Nerdio Manager’s key vault. See Figure 6 – a screenshot from eG Enterprise showing the data you can monitor.

eG Enterprise shown monitoring the Azure Key Vault, a Nerdio dependency

Figure 6: eG Enterprise automatically monitors the Azure Key Vault usage of Nerdio and uses its AIOps engine to baseline normal behavior providing automatic anomaly detection

If you aren’t familiar with Azure Key Vault metrics, logs, and events, see: Monitor Azure Key Vault | Microsoft Learn. If you’d prefer to go down the route of manual monitoring with Azure-native tools, information on how to set up monitoring for Azure Key Vault is available on the same site.

Right-size Nerdio’s Database

You can also monitor the Nerdio Manager’s Azure Database status, connections, size, and resource utilization levels to detect any bottlenecks for database access. See Figure 7.

Screenshot showing monitoring the Nerdio database

Figure 7: eG Enterprise monitors database usage of Azure hosted apps such as Nerdio

Nerdio recommends scaling up to a SQL database with 100 DTUs (the default is 20 DTUs, as shown in the last screenshot) and an App Service plan of at least S3 or P2V2 when you are managing 200+ AVD session hosts. So, if you are in the zone where you suspect bottlenecks, actual DTU usage data is useful information that can save you from unnecessarily upgrading Azure dependencies.

Monitoring Nerdio Manager Database

You can use eG Enterprise to monitor Nerdio Manager’s SQL Azure database in an agentless manner. This will allow you to track slowdowns, deadlocks, top queries, sessions, waits and more to identify database bottlenecks.

There may be little you as an end-user can do to resolve such issues. However, collecting this data when troubleshooting Nerdio issues will allow you to pass the issue on to Nerdio support. Often end-users use products in slightly different ways to how the vendor has designed or tested them. We see many customers use eG Enterprise to identify inefficient SQL (especially in Java healthcare apps – but that’s one for another blog!).

Screenshot of eG Enterprise monitoring Azure SQL services

Monitoring Nerdio Manager Transactions

Nerdio Manager uses .NET technology. eG Enterprise Business Transaction Monitoring for .NET applications can be used to track transaction health, if required when troubleshooting Nerdio. Distributed transaction tracing is a technique used in eG Enterprise to identify code-level issues without code changes (i.e. suitable for third-party apps like Nerdio) – Learn more: What is Distributed Tracing? Use Cases and How it fits into APM & Observability | eG Enterprise (eginnovations.com).

Screenshot of eG Enterprise transaction tracing dashboard, visualizing the end-to-end flow of Nerdio, a .NET app

Screenshot of metrics in eG Enterprise from .NET business transaction tracing of the Nerdio app

AIOps (Artificial Intelligence for IT Operations) for EUC

This example of monitoring Nerdio demonstrates what tools designed for more than just digital workspaces and with APM capabilities offer. The powerful AIOps (Artificial Intelligence for IT Operations) engine within eG Enterprise means monitoring is auto deployed as your systems scale. Monitoring, metric thresholds and alerting are set up out of the box and the platform auto-baselines your specific deployment and applications.

Basic EUC tools usually monitor only a handful of metrics (maybe 500 at best), which limits visibility. AIOps allows eG Enterprise to handle and continually monitor hundreds of thousands of metrics, logs, and traces. As we saw above, even monitoring a simple .NET solution like Nerdio requires you to monitor all the Azure dependencies, and the number of metrics you need soon adds up.

Some information on including APM within an app-centric strategy for EUC is covered in Application-Centric EUC Monitoring is Key to Digital Employee Experience (DEX) | eG Innovations.

Final Thoughts on Nerdio Troubleshooting and Monitoring

If you came here hoping to find out how to use Nerdio to deploy eG Enterprise, you probably want to check out: Using Nerdio Manager to Deploy eG Enterprise for AVD Monitoring (eginnovations.com). Further information on eG Enterprise’s integration with Nerdio is available, here: Nerdio Manager: Simplify & Refine AVD Deployment & Operation (eginnovations.com).

A significant issue when leveraging cloud-based apps occurs when the cloud or its portal goes down. There’s some information on how you can ensure observability on Azure or other cloud outages available here: How to Protect your IT Ops from Cloud Outages (eginnovations.com).

Extend to Monitor Other SaaS/Cloud Applications

Think about all those SaaS tools and Azure-hosted apps and services you rely on. The overall methodology highlighted here applies to any cloud-hosted SaaS application. A SaaS application can be monitored in many different ways:

  1. Up/down availability,
  2. Resource footprint on the cloud,
  3. Detailed insights into transaction performance,
  4. API integration for health monitoring,
  5. Monitoring its dependencies (e.g., databases)

No more “little black-boxes” in the cloud! No more paying for resources to support SaaS that you don’t actually need! And far less arguing with SaaS vendors as to whether a problem lies within their particular little black-box!

Understanding Core Web Vitals – Key Metrics for Optimizing Your Website for Better User Experience

Introduction to Google’s Core Web Vitals Metrics

What Are Core Web Vitals?

Core Web Vitals are a set of performance metrics introduced by Google to help website owners and developers improve the user experience. These metrics are:

  • Largest Contentful Paint (LCP)
  • First Input Delay (FID)
  • Cumulative Layout Shift (CLS)
  • Interaction to Next Paint (INP) (Recently replaced FID)
“Core Web Vitals are a set of real-world, user-centered metrics that quantify key aspects of the user experience.” — Google.

Image overviewing the Core Web Vitals metrics - LCP, FID, CLS and INP

How Core Web Vitals Impact SEO and User Satisfaction

Google’s Core Web Vitals are key metrics that measure the quality of user experience on your website. These metrics focus on three primary aspects: loading performance, interactivity, and visual stability.

For B2B websites, even those that don’t rely heavily on search traffic, optimizing Core Web Vitals is still important. A faster, smoother user experience builds trust and encourages engagement with business users.

John Mueller, Webmaster Trends Analyst at Google

“Core Web Vitals are part of the experience that helps people stick around on your site, engage with your content, and ideally convert. It’s not just for SEO; it’s good for users.”

For B2C websites, where SEO is a priority, Core Web Vitals have a direct impact on your site’s visibility and Google rankings. By improving these metrics, users are more likely to find your site, stay longer, and enjoy a better overall experience.

Image showing site traffic and page load times in the eG Enterprise console to explain how metrics like Core Web Vitals need to be monitored

Figure 1: Page views often drop if page load times increase, and users become frustrated

Understanding the Key Core Web Vitals Metrics

Largest Contentful Paint (LCP) – What is LCP?

Largest Contentful Paint (LCP) measures the time it takes for the largest visible content element in the viewport to load. This could be an image, video, or a large block of text. LCP is a key measure because it reflects the loading performance of your site and directly impacts user satisfaction. A faster LCP means users can start engaging with your content sooner. Largest Contentful Paint aims to measure when the page’s main contents have finished loading.

First Input Delay (FID) – What is FID?

First Input Delay (FID) measures the time it takes for a page to respond to the first user interaction, such as clicking a button or a link. FID is essential for interactivity; a low FID ensures that users can interact with your page without frustrating delays. FID was replaced in the Core Website Vitals by INP in 2024 (see: Introducing INP to Core Web Vitals | Google Search Central Blog | Google for Developers).

Cumulative Layout Shift (CLS) – What is CLS?

Cumulative Layout Shift (CLS) measures the visual stability of your webpage by tracking how often and how much visible elements unexpectedly shift during page loading. A high CLS score can lead to a poor user experience, as users may accidentally click on the wrong element due to unexpected shifts.

Paul Irish, Developer Advocate at Google Chrome

“Largest Contentful Paint and Cumulative Layout Shift reflect the two most visible parts of the page load experience. They give you real insight into how users experience your site.”

New Metric: Interaction to Next Paint (INP) – What is INP?

What is an “interaction” in the context of INP?

When calculating INP, only the following interaction types are included:

  • Clicking with a mouse.
  • Tapping on a device with a touchscreen.
  • Pressing a key on either a physical or onscreen keyboard.

Interactions such as users hovering, zooming, or scrolling are currently not observed.

Interaction to Next Paint (INP) is a metric that replaced FID. Unlike FID, which measures the delay of the first interaction, INP evaluates the latency of all interactions throughout the lifecycle of a page. Since it is generally accepted that 90% of the time a user spends on a web page is after it has loaded, INP is intended to provide a more comprehensive picture of your site’s interactivity and responsiveness for real users.

A high INP suggests the site feels clunky or unresponsive, making it frustrating for users, while a low INP means the site responds quickly and smoothly to interactions.
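To collect INP (and the other Core Web Vitals) from real users, Google's open-source web-vitals JavaScript library is a common choice. The sketch below assumes the package is installed from npm and that /analytics is a placeholder endpoint on your own server:

```javascript
// Minimal field-measurement sketch using the "web-vitals" npm package
// (v3+ exposes onINP, onLCP and onCLS).
import { onCLS, onINP, onLCP } from 'web-vitals';

function sendToAnalytics(metric) {
  // metric.name, metric.value and metric.rating
  // ("good" | "needs-improvement" | "poor") are reported by the library.
  navigator.sendBeacon('/analytics', JSON.stringify(metric)); // placeholder endpoint
}

onINP(sendToAnalytics);
onLCP(sendToAnalytics);
onCLS(sendToAnalytics);
```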

Key Differences Between FID and INP

The key differences between First Input Delay (FID) and Interaction to Next Paint (INP) lie in what they measure and when they measure it:

1. What They Measure:

FID: Measures input delay—the time between a user’s first interaction (click, tap, etc.) and when the browser responds. It focuses on the initial delay before the interaction is processed.

INP: Measures responsiveness across all interactions on a page, not just the first one. It tracks how quickly the page visually responds after every interaction, capturing the full user experience.

2. When They Measure:

FID: Only measures the first interaction after the page starts loading. Once that first interaction is recorded, it does not track subsequent interactions.

INP: Monitors all interactions throughout the entire user session. It reflects how consistently responsive the page is over time by focusing on the interaction that took the longest to process.

What is a Rage Click?

A rage click is when a user rapidly clicks on the same spot or element on a webpage out of frustration, usually because the site isn’t responding or functioning as expected. This behavior indicates that something is broken, confusing, or too slow, leading users to repeatedly try interacting with the element in hopes of a response. Rage clicks are a sign of poor user experience and often point to issues like slow loading times, hidden content, or malfunctioning buttons or links. Monitoring rage clicks can help identify and fix these frustrations to improve usability. Rage clicking often correlates with poor INP values.

3. Scope:

FID: Limited to the first input delay after the page becomes interactive. It’s more about the initial perception of page responsiveness.

INP: Provides a broader, more comprehensive view of overall page responsiveness, tracking multiple interactions and visual updates during the user’s entire session.

4. Use Cases:

FID: Helps improve the early stages of page loading, making sure the first interaction isn’t delayed by tasks like heavy JavaScript execution.

INP: Evaluates the quality of the user’s experience throughout their time on the page, ensuring that interactions beyond the initial load are also fast and smooth.

In summary, FID is focused on the first input during page load, while INP gives a more complete picture by assessing the responsiveness of all interactions on the page.
FID can be misleading if the first input is fast but subsequent interactions are slow; because INP captures multiple interaction events, it reflects the true interactivity of a page, making it a better indicator of overall performance.

Summary of Differences between INP and FID

The primary difference between INP and FID lies in how they measure responsiveness:

  • FID measures only the delay before an interaction starts being processed (the time from when a user first interacts to when the browser can begin handling the interaction).
  • INP, however, measures the entire interaction process, focusing not just on the initial delay but also on how long it takes for the page to provide visual feedback or respond fully after any interaction. This means INP includes factors like the completion of the response to a user’s input.

What’s new about INP:

Your key takeaways should be:

  • INP measures the full duration of interaction, not just the delay before it starts (unlike FID).
  • INP captures delays that users experience throughout the interaction cycle, while FID focuses only on the beginning.

Therefore, while FID looks at when the browser is ready to respond, INP evaluates the overall user experience, making it more holistic.

Venn diagram explaining the overlap and differences of the Core Web Vitals INP and FID

Figure 2: FID is solely concerned with the initial lag before the browser can begin processing the user’s first interaction.
INP covers the entirety of each interaction, from start to finish, including all visual feedback or user-perceived delays.

What is FCP? How does FCP differ from LCP?

First Contentful Paint (FCP) measures the time from when the user first navigated to the page to when any part of the page’s content is rendered on the screen. For this metric, “content” refers to text, images (including background images), SVG elements, or non-white canvas elements. In contrast, Largest Contentful Paint (LCP) aims to measure when the page’s main contents have finished loading.

First Contentful Paint is considered an important, user-centric metric as it measures the first point in the page load timeline where the user can see anything happening on the screen. A fast FCP makes it more likely the user will stay on your website, while a slow FCP will encourage users to abandon the site.

Because FCP only captures the very start of the web page loading experience, it can be somewhat meaningless in scenarios where a page shows a splash screen or pop-up (interstitial) or displays a loading indicator.

Best Practices for Optimizing Core Web Vitals Metrics

I’ll now cover some information on how you can improve your Core Web Vitals scores and what a good score for each metric is.

“Optimizing for these factors makes the web more delightful for users across all web browsers and surfaces.” — Google.

How to optimize LCP

Here are some optimization tips for LCP:

  1. Remove unnecessary third-party scripts:
    Ads, analytics, video embeds, and widgets can drastically slow down your page. Audit and remove scripts that do not add value to the page experience.
  2. Upgrade your web host or use a CDN:
    Faster servers or a content delivery network (CDN) can significantly reduce LCP by lowering the time it takes for content to be delivered to the browser.
  3. Use lazy loading:
    Defer loading of non-critical images or media until the user scrolls to them. Do not lazy load above-the-fold content, such as hero images, as this delays LCP and hurts performance.
  4. Prioritize Image Loading:
    Ensure the LCP image loads ahead of other rendering work. Apply fetchpriority="high" so the browser fetches LCP images sooner, and preload critical resources – use link rel=preload to fetch important resources early (see the markup sketch after this list).
  5. Optimize large page elements (images, videos):
    Compress images and video files to reduce load time, and prioritize large images (>10,000px²) for faster LCP. Use modern formats like WebP for images and serve different sizes depending on the device resolution. Be aware that progressive JPEGs may delay full image loading and negatively impact LCP. Don’t use low-entropy images for LCP – exclude low-content images, such as simple backgrounds, from being LCP candidates.
  6. Minify and optimize CSS and JavaScript:
    Remove unnecessary spaces, indentations, comments, and unused code from CSS and JavaScript files. Ensure that these resources are lightweight to avoid long rendering times.
  7. Enable preloading of important resources:
    By preloading the most important CSS, fonts, or large elements, you ensure faster rendering and load time for the main content.
  8. Reuse Server Connections:
    Serve critical resources from the same domain to avoid DNS lookups and handshakes. Avoid additional server connections that slow down resource loading.
  9. Use 103 Early Hints:
    Preload resources while waiting for the server’s full response.
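As a concrete illustration of tips 4 and 7, the markup sketch below preloads a hero image and marks it as high priority; the file path is a placeholder:

```html
<!-- Minimal sketch: fetch the LCP (hero) image early and at high priority.
     /images/hero.webp is a placeholder path. -->
<head>
  <link rel="preload" as="image" href="/images/hero.webp" fetchpriority="high">
</head>
<body>
  <img src="/images/hero.webp" fetchpriority="high"
       width="1200" height="600" alt="Product hero image">
</body>
```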

An incredibly useful and detailed guide to optimizing LCP is also available from Google engineers Philip Walton and Barry Pollard, covering JavaScript- and HTML-level changes you can make to improve performance, see: Optimize Largest Contentful Paint | Articles | web.dev.

What Value of LCP should I Look to Achieve?

  • Good: ≤ 2.5 seconds
  • Needs Improvement: 2.5 – 4.0 seconds
  • Poor: > 4.0 seconds

Key Strategies to Reduce FID

Some best practices to optimize the FID metric, include:

    1. Defer or break down JavaScript:
      As JavaScript is single-threaded, any heavy scripts block the browser from responding to user interactions. Defer non-essential scripts and split large JavaScript files into smaller, more manageable chunks. Use Web Workers – offload intensive tasks (e.g., complex calculations, data processing) to Web Workers, which run in the background and free up the main thread (see the sketch after this list).
    2. Use browser caching:
      Browser caching stores reusable resources like JavaScript and CSS files locally, so subsequent page visits load faster, reducing the time to interaction.
    3. Remove non-essential third-party scripts:
      Like in LCP, unnecessary third-party scripts can delay browser response. Remove or asynchronously load non-critical scripts to free up the main thread.
    4. Optimize the order of script execution:
      Prioritize loading scripts that impact user interaction first. Non-essential scripts (like those for ads or analytics) should be deferred until the browser becomes interactive; lazy loading can be used (learn more: Three Types of Lazy Loading in JavaScript with example | by Muhammad Fauzan | Medium).
    5. Reduce the impact of large JavaScript bundles:
      Tree-shake JavaScript code and use performance-optimized libraries. Tree shaking is a methodology to remove dead code. This reduces the amount of script that needs to be parsed and executed on page load.
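To illustrate the Web Worker advice in tip 1, here is a minimal two-file sketch; worker.js is an assumed sibling file, and the summing work stands in for any heavy computation:

```javascript
// main.js – hand heavy work to a Web Worker so the main thread
// stays free to respond to user input.
const worker = new Worker('worker.js'); // assumes a sibling worker.js file
worker.onmessage = (event) => {
  console.log('Result from worker:', event.data);
};
worker.postMessage({ numbers: [1, 2, 3, 4, 5] });

// worker.js – runs off the main thread.
self.onmessage = (event) => {
  const sum = event.data.numbers.reduce((total, n) => total + n, 0);
  self.postMessage(sum);
};
```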

Note: FID (First Input Delay) has recently (Spring 2024) been replaced in the Core Web Vitals set of metrics by INP (Interaction to Next Paint) because INP provides a more comprehensive measure of responsiveness, capturing all interactions, not just the first one.

Understanding Metric Thresholds for FID

      • Good: ≤ 100 milliseconds
      • Needs Improvement: 100 – 300 milliseconds
      • Poor: > 300 milliseconds

graphic showing ideal values for the Core Web Vitals metric First Input Delay (FID)

How to Reduce CLS

If you are looking to optimize the CLS score of your web pages, some practical tips for improving CLS include:

      1. Set explicit width and height attributes for all media (images, videos, iframes):
        Assigning width and height attributes to media ensures the browser knows how much space to allocate, preventing layout shifts during page load (see the markup sketch after this list).
      2. Reserve space for dynamic content (ads, banners, etc.):
        Ad scripts and other dynamic elements can cause significant layout shifts. Set aside dedicated space in the layout for these elements to load, even before they appear.
      3. Preload key fonts and resources:
        Fonts that load late or asynchronously can cause text to shift when they finally render. Preload critical fonts and assets to stabilize layout quicker.

What is an Interstitial Web Page?

An interstitial webpage (or interstitial) is a web page displayed before or after an expected content page, often to display advertising or for regulatory reasons (e.g., GDPR), such as to confirm the user’s age (prior to showing age-restricted material) or obtain consent to store cookies. Most interstitial advertisements are delivered by an ad server.

      4. Avoid inserting content above existing content:
        If dynamic elements, like pop-ups or interstitials, are necessary, place them below the fold to avoid displacing critical content and impacting the CLS score.
      5. Optimize CSS for animations:
        Avoid abrupt, poorly planned CSS animations or placing CSS inside JavaScript, which could delay the page’s rendering. Follow best practices for clean, lightweight CSS.
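Bringing tips 1 and 3 together, the markup sketch below reserves layout space for media and preloads a web font so a late font swap doesn’t shift text; both file paths are placeholders:

```html
<!-- Minimal sketch: explicit dimensions reserve space before media loads,
     and preloading the font avoids a late, layout-shifting swap. -->
<head>
  <link rel="preload" as="font" type="font/woff2"
        href="/fonts/brand.woff2" crossorigin>
</head>
<body>
  <img src="/images/chart.png" width="800" height="450" alt="Sales chart">
  <iframe src="https://example.com/embed" width="560" height="315"
          title="Embedded video"></iframe>
</body>
```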

Google engineering also offer a detailed guide to optimizing CLS with JavaScript code and mark-up level details, see: Optimize Cumulative Layout Shift | Articles | web.dev.

Metric Thresholds for CLS

      • Good: ≤ 0.1
      • Needs Improvement: 0.1 – 0.25
      • Poor: > 0.25

graphic showing ideal values for the Core Web Vitals metric CLS (Cumulative Layout Shift)

How to Optimize INP

It is important to look at optimizations associated both with JavaScript and with rendering efficiency.

      1. Optimize long tasks. You fundamentally want to avoid blocking the main thread. Yield to the main thread so critical, user-facing tasks can run. Investigate prioritizing tasks with postTask() and evaluate the use of scheduler.yield() (see the yielding sketch after this list).
      2. Optimize input delay. Input delay is the period of time beginning from when the user first interacts with a page—such as tapping on a screen, clicking with a mouse, or pressing a key—up to when the event callbacks for the interaction begin to run. Every interaction begins with some amount of input delay. If input delay is contributing significantly to INP times, there are optimization options available.
      3. Script evaluation and long tasks. Look to implement the advice given by Google engineering, see: Script evaluation and long tasks | Articles | web.dev.
      4. Use web workers to run JavaScript off the browser’s main thread. As with FID this can yield significant benefits.
      5. Use content-visibility to defer rendering of off-screen elements for improved page responsiveness.
      6. Avoid large, complex layouts and layout thrashing. Layout thrashing happens when you request layout information of an element or the document while layout is in an invalidated state. Layout costs are dependent on the number of elements that require layout, which is a byproduct of the page’s DOM size and the complexity of those layouts. Group multiple DOM changes using document fragments to reduce re-renders.
      7. Preload critical resources. Ensure fonts, images, and other critical elements are loaded early to avoid layout shifts.
      8. Don’t break up above-the-fold CSS. Avoid lazy loading or splitting critical CSS, as it causes layout shifts.
      9. Reserve space for lazy-loaded images: Avoid layout shifts by reserving space for images before they load.
      10. Don’t add excessive ads. Too many ads can affect page speed, especially on mobile, impacting INP and overall performance.
      11. Reduce the scope and complexity of style calculations. Simplifying CSS selectors can help speed up your page’s style calculations. Warning: older browsers can be quite poor at handling complex selectors, whilst many newer browsers have optimizations. If you are delivering web pages to the general public, who may well be using older browsers, you will probably want to ensure your QA reflects this.
      12. Minimize DOM sizes. Large DOM sizes affect interactivity (I’ve added a section below that explains the DOM and its relationship to website performance). Always look to minimize the size of the DOM.
      13. Investigate client-side rendering of HTML when troubleshooting interactivity issues. If your website depends heavily on client-side rendering and you have observed poor INP values in your field data, techniques such as providing as much HTML from the server as possible and limiting the number of DOM nodes created on the client can help.
      14. Keep an eye on upcoming changes. Stay updated on Google’s guidelines; as INP becomes a more prominent metric, it is likely the optimization advice will change.
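As a concrete sketch of tip 1, the helper below yields back to the main thread between chunks of work so pending input events can be handled promptly. scheduler.yield() is a Chromium API at the time of writing, so the sketch falls back to setTimeout elsewhere; renderItem() is a hypothetical stand-in for per-item work:

```javascript
// Yield control to the main thread between chunks of work so the page
// stays responsive (improves INP).
function yieldToMain() {
  if ('scheduler' in window && 'yield' in window.scheduler) {
    return window.scheduler.yield(); // Chromium-only at the time of writing
  }
  return new Promise((resolve) => setTimeout(resolve, 0)); // portable fallback
}

async function processItems(items) {
  for (const item of items) {
    renderItem(item);    // hypothetical per-item work
    await yieldToMain(); // let pending input events run between items
  }
}
```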

Again, Google engineering have a comprehensive guide for optimizing and troubleshooting INP metrics, see: Optimize Interaction to Next Paint | Articles | web.dev.

Target Values for the INP metric

      • Good: ≤ 200 milliseconds
      • Needs Improvement: 200 – 500 milliseconds
      • Poor: > 500 milliseconds

graphic showing ideal values for the Core Web Vitals metric INP (Interaction to Next Paint)

eG Enterprise dashboard monitoring the Core Web Vitals - LCP, INP and CLS together

Figure 3: Continually monitoring the Core Web Vitals metrics via dashboards with a tool such as eG Enterprise gives administrators useful insights

Figure 4: Core Web Vitals can be used to pinpoint individual web pages that would benefit from optimizations (screenshot from eG Enterprise’s monitoring console)

Understanding the DOM and Its Role in Core Web Vitals

The Document Object Model (DOM) is a fundamental concept when discussing Core Web Vitals. The DOM is a programming interface that represents the structure of a webpage as a tree of objects, with each object corresponding to a part of the document, such as elements, attributes, and text.

Why the DOM Matters

      • Performance: The time it takes to parse the HTML and build the DOM directly impacts metrics including LCP and FCP (First Contentful Paint). A large, complex DOM can slow down the rendering process.
      • Interactivity: The responsiveness of the DOM, which includes how quickly it can respond to user input, affects metrics like FID and INP. A bloated DOM can lead to delays in interaction handling.
      • Visual Stability: Changes to the DOM can cause unexpected shifts in content, impacting the CLS score.

Optimizing the DOM

Some tips for optimizing the DOM to improve web page performance include:

      • Keep the DOM Lean: Avoid unnecessary elements and reduce the depth of the DOM tree.
      • Efficiently Manage DOM Updates: Batch updates to reduce reflows and repaints (see the sketch after this list).
      • Use Virtual DOMs: Frameworks like React use a virtual DOM to minimize direct manipulations, improving performance.
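As a minimal sketch of batching DOM updates, the snippet below builds list items in a DocumentFragment and appends them in one operation, triggering a single reflow; the #list selector is a placeholder for an existing list element on the page:

```javascript
// Batch DOM insertions in a DocumentFragment so the browser performs
// one reflow instead of one per appended node.
const fragment = document.createDocumentFragment();
for (const label of ['One', 'Two', 'Three']) {
  const item = document.createElement('li');
  item.textContent = label;
  fragment.appendChild(item);
}
document.querySelector('#list').appendChild(fragment); // placeholder <ul id="list">
```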

You can learn more about the history and specification details of the DOM in Document Object Model – Wikipedia, What is the DOM? and Introduction to the DOM – Web APIs | MDN (mozilla.org).

Next Steps – Advanced Analysis and Optimizations for Core Web Vitals

Barry Pollard, Web Performance Expert and Author

“Measuring and improving Core Web Vitals is not a one-time effort. It should be part of a continuous process to ensure that your site consistently delivers a great user experience.”

Understanding Core Web Vitals at a granular level is crucial, and that’s where multi-dimensional analysis comes in. By slicing and dicing data across various dimensions—such as browser type, device, or geographic location—you can uncover hidden performance issues that may only affect specific user segments. This approach allows for more targeted optimizations, ensuring a seamless experience for all users, no matter their setup or location.

In the next blog, we will dive deeper into how multi-dimensional analysis can help you fine-tune your Core Web Vitals performance across different user environments.

Conclusions on Core Web Vitals

Continually monitoring Core Web Vitals is essential because they directly impact user experience and SEO rankings. Regular tracking helps identify performance issues that can degrade page load speed, interactivity, and visual stability. As user behavior, site content, and browser technologies evolve, ongoing monitoring ensures sites remain optimized, competitive, and compliant with Google’s search algorithms. It also helps you quickly detect and root cause issues so you can resolve them and address regressions or bottlenecks that could lead to higher bounce rates, lower engagement, and reduced conversions.

Addy Osmani, Engineering Manager at Google

“Core Web Vitals are the best proxy we’ve ever had for user experience performance. It’s not about optimizing for scores but about ensuring users feel your site is fast and responsive.”

However, the Core Web Vitals metrics often only act as a canary in a coal mine, i.e. as a symptom of an issue with the platform, hardware, networking and so on, that is serving up the web page. I’ll be covering more on this (in future articles) and how to ensure slow web page tickets are diverted to the infrastructure or database teams if the problem actually lies in their remit.

The post Understanding Core Web Vitals – Key Metrics for Optimizing Your Website for Better User Experience appeared first on eG Innovations.

]]>
Top Reasons to Move to AVD (Azure Virtual Desktop) https://www.eginnovations.com/blog/top-reasons-to-move-to-avd-azure-virtual-desktop/ Thu, 03 Oct 2024 06:48:06 +0000 https://www.eginnovations.com/blog/?p=35279 There are a lot of articles along the lines of – “Top Reasons to Move to AVD” or “Top Benefits of AVD”. Having read a fair few, I suspect the Chat-GPT fairies have been hard at work! A veritable mush of phrases such as “flexibility”, “security”, “user experience” and so. Let’s dissect the question – […]

The post Top Reasons to Move to AVD (Azure Virtual Desktop) appeared first on eG Innovations.

]]>

There are a lot of articles along the lines of – “Top Reasons to Move to AVD” or “Top Benefits of AVD”. Having read a fair few, I suspect the Chat-GPT fairies have been hard at work! A veritable mush of phrases such as “flexibility”, “security”, “user experience” and so on.

Let’s dissect the question – “What are the reasons to move to AVD?”. Having read a number of these stock-type articles it’s clear that the answers are answers to a wider range of questions, such as:

  • What are the benefits of digital workspace solutions?
  • What are the benefits of a virtualized digital workspace solution?
  • What are the benefits of a cloud virtualized solution vs an on-prem VDI solution?
  • If you are going for a hyperscale vendor’s Cloud DaaS solution – why move to AVD vs AWS?
  • And if you’ve set your heart on an Azure digital workspace solution, why choose AVD vs. Windows 365 Cloud PC?

What are the benefits of digital workspace solutions?

Digital workspaces are frequently marketed with slogans such as “Work anywhere, anytime on any device”.

Image of a woman using a laptop on a beach to represent the work anywhere benefit often cited as a reason to move to AVD

The “work anywhere, anytime on any device” messaging has long been a mainstay of digital workspace marketeers…

Primarily, digital workspaces provide users (usually employees) with the flexibility to work from anywhere. If you have an internet connection, you can log in and tackle your tasks from whatever location you choose. This level of flexibility often translates into a healthier work-life balance and better employee experience.

Digital workspaces can free many organizations from the constraints of physical sites and buildings. This can lead to cost savings and expansion possibilities. For example, many universities use digital workspaces to deliver practical lab classes using specialist software such as AutoCAD remotely with students accessing licensed software via their own personal devices from satellite campuses or student dormitories.

There are of course many ways to achieve this utopia of “anywhere, anytime, any device”, and they don’t have to include DaaS, VDI, cloud or virtualization. SaaS-based solutions without DaaS are common (Microsoft 365 is a popular option). There are also bare-metal/physical solutions to facilitate remote working or application access – Moonshot (basically a rack of laptops in a datacenter) and the “GoToMyPC”-type options (the use case whereby you log into a workstation sat under your office desk). Pre-COVID, entire organizations based remote access around VPNs – less so now.

Most alternatives to virtualized VDI/DaaS simply aren’t very good, particularly for larger enterprises. So, some of the reasons listed for moving to AVD are really arguments that you need the benefits of virtualization over the alternatives – benefits that AVD of course offers, but that certainly aren’t unique to AVD.

What are the benefits of a virtualized digital workspace solution?

For a long time, digital workspaces essentially meant Citrix or VMware on-prem based around hypervisor virtualization on XenServer or ESXi (later Hyper-V and Nutanix AHV gained some traction). Virtualization platforms using hypervisors offered (and still do offer) several benefits, some of which are somewhat over-sold. The benefits widely suggested in those stock articles included:

  • Resource Efficiency (Consolidation): Virtualization allows multiple virtual machines (VMs) to run on a single physical server, optimizing hardware usage and reducing costs. Virtualization also allows organizations to repurpose older devices and extend their lifespan.
  • Scalability: Hypervisors enable the easy scaling of resources, allowing businesses to add or reduce computing capacity as needed.
  • Simplified Management: Virtual machines can be centrally managed, making it easier to monitor, update, and maintain systems.
  • Isolation and Security: Each VM is isolated, reducing the risk of one VM affecting others and improving security. In a virtualized environment, you control what apps are accessed, the user may be remote (perhaps an outsourced contractor), your data stays in your data center and the contractor has limited rights. You can restrict them from downloading files etc.
  • Disaster Recovery: Virtualization simplifies backups, migrations, and failover strategies, enhancing business continuity. This is a genuinely strong argument for virtualization and cloud usage.
  • Cost Savings: By reducing the need for physical hardware and improving resource utilization, virtualization lowers infrastructure and maintenance costs. This is probably one benefit of virtualization that has always been over-sold – in practice, many virtualization projects result in no cost-savings once VDI/hypervisor licensing is included but are implemented for other benefits.

Again – these are all benefits you get with a move to AVD above using physical PCs/Desktops because AVD is based on Azure’s virtualized cloud platform built around Hyper-V, but they aren’t really arguments for using AVD per se.

What are the benefits of a cloud virtualized solution vs an on-prem VDI solution?

A cloud virtualized solution can offer certain advantages over an on-premises VDI (Virtual Desktop Infrastructure) solution, benefits frequently mentioned include:

  • Scalability & Time to Deploy: Cloud solutions can scale resources up or down quickly based on demand, while on-prem VDI requires purchasing and maintaining additional hardware.
  • Cost Efficiency: Cloud-based DaaS reduces upfront capital expenses, whereas on-prem VDI involves significant investment in hardware and infrastructure. Many articles I read mooting reasons “to move to AVD” mentioned that cloud virtualized solutions typically operate on a pay-as-you-go model, citing that you only pay for what you use and can scale up and down dynamically to minimize your costs. Ahh! If only it were so easy – in practice, to get the best price and to guarantee access to Azure’s (large but still finite) resources, many customers find themselves using “On-Demand Capacity Reservations” within Azure and “Reserved Instances” to reduce cost and guarantee availability.
  • Access to Windows Multi-session: Microsoft don’t allow customers to run Windows Enterprise multi-session in production environments outside of the Azure Virtual Desktop service. Only Microsoft or the Azure Virtual Desktop Approved Providers, Citrix and VMware, can provide access to the Azure Virtual Desktop service. It’s against the licensing agreement to run Windows multi-session outside of the Azure Virtual Desktop service for production purposes. So basically, if you want this you have to move to some genre of AVD as it isn’t available on-premises or in other clouds.
  • Maintenance: In the cloud, the service provider handles infrastructure maintenance and upgrades, while on-prem VDI requires dedicated IT staff for upkeep and management.
  • Accessibility: Cloud solutions provide easier access from any location or device, ensuring better support for remote or distributed workforces, while on-prem VDI often requires complex VPN or Gateway setups for off-site users.
  • Disaster Recovery: Cloud platforms generally offer built-in backup and disaster recovery features, while on-prem VDI may require additional investment in disaster recovery infrastructure.
  • Security: Cloud providers offer advanced security measures like encryption, continuous monitoring, and compliance with industry standards, which may be harder to implement and maintain in an on-prem VDI setup.
  • Rapid Deployment: Cloud environments can be deployed and configured much faster than setting up an on-premises VDI, which involves purchasing, installing, and configuring hardware.

So generally, lots of good stuff but nothing you couldn’t achieve via numerous other alternatives to AVD. In fact, there’s a lot more choice than ever beyond the cloud offerings of the traditional names such as Microsoft, Citrix and Omnissa (was VMware) – with a whole new generation of VDI/DaaS vendors—Apporto, Dizzion-Frame, Cameyo, Workspot, and Sonet.io. There’s never been so much choice for Cloud VDI or DaaS.

We’ve some helpful information for those looking to move from on-prem solutions to cloud, see: White Paper | Top 10 Requirements for Performance Monitoring of Cloud Applications and Infrastructures (eginnovations.com).

If you are going for a hyperscaler vendor’s Cloud DaaS solution – why move to AVD vs AWS?

For me this is the crux of any serious evaluation of the benefits of a move to AVD. What can you get only with AVD – pure differentiating features of AVD, or characteristics of your organization that make Microsoft and Azure an optimal choice?

AWS WorkSpaces is really the only comparable technology and platform at the moment. Google isn’t really playing in the EUC VDI-replacement market, and the other alternatives – and the vendors offering them – differ significantly.

Let’s consider the choice between Azure Virtual Desktop (AVD) and Amazon WorkSpaces, here are key reasons we often hear for opting for AVD:

  • Integration with Microsoft Ecosystem: Many organizations describe themselves as a “Microsoft Shop”. AVD is deeply integrated with Microsoft’s suite of products such as Entra ID (was Azure Active Directory), Microsoft 365, and Teams, making it an attractive choice for businesses already using these services. Beyond this, organizations developing products for Windows are already reliant on Microsoft products or Azure services – AVD integrates smoothly with Azure DevOps for CI/CD pipelines and project management. AVD works well with Azure SQL databases or SQL Server hosted on Azure, making it easier to manage databases and applications with lower latency, especially if they’re already part of your cloud architecture. AVD is also optimized for Microsoft Visual Studio, providing developers with a familiar and high-performance development environment for building and debugging applications remotely. Those looking at AVD for development teams will probably want to explore options with: Microsoft Dev Box – Dev Workstation in the Cloud | Microsoft Azure.
  • Integration with Microsoft AI Services: I’ve separated this one although it really is just a detail of the previous point, simply because it is of so much interest. Microsoft is one of a few big vendors with significant AI services and for those looking to implement applications incorporating AI, Azure is a safe bet. Joseph Landes, CRO at Nerdio has written in detail on this driving factor – see: Nerdio CRO Landes: AI ‘Another Reason That Customers Should Bet On The Cloud, Bet On Azure’ (crn.com).
  • Licensing Flexibility: This has historically been a huge issue. AVD as a Microsoft product was able to provide more cost-effective licensing, particularly for Microsoft 365 and Windows users. Pressure from consumers, legal challenges and governmental competitive trading regulation have changed the situation somewhat particularly with respect to AWS – see: Microsoft reaches deal with European cloud players over privileging Azure – Techzine Global. In particular, the very recent announcements around Office 365 / Microsoft 365 on Amazon WorkSpaces have levelled the playing field somewhat (more details in: Amazon WorkSpaces finally supports Office 365, but why now? | TechTarget). Microsoft do not allow customers to run Windows Enterprise multi-session in production environments outside of the Azure Virtual Desktop service. Only Microsoft or the Azure Virtual Desktop Approved Providers, Citrix and VMware, can provide access to the Azure Virtual Desktop service.
  • Performance and Latency: For organizations with significant infrastructure already in Azure, AVD may offer lower latency and better performance due to proximity to Azure services and reduced network hops. We see lots of organizations choosing Amazon WorkSpaces for similar reasons, i.e. because they already have huge datastores in AWS.
  • Datacenter Location: Beyond the latency benefits of choosing your nearest datacenter location (and if that happens to be Azure – it might influence you to choose AVD) – many countries and industries have regulations and data compliance laws that mean data must stay within a certain geographic locality – if Azure happens to have a datacenter where an alternative doesn’t Azure may become the natural choice. In practice, many end up choosing alternative DaaS/VDI options from local MSPs (Managed Service Providers) and so on because AVD is available only in certain countries.
  • Windows Experience and Optimizations: AVD provides an optimal native Windows 10/11 experience. As the Microsoft DaaS solution, it is likely that AVD will always have some advantages and be first to offer optimizations for other MS solutions such as Teams. AVD also offers multi-session Windows desktops – an option only available on Azure and not available on-premises.
  • Security and Compliance: AVD leverages Azure’s extensive security and compliance certifications and services, which can be a significant benefit for industries with strict regulatory requirements. Microsoft’s reputation and past track record mean it is considered in many geographies as a rock-solid choice security wise. AWS’s reputation and brand is probably comparably well-regarded, other vendors possibly less so. If organizations are already using Microsoft for on-premises authentication e.g. Active Directory – AVD and Azure offers a ready-to-go hybrid option with Entra ID and the thing that was called Azure AD Connect (now Microsoft Entra Connect – Microsoft Entra Connect and Microsoft Entra Connect Health installation roadmap. – Microsoft Entra ID | Microsoft Learn).
  • Ease-of-Setup: The Azure portal is pretty friendly, particularly for SMEs (Small and Medium Enterprises), if you are a traditional business with office staff needing to spin up 5 new desktops for your summer interns – it’s very easy to do so. There’s a bit of basic monitoring available. Although manual, at small scale the skills barrier to entry is extremely low – any IT team without cloud experience could adopt it. The barrier to entry for AWS WorkSpaces is somewhat higher and does require a bit of AWS knowledge. There is also a strong ecosystem of third-party GUI-led tools around AVD – Nerdio for Management, eG Enterprise for monitoring and so on, that overcomes some of the limitations of the Azure native tooling. For organizations not looking to go down the Terraform or DiY (Do-it-Yourself) paths, this can make AVD an appealing choice.

And if you’ve set your heart on an Azure digital workspace solution, why choose to move to AVD vs. Windows 365 Cloud PC?

Thankfully this question has been very thoroughly covered elsewhere, many times. There’s a great comprehensive article from Nerdio on this choice, see: Windows 365 vs. Azure (Windows) Virtual Desktop | Nerdio (getnerdio.com). The good folks at GO-EUC have done some impressively thorough performance comparison and in their latest article have done some very detailed cost comparisons – see: Unveiling the True Cost: Single-User Microsoft Azure Virtual Desktop vs. Windows 365 | GO-EUC. Update Nov 24: A few more figures published on LinkedIn from a session from Ruben Spruijt and Dr. Benny Tritsch – see: https://www.linkedin.com/posts/harjinder-t-187a042_eucforum-windows365-euc-activity-7268220614782468096-ldhN, comments on the post make an interesting read too.

Plus, a recent video from another Nerdio staffer, see: https://youtu.be/7hR2P7sDqQ8. The video basically sums up the comparison as:

Windows 365 Cloud PC:

  • Subscription Based: Pay a flat monthly fee.
  • Simplified Management: Managed through Intune, ideal for existing Intune environments.
  • Ideal Use Case: Organizations needing easy, streamlined deployment with minimal admin overhead.

Note: There are some nuances on licensing and scalability – covered in: Monitoring Windows 365 Cloud | eG Innovations.

Azure Virtual Desktop (AVD):

  • Consumption Based: Pay for what you use – albeit with the caveat that many end up in the weeds of “On-Demand Capacity Reservations” within Azure and “Reserved Instances” to reduce cost and guarantee availability.
  • Customizable and Enterprise Features: Suitable for companies needing enterprise features found in on-prem VDI – such as GPUs, remote apps, and tailored setups.
  • Best for: Organizations with specific needs or looking for specific enterprise features typically found in on-prem VDI.

Image of CAD software users collaborating - GPU enabled VM availability is often cited as a reason to move to AVD vs other non-GPU enabled options

Some CAD / AEC software demands a GPU and these power users need GPU-enabled desktops, such as those offered by AVD, above lighter weight alternatives

In practice many organizations leverage both Microsoft offerings in parallel and the availability of Cloud PCs may be an additional benefit of choosing AVD.

Conclusions on whether you should move to AVD

“It depends” – I hate that phrase! But with so many next-generation alternatives, there is now a lot of choice; it just depends on which benefits you actually need and what you can afford.

The post Top Reasons to Move to AVD (Azure Virtual Desktop) appeared first on eG Innovations.

]]>
Cloud Observability vs Monitoring: A Practical Guide to Go Beyond Cloud-Native Tools https://www.eginnovations.com/blog/cloud-observability-vs-monitoring-a-practical-guide-to-go-beyond-cloud-native-tools/ Thu, 26 Sep 2024 12:37:21 +0000 https://www.eginnovations.com/blog/?p=35277 As organizations move their application workloads to the cloud, understanding the difference between cloud observability vs monitoring is crucial to ensure optimal performance and seamless operations. While both concepts are often mentioned in tandem, they serve different purposes, and mastering each can help organizations thrive in increasingly complex cloud environments. In our new free eBook, […]

The post Cloud Observability vs Monitoring: A Practical Guide to Go Beyond Cloud-Native Tools appeared first on eG Innovations.

]]>

As organizations move their application workloads to the cloud, understanding the difference between cloud observability vs monitoring is crucial to ensure optimal performance and seamless operations. While both concepts are often mentioned in tandem, they serve different purposes, and mastering each can help organizations thrive in increasingly complex cloud environments.

In our new free eBook, “How to Achieve Full Observability in the Cloud: Nine Practical Steps to Go Beyond Cloud-Native Monitoring”, we cover the nuances of these two concepts and explain with practical steps how to achieve full observability – the key to success in any cloud migration project.

Cloud Monitoring: The First Step

Monitoring is the foundation of any robust cloud strategy. It involves tracking metrics, logs, and trace data from various layers of your cloud infrastructure and the applications running on it. For example, monitoring can provide you with insights into the performance of CPU usage, memory consumption, or response times of critical applications.

While cloud monitoring is essential, it has its limitations. Monitoring only tells you what is happening. It does not explain why something is happening or provide visibility into the deeper layers of your cloud architecture to pinpoint root causes.

That’s where cloud observability comes in.

Cloud Observability: The Next Evolution

The difference between cloud observability vs monitoring lies in the depth of insight. Observability goes beyond simple metric tracking. It is about having complete visibility into the internal state of your cloud systems by observing and understanding the complex interactions within your services.

Achieving full observability means being able to:

  • Monitor every layer and every tier of your service delivery chain
  • Obtain cloud-specific insights into different environments like AWS (Amazon Web Services), Azure, and Alibaba Cloud
  • Gain macro views of service topology for quick problem demarcation
  • Access deep application visibility to spot anomalies in real-time
  • Leverage AIOps (Artificial Intelligence for IT Operations) to automate root-cause diagnostics

With observability, you gain the ability to troubleshoot more effectively, optimize performance, and reduce downtime. It helps you answer critical questions like why a service is underperforming, where a bottleneck is occurring, or how a failure can be prevented in the future.

The Importance of Full Observability in Cloud Migration

For businesses undergoing cloud migration, monitoring alone is not enough. Observability becomes essential as it provides a 360-degree view of the entire cloud environment. Our eBook explains how incorporating key observability benefits like resource right-sizing, cost optimization, and outage visibility can make a significant impact on the success of cloud migration projects.

By leveraging observability, organizations can:

  • Proactively detect and resolve issues before they impact end users
  • Right-size resources for optimal performance and cost-efficiency
  • Maintain compliance with robust tracing and auditing capabilities

If you are ready to take your cloud strategy to the next level, this new eBook – “How to Achieve Full Observability in the Cloud: Nine Practical Steps to Go Beyond Cloud-Native Monitoring” is a must-read. Whether you are just beginning your cloud migration journey or looking to optimize your current cloud infrastructure, this guide will help you navigate the complexities of cloud observability vs monitoring and set you up for long-term success.

To learn more about eG Innovations solutions for cloud, hybrid-cloud and multi-cloud – please visit: Public, Private, And Hybrid Cloud Monitoring Tools.

The post Cloud Observability vs Monitoring: A Practical Guide to Go Beyond Cloud-Native Tools appeared first on eG Innovations.

]]>
Exception Monitoring in Java – A Guide to Handling Java Exceptions https://www.eginnovations.com/blog/exception-monitoring-in-java-a-guide-to-handling-java-exceptions/ Mon, 23 Sep 2024 13:07:10 +0000 https://www.eginnovations.com/blog/?p=35153 Why is Exception Monitoring in Java Important? Exception monitoring in Java plays a vital role in Java application performance monitoring by providing real-time insights into the health and stability of the application. Java is now the backbone of many critical and complex business applications in sectors such as banking, healthcare, finance, retail, and e-commerce. The […]

The post Exception Monitoring in Java – A Guide to Handling Java Exceptions appeared first on eG Innovations.

]]>

Why is Exception Monitoring in Java Important?

Exception monitoring in Java plays a vital role in Java application performance monitoring by providing real-time insights into the health and stability of the application. Java is now the backbone of many critical and complex business applications in sectors such as banking, healthcare, finance, retail, and e-commerce. The complexity of these systems is also compounded by the fact that they involve distributed Java microservices that communicate across various layers. When an exception occurs in one service, it can cascade to other upstream services. This can potentially cause widespread disruptions affecting business and user experience.

Minor issues can rapidly escalate and propagate quickly in the absence of full observability and proactive monitoring.

Key Reasons Why Exception Monitoring in Java is Essential

Monitoring exceptions in Java is essential for several key reasons:

  • Proactive Issue Detection: Exception monitoring enables developers to identify issues before they become major problems. By capturing and analyzing exception data, developers can trace performance bottlenecks to their root causes, allowing for early detection and proactive prevention of errors.
  • Performance Penalty: Exceptions in Java are expensive and shouldn’t be used for flow control. When an exception is thrown, it interrupts the JVM’s optimized execution, leading to performance issues. The JVM has to stop optimizations, handle the exception, and deal with memory cleanup, which slows down the system.
  • Root Cause Analysis: Java exception monitoring helps identify the root causes of errors, whether from code defects, configuration issues, or environmental factors.
  • User Experience: Frequent and unresolved exceptions can lead to poor application performance, affecting user satisfaction. Monitoring ensures that exceptions are quickly identified and addressed to maintain smooth performance and improve the user experience.
  • Trend Analysis: By tracking exceptions over time, you can identify recurring issues or bottlenecks in your applications and take corrective actions.

What is a Java Exception?

In Java, an exception is an event that disrupts the normal flow of a program’s execution. This event occurs when the JVM (Java Virtual Machine) detects an error or an unexpected condition, such as invalid data or system issues. Java exceptions are handled through a structured error-handling mechanism.

Common Causes of Java Exceptions:

Common reasons that exceptions are thrown within Java code include:

A graphic summarizing common reasons that exceptions are thrown within Java code: network drops, invalid input, non-existent files, code-level bugs, resource exhaustion, database issues, and environmental factors.

Understanding the common causes of exceptions is crucial for effectively handling exceptions in Java applications. Here are some common issues that can lead to exceptions.

1. Network Drops

Connectivity issues can disrupt I/O operations, leading to exceptions when the application attempts to read from or write to a network resource. This can occur due to:

  • Unstable Internet Connections: Fluctuations in network connectivity can cause timeouts or interruptions in data transmission.
  • Server Downtime: If the server is temporarily unavailable, attempts to connect can throw exceptions like SocketException.
  • Firewall Restrictions: Firewalls may block certain ports or protocols, preventing access to required resources.

2. Invalid Input

Exceptions can arise from user or system-provided data that doesn’t meet expected formats or validation criteria. Common scenarios include:

  • Type Mismatches: Entering a string where a number is expected can trigger a NumberFormatException.
  • Out-of-Range Values: Providing values outside the acceptable range, such as negative numbers for age, can lead to IllegalArgumentException.
  • Malformed Data: Input that does not conform to expected patterns, such as an improperly formatted email address, can cause validation exceptions.

3. Non-existent Files

File operations that attempt to access files that are missing or have been moved can result in exceptions such as:

  • FileNotFoundException: This occurs when the application tries to read a file that does not exist at the specified path.
  • IOException: General input/output errors can occur if there are issues reading from or writing to a file, such as lack of permissions or disk space.

4. Code-level Bugs

Logical or syntactical errors in the code can lead to exceptions during execution. These bugs can manifest in various ways:

  • Null Pointer Exceptions: Attempting to access methods or properties on a null object can throw a NullPointerException (see the sketch after this list).
  • Index Out of Bounds: Accessing an array or list with an invalid index can result in an ArrayIndexOutOfBoundsException.
  • Infinite Loops: Logical errors that create infinite loops can eventually lead to a StackOverflowError due to excessive recursion.
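As a minimal illustration (not taken from any particular application), the sketch below shows how a null argument would trigger a NullPointerException on name.trim(), and how Optional can guard against it:

```java
import java.util.Optional;

public class GreetingService {
    // Optional.ofNullable guards against a null argument instead of
    // letting name.trim() throw a NullPointerException.
    public String greet(String name) {
        return Optional.ofNullable(name)
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(s -> "Hello, " + s + "!")
                .orElse("Hello, guest!");
    }
}
```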

5. Resource Exhaustion

Applications can run into exceptions when system resources are depleted, such as:

  • OutOfMemoryError: This occurs when the Java Virtual Machine (JVM) cannot allocate an object because it has run out of memory. We’ve got some more information on troubleshooting “the dreaded OutOfMemoryError” available, here: Java Application Monitoring – How IT Ops can Diagnose Memory Leaks at Scale.
  • File Descriptor Limits: Exceeding the maximum number of open file descriptors can lead to IOException when trying to open new files.

6. Database Issues

Interacting with databases can introduce exceptions due to:

  • Connection Failures: Issues connecting to the database, such as incorrect credentials or unreachable servers, can throw SQLException.
  • Query Errors: Syntax errors in SQL queries or violations of database constraints (like unique constraints) can also result in exceptions.

In large Java applications, even small errors can spread through the system, causing slow performance and bad user experiences. Without good monitoring, these problems can go unnoticed until they cause major disruptions. Exception monitoring with transaction tracing helps teams catch and fix errors early, stopping issues before they affect the application’s stability and users.

Figure 1: Visualization of Java Application Transaction Execution Using Transaction Tracing
Note how the observability tool clearly pinpoints the source of an error, which has a cascading effect on a particular user. SREs and AppOps can quickly diagnose and resolve the root cause before it impacts other users.

Simply identifying the occurrence of an exception isn’t enough—knowing the precise line of code where the issue originated allows teams to quickly address the root cause. This level of granularity is key to maintaining performance, minimizing downtime, and enhancing the user experience.

Figure 2: Code-Level Precision in Exception Monitoring in Java

SREs and IT Ops need an observability tool that identifies the exact line of code where an exception occurred. This provides them with the detailed insights needed for quick and effective issue resolution.

7. Environmental Factors

Sometimes, exceptions are caused by external factors that are outside the application’s control:

  • Configuration Issues: Incorrect or missing configuration files can lead to ConfigurationException or similar errors.
  • Dependency Failures: If an external service or library that your application depends on fails or changes unexpectedly, it can lead to runtime exceptions.

Java Exceptions vs Errors – What is the Difference?

In Java, both exceptions and errors are abnormal conditions that disrupt the normal flow of a program. However, they differ in their causes, how they’re handled, and their impact on the application:

  • Exceptions: These are conditions that a program can handle, such as file not found, invalid input, or database connection issues. They are typically recoverable by applying appropriate handling strategies.
  • Errors: Errors represent more serious problems, usually outside the program’s control, such as OutOfMemoryError. These are generally unrecoverable and signify critical issues in the JVM (Java Virtual Machine) environment.

Examples of JVM Errors:

Two very common JVM errors, which you will invariably encounter at some point, are:

  • OutOfMemoryError: Thrown when the Java Virtual Machine can no longer allocate an object because it has run out of memory.
  • StackOverflowError: Occurs when a method recurses too deeply.

The main difference is that exceptions can be anticipated and potentially fixed by the application, while errors usually can’t be recovered from and signal deeper issues with the system or JVM.

The Hierarchy of Java Exceptions

Graphic showing Exception handling as an umbrella for checked exceptions, unchecked exceptions and errors

Java exceptions are structured in a hierarchy, starting from the Throwable class:

  • Throwable
    • Error (Unrecoverable issues, e.g., JVM crashes)
    • Exception
      • Checked Exceptions: Must be caught or declared (e.g., IOException).
      • Unchecked Exceptions: Also known as Runtime Exceptions, they don’t need to be declared or caught (e.g., NullPointerException, ArrayIndexOutOfBoundsException)
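The hypothetical sketch below contrasts the two branches of the hierarchy: the compiler forces callers of readFirstChar() to handle or declare the checked IOException, while half() can throw an unchecked IllegalArgumentException with no declaration at all:

```java
import java.io.FileReader;
import java.io.IOException;

public class ExceptionKinds {
    // Checked: callers must catch IOException or declare it themselves.
    static String readFirstChar(String path) throws IOException {
        try (FileReader reader = new FileReader(path)) {
            return String.valueOf((char) reader.read());
        }
    }

    // Unchecked: RuntimeException subclasses need no throws declaration;
    // this throws IllegalArgumentException at runtime for bad input.
    static int half(int value) {
        if (value % 2 != 0) {
            throw new IllegalArgumentException("value must be even: " + value);
        }
        return value / 2;
    }
}
```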

Unchecked Exceptions in Java

Unchecked exceptions in Java typically indicate programming errors or flaws in the logic of the application. They occur due to conditions like:

1. NullPointerException

  • Description: This exception is thrown when an application attempts to call methods on a null object reference.
  • Example: NullPointerException can occur when trying to call a method on an object that has not been initialized or has been set to null.

2. ArrayIndexOutOfBoundsException

  • Description: This exception is thrown when an application attempts to access an array with an index that is out of range.
  • Example: ArrayIndexOutOfBoundsException can occur when trying to access an array element with an index that is less than 0 or greater than or equal to the length of the array.

3. IllegalArgumentException

  • Description: This exception is thrown when an application passes an argument to a method that does not meet the expected criteria.
  • Example: IllegalArgumentException can occur when passing invalid or inappropriate arguments to a method, such as passing a negative number where a positive number is expected.

Characteristics of Unchecked Java Exceptions

Key features of unchecked Java exceptions include:

  • No Forced Handling: Unlike checked exceptions, unchecked exceptions do not require explicit handling by the developer. The JVM does not enforce the declaration of these exceptions in method signatures.
  • Deep-Rooted Problems: These kinds of exceptions often signal deep-rooted problems in the code. They can be harder to catch because they arise from fundamental flaws in the application’s logic.
  • Runtime Errors: Unchecked exceptions are runtime errors, meaning they occur during the execution of the program, rather than at the time of compilation.

Exception Handling in Java

Exception handling in Java revolves around four key components:

  • Try: Defines the block of code that is watched for exceptions. A try block cannot be used alone; it must be followed by either a catch or a finally block.
  • Catch: Handles the exception if one occurs in the try block. A catch block cannot be used alone; it must be preceded by a try block, and can be followed by a finally block.
  • Finally: An optional block that executes after try/catch regardless of the outcome. Putting cleanup code in a finally block is always good practice, even when no exceptions are anticipated.
  • Throw/Throws: Used to explicitly throw an exception or declare an exception in a method signature (a minimal example follows this list).
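Putting the four components together, here is a minimal sketch; the class name and file path are placeholders:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ConfigLoader {
    public static String loadFirstLine(String path) {
        BufferedReader reader = null;
        try {
            // try: the code watched for exceptions
            reader = new BufferedReader(new FileReader(path));
            return reader.readLine();
        } catch (IOException e) {
            // catch: handle the failure and recover gracefully
            System.err.println("Could not read " + path + ": " + e.getMessage());
            return null;
        } finally {
            // finally: cleanup runs whether or not an exception was thrown
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException ignored) {
                }
            }
        }
    }
}
```

In modern Java, a try-with-resources statement would close the reader automatically, but the explicit finally block above makes the four components easier to see.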

What are Exception Handlers in Java?

Decorative graphic representing an error being thrown - exception monitoring in Java

Exception handlers are blocks of code that manage and mitigate the effects of an exception. When an exception occurs, the handler either attempts to resolve the issue and/or logs it for further analysis.

The JVM provides various runtime services to Java programs. This includes exception handling, which allows programmers to catch and handle exceptions that occur during program execution.

Why Exception Handling is Important in Java:

  • Robustness: Ensures that the application can handle unexpected scenarios without failing completely.
  • Error Tracking: Helps developers identify and fix bugs more easily.
  • Recovery: Allows programs to recover from certain errors without interrupting user experiences.
  • Logging: Captures information about exceptions for troubleshooting and monitoring purposes.

Final Thoughts on Exception Handling and the Value of Exception Monitoring in Java

Exception handling is more than just catching errors in Java code—it’s about designing resilient Java applications that can gracefully handle failures and recover from issues. With proper exception handling mechanisms and by putting in place exception monitoring in Java to identify and pinpoint the root-cause issues, you can build robust applications that minimize downtime, enhance user experiences and use resources optimally.

Next Blog: How eG Enterprise Can Monitor and Resolve Exceptions in Java Applications at Scale

In our next Java blog, we will explore how eG Enterprise provides comprehensive monitoring of Java applications, enabling teams to not only track exceptions in real-time but also gain insights into the root cause and resolve them efficiently.

Exception monitoring in Java - Dashboard continuously monitoring Java Exceptions

Figure 3: In our next article, we’ll cover the built-in reports from eG Enterprise that track Java exceptions. You’ll probably recognize some of the exceptions described in this article.

Long term trend analysis of Java Exceptions and history provided by Exception monitoring in Java

Figure 4: In our next blog, I will demonstrate how SRE and IT Ops can leverage simple dashboards and reports, without any coding knowledge, to identify the root cause of Java issues and hold App Dev teams and third-party vendors accountable.

eG Enterprise is an Observability solution for Modern IT. Monitor digital workspaces, web applications, SaaS services, cloud and containers from a single pane of glass.

Learn More:

If you enjoyed this blog, you might be interested to read some others:

The post Exception Monitoring in Java – A Guide to Handling Java Exceptions appeared first on eG Innovations.

5 Essential Questions for Developing an Effective AVD Monitoring Strategy https://www.eginnovations.com/blog/5-essential-questions-for-developing-an-effective-avd-monitoring-strategy/ Fri, 13 Sep 2024 09:34:22 +0000

Is your AVD monitoring strategy truly effective? As organizations increasingly adopt Azure Virtual Desktop (AVD) to support remote work, ensuring a seamless and secure user experience becomes a priority. A robust AVD monitoring and observability strategy is essential to achieve this, allowing you to maintain performance, security, and user satisfaction across your virtual desktops and apps.

But where do you start? Here are five essential questions to ask when developing your AVD monitoring strategy to ensure you cover the critical requirements.

1. Are You Monitoring AVD to Know When It Is Working and When It Is Not? What Is the Performance and User Experience from Each Location You Care About?

The most fundamental aspect of any AVD monitoring strategy is ensuring that you have real-time insights into the availability and health of your Azure Virtual Desktop environment. AVD performance monitoring is not just about knowing when things are working; it is equally about being alerted as soon as they are not (or even being warned before things actually stop working so that you can avert issues).

Synthetic monitoring is essential for understanding the user experience, especially when monitoring Azure Virtual Desktop (AVD). It helps you determine when AVD is operational and how it performs from specific locations you care about. Unlike relying on user reports or Azure’s often delayed status page, synthetic monitoring provides real-time insights.

By implementing logon simulations (simulating the user logons) and full session simulations (simulating the logon and then working using key apps within the session) from key locations, you can proactively detect issues, assess performance, and ensure a smooth user experience. This benefits the helpdesk by enabling them to inform users about known issues, trigger proactive calls to Microsoft, and resolve problems before they escalate.

Screenshot of the AVD Logon Simulator in eG Enterprise

Figure 1: eG Enterprise includes a logon simulator for AVD – other synthetic monitoring tools include a web app simulator, full session simulator and protocol simulators.

By continuously running synthetic tests from different geographic points, you can identify latency issues, downtime, and response times, ensuring a consistent user experience across various regions, even before real users encounter any problems. Learn more: Synthetic Monitoring of Microsoft Azure DaaS.

A free AVD logon simulator is available here: Free AVD Logon Simulator for Azure VD | eG Innovations.

Note: eG Enterprise’s synthetic monitoring is licensed per test location – i.e., you can perform as many simulations/tests as you like without incurring per-test costs. Details are available here: eG Enterprise IT Monitoring Licensing – Cost-Effective & Flexible (eginnovations.com).

2. Do You Know Who Is Connecting to AVD, for How Long, and What Apps They Are Using? Can You Provide Evidence for Audits?

Understanding user behavior within your AVD environment is essential for both operational efficiency and security. Your AVD monitoring strategy should include detailed insights into who is connecting to your virtual desktops, the duration of their sessions, and the specific applications they are using. This not only helps in optimizing resource allocation but also plays a vital role in security and compliance strategies.

Dashboard showing Application usage of CPU in an AVD Deployment and Top 10 applications by resource (CPU in this case)

Figure 2: A good monitoring tool will allow you to quickly identify top applications by resource usage with out-of-the-box reports without the need to use Azure’s KQL (Kusto Query Language).

Tracking user activity can help you detect unusual behavior that might indicate a security breach. Additionally, detailed logs of user sessions and application usage are invaluable when it comes to Azure Virtual Desktop audits. Having the ability to provide evidence of compliance with regulatory standards is increasingly important for most organizations and mandatory for many. Good AVD monitoring tools can help ensure that your environment is secure and that you have the necessary traceability and historical reports to satisfy auditors and local regulations.

To achieve full visibility of AVD user connections and session usage, you need to leverage an observability solution that monitors more than just AVD hosts, VMs and sessions. Monitoring the authentication and connection technologies in use is essential – technologies such as Entra ID (formerly Azure AD) and the AVD Broker. Learn more about monitoring Entra ID here: How to monitor Azure AD Step by Step | eG Innovations.

Screenshot showing Azure AD / Entra ID monitoring Sign-in Logs and failures by location

Figure 3: Steep and sudden spikes in sign-in failures often indicate a service failure, and such failures often impact users in specific locations. Daily working patterns, e.g., the 9 am morning logon or the 1 pm back-from-lunch surge, become very clear. Anomalous behavior, such as users logging on at 3 am from unusual locations, should trigger red flags.

An out-of-the-box report in eG Enterprise of AVD Connection Failures over the long term

Figure 4: The top of the eG Enterprise “Connection Failure” report, which allows the administrator to quickly identify the most problematic areas of their AVD deployment and target effort where it is most effective. It gives instant visibility into whether certain Host Pools, Session Hosts, Users or Session Desktops experience connection problems.

Beyond security, management staff need to understand employee behavior and application usage.

A report overviewing user data of an AVD Host Pool including user idle / active time and top idle users

Figure 5: Many customers find built-in reports such as the “AVD Users – Active / Idle Time” report useful as an overview that can highlight anomalous employee work patterns.

Application reports allow IT Ops to evaluate the popularity of applications and services. Combined with application cost reports, this can be a powerful way to evaluate how critical certain apps are and whether they are providing value to the organization.

AVD Host pool Report showing top app launches and top 10 apps by launch duration

Figure 6: An example of one of many out-of-the-box application reports available within eG Enterprise.

3. Do You Have a Handle on All Aspects of AVD User Experience – Logon, App Launch, Screen Latency, etc.?

User experience and the digital employee experience (DEX) are at the heart of any virtual desktop environment. An effective AVD monitoring strategy must encompass all facets of user interaction—from logon times and application launch speeds to screen latency.

User Experience metrics for RemoteFX users using AVD

Figure 7: Protocol and graphics metrics are just a few of around 50 user experience metrics that eG Enterprise will collect automatically that you will have available to debug AVD issues. You’ll also have detailed real-time and historical data on a wealth of metrics such as TCP and UDP RTTs and rates.

Real time and historical monitoring of user experience metrics and measures means that:

  • management gains a clear view of DEX,
  • operations can identify optimization needs,
  • and helpdesk teams are informed of issues before users complain.

eG Enterprise provides detailed monitoring with automated, one-click diagnostics (no manual scripts required) that pinpoint the root cause of user experience issues. Learn more: Troubleshooting Azure Virtual Desktop (AVD) Sessions – Key User Experience and Graphics Metrics to Monitor | eG Innovations.

4. Do You Monitor Every Tier of the AVD Delivery Chain and Have the Ability to Troubleshoot Issues Quickly?

Azure Virtual Desktop environments are complex, comprising multiple tiers that include authentication technologies, network infrastructure, virtualization layers, operating systems, and applications. To maintain a reliable AVD environment, you must monitor every tier of this delivery chain comprehensively. This includes not just the virtual machines and applications but also the underlying network and storage components.

An AVD topology map including dependencies such as the AVD Broker, Entra ID and so on

Figure 8: The AVD Broker is only one component of the AVD logon process, so other key components need to be monitored and events correlated across the entire end-to-end AVD infrastructure. Auto-discovery and topology maps (such as this one within eG Enterprise) help helpdesk operators understand the connections between components and auto-correlate and filter alerts, so that a root-cause failure such as the one shown on the Azure AD Connector does not trigger secondary alarms on the AVD Broker.

Full end-to-end observability is necessary to identify bottlenecks that might be frustrating your users. For example, long logon times could be a result of inefficient profile loading or FSLogix problems, while slow application launches might be due to resource contention or network issues. AVD monitoring tools must provide you with detailed insights into all dependencies if you need to troubleshoot user experience and infrastructure issues fast. Learn more: Troubleshooting Azure Virtual Desktop (AVD) Issues through Logon and Beyond.

Your AVD monitoring strategy should employ tools that can monitor infrastructure and application layers independently and in conjunction. By doing so, you ensure that issues can be identified no matter where they arise in the delivery chain. Furthermore, having the capability to troubleshoot these issues quickly is essential. This means your monitoring solution should offer real-time alerts and diagnostics that help you pinpoint the root cause of any problem swiftly (preferably a monitoring tool should do this out-of-the-box).

Figure 9: eG Enterprise automatically monitors and raises alerts on Azure infrastructure issues in Azure AVD dependencies such as networking, storage and other key services that can cause connection problems. This means admins can avert issues before users are affected. eG Enterprise will also keep track of the billing, costs and subscription limits on these resources.

Endpoints are often the source of user experience issues, and an AVD monitoring solution should provide insights into the performance of the client device users are using and their last-mile network (Wi-Fi / ISP).

User Experience dashboard suitable for helpdesk usage to troubleshoot AVD users during support calls

Figure 10: Endpoint data is included within user experience dashboards. As well as the entire end-to-end topology converging application, networking, endpoint data and so on.

Learn more: Troubleshooting Azure Virtual Desktop (AVD) Issues through Logon and Beyond | eG Innovations.

5. Do You Have the Ability to Provide Your Helpdesk with Simple, Consistent Views So They Can Triage Problems Quickly?

A critical, and often overlooked, aspect of an AVD monitoring strategy is ensuring that your helpdesk team has the tools and information they need to support users effectively. This involves providing simple, consistent views of the AVD environment that help them quickly identify and triage issues. AVD helpdesk monitoring dashboards should offer a clear overview of key performance indicators (KPIs) and alert statuses, allowing the team to address problems before they escalate.

By integrating user-friendly dashboards and automated alerting systems into your AVD monitoring solution, you empower your helpdesk to be more proactive and efficient. This not only improves the overall user experience but also reduces the time and effort spent on troubleshooting and problem resolution.

User experience dashboard for AVD showing end-to-end data including home Wi-Fi, FSLogix, app data and more

Figure 11: Individual AVD user experience dashboards designed for L1/L2 frontline helpdesk operators and AVD administrators include key metrics, alerts, end-client information, logon breakdowns, FSLogix details, application usage and more.

A good AVD monitoring strategy will embrace features beneficial to helpdesks, such as:

  • Maintenance modes: whereby AVD administrators can put parts of the deployment into maintenance and avoid triggering alerts to helpdesks
  • Stakeholder specific views and functionality: whereby helpdesk staff are limited by RBAC to only see details relevant to their roles. This is important in organizations with high security requirements.
  • ITSM integrations: whereby AVD alerts and support tickets are integrated into tools used in the wider organization such as ServiceNow, Jira, MS Teams, and so on.

Learn more about monitoring strategies and features that can help helpdesks in the article – Empowering IT Help Desks with IT Service Monitoring (eginnovations.com).

Conclusion for an Effective AVD Monitoring Strategy

Developing a comprehensive AVD monitoring strategy is essential for maintaining the performance, security, and reliability of your Azure Virtual Desktop environment. By asking these five critical questions—covering everything from real-time monitoring to user experience and helpdesk support—you can ensure that your strategy addresses all key aspects of AVD management.

Whether you are focused on AVD performance monitoring, AVD security monitoring, or optimizing the user experience, a well-rounded strategy will help you achieve the operational excellence needed to support your organization’s digital workspaces for years to come. Investing in robust monitoring tools, observability practices and AIOps-driven automation now will pay dividends in the form of improved user satisfaction, reduced downtime, and greater overall efficiency.

The post 5 Essential Questions for Developing an Effective AVD Monitoring Strategy appeared first on eG Innovations.

8 Key Factors of a Successful MSP Monitoring Strategy – Determining your MSP Monitoring Strategy for the Next Decade https://www.eginnovations.com/blog/8-key-factors-of-a-successful-msp-monitoring-strategy-determining-your-msp-monitoring-strategy-for-the-next-decade/ Wed, 04 Sep 2024 05:31:31 +0000

As the Managed Service Provider (MSP) landscape continues to evolve, developing a robust MSP monitoring strategy is essential for MSPs wanting to stay ahead in an increasingly complex digital environment. Rapid advancements in technology, coupled with the growing complexity of IT environments, necessitate a shift in how MSPs approach monitoring. The fiercely competitive MSP market and external pressures on costs such as cloud pricing mean that customer expectations are high but profit margins slim. Indeed, recent research from Service Leadership found that 28% of MSPs aren’t profitable. With an array of new tools and techniques available, MSPs must construct a comprehensive strategy that not only meets current needs but is also scalable, adaptable, and capable of addressing future challenges and changing market pressures.

The Importance of a Forward-Looking MSP Monitoring Strategy

An effective MSP monitoring strategy is essential for maintaining the integrity and performance of client infrastructures. Monitoring has evolved beyond merely tracking network uptime or server availability. The next decade will bring new complexities, including the proliferation of modern applications, the rise of AI and machine learning, and the increased adoption of cloud and hybrid environments. To successfully navigate these changes, MSPs must develop an AIOps (Artificial Intelligence for IT Operations) driven monitoring strategy that leverages modern technology, automation, and intelligent analytics.

Today I’ll cover 8 key factors you can include in a modern MSP monitoring strategy to help stay ahead, namely:

  • Auto-Deployment
  • Universal Monitor
  • Auto-Discovery
  • Capabilities to Monitor Modern Apps
  • Deep App Insights
  • Synthetic Monitoring
  • Auto-Remediation
  • Pay-Per-Use Licensing

Auto-Deployment: The Foundation of Agility in an MSP Monitoring Strategy

Auto-deployment is a foundational element of a modern MSP observability and monitoring strategy. As client environments expand in size and complexity, the ability to automatically deploy monitoring agents across various devices and systems becomes crucial. Auto-deployment streamlines the onboarding process, reduces human error, and ensures consistent monitoring across all client assets. This capability not only enhances operational efficiency but also enables MSPs to scale their services effortlessly as client needs evolve.

Auto-deployment technologies also ensure that monitoring systems remain up-to-date with the latest patches and configurations, minimizing risk factors such as security vulnerabilities.

Universal operator and agent technologies are now ubiquitous, allowing MSPs to leverage IaC (Infrastructure-as-Code) workflows and deployment tooling to roll out day-zero monitoring that scales as the systems do. We often see eG Enterprise deployed in this way in conjunction with technologies including Red Hat OpenShift, Kubernetes (Operators), Containers, Nerdio, Citrix PVS/MCS, Terraform, ARM templates, AWS CloudFormation, Puppet, Pulumi, BICEP, Rancher and so on.

Useful articles are available on auto-deployment and monitoring:

Universal Monitor: Achieving Comprehensive Visibility

A universal monitor is a key component of an effective MSP monitoring strategy. In an increasingly complex IT landscape, MSPs need a unified view of their clients’ environments. A universal monitor provides this by integrating data from various sources into a single pane of glass, offering a holistic view that is essential for identifying patterns, spotting anomalies, and making informed decisions quickly. This level of visibility is crucial for maintaining service levels and addressing issues before they impact end users, making the universal monitor a cornerstone of any successful MSP monitoring strategy.

For MSPs offering multi-tenant services, a high level of support and feature maturity for multi-tenancy within the monitoring tool is vital. Read more about eG Enterprise’s support for secure multi-tenancy: What is multi-tenancy? Multi-tenancy for MSPs Explained.

eG Enterprise natively supports 500+ infrastructure components, cloud providers and applications, which means that it provides out-of-the-box deep insights, thresholds, dashboards and reports for any of these 500+ components.

Auto-Discovery: Ensuring Comprehensive Coverage

Auto-discovery is another critical feature in a modern MSP monitoring strategy. As client environments become more dynamic, with assets being frequently added or removed (often via auto-scale), auto-discovery tools automatically detect new devices, applications, and services as they are commissioned. This ensures that nothing slips through the cracks and that the entire application and infrastructure landscape is consistently monitored. Auto-discovery not only saves time but also ensures comprehensive coverage, allowing MSPs to monitor all aspects of their clients’ infrastructure without manual intervention.

To learn more about various auto-discovery methods and how you can evaluate the success of auto-discovery methodologies, see: Autodiscovery – IT Glossary | eG Innovations.

Screenshot of a topology map showing interconnected IT components

Figure 1: Auto-discovery enables rich visual topology maps within eG Enterprise which explain the dependencies associated with components such as databases even in auto-scaled systems and failover architectures where there may be multiple paths user transactions take.

Monitoring Modern Apps: Adapting Your MSP Monitoring Strategy

The shift towards cloud-native and microservices architectures introduces new challenges that a modern MSP monitoring strategy must address. Monitoring modern applications requires tools and techniques that go beyond traditional methods. These applications are often distributed, highly dynamic, and built using a mix of on-premises, cloud, and third-party services. To effectively monitor such environments, MSPs need capabilities that can manage the complexity and scale of modern apps. This includes real-time monitoring of containers, microservices, and serverless functions, and the ability to track the performance and health of APIs and other integration points. Adapting your MSP monitoring strategy to include these capabilities is now essential for keeping pace with customers’ expectations.

Screenshot of a multi-cloud application topology that many organizations may need the MSP monitoring strategy to account for

Figure 2: Modern applications and services may depend on cloud services, often in multiple clouds. Payment gateways (as shown) and geo-location services are common dependencies.

Deep Application Insights: Moving Beyond Surface-Level Metrics

To keep up with the complexity of modern applications, your MSP monitoring strategy may need to provide you with deep application insights. Surface-level monitoring is no longer sufficient; MSPs need a granular view of each component’s performance. Deep application insights involve monitoring at the code level, insights into database performance, tracking user interactions and distributed transaction tracing, and analyzing performance metrics across various layers of the stack. These insights enable MSPs to identify bottlenecks, optimize performance, and deliver a superior user experience. They are also invaluable for troubleshooting complex issues that may not be detected through traditional monitoring methods, making deep application insights a critical aspect of a modern MSP monitoring strategy.

Image of transaction monitoring

Figure 3: Modern monitoring tools will include coverage of front-end services allowing easy routing of issues to the front-end vs the back-end teams as appropriate.

Image of code-level information into Java apps on a Tomcat server. Such insights can help an MSP monitoring strategy

Figure 4: APM tooling designed for MSP administrators will automatically identify genuine application performance issues, differentiating them from infrastructure issues. This information helps MSPs prove to customers and third-party app suppliers where the true root-cause responsibility lies.

Unlike most Application Performance Monitoring (APM) tools, eG Enterprise is designed for the MSP administrator and helpdesk. It is also licensed accordingly!

Synthetic Monitoring: Proactive Problem Identification in Your MSP Monitoring Strategy

Synthetic monitoring is another vital element of an effective MSP monitoring strategy. By simulating user interactions and transactions, synthetic monitoring allows MSPs to proactively identify potential issues before they affect real users. This proactive approach is particularly useful for ensuring the availability and performance of critical applications, especially those that are customer-facing. Synthetic monitoring can simulate various scenarios, such as different geographies, devices, and network conditions, providing a comprehensive understanding of how an application will perform under different circumstances. Incorporating synthetic monitoring into your MSP monitoring strategy helps maintain high service levels and prevent downtime.

Screenshot of a multi-step synthetic monitoring simulation

Figure 5: The eG Enterprise Web App simulator is one of several synthetic monitoring features within the platform. It supports multi-step transactions that can be replayed 24×7 to emulate real user interactions with websites from multiple locations whilst recording website availability and end-to-end response times.

Auto-Remediation: Reducing Mean Time to Resolution (MTTR) with Your MSP Monitoring Strategy

Automation is set to play an even greater role in MSP monitoring strategies in the coming decade, particularly around auto-remediation. Auto-remediation tools automatically resolve common issues without human intervention, reducing the time to resolution and freeing up valuable resources. For example, if a server goes down, a platform such as eG Enterprise with auto-remediation capabilities can automatically restart it or shift workloads to a backup server. This not only minimizes downtime but also reduces the workload on support and helpdesk teams. Auto-remediation is increasingly used to enable MSPs to deliver more reliable services.

Learn more:

Of course, with automation you need to ensure that automated actions, as well as human-instigated ones, are fully traceable and that a human operator can always understand any actions performed by the monitoring tool. Such auditing is also essential for an MSP security strategy; see: Auditing Capabilities in IT Monitoring Tools | eG Innovations for details.

Pay-Per-Use Licensing: Flexibility in Your MSP Monitoring Strategy

Finally, as MSPs look to the future, the financial model of their MSP monitoring strategy will be just as important as the technical aspects. Pay-per-use licensing offers a flexible and scalable approach, allowing MSPs to align costs with the value delivered to clients. This model ensures that MSPs are only paying for what they use, making it easier to manage costs and scale services as needed. Pay-per-use licensing is particularly beneficial for MSPs that serve clients with fluctuating needs, as it allows them to adjust their monitoring resources in real-time without the burden of fixed costs.

The synthetic testing tools within eG Enterprise are licensed per installation machine. Once installed, you can run unlimited tests on your application infrastructure without racking up per-test PAYG-type costs. With eG Enterprise there is no per-app or per-app-server licensing for APM. For Java apps, eG Enterprise licensing is priced per operating system on which the JVMs are running. If your operating system has three JVMs running on it, then you are charged for the single operating system. With many other APM tools, full-stack licensing is based on each target host’s memory size; eG Enterprise is typically 10-20% of the cost of these solutions.

Learn more: eG Enterprise IT Monitoring Licensing – Cost-Effective & Flexible (eginnovations.com)

Conclusion: Building a Future-Proof MSP Monitoring Strategy

Determining an MSP monitoring strategy for the next decade requires a forward-thinking approach that embraces automation, deep insights, and flexibility. By incorporating key factors such as auto-deployment, universal monitoring, auto-discovery, and synthetic monitoring, MSPs can build an observability strategy that not only meets the needs of today but is also ready to tackle the challenges of tomorrow. As the digital landscape continues to evolve, MSPs that invest in modern monitoring capabilities will be better positioned to deliver exceptional service, maintain high levels of client satisfaction, and grow their businesses in a competitive market.

Learn More:

The post 8 Key Factors of a Successful MSP Monitoring Strategy – Determining your MSP Monitoring Strategy for the Next Decade appeared first on eG Innovations.
