Several retailers, including Neiman Marcus, had site issues this past Black Friday weekend. Here is some advice from the folks at Dynatrace, an application performance management software company, on the top three things retailers can do when things go bad online.
1/ Have a plan.
Everyone needs a playbook. Retailers need to make sure they have a plan in place for when things go bad. It should not be just a reactive disaster recovery plan; that alone would be a bad plan. The retailer’s plan has to look beyond disaster recovery and make sure that sites and applications have been thoroughly tested well before any holiday event.
The plan has to include ALL parts of the organization. It should not be limited to operations; it has to cover digital business owners, development, test and operations.
2/ Have the right tools.
“Everyone has a plan until you get punched in the nose.” Having a site go down during a holiday event is a traumatic experience for a retailer. It’s not enough to have a plan; you also have to have the right tools in place.
When things go bad for a retailer, we typically see them go into a “war room” scenario. Every team has its own set of tools, and those tools typically report that everything the team is responsible for is green, yet the problem persists. Retailers need tools that bring teams together: a common tool that everyone in the organization can use, including operations, development, test and digital business owners.
3/ Understand the data.
Having the right tools in place is a step in the right direction; having the expertise to understand what the data is telling you is another thing. Most retailers have teams of individuals who each understand their own part of the application, plus a few “rock stars” who have the big picture of how it all works together. Modern retail applications are incredibly complex, with hundreds if not thousands of dependencies. Sifting through log files (which is what we see most often) is the old (and most costly) way of solving a problem.
Retailers need to understand the relationship between application components and external internet factors. While most retailers have a few rock stars, what they need is a next-generation set of tools that automatically notifies them that an event is occurring, analyzes all the dependencies and discovers the root cause of the issue.
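To make the idea of analyzing dependencies concrete, here is a minimal sketch of one simple way to narrow an outage down to a likely root cause. It is an illustration only, not how Dynatrace or any particular product works. The component names, the `DEPENDENCIES` map and the `find_root_causes` function are all hypothetical, and the sketch assumes each component's health is already known.

```python
from typing import Dict, List, Set

# Hypothetical dependency map: each component lists what it depends on.
DEPENDENCIES: Dict[str, List[str]] = {
    "storefront": ["cart", "search", "cdn"],
    "cart": ["inventory", "payment-gateway"],
    "search": ["catalog-db"],
    "inventory": ["catalog-db"],
    "payment-gateway": [],
    "catalog-db": [],
    "cdn": [],
}


def find_root_causes(
    entry: str, dependencies: Dict[str, List[str]], unhealthy: Set[str]
) -> Set[str]:
    """Return unhealthy components reachable from `entry` whose own
    dependencies are all healthy (the likely origins of the failure)."""
    roots: Set[str] = set()
    seen: Set[str] = set()
    stack = [entry]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        deps = dependencies.get(node, [])
        # A failing component with no failing dependency is a candidate root cause.
        if node in unhealthy and not any(dep in unhealthy for dep in deps):
            roots.add(node)
        stack.extend(deps)
    return roots


# Assumed health status: several teams see red, but only one component is the origin.
unhealthy = {"storefront", "cart", "search", "inventory", "catalog-db"}
print(sorted(find_root_causes("storefront", DEPENDENCIES, unhealthy)))
# prints ['catalog-db']
```

In this made-up scenario five components look broken, but only the catalog database has no failing dependency of its own, so it is the place to start. A real tool would also have to discover the dependency map and the health signals automatically, which this sketch takes as given.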