The first time I was confronted with the idea of voice picking, I was intrigued. This was quite a long time ago, but when I saw the design spec, I sat down with the consultant who was designing the system to learn more. I asked him, “Why the heck would a user want to talk to the computer?” His response: “Because functions executed using voice are much more efficient.” Clearly, technology evolves, and that’s a good thing.
The technologies we’ve used up until now have some clear deficiencies. We started with scanning guns that were miles ahead of manual processes, but still had some limitations.
Cons with the RF handheld devices
- Users need to put the device down to finish their pick, and then pick it back up to continue scanning. That slows things down.
- Users always need to be concerned that they don’t drop and break the device, so they spend extra time looking for a safe place to set it down.
- Users must press keys on a small and crowded interface to use the device, which leads to errors and delays.
- Handheld devices are easy to lose, and can even get dropped by mistake into boxes shipped to customers.
Next, wrist-mounted devices came along to address some of these concerns. The biggest boon was the hands-free capability, but there were still some downsides.
Cons with the wrist-mounted devices
- The smaller form factor required users to press multiple keys for certain commands. Logging in and using the device were a struggle and took a lot of time.
- Some users found the wearables bulky and heavy.
- The second shift disliked using devices that had been strapped onto other workers for hours. (Sweaty, smelly, and unsanitary.)
- Complex key commands also made training onerous.
Finally, voice technology emerged to promise multiple benefits.
- Voice is both hands-free and eyes-free. The picker doesn’t have to carry a device or look at a screen, which speeds everything up. This clearly benefits the picking process by increasing throughput (i.e., the amount of product received through the inbound doors and shipped through the outbound doors), and there’s a direct connection between orders shipped, orders invoiced, and company revenue. A billion-dollar retailer/distributor typically picks about 100 million units a year, so saving half a second per pick adds up to 50 million seconds, or about 13,889 hours. At $14 per hour, that translates to about $194,444 in savings per year.
- Voice keeps operators moving constantly, so they don’t have time to chit-chat with their fellow pickers, which offers an unexpected time savings. In addition, the cool factor of the technology enhances user adoption.
- Paper-based systems are fast but error-prone. Because the voice system is closely intertwined with the warehouse management system (WMS), it catches errors immediately. In the end, voice provides the best combination of speed and accuracy.
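The savings estimate in the first bullet is simple arithmetic, sketched below in Python so you can plug in your own numbers. The 100 million units, half a second per pick, and $14 per hour are the illustrative figures from the text, not measurements, and the function name is hypothetical.

```python
SECONDS_PER_HOUR = 3600

def annual_labor_savings(units_picked: int, seconds_saved_per_pick: float,
                         hourly_rate: float) -> tuple[float, float]:
    """Return (hours saved, dollars saved) for one year of picking.

    Inputs are illustrative; substitute your own pick volume and labor rate.
    """
    seconds_saved = units_picked * seconds_saved_per_pick
    hours_saved = seconds_saved / SECONDS_PER_HOUR
    return hours_saved, hours_saved * hourly_rate

# Figures from the example in the text.
hours, dollars = annual_labor_savings(100_000_000, 0.5, 14.0)
print(f"{hours:,.0f} hours, ${dollars:,.0f} per year")
# prints: 13,889 hours, $194,444 per year
```

Rounding to 13,889 hours before multiplying gives $194,446; the small difference comes only from rounding.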
Speed vs. Accuracy
Early in this evolution toward voice, there was a lot of excitement. In fact, some wanted to voice-enable everything. But voice is not a panacea for all the productivity pains in the warehouse. For one thing, voice solutions could be quite expensive, because:
- Integrating with the WMS (Warehouse Management System) required extra integration software, most of it custom-developed.
- Extra servers were needed to communicate with the voice terminals.
- The typical implementation timeline was three to six months.
- The voice terminals were, and still are, very expensive.
Ease vs. Cost
Overall, voice was very expensive and took a while to implement.
Later, I got a chance to talk to some of the distribution center managers who had already implemented voice. This is what they told me.
- Voice improves productivity. They were promised up to 60%, but the average productivity improvement was somewhere around 20%.
- It supports multiple languages.
- Voice creates a real time, highly-accurate picking process (i.e. picks are immediately reported to and validated by the WMS).
- Early voice technology followed a voice template approach: each user had to train the system by recording voice templates when using it for the first time. Because recognition depended on that template, it was less accurate in some key situations, including:
  - When a user had a cold or some other condition that changed their voice.
  - When frustration or other emotions changed the voice.
  - When there was a lot of background noise.
- Voice systems used check digits for warehouse locations: the user had to read aloud the check digit printed at the location, which added a separate step and required extra labels.
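To make that extra step concrete, here is a minimal Python sketch of how a spoken check digit might be validated. It assumes each location label carries a short printed check digit that the WMS knows about; the location IDs, digits, and function name are hypothetical.

```python
# Hypothetical table of the check digits printed on each location label.
CHECK_DIGITS = {
    "A-01-03": "47",
    "A-01-04": "82",
    "B-12-01": "15",
}

def confirm_location(location: str, spoken: str) -> bool:
    """Return True if the picker read back the check digit printed at `location`."""
    expected = CHECK_DIGITS.get(location)
    if expected is None:
        raise KeyError(f"unknown location {location!r}")
    # A recognizer may return "4 7" or "47"; keep only the digit characters.
    digits = "".join(ch for ch in spoken if ch.isdigit())
    return digits == expected

print(confirm_location("A-01-03", "4 7"))  # prints: True
print(confirm_location("A-01-03", "82"))   # prints: False
```

Every pick pays for this confirmation with an extra spoken exchange and a label that has to be printed and maintained.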
How has voice technology improved?
User-friendly voice technology leverages consumer devices
Consumer mobile devices, such as cell phones, have changed the game for enterprises. Purpose-built devices are only required when powerful scanning technology is needed. For most other applications, consumer technology offers several benefits:
- Familiar technology requires less training time.
- The addition of voice capabilities (such as Siri and Alexa) to many consumer devices increases operators’ comfort level.
- Technology advances, such as fast charging, offer better user experiences.
- Compatible consumer-style headsets with microphones designed for sports work well for warehouse users and are less expensive.
More powerful speech recognition engines
The latest Android and Apple devices have sophisticated voice engines with built-in artificial intelligence and machine learning capabilities, and they put these capabilities into users’ hands at no extra cost. These voice engines can be easily leveraged through the available application programming interfaces (APIs). The Google Voice Engine has held the technological lead since Google switched its transcription to Long Short-Term Memory (LSTM) recurrent neural networks.
Meanwhile, computing power on consumer devices keeps rising, which makes voice recognition both fast and accurate. Native voice recognition engines don’t require users to record voice templates, which saves time. This also addresses the problem of voice changes due to emotion or illness.
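Apps that use these engines through an API generally get back recognized text rather than raw audio. The Python sketch below assumes the platform’s speech API returns a ranked list of candidate transcripts (best first) and shows one way a picking app might accept only answers that fit a small, hypothetical vocabulary. No per-user template is involved.

```python
from typing import Optional

# Hypothetical command vocabulary for a pick flow.
COMMANDS = {"ready", "repeat", "skip", "short"}

def interpret(candidates: list[str]) -> Optional[str]:
    """Return the best-ranked candidate that fits the pick vocabulary, or None."""
    for text in candidates:  # assumed ordered best-first by the recognizer
        word = text.strip().lower()
        # Accept a known command or a spoken number (e.g. a quantity).
        if word in COMMANDS or word.isdigit():
            return word
    return None  # nothing usable; the app would re-prompt the picker

print(interpret(["red e", "Ready"]))  # prints: ready
print(interpret(["twelve", "12"]))    # prints: 12
```

Constraining answers to a short list is a design choice: it lets a lower-ranked but valid transcript win over a top-ranked one that makes no sense in the pick flow.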
Specific modes address background noise issues
As WMSs become more configurable and flexible, many can be configured to work with voice solutions in a mode that doesn’t require the user to speak at all. Instructions are still delivered by voice, but inputs are provided by scanning the relevant barcodes, which works well in a noisy environment. This also eliminates the need for check digits, because scanning the location barcode confirms the location directly. The result is faster and more accurate. In addition, many WMSs have built-in voice interfaces, so solutions don’t need to be built from scratch.
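A scan-input mode like this can be sketched as a pick step that speaks its prompts but accepts only barcode scans as input. The Python below is a hypothetical illustration: `speak` and `scan` stand in for whatever text-to-speech and scanner calls the device provides, and it assumes the location barcode encodes the location ID.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class PickTask:
    location: str  # assumed to equal the value encoded in the location barcode
    item: str      # value encoded in the product barcode
    quantity: int

def run_pick(task: PickTask, speak: Callable[[str], None],
             scan: Callable[[], str]) -> bool:
    """Voice prompts out, barcode scans in: no speech recognition, no check digits."""
    speak(f"Go to location {task.location}")
    if scan() != task.location:  # scanning the location label confirms the location
        speak("Wrong location")
        return False
    speak(f"Pick {task.quantity}")
    if scan() != task.item:      # scanning the product confirms the item
        speak("Wrong item")
        return False
    return True

# Example with stand-ins for the device's text-to-speech and scanner.
spoken: list[str] = []
scans = iter(["A-01-03", "0001234"])
ok = run_pick(PickTask("A-01-03", "0001234", 2), spoken.append, lambda: next(scans))
print(ok, spoken)
# prints: True ['Go to location A-01-03', 'Pick 2']
```

Passing `speak` and `scan` in as functions keeps the pick logic independent of any particular device SDK.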
Plug & Play implementation of voice solutions
Voice solutions are becoming plug and play, because consumer devices run highly sophisticated terminal emulation software that is easy to use, with soft overlay keyboards and built-in communication capabilities. A quick connection to a Wi-Fi network and easily configurable text-to-speech get things up and running quickly.
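One way to voice-enable a terminal emulation screen is to read fields at fixed screen positions and turn them into a spoken prompt. The Python sketch below assumes a hypothetical pick-screen layout; the field positions and prompt wording are illustrative, and the resulting string would be handed to the device’s text-to-speech engine.

```python
# Hypothetical pick-screen layout: field name -> (row, start column, end column).
# Real layouts depend on the WMS screens being emulated.
FIELDS = {
    "location": (2, 5, 15),
    "item": (3, 5, 15),
    "qty": (4, 5, 9),
}

def screen_to_prompt(screen: list[str]) -> str:
    """Extract fields from a text-mode screen and build a prompt for text-to-speech."""
    values = {
        name: screen[row][start:end].strip()
        for name, (row, start, end) in FIELDS.items()
    }
    return f"Location {values['location']}. Pick {values['qty']} of item {values['item']}."

screen = [
    "PICK TASK",
    "",
    "LOC: A-01-03",
    "ITM: 0001234",
    "QTY: 2",
]
print(screen_to_prompt(screen))
# prints: Location A-01-03. Pick 2 of item 0001234.
```

Because the mapping lives in configuration rather than in the WMS, the same screens keep working for users who prefer the keyboard.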
Further, devices have gotten sophisticated enough to be used for both voice and scanning. Workers can move readily from one task or department to another using the same device at all times.
Costs coming down
Overall, the cost of voice solutions has come down significantly for several reasons:
- Off-the-shelf hardware can be used instead of expensive voice-specific hardware, which could cost anywhere from $4,000 to $5,500.
- There is no need to buy integration software to voice-enable the functions.
- Implementation times shrink considerably when using standard technology.
- With standard and affordable equipment, voice capabilities are within reach of even smaller operations.
Today, organizations that have never used a voice system should look at getting on board with this trend. And if you have an older system from fifteen or even five years ago, you can save a lot by upgrading. Let us know your thoughts on this in the comments section below.