The Basics
One of the most important steps in automating is finding the window or control with which you'd like to interact. This can range from challenging to completely frustrating for those who are not familiar with the Windows API or have not used other automation tools.
There are some basic concepts which you'll need to understand before embarking on your automation journey. Some of the items described here may not be 100% technically accurate; I'm trying to keep things simple for the general public, so unless something is way off-base, let's not get pedantic here.
Processes and Threads
Whenever you start a program (executable, EXE) such as Excel, Windows loads the program into its own process. Generally speaking, the process is that program's sandbox in which to do its work. This includes its main window, memory (RAM) allocation, and its main thread.
The main thread is where the program begins execution: loading up the program, drawing things on the screen, and awaiting messages from Windows about things which affect the program, like events and keyboard and mouse input from the user. The main thread pretty much just executes in a loop, asking Windows if it has any messages, processing those messages, then repeating until the program is closed.
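The loop described above can be sketched as a toy model in Python. This is not Windows code: the message names, the `run_message_loop` function, and the handlers are all made up for illustration, and a real message loop waits for new messages instead of stopping when the queue is empty.

```python
from collections import deque

def run_message_loop(queue, handlers):
    """Toy main-thread loop: fetch a message, process it, repeat until quit."""
    handled = []
    while queue:                             # a real loop would wait here instead
        message, data = queue.popleft()      # "ask Windows" for the next message
        if message == "QUIT":                # the program was closed
            break
        handler = handlers.get(message)
        if handler is not None:              # messages nobody cares about are skipped
            handled.append(handler(data))
    return handled

# Hypothetical messages, in the order they arrive.
queue = deque([
    ("MOUSE_MOVE", (10, 20)),
    ("KEY_DOWN", "A"),
    ("QUIT", None),
    ("KEY_DOWN", "B"),                       # arrives after QUIT, never processed
])
handlers = {
    "MOUSE_MOVE": lambda pos: f"mouse at {pos}",
    "KEY_DOWN": lambda key: f"key {key}",
}

log = run_message_loop(queue, handlers)
print(log)  # ['mouse at (10, 20)', 'key A']
```

Note that the second key press is never handled, because the loop stops as soon as it sees the quit message.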
Windows and Handles
The most fundamental building block of any Windows GUI application is the window. Most people are familiar with the concept of a window, which is a big container where all of the parts of the application are displayed. However, it's not that simple; each control (drop-down list, menu, scroll bar, text box, etc.) is actually a window (as far as Windows is concerned)*.
Each window has a handle, which works like an address (location) in memory. Think of this just like your home address: with your address, you can tell anyone exactly where to find your house. Windows works the same way; each window (control) has its own handle that can be used to access and communicate with the window.
The program will have one main window, which is usually the one you see on the screen. Inside that window there are usually dozens or hundreds of other windows, which are typically the controls you interact with or which display things to you. Windows can have parents, siblings, and children. So the program is really made up of just a bunch of windows that all relate to each other in some way, except the main window, which has no parent. Most of the time, a window without a parent is considered a top-level window and is often the program's main window.
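To make the parent, sibling, and child relationships concrete, here is a small toy model in Python. It does not talk to Windows at all: the `Window` class, the `find_by_handle` helper, and the handle values are hypothetical and exist only to illustrate the idea.

```python
from itertools import count

_handles = count(0x1000)  # made-up handle values, for illustration only

class Window:
    """Toy window: a title, a unique handle, one parent, and any number of children."""

    def __init__(self, title, parent=None):
        self.title = title
        self.handle = next(_handles)
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def is_top_level(self):
        # A window with no parent is treated as a top-level window.
        return self.parent is None

    def siblings(self):
        if self.parent is None:
            return []
        return [w for w in self.parent.children if w is not self]

def find_by_handle(root, handle):
    """Walk the tree from root and return the window with this handle, or None."""
    if root.handle == handle:
        return root
    for child in root.children:
        found = find_by_handle(child, handle)
        if found is not None:
            return found
    return None

main = Window("Main window")
toolbar = Window("Toolbar", parent=main)
text_box = Window("Text box", parent=main)
ok_button = Window("OK button", parent=toolbar)

print(main.is_top_level())                            # True
print([w.title for w in toolbar.siblings()])          # ['Text box']
print(find_by_handle(main, ok_button.handle).title)   # OK button
```

Once you know a handle, you can find that exact window no matter how deeply it is nested, which is why so much of automation comes down to getting the right handle.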
*Note: This is generally true, but there are exceptions when the program renders items itself, such as everything inside a web browser's rendering window (the web page itself). The only window that Windows knows about is the big square area containing the web browser's rendering window, not the individual text boxes, for example; the web browser itself manages those. Some cross-platform applications also manage their own controls, which are sometimes not accessible directly through the Windows API.
Messages
Pretty much everything that happens in Windows relates to messages. Messages are how Windows and applications communicate with one another. For example, each tiny movement you make with the mouse generates a message that is sent to the window directly below the mouse cursor. In fact, S+ installs a low-level mouse hook which allows S+ to receive each and every mouse message. This is how S+ intercepts the right-click to capture the mouse and recognize a gesture; S+ also opts not to relay that message any further, so no other application knows the right mouse down event ever happened. Again, this is simplified, but is generally correct.
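The idea of a hook that sees a message first and can choose not to pass it along can be sketched as a toy model in Python. This is not how S+ is actually implemented; the `deliver` and `gesture_hook` functions and the message names are hypothetical.

```python
def deliver(message, hooks, target_inbox):
    """Offer the message to each hook in order; a hook returning True swallows it."""
    for hook in hooks:
        if hook(message):
            return False                 # swallowed: the target never sees it
    target_inbox.append(message)         # nobody swallowed it, pass it on
    return True

captured = []

def gesture_hook(message):
    """Hypothetical hook: keep right-button presses for gesture recognition."""
    if message == "RIGHT_BUTTON_DOWN":
        captured.append(message)
        return True                      # do not relay this message further
    return False                         # let everything else through

app_inbox = []
deliver("MOUSE_MOVE", [gesture_hook], app_inbox)
deliver("RIGHT_BUTTON_DOWN", [gesture_hook], app_inbox)

print(app_inbox)   # ['MOUSE_MOVE']
print(captured)    # ['RIGHT_BUTTON_DOWN']
```

The application only ever receives the mouse move; as far as it can tell, the right button was never pressed.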
As mentioned above, the program's main thread processes each message it receives and decides whether or not the program should do anything based on the message.
Windows API, S+ Actions, and Alien
The Windows API (Application Programming Interface) is simply a catalog of thousands of very granular functions which allow programs to do things; these functions are the building blocks for creating Windows applications. S+ relies on numerous Windows API functions just to run, and also leverages many API calls to perform your action scripts.
In S+, I've encapsulated and/or combined many Windows API functions to simplify automating things with gestures. For example, the S+ action acCenterWindowToScreen makes several Windows API calls behind the scenes, so the person who just wants to center a program's window can do it in a single step.
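As a rough illustration of why centering takes more than one step, the work can be thought of as: find out how big the screen is, find out how big the window is, work out the new position, then move the window. Which Windows API calls S+ actually uses is not covered here, so the Python sketch below shows only the arithmetic in the middle; the `centered_position` function and the sizes are hypothetical.

```python
def centered_position(screen_w, screen_h, win_w, win_h):
    """Return the (x, y) of the window's top-left corner when it is centered."""
    x = (screen_w - win_w) // 2   # equal space on the left and right
    y = (screen_h - win_h) // 2   # equal space above and below
    return x, y

# Assumed sizes: a 1920x1080 screen and an 800x600 window.
print(centered_position(1920, 1080, 800, 600))  # (560, 240)
```

A single action hides both the lookups and this calculation from the person writing the script.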
While I've incorporated a pretty thorough base of granular action functions into S+, there are times when you may need to call a specific Windows API function for which I have not created and exposed an action. For example, IsIconic is a Windows API function which reports whether or not a window is currently minimized. S+ does not contain an action function for that particular API, so in order to call Windows directly, you'll need to define a binding to that function in the Lua scripting engine via something called Alien.
Alien essentially lets S+ call the Windows API directly, but it requires a bit of setup for the call which can be quite confusing for someone who isn't a developer. This tutorial will not go into depth about doing this (yet), but you will sometimes see posts in the forum which reference defining a function in the Global Lua tab in S+. What this does is define a new function which can be used in your action scripts.
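To give a feel for what "defining a binding" means, here is the same idea sketched with Python's ctypes module. This is not Alien and not S+ script syntax, so do not paste it into the Global Lua tab; it only shows the general pattern of describing a Windows API function's parameters and return value before calling it. The `bind_is_iconic` helper is hypothetical, and the code does nothing when run on a system other than Windows.

```python
import ctypes
import sys

def bind_is_iconic():
    """Return a callable bound to user32's IsIconic, or None when not on Windows."""
    if sys.platform != "win32":
        return None
    is_iconic = ctypes.windll.user32.IsIconic
    is_iconic.argtypes = [ctypes.c_void_p]   # one parameter: the window handle
    is_iconic.restype = ctypes.c_int         # result: nonzero means minimized
    return is_iconic

is_iconic = bind_is_iconic()

if is_iconic is not None:
    # Ask Windows for the handle of the window currently in the foreground.
    get_foreground = ctypes.windll.user32.GetForegroundWindow
    get_foreground.restype = ctypes.c_void_p
    hwnd = get_foreground()
    print("Foreground window minimized:", bool(is_iconic(hwnd)))
```

The setup lines that describe the parameter and return types are the part that tends to confuse non-developers; once a binding is defined, calling it looks like calling any other function.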