Wednesday, March 9, 2011

Run our code in another process: a simple method with CreateRemoteThread & LoadLibrary

[Tip]
Inject our code into another process using CreateRemoteThread & LoadLibrary.

[Details]
In general, any process can load a DLL dynamically by using the LoadLibrary API. If we use CreateRemoteThread(), we can start the LoadLibrary function in another process; in effect, our library will be loaded into that process.
Both LoadLibrary and FreeLibrary are functions residing in kernel32.dll. Because kernel32 is guaranteed to be present at the same load address in every "normal" process, the address of LoadLibrary/FreeLibrary is the same in every process too.
The steps to load a DLL into another process:
1. Retrieve a HANDLE to the remote process (OpenProcess).
2. Allocate memory for the DLL name in the remote process (VirtualAllocEx).
3. Write the DLL name, including full path, to the allocated memory (WriteProcessMemory).
4. Map your DLL to the remote process via CreateRemoteThread & LoadLibrary.
5. Wait until the remote thread terminates (WaitForSingleObject); this is until the call to LoadLibrary returns. Put another way, the thread will terminate as soon as our DllMain (called with reason DLL_PROCESS_ATTACH) returns.
6. Retrieve the exit code of the remote thread (GetExitCodeThread). Note that this is the value returned by LoadLibrary, thus the base address (HMODULE) of our mapped DLL.
7. Free the memory allocated in Step #2 (VirtualFreeEx).
8. Unload the DLL from the remote process via CreateRemoteThread & FreeLibrary. Pass the HMODULE handle retrieved in Step #6 to FreeLibrary (via lpParameter in CreateRemoteThread).
Note: If your injected DLL spawns any new threads, be sure they are all terminated before unloading it.
9. Wait until the thread terminates (WaitForSingleObject).
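The steps listed above can be sketched in Win32 C++ as follows. This is a minimal sketch with most error handling omitted; note also that GetExitCodeThread only returns a DWORD, so recovering the HMODULE this way is reliable only for 32-bit target processes.

```cpp
#include <windows.h>
#include <string.h>

int InjectDll( DWORD dwProcessId, const char* szDllPath )
{
    // Step 1: open the target process with the rights CreateRemoteThread needs.
    HANDLE hProcess = OpenProcess( PROCESS_CREATE_THREAD | PROCESS_QUERY_INFORMATION |
                                   PROCESS_VM_OPERATION | PROCESS_VM_WRITE | PROCESS_VM_READ,
                                   FALSE, dwProcessId );
    if( !hProcess ) return 0;

    // Steps 2-3: allocate memory in the target and write the DLL path there.
    SIZE_T nSize = strlen( szDllPath ) + 1;
    LPVOID pRemotePath = VirtualAllocEx( hProcess, NULL, nSize, MEM_COMMIT, PAGE_READWRITE );
    WriteProcessMemory( hProcess, pRemotePath, szDllPath, nSize, NULL );

    // Step 4: kernel32 loads at the same address everywhere, so our own
    // LoadLibraryA address is valid inside the target process too.
    LPTHREAD_START_ROUTINE pfnLoadLibrary = (LPTHREAD_START_ROUTINE)
        GetProcAddress( GetModuleHandleA( "kernel32.dll" ), "LoadLibraryA" );
    HANDLE hThread = CreateRemoteThread( hProcess, NULL, 0, pfnLoadLibrary,
                                         pRemotePath, 0, NULL );

    // Steps 5-6: wait for LoadLibrary to return; its return value (the HMODULE)
    // comes back as the thread exit code.
    WaitForSingleObject( hThread, INFINITE );
    DWORD dwModule = 0;
    GetExitCodeThread( hThread, &dwModule );
    CloseHandle( hThread );

    // Step 7: free the memory that held the DLL name.
    VirtualFreeEx( hProcess, pRemotePath, 0, MEM_RELEASE );

    // Steps 8-9: unload the DLL again with FreeLibrary in a second remote thread.
    LPTHREAD_START_ROUTINE pfnFreeLibrary = (LPTHREAD_START_ROUTINE)
        GetProcAddress( GetModuleHandleA( "kernel32.dll" ), "FreeLibrary" );
    hThread = CreateRemoteThread( hProcess, NULL, 0, pfnFreeLibrary,
                                  (LPVOID)(ULONG_PTR)dwModule, 0, NULL );
    WaitForSingleObject( hThread, INFINITE );
    CloseHandle( hThread );
    CloseHandle( hProcess );
    return 1;
}
```
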

Please check the attached sample application, which demonstrates the usage of CreateRemoteThread.
Run InjectApp and give any process ID to inject InjectLibrary.dll into that process.
InjectApp injects "InjectLibrary.dll" into the specified process (provide a process ID and press the Inject button).
InjectLibrary creates a file from its DllMain(). This file logs the current process ID, just to prove that the library started in another process.
 
// Implementation of InjectLibrary.dll
#include <windows.h>
#include <stdio.h>

BOOL APIENTRY DllMain( HMODULE hModule,
                       DWORD  ul_reason_for_call,
                       LPVOID lpReserved )
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
        {
            // Create a file at D:\File.txt to prove which process loaded this library.
            FILE *pFile = fopen( "D:\\File.txt", "w+" );
            if( pFile )
            {
                fprintf( pFile, "Dll Started From Process ID:%d", GetCurrentProcessId() );
                fclose( pFile );
            }
        }
        break;
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
    case DLL_PROCESS_DETACH:
        break;
    }
    return TRUE;
}


[Reference]
http://www.codeproject.com/KB/threads/winspy.aspx#section_3
http://www.codeproject.com/KB/threads/winspy.aspx
http://www.programmersheaven.com/2/Inject-code-to-Portable-Executable-file


Posted By : Santhosh G.

Triple Buffering: Third buffer for fast graphics display

[Tip]
Triple Buffering: Third buffer for fast graphics display.
 
[Details]
There are actually two buffers on modern graphics cards, the Primary Buffer and the Secondary Buffer, also often called the Front Buffer and the Back Buffer. Both are storage areas in the Video RAM of the graphics card, and the process of using two buffers at one time is called Double Buffering. It was only relatively recently that graphics cards had enough VRAM to provide two buffers at all resolutions, since a single frame of high-resolution detailed graphics can take up a great deal of video memory, let alone two of them.

The graphics card uses the secondary buffer to compose a new frame while the primary buffer is sending an existing completed frame to the monitor. When these tasks are done, the buffers are essentially 'flipped' around so that the recently completed frame in the secondary buffer now becomes the primary buffer ready to send to the monitor, while a new frame begins composing in what was the primary buffer a moment ago. This is repeated over and over and thus the use of two buffers means that the graphics card is not constantly waiting for a single frame buffer to be cleared before getting on with rendering more frames to store there. It's like putting out a fire using two buckets of water instead of just one - one bucket can be filled with water while the contents of the other is being thrown on the fire, and then they're switched and the process repeated; much faster than just using a single bucket.

There is still a problem with double buffering, and that is when VSync is enabled, the graphics card can often fill both buffers and then have to stop working on any new frames until the monitor indicates it is ready for a new frame for its next refresh. Only then can the graphics card clear the primary buffer, switch buffers and begin rendering the next frame in the secondary buffer. This waiting is what causes a drop in FPS when VSync is enabled on many systems.

Wouldn't it then make sense to have more than two buffers? Why not three buffers for example - that would give the graphics card more room to render frames without having to worry about where to store them before they're sent to the monitor, even if VSync is enabled. Well there is an option which does just that, called Triple Buffering. And it generally does precisely what the name implies, it creates a third buffer in the VRAM, which we can call the Tertiary buffer.

[Reference]

Posted By : Santhosh G.

CUDA: GPGPU without Graphics Knowledge.

[Tip] 
CUDA is NVIDIA’s parallel computing architecture. It enables dramatic increases in computing performance by harnessing the power of the GPU.
 
[Details]
"CUDA programming" and "GPGPU programming" are not the same (although CUDA runs on GPUs). Previously, writing software for a GPU meant programming in the language of the GPU. CUDA permits working with familiar programming concepts while developing software that can run on a GPU. It also avoids the performance overhead of graphics layer APIs by compiling your software directly to the hardware (GPU assembly language, for instance), thereby providing great performance.
With millions of CUDA-enabled GPUs sold to date, software developers, scientists and researchers are finding broad-ranging uses for CUDA, including image and video processing, computational biology and chemistry, fluid dynamics simulation, CT image reconstruction, seismic analysis, ray tracing, and much more.
Computing is evolving from "central processing" on the CPU to "co-processing" on the CPU and GPU. To enable this new computing paradigm, NVIDIA invented the CUDA parallel computing architecture that is now shipping in GeForce, ION, Quadro, and Tesla GPUs, representing a significant installed base for application developers.

Here is a very simple example; the details are explained at http://drdobbs.com/cpp/207402986.
Sample code to increment all elements of an array on the Host (CPU) and on the Device (GPU):
void incrementArrayOnHost(float *a, int N )
{
  int i;
  for (i=0; i < N; i++) a[i] = a[i]+1.f;
}
//
__global__ void incrementArrayOnDevice(float *a, int N)
{
/*
In the kernel on the CUDA-enabled device, several built-in variables are available
that were set by the execution configuration of the kernel invocation.
They are:

blockIdx which contains the block index within the grid.
threadIdx contains the thread index within the block.
blockDim contains the number of threads in a block.
*/
  int idx = blockIdx.x*blockDim.x + threadIdx.x;
  if (idx<N) a[idx] = a[idx]+1.f;
}

// Calling CUDA kernel for incrementing an Array[a_d] of size ARRAY_MAX
incrementArrayOnDevice <<< 1, ARRAY_MAX>>> (a_d, ARRAY_MAX); // a_d is the float array in GPU, ARRAY_MAX is the size of Array.
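For completeness, here is a sketch of the host-side code that would surround the kernel call above (assuming the incrementArrayOnDevice kernel shown earlier is in the same .cu file and a CUDA-capable device is present; error checking is omitted):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

#define ARRAY_MAX 256

int main()
{
    float a_h[ARRAY_MAX];                       // host copy of the array
    for( int i = 0; i < ARRAY_MAX; i++ ) a_h[i] = (float)i;

    float *a_d;                                 // device copy of the array
    cudaMalloc( (void**)&a_d, ARRAY_MAX * sizeof(float) );
    cudaMemcpy( a_d, a_h, ARRAY_MAX * sizeof(float), cudaMemcpyHostToDevice );

    // One block of ARRAY_MAX threads, as in the call shown above.
    incrementArrayOnDevice<<< 1, ARRAY_MAX >>>( a_d, ARRAY_MAX );

    cudaMemcpy( a_h, a_d, ARRAY_MAX * sizeof(float), cudaMemcpyDeviceToHost );
    printf( "a_h[0]=%f  a_h[%d]=%f\n", a_h[0], ARRAY_MAX - 1, a_h[ARRAY_MAX - 1] );
    cudaFree( a_d );
    return 0;
}
```
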


[Reference]

Posted By : Santhosh G.

Vertical Synchronization: How to avoid tearing effect in high FPS display

[Tip] 
When displaying graphics at high FPS, the tearing effect can be avoided by enabling Vertical Sync.
 
[Details]
Screen tearing is a visual artifact in video where information from two or more different frames is shown in a display device in a single screen draw.

The artifact occurs when the video feed sent to the device isn't in sync with the display's refresh, be it due to non-matching refresh rates, or simply lack of sync between the two. During video motion, screen tearing creates a torn look as edges of objects (such as a wall or a tree) fail to line up.
Tearing can occur with most common display technologies and video cards, and is most noticeable in situations where horizontally-moving visuals are common, such as an indicator moving horizontally across the screen.

Vertical Synchronization, also called Vertical Sync, or simply VSync for short, was primarily required because of the physical limitations of CRT monitors. A CRT monitor has to constantly light up the phosphors on the screen many times per second to maintain an image, and can only do this a certain number of times per second based on how fast the electron gun in the monitor can move. Each time it has to redraw the entire screen again, it moves the electron gun inside the monitor from the bottom of the screen to point to the top left of the screen, ready to 'repaint' all the lines on the screen from top left to bottom right, and back again for the next refresh. The period during which the electron gun moves to the top of the screen for a new refresh is called the Vertical Blanking Interval (VBI).

Enabling VSync tells your graphics card to synchronize its actions with your monitor. That means the graphics card is only allowed to swap its frame buffer and send a new frame to the monitor when the monitor says it is ready to repaint a new screen - i.e. during the VBI[Vertical Blanking Interval]. Your graphics card and monitor do not have to be in sync; they can still operate properly when VSync is disabled, however when VSync is disabled, you can experience a phenomenon called Tearing in periods when your graphics card and monitor go out of sync, precisely because the graphics card and monitor are acting without regard for each other's limitations.

Tearing

It is an unfortunate fact that if you disable VSync, your graphics card and monitor will inevitably go out of sync. Whenever your FPS exceeds the refresh rate (e.g. 120 FPS on a 60Hz screen), or in general at any point during which your graphics card is working faster than your monitor, the graphics card produces more frames in the frame buffer than the monitor can actually display at any one time. The end result is that when the monitor goes to get a new frame from the primary buffer of the graphics card during the VBI, the frame may be made up of two or more different frames overlapping each other. This results in the onscreen image appearing to be slightly out of alignment or 'torn' in parts whenever there is any movement - and thus it is referred to as Tearing.

In OpenGL, WGL_EXT_swap_control can be used to turn on or off VSync.
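A minimal sketch of how that extension is typically used (Windows-only; it assumes an OpenGL rendering context is already current, and checks for the extension before calling it):

```cpp
#include <windows.h>

typedef BOOL (WINAPI *PFNWGLSWAPINTERVALEXTPROC)( int interval );

void SetVSync( BOOL bEnable )
{
    // wglGetProcAddress only works while a GL context is current.
    PFNWGLSWAPINTERVALEXTPROC wglSwapIntervalEXT =
        (PFNWGLSWAPINTERVALEXTPROC)wglGetProcAddress( "wglSwapIntervalEXT" );
    if( wglSwapIntervalEXT )
        wglSwapIntervalEXT( bEnable ? 1 : 0 ); // 1 = wait for one VBI per swap, 0 = off
}
```
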

[Reference]

Posted By : Santhosh G.

Security Best Practices for C++

[Title]
Security Best Practices for C++

[Details]
This topic contains information about recommended security tools and practices. Using these resources and tools does not make applications immune from attack, but it makes successful attacks less likely.
Visual C++ Security Features
This section discusses security features that are built into the Visual C++ compiler and linker.
/GS (Buffer Security Check)
This compiler option instructs the compiler to insert overrun detection code into functions that are at risk of being exploited. When an overrun is detected, execution is stopped. By default this option is on.
/SAFESEH (Image has Safe Exception Handlers)
This linker option instructs the linker to include in the output image a table that contains the address of each exception handler. At runtime, the operating system uses this table to make sure that only legitimate exception handlers are executed. This helps prevent the execution of exception handlers introduced by a malicious attack at runtime. By default this option is disabled.
/NXCOMPAT (Compatible with Data Execution Prevention)
This linker option enables Data Execution Prevention (DEP) compatibility. DEP guards the CPU against executing non-code pages.
/analyze (Enterprise Code Analysis)
This compiler option activates code analysis that reports potential security issues such as buffer overruns, uninitialized memory, null pointer dereferencing, and memory leaks. By default this option is disabled. See Code Analysis for C/C++ Overview for more information.
/DYNAMICBASE (Use address space layout randomization)
This linker option enables building an executable image that can be loaded at different locations in memory at the beginning of execution. This option also makes the stack location in memory much less predictable.
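For a hypothetical module named app.cpp, the options above would be enabled on the command line roughly as follows (cl.exe/link.exe syntax; adapt to your build system):

```shell
REM Sketch: enabling the compiler- and linker-side options discussed above.
cl /GS /analyze /c app.cpp
link /SAFESEH /NXCOMPAT /DYNAMICBASE app.obj /OUT:app.exe
```
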
Security-Enhanced CRT
 
The C Runtime Library (CRT) has been augmented to include secure versions of functions that pose security risks. (The unchecked strcpy string copy function, for example.) The older, nonsecure versions of these functions are now deprecated, and therefore their use causes compile-time warnings. We strongly encourage you to use the secure versions of these CRT functions instead of choosing to suppress the compilation warnings.
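A small sketch of the difference (Visual C++ specific; strcpy_s is one of the secure CRT functions):

```cpp
#include <string.h>

void CopyName( const char *szSource )
{
    char szDest[16];
    // strcpy( szDest, szSource );                  // deprecated: no bounds information
    strcpy_s( szDest, sizeof(szDest), szSource );   // an oversized source invokes the
                                                    // invalid-parameter handler instead
                                                    // of silently overrunning szDest
}
```
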
 
SafeInt Library
 
SafeInt Library helps prevent integer overflows and other exploitable errors that might result when the application performs mathematical operations. The SafeInt library includes the SafeInt Class, the SafeIntException Class, and several SafeInt Functions.
The SafeInt class protects against integer overflow and divide-by-zero exploits. It lets you handle comparisons between values of different types, and provides two error handling policies. The default policy is for the SafeInt class to throw a SafeIntException class exception to report why a mathematical operation cannot be completed. The second policy is for the SafeInt class to stop program execution. You can also define a custom policy.
Each SafeInt function protects one mathematical operation from an exploitable error. You can use two different types of parameters without having to convert them to the same type. Use the SafeInt class to protect multiple mathematical operations.
Checked Iterators
 
A checked iterator is an iterator that enforces container boundaries. By default, when a checked iterator is out of bounds, it generates an exception and ends program execution. A checked iterator provides other levels of response that depend on values assigned to preprocessor defines such as _SECURE_SCL_THROWS and _ITERATOR_DEBUG_LEVEL. For example, at _ITERATOR_DEBUG_LEVEL=2, a checked iterator provides comprehensive correctness checks in debug mode that are made available by using asserts.
 
Code Analysis for Managed Code
 
Code Analysis for Managed Code, also known as FxCop, is a tool which checks assemblies for conformance to the Microsoft .NET Framework Design Guidelines. FxCop analyzes the code and metadata within each assembly to check for defects in the following areas:
  • Library design
  • Localization
  • Naming conventions
  • Performance
  • Security
Code Analysis for Managed Code is included in Visual Studio Application Lifecycle Management.
Windows Application Verifier
 
Available as part of the Application Compatibility Toolkit, the Application Verifier (AppVerifier) is a tool that can help developers identify potential application compatibility, stability, and security issues.
The AppVerifier monitors how an application uses the operating system. It watches the file system, registry, memory, and APIs while the application is running, and recommends source-code level fixes for the issues it uncovers.
The verifier lets you perform the following:
  • Test for potential application compatibility errors caused by common programming mistakes.
  • Examine an application for memory-related issues.
  • Test an application's compliance with the requirements for current logo programs such as the Windows 7 Software Logo Program and Windows Server 2008 R2 Logo Program.
  • Identify potential security issues in an application.
[Reference]

Posted By : Preethymol K. S 
 

Compiler Intrinsics

[Tip]
Most functions are contained in libraries, but some functions are built in (that is, intrinsic) to the compiler. These are referred to as intrinsic functions or intrinsics.

[Detail]
If a function is an intrinsic, the code for that function is usually inserted inline, avoiding the overhead of a function call and allowing highly efficient machine instructions to be emitted for that function. An intrinsic is often faster than the equivalent inline assembly, because the optimizer has built-in knowledge of how many intrinsics behave, so some optimizations can be available that are not available when inline assembly is used. Also, the optimizer can expand the intrinsic differently, align buffers differently, or make other adjustments depending on the context and arguments of the call.
The use of intrinsics affects the portability of code, because intrinsics that are available in Visual C++ might not be available if the code is compiled with other compilers and some intrinsics that might be available for some target architectures are not available for all architectures. However, intrinsics are usually more portable than inline assembly. The intrinsics are required on 64-bit architectures where inline assembly is not supported.
Some intrinsics, such as __assume and __ReadWriteBarrier, provide information to the compiler, which affects the behavior of the optimizer.
Some intrinsics are available only as intrinsics, and some are available both in function and intrinsic implementations. You can instruct the compiler to use the intrinsic implementation in one of two ways, depending on whether you want to enable only specific functions or you want to enable all intrinsics. The first way is to use #pragma intrinsic(intrinsic-function-name-list). The pragma can be used to specify a single intrinsic or multiple intrinsics separated by commas. The second is to use the /Oi (Generate Intrinsic Functions) compiler option, which makes all intrinsics on a given platform available. Under /Oi, use #pragma function(intrinsic-function-name-list) to force a function call to be used instead of an intrinsic. If the documentation for a specific intrinsic notes that the routine is only available as an intrinsic, then the intrinsic implementation is used regardless of whether /Oi or #pragma intrinsic is specified. In all cases, /Oi or #pragma intrinsic allows, but does not force, the optimizer to use the intrinsic. The optimizer can still call the function.
Some standard C/C++ library functions are available in intrinsic implementations on some architectures. When calling a CRT function, the intrinsic implementation is used if /Oi is specified on the command line.
A header file, Intrin.h, is available that declares prototypes for the intrinsic functions. Additionally, certain Windows headers declare functions that map onto a compiler intrinsic.

[Reference]

Posted By : Preethymol K. S 

Discovering Memory Leaks

Title - Discovering Memory Leaks
Details - If you are working on a serious project, you will probably use a sophisticated memory manager which will discover memory leaks (along with their exact position in the code),
but if you just want to do a quick test in a small program, there is an easy way to let the debugger check for memory leaks:

- Include crtdbg.h in your project (this file is included in the Microsoft Platform SDK)
- At the beginning of main() or WinMain() put this code:

int flag = _CrtSetDbgFlag(_CRTDBG_REPORT_FLAG);
flag |= _CRTDBG_LEAK_CHECK_DF;
_CrtSetDbgFlag(flag);
This works in Visual C++ and, if you run the program in debug mode, it will report memory leaks in the debug output window
("Immediate Window" in the latest Visual Studio version).
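Put together, a minimal test program with a deliberate leak looks like this (Visual C++, debug build; the leak report is dumped when the program exits):

```cpp
#include <crtdbg.h>
#include <stdlib.h>

int main()
{
    // Turn on automatic leak checking at program exit.
    int flag = _CrtSetDbgFlag( _CRTDBG_REPORT_FLAG );
    flag |= _CRTDBG_LEAK_CHECK_DF;
    _CrtSetDbgFlag( flag );

    int *leak = (int*)malloc( 10 * sizeof(int) );   // deliberately never freed
    (void)leak;
    return 0;   // the debug CRT reports the leaked block in the output window
}
```
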
 
Reference-
http://tipsandtricks.runicsoft.com/Cpp/DiscoveringMemoryLeaks.html
 
Posted By : Preethymol K. S