Wednesday, March 9, 2011

Run our code in another process. Simple method with CreateRemoteThread & LoadLibrary

Inject our code into another process using CreateRemoteThread & LoadLibrary.

In general, any process can load a DLL dynamically by using the LoadLibrary API. If we use CreateRemoteThread(), we can start the LoadLibrary function in another process; in effect, our library is loaded into that process.
Both LoadLibrary and FreeLibrary are functions residing in kernel32.dll. Because kernel32 is guaranteed to be present and at the same load address in every "normal" process (of the same bitness, within one boot session), the address of LoadLibrary/FreeLibrary is the same in every process too.
The steps to load a DLL in another process:
1. Retrieve a HANDLE to the remote process (OpenProcess).
2. Allocate memory for the DLL name in the remote process (VirtualAllocEx).
3. Write the DLL name, including full path, to the allocated memory (WriteProcessMemory).
4. Map your DLL to the remote process via CreateRemoteThread & LoadLibrary.
5. Wait until the remote thread terminates (WaitForSingleObject); this is until the call to LoadLibrary returns. Put another way, the thread will terminate as soon as our DllMain (called with reason DLL_PROCESS_ATTACH) returns.
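6. Retrieve the exit code of the remote thread (GetExitCodeThread). Note that this is the value returned by LoadLibrary, thus the base address (HMODULE) of our mapped DLL. On 64-bit Windows the exit code is a 32-bit DWORD, so it carries only the low 32 bits of the HMODULE.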
7. Free the memory allocated in Step #2 (VirtualFreeEx).
8. Unload the DLL from the remote process via CreateRemoteThread & FreeLibrary. Pass the HMODULE handle retrieved in Step #6 to FreeLibrary (via lpParameter in CreateRemoteThread).
   Note: If your injected DLL spawns any new threads, be sure they are all terminated before unloading it.
9. Wait until the thread terminates (WaitForSingleObject).

The attached sample application demonstrates the usage of CreateRemoteThread.
Run InjectApp, enter the ID of the target process, and press the Inject button; InjectApp then injects “InjectLibrary.dll” into the specified process.
InjectLibrary creates a file from its DllMain(). This file logs the current process ID, to prove that the library was started in another process.
// Implementation of InjectLibrary.dll (needs <windows.h> and <stdio.h>)
BOOL APIENTRY DllMain( HMODULE hModule, DWORD ul_reason_for_call, LPVOID lpReserved )
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
        {
            // Prepare a file at D:\File.txt to prove which process loaded this library.
            FILE *pFile = fopen( "D:\\File.txt", "w+" );
            if( pFile != NULL )
            {
                fprintf( pFile, "Dll Started From Process ID:%lu", GetCurrentProcessId() );
                fclose( pFile );
            }
        }
        break;
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
    case DLL_PROCESS_DETACH:
        break;
    }
    return TRUE;
}


Posted By : Santhosh G.

Triple Buffering: Third buffer for fast graphics display

Modern graphics cards actually have two buffers, the Primary Buffer and the Secondary Buffer, often also called the Front Buffer and the Back Buffer. Both are storage areas in the Video RAM of the graphics card, and the process of using two buffers at one time is called Double Buffering. Only relatively recently have graphics cards had enough VRAM to provide two buffers at all resolutions, since a single frame of detailed high-resolution graphics can take up a great deal of video memory, let alone two of them.

The graphics card uses the secondary buffer to compose a new frame while the primary buffer is sending an existing completed frame to the monitor. When these tasks are done, the buffers are essentially 'flipped' around so that the recently completed frame in the secondary buffer now becomes the primary buffer ready to send to the monitor, while a new frame begins composing in what was the primary buffer a moment ago. This is repeated over and over and thus the use of two buffers means that the graphics card is not constantly waiting for a single frame buffer to be cleared before getting on with rendering more frames to store there. It's like putting out a fire using two buckets of water instead of just one - one bucket can be filled with water while the contents of the other is being thrown on the fire, and then they're switched and the process repeated; much faster than just using a single bucket.

There is still a problem with double buffering, and that is when VSync is enabled, the graphics card can often fill both buffers and then have to stop working on any new frames until the monitor indicates it is ready for a new frame for its next refresh. Only then can the graphics card clear the primary buffer, switch buffers and begin rendering the next frame in the secondary buffer. This waiting is what causes a drop in FPS when VSync is enabled on many systems.

Wouldn't it then make sense to have more than two buffers? Why not three buffers for example - that would give the graphics card more room to render frames without having to worry about where to store them before they're sent to the monitor, even if VSync is enabled. Well there is an option which does just that, called Triple Buffering. And it generally does precisely what the name implies, it creates a third buffer in the VRAM, which we can call the Tertiary buffer.
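
The buffer rotation can be sketched in a few lines of C++. This is only a toy model of one common variant (the display always picks up the most recently completed frame), not how any particular driver implements it. The TripleBuffer type and its member names are made up for illustration, and a "frame" is reduced to a frame number.

```cpp
#include <array>
#include <utility>

// Toy model of triple buffering: three slots, each holding a frame number.
// All names here are hypothetical; real drivers manage VRAM surfaces instead.
struct TripleBuffer {
    std::array<int, 3> frames{{-1, -1, -1}}; // -1 means "nothing rendered yet"
    int front = 0;           // slot currently being sent to the monitor
    int back = 1;            // slot the renderer is drawing into
    int spare = 2;           // third (tertiary) slot
    bool spareReady = false; // true when the spare slot holds a finished frame

    // The renderer finished frame `id` in the back buffer. It parks the frame
    // in the spare slot and continues drawing at once, without waiting for
    // the monitor.
    void finishFrame(int id) {
        frames[back] = id;
        std::swap(back, spare);
        spareReady = true;
    }

    // The monitor signals the vertical blanking interval. If a finished frame
    // is waiting, it becomes the new front buffer. Returns true on a flip.
    bool onVBlank() {
        if (!spareReady) return false;
        std::swap(front, spare);
        spareReady = false;
        return true;
    }

    int displayed() const { return frames[front]; }
};
```

If finishFrame() is called twice between two refreshes, the older finished frame is simply overwritten later, so the renderer never stalls; with only two buffers it would have to wait for the flip.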


Posted By : Santhosh G.

CUDA: GPGPU without Graphics Knowledge.

CUDA is NVIDIA’s parallel computing architecture. It enables dramatic increases in computing performance by harnessing the power of the GPU.
"CUDA programming" and "GPGPU programming" are not the same (although CUDA runs on GPUs). Previously, writing software for a GPU meant programming in the language of the GPU. CUDA permits working with familiar programming concepts while developing software that can run on a GPU. It also avoids the performance overhead of graphics layer APIs by compiling your software directly to the hardware (GPU assembly language, for instance), thereby providing great performance.
With millions of CUDA-enabled GPUs sold to date, software developers, scientists and researchers are finding broad-ranging uses for CUDA, including image and video processing, computational biology and chemistry, fluid dynamics simulation, CT image reconstruction, seismic analysis, ray tracing, and much more.
Computing is evolving from "central processing" on the CPU to "co-processing" on the CPU and GPU. To enable this new computing paradigm, NVIDIA invented the CUDA parallel computing architecture that is now shipping in GeForce, ION, Quadro, and Tesla GPUs, representing a significant installed base for application developers.

Here is a very simple code example.
Sample code to increment all elements of an array on the host (CPU) and on the device (GPU):
void incrementArrayOnHost(float *a, int N)
{
  int i;
  for (i=0; i < N; i++) a[i] = a[i]+1.f;
}

// In the kernel on the CUDA-enabled device, several built-in variables are available
// that were set by the execution configuration of the kernel invocation.
// They are:
//   blockIdx  contains the block index within the grid.
//   threadIdx contains the thread index within the block.
//   blockDim  contains the number of threads in a block.
__global__ void incrementArrayOnDevice(float *a, int N)
{
  int idx = blockIdx.x*blockDim.x + threadIdx.x;
  if (idx<N) a[idx] = a[idx]+1.f;
}

// Calling the CUDA kernel to increment an array [a_d] of size ARRAY_MAX
incrementArrayOnDevice <<< 1, ARRAY_MAX >>> (a_d, ARRAY_MAX); // a_d is the float array on the GPU, ARRAY_MAX is the size of the array.
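
The launch above uses a single block, so it only works while ARRAY_MAX stays within the device's limit on threads per block. For larger arrays the usual approach is to launch several blocks and let the idx < N test mask off the surplus threads. The following plain C++ sketch emulates that indexing on the CPU so the arithmetic can be checked without a GPU; blocksFor and incrementArrayEmulated are hypothetical helper names, not CUDA API functions.

```cpp
#include <cassert>
#include <vector>

// Ceiling division: the number of blocks needed so that
// blocks * threadsPerBlock >= n. (Hypothetical helper, not part of CUDA.)
int blocksFor(int n, int threadsPerBlock) {
    assert(n >= 0 && threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// CPU emulation of the kernel: the two loops stand in for blockIdx.x and
// threadIdx.x, and threadsPerBlock plays the role of blockDim.x.
void incrementArrayEmulated(std::vector<float>& a, int threadsPerBlock) {
    const int n = static_cast<int>(a.size());
    const int gridDim = blocksFor(n, threadsPerBlock);
    for (int blockIdx = 0; blockIdx < gridDim; ++blockIdx) {
        for (int threadIdx = 0; threadIdx < threadsPerBlock; ++threadIdx) {
            const int idx = blockIdx * threadsPerBlock + threadIdx;
            if (idx < n) a[idx] = a[idx] + 1.f; // same guard as in the kernel
        }
    }
}
```

On the device, the corresponding launch would pass blocksFor(N, threadsPerBlock) as the first execution-configuration argument and threadsPerBlock as the second.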


Posted By : Santhosh G.

Vertical Synchronization: How to avoid tearing effect in high FPS display

When displaying graphics at a high FPS, the tearing effect can be avoided by enabling Vertical Sync.
Screen tearing is a visual artifact in video where information from two or more different frames is shown in a display device in a single screen draw.

The artifact occurs when the video feed sent to the device isn't in sync with the display's refresh, be it due to non-matching refresh rates, or simply lack of sync between the two. During video motion, screen tearing creates a torn look as edges of objects (such as a wall or a tree) fail to line up.
Tearing can occur with most common display technologies and video cards, and is most noticeable in situations where horizontally moving visuals are common, such as an indicator moving horizontally across the screen.

Vertical Synchronization, also called Vertical Sync, or simply VSync for short, was primarily required because of the physical limitations of CRT monitors. A CRT monitor has to constantly light up the phosphors on the screen many times per second to maintain an image, and can only do this a certain number of times per second based on how fast the electron gun in the monitor can move. Each time it has to redraw the entire screen again, it moves the electron gun inside the monitor from the bottom of the screen to point to the top left of the screen, ready to 'repaint' all the lines on the screen from top left to bottom right, and back again for the next refresh. The period during which the electron gun moves to the top of the screen for a new refresh is called the Vertical Blanking Interval (VBI).

Enabling VSync tells your graphics card to synchronize its actions with your monitor. That means the graphics card is only allowed to swap its frame buffer and send a new frame to the monitor when the monitor says it is ready to repaint a new screen, i.e. during the Vertical Blanking Interval (VBI). Your graphics card and monitor do not have to be in sync; they can still operate properly when VSync is disabled. However, with VSync disabled you can experience a phenomenon called Tearing in periods when your graphics card and monitor go out of sync, precisely because the two are acting without regard for each other's limitations.


It is an unfortunate fact that if you disable VSync, your graphics card and monitor will inevitably go out of sync. Whenever your FPS exceeds the refresh rate (e.g. 120 FPS on a 60Hz screen), or in general at any point during which your graphics card is working faster than your monitor, the graphics card produces more frames in the frame buffer than the monitor can actually display. The end result is that when the monitor goes to get a new frame from the primary buffer of the graphics card during VBI, the frame may be made up of two or more different frames overlapping each other. This results in the onscreen image appearing to be slightly out of alignment or 'torn' in parts whenever there is any movement, and thus it is referred to as Tearing.

In OpenGL on Windows, the WGL_EXT_swap_control extension can be used to turn VSync on or off: wglSwapIntervalEXT(1) enables it and wglSwapIntervalEXT(0) disables it.


Posted By : Santhosh G.

Security Best Practices for C++

This topic contains information about recommended security tools and practices. Using these resources and tools does not make applications immune from attack, but it makes successful attacks less likely.
Visual C++ Security Features
This section discusses security features that are built into the Visual C++ compiler and linker.
/GS (Buffer Security Check)
This compiler option instructs the compiler to insert overrun detection code into functions that are at risk of being exploited. When an overrun is detected, execution is stopped. By default this option is on.
/SAFESEH (Image has Safe Exception Handlers)
This linker option instructs the linker to include in the output image a table that contains the address of each exception handler. At runtime, the operating system uses this table to make sure that only legitimate exception handlers are executed. This helps prevent the execution of exception handlers introduced by a malicious attack at runtime. By default this option is disabled.
/NXCOMPAT (Compatible with Data Execution Prevention)
This linker option enables Data Execution Prevention (DEP) compatibility. DEP guards the CPU against executing non-code pages.
/analyze (Enterprise Code Analysis)
This compiler option activates code analysis that reports potential security issues such as buffer overrun, un-initialized memory, null pointer dereferencing, and memory leaks. By default this option is disabled. See Code Analysis for C/C++ Overview for more information.
/DYNAMICBASE (Use address space layout randomization)
This linker option enables building an executable image that can be loaded at different locations in memory at the beginning of execution. This option also makes the stack location in memory much less predictable.
Security-Enhanced CRT
The C Runtime Library (CRT) has been augmented to include secure versions of functions that pose security risks. (The unchecked strcpy string copy function, for example.) The older, nonsecure versions of these functions are now deprecated, and therefore their use causes compile-time warnings. We strongly encourage you to use the secure versions of these CRT functions instead of choosing to suppress the compilation warnings.
SafeInt Library
SafeInt Library helps prevent integer overflows and other exploitable errors that might result when the application performs mathematical operations. The SafeInt library includes the SafeInt Class, the SafeIntException Class, and several SafeInt Functions.
The SafeInt class protects against integer overflow and divide-by-zero exploits. It lets you handle comparisons between values of different types, and provides two error handling policies. The default policy is for the SafeInt class to throw a SafeIntException class exception to report why a mathematical operation cannot be completed. The second policy is for the SafeInt class to stop program execution. You can also define a custom policy.
Each SafeInt function protects one mathematical operation from an exploitable error. You can use two different types of parameters without having to convert them to the same type. Use the SafeInt class to protect multiple mathematical operations.
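
To make the kind of error concrete, here is a hand-written overflow check for a single addition in portable C++. It is not the SafeInt API; checkedAdd is a hypothetical helper that only illustrates the test a checked operation has to perform before doing the arithmetic.

```cpp
#include <limits>

// Adds a and b only if the sum fits in an int. Returns false, leaving result
// untouched, when the addition would overflow.
// (Hypothetical helper for illustration, not part of the SafeInt library.)
bool checkedAdd(int a, int b, int& result) {
    const int maxInt = std::numeric_limits<int>::max();
    const int minInt = std::numeric_limits<int>::min();
    if (b > 0 && a > maxInt - b) return false; // would exceed the maximum
    if (b < 0 && a < minInt - b) return false; // would drop below the minimum
    result = a + b;
    return true;
}
```

The comparisons are arranged so that the check itself never overflows. A library such as SafeInt packages this kind of test for every operation and type combination, so the caller does not have to write it by hand.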
Checked Iterators
A checked iterator is an iterator that enforces container boundaries. By default, when a checked iterator is out of bounds, it generates an exception and ends program execution. A checked iterator provides other levels of response that depend on values assigned to preprocessor defines such as _SECURE_SCL_THROWS and _ITERATOR_DEBUG_LEVEL. For example, at _ITERATOR_DEBUG_LEVEL=2, a checked iterator provides comprehensive correctness checks in debug mode, that are made available by using asserts.
Code Analysis for Managed Code
Code Analysis for Managed Code, also known as FxCop, is a tool which checks assemblies for conformance to the Microsoft .NET Framework Design Guidelines. FxCop analyzes the code and metadata within each assembly to check for defects in the following areas:
  • Library design
  • Localization
  • Naming conventions
  • Performance
  • Security
Code Analysis for Managed Code is included in Visual Studio Application Lifecycle Management.
Windows Application Verifier
Available as part of the Application Compatibility Toolkit, the Application Verifier (AppVerifier) is a tool that can help developers identify potential application compatibility, stability, and security issues.
The AppVerifier monitors how an application uses the operating system. It watches the file system, registry, memory, and APIs while the application is running, and recommends source-code level fixes for the issues it uncovers.
The verifier lets you perform the following:
  • Test for potential application compatibility errors caused by common programming mistakes.
  • Examine an application for memory-related issues.
  • Test an application's compliance with the requirements for current logo programs such as the Windows 7 Software Logo Program and Windows Server 2008 R2 Logo Program.
  • Identify potential security issues in an application.

Posted By : Preethymol K. S 

Compiler Intrinsics

Most functions are contained in libraries, but some functions are built in (that is, intrinsic) to the compiler. These are referred to as intrinsic functions or intrinsics.

If a function is an intrinsic, the code for that function is usually inserted inline, avoiding the overhead of a function call and allowing highly efficient machine instructions to be emitted for that function. An intrinsic is often faster than the equivalent inline assembly, because the optimizer has a built-in knowledge of how many intrinsics behave, so some optimizations can be available that are not available when inline assembly is used. Also, the optimizer can expand the intrinsic differently, align buffers differently, or make other adjustments depending on the context and arguments of the call.
The use of intrinsics affects the portability of code, because intrinsics that are available in Visual C++ might not be available if the code is compiled with other compilers and some intrinsics that might be available for some target architectures are not available for all architectures. However, intrinsics are usually more portable than inline assembly. The intrinsics are required on 64-bit architectures where inline assembly is not supported.
Some intrinsics, such as __assume and __ReadWriteBarrier, provide information to the compiler, which affects the behavior of the optimizer.
Some intrinsics are available only as intrinsics, and some are available both in function and intrinsic implementations. You can instruct the compiler to use the intrinsic implementation in one of two ways, depending on whether you want to enable only specific functions or you want to enable all intrinsics. The first way is to use #pragma intrinsic(intrinsic-function-name-list). The pragma can be used to specify a single intrinsic or multiple intrinsics separated by commas. The second is to use the /Oi (Generate Intrinsic Functions) compiler option, which makes all intrinsics on a given platform available. Under /Oi, use #pragma function(intrinsic-function-name-list) to force a function call to be used instead of an intrinsic. If the documentation for a specific intrinsic notes that the routine is only available as an intrinsic, then the intrinsic implementation is used regardless of whether /Oi or #pragma intrinsic is specified. In all cases, /Oi or #pragma intrinsic allows, but does not force, the optimizer to use the intrinsic. The optimizer can still call the function.
Some standard C/C++ library functions are available in intrinsic implementations on some architectures. When calling a CRT function, the intrinsic implementation is used if /Oi is specified on the command line.
A header file, Intrin.h, is available that declares prototypes for the intrinsic functions. Additionally, certain Windows headers declare functions that map onto a compiler intrinsic.
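
As a small illustration, the sketch below requests the intrinsic form of _BitScanReverse with #pragma intrinsic when it is built with Visual C++ and falls back to a plain loop elsewhere, which also shows the portability trade-off mentioned above. The choice of _BitScanReverse and the wrapper name highestSetBit are assumptions made for this example.

```cpp
#include <cassert>

#if defined(_MSC_VER)
#include <intrin.h>
#pragma intrinsic(_BitScanReverse) // ask for the intrinsic implementation
#endif

// Returns the zero-based index of the highest set bit in mask.
// mask must be nonzero. (Hypothetical wrapper written for this example.)
unsigned long highestSetBit(unsigned long mask) {
    assert(mask != 0);
    unsigned long index = 0;
#if defined(_MSC_VER)
    _BitScanReverse(&index, mask); // typically compiles to one instruction
#else
    while (mask >>= 1) ++index;    // portable fallback for other compilers
#endif
    return index;
}
```

Calling code sees an ordinary function either way; only the wrapper needs to know which compiler is in use.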


Posted By : Preethymol K. S 

Discovering Memory Leaks

Title - Discovering Memory Leaks
Details - If you are working on a serious project, you will probably use a sophisticated memory manager that discovers memory leaks (along with their exact position in the code), but if you just want to do a quick test in a small program, there is an easy way to let the debugger check for memory leaks:

- Include crtdbg.h in your project (this file is included in the Microsoft Platform SDK).
- At the beginning of main() or WinMain(), put this code:

int flag = _CrtSetDbgFlag(_CRTDBG_REPORT_FLAG) | _CRTDBG_LEAK_CHECK_DF; // current flags plus a leak check at exit
_CrtSetDbgFlag(flag); // store the updated flags

This works in Visual C++: if you run the program in debug mode, memory leaks are reported in the debug output window ("Immediate Window" in the latest Visual Studio version).
Posted By : Preethymol K. S 

Tuesday, March 1, 2011

Debugging Tips - @CLK

We can use the 'Watch' window to get timing information. @CLK can serve as a timer in your Watch window. In many cases, we just want a rough idea of the time between two points, and @CLK makes it easy to find out how long it took to execute the code between two breakpoints. Please note that this time includes the debugger overhead. The trick is to enter @CLK in the Watch window; the running time between two breakpoints is added to the current clock value. You can reset the value by typing @CLK=0 in the Watch window.
The time is in microseconds; to get the time in milliseconds, enter @CLK/1000 instead. Although it is not a perfect timer, @CLK is good enough for some general guesses.

<- Add break point here (B1)
<<Code block>>
<- Add another break point here (B2)

Debug the code using F5. Once you reach B1, type @CLK in the Watch window to display the current clock value, then type @CLK=0 to reset the clock. Press F5 to continue; once control reaches B2, the Watch window shows the time taken to execute the code block between B1 and B2.

Note: This does not include the time spent stopped at B1.

Posted By : Junaij M

Access Computer Information with the System Information Tool

Title - Access Computer Information with the System Information Tool

Details - Windows XP has a built-in tool that offers a wealth of information about your computer. System Information gives users rapid access to hardware resource and software environment information, component and application data, system history, useful tools such as Net Diagnostics and DirectX Diagnostics, and much more.

System Information can be a very useful tool for troubleshooting computer problems. For example, digging into items listed in the "Software Environment" can give you information about which programs have recently experienced serious errors, driver details, network connections, running tasks and more. The System History view records changes to hardware resources and the software environment and this can help you track down problems that may have occurred due to driver upgrades or hardware changes.

An integrated search feature allows you to more easily find what you are looking for.

To launch the System Information tool:

   1. Click "Start" and then "Run".
   2. In the Run field, enter "msinfo32.exe" without the quotes.
   3. Click "OK" and the tool should launch.

Alternatively, you can click "Start / All Programs / Accessories / System Tools / System Information" to open the tool directly from the Start Menu.

Posted By : Preethymol K. S

Why You Shouldn't Store auto_ptr Objects in STL Containers

Title - Why You Shouldn't Store auto_ptr Objects in STL Containers

Details - Copying or assigning one auto_ptr to another changes the original in addition to making the expected changes in the copy. To be more specific, the original object transfers ownership of the pointer to the target, thus making the pointer in the original null. Imagine what would happen if you did something like this:

std::vector <std::auto_ptr <Foo> > vf; /* a vector of auto_ptrs */
// ...fill vf
int g()
{
  std::auto_ptr <Foo> temp = vf[0]; /* vf[0] becomes null */
  return 0;
}

When temp is initialized, the pointer of vf[0] becomes null. Any attempt to use that element will cause a runtime crash. This situation is likely to occur whenever you copy an element from the container. Remember that even if your code doesn't perform any explicit copy or assignment operations, many algorithms (std::swap(), std::random_shuffle() etc.) create a temporary copy of one or more container elements. Furthermore, certain member functions of the container create a temporary copy of one or more elements, thereby nullifying them. Any subsequent attempt to use those container elements therefore has undefined behavior.
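
For comparison, here is a minimal sketch using std::unique_ptr, which was not part of the original tip; it arrived with C++11, where auto_ptr was deprecated (auto_ptr was later removed in C++17). A unique_ptr cannot be copied at all, so the silent ownership transfer shown above becomes a compile error, and any transfer has to be spelled out with std::move. Foo, makeFoos and takeElement are placeholder names for this example.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Foo {
    int value;
    explicit Foo(int v) : value(v) {}
};

// Builds a vector that owns n Foo objects with values 0..n-1.
std::vector<std::unique_ptr<Foo>> makeFoos(int n) {
    std::vector<std::unique_ptr<Foo>> v;
    for (int i = 0; i < n; ++i)
        v.push_back(std::unique_ptr<Foo>(new Foo(i)));
    return v;
}

// Explicitly takes ownership of element i. The slot stays in the vector but
// holds a null pointer afterwards, and that is visible at the call site.
std::unique_ptr<Foo> takeElement(std::vector<std::unique_ptr<Foo>>& v,
                                 std::size_t i) {
    // Writing "std::unique_ptr<Foo> temp = v[i];" would not compile.
    return std::move(v[i]);
}
```

When several owners really are needed, std::shared_ptr is the usual choice for container elements, since copying it shares ownership instead of stealing it.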

 Posted By : Preethymol K. S

Integrated Performance Primitives (IPP)

Tip - About Integrated Performance Primitives (IPP)

Details -  Multicore Power for Multimedia and Data Processing

Intel® Integrated Performance Primitives (Intel® IPP) is an extensive library of multicore-ready, highly optimized software functions for multimedia, data processing, and communications applications. Intel IPP offers thousands of optimized functions covering frequently used fundamental algorithms.


Intel IPP functions are designed to deliver performance beyond what optimized compilers alone can deliver, by matching the function algorithms to low-level optimizations based on the processor's available features such as Streaming SIMD Extensions (SSE) and other optimized instruction sets.
Support for multicore processors
Intel® IPP functions are fully thread-safe, and many are internally threaded to help you get the most out of today’s multicore processors.

Operating systems

Use the same API for application development on multiple operating systems: Windows*, Linux* and Mac OS*.

Programming languages

Intel® IPP natively supports C and C++ development; cross-language usage examples provided for C#/.NET and Java*.

Processor support

Intel® IPP is validated for use with multiple generations of Intel® and compatible processors including but not limited to: Intel® Atom™ processor, Intel® Core™2 processor, Intel® Core™ processor, Intel® Pentium® D processor, Intel® Pentium® M processor, Intel® Xeon™ processor, Intel® Pentium® 4 processor, Intel® Celeron® processor.

Source code usage samples

Jumpstart your application development with source code samples incorporating Intel® IPP, including video/audio/speech codecs, image processing, data compression, and other high-level algorithm implementations.

Support for future instruction sets and additional CPU cores

Intel® IPP is optimized for current multicore and future manycore processors. As new instruction sets become supported in Intel CPUs, just relink with the latest version of Intel IPP to achieve the greater application performance provided by the new instruction sets.

Royalty-free redistribution

Redistribute unlimited copies of the runtime libraries with your application.


Posted By : Binu M D 

Wednesday, February 23, 2011

Debugging a blue screen

Title - Debugging a blue screen

Details - Have you ever wondered how to obtain extra information from the infamous Blue Screen of Death (BSOD) that sometimes shows up and gives you a cryptic "Stop: 0x00000000" error message before flashing off the screen? The error message is trying to point you to a fatal operating system error that could be caused by a number of problems. When the system encounters a hardware problem, data inconsistency, or similar error, it may display a blue screen containing information that can be used to determine the cause of the error. This information includes the STOP code and whether a crash dump file was created. It may also include a list of loaded drivers and a stack trace.
Microsoft’s WinDBG will help you to debug and diagnose the problem and then lead you to the root cause so you can fix it.
Steps for analysis
  1. Create and capture the memory dump associated with the BSOD you are trying to troubleshoot.
  2. Install and configure WinDBG and set the Symbols path to the correct Symbols folder.
  3. Use WinDBG to debug and analyze the memory dump, and then get to the root cause of the problem.
A minidump is a smaller version of a complete or kernel memory dump. Usually Microsoft will want a kernel memory dump, but the debugger will analyze a minidump and quite possibly give the information needed to resolve the problem. If it's all you have, then debug it rather than waiting for the machine to crash again. Open the file in the debugger (see below) just as you would open memory.dmp.
Steps to create memory dump
Keep in mind that if you are not experiencing a blue screen fatal system error, there will be no memory dump to capture.
1. Press the WinKey + Pause.
2. Click Advanced, and under Startup and Recovery click Settings.
3. Uncheck Automatically Restart.
4. Click on the dropdown arrow under Write Debugging Information.
5. Select Small Memory Dump (64 KB) and make sure the output is %SystemRoot%\Minidump.
6. Restart the PC normally; this allows the system to hit the error, show the blue screen, and then create the Minidump.
The Minidump files are written to %SystemRoot%\Minidump.
To download and install the Windows debugging tools for your version of Windows, visit the Microsoft Debugging Tools Web site.
Follow the prompts and, when you install, take note of your Symbols location if you accept the default settings. A Microsoft Support Knowledge Base article explains how to read the small memory dump files that Windows creates for debugging purposes.

Dump analysis using WinDBG

Open WinDBG, select File > Open Crash Dump, navigate to the minidump file created earlier, highlight it, and select Open.
Under Bugcheck Analysis, click on:
!analyze -v

Figure C: !analyze -v


In this example, the BSOD was caused by the installed driver software for a USB modem. The cause was found by using the WinDBG tool to debug and analyze the memory dump file.


Posted By : Binu M D