Archive | Programming and Code RSS for this section

Length of Array (and no we can’t always know that)

One way of checking whether an array is allocated correctly is the check if there are as many cells as you think there was going to be. In C++, this can be done using sizeof operator. Sizeof output the number of bytes an variable is taking up. So the size of an array can be found using the command

int num=5;
double array[num];

size=sizeof(array)/sizeof(*array);

Hang on! What’s the size of *array? Usually the asterisk indicate you’re working with a pointer. Basically, this part output the size of the variable contained in the first cell of the array.sizeof(array) gives the size of the entire array, so the code gives you the number of cells.

However, this method doesn’t work if the array is allocated dynamically using

int num=5;
double* array;

array=new double[num];

If we try to find the size using

size=sizeof(array)/sizeof(*array);

the result will be 1. Why is that? This is because array is declared as a pointer of type double, so it gives the size of a double variable, unlike when array is allocated statically. The sizeof(*array) will also give the size of a double variable. So no matter how large num is, this method will always give the size of 1.

How can we find the size of a dynamically allocated array? I don’t know a work around that doesn’t involve knowing the size of the array, because the array wouldn’t have any information that allows us to infer the size of the array.

Advertisements

C programming: multiple function, multiple source code and multi-dimensional array

This will be another technical note, so I don’t need to search for the solution again in the future. It is frustrating that the solution to a particular technical problem often are hard to find on the internet. It seems to me that a lot of people are asking, a lot are answering, but there isn’t a good systematic categorization to make finding the relevant solution easier.

Multiple function program

In the case where the program is long and complicated, but tend to repeating computing the same function, what can be done is we can write the repeating function and call it whenever we needed it. This reduce the tedious work of typing the same thing over and over again in the code.

A function  needs two things, the argument(s) and the return value. This is why the function itself has a type according to the value they return. The function, like variables, also needs to be declared before it can be called by the main function. (Just think about how would you know the phone number of a particular person whom you don’t know the person’s face or name!) If there is only one source file, declaring the function can be done in two ways:

1. function is declared and written before main function

#include <stdio.h>
#include <math.h>

double power_series(int d, /*order to truncate the series*/
double x)
{
double return=0;
int i=0;
for(i=0;i<=d;i++){
result=result+pow(x,i);
}
return(result);
}

int main(){
double result=0;
int d=4;
double x=3;
result=power_series(d,x);
return(0);

}

2. function is declared before the main function but written after the main function

#include <stdio.h>
#include <math.h>

double power_series(int d, double x); /*declaration of function*/<

int main(){
double result=0;
int d=4;
double x=3;
result=power_series(d,x);
return(0);

}
/*detail of the function*/
double power_series(int d, /*order to truncate the series*/
double x)
{
double return=0;
int i=0;
for(i=0;i<=d;i++){
result=result+pow(x,i);
}
return(result);
}

Similar principle goes for function on a different source file, the function must be declared before it can be called. Let’s say, we created source files called main.c for the main function and series.c for containing calculation of series. To link this together, we need a header called series.h to be added to both main.c and series.c. This header will contain all the necessary libraries and declaration of functions in series.c.

main.c series.h series.c
#include <stdio.h>
#include <series.h>
int main(){}
#include <math.h>
double series(int d, double x);
#include <series.h>double series(int d, double x){

}

Passing Array as Argument

The function can take as much argument as is needed, but can only return only one value. In a lot of cases, we need a function that can return multiple values of the calculation, an array of it even. To by pass the limitation of function return, we can instead instruction the function to place the value into a particular memory which is associated with a variable in the main function. What the function would need is a pointer to where they should place the function which can be passed into the function by

double power_series(int d, double x, double* y); /*passing location of y*/.

To pass the location to the function in, we use

power_series(d,x,&y); /*y is the variable that will hold the result from the function*/.

To assign the value to y inside the functions, we need to refer to *y instead of y.

Passing and array into the function is done on the same principle. When writing

power_series(d,x,y[ ]); /*passing y as array*/,

the function is receiving the pointer for array y[ ] which has to be a one-dimensional array (as limited as it is). The good thing is the handling of the array is easy in the function as it can just treat it as a variable. The problem, however, is when the array of more than one dimension has to be passed into the function.

A way to get around this we have to look at how an array is constructed in the memory. The elements of an array is stored consecutive to one another. In two-dimensional array, for example, will be stored as

|A[0][0]|A[0][1]|A[0][2]|A[1][0]|A[1][1]|A[1][2]|….

To access the element is then to call the correct value in the correct position which follows the relation

A[i][j]=*(A[i]+j).

This means in order to pass an array, only an address to an element in the array is needed regardless of the size and dimension of the array. So what we send into the function is then

power_series(d,x,&y[0][0]);

To assign value to the element of the array in the function is then

*(y+10)=result; /*assigning value to the 11th element of the array*/.

One thing to be careful about is how to address row and column. It is best to pass the dimensions of the array into the function in order to allow the computation of row and column.

Passing the array in this manner can also be used for structures. The difference is that the component has to be specified like this

(*(y+10)).component1=result;

The parenthesis before .component1 is absolutely essential to address the component correctly.

For further reading, see the reference.

References

1. 2D Arrays & pointer to a pointer(**): http://www.cs.cmu.edu/~ab/15-123S09/lectures/Lecture%2006%20-%20%20Pointer%20to%20a%20pointer.pdf

2. Pointers and Structures: http://www.taranets.net/cgi/ts/1.37/ts.ws.pl?w=329;b=282

Compiling and Running MPI Programs on WestGrid

This document is intended to be for my own reference for most part, but I will try to go through it step-by-step just so anybody new to using ssh client and WestGrid like myself would know what to expect as well. I would recommend that new users read the official QuickStart guide here, if you have not yet done so, before proceeding.

In order to use WestGrid, we’ll need a SSH (Secure Shell) client which act as a terminal for us to access the server and a file transfer client (we can’t use the same terminal for both functions, I’m afraid). I use PuTTY as SSH client and PSFTP for file transfer. These two comes together in an installation package. Other SSH clients and file transfer clients work as well, but please check the list given in the QuickStart guide for compatibility.

Transferring Files

In order to compile a program, we need to transfer the source code to the server first. The logging in is pretty simple as the terminal will guide you through it, but first you must choose the server to log into. In this case, I choose the Checkers on WestGrid. To decide which server to use depends on the task we want to do. I choose Checkers because I will be doing parallel processing using MPI and Checkers supports this.

psftp-login

To choose the server in PSFTP, simply type

open server-name.westgrid.ca

It will ask for the username and password in succession. Then, BOOM, you’re in!

There are a number of commands that we can use and we can access the list by typing help. Here are some useful ones to remember:

lcd [directory] This is for accessing local directory like using cd in local command line.
put [file] This is for transferring a file from local machine to the server.
get [file] This is for downloading a file from the server to the local machine.
ls This is for listing the files in your directory on the server
bye This is for leaving the server and PSFTP

I found it easier to first go to the folder where the source file is using lcd then uploading the file by using put. We can check if the file is there by using ls. Once the file is up on the server, we switch to PuTTY to compile the source code. We can log in through PuTTY and PSFTP at the same time, but the login has to be done separately.

Compiling the Source Code

Once the source files are on the server, we can use the compilers on the server. First, we log in using PuTTY. No extra setting is needed aside from specifying the server we want to log in. Once PuTTY starts to make the connection, it will ask for username and password again.

putty-login

The exact command depend entirely on what language the program was written in. The full description of the programming including compiling on Checkers can be found here. Since I coded my test in C with MPI, I use the command

mpicc -O3 main.c -lm -o main

which I take from here. This command use gcc as the underlying compiler (if you are not familiar with that, it’s a common c compiler in Unix and Linux), therefore the options following mpicc are the same as what you would put in for gcc. -O3 (capital O) is an optimization option. You can find more detail here. -lm link the code to standard math library, and -o [file] simply specify the output.

Submitting and Running the Program

To run the program on WestGrid, we need to send a job request to the system. This is because the computational resources are shared with other jobs and, especially in the case of the MPI, the number of processors and memories had to be put aside for large jobs. The resources on WestGrid is managed by TORQUE which takes the job request in form of batch job scripts. This has to be written on a separate file and uploaded onto the server into the same directory as the output of the compiler.

mainpbs

The .pbs file is written in shell script which always start with

#! /bin/bash

This command specify the path in which the code is to be interpret. It will try to find Bash execution in /bin.

The following lines specify the options for running the job which is written after #PBS. #PBS -S /bin/bash does pretty much the same thing as the line before which is specifying the shell to run the script in. Typically, WestGrid has default limit to how much processor, memory, and time we can use for a job. To specify our own limit we use the commands

#PBS -l procs=10 to specify number of processors for the job
#PBS -l mem=2000mb to specify the amount of memory needed
#PBS -l walltime=72:00:00 to specify the job runtime.

Different resources which can be set under -l can be done at once by separating them with commas or specify when submitting a job by putting the option after qsub command.

qsub -l procs=10,pmem=2gb,walltime=72:00:00 main.pbs

If the resources have already been specify, we can simply go qsub main.pbs. For details and options that WestGrid support, read here.

In the later part of the script # will be used as comment symbol.

cd $PBS_O_WORKDIR find the directory where the job is (essentially where the .pbs file is). $PBS_O_WORKDIR let TORQUE find the path automatically rather than typing the absolute path ourselves. The dollar sign indicates that it is a variable.

echo is used to display information of the job. However, the information is not displayed on the terminal and it is actually a better idea to store your output as a file on your server directory instead.

The command to run the program in my case is mpiexec ./main. Number of processors can also be specified here if it hasn’t been previously by using option -n [# of procs] before ./main. More option for command mpiexec can be found here.

Transferring Output File

Once WestGrid finish running the job, we should have the output file ready in the server directory. To move this file to local machine, we once again switched to PSFTP and use command get [file]. This will move the file to your local directory which is presumably the source code directory from earlier. Once we’re satisfy that that we have the correct result, we can log out from PSFTP using the command bye and from PuTTY by simply shutting down the window.

MATLAB Code for MOT Lifetime – Linear Fitting

% #1 read .cvs file
C=uiimport(‘FileFolder\C2mot-gradient200032.txt’);

x=1:length(C.data(:,1));
plot(x,C.data(:,2));

% #2 find I_max
B=C.data(14000:17000,2);
I_max=mean(B);

w=log(I_max-C.data(:,2));

% #5 linear fit =>> slope is -1/tau =>> This is the first return value of
%polyfit
p_flu=polyfit(C.data(7200:10000,1),w(7200:10000),1);
tau=-1/p_flu(1)

MLoad= I_max-(exp(p_flu(2)).*(exp(-C.data(:,1)/tau)));
plot(x(5000:25000),C.data(5000:25000,2),x(5000:25000),MLoad(5000:25000))

MATLAB Code for Area Fit

%% Cloud area and Volume

% Imaging parameters
Magn=1; %magnification of the imaging system
px_sizew=3.75*10^(-3); %width of CCD pixel-in mm
px_sizeh=3.75*10^(-3); %height of CCD pixel-in mm
px_area=px_sizew*px_sizeh; %area for each pixel in mm^2

%% load image
C1=imread(‘C:\Users\Anarisil\Desktop\MOTPhase2\20120502\7-4\50ms.jpg’);

CAvrg=double(C1);

%% Dimension of the picture

[px_height,px_width]=size(CAvrg);
x=1:px_width;
y=1:px_height;

%% finding the maximum value of that point and the 1/e value
MaxInt=max(max(CAvrg));
threshold=MaxInt/exp(1);

%% %% counting
px_count=0;

for i=1:px_height
for j=1:px_width
if CAvrg(i,j)>=threshold
px_count=px_count+1;
else
end
end
end

%px_count will give the number of pixels with values above the threshold

%% determining the volume
%assumption: x=y=z
V(1,1)=px_count*px_area;
V(2,1)=4/(3*sqrt(pi))*(V(1,1))^(3/2); %in mm^3

MATLAB Code for Cloud Size – Gaussian Fit

% Imaging parameters
Magn=1; %magnification of the imaging system
px_sizew=3.75*10^(-3); %width of CCD pixel-in mm
px_sizeh=3.75*10^(-3); %height of CCD pixel-in mm

%% Loading files and average%

%import background

bck1=imread(‘FileFolder\bck1.jpg’);

BckAvrg=double(bck1);

%import array
%C1=dlmread(‘FileFolder\1ms-10to18-10steps-50mscomp-control5averaged’,’\t’);

%import image

C1=imread(‘FileFolder\40ms.jpg’);
CAvrg=double(C1);

%% Dimension of the picture

[px_height,px_width]=size(CAvrg);
x=1:px_width;
y=1:px_height;

%% reduce data to one dimension%
windowSize = 10;
Sx=filter(ones(1,windowSize)/windowSize,1,sum(CAvrg));
%fit the overall sum
lnSx=log(Sx);

p=polyfit(x(700:850),lnSx(700:850),2);
A2=p(1);
A1=p(2);
A0=p(3);
sigmaSx=sqrt(-1/(2*A2));
muSx=A1*sigmaSx^2;
ASx=exp(A0+muSx^2/(2*sigmaSx^2));
subplot(2,2,1),
plot(x,Sx,x,ASx*exp(-(x-muSx).^2/(2*sigmaSx.^2)));

%Sy
Sy=filter(ones(1,windowSize)/windowSize,1,sum(CAvrg’));
lnSy=log(Sy);

p=polyfit(y(500:600),lnSy(500:600),2);
A2=p(1);
A1=p(2);
A0=p(3);
sigmaSy=sqrt(-1/(2*A2));
muSy=A1*sigmaSy^2;
ASy=exp(A0+muSy^2/(2*sigmaSy^2));
subplot(2,2,2),
plot(y,Sy,y,ASy*exp(-(y-muSy).^2/(2*sigmaSy.^2)));

%% find center – the highest peak from the sum%
[peakx,Cenx]=max(ASx*exp(-(x-muSx).^2/(2*sigmaSx.^2)));
[peaky,Ceny]=max(ASy*exp(-(y-muSy).^2/(2*sigmaSy.^2)));

Cenx=Cenx
Ceny=Ceny

%Single out the row/column to represent each dimension and normalized%

%% ### finding the Gaussian fit for X ###

Ex=0;

%Use one slice of the image
%Ex=CAvrg(Ceny,:);

%OR add n slices around the center of the cloud
n=10;
for i=Ceny-(n/2):Ceny+(n/2)
Ex=Ex+CAvrg(i,:);
end
Ex=Ex/(n+1);

%plot(Ex)
ExMin=mean(Ex(1200:length(Ex)));
NormEx=Ex-ExMin;

%smoothing NormEx
windowSize = 1; % 1== no smoothing
fEx=filter(ones(1,windowSize)/windowSize,1,NormEx);

lnEx=log(fEx);

p=polyfit(x(630:830),lnEx(630:830),2);
A2=p(1);
A1=p(2);
A0=p(3);
sigmax=sqrt(-1/(2*A2));
mux=A1*sigmax^2;
Ax=exp(A0+mux^2/(2*sigmax^2));
subplot(2,2,3),
plot(x,fEx,x,Ax*exp(-(x-mux).^2/(2*sigmax.^2)));
rx=sqrt(2)*px_sizew*sigmax./Magn % in mm

%Regression Dianogsis: R-squared
x_bar=mean(fEx);
x_residual=(fEx-Ax*exp(-(x-mux).^2/(2*sigmax.^2))).^2;
Sx_error=sum(x_residual);
dev_Avrgx=(fEx-x_bar).^2;
Sx_total=sum(dev_Avrgx);
Rx_Squared=1-(Sx_error/Sx_total);

%Regression Dianogsis: comparing integration
rez_x=abs((trapz(fEx)-trapz(Ax*exp(-(x-mux).^2/(2*sigmax.^2))))/trapz(fEx))

%% ### finding the Gaussian fit for Y ###

Ey=0;

%Use one slice of the image
%Ey=CAvrg(:,Cenx);
%Cenx=Cenx;
%OR add n slices around the center of the cloud

n=10;
for i=Cenx-(n/2):Cenx+(n/2)
Ey=Ey+CAvrg(:,i);
end
Ey=Ey/(n+1);

EyMin=mean(Ey(1:100));
NormEy=Ey-EyMin;

%smoothing NormEy
windowSize = 1;
fEy=filter(ones(1,windowSize)/windowSize,1,NormEy);

lnEy=log(double(fEy));
y=(1:px_height)’;

p=polyfit(y(500:630),lnEy(500:630),2);
A2=p(1);
A1=p(2);
A0=p(3);
sigmay=sqrt(-1/(2*A2));
muy=A1*sigmay^2;
Ay=exp(A0+muy^2/(2*sigmay^2));
subplot(2,2,4),
plot(y,fEy,y,Ay*exp(-(y-muy).^2/(2*sigmay.^2)))
ry=sqrt(2)*px_sizeh*sigmay./Magn %in mm

%Regression Dianogsis: R-squared
y_bar=mean(fEy);
y_residual=(fEy-Ay*exp(-(y-muy).^2/(2*sigmay.^2))).^2;
Sy_error=sum(y_residual);
dev_Avrgy=(fEy-y_bar).^2;
Sy_total=sum(dev_Avrgy);
Ry_Squared=1-(Sy_error/Sy_total)

%Regression Dianogsis: comparing integration
rez_y=abs((trapz(fEy)-trapz(Ay*exp(-(y-muy).^2/(2*sigmay.^2))))/trapz(fEy))