CUDA synchronization and timing

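Synchronizing a block

__syncthreads()
acts as a barrier: every thread in the block waits until all threads in that block have reached it.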

Synchronizing kernels

cudaDeviceSynchronize()
waits until all preceding commands in all streams of all host threads have completed.

Synchronizing a stream

cudaStreamSynchronize()
takes a stream as a parameter and waits until all preceding commands in the given stream have completed. It can be used to synchronize the host with a specific stream, allowing other streams to continue executing on the device.

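Although CUDA kernel launches are asynchronous, all GPU-related tasks placed in one stream (the default behaviour) are executed sequentially.
If a kernel uses printf, the host must call cudaDeviceSynchronize() after the launch, because the launch is asynchronous; otherwise the program may finish before the output is flushed and nothing is printed.
CUDA provides two ways to synchronize the host with kernels:

  • Call cudaThreadSynchronize() for explicit synchronization: the host blocks and waits until all submitted kernels have finished. This function is deprecated; cudaDeviceSynchronize() is its replacement.
  • Use cudaMemcpy() for a blocking data transfer, which implicitly synchronizes with previously launched kernels.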

 
 

Installing and using Marvin

 
Installing cuDNN

cp lib* cudnn_dir/lib64/
cp cudnn.h cudnn_dir/include/
cd cudnn_dir/lib64
export LD_LIBRARY_PATH=$(pwd):$LD_LIBRARY_PATH

If you get the error "error while loading shared libraries: libcudnn.so.4: cannot open shared object file: No such file or directory", it is a file permission problem, and the following steps fix it:

cd cudnn_dir/lib64
rm -rf libcudnn.so libcudnn.so.4
chmod u=rwx,g=rx,o=rx libcudnn.so.4.0.4
ln -s libcudnn.so.4.0.4 libcudnn.so.4
ln -s libcudnn.so.4 libcudnn.so

 
Marvin depends on CUDA 7.5 and cuDNN 4rc.
curl -L https://github.com/PrincetonVision/marvin/tarball/master | tar zx
mv PrincetonVision* marvin && cd marvin
./compile.sh

Dependent types and typename in C++ templates

Qualified and unqualified names

A qualified name is one that specifies a scope. For instance, in the following C++ program, the references to cout and endl are qualified names:

#include <iostream>
int main()  {
   std::cout << "Hello world!" << std::endl;
}

In both cases, the use of cout and endl began with std::.

Dependent and non-dependent names

A dependent name is a name that depends on a template parameter. Suppose we have the following declaration (not legal C++):

template <class T>
class MyClass {
   int i;
   vector<int> vi;
   vector<int>::iterator vitr;
   T t;
   vector<T> vt;
   vector<T>::iterator viter;
};

The types of the first three declarations are known at the time of the template declaration. However, the types of the second set of three declarations are not known until the point of instantiation, because they depend on the template parameter T.
The names T, vector<T>, and vector<T>::iterator are called dependent names, and the types they name are dependent types. The names used in the first three declarations are called non-dependent names, and the types are non-dependent types.
In the code below, const_iterator is a dependent type, so typename must be written in front of it. Otherwise the compiler cannot tell whether the name is a type or a value, and in some cases the parse is ambiguous.

template<typename C>
bool lastGreaterThanFirst(const C& container){
	if(container.empty())
		return false;
	typename C::const_iterator begin(container.begin());
	typename C::const_iterator end(container.end());
	return *--end > *begin;
}

A more detailed explanation follows.

template <class T>
void foo() {
   T::iterator * iter;
   ...
}

Suppose we define a class with a nested type:

class ContainsAType {
   class iterator { ... };
   ...
};

foo<ContainsAType>();  In this case, iter would be declared as a pointer to the type T::iterator.
But if someone declares a class as follows,

class ContainsAValue {
   static int iterator;
};

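foo<ContainsAValue>(); In this case, T::iterator is a static variable, so the same line would be an expression that multiplies T::iterator by something named iter. The line therefore has two possible parses, a pointer declaration or a multiplication, and the ambiguity could only be resolved after instantiation.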
Before a qualified dependent type, you need typename. Without typename, there is a C++ parsing rule that says that qualified dependent names should be parsed as non-types even if it leads to a syntax error. 
This gives me a headache, so I am bookmarking it for now.
Reference: http://pages.cs.wisc.edu/~driscoll/typename.html
 
 

STL series: list

template < class T, class Alloc = allocator<T> > class list;
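list is a sequence container implemented as a doubly linked list. Compared with other sequence containers, list performs better when inserting, removing, moving, or extracting elements at any position, but it performs poorly for random access, because reaching an element requires walking the list.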

Member functions

Constructors:
default (1)           list();
                      explicit list (const allocator_type& alloc);
fill (2)              explicit list (size_type n, const allocator_type& alloc = allocator_type());
                      list (size_type n, const value_type& val, const allocator_type& alloc = allocator_type());
range (3)             template <class InputIterator>
                      list (InputIterator first, InputIterator last, const allocator_type& alloc = allocator_type());
copy (4)              list (const list& x);
                      list (const list& x, const allocator_type& alloc);
move (5)              list (list&& x);
                      list (list&& x, const allocator_type& alloc);
initializer list (6)  list (initializer_list<value_type> il, const allocator_type& alloc = allocator_type());

operator=:
copy (1)              list& operator= (const list& x);
move (2)              list& operator= (list&& x);
initializer list (3)  list& operator= (initializer_list<value_type> il);

Usage:

std::list<int> first (3);      // list of 3 zero-initialized ints
std::list<int> second (5);     // list of 5 zero-initialized ints
second = first;
first = std::list<int>();

Iterators

begin(), end(), rbegin(), rend(), cbegin(), cend(), crbegin(), crend()
When the container is empty, the iterator returned by begin() must not be dereferenced; in that case begin() and end() return the same iterator.

Capacity

bool empty() const noexcept;              // whether the list is empty
size_type size() const noexcept;          // number of elements in the list
size_type max_size() const noexcept;

Element access

std::list::front( )

reference front();
const_reference front() const;

std::list::back( )

reference back();
const_reference back() const;

Modifiers