无锁编程

无锁编程有两个关键点:

  • 原子操作或者cas原语
  • aba问题
inline bool CAS(pointer_t *addr, pointer_t &old_value, pointer_t &new_value)
{
    bool ret;
    __asm__ __volatile__(
    "lock cmpxchg16b %1;\n"
    "sete %0;\n"
    :"=m"(ret),"+m" (*(volatile pointer_t *) (addr))
    :"a" (old_value.ptr), "d" (old_value.tag), "b" (new_value.ptr), "c" (new_value.tag));
    return ret;
}

lock锁地址总线
aba问题:在读取v值和进行cas操作前,如果有一个线程将v值由a改成b,另外一个线程在将b改成了a,就会cas操作混乱。解决方法是用标记或版本编号与进行CAS操作的每个值相关联,并原子的更新值和标记。
 
入队列

EnQueue (x) //进队列改良版
{
    q = new record ();
    q->value = x;
    q->next = NULL;
    p = tail;
    oldp = p
    do {
        while (p->next != NULL)
            p = p->next;
    } while( CAS (p.next, NULL, q) != TRUE); //如果没有把结点链上,再试
    CAS (tail, oldp, q); //置尾结点
}

出队列

DeQueue () //出队列
{
    do{
        p = head;
        if (p->next == NULL){
            return ERR_EMPTY_QUEUE;
        }
    while( CAS (head, p, p->next);
    return p->next->value;
}

 
https://www.codeproject.com/Articles/153898/Yet-another-implementation-of-a-lock-free-circular
http://www.drdobbs.com/parallel/writing-lock-free-code-a-corrected-queue/210604448?pgno=1
https://blog.csdn.net/chenjiayi_yun/article/details/16333595
https://blog.csdn.net/mergerly/article/details/39009473 scsp无锁kfifo队列
 
 
 
 

CUDA Dynamic Parallelism

 
CUDA 5.0中引入动态并行化,使得在device端执行的kernel的线程也能跟在host上一样launch kernels,只有支持CC3.5或者以上的设备中才能支持。动态并行化使用CUDA Device Runtime library(cudadevrt),它是一个能在device code中调用的CUDA runtime子集。

编译链接

为了支持动态并行化,必须使用两步分离编译和链接的过程:首先,设定-c和-rdc=true(–relocatable-device-code=true)来生成relocatable device code来进行后续链接,可以使用-dc(–device -c)来合并这两个选项;然后将上一步目标文件和cudadevrt库进行连接生成可执行文件,-lcudadevrt。过程如下图Figure 4: The Separate Compilation and Linking Process for Dynamic Parallelism.

nvcc -arch=sm_35 -dc myprog.cu -o myprog.o
nvcc -arch=sm_35 myprog.o -lcudadevrt -o myprog

或者简化成一步

nvcc -arch=sm_35 -rdc=true myprog.cu -lcudadevrt -o myprog.o

执行、同步

在CUDA编程模型中,一组执行的kernel的线程块叫做一个grid。在CUDA动态并行化,parent grid能够调用child grids。child grid继承parant grid的特定属性和限制,如L1 cache、shared_memory、栈大小。如果一个parent grid有M个block和N个thread,如果对child kernel launch没有控制的话,那个将产生M*N个child kernel launch。如果想一个block产生一个child kernel,那么只需要其中一个线程launch a kernel就行。如下

if(threadIdx.x == 0) {
  child_k <<< (n + bs - 1) / bs, bs >>> ();
}

grid lanuch是完全嵌套的,child grids总是在发起它们的parent grids结束前完成,这可以看作是一个一种隐式的同步。

如果parent kernel需要使用child kernel的计算结果,也可以使用CudaDeviceSynchronize(void)进行显示的同步,这个函数会等待一个线程块发起的所有子kernel结束。往往不知道一个线程块中哪些子kernel已经执行,可以通过下述方式进行一个线程块级别的同步

void threadBlockDeviceSynchronize(void) {
  __syncthreads();
  if(threadIdx.x == 0)
    cudaDeviceSynchronize();
  __syncthreads();
}

CudaDeviceSynchronize(void)调用开销较大,不是必须的时候,尽量减少使用,同时不要在父kernel退出时调用,因为结束时存在上述介绍的隐式同步。

内存一致

当子 grids开始与结束之间,父grids和子grids有完全一致的global memory view。

当子kernel launch的时候,global memory视图不一致。

__device__ int v = 0;
__global__ void child_k(void) {
  printf("v = %d\n", v);
}
__global__ void parent_k(void) {
  v = 1;
  child_k <<< 1, 1 >>>> ();
  v = 2; // RACE CONDITION
  cudaDeviceSynchronize();
}

在子kernel launch之后,显示同步之前,parent grid不能对 child grid读取的内存做写入操作,否则会造成race condition。

向Child grids传递指针

指针的传递存在限制:

  • 可以传递的指针:global memory(包括__device__变量和malloc分配的内存),zero-copy host端内存,常量内存。
  • 不可以传递的指针:shared_memory(__shared__变量), local memory(包括stack变量)

Device Streams和Events

所有在device上创建的streams都是non-blocking的,不支持默认NULL stream的隐式同步。创建流的方式如下

cudaStream_t s;
cudaStreamCreateWithFlags(&s, cudaStreamNonBlocking);

一旦一个device stream被创建,它能被一个线程块中其他线程使用。只有当这个线程块完成执行的时候,这个stream才能被其他线程块或者host使用。反之亦然。
Event也是支持的,不过有限制,只支持在不同stream之间使用cudaStreamWaitEvent()指定执行顺序,而不能使用event来计时或者同步。

Recursion Depth和Device Limits

递归深度包括两个概念:

  • nesting depth:递归grids的最大嵌套层次,host端的为0;
  • synchronization depth:cudaDeviceSynchronize()能调用的最大嵌套层次,host端为1,cudaLimitDevRuntimeSyncDepth应该设定为maximum 所以你吃肉你咋体on depth加1,设定方式如 cudaDeviceLimit(cudaLimitDevRuntimeSyncDepth, 4).

maximum nesting depth有硬件限制,在CC3.5中, 对depth 的限制为24. synchronization depth也一样。
从外到内,直到最大同步深度,每一次层会保留一部分内存来保存父block的上下文数据,即使这些内存没有被使用。所以递归深度的设定需要考虑到每一层所预留的内存。
另外还有一个限制是待处理的子grid数量。pending launch buffer用来维持launch queue和追踪当前执行kernel的状态。通过

cudaDeviceSetLimit(cudaLimitDevRuntimePendingLaunchCount, 32768);

来设定合适的限制。否则通过cudaGetLastError()调用可以返回CudaErrorLaunchPendingCountExceeded的错误。
动态并行化执行有点类似树的结构,但与CPU上树处理也有些不同。类似深度小,分支多,比较茂密的树的执行结构,比较适合动态并行化的处理。深度大,每层节点少的树的执行结构,则不适合动态并行化。

characteristic tree processing dynamic parallelism
node thin (1 thread) thick (many threads)
branch degree small (usually < 10) large (usually > 100)
depth large small

参考:http://devblogs.nvidia.com/parallelforall/cuda-dynamic-parallelism-api-principles/

cuda 同步与计时

同步block

_syncthreads()

同步kernel

cudaDeviceSynchronize()
waits until all preceding commands in all streams of all host threads have completed.

同步stream

cudaStreamSynchronize()
takes a stream as a parameter and waits until all preceding commands in the given stream have completed. It can be used to synchronize the host with a specific stream, allowing other streams to continue executing on the device.

Although CUDA kernel launches are asynchronous, all GPU-related tasks placed in one stream (which is default behaviour) are executed sequentially.
如果在kernel中使用printf,因为kernel调用是异步的,所以要使用DeviceSynchronize()进行同步,否则没有输出。
CUDA提供了两种对kernel进行同步的方式:

  • 使用cudaThreadSynchronize()进行显示同步,使主机进入阻塞状态,停止运行并等待所有已经提交的kernel执行完毕。
  • 利用cudaMemcpy()实现阻塞式数据传输,实际上内部调用了cudaThreadSynchronize()。

 
 

marvin安装与使用

 
cuDNN安装

cp lib* cudnn_dir/lib64/
cp cudnn.h cudnn_dir/include/
cd cudnn_dir
export LD_LIBRARY_PATH=`pwd`:$LD_LIBRARY_PATH

如出现error while loading shared libraries: libcudnn.so.4: cannot open shared object file: No such file or directory错误,是文件权限问题,可进行如下操作

cd cudnn_dir
rm -rf libcudnn.so libcudnn.so.4
chmod u=rwx,g=rx,o=rx libcudnn.so.4.0.4
ln -s libcudnn.so.4.0.4 libcudnn.so.4
ln -s libcudnn.so.4 libcudnn.so

 
marvin依赖cuda 7.5和cuDNN 4rc
curl -L https://github.com/PrincetonVision/marvin/tarball/master | tar zx
mv PrincetonVision* marvin && cd marvin
./compile.sh

c++模版中的dependent type和typename

Qualified and unqualified names

A qualified name is one that specifies a scope. For instance, in the following C++ program, the references to cout and endl are qualified names:

#include <iostream>
int main()  {
   std::cout << "Hello world!" << std::endl;
}

In both cases, the use of cout and endl began with std::.

Dependent and non-dependent names

A dependent name is a name that depends on a template parameter. Suppose we have the following declaration (not legal C++):

template <class T>
class MyClass {
   int i;
   vector<int> vi;
   vector<int>::iterator vitr;
   T t;
   vector<T> vt;
   vector<T>::iterator viter;
};

The types of the first three declarations are known at the time of the template declaration. However, the types of the second set of three declarations are not known until the point of instantiation, because they depend on the template parameter T.
The names T, vector<T>, and vector<T>::iterator are called dependent names, and the types they name are dependent types. The names used in the first three declarations are called non-dependent names, at the types are non-dependent types.
如下面一段代码中,const_iterator是从属类型,需要在它前面加上typename。否则,在某些情况下,会导致编译解析时产生二义性。

template<typename C>
bool lastGreaterThanFirst(const C& container){
	if(container.empty())
		return false;
	typename C::const_iterator begin(container.begin());
	typename C::const_iterator end(container.end());
	return *--end > *begin;
}

下面进行详细解释

template <class T>
void foo() {
   T::iterator * iter;
   ...
}

如果定义一个嵌套类型的类,

class ContainsAType {
   class iterator { ... }:
   ...
};

foo<ContainsAType>();  在这种情况下,iter将会被声明为一个指向T::iterator 类型的指针变量。
但是如果有人按以下方式声明类,

class ContainsAValue {
   static int iterator;
};

foo<ContainsAValue>(); 在这种情况下,将会有两种解析结果:一个叫做iter的变量,或者静态变量T::iterator。只有在实例化后才能消除他们之间的歧义。
Before a qualified dependent type, you need typename. Without typename, there is a C++ parsing rule that says that qualified dependent names should be parsed as non-types even if it leads to a syntax error. 
头疼,先mark下来。。
参考:http://pages.cs.wisc.edu/~driscoll/typename.html
 
 

STL系列——list

template < class T, class Alloc = allocator<T> > class list;
list是一个序列容器,是以双端链表的形式实现的。list相对于其他序列容器,在任意位置的删除、移动、提取元素的性能更好,但是在随机访问上的性能表现比较差。

Member functions

default (1)	list();
                explicit list (const allocator_type& alloc);
fill (2)	explicit list (size_type n, const allocator_type& alloc = allocator_type());
                list (size_type n, const value_type& val,const allocator_type& alloc = allocator_type());
range (3)	template <class InputIterator>
                    list (InputIterator first, InputIterator last,const allocator_type& alloc = allocator_type());
copy (4)	list (const list& x);
                list (const list& x, const allocator_type& alloc);
move (5)	list (list&& x);
                list (list&& x, const allocator_type& alloc);
initializer list (6)
                list (initializer_list<value_type> il,const allocator_type& alloc = allocator_type());
copy (1)	list& operator= (const list& x);
move (2)	list& operator= (list&& x);
initializer list (3)
                list& operator= (initializer_list<value_type> il);

用法:

std::list<int> first (3);      // list of 3 zero-initialized ints
std::list<int> second (5);     // list of 5 zero-initialized ints
second = first;
first = std::list<int>();

Iterators

begin(), end(), rbegin(), rend(), cbegin(), cend(), crbegin(), crend()
当容器为空时,begin() 返回的迭代器不能解引用,end( )和begin( )返回一样。

Capacity

bool empty() const noexcept;              // 判断是否为空
size_type size() const noexcept;          // list中元素个数
size_type max_size() const noexcept;

Element access

std::list::front( )

 reference front();
const_reference front() const;

std::list::back( )

reference back();
const_reference back() const;

Modifiers

 

STL系列——vector

 
template < class T, class Alloc = allocator<T> > class vector; // generic template
vector用来表示数组,是大小可以动态改变的序列容器。vector内部使用动态分配的数组来存储元素。当元素增加需要分配更多空间的时候,需要将原始数组中内容拷贝到新的数组,然后销毁原始数组。z再分配是一个比较耗时的过程。为了内存使用和再分配之间的平衡,数组的长度的增长,不同的库有b不同的实现策略。
如gcc(这里使用版本是 gcc 5.2), 第一次分配时,capacity() == size()。如果后面增加内容,假设增加长度len, 原始vector长度为size( ),
若len > size(), 重新分配空间,增长后,capacity() = len + size();
若 len <= size( ) 并且 len + size()> capacity(),重新分配空间,增长后,capacity() = size()*2 ;
若len <= size( ) 并且 len+ size( ) <= capacity( ),不重新分配。

构造函数

default (1)   vector();
              explicit vector (const allocator_type& alloc);
fill (2)      explicit vector (size_type n, const allocator_type& alloc = allocator_type());
              vector (size_type n, const value_type& val,
                        const allocator_type& alloc = allocator_type());
range (3)     template <class InputIterator>
                    vector (InputIterator first, InputIterator last,
                                const allocator_type& alloc = allocator_type());
copy (4)      vector (const vector& x);
              vector (const vector& x, const allocator_type& alloc);
move (5) vector (vector&& x);
vector (vector&& x, const allocator_type& alloc);
initializer list (6) vector (initializer_list<value_type> il,
       const allocator_type& alloc = allocator_type());

几种使用方法

// constructors used in the same order as described above:
std::vector<int> first;                                // empty vector of ints
std::vector<int> second (4,100);                       // four ints with value 100
std::vector<int> third (second.begin(),second.end());  // iterating through second
std::vector<int> fourth (third);                       // a copy of third
// the iterator constructor can also be used to construct from arrays:
int myints[] = {16,2,77,29};
std::vector<int> fifth (myints, myints + sizeof(myints) / sizeof(int) );

赋值构造函数

copy (1)               vector& operator= (const vector& x);
move (2)               vector& operator= (vector&& x);
initializer list (3)   vector& operator= (initializer_list<value_type> il);

迭代器

begin(), end(), rbegin(), rend(), cbegin(), cend(), crbegin(), crend()
当容器为空时,begin() 返回的迭代器不能解引用。(string为空时,begin() 可以解引用)

Capacity

size_type size() const noexcept;
size_type max_size() const;           // return the maximum number of elements that the vector can holds. due to known systems or library implementation limitations.

void resize (size_type n);
void resize (size_type n, const value_type& val);

如果n小于当前容器size(), 只保留开头的n个元素;如果n大于容器size(), 通过插入进行扩展。同时如果n 大于当前capacity(), 一个自动空间重分配操作会执行。
size_type capacity() const noexcept;       // 返回当前vector分配的存储空间大小(expressed in terms of elements)
bool empty() const noexcept;
void reserve (size_type n);         // request a change in capacity   当n大于当前capacity时,会导致容器重分配空间;否则没有影响。
void shrink_to_fit();    // 要求容器减小他的capacity到适合他的size,没有强制要求,具体示实现。可能会导致reallocation,但是不会修改vector的size或者内容。

Element Access

// Returns a reference to the element at position n in the vector container.
 reference operator[] (size_type n);
const_reference operator[] (size_type n) const;
// Access element
// Returns a reference to the element at position n in the vector.
reference at (size_type n);
const_reference at (size_type n) const;
//Access first element
//Returns a reference to the first element in the vector.
reference front();
const_reference front() const;
//Access last element
//Returns a reference to the last element in the vector.
 reference back();
const_reference back() const;
// Returns a direct pointer to the memory array used internally by the vector to store its owned elements.
 value_type* data() noexcept;
const value_type* data() const noexcept;

Modifier

range (1)        template <class InputIterator>
                     void assign (InputIterator first, InputIterator last);
fill (2)         void assign (size_type n, const value_type& val);
initializer list (3)  void assign (initializer_list<value_type> il);

range适用于迭代器或者数组
push_back( )、pop_back( )

single element (1)    iterator insert (const_iterator position, const value_type& val);
fill (2)              iterator insert (const_iterator position, size_type n, const value_type& val);
range (3)             template <class InputIterator>
                           iterator insert (const_iterator position, InputIterator first, InputIterator last);
move (4)              iterator insert (const_iterator position, value_type&& val);
initializer list (5)  iterator insert (const_iterator position, initializer_list<value_type> il);
iterator erase (const_iterator position);
iterator erase (const_iterator first, const_iterator last);

删除指定位置的元素,或者一段区间的元素
返回值为指向删除元素后第一个元素的iterator

void swap (vector& x);

与另一个有相同类型(T, Alloc)的容器的内容进行交换。 包括内容、capacity。其实就是相当于两个指针换了指向的位置,对应的属性(data、size、capacity等)也改变了。

void clear() noexcept;

清除vector里面的内容,将容器大小减小到0。不保证有内存的重新分配。

vector<int>( ).swap(x)

可以强制清空vector x,并重分配空间

template <class... Args>
iterator emplace (const_iterator position, Args&&... args);

C++11新特性,Construct and insert element。

The container is extended by inserting a new element at position. This new element is constructed in place using args as the arguments for its construction.
This effectively increases the container size by one.
template <class... Args>
  void emplace_back (Args&&... args);

Construct and insert element at the end

Allocator

allocator_type get_allocator() const noexcept;
// Returns a copy of the allocator object associated with the vector.

例子

// vector::get_allocator
#include <iostream>
#include <vector>
int main ()
{
  std::vector<int> myvector;
  int * p;
  unsigned int i;
  // allocate an array with space for 5 elements using vector's allocator:
  p = myvector.get_allocator().allocate(5);
  // construct values in-place on the array:
  for (i=0; i<5; i++) myvector.get_allocator().construct(&p[i],i);
  std::cout << "The allocated array contains:";
  for (i=0; i<5; i++) std::cout << ' ' << p[i];
  std::cout << '\n';
  // destroy and deallocate:
  for (i=0; i<5; i++) myvector.get_allocator().destroy(&p[i]);
  myvector.get_allocator().deallocate(p,5);
  return 0;
}

关系操作符

(1) template <class T, class Alloc>
  bool operator== (const vector<T,Alloc>& lhs, const vector<T,Alloc>& rhs);
(2) template <class T, class Alloc>
  bool operator!= (const vector<T,Alloc>& lhs, const vector<T,Alloc>& rhs);
(3) template <class T, class Alloc>
  bool operator<  (const vector<T,Alloc>& lhs, const vector<T,Alloc>& rhs);
(4) template <class T, class Alloc>
  bool operator<= (const vector<T,Alloc>& lhs, const vector<T,Alloc>& rhs);
(5) template <class T, class Alloc>
  bool operator>  (const vector<T,Alloc>& lhs, const vector<T,Alloc>& rhs);
(6) template <class T, class Alloc>
  bool operator>= (const vector<T,Alloc>& lhs, const vector<T,Alloc>& rhs);

== 操作首先比较size,如果相同,然后使用operator==顺序比较,直到第一个不匹配的
<操作则使用operator<进行顺序比较
其他比较操作内部使用 operator == 或者 operator < 操作进行比较

template <class T, class Alloc>
  void swap (vector<T,Alloc>& x, vector<T,Alloc>& y);

交换操作,类似于x.swap(y)

模板特化

template < class T, class Alloc = allocator<T> > class vector; // generic template
template <class Alloc> class vector<bool,Alloc>;               // bool specialization

 

STL系列——string

 
说到string,不得不先介绍basic_string。basic_string是各种字符类型string的泛化。

std::basic_string

template < class charT,
           class traits = char_traits<charT>,    // basic_string::traits_type
           class Alloc = allocator<charT>        // basic_string::allocator_type
           > class basic_string;
Generic string class
The basic_string is the generalization of class string for any character type (see string for a description).

Template parameters

charT
Character type.
The string is formed by a sequence of characters of this type.
This shall be a non-array POD type.
traits
Character traits class that defines essential properties of the characters used by basic_string objects (see char_traits).
traits::char_type shall be the same as charT.
Aliased as member type basic_string::traits_type.
Alloc
Type of the allocator object used to define the storage allocation model. By default, the allocator class template is used, which defines the simplest memory allocation model and is value-independent.
Aliased as member type basic_string::allocator_type.

Note: Because the first template parameter is not aliased as any member type, charT is used throughout this reference to refer to this type.

Template instantiations

      u16string  String of 16-bit characters (class )
      u32string   String of 32-bit characters (class )
string是basic_string类模板的一个实例化,使用char类型,同时使用basic_string默认的char_trailts和allocator类型。

构造函数

default (1)        string();
copy (2)           string (const string& str);
substring (3)      string (const string& str, size_t pos, size_t len = npos);
from c-string (4)  string (const char* s);
from sequence (5)  string (const char* s, size_t n);
fill (6)           string (size_t n, char c);
range (7)          template <class InputIterator>
                   string  (InputIterator first, InputIterator last);
initializer list (8) string (initializer_list<char> il);
move (9)             string (string&& str) noexcept;

使用介绍

std::string s0 ("Initial string");
  // constructors used in the same order as described above:
  std::string s1;
  std::string s2 (s0);
  std::string s3 (s0, 8, 3);                       // from pos = 8, len = 3
  std::string s4 ("A character sequence", 6);      // first n characters pointed by s
  std::string s5 ("Another character sequence");   // pointed by s
  std::string s6a (10, 'x');                   // Fills the string with n consecutive copies of character c.
  std::string s6b (10, 42);      // 42 is the ASCII code for '*'
  std::string s7 (s0.begin(), s0.begin()+7);

赋值运算符 std::string::operator=

string (1)           string& operator= (const string& str);
c-string (2)         string& operator= (const char* s);
character (3)        string& operator= (char c);
initializer list (4) string& operator= (initializer_list<char> il);
move (5)             string& operator= (string&& str) noexcept;

使用介绍

str1 = "Test string: ";   // c-string
str2 = 'x';               // single character
str3 = str1 + str2;       // string

迭代器

begin()、end()       an iterator or a const_iterator(if string object is const)
rbegin()、rend()     a reverse_iterator or a const_reverse_iterator
cbegin()、cend()     const_iterator
crbegin()、crend()

Capacity

size_t size() const noexcept;
size_t length() const noexcept;
// size 和 length相同,返回string实际字节数,不一定和capacity不同
void resize (size_t n);
void resize (size_t n, char c);     // 调整string的长度到length,如果n < length截短;如果n > length,用字符c进行填充,c没有指定,填充空字符
size_t max_size() const noexcept;  // string能够到达的最大长度,与操作系统或者库的实现有关
size_t capacity() const noexcept;  // 当前给string分配的空间,当前空间用完时,自动扩展,理论最大长度为max_size();任意时刻都能修改;可以被reserve()显式地修改
void reserve (size_t n = 0);       //如果n>capacity(),增加capacity()到n;其他情形不强制约定;reserver不改变string内容和length
void clear() noexcept;              // 清空string中内容
bool empty() const noexcept;        // 判断string内容是否为空
void shrink_to_fit();                // C++11, 减少capacity到string的size;不是强制的,具体试容器实现;不改变string内容和length

元素访问

      char& operator[] (size_t pos);
const char& operator[] (size_t pos) const;
// 返回pos位置指向字符的引用,如果pos == length,返回空字符引用 如果string是常量,返回 const char&,否则返回char&
char& at (size_t pos);
const char& at (size_t pos) const;  // 类似于[],不过这里可以检测pos是否小于length,否则抛出异常out_of_range()
char& back();
const char& back() const;  // 返回string中最后一个字符的引用,string不能为空,否则抛出异常
 char& front();
const char& front() const;  // 返回string中第一个字符的引用,string不能为空,否则抛出异常

Modifiers

// std::string::operator+=
// Append to string
// return value: *this
string (1)           string& operator+= (const string& str);
c-string (2)         string& operator+= (const char* s);
character (3)        string& operator+= (char c);
initializer list (4) string& operator+= (initializer_list<char> il);
// Extends the string by appending additional characters at the end of its current value

 

// std::string::append
// Append to string. Extends the string by appending additional characters at the end of its current value
string (1)        string& append (const string& str);
substring (2)     string& append (const string& str, size_t subpos, size_t sublen = npos);
c-string (3)      string& append (const char* s);
buffer (4)        string& append (const char* s, size_t n);
fill (5)          string& append (size_t n, char c);
range (6)         template <class InputIterator>
                      string& append (InputIterator first, InputIterator last);
initializer list(7) string& append (initializer_list<char> il);
void push_back (char c);
// Appends character c to the end of the string, increasing its length by one.
// string::push_back
#include <iostream>
#include <fstream>
#include <string>
int main ()
{
  std::string str;
  std::ifstream file ("test.txt",std::ios::in);
  if (file) {
    while (!file.eof()) str.push_back(file.get());
  }
  std::cout << str << '\n';
  return 0;
}
// return value: *this
string (1)        string& assign (const string& str);
substring (2)     string& assign (const string& str, size_t subpos, size_t sublen = npos);
c-string (3       string& assign (const char* s);
buffer (4)        string& assign (const char* s, size_t n);
fill (5)          string& assign (size_t n, char c);
range (6)         template <class InputIterator>
                     string& assign (InputIterator first, InputIterator last);
initializer list(7) string& assign (initializer_list<char> il);
move (8)            string& assign (string&& str) noexcept;
// Inserts additional characters into the string right before the character indicated by pos (or p):
// return value
// The signatures returning a reference to string, return *this.
// Those returning an iterator, return an iterator pointing to the first character inserted.
string (1)          string& insert (size_t pos, const string& str);
substring (2)       string& insert (size_t pos, const string& str, size_t subpos, size_t sublen = npos);
c-string (3)        string& insert (size_t pos, const char* s);
buffer (4)          string& insert (size_t pos, const char* s, size_t n);
fill (5)            string& insert (size_t pos,   size_t n, char c);
                    iterator insert (const_iterator p, size_t n, char c);
single character (6) iterator insert (const_iterator p, char c);
range (7)            template <class InputIterator>
                         iterator insert (iterator p, InputIterator first, InputIterator last);
initializer list (8)  string& insert (const_iterator p, initializer_list<char> il);
// Erase characters from string
// Erases part of the string, reducing its length:
// return value:
// The sequence version (1) returns *this.
// The others return an iterator referring to the character that now occupies the position of the first character erased, or string::end if no such character exists.
sequence (1)  string& erase (size_t pos = 0, size_t len = npos);
character (2) iterator erase (const_iterator p);
range (3)     iterator erase (const_iterator first, const_iterator last);
string (1)    string& replace (size_t pos,        size_t len,        const string& str);
              string& replace (const_iterator i1, const_iterator i2, const string& str);
substring (2) string& replace (size_t pos,        size_t len,        const string& str,
                 size_t subpos, size_t sublen);
c-string (3)  string& replace (size_t pos,        size_t len,        const char* s);
              string& replace (const_iterator i1, const_iterator i2, const char* s);
buffer (4)    string& replace (size_t pos,        size_t len,        const char* s, size_t n);
              string& replace (const_iterator i1, const_iterator i2, const char* s, size_t n);
fill (5)      string& replace (size_t pos,        size_t len,        size_t n, char c);
              string& replace (const_iterator i1, const_iterator i2, size_t n, char c);
range (6)     template <class InputIterator>
                   string& replace (const_iterator i1, const_iterator i2,
                   InputIterator first, InputIterator last);
initializer list (7) string& replace (const_iterator i1, const_iterator i2, initializer_list<char> il);
// Swap string values
// Exchanges the content of the container by the content of str, which is another string object. Lengths may differ.
// After the call to this member function, the value of this object is the value str had before the call, and the value of str is the value this object had before the call.
void swap (string& str);
// Delete last character
// Erases the last character of the string, effectively reducing its length by one.
void pop_back();

String operations

const char* c_str() const noexcept;     返回C-string类型的字符串,和string::data返回同样的值
allocator_type get_allocator() const noexcept;         Get allocator. Returns a copy of the allocator object associated with the string.

size_t copy (char* s, size_t len, size_t pos = 0) const;

copy将string从pos位置起,长度为len的子字符串,拷贝到s中;返回拷贝的字符数;注意这里不会自动在拷贝的内容后面添加结束符。

// std::string::find
// Searches the string for the first occurrence of the sequence specified by its argument
// return value:
//The position of the first character of the first match.
//If no matches were found, the function returns string::npos.
string (1)    size_t find (const string& str, size_t pos = 0) const noexcept;
c-string (2)  size_t find (const char* s, size_t pos = 0) const;
buffer (3)    size_t find (const char* s, size_t pos, size_type n) const;
character (4) size_t find (char c, size_t pos = 0) const noexcept;
// Searches the string for the first character that matches any of the characters specified in its arguments.
string (1)     size_t find_first_of (const string& str, size_t pos = 0) const noexcept;
c-string (2)   size_t find_first_of (const char* s, size_t pos = 0) const;
buffer (3)     size_t find_first_of (const char* s, size_t pos, size_t n) const;
character (4)  size_t find_first_of (char c, size_t pos = 0) const noexcept;

rfind() 查找最后一个匹配的位置。 find_first_of() 只需要匹配指定制定字符或者字符串中任意一个字符就行,而find()必须匹配整个给定的内容。
与find_first_of( )相反,find_last_of( ) 找最后一个匹配。当pos指定时,只找pos之前的,pos默认为string::npos
find_first_not_of寻找第一个不与指定字符或者字符串中任一字符匹配的位置。  find_last_not_of( ) 与find_last_of( )类似。
string substr (size_t pos = 0, size_t len = npos) const;

string (1)     int compare (const string& str) const noexcept;
substrings (2) int compare (size_t pos, size_t len, const string& str) const;
               int compare (size_t pos, size_t len, const string& str,
                   size_t subpos, size_t sublen = npos) const;
c-string (3)   int compare (const char* s) const;
               int compare (size_t pos, size_t len, const char* s) const;
buffer (4)     int compare (size_t pos, size_t len, const char* s, size_t n) const;

compare返回值0. >0, <0
string::npos是一个静态成员常量值,值定义为-1,是无符号类型size_t最大的可能的代表值。

非成员函数重载

void swap (string& x, string& y);
void swap (string& x, string& y);
string (1)     string operator+ (const string& lhs, const string& rhs);
c-string (2)   string operator+ (const string& lhs, const char*   rhs);
               string operator+ (const char*   lhs, const string& rhs);
character (3)  string operator+ (const string& lhs, char          rhs);
               string operator+ (char          lhs, const string& rhs);
istream& operator>> (istream& is, string& str);   // 从输入流中提取字符串存储到string中,使用空格作为分隔符
ostream& operator<< (ostream& os, const string& str);
istream& getline (istream&  is, string& str, char delim);
istream& getline (istream&& is, string& str, char delim);
istream& getline (istream&  is, string& str);
istream& getline (istream&& is, string& str);
//Get line from stream into string
Extracts characters from is and stores them into str until the delimitation character delim is found (or the newline character, '\n', for (2)).
The extraction also stops if the end of file is reached in is or if some other error occurs during the input operation.
If the delimiter is found, it is extracted and discarded, i.e. it is not stored and the next input operation will begin after it.

 
 

stl 中 next_permutation、prev_permutation的原理和使用

next_permutation实现原理
在《STL源码解析》中找到了这个函数,在此也简单叙述一下原理:
在STL中,除了next_permutation外,还有一个函数prev_permutation,两者都是用来计算排列组合的函数。前者是求出下一个排列组合,而后者是求出上一个排列组合。所谓“下一个”和“上一个”,书中举了一个简单的例子:对序列 {a, b, c},每一个元素都比后面的小,按照字典序列,固定a之后,a比bc都小,c比b大,它的下一个序列即为{a, c, b},而{a, c, b}的上一个序列即为{a, b, c},同理可以推出所有的六个序列为:{a, b, c}、{a, c, b}、{b, a, c}、{b, c, a}、{c, a, b}、{c, b, a},其中{a, b, c}没有上一个元素,{c, b, a}没有下一个元素。
next_permutation的函数原型如下:

template<class BidirectionalIterator>
bool next_permutation(
      BidirectionalIterator _First,
      BidirectionalIterator _Last
);
template<class BidirectionalIterator, class BinaryPredicate>
bool next_permutation(
      BidirectionalIterator _First,
      BidirectionalIterator _Last,
      BinaryPredicate _Comp
 );

对于第二个重载函数的第三个参数,默认比较顺序为小于。如果找到下一个序列,则返回真,否则返回假。
函数实现原理如下:
在当前序列中,从尾端往前寻找两个相邻元素,前一个记为*i,后一个记为*ii,并且满足*i < *ii。然后再从尾端寻找另一个元素*j,如果满足*i < *j,即将第i个元素与第j个元素对调,并将第ii个元素之后(包括ii)的所有元素颠倒排序,即求出下一个序列了。
代码实现如下:

template<class BidirectionalIterator>
bool next_permutation(
      BidirectionalIterator first,
      BidirectionalIterator last
)
{
    if(first == last)
        return false; //空序列
    BidirectionalIterator i = first;
    ++i;
    if(i == last)
        return false;  //一个元素,没有下一个序列了
    i = last;
    --i;
    for(;;) {
        BidirectionalIterator ii = i;
        --i;
        if(*i < *ii) {
            BidirectionalIterator j = lase;
            while(!(*i < *--j));
            iter_swap(i, j);
            reverse(ii, last);
            return true;
        }
        if(i == first) {
            reverse(first, last);  //全逆向,即为最小字典序列,如cba变为abc
            return false;
        }
    }
}

prev_permutation实现类似,就是反向查找。
例子:

#include<iostream>
#include<string>
#include<iostream>
#include<algorithm>
using namespace std;
int main(){
	    string str = "abcd";
	    do{
		        cout<<str<<endl;
    	}while(next_permutation(str.begin(), str.end()));
	    cout<<"after loop:\n"<<str<<endl;
    	return 0;
}

输出:

[zhangjun@s3 leetcode]$ ./strtest
abcd
abdc
acbd
acdb
adbc
adcb
bacd
badc
bcad
bcda
bdac
bdca
cabd
cadb
cbad
cbda
cdab
cdba
dabc
dacb
dbac
dbca
dcab
dcba
after loop:
abcd

 
最后一个排列没有下一个排列,用next_permutation会返回false,但使用这个方法后,序列会变成字典序列的第一个。如dcba变成abcd。