Counting the number of words in a line of text

Idea: if a character is a letter and the character before it is not a letter, a new word begins and the word count is incremented by 1; if a character is a letter and the character before it is also a letter, we are still inside the same word and the count stays unchanged. A flag is used to record whether the current character is inside a word.

#include <stdio.h>
#include <iostream>
#include <string.h>
// Count the words in data[0..size): a word is a maximal run of letters.
int wordcount(char data[], int size)
{
	if(size <= 0)
		return 0;
	int i = 0, word = 0, count = 0;	// word: 1 while we are inside a word, 0 otherwise
	while( i < size)
	{
		// Skip characters that are not letters.
		while( i < size && (data[i] < 'A' ||(data[i] > 'Z'&&data[i] < 'a')||data[i] > 'z'))
			i++;
		// Consume a run of letters; count it once, when it starts.
		while( i < size && ((data[i] >= 'A'&&data[i]<= 'Z')||(data[i] >= 'a'&&data[i]<= 'z')))
		{
			i ++;
			if(word == 0)
			{
				word = 1;
				count ++;
			}
		}
		word = 0;
	}
	return count;
}
int main()
{
	char data[100];
	if(fgets(data, sizeof(data), stdin) == NULL)	// gets() is unsafe and removed from the standard; use fgets()
		return 0;
	std::cout<<wordcount(data, strlen(data))<<std::endl;
}

 

A C++ implementation of consistent hashing

http://martinbroadhurst.com/Consistent-Hash-Ring.html

Consistent Hash Ring

Introduction

Consistent hashing was first described in a paper, Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web (1997) by David Karger et al. It is used in distributed storage systems like Amazon Dynamo, memcached, Project Voldemort and Riak.

The problem

Consistent hashing is a very simple solution to a common problem: how can you find a server in a distributed system to store or retrieve a value identified by a key, while at the same time being able to cope with server failures and network partitions?
Simply finding a server for a value is easy; just number your set of s servers from 0 to s – 1. When you want to store or retrieve a value, take the hash of the value’s key modulo s, and that gives you the server.
The problem comes when servers fail or become unreachable through a network partition. At that point, the servers no longer fill the hash space, so the only option is to invalidate the caches on all servers, renumber them, and start again. Given that, in a system with hundreds or thousands of servers, failures are commonplace, this solution is not feasible.
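
To make the problem concrete, here is a small illustrative sketch (the keys and the use of std::hash are my own choices, not taken from any particular system) showing how the naive hash-mod-s scheme reassigns most keys when the number of servers drops from 4 to 3:

#include <iostream>
#include <string>
#include <functional>

// Naive scheme: server index = hash(key) % number_of_servers.
static std::size_t server_for(const std::string& key, std::size_t servers)
{
    return std::hash<std::string>()(key) % servers;
}

int main()
{
    const char* keys[] = { "user:1", "user:2", "user:3", "user:4", "user:5" };
    for (const char* key : keys) {
        std::size_t before = server_for(key, 4);  // 4 servers
        std::size_t after  = server_for(key, 3);  // one server has failed
        std::cout << key << ": server " << before << " -> " << after
                  << (before == after ? "" : "  (remapped!)") << "\n";
    }
    // With consistent hashing, only the keys that mapped to the failed
    // server would move; here, on average, most keys are remapped.
}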

The solution

In consistent hashing, the servers, as well as the keys, are hashed, and it is by this hash that they are looked up. The hash space is large, and is treated as if it wraps around to form a circle – hence hash ring. The process of creating a hash for each server is equivalent to placing it at a point on the circumference of this circle. When a key needs to be looked up, it is hashed, which again corresponds to a point on the circle. In order to find its server, one then simply moves round the circle clockwise from this point until the next server is found. If no server is found from that point to the end of the hash space, the first server is used – this is the “wrapping round” that makes the hash space circular.
The only remaining problem is that in practice hashing algorithms are likely to result in clusters of servers on the ring (or, to be more precise, some servers with a disproportionately large space before them), and this will result in greater load on the first server in the cluster and less on the remainder. This can be ameliorated by adding each server to the ring a number of times in different places. This is achieved by having a replica count, which applies to all servers in the ring, and when adding a server, looping from 0 to the count – 1, and hashing a string made from both the server and the loop variable to produce the position. This has the effect of distributing the servers more evenly over the ring. Note that this has nothing to do with server replication; each of the replicas represents the same physical server, and replication of data between servers is an entirely unrelated issue.

Implementation

I’ve written an example implementation of consistent hashing in C++. As you can imagine from the description above, it isn’t terribly complicated. Here is the main class:

template <class Node, class Data, class Hash = HASH_NAMESPACE::hash<const char*> >
class HashRing
{
public:
   typedef std::map<size_t, Node> NodeMap;
   HashRing(unsigned int replicas)
     : replicas_(replicas), hash_(HASH_NAMESPACE::hash<const char*>())
   {
   }
   HashRing(unsigned int replicas, const Hash& hash)
     : replicas_(replicas), hash_(hash)
   {
   }
   size_t AddNode(const Node& node);
   void RemoveNode(const Node& node);
   const Node& GetNode(const Data& data) const;
private:
   NodeMap ring_;
   const unsigned int replicas_;
   Hash hash_;
};
template <class Node, class Data, class Hash>
size_t HashRing<Node, Data, Hash>::AddNode(const Node& node)
{
   size_t hash = 0;   // hash of the last replica added
   std::string nodestr = Stringify(node);
   for (unsigned int r = 0; r < replicas_; r++) {
      hash = hash_((nodestr + Stringify(r)).c_str());
      ring_[hash] = node;
   }
   return hash;
}
template <class Node, class Data, class Hash>
void HashRing<Node, Data, Hash>::RemoveNode(const Node& node)
{
   std::string nodestr = Stringify(node);
   for (unsigned int r = 0; r < replicas_; r++) {
      size_t hash = hash_((nodestr + Stringify(r)).c_str());
      ring_.erase(hash);
   }
}
template <class Node, class Data, class Hash>
const Node& HashRing<Node, Data, Hash>::GetNode(const Data& data) const
{
   if (ring_.empty()) {
      throw EmptyRingException();
   }
   size_t hash = hash_(Stringify(data).c_str());
   typename NodeMap::const_iterator it;
   // Look for the first node >= hash
   it = ring_.lower_bound(hash);
   if (it == ring_.end()) {
      // Wrapped around; get the first node
      it = ring_.begin();
   }
   return it->second;
}

 

A few points to note:

  • The default hash function is the non-standard hash<const char*> functor (the one used by hash_map).
    In practice you probably don’t want to use this. Something like MD5 would probably be best.
  • I had to define HASH_NAMESPACE because g++ puts the non-standard hash in a different namespace than other compilers do.
    Hopefully this will all be resolved with the widespread availability of std::unordered_map.
  • The Node and Data types need to have operator << defined for a std::ostream.
    This is because I write them to an ostringstream in order to “stringify” them before getting the hash (a minimal sketch of these helpers follows this list).
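
The listing above relies on a few pieces that are defined elsewhere in the full source: the HASH_NAMESPACE macro, the Stringify helper and the EmptyRingException class. They are not reproduced in this excerpt, so here is a minimal reconstruction of what they might look like (my sketch, not the original code); in a real program these would appear before the HashRing template:

// --- helpers assumed by the HashRing listing (reconstruction) ---
#include <sstream>
#include <stdexcept>
#include <string>
#include <ext/hash_map>              // brings __gnu_cxx::hash into scope on g++

// Assumption: building with g++, whose non-standard hash lives in __gnu_cxx.
#define HASH_NAMESPACE __gnu_cxx

// Turn any streamable value into a string so that it can be hashed.
template <class T>
std::string Stringify(const T& value)
{
   std::ostringstream os;
   os << value;
   return os.str();
}

// Thrown by GetNode() when the ring contains no nodes.
class EmptyRingException : public std::runtime_error
{
public:
   EmptyRingException() : std::runtime_error("Empty hash ring") {}
};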

I’ve also written an example program that simulates using a cluster of cache servers to store and retrieve some data.
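
That example program is not reproduced here, but a minimal, hypothetical use of the class (assuming the HashRing definition and the helpers sketched above are in scope; the server names and replica count are arbitrary) might look like this:

#include <iostream>
#include <string>

int main()
{
   // Three cache servers, each inserted at 10 points on the ring.
   HashRing<std::string, std::string> ring(10);
   ring.AddNode("cache1.example.com");
   ring.AddNode("cache2.example.com");
   ring.AddNode("cache3.example.com");

   // A key is routed to whichever server follows its hash on the ring.
   std::cout << ring.GetNode("user:42") << "\n";

   ring.RemoveNode("cache2.example.com");          // simulate a failure
   std::cout << ring.GetNode("user:42") << "\n";   // moves only if it was on cache2
}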

Source code

The full source code, example program and makefile can be browsed and downloaded from the original article (linked at the top of this section).

How to run Intel MPI on Xeon Phi

Overview

The Intel® MPI Library supports the Intel® Xeon Phi™ coprocessor in 3 major ways:

  • The offload model where all MPI ranks are run on the main Xeon host, and the application utilizes offload directives to run on the Intel Xeon Phi coprocessor card,
  • The native model where all MPI ranks are run on the Intel Xeon Phi coprocessor card, and
  • The symmetric model where MPI ranks are run on both the Xeon host and the Xeon Phi coprocessor card.

This article will focus on the native and symmetric models only. If you’d like more information on the offload model, this article gives a great overview and even more details are available in the Intel® Compiler documentation.

Prerequisites

The most important thing to remember is that we’re treating the Xeon Phi coprocessor cards as simply another node in a heterogeneous cluster. To that end, running an MPI job in either the native or the symmetric mode is very similar to running a regular Xeon MPI job. On the flip side, that does require some prerequisites to be fulfilled for each coprocessor card to be completely accessible via MPI.
Uniquely accessible hosts
All coprocessor cards on the system need to have a unique IP address that’s accessible from the local host, other Xeon hosts on the system, and other Xeon Phi cards attached to those hosts.  Again, think of simply adding another node to an existing cluster.  A very simple test of this will be the ability to ssh from one Xeon Phi coprocessor (let’s call it node0-mic0) to its own Xeon host (node0), as well as ssh to any other Xeon host on the cluster (node1) and their respective Xeon Phi cards (node1-mic0).  Here’s a quick example:

[user@node0-mic0 user]$ ssh node1-mic0 hostname
node1-mic0

Access to necessary libraries
Make sure all MPI libraries are accessible from the Xeon Phi card. There are a couple of ways to do this:

  • Set up an NFS share between the Xeon host where the Intel MPI Library is installed and the Xeon Phi coprocessor card.
  • Manually copy all Xeon Phi-specific MPI libraries to the card.  More details on which libraries to copy and where are available here.

Assuming both of those requirements have been met, you’re ready to start using the Xeon Phi coprocessors in your MPI jobs.

Running natively on the Xeon Phi coprocessor

The set of steps to run on the Xeon Phi coprocessor card exclusively can be boiled down to the following:
1. Set up the environment
Use the appropriate scripts to set your runtime environment. The following assumes all Intel® Software Tools are installed in the /opt/intel directory.

# Set your compiler
[user@host] $ source /opt/intel/composer_xe_<version>/bin/compilervars.sh intel64
# Set your MPI environment
[user@host] $ source /opt/intel/impi/<version>/bin64/mpivars.sh

2. Compile for the Xeon Phi coprocessor card
Use the -mmic option for the Intel Compiler to build your MPI sources for the card.

[user@host] $ mpiicc -mmic -o test_hello.MIC test.c
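
The article does not show test.c itself; a minimal MPI hello-world along the following lines, with the output format chosen to match the runs shown below, would serve:

/* test.c - minimal MPI "hello world" */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
    int rank, size, namelen;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);      /* total number of ranks */
    MPI_Get_processor_name(name, &namelen);    /* host (or mic) name */

    printf("Hello world: rank %d of %d running on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}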

3. Copy the Xeon Phi executables to the card
Transfer the executable that you just created to the card for execution.

[user@host] $ scp ./test_hello.MIC node0-mic0:~/test_hello

This step is not required if your host and card are NFS-shared. Also note that we’re renaming this executable during the copy process. This helps us use the same mpirun command for both native and symmetric modes.
4. Launch the application
Simply use the mpirun command to start the executable remotely on the card. Note that if you’re planning on using a Xeon Phi coprocessor in your MPI job, you have to let the MPI library know by setting the I_MPI_MIC environment variable. This is a required step.

[user@host] $ export I_MPI_MIC=enable
[user@host] $ cat mpi_hosts
node0-mic0
[user@host] $ mpirun -f mpi_hosts -n 2 ~/test_hello
Hello world: rank 0 of 2 running on node0-mic0
Hello world: rank 1 of 2 running on node0-mic0

Running symmetrically on both the Xeon host and the Xeon Phi coprocessor

You’re now trying to utilize both the Xeon hosts on your cluster, and the Xeon Phi coprocessor cards attached to them.
1. Set up the environment
This step is the same as in the native case.
2. Compile for the Xeon Phi coprocessor card and for the Xeon host
You’re now going to have to compile two different sets of binaries:

# for the Xeon Phi coprocessor
[user@host] $ mpiicc -mmic -o test_hello.MIC test.c
# for the Xeon host
[user@host] $ mpiicc -o test_hello test.c

3. Copy the Xeon Phi executables to the card
Here, we still have to transfer the Xeon Phi coprocessor-compiled executables to the card.  And again, we’re renaming the executable during the transfer:

[user@host] $ scp ./test_hello.MIC node0-mic0:~/test_hello

Now, this will not work if your $HOME directory (where the executables live) is NFS-shared between host and card.  For more tips on what to do in NFS-sharing cases, check out this article.
4. Launch the application
Finally, you run the MPI job.  Your only difference here would be edits in your hosts file as you now have to add the Xeon hosts to the list.

[user@host] $ export I_MPI_MIC=enable
[user@host] $ cat mpi_hosts
node0
node0-mic0
[user@host] $ mpirun -f mpi_hosts -perhost 1 -n 2 ~/test_hello
Hello world: rank 0 of 2 running on node0
Hello world: rank 1 of 2 running on node0-mic0

https://software.intel.com/en-us/articles/how-to-run-intel-mpi-on-xeon-phi
https://software.intel.com/en-us/articles/using-the-intel-mpi-library-on-intel-xeon-phi-coprocessor-systems
https://software.intel.com/en-us/articles/using-xeon-phi-prefixes-and-extensions-for-intel-mpi-jobs-in-nfs-shared-environment
http://www.hpc.mcgill.ca/index.php/81-doc-pages/256-using-xeon-phis-on-guillimin

tcpdump usage

The first kind of keywords concern the type of the object, mainly host, net and port. For example, host 210.27.48.2 states that 210.27.48.2 is a host, net 202.0.0.0 states that 202.0.0.0 is a network address, and port 23 states that the port number is 23. If no type is given, the default type is host.
The second kind of keywords determine the direction of transfer, mainly src, dst, dst or src, and dst and src. For example, src 210.27.48.2 means the source address of the IP packet is 210.27.48.2, and dst net 202.0.0.0 means the destination network address is 202.0.0.0. If no direction keyword is given, the default is src or dst.
The third kind are protocol keywords, mainly fddi, ip, arp, rarp, tcp, udp and so on. fddi selects a protocol on FDDI (Fiber Distributed Data Interface) networks; in practice it is an alias for ether, and since fddi and ether frames have similar source and destination addresses, FDDI packets can be processed and analyzed in the same way as ether packets. The other keywords simply select the protocol of the packets to capture. If no protocol is specified, tcpdump captures packets of all protocols.
Besides these three kinds of keywords, other important keywords include gateway, broadcast, less and greater, plus three logical operators: negation is 'not' or '!', conjunction is 'and' or '&&', and disjunction is 'or' or '||'. These keywords can be combined into powerful filter expressions; a few examples follow.
By default, running tcpdump with no arguments monitors all packets flowing through the first network interface.
# tcpdump
tcpdump: listening on fxp0
11:58:47.873028 202.102.245.40.netbios-ns > 202.102.245.127.netbios-ns: udp 50
11:58:47.974331 0:10:7b:8:3a:56 > 1:80:c2:0:0:0 802.1d ui/C len=43
0000 0000 0080 0000 1007 cf08 0900 0000
0e80 0000 902b 4695 0980 8701 0014 0002
000f 0000 902b 4695 0008 00
11:58:48.373134 0:0:e8:5b:6d:85 > Broadcast sap e0 ui/C len=97
ffff 0060 0004 ffff ffff ffff ffff ffff
0452 ffff ffff 0000 e85b 6d85 4008 0002
0640 4d41 5354 4552 5f57 4542 0000 0000
0000 00
Use the -i parameter to specify which interface tcpdump listens on; this is useful when the machine has several network interfaces.
Use the -c parameter to specify how many packets to capture.
Use the -w parameter to write the captured packets to a file.
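These options can be combined; for example (the interface name, packet count and file name here are just placeholders):
# tcpdump -i eth0 -c 100 -w capture.pcap host 210.27.48.1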
A. To capture all packets received or sent by host 210.27.48.1:
#tcpdump host 210.27.48.1
B. To capture traffic between host 210.27.48.1 and hosts 210.27.48.2 or 210.27.48.3 (when using parentheses on the command line, they must be escaped):
#tcpdump host 210.27.48.1 and \( 210.27.48.2 or 210.27.48.3 \)
C. To capture IP packets exchanged between host 210.27.48.1 and every host except 210.27.48.2:
#tcpdump ip host 210.27.48.1 and ! 210.27.48.2
D. To capture telnet packets received or sent by host 210.27.48.1:
#tcpdump tcp port 23 host 210.27.48.1
E. To monitor UDP port 123 on the local machine (123 is the NTP service port):
# tcpdump udp port 123
F. To monitor only the traffic of the host named hostname. The host name can be the local host or any other machine on the network. The following command reads all packets sent by hostname:
#tcpdump -i eth0 src host hostname
G. The following command monitors all packets sent to host hostname:
#tcpdump -i eth0 dst host hostname
H. We can also monitor packets passing through a specified gateway:
#tcpdump -i eth0 gateway Gatewayname
I. To monitor TCP or UDP packets addressed to a specified port, run:
#tcpdump -i eth0 host hostname and port 80
Recall the direction keywords src, dst, dst or src, and dst and src: to list only packets sent to port 80, use dst port; to see only packets coming back from port 80, use src port.
#tcpdump -i eth0 host hostname and dst port 80        (destination port is 80)
or
#tcpdump -i eth0 host hostname and src port 80        (source port is 80, typically a host providing HTTP service)
If the expression has many conditions, each condition must be joined to the previous one with and, or, or not, for example:
#tcpdump -i eth0 host ! 211.161.223.70 and ! 211.161.223.71 and dst port 80
If the Ethernet interface is put into promiscuous mode, the system log will record it:
May 7 20:03:46 localhost kernel: eth0: Promiscuous mode enabled.
May 7 20:03:46 localhost kernel: device eth0 entered promiscuous mode
May 7 20:03:57 localhost kernel: device eth0 left promiscuous mode
tcpdump does not fully decode the captured data; most of the packet contents are printed directly in hexadecimal. Obviously this is not convenient for analyzing network problems. The usual approach is to first capture the data with tcpdump -w and save it to a file, and then decode and analyze it with another program. You should also define filter rules so that the captured packets do not fill the whole disk.
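A sketch of that capture-then-analyze workflow (the interface, filter and file name are placeholders):
# tcpdump -i eth0 -w telnet.pcap tcp port 23
# tcpdump -r telnet.pcap
The first command writes the filtered capture to telnet.pcap; the second reads it back later for decoding (a tool such as Wireshark can also open the file).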