  • TCP server receiving SYN

    Table of Contents

    1 TCP server SYN reception: overview

    2 TCP server SYN reception: processing flow

    2.1 TCP-layer packet input interface: tcp_v4_rcv

    2.2 tcp_v4_do_rcv()

    2.3 tcp_rcv_state_process()

    2.4 TCP SYN request handler tcp_v4_conn_request() (core)

    2.4.1 Checking whether the SYN request (half-open) queue is full: inet_csk_reqsk_queue_is_full

    2.4.2 Checking whether the accept (established) queue is full: sk_acceptq_is_full

    2.5 Allocating and initializing the connection request block

    2.5.1 Allocating the request block: reqsk_alloc / inet_reqsk_alloc

    2.5.2 Initializing the request block

    2.6 Adding the request block to the SYN request queue: inet_csk_reqsk_queue_hash_add


    1 TCP server SYN reception: overview

    1. Look up the local socket in tcp_hashinfo using the 5-tuple
    2. Check whether the socket's request queues are full, covering both the half-open (SYN) queue and the established (accept) queue
    3. Send a SYN+ACK segment to the client
    4. Add the newly created request socket to the SYN queue and start the SYN+ACK retransmission timer (initial value 3 s)

    Note: on receiving a SYN, the Linux kernel stack does not move the state to SYN_RECV. A new sock is created only once the client's ACK arrives; that sock's state is set to TCP_SYN_RECV, after which tcp_rcv_state_process moves it to TCP_ESTABLISHED.

    2 TCP server SYN reception: processing flow

    tcp_v4_rcv
    --tcp_v4_do_rcv
        --tcp_rcv_state_process
    	    --tcp_v4_conn_request
    		    --inet_csk_reqsk_queue_is_full
    		    --sk_acceptq_is_full
    		    --inet_reqsk_alloc
    		    --tcp_v4_send_synack
    		    --inet_csk_reqsk_queue_hash_add

    2.1 TCP-layer packet input interface: tcp_v4_rcv

    1. Validate the incoming TCP segment
    2. Look up the local socket in tcp_hashinfo using the 5-tuple
    3. Call tcp_v4_do_rcv() to process the packet
    int tcp_v4_rcv(struct sk_buff *skb)
    {
    	struct tcphdr *th;
    	struct iphdr *iph;
    	struct sock *sk;
    	int ret;
    
    	//get a pointer to the TCP header
    	th = tcp_hdr(skb);
    	//get a pointer to the IP header
    	iph = ip_hdr(skb);
    	//look up in TCP's hash tables the socket that should handle this segment
    	//(based on the segment's tcp/ip headers); for a SYN request this finds the listening socket
    	sk = __inet_lookup(skb->dev->nd_net, &tcp_hashinfo, iph->saddr,
    			th->source, iph->daddr, th->dest, inet_iif(skb));
    	if (!sk)
    		goto no_tcp_socket;
    
    process:
    	//this touches the three queues TCP uses on receive for performance; not covered here, go straight to tcp_v4_do_rcv()
    	if (!sock_owned_by_user(sk)) {
    		if (!tcp_prequeue(sk, skb))
    			//call tcp_v4_do_rcv() to process the packet
    			ret = tcp_v4_do_rcv(sk, skb);
    	} else
    		sk_add_backlog(sk, skb);
    	
    	bh_unlock_sock(sk);
    	sock_put(sk);
    	return ret;
    }
    

    2.2 tcp_v4_do_rcv()

    1. Call tcp_v4_hnd_req to look up a request socket; if none is found it returns sk itself
    2. Call tcp_rcv_state_process to handle the SYN segment
    int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
    {
    	struct sock *rsk;
    
    	if (sk->sk_state == TCP_LISTEN) {
    		//NULL: error
    		//nsk == sk: no new TCB was found, so this is the first-handshake SYN (the case covered in this note)
    		//nsk != sk: a new TCB was found, so this is the third-handshake ACK
    		struct sock *nsk = tcp_v4_hnd_req(sk, skb);
    		if (!nsk)
    			goto discard;
    		//the ACK packet is handled by tcp_child_process
    		if (nsk != sk) {
    			if (tcp_child_process(sk, nsk, skb)) {
    				rsk = nsk;
    				goto reset;
    			}
    			return 0;
    		}
    	}
    
    	if (tcp_rcv_state_process(sk, skb, tcp_hdr(skb), skb->len)) {
    		//a non-zero return means an unexpected packet was received; an RST is sent to the peer
    		rsk = sk;
    		goto reset;
    	}
    	return 0;
    }
    

    2.3 tcp_rcv_state_process()

    1. Call tcp_v4_conn_request to handle the SYN connection request
    /*
    sk: the TCP socket that received the segment
    skb: the input segment
    th: pointer to the segment's TCP header
    len: segment length
    */
    int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
    			  struct tcphdr *th, unsigned len)
    {
    	struct tcp_sock *tp = tcp_sk(sk);
    	struct inet_connection_sock *icsk = inet_csk(sk);
    	int queued = 0;
    
    	switch (sk->sk_state) {
    	case TCP_LISTEN:
    		//at this point the function only handles SYN segments; if the ACK flag is set,
    		//the packet is unexpected, and returning 1 causes an RST to be sent to the peer
    		if (th->ack)
    			return 1;
    		//an RST segment is simply ignored
    		if (th->rst)
    			goto discard;
    		
    		if (th->syn) {
    			//a SYN was received; hand it to TCP's tcp_v4_conn_request(); this function
    			//pointer is set during control block initialization, see tcp_v4_init_sock
    			if (icsk->icsk_af_ops->conn_request(sk, skb) < 0)
    				return 1;
    
    			/* Now we have several options: In theory there is
    			 * nothing else in the frame. KA9Q has an option to
    			 * send data with the syn, BSD accepts data with the
    			 * syn up to the [to be] advertised window and
    			 * Solaris 2.1 gives you a protocol error. For now
    			 * we just ignore it, that fits the spec precisely
    			 * and avoids incompatibilities. It would be nice in
    			 * future to drop through and process the data.
    			 *
    			 * Now that TTCP is starting to be used we ought to
    			 * queue this data.
    			 * But, this leaves one open to an easy denial of
    			 * service attack, and SYN cookies can't defend
    			 * against this problem. So, we drop the data
    			 * in the interest of security over speed unless
    			 * it's still in use.
    			 */
    			//the discussion above is about whether the first SYN may carry data; the current implementation does not allow it
    			kfree_skb(skb);
    			return 0;
    		}
    		goto discard;
    	}
    }
    

    2.4 TCP SYN request handler tcp_v4_conn_request() (core)

    The main job of this function is to create the connection request object, a struct tcp_request_sock, and add it to the listening socket's SYN request queue (the half-open queue, listen_sock.syn_table). Its core operations:

    1. Check whether the SYN request queue and the accept queue can still take this SYN; if not, drop the segment (SYN cookies aside) without replying RST, so that if the client retries once a queue slot has freed up, the request can still be served;
    2. Allocate the struct tcp_request_sock request block;
    3. Parse the TCP options in the SYN segment (not analyzed here);
    4. Initialize the newly allocated request block from the received options;
    5. Generate the seq to carry in the SYN+ACK, i.e. the server's initial sequence number;
    6. Send the SYN+ACK to the client (see "TCP: the server sending SYN+ACK");
    7. Add the request block to the listening socket's SYN request queue and start the SYN+ACK timeout timer.
    int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
    {
    	struct inet_request_sock *ireq;
    	struct tcp_options_received tmp_opt;
    	struct request_sock *req;
    	//record the source and destination addresses of the SYN segment
    	__be32 saddr = ip_hdr(skb)->saddr;
    	__be32 daddr = ip_hdr(skb)->daddr;
    	__u32 isn = TCP_SKB_CB(skb)->when;
    	struct dst_entry *dst = NULL;
        
    	//SYN cookie related, ignored here
    #ifdef CONFIG_SYN_COOKIES
    	int want_cookie = 0;
    #else
    #define want_cookie 0 /* Argh, why doesn't gcc optimize this :( */
    #endif
    	//drop SYN segments addressed to broadcast or multicast addresses; TCP does not support broadcast, so this is presumably a robustness measure
    	if (((struct rtable *)skb->dst)->rt_flags &
    	    (RTCF_BROADCAST | RTCF_MULTICAST))
    		goto drop;
    
    	//if the SYN request queue is full, drop the request (SYN cookies aside); the client will retransmit the SYN
    	/* TW buckets are converted to open requests without
    	 * limitations, they conserve resources and peer is
    	 * evidently real one.
    	 */
    	//why isn is checked: per the comment above, isn is non-zero when this SYN hit a live TIME-WAIT bucket, i.e. the peer is evidently real, so the queue-full limit is not applied
    	if (inet_csk_reqsk_queue_is_full(sk) && !isn) {
    #ifdef CONFIG_SYN_COOKIES
    		if (sysctl_tcp_syncookies) {
    			want_cookie = 1;
    		} else
    #endif
    		goto drop;
    	}
    
    	//if the accept queue is full and at least one request in the SYN queue has never had its
    	//SYN+ACK retransmitted, drop the new SYN. My reading of this design: those "young" SYN
    	//requests are likely to complete the three-way handshake soon and will need a slot in the
    	//already-full accept queue, so accepting another SYN now would likely yield connections
    	//that finish the handshake but then fail because they cannot be added to the accept queue
    	/* Accept backlog is full. If we have already queued enough
    	 * of warm entries in syn queue, drop request. It is better than
    	 * clogging syn queue with openreqs with exponentially increasing
    	 * timeout.
    	 */
    	if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
    		goto drop;
    
    	//allocate a struct tcp_request_sock and store tcp_request_sock_ops in its rsk_ops;
    	//these ops are invoked later during connection establishment
    	req = reqsk_alloc(&tcp_request_sock_ops);
    	if (!req)
    		goto drop;
    
    #ifdef CONFIG_TCP_MD5SIG
    	tcp_rsk(req)->af_specific = &tcp_request_sock_ipv4_ops;
    #endif
    
    	//parse the TCP options carried in the SYN; options are not covered here
    	tcp_clear_options(&tmp_opt);
    	tmp_opt.mss_clamp = 536;
    	tmp_opt.user_mss  = tcp_sk(sk)->rx_opt.user_mss;
    	tcp_parse_options(skb, &tmp_opt, 0);
    
    	//SYN cookie related, ignored
    	if (want_cookie) {
    		tcp_clear_options(&tmp_opt);
    		tmp_opt.saw_tstamp = 0;
    	}
    
    	//timestamp option handling
    	if (tmp_opt.saw_tstamp && !tmp_opt.rcv_tsval) {
    		/* Some OSes (unknown ones, but I see them on web server, which
    		 * contains information interesting only for windows'
    		 * users) do not send their stamp in SYN. It is easy case.
    		 * We simply do not advertise TS support.
    		 */
    		tmp_opt.saw_tstamp = 0;
    		tmp_opt.tstamp_ok  = 0;
    	}
    	tmp_opt.tstamp_ok = tmp_opt.saw_tstamp;
    
    	//initialize the request block from the fields and options of the SYN segment
    	tcp_openreq_init(req, &tmp_opt, skb);
    
    	if (security_inet_conn_request(sk, skb, req))
    		goto drop_and_free;
    
    	//record the local and remote addresses; saddr and daddr are the source and destination IPs taken from the skb, hence the swapped assignment
    	ireq = inet_rsk(req);
    	ireq->loc_addr = daddr;
    	ireq->rmt_addr = saddr;
    	//save the IP options of the SYN segment into the request block
    	ireq->opt = tcp_v4_save_options(sk, skb);
    	if (!want_cookie)
    		TCP_ECN_create_request(req, tcp_hdr(skb));
    
    	//generate the server-side initial send sequence number, depending on the case
    	if (want_cookie) {
    #ifdef CONFIG_SYN_COOKIES
    		syn_flood_warning(skb);
    #endif
    		isn = cookie_v4_init_sequence(sk, skb, &req->mss);
    	} else if (!isn) {
    		struct inet_peer *peer = NULL;
    
    		/* VJ's idea. We save last timestamp seen
    		 * from the destination in peer table, when entering
    		 * state TIME-WAIT, and check against it before
    		 * accepting new connection request.
    		 *
    		 * If "isn" is not zero, this request hit alive
    		 * timewait bucket, so that all the necessary checks
    		 * are made in the function processing timewait state.
    		 */
    		if (tmp_opt.saw_tstamp &&
    		    tcp_death_row.sysctl_tw_recycle &&
    		    (dst = inet_csk_route_req(sk, req)) != NULL &&
    		    (peer = rt_get_peer((struct rtable *)dst)) != NULL &&
    		    peer->v4daddr == saddr) {
    			if (get_seconds() < peer->tcp_ts_stamp + TCP_PAWS_MSL &&
    			    (s32)(peer->tcp_ts - req->ts_recent) >
    							TCP_PAWS_WINDOW) {
    				NET_INC_STATS_BH(LINUX_MIB_PAWSPASSIVEREJECTED);
    				dst_release(dst);
    				goto drop_and_free;
    			}
    		}
    		/* Kill the following clause, if you dislike this way. */
    		else if (!sysctl_tcp_syncookies &&
    			 (sysctl_max_syn_backlog - inet_csk_reqsk_queue_len(sk) <
    			  (sysctl_max_syn_backlog >> 2)) &&
    			 (!peer || !peer->tcp_ts_stamp) &&
    			 (!dst || !dst_metric(dst, RTAX_RTT))) {
    			/* Without syncookies last quarter of
    			 * backlog is filled with destinations,
    			 * proven to be alive.
    			 * It means that we continue to communicate
    			 * to destinations, already remembered
    			 * to the moment of synflood.
    			 */
    			LIMIT_NETDEBUG(KERN_DEBUG "TCP: drop open "
    				       "request from %u.%u.%u.%u/%u\n",
    				       NIPQUAD(saddr),
    				       ntohs(tcp_hdr(skb)->source));
    			dst_release(dst);
    			goto drop_and_free;
    		}
    		isn = tcp_v4_init_sequence(skb);
    	}
    	//store the chosen initial sequence number in the TCP control block
    	tcp_rsk(req)->snt_isn = isn;
    	
    	//send the SYN+ACK segment
    	if (tcp_v4_send_synack(sk, req, dst))
    		goto drop_and_free;
    
    	if (want_cookie) {
    		reqsk_free(req);
    	} else {
    		//add the request block to the SYN request queue and start the SYN+ACK retransmission timer (initial value 3 s)
    		inet_csk_reqsk_queue_hash_add(sk, req, TCP_TIMEOUT_INIT);
    	}
    	return 0;
    
    drop_and_free:
    	reqsk_free(req);
    drop:
    	return 0;
    }
    

    Next, the two checks that decide whether the SYN request queue and the accept queue are full.

    2.4.1 Checking whether the SYN request (half-open) queue is full: inet_csk_reqsk_queue_is_full

    static inline int inet_csk_reqsk_queue_is_full(const struct sock *sk)
    {
    	return reqsk_queue_is_full(&inet_csk(sk)->icsk_accept_queue);
    }
    
    static inline int reqsk_queue_is_full(const struct request_sock_queue *queue)
    {
    	//the queue is considered full once the number of sockets with a pending SYN (qlen)
    	//reaches nr_table_entries (2^max_qlen_log); note the neat use of a shift instead of a compare
    	return queue->listen_opt->qlen >> queue->listen_opt->max_qlen_log;
    }
    
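    A quick user-space check (my own sketch, not kernel code) of the shift trick: qlen >> max_qlen_log is non-zero exactly when qlen >= 2^max_qlen_log, i.e. when qlen has reached nr_table_entries.

    #include <assert.h>

    int main(void)
    {
        unsigned int max_qlen_log = 4;    /* nr_table_entries == 16 */
        unsigned int nr_table_entries = 1U << max_qlen_log;
        unsigned int qlen;

        for (qlen = 0; qlen < 64; qlen++)
            assert((qlen >> max_qlen_log != 0) == (qlen >= nr_table_entries));
        return 0;
    }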

    2.4.2 Checking whether the accept (established) queue is full: sk_acceptq_is_full

    static inline int sk_acceptq_is_full(struct sock *sk)
    {
    	//directly compare the number of sockets that have completed the three-way handshake
    	//against the allowed maximum; this is where the backlog argument of listen()
    	//(stored in sk_max_ack_backlog) takes effect
    	return sk->sk_ack_backlog > sk->sk_max_ack_backlog;
    }
    
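    For illustration, a minimal user-space sketch of where sk_max_ack_backlog comes from: it is the backlog argument passed to listen() (the effective value is additionally capped by net.core.somaxconn).

    #include <stdio.h>
    #include <string.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = htons(8080);    /* arbitrary example port */
        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("bind");
            return 1;
        }
        //backlog == 8 ends up in sk_max_ack_backlog, bounding how many
        //established connections may sit in the accept queue at once
        if (listen(fd, 8) < 0) {
            perror("listen");
            return 1;
        }
        close(fd);
        return 0;
    }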

    2.5 Allocating and initializing the connection request block

    2.5.1 Allocating the request block: reqsk_alloc / inet_reqsk_alloc

    static inline struct request_sock *reqsk_alloc(const struct request_sock_ops *ops)
    {
    	//allocate a request block; what is actually allocated here is a struct tcp_request_sock
    	struct request_sock *req = kmem_cache_alloc(ops->slab, GFP_ATOMIC);
    	//store the operation table in the request block's rsk_ops member
    	if (req != NULL)
    		req->rsk_ops = ops;
    
    	return req;
    }
    

    The ops passed to reqsk_alloc() is tcp_request_sock_ops, defined as follows:

    struct request_sock_ops tcp_request_sock_ops __read_mostly = {
    	.family		=	PF_INET,
    	//the object created is a struct tcp_request_sock
    	.obj_size	=	sizeof(struct tcp_request_sock),
    	.rtx_syn_ack	=	tcp_v4_send_synack,
    	.send_ack	=	tcp_v4_reqsk_send_ack,
    	.destructor	=	tcp_v4_reqsk_destructor,
    	.send_reset	=	tcp_v4_send_reset,
    };
    

    Here ops->slab is created when the AF_INET protocol family is initialized:

    struct proto tcp_prot = {
    	...
    	.rsk_prot		= &tcp_request_sock_ops,
        ...
    };
    
    static int __init inet_init(void)
    {
    	...
    	rc = proto_register(&tcp_prot, 1);
    	if (rc)
    		goto out;
        ...
    }
    
    int proto_register(struct proto *prot, int alloc_slab)
    {
    	...
        prot->rsk_prot->slab = kmem_cache_create(request_sock_slab_name,
                             prot->rsk_prot->obj_size, 0,
                             SLAB_HWCACHE_ALIGN, NULL);
        ...
    }
    

    2.5.2 Initializing the request block

    Initialization of the request block depends on the TCP options in the SYN segment, so it runs after option parsing completes:

    static inline void tcp_openreq_init(struct request_sock *req,
    				    struct tcp_options_received *rx_opt,
    				    struct sk_buff *skb)
    {
    	struct inet_request_sock *ireq = inet_rsk(req);
    
    	req->rcv_wnd = 0;		/* So that tcp_send_synack() knows! */
    	tcp_rsk(req)->rcv_isn = TCP_SKB_CB(skb)->seq;
    	req->mss = rx_opt->mss_clamp;
    	req->ts_recent = rx_opt->saw_tstamp ? rx_opt->rcv_tsval : 0;
    	ireq->tstamp_ok = rx_opt->tstamp_ok;
    	ireq->sack_ok = rx_opt->sack_ok;
    	ireq->snd_wscale = rx_opt->snd_wscale;
    	ireq->wscale_ok = rx_opt->wscale_ok;
    	ireq->acked = 0;
    	ireq->ecn_ok = 0;
    	ireq->rmt_port = tcp_hdr(skb)->source;
    }
    

    2.6 Adding the request block to the SYN request queue: inet_csk_reqsk_queue_hash_add

    void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
    				   unsigned long timeout)
    {
    	struct inet_connection_sock *icsk = inet_csk(sk);
    	//get the SYN request queue
    	struct listen_sock *lopt = icsk->icsk_accept_queue.listen_opt;
    	//compute a hash from the request block's peer IP address, peer port and the hash seed
    	const u32 h = inet_synq_hash(inet_rsk(req)->rmt_addr, inet_rsk(req)->rmt_port,
    				     lopt->hash_rnd, lopt->nr_table_entries);
    	//insert the request block into the SYN request queue and record the timeout in it
    	reqsk_queue_hash_req(&icsk->icsk_accept_queue, h, req, timeout);
    	//update the SYN queue counters qlen and qlen_young, and start the SYN+ACK retransmission timer
    	inet_csk_reqsk_queue_added(sk, timeout);
    }
    
    static inline void reqsk_queue_hash_req(struct request_sock_queue *queue,
    					u32 hash, struct request_sock *req,
    					unsigned long timeout)
    {
    	struct listen_sock *lopt = queue->listen_opt;
    
    	//set the expiry time
    	req->expires = jiffies + timeout;
    	//initialize the SYN+ACK retransmission count to 0
    	req->retrans = 0;
    	req->sk = NULL;
    	//insert the new request block at the head of its SYN queue hash chain
    	req->dl_next = lopt->syn_table[hash];
    	write_lock(&queue->syn_wait_lock);
    	lopt->syn_table[hash] = req;
    	write_unlock(&queue->syn_wait_lock);
    }
    
    static inline void inet_csk_reqsk_queue_added(struct sock *sk,
    					      const unsigned long timeout)
    {
    	//update the listen_sock counters; a return value of 0 means the SYN queue was empty
    	//before, in which case the SYN+ACK retransmission timer must be (re)armed
    	if (reqsk_queue_added(&inet_csk(sk)->icsk_accept_queue) == 0)
    		inet_csk_reset_keepalive_timer(sk, timeout);
    }
    
    static inline int reqsk_queue_added(struct request_sock_queue *queue)
    {
    	struct listen_sock *lopt = queue->listen_opt;
    	const int prev_qlen = lopt->qlen;
    
    	//update qlen and qlen_young
    	lopt->qlen_young++;
    	lopt->qlen++;
    	//return the previous length of the SYN request queue
    	return prev_qlen;
    }
  • SQL database connection timeouts


    Problems:

    1: System.InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.

    2: A transport-level error has occurred when sending the request to the server. (provider: TCP Provider, error: 0 - An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.)

    3: A connection was successfully established with the server, but then an error occurred during the pre-login handshake. (provider: SSL Provider, error: 0 - The wait operation timed out.) ---> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out.

    4: A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The specified network name is no longer available.) ---> System.ComponentModel.Win32Exception (0x80004005): The specified network name is no longer available.

    5: Connection timeout expired. The timeout period elapsed during the post-login phase. The connection could have timed out while waiting for the server to complete the login process and respond, or while attempting to create multiple active connections. The duration spent while attempting to connect to this server was - [Pre-Login] initialization=2;handshake=5;[Login] initialization=0;authentication=0;[Post-Login] complete=14025; ---> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out.

    Fix:

    Raise the connection pool limits in the connection string:

    server=192.168.0.1;User ID=sa;Password=123;database=;Min Pool Size=0;Max Pool Size=30000;Pooling=true;
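    For completeness, the SqlClient connection string also accepts a Connect Timeout keyword (in seconds); a variant of the string above with an explicit timeout might look like:

    server=192.168.0.1;User ID=sa;Password=123;database=;Min Pool Size=0;Max Pool Size=30000;Pooling=true;Connect Timeout=30;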

     

    These errors typically show up under high concurrency. The following query reports the current SQL connection count and can serve as one monitoring aid.

     SELECT [program_name] ,[spid],* FROM [sys].[sysprocesses] WHERE [spid]>50  and [program_name]='.Net SqlClient Data Provider' and dbid= db_id('') 

     

  • dpvs synproxy

    • TCP sets aside a fairly large region of memory, the connection request queue, to store connection request blocks. As SYN requests keep arriving and the number of pending requests hits the limit, the system starts dropping new SYN requests. SYN cookies let the server keep serving new SYN requests even when the half-open queue is full

    • When the half-open queue is full, SYN cookies do not drop SYN requests but encode the half-open state cryptographically. In TCP, when a client SYN arrives, the server must reply SYN+ACK and the client then sends the final confirmation. Normally the server's initial sequence number is computed by some rule or drawn at random; with SYN cookies it is instead derived by hashing the client IP and port, the server IP and port, the client's initial sequence number and some other secret values, then encrypting the result, which is called the cookie. Under a SYN flood that fills the request queue, the server does not reject new SYNs; it replies with a SYN packet whose initial sequence number is the cookie. When the client's ACK arrives, the server subtracts 1 from the ACK sequence number and compares the value with the hash recomputed from the same elements; if they match, the three-way handshake completes directly. Note that the request queue need not be consulted at that point (a rough C sketch of this cookie derivation follows the list below)

    • SYN cookies are enabled in the Linux kernel by setting:

      echo 1 > /proc/sys/net/ipv4/tcp_syncookies
      
    • Three-way handshake diagram (figure not reproduced here)

    • synproxy overview

      1. The client sends a SYN; the LB answers this first handshake itself and does not forward it to the rs. In the SYN+ACK it returns, seq is generated by the SYN-cookie algorithm and rcv_wnd is set to 0, so no data may be carried during the handshake; it follows that tcp fast open is not supported
      2. When the client returns the ACK, the seq is decoded; if it matches the SYN-cookie algorithm, this is legitimate traffic. The LB then starts a three-way handshake with the backend RS and passes the win size through; because the LB proxies the connection, it must also record the seq difference delta
      3. During data exchange, besides its normal full-nat work, the lb also compensates sequence numbers by the seq delta
      4. On connection close, normal cleanup
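      As referenced above, a rough sketch of a cookie-style ISN derivation (my own illustration: the names and the mixing function are made up, and real implementations such as Linux or DPVS use a proper cryptographic hash over essentially these same inputs):

      #include <stdint.h>

      /* hypothetical mixing step standing in for a real cryptographic hash */
      static uint32_t mix(uint32_t h, uint32_t v)
      {
          h ^= v;
          h *= 2654435761U;    /* multiplicative hashing constant */
          return (h << 13) | (h >> 19);
      }

      /* derive a cookie ISN from the 4-tuple, the client's ISN and a server
       * secret, so it can later be verified statelessly against ack_seq - 1 */
      static uint32_t cookie_isn(uint32_t saddr, uint32_t daddr,
                                 uint16_t sport, uint16_t dport,
                                 uint32_t client_isn, uint32_t secret)
      {
          uint32_t h = secret;

          h = mix(h, saddr);
          h = mix(h, daddr);
          h = mix(h, ((uint32_t)sport << 16) | dport);
          h = mix(h, client_isn);
          return h;
      }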

     Client first handshake (SYN)

    • __dp_vs_pre_routing

      static int __dp_vs_pre_routing(void *priv, struct rte_mbuf *mbuf,
                                     const struct inet_hook_state *state, int af)
      {
         ...
          /* Synproxy: defence synflood */
        //if the transport protocol is TCP, run synproxy processing; this handles the client's first-handshake (SYN) packet
          if (IPPROTO_TCP == iph.proto) {
              int v = INET_ACCEPT;
              if (0 == dp_vs_synproxy_syn_rcv(af, mbuf, &iph, &v))
                  return v;
          }
      
          return INET_ACCEPT;
      }
      
    • dp_vs_synproxy_syn_rcv

      • Handles the client-side first-handshake packet (SYN)
      /* Syn-proxy step 1 logic: receive client's Syn.
       * Check if synproxy is enabled for this skb, and send syn/ack back
       *
       * Synproxy is enabled when:
       * 1) mbuf is a syn packet,
       * 2) and the service is synproxy-enable,
       * 3) and ip_vs_todrop return fasle (not supported now)
       *
       * @return 0 means the caller should return at once and use
       * verdict as return value, return 1 for nothing.
       */
      int dp_vs_synproxy_syn_rcv(int af, struct rte_mbuf *mbuf,
              const struct dp_vs_iphdr *iph, int *verdict)
      {
          int ret;
          struct dp_vs_service *svc = NULL;
          struct tcphdr *th, _tcph;
          struct dp_vs_synproxy_opt tcp_opt;
          struct netif_port *dev;
          struct ether_hdr *eth;
          struct ether_addr ethaddr;
          //th points to the TCP header
          th = mbuf_header_pointer(mbuf, iph->len, sizeof(_tcph), &_tcph);
          if (unlikely(NULL == th))
              goto syn_rcv_out;
          //first handshake: a pure SYN that targets a configured service (svc) with synproxy protection enabled
          if (th->syn && !th->ack && !th->rst && !th->fin &&
                  (svc = dp_vs_service_lookup(af, iph->proto, &iph->daddr, th->dest, 0,
                                              NULL, NULL, NULL, rte_lcore_id())) &&
                  (svc->flags & DP_VS_SVC_F_SYNPROXY)) {
              /* if service's weight is zero (non-active realserver),
               * do noting and drop the packet */
               //if the service weight is 0 there is no usable backend; drop the packet (INET_DROP)
              if (svc->weight == 0) {
                  dp_vs_estats_inc(SYNPROXY_NO_DEST);
                  goto syn_rcv_out;
              }
      
              /* drop packet from blacklist */
              //if the source is blacklisted, bail out and return INET_DROP
              if (dp_vs_blklst_lookup(iph->af, iph->proto, &iph->daddr,
                          th->dest, &iph->saddr)) {
                  goto syn_rcv_out;
              }
          } else {
              return 1;
          }
      
          /* mbuf will be reused and ether header will be set.
           * FIXME: to support non-ether packets. */
          if (mbuf->l2_len != sizeof(struct ether_hdr))
              goto syn_rcv_out;
      
          /* update statistics */
          //update statistics
          dp_vs_estats_inc(SYNPROXY_SYN_CNT);
      
          /* set tx offload flags */
          //sanity check
          assert(mbuf->port <= NETIF_MAX_PORTS);
          //fetch the netif device and validate it
          dev = netif_port_get(mbuf->port);
          if (unlikely(!dev)) {
              RTE_LOG(ERR, IPVS, "%s: device eth%d not found\\n",
                      __func__, mbuf->port);
              goto syn_rcv_out;
          }
          //set the mbuf offload flags according to the NIC's hardware checksum capabilities
          if (likely(dev && (dev->flag & NETIF_PORT_FLAG_TX_TCP_CSUM_OFFLOAD))) {
              if (af == AF_INET)
                  mbuf->ol_flags |= (PKT_TX_TCP_CKSUM | PKT_TX_IP_CKSUM | PKT_TX_IPV4);
              else
                  mbuf->ol_flags |= (PKT_TX_TCP_CKSUM | PKT_TX_IPV6);
          }
      
          /* reuse mbuf */
          //reuse the mbuf and reply SYN+ACK; "reuse" because the mbuf is modified in place and sent straight back to the client as the SYN+ACK reply
          syn_proxy_reuse_mbuf(af, mbuf, th, &tcp_opt);
      
          /* set L2 header and send the packet out
           * It is noted that "ipv4_xmit" should not used here,
           * because mbuf is reused. */
          //set up the L2 header, swapping the source and destination MAC addresses
          eth = (struct ether_hdr *)rte_pktmbuf_prepend(mbuf, mbuf->l2_len);
          if (unlikely(!eth)) {
              RTE_LOG(ERR, IPVS, "%s: no memory\\n", __func__);
              goto syn_rcv_out;
          }
          memcpy(&ethaddr, &eth->s_addr, sizeof(struct ether_addr));
          memcpy(&eth->s_addr, &eth->d_addr, sizeof(struct ether_addr));
          memcpy(&eth->d_addr, &ethaddr, sizeof(struct ether_addr));
          //send the packet with netif_xmit
          if (unlikely(EDPVS_OK != (ret = netif_xmit(mbuf, dev)))) {
              RTE_LOG(ERR, IPVS, "%s: netif_xmit failed -- %s\\n",
                      __func__, dpvs_strerror(ret));
          /* should not set verdict to INET_DROP since netif_xmit
           * always consume the mbuf while INET_DROP means mbuf'll
           * be free in INET_HOOK.*/
          }
          *verdict = INET_STOLEN;
          return 0;
      
      syn_rcv_out:
          /* drop and destroy the packet */
          *verdict = INET_DROP;
          return 0;
      }
      
    • syn_proxy_reuse_mbuf

      • set the tcp options
      • compute the SYN+ACK seq with the SYN-cookie algorithm
      • set the SYN+ACK seq and ack_seq
      • swap source and destination in the ip and tcp headers
      • compute the ip and tcp header checksums
      /* Reuse mbuf for syn proxy, called by syn_proxy_syn_rcv().
       * do following things:
       * 1) set tcp options,
       * 2) compute seq with cookie func,
       * 3) set tcp seq and ack_seq,
       * 4) exchange ip addr and tcp port,
       * 5) compute iphdr and tcp check (HW xmit checksum offload not support for syn).
       */
      static void syn_proxy_reuse_mbuf(int af, struct rte_mbuf *mbuf,
                                       struct tcphdr *th,
                                       struct dp_vs_synproxy_opt *opt)
      {
          uint32_t isn;
          uint16_t tmpport;
          int      iphlen;
      
          //get the ip header length
          if (AF_INET6 == af)
          {
              iphlen = sizeof(struct ip6_hdr);
          }
          else
          {
              iphlen = ip4_hdrlen(mbuf);
          }
          //length check to make sure the headers are fully present
          if (mbuf_may_pull(mbuf, iphlen + (th->doff << 2)) != 0)
          {
              return;
          }
      
          /* deal with tcp options */
          //parse and rewrite the tcp options: mss, window size, timestamp
          syn_proxy_parse_set_opts(mbuf, th, opt);
      
          /* get cookie */
          //generate the seq of the SYN+ACK packet with the SYN-cookie algorithm
          if (AF_INET6 == af)
          {
              isn = syn_proxy_cookie_v6_init_sequence(mbuf, th, opt);
          }
          else
          {
              isn = syn_proxy_cookie_v4_init_sequence(mbuf, th, opt);
          }
      
          /* set syn-ack flag */
          //set the SYN|ACK flags
          ((uint8_t *)th)[13] = 0x12;
      
          /* exchage ports */
          //swap the dest and source ports
          tmpport    = th->dest;
          th->dest   = th->source;
          th->source = tmpport;
          /* set window size to zero */
          //set the receive window to 0: no data may be carried during the handshake
          th->window = 0;
          /* set seq(cookie) and ack_seq */
          //set seq and ack_seq: ack_seq is the client's sequence number plus 1, and the returned seq is the cookie just computed
          th->ack_seq = htonl(ntohl(th->seq) + 1);
          th->seq     = htonl(isn);
      
          /* exchage addresses */
          //swap the source and destination ip addresses and recompute the checksums
          if (AF_INET6 == af)
          {
              struct in6_addr tmpaddr;
              struct ip6_hdr *ip6h = ip6_hdr(mbuf);
      
              tmpaddr        = ip6h->ip6_src;
              ip6h->ip6_src  = ip6h->ip6_dst;
              ip6h->ip6_dst  = tmpaddr;
              ip6h->ip6_hlim = dp_vs_synproxy_ctrl_synack_ttl;
      
              if (likely(mbuf->ol_flags & PKT_TX_TCP_CKSUM))
              {
                  mbuf->l3_len = (void *)th - (void *)ip6h;
                  mbuf->l4_len = ntohs(ip6h->ip6_plen) + sizeof(struct ip6_hdr) - mbuf->l3_len;
                  th->check    = ip6_phdr_cksum(ip6h, mbuf->ol_flags, mbuf->l3_len, IPPROTO_TCP);
              }
              else
              {
                  if (mbuf_may_pull(mbuf, mbuf->pkt_len) != 0)
                  {
                      return;
                  }
                  tcp6_send_csum((struct ipv6_hdr *)ip6h, th);
              }
          }
          else
          {
              uint32_t      tmpaddr;
              struct iphdr *iph = (struct iphdr *)ip4_hdr(mbuf);
      
              tmpaddr    = iph->saddr;
              iph->saddr = iph->daddr;
              iph->daddr = tmpaddr;
              iph->ttl   = dp_vs_synproxy_ctrl_synack_ttl;
              iph->tos   = 0;
      
              /* compute checksum */
              if (likely(mbuf->ol_flags & PKT_TX_TCP_CKSUM))
              {
                  mbuf->l3_len = iphlen;
                  mbuf->l4_len = ntohs(iph->tot_len) - iphlen;
                  th->check    = ip4_phdr_cksum((struct ipv4_hdr *)iph, mbuf->ol_flags);
              }
              else
              {
                  if (mbuf_may_pull(mbuf, mbuf->pkt_len) != 0)
                  {
                      return;
                  }
                  tcp4_send_csum((struct ipv4_hdr *)iph, th);
              }
              //if the hardware cannot compute the IP checksum, generate it with ip4_send_csum
              if (likely(mbuf->ol_flags & PKT_TX_IP_CKSUM))
              {
                  iph->check = 0;
              }
              else
              {
                  ip4_send_csum((struct ipv4_hdr *)iph);
              }
          }
      }
      
    • syn_proxy_parse_set_opts

      /* Replace tcp options in tcp header, called by syn_proxy_reuse_mbuf() */
      static void syn_proxy_parse_set_opts(struct rte_mbuf *mbuf, struct tcphdr *th,
                                           struct dp_vs_synproxy_opt *opt)
      {
          /* mss in received packet */
          uint16_t        in_mss;
          uint32_t *      tmp;
          unsigned char * ptr;
          //length of the tcp options
          int             length   = (th->doff * 4) - sizeof(struct tcphdr);
          uint16_t        user_mss = dp_vs_synproxy_ctrl_init_mss;
          struct timespec tsp_now;
      
          memset(opt, '\\0', sizeof(struct dp_vs_synproxy_opt));
          opt->mss_clamp = 536;
          ptr            = (unsigned char *)(th + 1);
      
          while (length > 0)
          {
              unsigned char *tmp_opcode = ptr;
              int            opcode     = *ptr++;
              int            opsize;
      
              switch (opcode)
              {
              //end of option list: return
              case TCPOPT_EOL:
                  return;
              //NOP is padding only: consume one byte and continue with the next option
              case TCPOPT_NOP:
                  length--;
                  continue;
      
              default:
                  opsize = *ptr++;
                  //for anything that is neither end-of-options nor NOP, read the option length and validate it
                  if (opsize < 2)  /* silly options */
                  {
                      return;
                  }
                  //option length check
                  if (opsize > length)
                  {
                      return; /* don't parse partial options */
                  }
                  switch (opcode)
                  {
                  case TCPOPT_MAXSEG:
                      //advertises the maximum segment size; the option format is
                      //kind=2|len=4|MSS
                      //and it may only appear in SYN segments
                      if (opsize == TCPOLEN_MAXSEG)
                      {
                          in_mss = ntohs(*(uint16_t *)ptr);
                          if (in_mss)
                          {
                              //if the locally configured mss is smaller than the peer's, reply with the smaller value
                              if (user_mss < in_mss)
                              {
                                  in_mss = user_mss;
                              }
                              opt->mss_clamp = in_mss;
                          }
                          //convert byte order
                          *(uint16_t *)ptr = htons(opt->mss_clamp);
                      }
                      break;
                  //window scale option
                  case TCPOPT_WINDOW:
                      /**
                       * kind=3|len=3|shift count
                       * take the shift count from the window scale option and set wscale_ok to 1,
                       * marking that this SYN carries the window scale option;
                       * warn if the shift count in the option exceeds 14
                       */
                      if (opsize == TCPOLEN_WINDOW)
                      {
                          if (dp_vs_synproxy_ctrl_wscale)
                          {
                              opt->wscale_ok  = 1;
                              opt->snd_wscale = *(uint8_t *)ptr;
                              if (opt->snd_wscale > DP_VS_SYNPROXY_WSCALE_MAX)
                              {
                                  RTE_LOG(INFO, IPVS, "tcp_parse_options: Illegal window "
                                          "scaling value %d > %d received.",
                                          opt->snd_wscale, DP_VS_SYNPROXY_WSCALE_MAX);
                                  opt->snd_wscale = DP_VS_SYNPROXY_WSCALE_MAX;
                              }
                              *(uint8_t *)ptr = (uint8_t)dp_vs_synproxy_ctrl_wscale;
                          }
                          else
                          {
                              //option not supported: overwrite it with NOP padding
                              memset(tmp_opcode, TCPOPT_NOP, TCPOLEN_WINDOW);
                          }
                      }
                      break;
                  //timestamp option
                  case TCPOPT_TIMESTAMP:
                      if (opsize == TCPOLEN_TIMESTAMP)
                      {
                          if (dp_vs_synproxy_ctrl_timestamp)
                          {
                              memset(&tsp_now, 0, sizeof(tsp_now));
                              clock_gettime(CLOCK_REALTIME, &tsp_now);
                              opt->tstamp_ok = 1;
                              tmp            = (uint32_t *)ptr;
                              *(tmp + 1)     = *tmp;
                              *tmp           = htonl((uint32_t)(TCP_OPT_TIMESTAMP(tsp_now)));
                          }
                          else
                          {
                              memset(tmp_opcode, TCPOPT_NOP, TCPOLEN_TIMESTAMP);
                          }
                      }
                      break;
                  
                  case TCPOPT_SACK_PERMITTED:
                       //SACK-permitted option, valid only in SYN segments; set sack_ok to 1 to mark that SACK is allowed
                      if (opsize == TCPOLEN_SACK_PERMITTED)
                      {
                          if (dp_vs_synproxy_ctrl_sack)
                          {
                              opt->sack_ok = 1;
                          }
                          else
                          {
                              memset(tmp_opcode, TCPOPT_NOP, TCPOLEN_SACK_PERMITTED);
                          }
                      }
                      break;
                  }
                  ptr    += opsize - 2;
                  length -= opsize;
              }
          }
      }
      

     Client third-handshake ACK

    • __dp_vs_in

      • the client-side third-handshake packet (ACK) is always ACCEPTed by __dp_vs_pre_routing and continues into __dp_vs_in
      • the connection lookup misses, so the tcp transport-layer tcp_conn_sched function is called to schedule the new connection
      static int __dp_vs_in(void *priv, struct rte_mbuf *mbuf,
                            const struct inet_hook_state *state, int af)
      {
      		....
          //a brand-new connection has no session yet; conn_sched picks a backend real server for the request
          if (unlikely(!conn))
          {
              /* try schedule RS and create new connection */
              //call the proto's conn_sched hook to pick a backend rs and set up the connection; on failure return the verdict
              if (prot->conn_sched(prot, &iph, mbuf, &conn, &verdict) != EDPVS_OK)
              {
                  /* RTE_LOG(DEBUG, IPVS, "%s: fail to schedule.\\n", __func__); */
                  return(verdict);
              }
      
              /* only SNAT triggers connection by inside-outside traffic. */
              //in SNAT mode internal servers access external services (internal server ---> dpvs ---> external server, e.g. baidu), so set dir = DPVS_CONN_DIR_OUTBOUND
              if (conn->dest->fwdmode == DPVS_FWD_MODE_SNAT)
              {
                  dir = DPVS_CONN_DIR_OUTBOUND;
              }
              else
              {
                  //all other modes set dir = DPVS_CONN_DIR_INBOUND
                  dir = DPVS_CONN_DIR_INBOUND;
              }
          }
      		...
      }
      
    • tcp_conn_sched

      static int tcp_conn_sched(struct dp_vs_proto *proto,
                                const struct dp_vs_iphdr *iph,
                                struct rte_mbuf *mbuf,
                                struct dp_vs_conn **conn,
                                int *verdict)
      {
      		...
      		/* Syn-proxy step 2 logic: receive client's 3-handshacke ack packet */
      
          /* When synproxy disabled, only SYN packets can arrive here.
           * So don't judge SYNPROXY flag here! If SYNPROXY flag judged, and syn_proxy
           * got disbled and keepalived reloaded, SYN packets for RS may never be sent. */
          //if this is the third-handshake packet of a SYN-cookie connection setup, return EDPVS_PKTSTOLEN
          if (dp_vs_synproxy_ack_rcv(iph->af, mbuf, th, proto, conn, iph, verdict) == 0)
          {
              /* Attention: First ACK packet is also stored in conn->ack_mbuf */
              return(EDPVS_PKTSTOLEN);
          }
      		...
      }
      
    • dp_vs_synproxy_ack_rcv

      • verify the SYN cookie
      • dp_vs_schedule: backend scheduling for the new connection, picking a real server
      • syn_proxy_send_rs_syn performs the first handshake between the LB and the RS
      /* Syn-proxy step 2 logic: receive client's Ack
       * Receive client's 3-handshakes ack packet, do cookie check and then
       * send syn to rs after creating a session */
      int dp_vs_synproxy_ack_rcv(int af, struct rte_mbuf *mbuf,
                                 struct tcphdr *th, struct dp_vs_proto *pp,
                                 struct dp_vs_conn **cpp,
                                 const struct dp_vs_iphdr *iph, int *verdict)
      {
          int res;
          struct dp_vs_synproxy_opt opt;
          struct dp_vs_service *    svc;
          int res_cookie_check;
      
          /* Do not check svc syn-proxy flag, as it may be changed after syn-proxy step 1. */
          
          if (!th->syn && th->ack && !th->rst && !th->fin &&
              (svc = dp_vs_service_lookup(af, iph->proto, &iph->daddr,
                                          th->dest, 0, NULL, NULL, NULL, rte_lcore_id())))
          {
              if (dp_vs_synproxy_ctrl_defer &&
                  !syn_proxy_ack_has_data(mbuf, iph, th))
              {
                  /* Update statistics */
                  dp_vs_estats_inc(SYNPROXY_NULL_ACK);
      
                  /* We get a pure ack when expecting ack packet with payload, so
                   * have to drop it */
                  *verdict = INET_DROP;
                  return(0);
              }
              //SYN cookie check: a mismatch means an attack or junk traffic, and the packet is dropped;
              //on success run synproxy step 2, where the lb calls dp_vs_schedule to connect to a backend real server
              if (AF_INET6 == af)
              {
                  res_cookie_check = syn_proxy_v6_cookie_check(mbuf,
                                                               ntohl(th->ack_seq) - 1, &opt);
              }
              else
              {
                  res_cookie_check = syn_proxy_v4_cookie_check(mbuf,
                                                               ntohl(th->ack_seq) - 1, &opt);
              }
              if (!res_cookie_check)
              {
                  /* Update statistics */
                  dp_vs_estats_inc(SYNPROXY_BAD_ACK);
                  /* Cookie check failed, drop the packet */
                  RTE_LOG(DEBUG, IPVS, "%s: syn_cookie check failed seq=%u\\n", __func__,
                          ntohl(th->ack_seq) - 1);
                  *verdict = INET_DROP;
                  return(0);
              }
      
              /* Update statistics */
              dp_vs_estats_inc(SYNPROXY_OK_ACK);
      
              /* Let the virtual server select a real server for the incoming connetion,
               * and create a connection entry */
               //dp_vs_schedule: backend scheduling for the new connection, picking a real server
              *cpp = dp_vs_schedule(svc, iph, mbuf, 1, 0);
              if (unlikely(!*cpp))
              {
                  RTE_LOG(WARNING, IPVS, "%s: ip_vs_schedule failed\\n", __func__);
      
                  /* FIXME: What to do when virtual service is available but no destination
                   * available for a new connetion: send an icmp UNREACHABLE ? */
                  *verdict = INET_DROP;
                  return(0);
              }
      
              /* Do nothing but print a error msg when fail, because session will be
               * correctly freed in dp_vs_conn_expire */
              //syn_proxy_send_rs_syn completes the connection setup between the lb and the real server
              if (EDPVS_OK != (res = syn_proxy_send_rs_syn(af, th, *cpp, mbuf, pp, &opt)))
              {
                  RTE_LOG(ERR, IPVS, "%s: syn_proxy_send_rs_syn failed -- %s\\n",
                          __func__, dpvs_strerror(res));
              }
      
              /* Count in the ack packet (STOLEN by synproxy) */
              dp_vs_stats_in(*cpp, mbuf);
      
              /* Active session timer, and dec refcnt.
               * Also steal the mbuf, and let caller return immediately */
              dp_vs_conn_put(*cpp);
              *verdict = INET_STOLEN;
              return(0);
          }
      
          return(1);
      }
      
    • syn_proxy_send_rs_syn

      /* Create syn packet and send it to rs.
       * We also store syn mbuf in cp if syn retransmition is turned on. */
      static int syn_proxy_send_rs_syn(int af, const struct tcphdr *th,
                                       struct dp_vs_conn *cp, struct rte_mbuf *mbuf,
                                       struct dp_vs_proto *pp, struct dp_vs_synproxy_opt *opt)
      {
          int tcp_hdr_size;
          struct rte_mbuf *   syn_mbuf, *syn_mbuf_cloned;
          struct rte_mempool *pool;
          struct tcphdr *     syn_th;
      
          if (!cp->packet_xmit)
          {
              RTE_LOG(WARNING, IPVS, "%s: packet_xmit is null\\n", __func__);
              return(EDPVS_INVAL);
          }
      
          /* Allocate mbuf from device mempool */
          pool = get_mbuf_pool(cp, DPVS_CONN_DIR_INBOUND);
          if (unlikely(!pool))
          {
              //RTE_LOG(WARNING, IPVS, "%s: %s\\n", __func__, dpvs_strerror(EDPVS_NOROUTE));
              return(EDPVS_NOROUTE);
          }
          //allocate syn_mbuf from the mempool; it will be sent to the backend real server
          syn_mbuf = rte_pktmbuf_alloc(pool);
          if (unlikely(!syn_mbuf))
          {
              //RTE_LOG(WARNING, IPVS, "%s: %s\\n", __func__, dpvs_strerror(EDPVS_NOMEM));
              return(EDPVS_NOMEM);
          }
          //clear the cached route info
          syn_mbuf->userdata = NULL; /* make sure "no route info" */
      
          /* Reserve space for tcp header */
          //reserve room for the tcp header including options, prepending into the mbuf headroom
          tcp_hdr_size = (sizeof(struct tcphdr) + TCPOLEN_MAXSEG
                          + (opt->tstamp_ok ? TCPOLEN_TSTAMP_APPA : 0)
                          + (opt->wscale_ok ? TCP_OLEN_WSCALE_ALIGNED : 0)
                          /* SACK_PERM is in the palce of NOP NOP of TS */
                          + ((opt->sack_ok && !opt->tstamp_ok) ? TCP_OLEN_SACKPERMITTED_ALIGNED : 0));
          syn_th = (struct tcphdr *)rte_pktmbuf_prepend(syn_mbuf, tcp_hdr_size);
          if (!syn_th)
          {
              rte_pktmbuf_free(syn_mbuf);
              //RTE_LOG(WARNING, IPVS, "%s:%s\\n", __func__, dpvs_strerror(EDPVS_NOROOM));
              return(EDPVS_NOROOM);
          }
      
          /* Set up tcp header */
          memset(syn_th, 0, tcp_hdr_size);
          syn_th->source              = th->source;
          syn_th->dest                = th->dest;
          syn_th->seq                 = htonl(ntohl(th->seq) - 1);
          syn_th->ack_seq             = 0;
          *(((uint16_t *)syn_th) + 6) = htons(((tcp_hdr_size >> 2) << 12) | /*TH_SYN*/ 0x02);
          /* FIXME: what window should we use */
          syn_th->window  = htons(5000);
          syn_th->check   = 0;
          syn_th->urg_ptr = 0;
          syn_th->urg     = 0;
          //build the tcp options of the syn packet
          syn_proxy_syn_build_options((uint32_t *)(syn_th + 1), opt);
          //build the IP header
          if (AF_INET6 == af)
          {
              struct ip6_hdr *ack_ip6h;
              struct ip6_hdr *syn_ip6h;
      
              /* Reserve space for ipv6 header */
              syn_ip6h = (struct ip6_hdr *)rte_pktmbuf_prepend(syn_mbuf,
                                                               sizeof(struct ip6_hdr));
              if (!syn_ip6h)
              {
                  rte_pktmbuf_free(syn_mbuf);
                  //RTE_LOG(WARNING, IPVS, "%s:%s\\n", __func__, dpvs_strerror(EDPVS_NOROOM));
                  return(EDPVS_NOROOM);
              }
      
              ack_ip6h = (struct ip6_hdr *)ip6_hdr(mbuf);
      
              syn_ip6h->ip6_vfc  = 0x60; /* IPv6 */
              syn_ip6h->ip6_src  = ack_ip6h->ip6_src;
              syn_ip6h->ip6_dst  = ack_ip6h->ip6_dst;
              syn_ip6h->ip6_plen = htons(tcp_hdr_size);
              syn_ip6h->ip6_nxt  = NEXTHDR_TCP;
              syn_ip6h->ip6_hlim = IPV6_DEFAULT_HOPLIMIT;
      
              syn_mbuf->l3_len = sizeof(*syn_ip6h);
          }
          else
          {
              struct iphdr *ack_iph;
              struct iphdr *syn_iph;
      
              /* Reserve space for ipv4 header */
              syn_iph = (struct iphdr *)rte_pktmbuf_prepend(syn_mbuf, sizeof(struct ipv4_hdr));
              if (!syn_iph)
              {
                  rte_pktmbuf_free(syn_mbuf);
                  //RTE_LOG(WARNING, IPVS, "%s:%s\\n", __func__, dpvs_strerror(EDPVS_NOROOM));
                  return(EDPVS_NOROOM);
              }
      
              ack_iph = (struct iphdr *)ip4_hdr(mbuf);
              *((uint16_t *)syn_iph) = htons((4 << 12) | (5 << 8) | (ack_iph->tos & 0x1E));
              syn_iph->tot_len       = htons(syn_mbuf->pkt_len);
              syn_iph->frag_off      = htons(IPV4_HDR_DF_FLAG);
              syn_iph->ttl           = 64;
              syn_iph->protocol      = IPPROTO_TCP;
              syn_iph->saddr         = ack_iph->saddr;
              syn_iph->daddr         = ack_iph->daddr;
      
              syn_mbuf->l3_len = sizeof(*syn_iph);
      
              /* checksum is done by fnat_in_handler */
              syn_iph->check = 0;
          }
      
          /* Save syn_mbuf if syn retransmission is on */
          //syn_retry is the retransmission count for actively opened connections; if positive, cache the constructed packet
          if (dp_vs_synproxy_ctrl_syn_retry > 0)
          {
              syn_mbuf_cloned = mbuf_copy(syn_mbuf, pool);
              if (unlikely(!syn_mbuf_cloned))
              {
                  rte_pktmbuf_free(syn_mbuf);
                  //RTE_LOG(WARNING, IPVS, "%s:%s\\n", __func__, dpvs_strerror(EDPVS_NOMEM));
                  return(EDPVS_NOMEM);
              }
      
              syn_mbuf_cloned->userdata = NULL;
              cp->syn_mbuf = syn_mbuf_cloned;
              sp_dbg_stats32_inc(sp_syn_saved);
              rte_atomic32_set(&cp->syn_retry_max, dp_vs_synproxy_ctrl_syn_retry);
          }
      
          /* TODO: Save info for fast_response_xmit */
      
          /* Count in the syn packet */
          dp_vs_stats_in(cp, mbuf);
      
          /* If xmit failed, syn_mbuf will be freed correctly */
          //send via packet_xmit, which here is dp_vs_xmit_fnat
          cp->packet_xmit(pp, cp, syn_mbuf);
      
          return(EDPVS_OK);
      }
      

     RS SYN+ACK reply

    • __dp_vs_in

      • the direction is DPVS_CONN_DIR_OUTBOUND
      • the connection lookup now succeeds, and processing eventually reaches dp_vs_synproxy_synack_rcv
      static int __dp_vs_in(void *priv, struct rte_mbuf *mbuf,
                            const struct inet_hook_state *state, int af)
      {
      		if (conn->flags & DPVS_CONN_F_SYNPROXY)
          {
              if (dir == DPVS_CONN_DIR_INBOUND)
              {
                  /* Filter out-in ack packet when cp is at SYN_SENT state.
                   * Drop it if not a valid packet, store it otherwise */
                  if (0 == dp_vs_synproxy_filter_ack(mbuf, conn, prot,
                                                     &iph, &verdict))
                  {
                      dp_vs_stats_in(conn, mbuf);
                      dp_vs_conn_put(conn);
                      return(verdict);
                  }
      
                  /* "Reuse" synproxy sessions.
                   * "Reuse" means update syn_proxy_seq struct
                   * and clean ack_mbuf etc. */
                  if (0 != dp_vs_synproxy_ctrl_conn_reuse)
                  {
                      if (0 == dp_vs_synproxy_reuse_conn(af, mbuf, conn, prot,
                                                         &iph, &verdict))
                      {
                          dp_vs_stats_in(conn, mbuf);
                          dp_vs_conn_put(conn);
                          return(verdict);
                      }
                  }
              }
              else
              {
                  /* Syn-proxy 3 logic: receive syn-ack from rs */
                  if (dp_vs_synproxy_synack_rcv(mbuf, conn, prot,
                                                iph.len, &verdict) == 0)
                  {
                      dp_vs_stats_out(conn, mbuf);
                      dp_vs_conn_put(conn);
                      return(verdict);
                  }
              }
          }
      }
      
    • dp_vs_synproxy_synack_rcv

      /* Syn-proxy step 3 logic: receive rs's Syn/Ack.
       * Update syn_proxy_seq.delta and send stored ack mbufs to rs. */
      int dp_vs_synproxy_synack_rcv(struct rte_mbuf *mbuf, struct dp_vs_conn *cp,
                                    struct dp_vs_proto *pp, int th_offset, int *verdict)
      {
          struct tcphdr _tcph, *th;
          struct dp_vs_synproxy_ack_pakcet *tmbuf, *tmbuf2;
          struct list_head   save_mbuf;
          struct dp_vs_dest *dest         = cp->dest;
          unsigned           conn_timeout = 0;
      
          //th points to the start of the tcp header
          th = mbuf_header_pointer(mbuf, th_offset, sizeof(_tcph), &_tcph);
          if (unlikely(!th))
          {
              *verdict = INET_DROP;
              return(0);
          }
      
      #ifdef CONFIG_DPVS_IPVS_DEBUG
          RTE_LOG(DEBUG, IPVS, "%s: seq = %u ack_seq = %u %c%c%c cp->is_synproxy = %u "
                  "cp->state = %u\\n", __func__, ntohl(th->seq), ntohl(th->ack_seq),
                  (th->syn) ? 'S' : '-',
                  (th->ack) ? 'A' : '-',
                  (th->rst) ? 'R' : '-',
                  cp->flags & DPVS_CONN_F_SYNPROXY, cp->state);
      #endif
      
          INIT_LIST_HEAD(&save_mbuf);
          //validate the reply: it must be a SYN+ACK, synproxy must be enabled, and the conn must currently be in DPVS_TCP_S_SYN_SENT
          if ((th->syn) && (th->ack) && (!th->rst) &&
              (cp->flags & DPVS_CONN_F_SYNPROXY) &&
              (cp->state == DPVS_TCP_S_SYN_SENT))
          {
              //update syn_proxy_seq.delta, the sequence number difference
              cp->syn_proxy_seq.delta = ntohl(cp->syn_proxy_seq.isn) - ntohl(th->seq);
              //the connection state moves to ESTABLISHED
              cp->state = DPVS_TCP_S_ESTABLISHED;
              //fetch the connection timeout
              conn_timeout = dp_vs_get_conn_timeout(cp);
              if (unlikely((conn_timeout != 0) && (cp->proto == IPPROTO_TCP)))
              {
                  cp->timeout.tv_sec = conn_timeout;
              }
              else
              {
                  cp->timeout.tv_sec = pp->timeout_table[cp->state];
              }
              dpvs_time_rand_delay(&cp->timeout, 1000000);
              //update the connection counters on dest
              if (dest)
              {
                  rte_atomic32_inc(&dest->actconns);
                  rte_atomic32_dec(&dest->inactconns);
                  cp->flags &= ~DPVS_CONN_F_INACTIVE;
              }
      
              /* Save tcp sequence for fullnat/nat, inside to outside */
              //save the rs_end_seq and rs_end_ack sequence numbers
              if (DPVS_FWD_MODE_NAT == cp->dest->fwdmode ||
                  DPVS_FWD_MODE_FNAT == cp->dest->fwdmode)
              {
                  cp->rs_end_seq = htonl(ntohl(th->seq) + 1);
                  cp->rs_end_ack = th->ack_seq;
      #ifdef CONFIG_DPVS_IPVS_DEBUG
                  RTE_LOG(DEBUG, IPVS, "%s: packet from rs, seq = %u, ack_seq = %u, port %u => %u\\n",
                          __func__, ntohl(th->seq), ntohl(th->ack_seq),
                          ntohs(th->source), ntohs(th->dest));
      #endif
              }
      
              /* TODO: ip_vs_synproxy_save_fast_xmit_info ? */
      
              /* Free stored syn mbuf, no need for retransmition any more */
              //syn_mbuf holds the lb->rs connection request packet; the handshake has completed normally, so free it
              if (cp->syn_mbuf)
              {
                  rte_pktmbuf_free(cp->syn_mbuf);
                  cp->syn_mbuf = NULL;
                  sp_dbg_stats32_dec(sp_syn_saved);
              }
              //the ack mbufs saved for this connection sit on cp->ack_mbuf and are consumed below
              if (list_empty(&cp->ack_mbuf))
              {
                  /*
                   * FIXME: Maybe a bug here, print err msg and go.
                   * Attention: cp->state has been changed and we
                   * should still DROP the syn/ack mbuf.
                   */
                  RTE_LOG(ERR, IPVS, "%s: got ack_mbuf NULL pointer: ack-saved = %u\\n",
                          __func__, cp->ack_num);
                  *verdict = INET_DROP;
                  return(0);
              }
      
              /* Window size has been set to zero in the syn-ack packet to Client.
               * If get more than one ack packet here,
               * it means client has sent a window probe after one RTO.
               * The probe will be forward to RS and RS will respond a window update.
               * So DPVS has no need to send a window update.
               */
              //window update: only the first ack needs one (see the comment above)
              if (cp->ack_num == 1)
              {
                  syn_proxy_send_window_update(tuplehash_out(cp).af, mbuf, cp, pp, th);
              }
      
              list_for_each_entry_safe(tmbuf, tmbuf2, &cp->ack_mbuf, list)
              {
                  list_del_init(&tmbuf->list);
                  cp->ack_num--;
                  list_add_tail(&tmbuf->list, &save_mbuf);
              }
              assert(cp->ack_num == 0);
      				//send the buffered rs-bound packets (including the third-handshake ack) to the rs via packet_xmit
              list_for_each_entry_safe(tmbuf, tmbuf2, &save_mbuf, list)
              {
                  list_del_init(&tmbuf->list);
                  /* syn_mbuf will be freed correctly if xmit failed */
                  //send it to the rs via packet_xmit
                  cp->packet_xmit(pp, cp, tmbuf->mbuf);
                  /* free dp_vs_synproxy_ack_pakcet */
                  rte_mempool_put(this_ack_mbufpool, tmbuf);
                  sp_dbg_stats32_dec(sp_ack_saved);
              }
              //this ack packet must not be forwarded to the client side, so return drop here
              *verdict = INET_DROP;
              return(0);
          }
          else if ((th->rst) &&
                   (cp->flags & DPVS_CONN_F_SYNPROXY) &&
                   (cp->state == DPVS_TCP_S_SYN_SENT))
          {
              RTE_LOG(DEBUG, IPVS, "%s: get rst from rs, seq = %u ack_seq = %u\\n",
                      __func__, ntohl(th->seq), ntohl(th->ack_seq));
      
              /* Count the delta of seq */
              //on an rst packet, set the connection state to DPVS_TCP_S_CLOSE
              cp->syn_proxy_seq.delta = ntohl(cp->syn_proxy_seq.isn) - ntohl(th->seq);
              cp->state          = DPVS_TCP_S_CLOSE;
              cp->timeout.tv_sec = pp->timeout_table[cp->state];
              dpvs_time_rand_delay(&cp->timeout, 1000000);
              th->seq = htonl(ntohl(th->seq) + 1);
              //syn_proxy_seq_csum_update ?
      
              return(1);
          }
          return(1);
      }
      
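      A small sketch (my own; the struct and helper names are hypothetical) of how the stored delta is used afterwards: the client numbers its acknowledgments against the cookie ISN chosen by the LB, while the RS chose its own ISN, so every ack_seq forwarded from client to RS must be shifted by delta.

      #include <stdint.h>
      #include <arpa/inet.h>

      struct synproxy_seq {
          uint32_t isn;      /* cookie ISN the LB announced to the client */
          uint32_t delta;    /* isn - rs_isn, recorded on the SYN+ACK from the RS */
      };

      /* adjust the ack field of a client packet before forwarding it to the RS */
      static uint32_t adjust_ack_seq(const struct synproxy_seq *sp, uint32_t ack_seq_net)
      {
          return htonl(ntohl(ack_seq_net) - sp->delta);
      }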
  • Why does ASP.NET MVC need async?

    Why does ASP.NET MVC need async?

    IIS has a thread pool for handling user requests. When a new request arrives, a pooled thread is dispatched to process it. Under high concurrency, however, the pool may no longer have enough threads for all the requests: every pooled thread is busy, request handling blocks its thread, and that thread cannot serve any other request. Once the request queue fills up, the web server rejects requests with HTTP 503 (busy). For high-latency work such as network operations, such threads mostly sit waiting and do nothing most of the time; asynchronous programming lets them be put to much better use.

    For example, if a request triggers a network call that takes two seconds to complete, the request takes two seconds whether it runs synchronously or asynchronously. During an asynchronous call, however, the server is not blocked from responding to other requests while the first one completes, so asynchronous requests prevent request queuing when many requests invoke long-running operations. In .NET 4.5 the maximum thread pool size is 5000, and the await and async keywords were added to simplify asynchronous programming.

    Synchronous or asynchronous?

    In general, use synchronous processing when:

    • The operation is simple or short-running.

    • Simplicity matters more than efficiency.

    • The operation is mainly CPU work rather than one with heavy disk or network overhead. Using async action methods for CPU-bound operations provides no benefit and adds overhead.

    In general, use asynchronous processing when:

    • The operation is network-bound or I/O-bound rather than CPU-bound.

    • Testing shows that blocking operations are a bottleneck for site performance, and IIS could serve more requests if those blocking calls used async action methods.

    • Parallelism matters more than simplicity of the code.

    • You want a mechanism that lets users cancel long-running requests.

  • Establishing a TCP connection (2)

    SYN cookies let the server keep handling new SYN requests even when the half-open queue is full. When the half-open queue is full, SYN cookies do not discard the SYN request but mark the half-open state cryptographically. In a TCP implementation, on receiving the client's SY
  • jboss ajp connector attributes

    When the queue is full, any incoming request is rejected. The default is 10. address: on servers with multiple IP addresses, this attribute sets which address is used to listen on the given port. By default the port binds to all of the server's IP addresses. Setting it to 127.0.0.1 means...
  • If the application pool has not been shut down, check whether the pool's request queue was full when the request arrived. Every site has a maximum load; once requests reach it, 503 errors appear. One fix is to enlarge the request queue, whose default value is 1000. There is one more possibility
  • The HTTP connector serves large static files with sendfile (all such large files are sent asynchronously through high-performance kernel-level calls) and implements keepalive with a socket poller, improving the server's ... When the queue is full, any incoming request is rejected. The default is 10.
  • At that point the problem arises: if the server-side queue is full and server resources are busy, the request stays parked in a server thread; in other words, the newly generated request puts no real load on the server, but unfortunately that request's...
  • Response time in LoadRunner

    At that point the problem arises: if the server-side queue is full and server resources are busy, the request stays parked in a server thread; in other words, the newly generated request puts no real load on the server, but unfortunately that request's...
  • 10.6.16 Error responses: Invalid Handle (the handle in the request is invalid; it does not exist on the server), Read Not Permitted ... Prepare Queue Full, Attribute Not Found, Attribute Not Long, Insufficient Encryption Key Size, Invalid Attribute Value Length ...
  • Standard socket parameters used by ChannelOption

    BACKLOG is used when constructing the server-side ServerSocket object; it sets the maximum length of the queue that temporarily holds connections which have completed the three-way handshake while all of the server's request-handling threads are busy. If unset, or set to a value below 1, Java uses the default of 50. ChannelOption....
  • BACKLOG is used when constructing the ServerSocket object; it sets the maximum length of the queue that temporarily holds connections which have completed the three-way handshake while all of the current server's request-handling threads are busy. If unset, or set to a value below 1, Java will use the default of 50. ChannelOption.SO_...
  • Standard socket parameters

    BACKLOG is used when constructing the server-side ServerSocket object; it sets the maximum length of the queue that temporarily holds connections which have completed the three-way handshake while all of the server's request-handling threads are busy. If unset, or set to a value below 1, Java uses the default of 50. Channel...
  • BACKLOG is used when constructing the server-side ServerSocket object; it sets the maximum length of the queue that temporarily holds connections which have completed the three-way handshake while all of the server's request-handling threads are busy. If unset, or set to a value below 1, Java uses the default of 50. ChannelOption.SO_...
  • What Java Netty's option(ChannelOption.SO_BACKLOG, backLog) means

    BACKLOG is used when constructing the server-side ServerSocket object; it sets the maximum length of the queue that temporarily holds connections which have completed the three-way handshake while all of the server's request-handling threads are busy. If unset, or set to a value below 1, Java uses the default of 50. Cha
  • Netty options

    BACKLOG is used when constructing the server-side ServerSocket object; it sets the maximum length of the queue that temporarily holds connections which have completed the three-way handshake while all of the server's request-handling threads are busy. If unset, or set to a value below 1, Java uses the default of 50. ChannelOption....
  • ChannelOption attribute notes

    BACKLOG is used when constructing the server-side ServerSocket object; it sets the maximum length of the queue that temporarily holds connections which have completed the three-way handshake while all of the server's request-handling threads are busy. If unset, or set to a value below 1, Java uses the default of 50. ChannelOp
  • These are all standard socket parameters... BACKLOG is used when constructing the server-side ServerSocket object; it sets the maximum length of the queue that temporarily holds connections which have completed the three-way handshake while all of the server's request-handling threads are busy. If unset, or set to a value below 1, Java uses the default of 50.
  • ChannelOption.SO_BACKLOG, 1024: BACKLOG is used when constructing the server-side ServerSocket object; it sets the maximum length of the queue that temporarily holds connections which have completed the three-way handshake while all of the server's request-handling threads are busy. If unset or set to a value below 1, Java will use...
  • The JDK thread pool: when the core threads are all busy but the pool has not reached its maximum size, an incoming task is first placed on the blocking queue, and extra threads are created to handle tasks only after the queue fills. That suits CPU-bound tasks, but a server program like Tomcat is less suited to this...
  • BACKLOG is used when constructing the server-side ServerSocket object; it sets the maximum length of the queue that temporarily holds connections which have completed the three-way handshake while all of the server's request-handling threads are busy. If unset, or set to a value below 1, Java uses the default of 50. ChannelOption.SO_...
  • BACKLOG is used when constructing the server-side ServerSocket object; it sets the maximum length of the queue that temporarily holds connections which have completed the three-way handshake while all of the server's request-handling threads are busy. If unset, or set to a value below 1, Java uses the default of 50. 2 NIO server-side configuration //...
