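Measuring the epoll thundering herd

The thundering herd effect

Put simply, a thundering herd occurs when several processes or threads wait on the same event and the kernel wakes all of them when it fires. Usually only one of them actually gets the event and handles it; the others find they lost the race and go back to waiting. The wasted wakeups and context switches cost some system performance.
There are two common thundering herd problems:
accept thundering herd: several processes blocked in accept on the same socket are all woken at once. The Linux 2.6 kernel fixed this, so it is not discussed further here.
epoll thundering herd: although the kernel fixed the accept case, a listen fd registered with several epoll instances still becomes readable in every one of them, so each waiting process gets a read event.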

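epoll thundering herd test

Test approach

  • The main process creates the socket
  • Each child process registers that socket for read events on its own epoll instance. The epoll instance must be created after fork; otherwise all processes share a single epoll instance, and a process cannot recognize fds added by the other processes
  • Register the listen fd for readability and call accept when it fires
  • Observe the output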
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <fcntl.h>


/* enum constants are true compile-time constants in C, so they can size a file-scope array */
enum {
    MAX_PROC_NUM = 4,
    MAX_EVENTS = 128,
    PORT = 8081,
    BUF_SIZE = 1024,
    CLIENT_SIZE = 128
};
pid_t g_pids[MAX_PROC_NUM] = {0};

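void sig_handler(int signo) {
    int i;
    (void)signo;
    for (i = 0; i < MAX_PROC_NUM; i++) {
        /* skip slots where fork failed: kill(-1, ...) would signal every process we are allowed to signal */
        if (g_pids[i] > 0) {
            kill(g_pids[i], SIGKILL);
        }
    }
    _exit(0);  /* the parent exits too once the children are killed */
}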


int child_procedure(int listenfd) {
    struct epoll_event ev;
    struct epoll_event events[MAX_EVENTS];
    int epfd = 0;
    int pid = 0;
    char buf[BUF_SIZE];
    int cnt = 0;
    int i = 0;

    pid = getpid();

    epfd = epoll_create(CLIENT_SIZE + 1);
    if (epfd < 0) {  /* epoll_create returns -1 on error; 0 is a valid fd */
        printf("Create epoll failed\n");
        return -1;
    }

    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = listenfd;

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, listenfd, &ev) < 0) {
        printf("Add ev failed\n");
        return -1; 
    }   

    printf("Epoll init finished, process pid: %d\n", pid);

    while (1) {
        cnt = epoll_wait(epfd, events, MAX_EVENTS, -1);  /* block until an event arrives; a timeout of 0 busy-spins */
        if (cnt <= 0) {
            continue;
        }

        for (i = 0; i < cnt; i++) {
            if (events[i].data.fd == listenfd) {
                // new connection request
                int newfd;
                printf("Process %d receive a connection request\n", pid);
                newfd = accept(listenfd, NULL, NULL);
                if (newfd < 0) {
                    printf("Process %d accept failed\n", pid);
                    continue;
                }
                // file status flags such as O_NONBLOCK are read with F_GETFL, not F_GETFD
                fcntl(newfd, F_SETFL, fcntl(newfd, F_GETFL, 0) | O_NONBLOCK);
                ev.events = EPOLLIN | EPOLLET;
                ev.data.fd = newfd;
                epoll_ctl(epfd, EPOLL_CTL_ADD, newfd, &ev);

            } else {
                int fd = events[i].data.fd;
                ssize_t n = read(fd, buf, BUF_SIZE);
                printf("Process %d receive a msg, length %zd\n", pid, n);
                if (n > 0) {
                    write(fd, buf, n);
                }
                // remove the fd from epoll before closing it; after close() the DEL would fail with EBADF
                epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
                close(fd);
            }

        }

    }
    return 0;
}

int main()
{
    int listenfd = 0;
    int cnt = 0;
    int i = 0;
    struct sockaddr_in servaddr = {0};  /* zero the padding and unused fields */

    listenfd = socket(AF_INET, SOCK_STREAM, 0);

    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    servaddr.sin_port = htons(PORT);
   
    fcntl(listenfd, F_SETFL, fcntl(listenfd, F_GETFL, 0) | O_NONBLOCK);

    if (bind(listenfd, (struct sockaddr *) &servaddr, sizeof(servaddr)) == -1) {
        printf("bind error\n");
        return -1;
    }

    if (listen(listenfd, CLIENT_SIZE) == -1) {
        printf("Listen failed\n");
        return -1;
    }

    for (i = 0; i < MAX_PROC_NUM; i++) {
        g_pids[i] = fork();
        if (0 == g_pids[i]) {
            // child process
            break;
        }
    }

    if (i == MAX_PROC_NUM) {
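        // register the signal handler: when the parent gets SIGINT, kill the children too
        // (SIGKILL cannot be caught, so no handler is installed for it)
        signal(SIGINT, sig_handler);

        // parent process: block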
        while(1) {
            sleep(100);
        }
    } else {
        // child process: enter the child routine
        return child_procedure(listenfd);
    }
    
    return 0;

}



Test results

Epoll init finished, process pid: 20628
Epoll init finished, process pid: 20629
Epoll init finished, process pid: 20630
Epoll init finished, process pid: 20631
Process 20630 receive a connection request
Process 20631 receive a connection request
Process 20628 receive a connection request
Process 20629 receive a connection request
Process 20630 receive a msg, length 169
Process 20631 accept failed
Process 20628 accept failed
Process 20629 accept failed

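The output shows that 4 processes were forked and all 4 received the readable event and woke up, yet only 1 of them could actually accept the connection.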

Mitigating the thundering herd

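Adding the same listen fd to several epoll instances is a common pattern, and nginx is a typical example. nginx adds a lock (the accept mutex) so that at any moment only one worker process is waiting on the listen fd, which guarantees that a single readable event is not delivered to several processes. To keep the lock held for as short a time as possible, the worker first puts the events on a queue and releases the lock as soon as the accepts are done; sending and receiving data does not hold this global lock. The lock is not only a fix for the thundering herd, it is also an important part of load balancing between the processes.
For details see: https://blog.csdn.net/initphp/article/details/52266844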
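
The locking idea can be sketched in a few lines. This is only an illustration, not nginx's implementation: it assumes a lock file that every worker opens by itself after fork (flock locks belong to the open file description, so an fd inherited across fork would share one lock), and the names accept_lock_try and accept_lock_release are made up for this example.

```c
#include <sys/file.h>
#include <unistd.h>

/* Try to take the accept lock without blocking.
 * Returns 1 if this process now holds the lock, 0 if another one does. */
int accept_lock_try(int lockfd) {
    return flock(lockfd, LOCK_EX | LOCK_NB) == 0;
}

/* Release the accept lock so another worker can take over the listen fd. */
void accept_lock_release(int lockfd) {
    flock(lockfd, LOCK_UN);
}
```

A worker loop built on this would call accept_lock_try() before each epoll_wait, keep the listen fd in its epoll set only while it holds the lock, and call accept_lock_release() right after the accept calls, before reading or writing on the accepted connections.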

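Converting ext2/ext3 to ext4 in place on Linux

English reference article: https://www.ghacks.net/2010/08/11/convert-ext23-to-ext4/

Back up first. Back up first. Back up first.

If you have the time, read the English original.

1. Prerequisite:
Kernel version 2.6.28-11 or later. To check: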

uname -a
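
If the check has to happen inside a program rather than by eye, the same comparison can be sketched in C. This is a rough sketch: the function names are invented for this example, and only the upstream part of the version (2.6.28) is compared; the distribution suffix such as -11 is ignored.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Compare a kernel release string such as "2.6.32-71.el6.i686" with
 * maj.min.patch. Returns 1 if it is at least that version, 0 if it is
 * older, -1 if the string cannot be parsed. */
int release_at_least(const char *release, int maj, int min, int patch) {
    int a = 0, b = 0, c = 0;
    if (sscanf(release, "%d.%d.%d", &a, &b, &c) < 2) {
        return -1;
    }
    if (a != maj) return a > maj;
    if (b != min) return b > min;
    return c >= patch;
}

/* Same check against the running kernel, using uname(2). */
int running_kernel_at_least(int maj, int min, int patch) {
    struct utsname u;
    if (uname(&u) != 0) {
        return -1;
    }
    return release_at_least(u.release, maj, min, patch);
}
```

Calling running_kernel_at_least(2, 6, 28) before starting the conversion gives the same answer as reading the uname -a output by hand.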

2. Unmount the disk:

umount -l /dev/sda1

If it is the system disk, boot from a live CD instead.

3. Convert
ext2 -> ext4

tune2fs -O extents,uninit_bg,dir_index,has_journal /dev/sda1

ext3 -> ext4

tune2fs -O extents,uninit_bg,dir_index /dev/sda1

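If the command fails with a "busy" error, some process is still using the partition. Try fsck /dev/sda1; if that does not help, comment out this partition's mount entry in /etc/fstab and reboot.

4. Check the filesystem
e2fsck -pf /dev/sda1
If it reports errors, read the message. It may ask you to run the check manually; in that case run e2fsck -f /dev/sda1 and answer y to every prompt.

5. Edit /etc/fstab and change the filesystem type of /dev/sda1 to ext4.

6. Refresh grub. The original text is quoted below. I converted the partition that holds /home, so I did not need this step; judge for yourself whether you do.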
Now you need to refresh grub. Depending upon how your boot partition is will determine how you do this. If your boot partition is SEPARATE, do the following:

sudo bash
mkdir /mnt/boot
mount /dev/sda1 /mnt/boot
grub-install /dev/sda --root-directory=/mnt --recheck

If your boot partition is NOT separate, do the following:

sudo bash
mount /dev/sda1 /mnt
grub-install /dev/sda --root-directory=/mnt --recheck

Fixing the Go error "lfstackPack redeclared in this block"

/usr/local/go/src/runtime/lfstack_amd64.go:16: lfstackPack redeclared in this block
previous declaration at /usr/local/go/src/runtime/lfstack_64bit.go:37
/usr/local/go/src/runtime/lfstack_amd64.go:20: lfstackUnpack redeclared in this block
previous declaration at /usr/local/go/src/runtime/lfstack_64bit.go:41
/usr/local/go/src/runtime/os_linux_generic.go:13: _SS_DISABLE redeclared in this block
previous declaration at /usr/local/go/src/runtime/os2_linux_generic.go:12
/usr/local/go/src/runtime/os_linux_generic.go:14: _NSIG redeclared in this block
previous declaration at /usr/local/go/src/runtime/os2_linux_generic.go:13
/usr/local/go/src/runtime/os_linux_generic.go:15: _SI_USER redeclared in this block
previous declaration at /usr/local/go/src/runtime/os2_linux_generic.go:14
/usr/local/go/src/runtime/os_linux_generic.go:16: _SIG_BLOCK redeclared in this block
previous declaration at /usr/local/go/src/runtime/os2_linux_generic.go:15
/usr/local/go/src/runtime/os_linux_generic.go:17: _SIG_UNBLOCK redeclared in this block
previous declaration at /usr/local/go/src/runtime/os2_linux_generic.go:16
/usr/local/go/src/runtime/os_linux_generic.go:18: _SIG_SETMASK redeclared in this block
previous declaration at /usr/local/go/src/runtime/os2_linux_generic.go:17
/usr/local/go/src/runtime/os_linux_generic.go:19: _RLIMIT_AS redeclared in this block
previous declaration at /usr/local/go/src/runtime/os2_linux_generic.go:18
/usr/local/go/src/runtime/os_linux_generic.go:25: sigset redeclared in this block
previous declaration at /usr/local/go/src/runtime/os2_linux_generic.go:24
/usr/local/go/src/runtime/os_linux_generic.go:25: too many errors

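I have not pinned down the root cause of this error. A likely trigger is unpacking a newer Go archive over an existing /usr/local/go, which leaves stale source files from the old version next to the new ones. The fix below was found on SegmentFault:
rm -rf /usr/local/go
then unpack the archive again: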
tar -C /usr/local -xzf go$VERSION.$OS-$ARCH.tar.gz

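__thread: notes on a failed socket connection

Background: one of our products needed to update its signature database automatically. The design was to spawn a temporary thread that opens a socket connection and downloads the update. A colleague started this work, got stuck partway because of problems with the approach, and handed it over to me.
The product is a heavily wrapped layer on top of OpenStack. Depending on which headers you include, several different socket implementations are exposed to the product layer.
One header contains: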

#ifndef socket
#define socket xx_socket
#endif

Another header declares:

int socket(int domain, int type, int protocol);

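Depending on which header gets included, the result is completely different. Because the network is wrapped, the lower layer cannot see the logical IP that is needed, so the socket from <sys/socket.h> does not work and the wrapped xx_socket must be used instead. When I tried it, the process crashed with a segmentation fault and restarted as soon as xx_socket was called. The source of the functions in the call stack was not available, and the arguments checked out fine. Disassembling (disass) this call site and other existing callers showed that xx_socket jumped to the same address in both, which ruled out a wrong reference.
Looking back at the call stack, the function on top was getCompCSI(), which fetches the component's CSI code and might well be backed by a thread-local variable. With a gdb breakpoint in the component flow before the thread is spawned, p getCompCSI() returned normally; running the same p inside the temporary thread triggered the crash. That pinned the problem down: the thread-local variable had not been initialized in the newly created thread.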

__thread

Quoting the description on gcc.gnu.org:

5.48 Thread-Local Storage

Thread-local storage (TLS) is a mechanism by which variables are allocated such that there is one instance of the variable per extant thread. The run-time model GCC uses to implement this originates in the IA-64 processor-specific ABI, but has since been migrated to other processors as well. It requires significant support from the linker (ld), dynamic linker (ld.so), and system libraries (libc.so and libpthread.so), so it is not available everywhere.

At the user level, the extension is visible with a new storage class keyword: __thread. For example:

     __thread int i;
     extern __thread struct state s;
     static __thread char *p;

The __thread specifier may be used alone, with the extern or static specifiers, but with no other storage class specifier. When used with extern or static, __thread must appear immediately after the other storage class specifier.

The __thread specifier may be applied to any global, file-scoped static, function-scoped static, or static data member of a class. It may not be applied to block-scoped automatic or non-static data member.

When the address-of operator is applied to a thread-local variable, it is evaluated at run-time and returns the address of the current thread’s instance of that variable. An address so obtained may be used by any thread. When a thread terminates, any pointers to thread-local variables in that thread become invalid.

No static initialization may refer to the address of a thread-local variable.

In C++, if an initializer is present for a thread-local variable, it must be a constant-expression, as defined in 5.19.2 of the ANSI/ISO C++ standard.

A simple example you can run yourself:

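#include <stdio.h>
#include <unistd.h>   /* sleep */
#include <pthread.h>

__thread int iTestVal = 1;

/* changes its own copy of iTestVal; other threads must not see the change */
void* thread1(void *arg)
{
        printf("thread1, iTestVal = %d\n", iTestVal);
        iTestVal = 2;
        sleep(1);
        printf("thread1, after delay iTestVal = %d\n", iTestVal);
        return NULL;
}

/* only reads iTestVal; its copy keeps the initial value 1 */
void* thread2(void *arg)
{
        printf("thread2, iTestVal = %d\n", iTestVal);
        sleep(1);
        printf("thread2, after delay iTestVal = %d\n", iTestVal);
        return NULL;
}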


int main()
{
        pthread_t pid1, pid2;
        pthread_create(&pid1, NULL, thread1, NULL);
        pthread_create(&pid2, NULL, thread2, NULL);
        pthread_join(pid1, NULL);
        pthread_join(pid2, NULL);

        return 0;
}

Output:

[root@TubbyFlashy-VM test]# gcc -lpthread main.c 
[root@TubbyFlashy-VM test]# ./a.out 
thread2, iTestVal = 1
thread1, iTestVal = 1
thread2, after delay iTestVal = 1
thread1, after delay iTestVal = 2
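
Coming back to the crash described earlier: one common way to avoid reading uninitialized per-thread state is to initialize it lazily on first use in each thread. The sketch below only illustrates that pattern. How getCompCSI() is really implemented is unknown, so struct comp_ctx, comp_ctx_get, and probe_thread are all hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread context; stands in for whatever the real
 * getCompCSI() reads. */
struct comp_ctx {
    int csi;
};

/* Every new thread starts with its own NULL pointer and its own storage. */
static __thread struct comp_ctx *tls_ctx = NULL;
static __thread struct comp_ctx tls_storage;

/* Return the calling thread's context, initializing it on first use. */
struct comp_ctx *comp_ctx_get(int default_csi) {
    if (tls_ctx == NULL) {
        tls_storage.csi = default_csi;
        tls_ctx = &tls_storage;
    }
    return tls_ctx;
}

/* Records what a freshly created thread sees. */
struct probe {
    struct comp_ctx *ctx;
    int csi;
};

void *probe_thread(void *arg) {
    struct probe *p = arg;
    p->ctx = comp_ctx_get(7);
    p->csi = p->ctx->csi;
    return NULL;
}
```

With this shape, a thread created by pthread_create gets a valid context the first time it calls comp_ctx_get(), instead of dereferencing a pointer that only the main thread ever set up.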

Backing up, restoring, and migrating GitLab

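This article applies to GitLab installed from the one-click (Omnibus) package. Reposted from SegmentFault: http://segmentfault.com/blog/venmos/1190000002439923

Creating a GitLab backup

Installing GitLab from the one-click package is very simple, and so are backup, restore, and migration. A single command creates a complete GitLab backup: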

gitlab-rake gitlab:backup:create
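The command above creates an archive under /var/opt/gitlab/backups with a name like 1393513186_gitlab_backup.tar. This archive is the complete GitLab backup; the leading 1393513186 is the Unix timestamp of when the backup was created.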
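
To see which moment a given backup file corresponds to, the leading number can be decoded as Unix epoch seconds. A small sketch follows; the helper name backup_name_to_utc is made up for this example, and it only looks at the leading digits of the file name.

```c
#include <stdio.h>
#include <time.h>

/* Read the leading epoch seconds from a backup file name such as
 * "1393513186_gitlab_backup.tar" and format them as a UTC time string.
 * Returns 0 on success, -1 if the name has no leading number or the
 * buffer is too small. */
int backup_name_to_utc(const char *name, char *out, size_t outlen) {
    long long secs = 0;
    if (sscanf(name, "%lld", &secs) != 1) {
        return -1;
    }
    time_t t = (time_t)secs;
    struct tm *tm = gmtime(&t);
    if (tm == NULL) {
        return -1;
    }
    if (strftime(out, outlen, "%Y-%m-%d %H:%M:%S", tm) == 0) {
        return -1;
    }
    return 0;
}
```

For the file name used in this article, the helper yields 2014-02-27 14:59:46 (UTC).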

Changing the default GitLab backup directory

You can also change the default directory for backup files by editing /etc/gitlab/gitlab.rb:

gitlab_rails['backup_path'] = '/mnt/backups'
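Change /mnt/backups to the directory where you want backups stored, then run gitlab-ctl reconfigure to apply the configuration.

Automatic GitLab backups

You can also schedule the backup command with crontab: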

sudo su -
crontab -e

Add the following line to run a backup every day at 2 a.m.:

0 2 * * * /opt/gitlab/bin/gitlab-rake gitlab:backup:create

Restoring GitLab

Restoring GitLab from a backup is just as simple:

# stop the services that connect to the database
gitlab-ctl stop unicorn
gitlab-ctl stop sidekiq

# restore from the backup numbered 1393513186
gitlab-rake gitlab:backup:restore BACKUP=1393513186

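# start GitLab
sudo gitlab-ctl start

Migrating GitLab

Migration follows the same steps as backup and restore: copy the backup file from /var/opt/gitlab/backups on the old server to /var/opt/gitlab/backups on the new server (assuming you have not changed the default backup directory). Note that the GitLab version on the new server must be the same as the version that created the backup. For example, if the new server runs the latest GitLab 7.60, upgrade the old server to 7.60 before creating the backup.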

Other notes

The latest GitLab release has fixed the HTTPS setup bug, so HTTPS can now be enabled easily with the official HTTPS configuration.

Installing OpenVPN on CentOS 7

Download OpenVPN

wget http://swupdate.openvpn.org/as/openvpn-as-2.0.10-CentOS7.x86_64.rpm

Install

rpm -Uvh openvpn-as-2.0.10-CentOS7.x86_64.rpm  

Once you see the success message, change the password of the openvpn admin user:

passwd openvpn

Then open the admin URL:

https://IP:943/admin

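Updating the BIOS on a DELL PowerEdge 1950 server

DELL provides a pile of software and ISO images for its servers (SUU, DSUU, and so on), and it is easy to get lost. I googled dozens of articles on updating the BIOS, tried several methods, failed with several, and finally succeeded with one.

The method also works for other DELL PowerEdge models, and is probably simpler there: the 1950 is no longer in the official source, so its firmware has to be added manually.

Software: Dell Repository Manager v2.0.0 (Data Center Version), DRM for short

Download the .BIN driver file and the .BIN.sign file from Support.dell.com. The same applies to the motherboard BIOS, RAID firmware, and so on.

Open DRM using the Data Center shortcut.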

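Setting up Apache2 + SVN on Debian, and fixing 403 Forbidden

For installation and configuration, follow the article below. The 403 Forbidden problem is covered at the end.

Debian Linux Apache2 + SVN configuration

Author: reistlin

Source: http://www.reistlin.com/blog/195
Updated: 2009.12
Copyright: original article. When reposting, keep the author information and the full original text. No excerpts of any kind.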

I. Environment:

Debian Linux (5.x / 6.x) + apache2 (2.2.16) + Subversion (1.6.12)

II. Configuration:

1. Install apache2, openssl (https), and svn with apt-get

reistlin:~# apt-get install apache2 apache2-mpm-worker openssl subversion libapache2-svn

2. Create the svn directory (/home/svn) and set its owner (www-data) and permissions

reistlin:~# mkdir /home/svn
reistlin:~# chown www-data:www-data -R /home/svn/
reistlin:~# chmod 770 -R /home/svn/

3. Create the svn user password file: /etc/apache2/dav_svn.passwd

reistlin:~# /usr/bin/htpasswd -c /etc/apache2/dav_svn.passwd admin
New password:
Re-type new password:
Adding password for user admin

The password file uses CRYPT encryption by default. File format: username:password
For security, SHA encryption is recommended: htpasswd -s username

reistlin:~# cat /etc/apache2/dav_svn.passwd
 
admin:{SHA}0DPiKuNIrrVmD8IUCuw1hQxNqZc=
reistlin:{SHA}QL0AFWMIX8NRZTKeof9cXsvbvu8=
test1:{SHA}qUqP5cyxm6YcTAhz05Hph5gvu9M=
test2:{SHA}Y2fEjdGT1W6nsLqtJbGUVeUp9e4=

4. Create the svn directory permission file: /etc/apache2/dav_svn.authz

reistlin:~# cat /etc/apache2/dav_svn.authz
 
[groups]
admin=admin,reistlin
guest=test1,test2
 
[reistlin:/]    # permissions for repository reistlin
*=              # deny access to all users by default
@admin=rw       # group admin has rw permission
test1=r         # user test1 has r permission

5. Configure /etc/apache2/mods-available/dav_svn.conf

reistlin:~# vim /etc/apache2/mods-available/dav_svn.conf

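Create the svn location and specify the svn directory, authentication type, and authentication info;
set the path of the dav_svn.passwd user password file;
set the path of the dav_svn.authz directory permission file.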

<Location /svn>
  DAV svn
  SVNParentPath /home/svn
  AuthType Basic
  AuthName "Subversion Repository"
  AuthUserFile /etc/apache2/dav_svn.passwd
  AuthzSVNAccessFile /etc/apache2/dav_svn.authz
  Require valid-user
</Location>

6. Create the svn repository (reistlin)

reistlin:~# su - www-data
reistlin:~$ svnadmin create /home/svn/reistlin

7. Configuration is done; restart the apache2 service

reistlin:~$ su - root
reistlin:~# /etc/init.d/apache2 restart

8. Open a browser and visit http://localhost/svn/reistlin

III. Administration:

1. Add a user (htpasswd with SHA encryption, option -s)

reistlin:~$ sudo /usr/bin/htpasswd -s /etc/apache2/dav_svn.passwd username

2. Delete a user (edit with vi/vim)

reistlin:~$ sudo vim /etc/apache2/dav_svn.passwd

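Find the user name: /username
Delete that user's line: dd
Save and quit: :wq

3. Subversion client

TortoiseSVN (open-source software; supports English, Simplified Chinese, and Traditional Chinese)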

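The main cause of 403 Forbidden

The 403 is caused by incorrect permissions. For the setup itself, just follow the article above.

About the /etc/apache2/dav_svn.authz configuration

Take this example from the original article:

[groups]
admin=admin,reistlin
guest=test1,test2
 
[reistlin:/]    # permissions for repository reistlin
*=              # deny access to all users by default
@admin=rw       # group admin has rw permission
test1=r         # user test1 has r permission

Here dav_svn.passwd must contain the users admin, reistlin, test1, and test2, and the @admin group used in the rules must be defined under [groups]. My 403 Forbidden came from a sloppy authz file whose rules did not match the groups section; rewriting the authz file fixed it.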

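Installing PPTP VPN software on CentOS, with a fix for Alibaba Cloud

My home connection is on China Netcom, while my friends and my school are on China Telecom, and game accelerators cost money. I happened to have an Alibaba Cloud VPS, so to save the accelerator fee I set up my own VPN. Logging in to games through the VPN really does work like a game accelerator.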

 

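Conventions:
In what follows, the blue parts are commands to run and the red parts are content changed or added in configuration files.

Environment:
CentOS 6.0 i386 (32-bit)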

[root@www.linuxyw.com ~]# cat /etc/redhat-release 
CentOS Linux release 6.0 (Final)
[root@www.linuxyw.com ~]# uname -a
Linux host-224160 2.6.32-71.el6.i686 #1 SMP Fri Nov 12 04:17:17 GMT 2010 i686 i686 i386 GNU/Linux

Software:
Setting up the VPN service requires four packages:
dkms-2.2.0.3-1.fc16.noarch
kernel-2.6.32-71.el6.i686
ppp-2.4.5-5.el6.i686
pptpd-1.3.4-2.el6.i686