nmap os detection原理及golang实现

操作系统识别

造轮子也是一次深入理解它原理的过程，造完轮子后感觉到所有代码尽在我掌握之中，之后大规模扫描测试就可以以最有效率，发最小包，绕过防火墙的方式进行集成，也能轻易的进行扩展。

成果图，实现了对一个主机在5s内能够识别出其操作系统，能预测开机时间(部分系统有误差),用无状态扫描技术去识别大量主机的操作系统时，平均时间会更低。

造轮子其实是非常不容易的，虽然nmap是开源项目，但与操作系统识别有关的代码就超过6000行，nmap官网有文档专门有一节讲述了nmap操作系统指纹的技术实现 https://nmap.org/book/osdetect.html ，它们都不太容易看懂，我是文档和源码交互着看了几十遍才大概明白整个流程。

基本原理

Nmap OS 指纹识别通过向目标机器的Open和Close 端口发送多达 16 个 TCP、UDP 和 ICMP 探针来工作

其中 TCP探针13个，UDP探针 1 个，ICMP探针2个。

osscan.ccosscan2.cc

osscan.cc：主要负责os指纹的解析、对比函数，可直接看如下的函数定义。

/* Parses a single fingerprint from the memory region given.  If a
 non-null fingerprint is returned, the user is in charge of freeing it
 when done.  This function does not require the fingerprint to be 100%
 complete since it is used by scripts such as scripts/fingerwatch for
 which some partial fingerprints are OK. */
FingerPrint *parse_single_fingerprint(const char *fprint_orig);

/* These functions take a file/db name and open+parse it, returning an
   (allocated) FingerPrintDB containing the results.  They exit with
   an error message in the case of error. */
FingerPrintDB *parse_fingerprint_file(const char *fname);

/* Compares 2 fingerprints -- a referenceFP (can have expression
   attributes) with an observed fingerprint (no expressions).  If
   verbose is nonzero, differences will be printed.  The comparison
   accuracy (between 0 and 1) is returned).  MatchPoints is
   a special "fingerprints" which tells how many points each test is worth. */
double compare_fingerprints(const FingerPrint *referenceFP, const FingerPrint *observedFP,
                            const FingerPrintDef *MatchPoints, int verbose);

osscan2.cc：主要负责发送探针以及根据返回内容生成指纹，下面是定义的一些主要函数，send开头为相关探针的发送函数，process为中间处理指纹函数，make为最终生成指纹函数。

  /* Probe send functions. */
  void sendTSeqProbe(HostOsScanStats *hss, int probeNo);
  void sendTOpsProbe(HostOsScanStats *hss, int probeNo);
  void sendTEcnProbe(HostOsScanStats *hss);
  void sendT1_7Probe(HostOsScanStats *hss, int probeNo);
  void sendTUdpProbe(HostOsScanStats *hss, int probeNo);
  void sendTIcmpProbe(HostOsScanStats *hss, int probeNo);
  /* Response process functions. */
  bool processTSeqResp(HostOsScanStats *hss, const struct ip *ip, int replyNo);
  bool processTOpsResp(HostOsScanStats *hss, const struct tcp_hdr *tcp, int replyNo);
  bool processTWinResp(HostOsScanStats *hss, const struct tcp_hdr *tcp, int replyNo);
  bool processTEcnResp(HostOsScanStats *hss, const struct ip *ip);
  bool processT1_7Resp(HostOsScanStats *hss, const struct ip *ip, int replyNo);
  bool processTUdpResp(HostOsScanStats *hss, const struct ip *ip);
  bool processTIcmpResp(HostOsScanStats *hss, const struct ip *ip, int replyNo);

  /* Generic sending functions used by the above probe functions. */
  int send_tcp_probe(HostOsScanStats *hss,
                     int ttl, bool df, u8* ipopt, int ipoptlen,
                     u16 sport, u16 dport, u32 seq, u32 ack,
                     u8 reserved, u8 flags, u16 window, u16 urp,
                     u8 *options, int optlen,
                     char *data, u16 datalen);
  int send_icmp_echo_probe(HostOsScanStats *hss,
                           u8 tos, bool df, u8 pcode,
                           unsigned short id, u16 seq, u16 datalen);
  int send_closedudp_probe(HostOsScanStats *hss,
                           int ttl, u16 sport, u16 dport);

  void makeTSeqFP(HostOsScanStats *hss);
  void makeTOpsFP(HostOsScanStats *hss);
  void makeTWinFP(HostOsScanStats *hss);

nmap将指纹分为以下几类

SEQ：基于探针进行序列分析的指纹结果
OPS：基于探针接受到的TCP选项
WIN：基于探针接受到的响应窗口大小 (TCP Windows Size)
T系列：基于探针响应的TCP数据包各种测试值的结果
ECN：ECN探针返回结果
ECN 是一种通过允许路由器在开始不得不丢弃数据包之前发出拥塞问题信号来提高 Internet 性能的方法。它记录在RFC 3168中.
当生成许多包通过路由器时会导致其负载变大，这称之为拥塞。其结果就是系统会变慢以降低拥堵，以便路由器不会发生丢包。这个包仅为了得到目标系统的响应而发送。因为不同的操作系统以不同的方式处理这个包，所以返回的特定值可以用来判断操作系统。
IE：ICMP响应的数据包测试值结果
U1：UDP响应的数据包测试值结果

nmap OS探测时，会向目标主机的一个Open状态TCP端口，一个Close状态TCP端口，一个关闭的UDP端口发送数据包，以及一个ICMP数据包。

os_scan_ipv4

/* Performs the OS detection for IPv4 hosts. This method should not be called
 * directly. os_scan() should be used instead, as it handles chunking so
 * you don't do too many targets in parallel */
int OSScan::os_scan_ipv4(std::vector &Targets) {
  ...

  /* Initialize the pcap session handler in HOS */
  begin_sniffer(&HOS, Targets);
  
  ...
  // 准备测试，删除旧信息，初始化变量
  startRound(&OSI, &HOS, itry);
  // 执行顺序产生测试（发送6个TCP探测包，每隔100ms一个）
  doSeqTests(&OSI, &HOS);
  // 执行TCP、UDP、ICMP探测包测试
  doTUITests(&OSI, &HOS);
  // 对结果做指纹对比，获取OS扫描信息
  endRound(&OSI, &HOS, itry);
  // 将超时未匹配的主机移动
  expireUnmatchedHosts(&OSI, &unMatchedHosts);
  }

TCP探针

TCP将发送13个数据包，这些数据包分为三类，一类是对tcp option的测试，一类是对tcp/ip 其他字段的测试，最后是ECN测试。

可以直接看nmap构造tcp option数据包源码

/* 8 options:
 *  0~5: six options for SEQ/OPS/WIN/T1 probes.
 *  6:   ECN probe.
 *  7-12:   T2~T7 probes.
 *
 * option 0: WScale (10), Nop, MSS (1460), Timestamp, SackP
 * option 1: MSS (1400), WScale (0), SackP, T(0xFFFFFFFF,0x0), EOL
 * option 2: T(0xFFFFFFFF, 0x0), Nop, Nop, WScale (5), Nop, MSS (640)
 * option 3: SackP, T(0xFFFFFFFF,0x0), WScale (10), EOL
 * option 4: MSS (536), SackP, T(0xFFFFFFFF,0x0), WScale (10), EOL
 * option 5: MSS (265), SackP, T(0xFFFFFFFF,0x0)
 * option 6: WScale (10), Nop, MSS (1460), SackP, Nop, Nop
 * option 7-11: WScale (10), Nop, MSS (265), T(0xFFFFFFFF,0x0), SackP
 * option 12: WScale (15), Nop, MSS (265), T(0xFFFFFFFF,0x0), SackP
 */
static struct {
  u8* val;
  int len;
} prbOpts[] = {
  {(u8*) "\x03\x03\x0A\x01\x02\x04\x05\xb4\x08\x0A\xff\xff\xff\xff\x00\x00\x00\x00\x04\x02", 20},
  {(u8*) "\x02\x04\x05\x78\x03\x03\x00\x04\x02\x08\x0A\xff\xff\xff\xff\x00\x00\x00\x00\x00", 20},
  {(u8*) "\x08\x0A\xff\xff\xff\xff\x00\x00\x00\x00\x01\x01\x03\x03\x05\x01\x02\x04\x02\x80", 20},
  {(u8*) "\x04\x02\x08\x0A\xff\xff\xff\xff\x00\x00\x00\x00\x03\x03\x0A\x00", 16},
  {(u8*) "\x02\x04\x02\x18\x04\x02\x08\x0A\xff\xff\xff\xff\x00\x00\x00\x00\x03\x03\x0A\x00", 20},
  {(u8*) "\x02\x04\x01\x09\x04\x02\x08\x0A\xff\xff\xff\xff\x00\x00\x00\x00", 16},
  {(u8*) "\x03\x03\x0A\x01\x02\x04\x05\xb4\x04\x02\x01\x01", 12},
  {(u8*) "\x03\x03\x0A\x01\x02\x04\x01\x09\x08\x0A\xff\xff\xff\xff\x00\x00\x00\x00\x04\x02", 20},
  {(u8*) "\x03\x03\x0A\x01\x02\x04\x01\x09\x08\x0A\xff\xff\xff\xff\x00\x00\x00\x00\x04\x02", 20},
  {(u8*) "\x03\x03\x0A\x01\x02\x04\x01\x09\x08\x0A\xff\xff\xff\xff\x00\x00\x00\x00\x04\x02", 20},
  {(u8*) "\x03\x03\x0A\x01\x02\x04\x01\x09\x08\x0A\xff\xff\xff\xff\x00\x00\x00\x00\x04\x02", 20},
  {(u8*) "\x03\x03\x0A\x01\x02\x04\x01\x09\x08\x0A\xff\xff\xff\xff\x00\x00\x00\x00\x04\x02", 20},
  {(u8*) "\x03\x03\x0f\x01\x02\x04\x01\x09\x08\x0A\xff\xff\xff\xff\x00\x00\x00\x00\x04\x02", 20}
};

发包每次的TCP Windows窗口大小也不一样

/* TCP Window sizes. Numbering is the same as for prbOpts[] */
u16 prbWindowSz[] = { 1, 63, 4, 4, 16, 512, 3, 128, 256, 1024, 31337, 32768, 65535 };

前六个数据包为 SEQ/OPS/WIN/T1 探针，第7个为ECN探针，后面的是T探针。

TH_CWR|TH_ECE|TH_SYNUrgent

如果nmap未找到关闭的TCP端口，将随机取值

 closedTCPPort = (get_random_uint() % 14781) + 30000;

ICMP探针

发送两个 ICMP 探针。

ICMPv4CodeNetAdminProhibited0x00

IP_TOS_RELIABILITY

UDP探针

0x10420x43

如果nmap未找到关闭的UDP端口，将随机取值

closedUDPPort = (get_random_uint() % 14781) + 30000;

指纹生成

这是造轮子过程中最麻烦的部分，需要将这些指纹结果一一实现。

GCD

tcp前六个探测包中，tcp seq数值的差值作为一个数组，这个数组及有5个元素。取这个数组的最大公约数。

ISR

取探针返回包 SEQ的差除以发送时间的毫秒差即 SEQ的发送速率，再得出探针每个速率的平均值即seq_rate，最后通过一个公式得出最后的值即ISR。

seq_rate = log(seq_rate) / log(2.0);
seq_rate = (unsigned int) (seq_rate * 8 + 0.5);

SP

代码和文档都难懂，弄一个简化版的代码就看懂了

seq_stddev = 0
for i =0;i-1;i++{
  seq_stddev += （（SEQ[i]的发送速率 - SEQ平均速率）  / GCD 最大公约数）的平方
}
seq_stddev  /= responseNum-2
seq_stddev = sqrt(seq_stddev);
seq_stddev = log(seq_stddev) / log(2.0);
sp = (int) (seq_stddev * 8 + 0.5);

以我仅学过的线性代数，这个可以弄成一个公式的，但是我不会在markdown上展示，算了~

TICIII

从TCP Open端口，Tcp Close端口，ICMP协议中，计算ip id的生成算法

ZRDIIRIRIBII

SS

根据前面推测出的IP ID增长方式，记录目标是否在 TCP 和 ICMP 协议之间共享其 IP ID 序列。

 /* SS: Shared IP ID sequence boolean */
  if ((tcp_ipid_seqclass == IPID_SEQ_INCR ||
        tcp_ipid_seqclass == IPID_SEQ_BROKEN_INCR ||
        tcp_ipid_seqclass == IPID_SEQ_RPI) &&
       (icmp_ipid_seqclass == IPID_SEQ_INCR ||
        icmp_ipid_seqclass == IPID_SEQ_BROKEN_INCR ||
        icmp_ipid_seqclass == IPID_SEQ_RPI)) {
    /* Both are incremental. Thus we have "SS" test. Check if they
       are in the same sequence. */
    u32 avg = (hss->ipid.tcp_ipids[good_tcp_ipid_num - 1] - hss->ipid.tcp_ipids[0]) / (good_tcp_ipid_num - 1);
    if (hss->ipid.icmp_ipids[0] < hss->ipid.tcp_ipids[good_tcp_ipid_num - 1] + 3 * avg) {
      test.setAVal("SS", "S");
    } else {
      test.setAVal("SS", "O");
    }
  }

TS

这个能预测出开机时间！

TSSEQ

TS

TSUTS00-5.6670-150``150-350``TSA

这个结果可以推断出开机时间

/* Now we look at TCP Timestamp sequence prediction */
  /* Battle plan:
     1) Compute average increments per second, and variance in incr. per second
     2) If any are 0, set to constant
     3) If variance is high, set to random incr. [ skip for now ]
     4) if ~10/second, set to appropriate thing
     5) Same with ~100/sec
  */
  if (hss->si.ts_seqclass == TS_SEQ_UNKNOWN && hss->si.responses >= 2) {
    time_t uptime = 0;
    avg_ts_hz = 0.0;
    for (i = 0; i < hss->si.responses - 1; i++) {
      double dhz;

      dhz = (double) ts_diffs[i] / (time_usec_diffs[i] / 1000000.0);
      /*       printf("ts incremented by %d in %li usec -- %fHZ", ts_diffs[i], time_usec_diffs[i], dhz); */
      avg_ts_hz += dhz / (hss->si.responses - 1);
    }

    if (avg_ts_hz > 0 && avg_ts_hz < 5.66) { /* relatively wide range because sampling time so short and frequency so slow */
      hss->si.ts_seqclass = TS_SEQ_2HZ;
      uptime = hss->si.timestamps[0] / 2;
    }
    else if (avg_ts_hz > 70 && avg_ts_hz < 150) {
      hss->si.ts_seqclass = TS_SEQ_100HZ;
      uptime = hss->si.timestamps[0] / 100;
    }
    else if (avg_ts_hz > 724 && avg_ts_hz < 1448) {
      hss->si.ts_seqclass = TS_SEQ_1000HZ;
      uptime = hss->si.timestamps[0] / 1000;
    }
    else if (avg_ts_hz > 0) {
      hss->si.ts_seqclass = TS_SEQ_OTHER_NUM;
      uptime = hss->si.timestamps[0] / (unsigned int)(0.5 + avg_ts_hz);
    }

    if (uptime > 63072000) {
      /* Up 2 years?  Perhaps, but they're probably lying. */
      if (o.debugging) {
        /* long long is probably excessive for number of days, but sick of
         * truncation warnings and finding the right format string for time_t
         */
        log_write(LOG_STDOUT, "Ignoring claimed %s uptime of %lld days",
        hss->target->targetipstr(), (long long) (uptime / 86400));
      }
      uptime = 0;
    }
    hss->si.lastboot = hss->seq_send_times[0].tv_sec - uptime;
  }
  
  

switch (hss->si.ts_seqclass) {

  case TS_SEQ_ZERO:
    test.setAVal("TS", "0");
    break;
  case TS_SEQ_2HZ:
  case TS_SEQ_100HZ:
  case TS_SEQ_1000HZ:
  case TS_SEQ_OTHER_NUM:
    /* Here we "cheat" a little to make the classes correspond more
       closely to common real-life frequencies (particularly 100)
       which aren't powers of two. */
    if (avg_ts_hz <= 5.66) {
      /* 1 would normally range from 1.4 - 2.82, but we expand that
         to 0 - 5.66, so we won't ever even get a value of 2.  Needs
         to be wide because our test is so fast that it is hard to
         match slow frequencies exactly.  */
      tsnewval = 1;
    } else if (avg_ts_hz > 70 && avg_ts_hz <= 150) {
      /* mathematically 7 would be 90.51 - 181, but we change to 70-150 to
         better align with common freq 100 */
      tsnewval = 7;
    } else if (avg_ts_hz > 150 && avg_ts_hz <= 350) {
      /* would normally be 181 - 362.  Now aligns better with 200 */
      tsnewval = 8;
    } else {
      /* Do a log base2 rounded to nearest int */
      tsnewval = (unsigned int)(0.5 + log(avg_ts_hz) / log(2.0));
    }

    test.setAVal("TS", hss->target->FPR->cp_hex(tsnewval));
    break;
  case TS_SEQ_UNSUPPORTED:
    test.setAVal("TS", "U");
    break;
  }

OO1–O6

TCP 数据包的Options。指纹保留了原始顺序，以及选项的值。有些操作系统没有实现这些选项或者实现不全。

Option NameCharacterArgument (if any)End of Options List (EOL)L

No operation (NOP)N

Maximum Segment Size (MSS)MThe value is appended. Many systems echo the value used in the corresponding probe.Window Scale (WS)WThe actual value is appended.Timestamp (TS)TThe T is followed by two binary characters representing the TSval and TSecr values respectively. The characters are 0 if the field is zero and 1 otherwise.Selective ACK permitted (SACK)S

int HostOsScan::get_tcpopt_string(const struct tcp_hdr *tcp, int mss, char *result, int maxlen) const {
  char *p;
  const char *q;
  u16 tmpshort;
  u32 tmpword;
  int length;
  int opcode;

  p = result;
  length = (tcp->th_off * 4) - sizeof(struct tcp_hdr);
  q = ((char *)tcp) + sizeof(struct tcp_hdr);

  /*
   * Example parsed result: M5B4ST11NW2
   *   MSS, Sack Permitted, Timestamp with both value not zero, Nop, WScale with value 2
   */

  /* Be aware of the max increment value for p in parsing,
   * now is 5 = strlen("Mxxxx") <-> MSS Option
   */
  while (length > 0 && (p - result) < (maxlen - 5)) {
    opcode = *q++;
    if (!opcode) { /* End of List */
      *p++ = 'L';
      length--;
    } else if (opcode == 1) { /* No Op */
      *p++ = 'N';
      length--;
    } else if (opcode == 2) { /* MSS */
      if (length < 4)
        break; /* MSS has 4 bytes */
      *p++ = 'M';
      q++;
      memcpy(&tmpshort, q, 2);
      /*  if (ntohs(tmpshort) == mss) */
      /*    *p++ = 'E'; */
      sprintf(p, "%hX", ntohs(tmpshort));
      p += strlen(p); /* max movement of p is 4 (0xFFFF) */
      q += 2;
      length -= 4;
    } else if (opcode == 3) { /* Window Scale */
      if (length < 3)
        break; /* Window Scale option has 3 bytes */
      *p++ = 'W';
      q++;
      snprintf(p, length, "%hhX", *((u8*)q));
      p += strlen(p); /* max movement of p is 2 (max WScale value is 0xFF) */
      q++;
      length -= 3;
    } else if (opcode == 4) { /* SACK permitted */
      if (length < 2)
        break; /* SACK permitted option has 2 bytes */
      *p++ = 'S';
      q++;
      length -= 2;
    } else if (opcode == 8) { /* Timestamp */
      if (length < 10)
        break; /* Timestamp option has 10 bytes */
      *p++ = 'T';
      q++;
      memcpy(&tmpword, q, 4);
      if (tmpword)
        *p++ = '1';
      else
        *p++ = '0';
      q += 4;
      memcpy(&tmpword, q, 4);
      if (tmpword)
        *p++ = '1';
      else
        *p++ = '0';
      q += 4;
      length -= 10;
    }
  }

  if (length > 0) {
    /* We could reach here for one of the two reasons:
     *  1. At least one option is not correct. (Eg. Should have 4 bytes but only has 3 bytes left).
     *  2. The option string is too long.
     */
    *result = '\0';
    return -1;
  }

  *p = '\0';
  return p - result;
}

WW1W6

TCP.Windows

YN

有返回包为Y，否则为N

DF

#define IP_DF 0x4000 /* don't fragment flag */

IP.Flag && IP_DF

DFI

两个ICMP请求中计算 IP.Flag && IP_DF

都为真则为Y
一真一假则为S
都为假为N
否则为0

IP.TTL

IP 初始生存时间猜测 ( TG)

根据ttl算tg

int get_initial_ttl_guess(u8 ttl) {
  if (ttl <= 32)
    return 32;
  else if (ttl <= 64)
    return 64;
  else if (ttl <= 128)
    return 128;
  else
    return 255;
}

CC

ECNCC

  /* Explicit Congestion Notification support test */
  if ((tcp->th_flags & TH_ECE) && (tcp->th_flags & TH_CWR))
    /* echo back */
    test.setAVal("CC", "S");
  else if (tcp->th_flags & TH_ECE)
    /* support */
    test.setAVal("CC", "Y");
  else if (!(tcp->th_flags & TH_CWR))
    /* not support */
    test.setAVal("CC", "N");
  else
    test.setAVal("CC", "O");

一些tcp怪癖 (俺也不知道现在还有用没)

 /* TCP miscellaneous quirks test */
  p = quirks_buf;
  if (tcp->th_x2) {
    /* Reserved field of TCP is not zero */
    assert(p + 1 < quirks_buf + sizeof(quirks_buf));
    *p++ = 'R';
  }
  if (!(tcp->th_flags & TH_URG) && tcp->th_urp) {
    /* URG pointer value when urg flag not set */
    assert(p + 1 < quirks_buf + sizeof(quirks_buf));
    *p++ = 'U';
  }
  *p = '\0';
  test.setAVal("Q", hss->target->FPR->cp_dup(quirks_buf, p - quirks_buf));

测试返回包中tcp.seq和发送时tcp.ack的关系

 /* Seq test values:
     Z   = zero
     A   = same as ack
     A+  = ack + 1
     O   = other
  */
  if (ntohl(tcp->th_seq) == 0)
    test.setAVal("S", "Z");
  else if (ntohl(tcp->th_seq) == tcpAck)
    test.setAVal("S", "A");
  else if (ntohl(tcp->th_seq) == tcpAck + 1)
    test.setAVal("S", "A+");
  else
    test.setAVal("S", "O");

  /* ACK test values:
     Z   = zero
     S   = same as syn
     S+  = syn + 1
     O   = other
  */
  if (ntohl(tcp->th_ack) == 0)
    test.setAVal("A", "Z");
  else if (ntohl(tcp->th_ack) == tcpSeqBase)
    test.setAVal("A", "S");
  else if (ntohl(tcp->th_ack) == tcpSeqBase + 1)
    test.setAVal("A", "S+");
  else
    test.setAVal("A", "O");

 /* Flags. They must be in this order:
     E = ECN Echo
     U = Urgent
     A = Acknowledgement
     P = Push
     R = Reset
     S = Synchronize
     F = Final
  */
  struct {
    u8 flag;
    char c;
  } flag_defs[] = {
    { TH_ECE, 'E' },
    { TH_URG, 'U' },
    { TH_ACK, 'A' },
    { TH_PUSH, 'P' },
    { TH_RST, 'R' },
    { TH_SYN, 'S' },
    { TH_FIN, 'F' },
  };
  assert(sizeof(flag_defs) / sizeof(flag_defs[0]) < sizeof(flags_buf));
  p = flags_buf;
  for (i = 0; i < (int) (sizeof(flag_defs) / sizeof(flag_defs[0])); i++) {
    if (tcp->th_flags & flag_defs[i].flag)
      *p++ = flag_defs[i].c;
  }
  *p = '\0';
  test.setAVal("F", hss->target->FPR->cp_dup(flags_buf, p - flags_buf));

RD

 /* Rst Data CRC32 */
  length = (int) ntohs(ip->ip_len) - 4 * ip->ip_hl -4 * tcp->th_off;
  if ((tcp->th_flags & TH_RST) && length>0) {
    test.setAVal("RD", hss->target->FPR->cp_hex(nbase_crc32(((u8 *)tcp) + 4 * tcp->th_off, length)));
  } else {
    test.setAVal("RD", "0");
  }

IPL

ip.length

UN

IP字段中 ID和SEQ字段整体的 uin32位数值

RIPL

UDP测试中，对icmp返回包中ip结构的长度校验，返回长度的十六进制

  /* OK, lets check the returned IP length, some systems @$@ this
     up */
  if (ntohs(ip2->ip_len) == 328)
    test.setAVal("RIPL", "G");
  else
    test.setAVal("RIPL", hss->target->FPR->cp_hex(ntohs(ip2->ip_len)));

RID

UDP测试中，对icmp返回包中ip结构的id校验

  /* Now lets see how they treated the ID we sent ... */
  if (ntohs(ip2->ip_id) == hss->upi.ipid)
    test.setAVal("RID", "G"); /* The good "expected" value */
  else
    test.setAVal("RID", hss->target->FPR->cp_hex(ntohs(ip2->ip_id)));

RIPCK

UDP测试中，对icmp返回包中ip完整性校验

  /* Let us see if the IP checksum we got back computes */

  /* Thanks to some machines not having struct ip member ip_sum we
     have to go with this BS */
  checksumptr = (unsigned short *)   ((char *) ip2 + 10);
  checksum = *checksumptr;

  if (checksum == 0) {
    test.setAVal("RIPCK", "Z");
  } else {
    *checksumptr = 0;
    if (in_cksum((unsigned short *)ip2, 20) == checksum) {
      test.setAVal("RIPCK", "G"); /* The "expected" good value */
    } else {
      test.setAVal("RIPCK", "I"); /* They modified it */
    }
    *checksumptr = checksum;
  }

RUCK

  /* UDP checksum */
  if (udp->uh_sum == hss->upi.udpck)
    test.setAVal("RUCK", "G"); /* The "expected" good value */
  else
    test.setAVal("RUCK", hss->target->FPR->cp_hex(ntohs(udp->uh_sum)));

RUD

  /* Finally we ensure the data is OK */
  datastart = ((unsigned char *)udp) + 8;
  dataend = (unsigned char *)  ip + ntohs(ip->ip_len);

  while (datastart < dataend) {
    if (*datastart != hss->upi.patternbyte)
      break;
    datastart++;
  }
  if (datastart < dataend)
    test.setAVal("RUD", "I"); /* They modified it */
  else
    test.setAVal("RUD", "G");

CD

  /* ICMP Code value. Test values:
   * [Value]. Both set Code to the same value [Value];
   * S. Both use the Code that the sender uses;
   * O. Other.
   */
  value1 = icmp1->icmp_code;
  value2 = icmp2->icmp_code;
  if (value1 == value2) {
    if (value1 == 0)
      test.setAVal("CD", "Z");
    else
      test.setAVal("CD", hss->target->FPR->cp_hex(value1));
  }
  else if (value1 == 9 && value2 == 0)
    /* both the same as in the corresponding probe */
    test.setAVal("CD", "S");
  else
    test.setAVal("CD", "O");

指纹匹配

nmap os指纹文件在https://github.com/nmap/nmap/blob/master/nmap-os-db

nmap指纹解析

Fingerprint关键字定义一个新的指纹，紧随其后的是指纹名字。

Class行用于指定该指纹所属的类别，依次指定该系统的vendor（生产厂家）,OS family（系统类别）,OS generation（第几代操作系统）,and device type（设备类型）。

接下来是CPE行，此行非常重要，使用CPE（CommonPlatformEnumeration，通用平台枚举）格式描述该系统的信息。以标准的CPE格式来描述操作系统类型，便于Nmap与外界信息的交换，比如可以很快从网上开源数据库查找到CPE描述的操作系统具体信息。

SEQ OPS WIN

表达式

指纹可以是表达式类型的，nmap支持的表达式

大于：>
小于：<
范围： 1-3
或关系： GCD=1-6|64|256

指纹解析代码

parse_fingerprint_file#FingerprintClass CPE()

精简后的代码如下

FingerPrintDB *parse_fingerprint_file(const char *fname) {

  fp = fopen(fname, "r");

top:
  while (fgets(line, sizeof(line), fp)) {
    lineno++;
    /* Read in a record */
    // 换行 和 # 开头的跳过
    if (*line == '' || *line == '#')
      continue;

fparse:
    if (strncmp(line, "Fingerprint", 11) == 0) {
      // 指纹的开始行,创建新指纹
      current = new FingerPrint;
    } else if (strncmp(line, "MatchPoints", 11) == 0) {
      // 这是MatchPoint
    } else {
      error("Parse error on line %d of nmap-os-db file: %s", lineno, line);
      continue;
    }

    DB->prints.push_back(current);
    
    p = line + 12;
    while (*p && isspace((int) (unsigned char) *p))
      p++;

    q = strpbrk(p, "#");
    while (isspace((int) (unsigned char) *(--q)));

    current->match.OS_name = cp_strndup(p, q - p + 1); // 当前指纹os name
    current->match.line = lineno; // 当前指纹行数

    /* Now we read the fingerprint itself */
    while (fgets(line, sizeof(line), fp)) {
      lineno++;
      if (*line == '#')
        continue;
      if (*line == '')
        break;

      q = strchr(line, '');

      if (0 == strncmp(line, "Fingerprint ",12)) {
        goto fparse;
      } else if (strncmp(line, "Class ", 6) == 0) {
        parse_classline(current, line, q, lineno);
      } else if (strncmp(line, "CPE ", 4) == 0) {
        parse_cpeline(current, line, q, lineno);
      } else {
        p = line;
        q = strchr(line, '(');
        FingerTest test(FPstr(p, q), *DB->MatchPoints);
        p = q+1;
        q = strchr(p, ')');
        if (!test.str2AVal(p, q)) {
          error("Parse error on line %d of nmap-os-db file: %s", lineno, line);
          goto top;
        }
        current->setTest(test);
      }
    }
  }

  fclose(fp);
  return DB;
}

指纹匹配算法

MatchPoints

# This first element provides the number of points every fingerprint
# test is worth.  Tests like TTL or Don't fragment are worth less
# (insectionidually) because there are so many of them and the values are
# often correlated with each other.  Meanwhile, elements such as TS
# (TCP timestamp) which are only used once, get more points.  Points
# are used when there are no perfect matches to determine which OS
# fingerprint matches a target machine most closely.
MatchPoints
SEQ(SP=25%GCD=75%ISR=25%TI=100%CI=50%II=100%SS=80%TS=100)
OPS(O1=20%O2=20%O3=20%O4=20%O5=20%O6=20)
WIN(W1=15%W2=15%W3=15%W4=15%W5=15%W6=15)
ECN(R=100%DF=20%T=15%TG=15%W=15%O=15%CC=100%Q=20)
T1(R=100%DF=20%T=15%TG=15%S=20%A=20%F=30%RD=20%Q=20)
T2(R=80%DF=20%T=15%TG=15%W=25%S=20%A=20%F=30%O=10%RD=20%Q=20)
T3(R=80%DF=20%T=15%TG=15%W=25%S=20%A=20%F=30%O=10%RD=20%Q=20)
T4(R=100%DF=20%T=15%TG=15%W=25%S=20%A=20%F=30%O=10%RD=20%Q=20)
T5(R=100%DF=20%T=15%TG=15%W=25%S=20%A=20%F=30%O=10%RD=20%Q=20)
T6(R=100%DF=20%T=15%TG=15%W=25%S=20%A=20%F=30%O=10%RD=20%Q=20)
T7(R=80%DF=20%T=15%TG=15%W=25%S=20%A=20%F=30%O=10%RD=20%Q=20)
U1(R=50%DF=20%T=15%TG=15%IPL=100%UN=100%RIPL=100%RID=100%RIPCK=100%RUCK=100%RUD=100)
IE(R=50%DFI=40%T=15%TG=15%CD=100)

nmap通过逐行对比指纹，指纹正确加上权重的分数，最后对每个指纹计算一个概率，即成功分数/总数，输出概率高的指纹。

nmap默认的概率阈值是0.85，即概率小于这个数，则认为指纹不准确了。

表达式解析

具体的指纹匹配函数，包含解析表达式。val是扫描生成的指纹数值，expr是内置OS库中指纹数值。

 |<>-

/* Compare an observed value (e.g. "45") against an OS DB expression (e.g.
   "3B-47" or "8|A" or ">10"). Return true iff there's a match. The syntax uses
     < (less than)
     > (greater than)
     | (or)
     - (range)
   No parentheses are allowed. */
static bool expr_match(const char *val, const char *expr) {
  const char *p, *q, *q1;  /* OHHHH YEEEAAAAAHHHH!#!@#$!% */
  char *endptr;
  unsigned int val_num, expr_num, expr_num1;
  bool is_numeric;

  p = expr;

  val_num = strtol(val, &endptr, 16);
  is_numeric = !*endptr;
  // TODO: this could be a lot faster if we compiled fingerprints to a bytecode
  // instead of re-parsing every time.
  do {
    q = strchr(p, '|');
    if (is_numeric && (*p == '<' || *p == '>')) {
      expr_num = strtol(p + 1, &endptr, 16);
      if (endptr == q || !*endptr) {
        if ((*p == '<' && val_num < expr_num)
            || (*p == '>' && val_num > expr_num)) {
          return true;
        }
      }
    } else if (is_numeric && ((q1 = strchr(p, '-')) != NULL)) {
      expr_num = strtol(p, &endptr, 16);
      if (endptr == q1) {
        expr_num1 = strtol(q1 + 1, &endptr, 16);
        if (endptr == q || !*endptr) {
          assert(expr_num1 > expr_num);
          if (val_num >= expr_num && val_num <= expr_num1) {
            return true;
          }
        }
      }
    } else {
      if ((q && !strncmp(p, val, q - p)) || (!q && !strcmp(p, val))) {
        return true;
      }
    }
    if (q)
      p = q + 1;
  } while (q);

  return false;
}

用Golang造轮子

上述只是原理，真正写代码调试的时候会遇到更多困难。为什么我发的包和nmap不一样，为什么指纹计算的结果和nmap不一样，为什么指纹匹配不到。有几个小技巧可以缓解一下这些问题。

nmap 有个参数可以显示发送的包 --packet-trace，再将-vv 详细输出打开，事先找好Open和Close的TCP端口，用以下命令

nmap -p80,20 --packet-trace -vv -O 123.123.123.123

能看到具体发送的包和返回的包

以及最终输出的指纹

发包和收包用的是 https://github.com/google/gopacket

它基于libpcap，支持多个系统，并且可以发原始数据包。之前写过 https://github.com/boy-hack/ksubdomain ，所以对这个库也很熟悉。

nmap也是基于libpcap发包的。

数据链路层

因为这个包太底层，以至于要自己组装协议，数据链路层，网络层，传输层，都要自己组装。

对于数据链路层，需要知道自己的网卡Mac以及目标网卡Mac，对于外网，目标网卡就是路由器的Mac，对于内网，目标网卡就是arp探测到的mac。

数据链路层

这次我采用一个古老又复杂的方法，是 https://github.com/v-byte-cpu/sx 这个项目提供的灵感，但它也没有很好实现。

先确定目标IP
调用系统路由表，确定目标IP该使用的网卡，此时就能获得本地网卡的mac地址和网关IP。
调用系统arp缓存，查找目标ip是否有，如果有，就能获得目标地址的网卡mac（无论是内网主机的mac还是网卡的mac，缓存中都有）
如果系统arp缓存没有找到，则使用arp协议，像内网段广播获得网关的mac，如果目标IP是同一个网段，则广播查找目标IP的Mac。

其他

其他部分暂时没有踩坑，按照nmap源码的实现来就行了。