Wednesday, September 2, 2009

SAS I

To set up a library/folder in SAS:
libname NAME 'location on hard drive';
e.g.
libname hw2 'C:/temp';

file, import wizard
format is always CSV
(Comma Separated Values)

member: name it blah
The data file cannot be open in another program while you import it



Most commands are called proc (procedure)
data=libname.filename;

Proc contents data=hw2.data;
Run;

proc means data=hw2.data;
run;

that means you only get the means for numerical data
n = number of nonmissing values
very good for finding errors

click running person or press F8



proc means data=hw2.data;
title "Question 1"; /* this titles the means page */
*this is a comment; /* green means not part of code */
run; /* when this is blue, command is ready to run */

proc means data=hw2.data;
title "Question 1";
*this is a comment;
var purch child; /* this is so only these variables will show up */
run;

this copies data from data to data2
data hw2.data2;
set hw2.data;
run;


if you have a zip code, make it text. database fields

proc means data=libname.filename mean skew;
run;
mean finds the average
skew finds how lopsided (asymmetric) the values are, not how spread out they are; std is the one that measures spread
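A minimal sketch of asking proc means for specific statistics. It assumes the sample data set sashelp.class that normally ships with SAS (not the class data), and the variables height and weight belong to that sample set.

```sas
/* Assumption: sashelp.class is the built-in sample data set, not the hw2 data */
proc means data=sashelp.class n mean median std skew;
    var height weight;   /* numeric variables to summarize */
run;
```

Listing statistics after the data set name replaces the default output (n, mean, std, min, max) with only the ones named.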

median - 50 % on each side

NW side of town - house in Lawrence, Doug Compton, has a huge mansion and giraffes, zebras, an elephant!
\
\
ELEPHANT\____________________


FS folks? HYVEE
__________________________
6th st


wak

Data commands:
1. manipulate: modify variables
  • transform variable types - character to numeric
  • create a variable as a function of other variables
  • create indicator variables representing the unique values of a character variable (e.g. ks=1 when state='KS')
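The three kinds of manipulation above could look roughly like this. This is only a sketch: the data set demo and the variables amount_txt, amount, and total are made up for illustration, and input() with the 8. informat is one common way to convert character to numeric.

```sas
/* Hypothetical toy data: demo, state, and amount_txt are made-up names */
data demo;
    input state $ amount_txt $;
    datalines;
KS 10.5
MO 7
KS 3.25
;
run;

data demo2;
    set demo;
    amount = input(amount_txt, 8.);   /* character to numeric */
    total  = amount * 2;              /* new variable as a function of another */
    ks     = (state = 'KS');          /* indicator: 1 when state is KS, else 0 */
run;

proc print data=demo2;
run;
```

The comparison in parentheses evaluates to 1 or 0, so it works as a one-line indicator variable.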


libname hw2 'e:/';


proc means data=hw2.blah;
run;


proc sort data=hw2.blah;
by state;
run;


proc means data=hw2.blah skew;
by state;
run;
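The sort-then-by pattern above, sketched on the built-in sample data set sashelp.class (an assumption; it is not the class data). The by statement expects the data to be sorted by that variable already, which is why proc sort comes first. class_sorted is a made-up name.

```sas
/* Assumption: sashelp.class is the built-in sample data; class_sorted is a made-up name */
proc sort data=sashelp.class out=class_sorted;
    by sex;              /* sort first so by-group processing works */
run;

proc means data=class_sorted mean skew;
    by sex;              /* one block of statistics per value of sex */
    var height weight;
run;
```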

this one copies blah into a new data set called new
data hw2.new;
set hw2.blah;
run;

this one makes a column at the end where x = 1

you could do something like year=2009, and then every row in that file would be marked as from that year. The name of a variable can't start with a number and shouldn't have spaces, because that makes it confusing


data hw2.new;
set hw2.blah;
x=1;
run;

data hw2.new2;
set hw2.new;
x2=2*x;
run;

the name after data is the new data set, and the name after set is where you get the info from

if you make them both the name of the original data set, it will overwrite it

this makes a variable in new2, from new; the variable is r, and its value is a random number between 0 and 1. RANDOM NUMBERS.
data hw2.new2;
set hw2.new;
x=1;
x2=2*x;
r=ranuni (0);
run;

if statements

only keep the observations that are male
data hw2.new2;
set hw2.new;
x=1;
x2=2*x;
r=ranuni (0);
if gender='M';
run;

Make sure you close a data set before altering it, or it won't work.

if you only want 5000 random entries out of 50000 (10%):

generate a random number, then select based on that number.


data hw2.blah3;
set hw2.blah;
z=1;
x2=2*z;
r=ranuni (0);
if r<=.1; *this tells it to select 10% of the data, the 10% that was randomly assigned a number of .1 or less;
run;
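Selecting on r<=.1 gives roughly 10%, not exactly 5000 rows. If an exact count matters, one possible approach is to sort by the random number and keep the first 5000 rows. This is a sketch on made-up data; the names big, big_r, sample, and id are hypothetical.

```sas
/* Hypothetical toy data: 50,000 rows with just an id */
data big;
    do id = 1 to 50000;
        output;
    end;
run;

data big_r;
    set big;
    r = ranuni(0);       /* random number between 0 and 1 */
run;

proc sort data=big_r;
    by r;                /* puts the rows in random order */
run;

data sample;
    set big_r;
    if _n_ <= 5000;      /* keep exactly the first 5000 shuffled rows */
run;
```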

the newest file goes on top, after data; then set says where you get the info from

this makes a new file called newest. A 10% random sample is taken, and if it's a female it says female=1. If it's a male it says female=0



data hw2.newest;
set hw2.blah;
r=ranuni (0);
if r<=.1;
if gender='F' then female=1;
if gender='M' then female=0;
run;

this makes a new file called newest. A 10% random sample is taken, and if it's a female it says female=1. For anything else it says female=0
data hw2.newest;
set hw2.blah;
r=ranuni (0);
if r<=.1;
if gender='F' then female=1;
else female=0;
run;

keep/drop - decide which variables to keep in the table
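A short sketch of keep and drop inside a data step, using the built-in sample data set sashelp.class (an assumption; not the class data). The output names kept and dropped are made up. Both steps end up with the same three variables.

```sas
/* Assumption: sashelp.class has the variables name, sex, age, height, weight */
data kept;
    set sashelp.class;
    keep name sex age;     /* only these variables are written out */
run;

data dropped;
    set sashelp.class;
    drop height weight;    /* everything except these is written out */
run;
```

Use keep when the list of wanted variables is short, and drop when the list of unwanted ones is shorter.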

2 commands that may be useful in hw

freq - how often each value of the variable occurs

for example - how many times each state shows up


proc freq data=hw2.blah;
table state;
run;

MAKES A HISTOGRAM (the variable has to be numeric, so not a character variable like state)
proc univariate data=hw2.blah;
var first;
histogram first;
run;


proc ttest data=hw2.blah;
class gender;
var first;
run;

what you need in the hw

answer: carefully and specifically

sas output: copy and paste the relevant output you used to derive the answer - you can abbreviate the output to only show the relevant part - get it from the Output window

sas code: what you used to get the output

are men more familiar or not

the bigger the sample, easier to reject the null

proc ttest data=hw2.one;

class gender;

var first;

run;

the p-value (Pr > |t|), not the t value, is the minimum alpha at which you can reject the null.
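A small sketch of reading a two-sample t test, using the built-in sample data set sashelp.class (an assumption; it stands in for hw2.one). In the output, look at the Pr > |t| column: if it is below your alpha (0.05 here), reject the null that the two group means are equal.

```sas
/* Assumption: sashelp.class stands in for the class data */
proc ttest data=sashelp.class alpha=0.05;
    class sex;       /* grouping variable with exactly two levels */
    var height;      /* numeric variable whose group means are compared */
run;
```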

im so hungry and not paying attn.

CROSSTABS

proc freq data=hw2.one;
table gender*buyer;
run;

9-23

tuesday @5

he'll be in a chatroom

decision tree review

to do decision trees you have to use Enterprise Miner

Solutions > Analysis > Enterprise Miner (E)

File > New... > Project

the data has to be imported into SAS first

look at the 1st icon (input data source) - drag it to the project view
double click it
select the library and then the file
it will select 2000 random data points, so you have to tell it to use all observations
click variables. right click, select model role, and reject the ones you don't want to use
set buyer as the target
select the book categories as inputs
close the small window

drag tree to project space

click next to the file, drag an arrow to the tree
double click on the tree icon
click basic - use chi squared
significance level .2 is fine, unless you want a stricter significance level to make the tree stop sooner
min number of observations in a leaf - how many it has to have
max branches - binary=2
click advanced - model assessment measure - proportion of event in top 10%, how much separation there should be between buyer and non-buyer

right click on the tree and run it. it will make crosstabs etc

click around the optimal tree; this one has 23.

view tree

...1 6...
less than 1 1-6

leaf statistics

right click and copy, or Edit > Copy, or save the tree as a gif

answer the question, show the tree.
